DeepMind forges new AI that churns out reinforcement learning algorithms
Written by James Orme Tue 21 Jul 2020

“Meta-learning” framework could dramatically accelerate the process of discovering new reinforcement learning algorithms
DeepMind researchers have developed a new AI technique that generates reinforcement learning algorithms by interacting with environments.
In a study posted on the preprint server arXiv.org, researchers from the AI firm claimed the generated algorithms were a dab hand at some of Atari’s most complex games, suggesting the technique could be used to discover generalisable reinforcement learning algorithms from data alone.
Reinforcement learning
Reinforcement learning algorithms are some of the sharpest tools in modern AI and are responsible for several significant breakthroughs in the field.
They enable software agents to learn from their surroundings by trial and error, gradually working out how to maximise an objective such as the points won in a game over many moves.
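To make that concrete, here is a minimal, self-contained sketch of the trial-and-error loop described above. The toy “corridor” environment, the Q-learning rule and all hyper-parameters are illustrative assumptions rather than anything from DeepMind’s paper; the point is that the update in the middle of the loop is exactly the kind of hand-designed rule the researchers want to discover automatically.

```python
import random

# Toy "corridor" environment: states 0..4, start at state 0, reward 1 for reaching state 4.
# Environment and hyper-parameters are illustrative assumptions, not DeepMind's setup.
N_STATES, ACTIONS = 5, [-1, +1]          # move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # manually chosen learning constants

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Trial and error: mostly act greedily, occasionally explore at random.
        action = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Hand-designed update rule (Q-learning): the kind of rule DeepMind's
        # framework aims to discover from data instead of by hand.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# Greedy action learned for each non-terminal state (should point towards the goal).
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
```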
The update rules that drive this learning are usually designed manually through years of research, which is why DeepMind wanted to investigate whether there was a more efficient approach – one that automates the discovery of update rules from data generated by interactions with a set of environments.
In theory, this could lead not only to more efficient algorithms but to ones better adapted to specific contexts, moving us one step closer to general-purpose reinforcement learning algorithms, the researchers said.
To achieve this, DeepMind created a “meta-learning framework” that simultaneously discovers what an agent should predict and how those predictions can be used to improve its performance.
This involved crafting a loose architecture, which they call Learned Policy Gradient (LPG), that does not dictate the semantics of the agent’s prediction vectors but allows the update rule to decide what those vectors should predict.
The framework then discovers an update rule from the experience of a population of learning agents that have been let loose in different environments.
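For a feel of how such a set-up is organised, the schematic sketch below shows an inner loop in which agents are trained by a parameterised update rule, and an outer loop that searches over the rule’s parameters to maximise average performance across a set of training environments. Everything here (the two-armed bandit environments, the two-parameter update rule and the random-search outer loop) is a simplified assumption for illustration; the actual LPG uses an LSTM-based update rule, includes a learned prediction vector, and is meta-trained with gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small set of toy training environments: two-armed bandits with different payout
# probabilities (an illustrative stand-in for the environments used in the paper).
TRAIN_ENVS = [np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.5, 0.7])]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def inner_loop(env_probs, eta, steps=200):
    """Train one agent in one environment using the *candidate* update rule eta.
    Here eta[0] acts as a learned step size and eta[1] as a learned reward baseline."""
    theta = np.zeros(2)                      # agent's policy parameters (logits)
    total = 0.0
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(2, p=pi)
        r = float(rng.random() < env_probs[a])
        total += r
        grad = -pi.copy()
        grad[a] += 1.0                       # gradient of log pi(a) w.r.t. theta
        # The update rule itself is the object being meta-learned:
        theta += eta[0] * (r - eta[1]) * grad
    return total / steps                     # average reward achieved under this rule

def meta_objective(eta):
    # How well does this update rule work across the whole set of environments?
    return np.mean([inner_loop(p, eta) for p in TRAIN_ENVS])

# Outer loop: crude random search over the update rule's parameters
# (the real LPG meta-learns a neural update rule with gradients; this is only schematic).
best_eta, best_score = np.zeros(2), -np.inf
for _ in range(50):
    eta = rng.normal(scale=0.5, size=2)
    score = meta_objective(eta)
    if score > best_score:
        best_eta, best_score = eta, score

print("discovered update-rule parameters:", best_eta, "avg reward:", round(best_score, 3))
```

The structure, not the numbers, is the point: the inner loop never hard-codes how the agent should learn, and the outer loop judges candidate update rules purely by how well agents trained with them end up performing.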
The researchers tested LPG on complex Atari games, finding it could adapt to several games “reasonably well” compared with existing algorithms. On top of that, it achieved “super-human performance” on 14 games.
It’s still early days, however. DeepMind said that, due to the data-driven nature of the proposed approach, the resulting algorithm may capture unintended bias from the training set of environments.
“In our work, we do not provide domain-specific information except rewards when discovering an algorithm, which makes it hard for the algorithm to capture bias in training environments,” they said. “More work is needed to remove bias in the discovered algorithm to prevent potential negative outcomes.”
Nevertheless, the researchers said it may be feasible to discover a general-purpose reinforcement learning algorithm once a larger set of environments becomes available for meta-training.