The Stack Archive

Google’s hive-mind robot arms learn to negotiate a cluttered world

Wed 9 Mar 2016

Getting robots to pick up objects with the same dexterity and success rate as a five-year-old child is no minor challenge in the development of flexible automation systems. Amazon, in its determination to end controversies about working conditions (by emptying its warehouses of all humans), is already conducting extensive research into the problem of getting auto-picker robots to identify target objects in a cluttered environment.

Predictably, the best-funded tech research entity on the planet is no laggard in this area. A new paper led by Google research scientist Sergey Levine details his team’s attempts to use a convolutional neural network to teach robots how to grasp objects in unpredictable and unordered environments.

The paper, entitled ‘Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection’, outlines a lab experiment in which 14 robot arms spent thousands of hours attempting to grasp and retain various objects in trays, without the traditional pre-programming of environment variables that makes for smooth automation in predictable and ‘baked’ environments such as automotive production lines. Levine writes:

‘At a high level, current robots typically follow a sense-plan-act paradigm, where the robot observes the world around it, formulates an internal model, constructs a plan of action, and then executes this plan. This approach is modular and often effective, but tends to break down in the kinds of cluttered natural environments that are typical of the real world. Here, perception is imprecise, all models are wrong in some way, and no plan survives first contact with reality.’

The objective of the 14-robot experiment is to use feedback from the individual robots’ grasping attempts to inform and develop predictive dexterity in the same way that human children do between the ages of one and four. The goal is the AI equivalent of hand-eye coordination, built from the accrued knowledge of continuous feedback on the efficacy, or lack thereof, of previous approaches. In one case it took 800,000 grasp attempts, representing over four months of continuous robotic activity, to develop what Levine describes as ‘the beginnings of intelligent reactive behaviors’.

With progress this slow, it’s easy to see why the multiple-arm approach is necessary: it speeds up the production of grasping data for the neural network to assimilate.

Levine comments: ‘The robot observes its own gripper and corrects its motions in real time. It also exhibits interesting pre-grasp behaviors, like isolating a single object from a group. All of these behaviors emerged naturally from learning, rather than being programmed into the system.’

The work builds partly on 2015 research by Lerrel Pinto and Abhinav Gupta, during which robots undertook 50,000 grasping attempts over 700 hours using ‘one shot grasping’, or open-loop grasp selection, which commits to a grasp from an initial assessment of the scene rather than correcting the motion as it unfolds, and which leads to a 34% failure rate.

The continuous feedback method used by Levine’s group cuts the failure rate in half and lets the robot make useful corrections during the grasp itself.
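The difference between the two approaches can be illustrated with a toy simulation. The sketch below is not the method from either paper: it is a one-dimensional stand-in in which a gripper tries to land on a target, with made-up noise levels and a hypothetical `grasp_failure_rate` function. With `feedback_steps=1` the gripper looks once and moves once (open-loop); with more steps it re-observes its own position relative to the target and corrects before grasping (closed-loop).

```python
import random

def grasp_failure_rate(feedback_steps, trials=20000, seed=0):
    """Fraction of simulated 1-D grasps that miss the object.

    feedback_steps=1 is open-loop: look once, move once, grasp.
    feedback_steps>1 is closed-loop: re-observe and correct before grasping.
    All noise levels below are invented for illustration only.
    """
    rng = random.Random(seed)
    obs_noise = 0.02   # std of camera error (hypothetical value)
    act_noise = 0.10   # motor error as a fraction of distance moved (hypothetical)
    tolerance = 0.03   # max final offset that still counts as a grasp (hypothetical)
    failures = 0
    for _ in range(trials):
        target = rng.uniform(-1.0, 1.0)
        gripper = 0.0
        for _step in range(feedback_steps):
            # Noisy view of where the target is relative to the gripper.
            seen_offset = (target - gripper) + rng.gauss(0.0, obs_noise)
            # Move by the observed offset; larger moves are less accurate.
            gripper += seen_offset + rng.gauss(0.0, act_noise * abs(seen_offset))
        if abs(target - gripper) > tolerance:
            failures += 1
    return failures / trials

open_loop = grasp_failure_rate(feedback_steps=1)
closed_loop = grasp_failure_rate(feedback_steps=5)
print(f"open-loop failure rate:   {open_loop:.1%}")
print(f"closed-loop failure rate: {closed_loop:.1%}")
```

The percentages it prints depend entirely on the invented noise settings and should not be read against the 34% figure above. The point is only the direction of the effect: once the gripper is close, each correction is a small move with a small error, so repeated observation pulls the failure rate down.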

Levine argues that ‘instead of choosing the cues by hand, we can program a robot to acquire them on its own from scratch, by learning from extensive experience in the real world.’

