Google and MIT building AI that can see and hear
Mon 26 Jun 2017

In separate new studies, the Massachusetts Institute of Technology (MIT) and Google have moved artificial intelligence another step towards independence.
Researchers at the two organisations have created separate algorithms that can use sight, sound and text to communicate with humans and to interact more effectively with different environments.
In one of the two published papers, the Google researchers said: “Deep learning yields great results across many fields, from speech recognition, image classification, to translation. We present a single model that yields good results on a number of problems spanning multiple domains.”
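In rough outline, a model of this kind pairs one shared network body with small input modules for each type of data. The Python sketch below illustrates only that general shape; the module names, layer choices and sizes are our own illustrative assumptions, not the architecture described in Google's paper.

```python
# Toy sketch (not Google's actual architecture): one shared "trunk"
# network serving inputs from different domains via small
# modality-specific encoders. All names and dimensions are assumptions.
import torch
import torch.nn as nn

class MultiDomainModel(nn.Module):
    def __init__(self, vocab_size=1000, image_channels=3, hidden=128, num_classes=10):
        super().__init__()
        # Per-modality encoders map raw inputs into a shared vector space.
        self.text_encoder = nn.Embedding(vocab_size, hidden)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(image_channels, hidden, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # A single shared body then processes every modality identically.
        self.trunk = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x, modality):
        if modality == "text":
            h = self.text_encoder(x).mean(dim=1)  # average token embeddings
        else:
            h = self.image_encoder(x)
        return self.trunk(h)

model = MultiDomainModel()
text_logits = model(torch.randint(0, 1000, (2, 16)), "text")
image_logits = model(torch.randn(2, 3, 32, 32), "image")
```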
The researchers also speculate that, in future studies, algorithms with these new-found abilities could use deep learning to teach communication skills to other AI algorithms with little human interference.
An MIT research paper noted: “The goal is to create representations that are robust in another way: we learn representations that are aligned across modality. We believe aligned cross-modal representations will have a large impact in computer vision because they are fundamental components for machine perception to understand relationships between modalities.”
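The alignment the MIT researchers describe can be pictured as training encoders so that, for example, a photo and a sentence describing the same scene land near each other in a single shared embedding space. The sketch below shows one common way such alignment is trained; the contrastive loss and embedding sizes are assumptions for illustration, not the method from the MIT paper.

```python
# Illustrative sketch of "aligned" cross-modal representations: matching
# image/text pairs should sit closer together in a shared embedding
# space than mismatched pairs. Loss choice and shapes are assumptions.
import torch
import torch.nn.functional as F

def alignment_loss(image_emb, text_emb, temperature=0.1):
    """Contrastive loss: the i-th image and i-th text form a true pair."""
    image_emb = F.normalize(image_emb, dim=1)
    text_emb = F.normalize(text_emb, dim=1)
    logits = image_emb @ text_emb.t() / temperature  # pairwise similarities
    targets = torch.arange(len(logits))              # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

# Embeddings would normally come from trained image and text encoders.
loss = alignment_loss(torch.randn(8, 64), torch.randn(8, 64))
```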
The purpose of the experiments was to see whether the algorithms could identify, and communicate about, the kinds of information humans take in through their senses.
The experiments were conducted in simple stages, so that each algorithm received clear instructions while it identified different sensory inputs and learned how to respond to them appropriately in a real-life scenario.
A common test asked the AI to identify and deconstruct the various forms of information it was given so that it could react accordingly: sounds and images of vehicles, animals and people, paired with descriptions of their appearance and actions.
The two studies took related but distinct approaches: Google concentrated on translation between languages, whereas MIT investigated how the AI could construct sentences.
Google researchers explained: “We believe that this treads a path towards interesting future work on more general deep learning architectures, especially since our model shows transfer learning from tasks with a large amount of available data to ones where the data is limited.”
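Transfer learning of the sort the quote alludes to is often realised by reusing a network trained on a data-rich task and fitting only a small new component to the data-limited one. The snippet below is a minimal, assumed illustration of that pattern, not code from the paper.

```python
# Hedged sketch of transfer learning: a trunk trained on a data-rich
# task is frozen and reused, and only a small new head is trained on
# the data-limited task. Names and sizes are illustrative assumptions.
import torch.nn as nn

trunk = nn.Sequential(nn.Linear(128, 128), nn.ReLU())  # pretrained on the large task
for param in trunk.parameters():
    param.requires_grad = False  # freeze the shared knowledge

new_head = nn.Linear(128, 5)  # only these weights train on the small dataset
model = nn.Sequential(trunk, new_head)
```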
Recently, AI researchers at Facebook discovered by accident that their algorithms had begun communicating in a machine-style language of their own, until the systems were steered back to human language. At present, AI cannot handle more than one of the five human senses in practical tests, but it is hoped that the experiments conducted by MIT and Google have driven forward this next stage of AI development.