Daily Mail and CNN teach AI to read
Thu 18 Jun 2015
A team of researchers at Google’s DeepMind unit is developing a technology to teach its artificial intelligence (AI) computing systems to read.
The scientists, led by Karl Moritz Hermann, are training the neural networks using articles published on online news sites CNN and the Daily Mail, a UK-based tabloid and the most visited newspaper website in the world, with an average of 11.7mn visitors per day.
Both sites open their articles with bullet-point summaries, a feature the DeepMind team is using to build a database on which the computer systems can test their understanding of an article by answering related comprehension questions.
“Of key importance is that these summary points are abstractive and do not simply copy sentences from the documents,” explained Hermann.
The AI system is able to use the article as raw data and the summarising notes as annotation. The database currently contains a total of 218,000 Daily Mail articles and 100,000 CNN pieces.
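As a rough illustration of how an article and one of its summary points could be paired into a comprehension question, here is a minimal sketch in Python. It assumes a fill-in-the-blank format in which one term of the summary is hidden and must be recovered from the article; the article does not say that DeepMind builds its questions this way. The names `ClozeExample`, `make_cloze` and `PLACEHOLDER` are hypothetical, and the sample text is made up.

```python
from dataclasses import dataclass


@dataclass
class ClozeExample:
    article: str   # raw article text, used as the reading passage
    question: str  # summary bullet with one term hidden
    answer: str    # the hidden term


PLACEHOLDER = "@placeholder"  # hypothetical marker for the hidden term


def make_cloze(article: str, bullet: str, term: str) -> ClozeExample:
    """Hide the first occurrence of `term` in `bullet` to form a question."""
    if term not in bullet:
        raise ValueError(f"{term!r} does not occur in the summary bullet")
    question = bullet.replace(term, PLACEHOLDER, 1)
    return ClozeExample(article=article, question=question, answer=term)


# Made-up sample data, for illustration only.
article = "Researchers at DeepMind are training neural networks on news articles."
bullet = "DeepMind trains neural networks to read"
example = make_cloze(article, bullet, "DeepMind")
print(example.question)  # prints "@placeholder trains neural networks to read"
```

Because the summary is not copied word for word from the article, a system cannot fill the blank by simple string matching against the passage, which is why the abstractive quality Hermann describes matters.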
DeepMind has found that the neural systems correctly answer 60% of the questions posed to them. Hermann suggested that the AI machines are able to answer all queries that use a simple structure, but struggle with more complex ones.
Critics argue that the style and formatting of MailOnline and CNN online are very specific to those publications and differ greatly from other journalistic outputs, as well as from non-journalistic copy.
London-based DeepMind was founded in 2011 and acquired by Google in January 2014 for $633mn (approx. £400mn).
“We combine the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms,” reads the company’s mission statement.
Experts predict that AI computers and deep learning will soon affect a wide range of industries, including retail, facial and voice recognition, and gaming, where Google DeepMind has already demonstrated its capabilities by playing the classic video game Space Invaders.