How sports and movie commentaries can speed up the development of artificial intelligence
Wed 25 Nov 2015
A small team of Indian researchers has leveraged the country’s love of its favourite sport to explore how cricket commentaries could teach neural networks to understand what is actually happening in a cricket match.
One of the core challenges recurring in current scientific research using neural networks is the extent to which computers can usefully learn on their own. The subject crops up frequently in research around self-driving vehicle AIs, which still rely on manually tagged or annotated databases, yet must learn to interpret information in a meaningful and useful way, in real time.
In the field of image recognition, and more specifically in video recognition, manual mark-up and annotation is labour-intensive and prone to produce misleading results. However, a growing number of researchers are beginning to consider how neural networks might repurpose commentary that was produced for other reasons, such as the text versions of commentaries for sports events and scene-descriptive (closed-caption) commentaries for movies.
In Fine-Grain Annotation of Cricket Videos [PDF], a research group led by Rahul Anand Sharma at Hyderabad builds on previous work which has looked at using commentaries for the purposes of event-detection in sports [PDF], and on similar ‘weak supervision’ based on other forms of commentary. The group is attempting to teach a neural network to make fine-grained distinctions regarding the content of a cricket video, rather than simply identifying the events themselves.
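The ‘weak supervision’ idea can be illustrated with a minimal sketch. This is not the paper’s actual pipeline: the commentary entries, shot boundaries, keyword labels and the assumption that deliveries and detected play shots occur in the same order are all hypothetical simplifications, standing in for real shot detection and text classification.

```python
from dataclasses import dataclass

@dataclass
class CommentaryEntry:
    over: float   # e.g. 12.3 = over 12, ball 3 (hypothetical format)
    text: str     # free-text description of the delivery

@dataclass
class Shot:
    start_s: float  # shot start time in seconds
    end_s: float    # shot end time in seconds

def label_from_text(text: str) -> str:
    """Naive keyword labelling -- a stand-in for real text classification."""
    text = text.lower()
    if "four" in text or "boundary" in text:
        return "four"
    if "six" in text:
        return "six"
    if "out" in text or "wicket" in text:
        return "wicket"
    return "dot-or-run"

def align(entries, shots):
    """Assume deliveries and detected play shots occur in the same order,
    so the i-th commentary entry weakly labels the i-th play shot."""
    return [(shot, label_from_text(e.text)) for shot, e in zip(shots, entries)]

entries = [
    CommentaryEntry(12.1, "driven through the covers for four"),
    CommentaryEntry(12.2, "edged and taken! He is out"),
]
shots = [Shot(0.0, 8.5), Shot(8.5, 15.0)]
for shot, label in align(entries, shots):
    print(f"{shot.start_s:.1f}-{shot.end_s:.1f}s: {label}")
```

No frame of video is ever hand-annotated here: the labels come entirely from text written for human readers, which is what makes the supervision ‘weak’.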
‘[Current] visual recognition solutions have only seen limited success towards fine-grain action classification. For example, it is difficult to automatically distinguish a “forehand” from a “half-volley” in Tennis. Further, automatic generation of semantic descriptions is a much harder task, with only limited success in the image domain.’
Cricket presents an unusual scenario for training computers on sports footage, since relatively little actual ‘action’ occurs in a video that might last hours. The pre-supplied annotation, which may amount to only four pages of text for hours of actual play, allowed the team to build a retrieval system capable of searching across hundreds of hours of video content to isolate specific actions of only a few seconds’ duration. The group was thereby able to generate a large body of fine-grain-labelled videos that can themselves be used to train further NN-based action recognition projects.
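The retrieval step described above can be sketched as a simple query over labelled clips. The clip records, filenames and label vocabulary below are invented for illustration; the point is only that once clips carry fine-grain labels, finding every few-second instance of one action across many hours of footage reduces to a lookup.

```python
# Hypothetical index of labelled clips (video file, start/end in seconds,
# fine-grain action label derived from commentary).
clips = [
    {"video": "match01.mp4", "start": 1312.0, "end": 1318.5, "label": "cover-drive"},
    {"video": "match01.mp4", "start": 2450.0, "end": 2457.0, "label": "pull-shot"},
    {"video": "match07.mp4", "start": 311.0,  "end": 317.0,  "label": "cover-drive"},
]

def retrieve(clips, query_label):
    """Return (video, start, end) for every clip carrying the query label."""
    return [(c["video"], c["start"], c["end"])
            for c in clips if c["label"] == query_label]

print(retrieve(clips, "cover-drive"))
```

In a real system the index would of course be a database rather than a list, but the shape of the result is the same: a handful of short, precisely located clips drawn from a vast archive.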