Unjamming music with convolutional neural networks
Wed 18 Nov 2015
Since annual turnover in the music industry stands at an estimated $130bn, Music Information Retrieval (MIR) is a growing area of academic research. The need for better-targeted music recommendations alone fuels research, given the commercial interest in recommender systems such as Pandora (powered by the Music Genome Project), but the field also takes in automatic music transcription, automatic categorisation, and track separation for remixes where the master tape is missing (and for karaoke music extraction).
Interest in machine hearing led the University of California, San Diego to establish a Computer Audition Laboratory, which has conducted significant research into database generation for the purposes of music analysis, as well as into similarity of melodies between songs, the commercial appeal of which is obvious.
But little MIR research has used neural networks, relying instead largely on metadata, human categorisation and semantic features. A new paper from New York University, Automatic Instrument Recognition In Polyphonic Music Using Convolutional Neural Networks, turns a convolutional neural network to the task of identifying musical instruments in polyphonic recordings.
It’s a challenge worthy of a CNN: not only are certain instruments (such as bassoon, clarinet, flute, guitar, piano, cello and violin) easier to individuate than others, but variations in mix and style combine with line-blurring elements such as attack and sustain to obscure the instrument the analysis is targeting.
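To make the task concrete: a CNN treats the audio as a time-frequency image and slides small learned filters over it. The sketch below (purely illustrative, not from the paper; the spectrogram shape and kernel are hypothetical) shows the basic operation of a single convolutional layer with a ReLU activation applied to a spectrogram-like array.

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2-D convolution of spectrogram x with kernel k."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Hypothetical input: 128 mel bands by ~1 second of frames.
spec = np.random.rand(128, 87)
kernel = np.random.rand(3, 3)          # one learned filter
feat = np.maximum(conv2d(spec, kernel), 0.0)  # ReLU feature map
```

A real network stacks many such filters with pooling and dense layers, learning the kernels from labelled clips rather than using random values as here.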
In common with previous non-CNN MIR research, the authors limited themselves to 26,162 one-second music clips, drawn from 122 tracks into (musically) meaningless compilations, using 80% of the clips for training and the remaining 20% for testing – and found that the neural network approach exceeded the identification rates previously achieved by other methodologies. The researchers confirmed the improvement by comparing against a random forest/logistic regression baseline built on domain-based feature vectors, as used in similar prior research.
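The clip-level split described above can be sketched as follows. This is a hypothetical illustration using the article's numbers (26,162 clips, 80/20 split); the function name and random seed are my own, not the paper's code.

```python
import numpy as np

def train_test_split_clips(n_clips, train_frac=0.8, seed=0):
    """Shuffle clip indices and split them into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_clips)
    n_train = int(n_clips * train_frac)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = train_test_split_clips(26162)
# 20,929 clips for training, 5,233 for testing
```

Note that a clip-level split like this can place clips from the same track in both sets; splitting by track instead is a common precaution against that kind of leakage.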
In common with other researchers turning CNNs to familiar tasks, the group was not able to establish exactly why the convolutional neural network achieved better results than non-CNN techniques, despite having set up the criteria and training sets themselves. In any case, the code is available on GitHub.
This particular project proceeds from and utilises MedleyDB, an established multitrack music database that includes metadata and human-labelled annotations, developed with the involvement of the Centre for Digital Music at Queen Mary University of London. The eventual aim is to establish adequate accuracy to enable instrument identification in uncategorised input.