Bringing local neural network power to Android via the GPU
Tue 18 Oct 2016
Anyone who reads enough scientific papers on Convolutional Neural Networks (CNNs) can almost feel the researchers' collective frustration at having to offload data analysis to the cloud because current mobile devices are underpowered and underspecced. While the world seems set to shed desktop devices as ballast in the mobile age, the capabilities of small form-factor devices look likely to lag years behind what 'on-board' CNNs could bring to the mobile space, in both the consumer and scientific sectors.
Set against the possibilities of small form-factor drones, untethered VR, smartwatch applications and truly analytical local systems that are currently burdened with cloud lag, smart devices are just not smart enough yet.
In the desktop and server space there is already a mature ecosystem of libraries and frameworks that harness the power of the GPU for deep learning. But fundamental architectural differences between desktop/server and mobile GPU hardware make it hard or impossible to port these frameworks to mobile, except in the most abstract or feature-limited way: Caffe has an Android library; so does Torch; and there are dedicated mobile CNN projects such as a Python-based Android framework and a facial-recognition repository for running CNNs locally on mobile devices.
However, none of these few offerings are able to leverage a mobile device’s formidable graphics processing capabilities; all are limited to the CPU.
Now researchers from Tehran and California have developed a mobile CNN library that accelerates local deep learning on the GPU, achieving up to a 60-fold speedup and a 130-fold energy saving over analogous setups on current mobile devices.
The library, called CNNdroid, is an open-source GPU-accelerated resource that supports nearly all CNN layer types, is compatible with CNN models trained in desktop frameworks such as Theano, and can be set up on Android without any additional software, containers or interpreters.
The researchers evaluated CNNdroid on two devices, the Samsung Galaxy Note 4 and the HTC One M9. In one test, on AlexNet, the Galaxy Note 4 performed 30% better than the HTC device, and in light of recent events in the Samsung mobile camp, the paper's explanation is interesting: 'This can be either the result of lower GPU frequency of HTC One M9 or its aggressive throttling policy in order to prevent overheating issues in long runtimes.'
The trained CNN model is transferred to the mobile device via an SD card, and CNN layers are selectively kept in RAM to optimise performance. RAM usage is in turn constrained by Android 5, which will not allocate more than 512 MB to an app.
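The article does not say how CNNdroid decides which layers stay in RAM. As a rough illustration only, the Python sketch below plans residency under a fixed heap budget: it computes each layer's float32 weight footprint, admits layers in network order while they still fit, and leaves the rest to be read from storage on demand. The `Layer` class, the `plan_residency` function, the first-fit policy, the layer sizes and the headroom reserved for activations are all hypothetical assumptions, not CNNdroid's actual method.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT32 = 4
# Per-app cap cited in the article for Android 5 (512 MB, treated here as MiB).
HEAP_LIMIT_BYTES = 512 * 1024 * 1024


@dataclass
class Layer:
    """Hypothetical description of one trained layer."""
    name: str
    num_params: int

    @property
    def size_bytes(self) -> int:
        # Weights assumed to be stored as 32-bit floats.
        return self.num_params * BYTES_PER_FLOAT32


def plan_residency(layers, budget_bytes):
    """First-fit in network order: keep a layer in RAM if it still fits.

    Returns (resident names, streamed names, bytes used by resident layers).
    Layers that do not fit are left to be loaded from storage when needed.
    """
    resident, streamed = [], []
    used = 0
    for layer in layers:
        if used + layer.size_bytes <= budget_bytes:
            resident.append(layer.name)
            used += layer.size_bytes
        else:
            streamed.append(layer.name)
    return resident, streamed, used


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to exercise the planner.
    layers = [
        Layer("conv1", 10_000_000),
        Layer("fc6", 100_000_000),
        Layer("fc7", 20_000_000),
    ]
    # Assumed: reserve 128 MiB of the heap for activations and app overhead.
    budget = HEAP_LIMIT_BYTES - 128 * 1024 * 1024
    print(plan_residency(layers, budget))
```

With these made-up sizes, the 400 MB `fc6` layer does not fit alongside `conv1` within the 384 MiB budget, so it is the one left to stream while the smaller `fc7` stays resident. A first-fit plan is the simplest choice; a real system might instead prioritise the layers that are most expensive to reload.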
CNNdroid uses a modern mobile GPU's Shader Cores, which are composed of parallel Arithmetic Logic Units (ALUs) that can execute two 64-bit operations simultaneously. However, unlike an analogous desktop setup, it must operate without shared memory within thread blocks, and Android's RenderScript parallel computing platform does not offer thread synchronisation. Even without the advantages of Nvidia's CUDA framework, though, CNNdroid looks like an impressive advance that at least attempts to resolve a current conflict between two technological trends.
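Because RenderScript offers no thread synchronisation, a GPU kernel works best when every output element can be computed independently of the others. The Python sketch below illustrates that idea for a single-channel 2D convolution (cross-correlation, as CNN layers compute it): `output_element` only reads the shared input and kernel, so the work items can run in any order or concurrently with no locks or barriers. The thread pool merely stands in for a GPU dispatch; this is not RenderScript code and not CNNdroid's kernel, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def output_element(image, kernel, y, x):
    """Compute one output value; reads shared data only, writes nothing shared."""
    acc = 0.0
    for i, kernel_row in enumerate(kernel):
        for j, weight in enumerate(kernel_row):
            acc += image[y + i][x + j] * weight
    return acc


def conv2d_valid(image, kernel, workers=4):
    """'Valid' convolution with one independent work item per output element."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    coords = [(y, x) for y in range(out_h) for x in range(out_w)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Work items may execute in any order; map returns results in input order.
        values = list(pool.map(lambda c: output_element(image, kernel, *c), coords))
    # Each work item produced exactly one output value, so assembling the
    # output grid is a plain reshape of the flat result list.
    return [values[row * out_w:(row + 1) * out_w] for row in range(out_h)]


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 0],
              [0, -1]]
    print(conv2d_valid(image, kernel))
```

On a GPU the same structure would map to one kernel invocation per output coordinate. The cost of this design is that neighbouring work items re-read overlapping input windows, which is harder to avoid without the shared memory a desktop CUDA setup provides.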