Many small and medium businesses don’t have the luxury of big data. They need the tools to generate rational and informed insight from small data sets. Unless these small volumes can be exploited, there is a risk that AI will fail to deliver on its promise.
The Stack recently spoke to Dario Garcia-Gasulla, senior researcher in the High Performance Artificial Intelligence research group at Barcelona Supercomputing Center, to explore why the AI and Big Data partnership is just scratching the surface.
Why big data is just the tip of the iceberg
Thanks to exponential improvements in computational power, storage, and the host of sensing devices available on the cheap, we have never had so much data at our disposal. Whether signals from sensors in smartphones and industrial equipment, photos and videos snapped from our mobile cameras, or the data deluge from social media, data is big and it’s getting bigger.
IBM famously estimated that 90% of all the digital data in the world was created within the last two years. That estimate is itself five years old. For all those involved in artificial intelligence, and particularly machine learning – the subset of artificial intelligence that examines and compares large data sets to find common patterns – the big data age is a cause for celebration. Data is used to test and refine algorithms, which generate yet more data to test and refine, and so on in a virtuous cycle.
Feed in a torrent of images of benign and dangerous skin lesions, and with some testing, you can develop AI that is near-perfect at detecting skin cancer. So accurate, in fact, that it outperforms dermatologists at staggering rates. By the time an application is achieving these results, it could have digested tens of millions of images.
Good things come in small packages
But from start to finish, this can be a lengthy, difficult, and expensive process. For starters, data has to be sourced, scrubbed, and seasoned to make it palatable to machine learning algorithms. Even after rounds and rounds of testing, these expensive projects often fail to deliver results. Many researchers simply lack the time and money to source data. It may seem hard to believe, but biologists working at the cellular level have to outline the borders and structure of cells by hand.