Responsible use of AI is poised to become a key business differentiator. Techerati spoke to Michael Natusch, global head of AI at Prudential, to learn about the benefits that come with responsible AI development.
For smaller companies seeking to replicate the successes that tech giants have had with deep learning, it can be tempting to copy Google, Facebook and co’s approach to training neural networks: feeding models with untold amounts of raw data. For companies that lack access to the billions of users from whom these giants extract data, building image databases by scraping may seem like the only way to join the AI brigade.
But why chart this ethically questionable course, especially given the growing awareness of data privacy issues? Michael Natusch, global head of AI at Prudential, says obtaining data with meaningful consent is a differentiator that brings in more clients and improves business outcomes.
Data sourcing
Assembling image sets is the first stage in training neural networks. Organisations can either scrape this data from public sources online, without the consent of the individuals to whom the data relates, or source the data first-hand from individuals who provide active and meaningful consent.
Three common perceptions combine in a perfect storm to encourage organisations to take the first route: first, the received wisdom that the accuracy of models is proportionate to the amount of data they are fed during the training phase; second, that it’s too time-consuming and costly for organisations to collect data themselves; third, that individuals are unwilling to provide data unless an organisation’s intent is disguised in tomes of T&Cs. Natusch takes aim at all three.
Be a spearfisher, not a trawler
First, relying on masses of scraped data is a lazy approach that can lead to counterproductive outcomes, Natusch says. Too much data can cause models to overfit, producing spurious ‘accuracies’ that do not reflect reality. If organisations are forced to source data themselves, the demands of efficiency also force them to be more thoughtful about the data required. The result is models that produce more suitable recommendations, benefiting both organisations and their customers, he says.
“Rather than going on this big fishing exercise, collecting everything and shouting ‘yippee’ if something comes out with a high correlation, I’m a big believer in being forced to think harder about what data really needs to go in [to a model] and why do I really want that data,” he says.
It is also not true that millions of images are required to train competent models. For most organisations, a market-ready application can be developed with 10,000 images, Natusch says.