UK gov: AI growth could reidentify individuals from big data
Wed 9 Nov 2016
A new report from the UK’s Government Office for Science warns that the explosive, Big Data-driven growth in artificial intelligence could make anonymised individuals in datasets extraordinarily easy to re-identify, due to the interlinking of vast, semi-supervised systems.
In the report, entitled Artificial intelligence: an overview for policy-makers, the UK Government’s Chief Scientific Adviser, Professor Sir Mark Walport, warns that new developments in, and the scaling of, AI-driven frameworks permit the inference of private information from public (i.e. anonymised) data:
‘The Information Commissioner’s anonymisation code of practice sets out ways for organisations to manage these risks and prevent the re-identification of individuals from combined anonymised data. As the volume of publically available data increases, however, and more powerful artificial intelligence techniques are developed, what was a ‘remote’ chance of re-identification may become more likely, and organisations will periodically need to revisit the protection they have in place.’
The risk is new because of the accelerating pace of research, but the problem – and its implications for privacy – is very old. Twenty years ago, then-graduate student Latanya Arvette Sweeney – whose career would be defined by the topic of re-identification – succeeded in identifying Massachusetts Governor William Weld from a supposedly anonymised healthcare dataset supplied by the Massachusetts Group Insurance Commission (GIC).
Sweeney didn’t have the advantage of AI-driven algorithms – but then, those might have been overkill; all that was needed was $20 to purchase the voter rolls for the City of Cambridge, MA, and the patience to match up the three pieces of information that survived the anonymisation process in both the GIC data and the electoral register: zip code, birth date and sex.
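The linkage attack Sweeney carried out can be sketched in a few lines of code. The records and field values below are illustrative placeholders, not the real GIC or voter-roll data; the point is only that a unique combination of the three surviving quasi-identifiers is enough to re-attach a name to an “anonymised” record.

```python
# A minimal sketch of a quasi-identifier linkage attack.
# All records below are hypothetical; only the technique is real.

# "Anonymised" health records: names removed, but zip code,
# birth date and sex survive as quasi-identifiers.
medical_records = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "..."},
    {"zip": "02139", "dob": "1962-03-12", "sex": "F", "diagnosis": "..."},
]

# Public voter roll: names attached to the same three fields.
voter_roll = [
    {"name": "W. Weld", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    {"name": "J. Doe",  "zip": "02139", "dob": "1970-01-01", "sex": "F"},
]

def link(medical, voters):
    """Join the two datasets on (zip, dob, sex); any record whose
    quasi-identifier combination matches exactly one voter is
    re-identified."""
    index = {}
    for v in voters:
        index.setdefault((v["zip"], v["dob"], v["sex"]), []).append(v["name"])
    hits = []
    for m in medical:
        names = index.get((m["zip"], m["dob"], m["sex"]), [])
        if len(names) == 1:  # unique match => re-identification
            hits.append((names[0], m["diagnosis"]))
    return hits

print(link(medical_records, voter_roll))
# → [('W. Weld', '...')]  — the first "anonymised" record is linked
#   to a name; the second has no matching voter and stays unlinked.
```

No machine learning is involved: the attack is a plain equality join, which is precisely why it was feasible two decades ago with $20 and patience.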
It would take an improbable level of forensic will and resources to likewise match up, by hand, GPS coordinates from anonymised search histories and satnav data dumps, particularly when seeking to re-identify one particular individual; but it’s a casual task for a machine learning system trawling and collating NoSQL datasets so large that they are no longer considered ‘human readable’.
Today’s government report tempers this administration’s consistent enthusiasm about the possibilities of the AI-enabled society with cautionary notes about the need for new ethical frameworks and approaches to the integration of AI into government and private structures, and suggests that AI ‘is not a replacement, or substitute for human intelligence. It is an entirely different way of reaching conclusions.’
Additionally, it suggests that government analysts wishing to implement AI solutions should begin their experiments in sandbox environments, where the technology can be explored without risk of harm. It also endorses the current interest in developing codes of conduct for the use of artificial intelligence.
The report further advises that a chief executive or senior stakeholder be held accountable for the behaviour of artificial intelligence within their purview:
‘Despite current uncertainty over the nature of responsibility for choices informed by artificial intelligence, there will need to be clear lines of accountability. It may be thought necessary for a chief executive or senior stakeholder to be held ultimately accountable for the decisions made by algorithms. Without a measure of this sort it is unlikely that trust in the use of artificial intelligence could be maintained.’