Cracking the citizen: a warning from South Korea about National IDs
Wed 30 Sep 2015
The ‘Queen of re-identification’, Harvard Professor Latanya Arvette Sweeney, has just published an interesting set of findings regarding the vulnerability of the system that South Korea uses to uniquely identify its 50+ million citizens – and believes that those proposing new ‘Citizen ID’ systems in the United States and elsewhere should take note of the fact that she was able to de-cloak a complete set of 23,163 supposedly encrypted South Korean sample IDs using two totally different methods.
Sweeney, also Chief Technologist at the Federal Trade Commission, analysed the logical structure of the South Korean Resident Registration Number (RRN), which is a cipher for personal information about the individual, and successfully de-anonymised the IDs, first by human-guided logical reasoning and computer analysis of data collated into an Excel spreadsheet, and secondly through pure machine analysis of identified traits, patterns and relationships within the RRNs. Since all encrypted information conforms to a uniform structure in the cipher system, the deanonymisation was 100% successful for the test data.
Sweeney described the encrypted RRNs as ‘vulnerable to almost any adversary’, and noted that the scheme is based on systems and techniques both in use and proposed for use in future ‘citizen ID’ systems in the United States and beyond.
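To see why a uniform structure can make such a cipher fragile, consider a toy illustration. It assumes, purely for the sake of example, that each digit position is swapped for a letter from its own fixed table, and that the third digit is the tens digit of a birth month (RRNs begin with a birth date). The tables, sample values and function names are hypothetical and are not taken from Sweeney's paper.

```python
from collections import Counter

# Hypothetical per-position substitution tables (an assumption for illustration):
# the digit d at position i is replaced by TOY_TABLES[i][d].
TOY_TABLES = [
    "QWERTYUIOP",  # position 0: tens digit of the year
    "ASDFGHJKLZ",  # position 1: units digit of the year
    "XCVBNMQWER",  # position 2: tens digit of the month
    "TYUIOPASDF",  # position 3: units digit of the month
]


def toy_encrypt(digits: str) -> str:
    """Encrypt a YYMM string with the fixed toy tables."""
    return "".join(TOY_TABLES[i][int(d)] for i, d in enumerate(digits))


def infer_month_tens(ciphertexts):
    """Guess which symbols stand for 0 and 1 in the month-tens position.

    Months run from 01 to 12, so only two symbols can appear here, and
    months 01-09 are assumed to outnumber months 10-12 in the sample.
    """
    counts = Counter(text[2] for text in ciphertexts)
    ranked = [symbol for symbol, _ in counts.most_common()]
    return dict(zip(ranked, "01"))


# Made-up YYMM values, not real data.
sample = ["8503", "9011", "7207", "6612", "8801"]
encrypted = [toy_encrypt(value) for value in sample]
recovered = infer_month_tens(encrypted)
print(recovered)
```

The attacker in this sketch never sees the tables: the known layout of the number alone narrows the possibilities, and each recovered symbol then holds for every ID in the set.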
The research was conducted on prescription data where the RRN is intended to shield the subject from being specifically identified. The system under scrutiny is modelled on one used by U.S.-based multinational IMS Health, which collates data on millions of (living) South Koreans.
Sweeney has also published another related set of findings regarding how U.S. states’ obligation to commercialise patient data is exposing patients to deanonymisation – a field of research where she has made significant waves over the last 15 years:
‘The State of Washington sells a patient-level health dataset for $50. This publicly available dataset contained virtually all hospitalizations occurring in the state in a given year, including patient demographics, diagnoses, procedures, attending physician, hospital, a summary of charges, and how the bill was paid. It did not contain patient names or addresses (only five-digit ZIPs, which are U.S. postal codes). Newspaper stories printed in the state for the same year that contain the word “hospitalized” often included a patient’s name and residential information and explained why the person was hospitalized, such as a vehicle accident or assault. A close analysis of four archival news sources focused on Washington State activities from a single searchable news repository studied uniquely and exactly matched medical records in the state database for 35 of the 81 news stories found in 2011 (or 43 percent), thereby putting names to patient records.’
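The matching described in that passage is a form of record linkage, and a minimal sketch of the idea follows. The field names, the sample rows and the `link_records` helper are hypothetical, chosen only to mirror the kinds of detail the quote mentions; this is not the procedure used in the study.

```python
from collections import defaultdict


def link_records(news_items, hospital_rows, keys):
    """Return {name: record_id} for news items matching exactly one record."""
    index = defaultdict(list)
    for row in hospital_rows:
        index[tuple(row[k] for k in keys)].append(row)

    matches = {}
    for item in news_items:
        candidates = index.get(tuple(item[k] for k in keys), [])
        # Only a unique, exact match puts a name to a record.
        if len(candidates) == 1:
            matches[item["name"]] = candidates[0]["record_id"]
    return matches


# Made-up rows standing in for a de-identified hospital dataset.
hospital = [
    {"record_id": 101, "zip": "98101", "age": 34, "sex": "F", "month": "2011-03", "cause": "vehicle accident"},
    {"record_id": 102, "zip": "98101", "age": 60, "sex": "M", "month": "2011-03", "cause": "assault"},
    {"record_id": 104, "zip": "98052", "age": 45, "sex": "M", "month": "2011-07", "cause": "fall"},
    {"record_id": 105, "zip": "98052", "age": 45, "sex": "M", "month": "2011-07", "cause": "fall"},
]

# Made-up details of the kind a news story might report.
news = [
    {"name": "Person A", "zip": "98101", "age": 34, "sex": "F", "month": "2011-03", "cause": "vehicle accident"},
    {"name": "Person B", "zip": "98052", "age": 45, "sex": "M", "month": "2011-07", "cause": "fall"},
    {"name": "Person C", "zip": "99999", "age": 20, "sex": "F", "month": "2011-01", "cause": "assault"},
]

linked = link_records(news, hospital, ["zip", "age", "sex", "month", "cause"])
print(linked)
```

In this sketch Person B matches two records and Person C matches none, so neither is counted; only the unique match for Person A puts a name to a record.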
Independent third-party sources were able to verify Sweeney’s claims about the deanonymisation technique.
Not all countries use a citizen ID system that in itself contains any information, encrypted or otherwise. In the UK the nine-character National Insurance Number serves, in the controversial absence of a more general national identity scheme, not only to gain access to the National Health Service, but also for employment and tax purposes, and has run in the same format since 1948. Its only use to potential hackers is as a reference to government-held databases, which would have to be separately breached.
South Korea already knew that it had a problem with the security of the RRN scheme after 35 million RRNs were compromised in a massive data breach via social network vulnerabilities. The government’s solution, the commercial third-party I-Pin scheme, was also hacked earlier this year, with 750,000 citizens’ details disclosed.
A study by the RAND Corporation earlier this month posits that the security risks associated with developing a National Health Information Network (NHIN) which uses a unique patient identifier (UPI) are offset by the advantages.
Professor Sweeney points out that states, as guardians of federal information, are not covered by the 1996 Health Insurance Portability and Accountability Act (HIPAA), which specifies how doctors, insurers and hospitals can share a patient’s information, and are free to treat that data in any way they see fit: ‘only three states that share statewide hospital data do so in a manner that adheres to HIPAA guidelines; the other 30 do not.’
The RAND researchers are less circumspect than Sweeney in regard to the security risks of new federal ID schemes:
‘If the UPI were to facilitate the development of a more efficient national network, any potential negative effects of such a network could be ameliorated directly through other aspects of systems architecture, such as encryption, access controls, and audit trails. And use of a UPI would actually improve privacy by limiting the transmission of more sensitive identifiers, such as the combination of names, address, date of birth, and Social Security numbers.’
But the RAND paper admits that current HIPAA legislation is inadequate for the security needs of a new NHIN, opining that ‘strengthening HIPAA rules, not patient identification schemes, should be at the center of the national debate.’