Query brokers: SMCQL allows queries in a zero trust environment
Thu 23 Jun 2016

In a political and business environment that’s currently obsessed with encryption, would it not be useful if query data could pass through networks and alien environments without being of value to anyone who might be trying to steal it – even if they were able to obtain it?
Technically Andy Yao’s Garbled Circuit theory, first proposed in 1986, makes this possible. The classic example for the theory is that of two millionaires, each of which wants to know who is the richer party. Both parties submit their net worth into a joint query via a garbled circuit, which obscures the source data and allows each of the millionaires to extract the information as to whether the other possesses $10 million or less – and the result will be all that can possibly be deduced from a calculation which involved two critically sensitive pieces of information.
In practice, Secure Multiparty Computations (SMC) run over garbled circuits have always run exponentially slowly according to the complexity (i.e. security) of the circuit.
Now a team from Northwestern University have devised a scheme which allows a Private Data Network (PDN) to make garbled circuit queries feasible via the invention of a new SQL-style language, SMCQL, which it will shortly release to open source.
The applications of a workable PDN of this nature is estimated by the team to include data brokers, medical research, human rights data, banking and online advertising. Most of these areas have traditionally required either the development of ‘enforced trust’ via NDAs or other legal or framework-based recourse, or else the often-problematic anonymization of data.
But the Northwestern researchers have brought garbled circuit-based query transactions into the realm of workable processing systems by its query rewrite framework SMCQL, and also by slicing the data, which allows parallelisation and reduced complexity in the SMC circuits.
Additionally the system employs Oblivious RAM (ORAM), which automatically garbles the data’s bits during access, shuffling the array at each read/write. Effectively this latter technique makes all memory accessing appear identical to potential hackers, with no patterns to evaluate and exploit.
The researchers tested the PDN using de-identified medical records from the HeathLNK data repository, which covers seven different healthcare institutions in the Chicago area. The PDN was able to run high performance queries thanks to the fine-grained partitioning of data resolution which SMCQL makes possible.
The Healthcare PDN proved to run fastest with data-slicing, which reduces the complexity of the model’s security, but the team report only ‘modest’ slow-downs with this feature disabled.
The team intends to investigate similar PDN systems with three or more data sources.