Can Privacy Enhancing Technologies (PETs) help us build a new data paradigm for a post-Covid economic recovery?
The pandemic has impacted our way of life and brought whole sectors of our economies to their knees. Covid-19 continues to be a bitter financial pill for many industries, but we must look forward and rebuild.
Leaders must now take a rigorously objective approach to economic recovery, analysing and making focussed decisions on where to invest based on good data. Governments and enterprises must restructure around sustainable, data-driven business models and invest in supporting data infrastructure.
I believe that data governance and Privacy Enhancing Technologies (PETs) have a key role to play in unlocking those insights, and should be part of the focussed investment leaders make today.
We need a data-driven forward path
Covid-19 has shown all of us why ‘following the science’ matters, and why leaders must remain rational in a global crisis and take a systematic approach to societal and economic solutions. Populism and subjectivity are dangerous in politics.
When leaders and projects draw on experts and use ‘the data’ effectively to inform public policies, fuel innovation and measure results, we collectively have a better chance of neutrality and impact. This in turn gives us the opportunity to move past glory projects and bias, to sift fact from opinion, and to assess success on real impact. Such objectivity, when aligned with rapid decision making and focus, might feel like tough love, but it’s an essential ingredient for future growth and survival.
“There is only one route out of this dilemma: data, knowledge and understanding. If countries can gather a more complete picture of the threat the world faces and the environment they are operating in, and if leaders can use this knowledge to make decisions more quickly and coherently, then a faster escape from the crisis is still possible.” (source: Chris Yiu, Tony Blair Institute for Global Change)
A data-driven approach does not mean that our human skills are redundant; quite the contrary: there’s never been a greater need for critical thinking, for creativity, for emotional intelligence. When we mesh those human superpowers with transparent processes and trustworthy information, we arrive at real insight.
Insights: access to trusted data required
I have previously written about the compelling need to share data, and explored both the many benefits of fairer access to data and the governance hurdles of sharing it.
However, for the purposes of this article we’ll focus on the privacy risks associated with big data research and innovation, and consider whether Privacy Enhancing Technologies (PETs), in combination with wider governance, could help us to share, connect, access and utilise data sets whilst preserving privacy, unlocking trusted research and innovation.
Are PETs the panacea for safe big data research?
First, let’s take a step back and look at what Privacy Enhancing Technologies (PETs) actually are, and set them in some important data regulation context.
PETs are a spectrum of technologies, ranging in sophistication and type, built on the premise that data is a resource whose value cannot be released without privacy, confidentiality, commercial and security risks, risks which PETs seek to mitigate. They are part of the technological toolkit for safeguarding data’s sensitive attributes.
The intention of this article is not to explore each PET in depth or analyse use cases; rather, it offers a high-level overview of a selection of these evolving technologies and then considers their role in supporting data-driven research and innovation. This overview comes with a few caveats: it is not intended as a technical analysis of the viability of the PETs, nor does it ascribe greater value to the ones selected compared to the PETs left out of the list.
With that in mind, let’s now look at some selected examples of PETs:
Homomorphic encryption
Whilst encryption is not new, homomorphic encryption is a nascent technique which allows calculations to be performed on encrypted data, offering enhanced protection because personal data is never exposed in plain text. This means analysis can happen without revealing the underlying data, which is important for privacy. Let’s take a look at an example:
“Something as simple as looking for a coffee shop when you’re out of town reveals huge volumes of data to third parties as they help you satiate your caffeine craving — the fact that you’re seeking a coffee shop, where you are when you’re searching, what time it is and more. If homomorphic encryption were applied in this fictional coffee search, none of this information would be visible to any of the third parties or service providers such as Google. In addition, they wouldn’t be able to see what answer you were given regarding where the coffee shop is and how to get there.” (Source)
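To make the idea concrete, here is a minimal, illustrative sketch of an additively homomorphic scheme: a toy Paillier-style implementation in Python. The tiny primes and example values are assumptions chosen for readability; real deployments use key sizes of 2048 bits or more and a vetted cryptographic library.

```python
import math
import random

# Toy Paillier-style scheme: multiplying two ciphertexts yields a ciphertext
# of the SUM of the plaintexts, so a third party can add encrypted values
# without ever decrypting them. ASSUMPTION: demo-sized primes, not secure.
p, q = 47, 59                        # absurdly small primes, for illustration only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)         # Carmichael's lambda for n = p*q
g = n + 1                            # standard simplified choice of generator
mu = pow(lam, -1, n)                 # modular inverse of lambda mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)       # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = encrypt(20), encrypt(22)
assert decrypt((a * b) % n2) == 42   # the addition happened on encrypted data
```

That final assertion is the whole point: the party doing the multiplication never sees 20, 22 or 42 in the clear.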
Anonymisation
Anonymising data either encrypts or removes identifiable personal data from datasets, and ‘anonymous data’ is a high bar under the GDPR. Pseudonymisation, a related and common data management practice, involves replacing personally identifying data with artificial identifiers.
87% of Americans can be identified by the combination of zip code, sex, and birth date. (Source)
There is an important distinction between pseudonymised and anonymous data; the former remains personal data under the GDPR, given that data subjects can be re-identified. Re-identification risk is thus an important consideration when evaluating Privacy Enhancing Technologies.
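A minimal sketch of pseudonymisation in Python makes the distinction tangible; the secret key and email address below are illustrative assumptions. Because whoever holds the key can recompute the mapping and re-link records, the output is pseudonymised, not anonymous.

```python
import hmac
import hashlib

# Pseudonymisation sketch: replace a direct identifier with a keyed hash.
# ASSUMPTION: the key and identifier are invented for illustration. Anyone
# holding SECRET_KEY can recompute the mapping and re-identify records,
# which is exactly why this output remains personal data under the GDPR.
SECRET_KEY = b"store-me-separately-and-rotate-me"

def pseudonymise(identifier: str) -> str:
    """Return a stable artificial identifier for a personal identifier."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

print(pseudonymise("jane.doe@example.com"))  # same input -> same pseudonym
```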
PETs for anonymising data include the following three techniques; a short illustrative sketch of each one follows the list.
(Secure) Multi-Party Computation (SMPC) is a cryptographic protocol that distributes a computation across multiple parties, where no individual party can see the other parties’ data, so inputs are kept private.
Differential Privacy (DP) is a mathematical definition of privacy in the context of statistical and machine learning analysis. DP algorithms add a layer of “statistical noise” to query results, which protects individual privacy. The video below explains DP and cites the Netflix linkage attack as a useful example, illustrating ‘Why Anonymous Data Sometimes Isn’t’ (source) due to the mosaic effect of linking seemingly anonymous information.
K-anonymity is a privacy model which also aims to address the re-identification risk heightened by linking data sets. It describes a property that anonymised data can possess: “data is said to have the k-anonymity property if the information for each person … cannot be distinguished from at least k-1 individuals whose information also appear in the release” (Source). For k-anonymity to be achieved, there need to be at least k individuals in the dataset who share each potentially identifying combination of attributes, which makes the volume of data significant.
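First, a toy SMPC sketch in Python using additive secret sharing. The three-hospital scenario and the values are invented for illustration; production protocols (Shamir-based schemes, SPDZ and the like) are considerably more involved.

```python
import random

# Additive secret sharing: each party splits its private value into random
# shares that sum to it modulo a large prime, so no single share (or any
# partial sum held by one party) reveals anything about an input.
PRIME = 2**61 - 1  # all arithmetic stays inside this finite field

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n_parties random shares summing to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# ASSUMPTION: three hospitals want a total patient count without pooling raw data.
private_inputs = [120, 455, 310]
all_shares = [share(x, 3) for x in private_inputs]

# Party i sums the i-th share of every input; each partial sum looks random.
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# Only the recombined total is ever revealed, never an individual input.
total = sum(partial_sums) % PRIME
assert total == sum(private_inputs)  # 885
```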
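Next, a minimal sketch of the Laplace mechanism, the classic building block behind many DP systems. The count and epsilon below are assumptions; a counting query has sensitivity 1 because adding or removing one person changes the answer by at most 1.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise scaled to sensitivity / epsilon."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# ASSUMPTION: 1,283 people in the dataset earn over 50k; we must not say so exactly.
print(dp_count(1_283, epsilon=0.5))  # e.g. 1285.1; smaller epsilon -> more noise
```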
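Finally, a sketch that measures the k of a dataset over its quasi-identifiers, the zip code, sex and birth-date style attributes from the statistic quoted earlier. The toy table is invented for illustration.

```python
import pandas as pd

# ASSUMPTION: a toy data release in which age has already been generalised
# into bands (a common step towards achieving k-anonymity).
df = pd.DataFrame({
    "zip": ["02138", "02138", "02138", "02139", "02139", "02139"],
    "sex": ["F", "F", "F", "M", "M", "M"],
    "age_band": ["30-40"] * 6,
})

def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """The dataset's k is the size of its smallest quasi-identifier group."""
    return int(df.groupby(quasi_identifiers).size().min())

# Prints 3: every row is indistinguishable from at least k-1 = 2 others.
print(k_anonymity(df, ["zip", "sex", "age_band"]))
```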
Synthetic data
In the context of privacy and data science, synthetic data refers to simulated data generated by an algorithm, which can then be used to train a machine learning model. “Training a model on synthetic data and then applying it to real, encrypted data has several advantages: it allows a better understanding of the relationship between the training data and the model, and a minimisation of the use of sensitive data.” (Source).
For data researchers, synthesising data offers the opportunity to strike a better balance between utility and privacy than anonymising data does. This is because:
Synthetic data generators (SDG) use algorithms to generate data that preserves the original data’s statistical features while producing entirely new data points. SDGs offer a naturally private way to generate high-quality data. Among other benefits, they enable users to share data, to work with data in safe environments, to fix structural deficiencies in data, to increase the size of data, and to validate machine learning systems by generating adversarial scenarios. (Source).
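As a minimal sketch of the idea, the Python snippet below fits the simplest possible generative model, a Gaussian with the real data’s mean and covariance, and samples brand-new records from it. The salary-and-age data is invented for illustration; real SDGs use far richer models such as copulas or GANs.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# ASSUMPTION: stand-in "real" data with two sensitive columns (salary, age).
real = rng.multivariate_normal(
    mean=[40_000, 42], cov=[[9e6, 3e4], [3e4, 90]], size=500
)

# Learn the statistical shape of the data; this is the entire "model" here.
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)

# Sample entirely new points: no row corresponds to a real individual,
# yet aggregate statistics are preserved.
synthetic = rng.multivariate_normal(mean, cov, size=500)
print(np.round(real.mean(axis=0)), np.round(synthetic.mean(axis=0)))
```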
Read more in The Alan Turing Institute’s guide ‘Synthetic data generation for finance and economics’ here.
Conclusion
The potential of Privacy Enhancing Technologies (PETs) is phenomenal. This umbrella of technologies offers the promise of delivering data’s value whilst preserving its privacy, which could unleash societal, business, economic and health benefits through research and innovation whilst protecting people’s data rights. That potential could help us deliver substantial benefits for our society and economies as we rebuild and recover post-Covid-19, informed by a data-driven response.
While that optimism, energy and drive to deliver better insights based on good data sharing are vital, it is important to state that there is no silver bullet on the horizon in the space of big data risk management. The regulations and privacy risks are complex, cyber threats continue to increase, scalable data governance remains immature, and people’s data skills need investment.
The answer to safeguarding data whilst still sharing it, to enabling safe data flows and good data research, remains a complex balancing act; yet it is one we can and must solve as we rebuild and restructure around the new data paradigm.
Privacy Enhancing Technology is a fast-evolving and exciting domain, and I believe each of the PETs touched upon here may form an important piece of the good data use jigsaw that will help us harness data’s phenomenal potential to shape, build and predict the next decade.
Technology doesn’t hold all the answers, but when brought together with the best of human skills of analysis, creativity and curiosity, it will help us to forge trusted frameworks for privacy-preserving yet useful innovation and research. Ultimately that will enable greater citizen control, transparency and accountability for big data processing. That’s why leaders, governments and funders should invest in building scalable data governance capability and finding solutions to the privacy-utility conundrum.
Trace offers privacy and data governance solutions: our platform helps users comply with regulations like the GDPR, and our professional ‘DPO’ services include data governance frameworks, data risk management, training and applied Privacy by Design.