This blog post was originally published by Bitfury. It is reprinted here with the permission of Bitfury.
Each of us has been giving away our valuable data, for free, for more than 15 years. We have no real control over this personal data, or over how it is processed, why, or by whom. We traded this data away in exchange for services like search engines, webmail and software. This brought life-changing web applications to everyone (at little to no cost) and democratized global access to information. It also drove the financial success of companies like Google and Facebook, and still does to this day. Data, unlike any other valuable asset, can be collected, stored, copied, distributed, used and reused at a negligible cost. No physical asset can create this level of value at that price point. These companies are, understandably, loath to lose it.
A public reckoning is now taking place over whether this tradeoff has been worth it, and in response legislators around the world are adopting ever-stronger data protection laws, such as the GDPR. We are learning more about how our data is collected, stored, traded and sometimes abused. Masses of data can be mined for valuable patterns, new discoveries and more. This can serve something as benign as advertising, but bad actors can also use it to manipulate ideas and facts, influence geopolitical events and invade personal privacy.
Where do we go from here? I believe we should look to artificial intelligence and blockchain.
There are several exciting advancements in applied cryptography, computing, and machine and deep learning technologies that can help us share our data freely while also protecting our privacy. They will help us retain control of our data, anonymize it, process it in privacy-centric ways, and decide who can access it. In this way, we could each decide whether to sell our data (possibly in exchange for services or money) or keep it private.
The first advancement is Fully Homomorphic Encryption, or FHE. FHE is the “holy grail” of cryptography and data science: it allows information to be extracted from data that remains fully encrypted. The idea behind FHE is simple: data is first encrypted at its source (for example, in a smart car or at a doctor’s office) and then processed, while still encrypted, through advanced machine learning techniques that extract information from it. The encrypted output of the model is returned to the data owner, who can decrypt it using a private key. This means that user privacy (and ownership of the data) is protected, and the user can decide to whom to grant access to the extracted information. In machine learning, FHE could also be used to encrypt a model’s parameters while still guaranteeing their integrity. This would make it possible to create decentralized marketplaces where businesses trade their models.
This technology is still evolving, but the demand for private data processing will likely accelerate its time to market.
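To make this concrete, here is a minimal Python sketch of the homomorphic idea. It uses the Paillier cryptosystem, which is only additively homomorphic (true FHE schemes such as BFV or CKKS also support multiplication on ciphertexts), and its parameters are toy-sized and insecure, but the essential trick is the same: a server can compute on ciphertexts it cannot read.

```python
# Toy Paillier cryptosystem: additively homomorphic encryption.
# Parameters are tiny and insecure; for illustration only.
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=999_983, q=1_000_003):
    # p and q would be large random primes in practice; these are toy values.
    n = p * q
    lam = lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because we pick g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:         # r must be invertible mod n
        r = random.randrange(1, n)
    # With g = n + 1, g^m = 1 + m*n (mod n^2)
    return ((1 + m * n) * pow(r, n, n * n)) % (n * n)

def decrypt(n, sk, c):
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, sk = keygen()
c1, c2 = encrypt(n, 41), encrypt(n, 1)
# Multiplying ciphertexts adds the hidden plaintexts: the key holder
# learns the sum, while whoever computed it saw only random-looking numbers.
assert decrypt(n, sk, (c1 * c2) % (n * n)) == 42
```

An untrusted party could, for example, total encrypted salaries or aggregate encrypted sensor readings this way without ever seeing an individual value.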
Second is functional encryption, or FE. Similar to FHE, this is a kind of encryption that enables specific information to be extracted from encrypted data if you hold the corresponding private key. The person with this key can learn specific characteristics of the data (for example, the results of a lung scan) but can see nothing else (like the patient’s name). FE is not yet mainstream, but it is the subject of a large European Union study that will evaluate the future of the technology. (If you are interested in reading more about the technicalities of FHE and FE, I recommend reading this paper from MIT.)
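To illustrate how FE differs from FHE, below is a toy version of an inner-product functional encryption scheme in the spirit of Abdalla et al. (2015). Whoever holds the functional key for a vector y learns only the inner product of the encrypted data x with y, and nothing else about x. The group parameters are deliberately tiny and insecure; they exist purely to make the mechanics visible.

```python
# Toy inner-product functional encryption (DDH-style). Insecure toy
# parameters: p = 2q + 1 is a small safe prime, g generates the
# order-q subgroup.
import random

p, q = 1019, 509
g = 4

def setup(n):
    msk = [random.randrange(q) for _ in range(n)]   # master secret key
    mpk = [pow(g, s, p) for s in msk]               # public key
    return mpk, msk

def encrypt(mpk, x):
    r = random.randrange(1, q)
    ct0 = pow(g, r, p)
    cts = [(pow(h, r, p) * pow(g, xi, p)) % p for h, xi in zip(mpk, x)]
    return ct0, cts

def keygen(msk, y):
    # The authority issues a key for the specific function f(x) = <x, y>.
    return sum(s * yi for s, yi in zip(msk, y)) % q

def decrypt(ct, sk_y, y):
    ct0, cts = ct
    num = 1
    for c, yi in zip(cts, y):
        num = (num * pow(c, yi, p)) % p
    val = (num * pow(ct0, -sk_y % q, p)) % p   # equals g^<x, y>
    for m in range(q):                          # brute-force small exponent
        if pow(g, m, p) == val:
            return m

mpk, msk = setup(2)
ct = encrypt(mpk, [3, 5])             # two private measurements
sk = keygen(msk, [1, 2])              # key holder may learn x1 + 2*x2
assert decrypt(ct, sk, [1, 2]) == 13  # learns 13, nothing else about x
```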
Another interesting approach to data privacy is federated learning. The most prominent use of this is by Google on smartphones. Google runs a learning algorithm directly on each device to understand usage and improve the customer experience. Before the results leave the device, “noise” is injected to effectively anonymize them. Noise is meaningless data that does not interfere with the actual data but obscures private details. The results are then sent to the cloud, combined with the data from other phones, and processed to remove the noise and elucidate patterns. Doing this at scale requires a lot of computational power, however, so it has not been adopted widely. More information on this can be found here.
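A minimal sketch of this flow might look like the following (the model, the data and the noise scale are all hypothetical): each device fits a model on its own data, perturbs the update with noise, and uploads only the noisy update. Averaging across devices then cancels the zero-mean noise.

```python
# Federated averaging sketch: raw data never leaves a device.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=20):
    # Plain gradient descent on this device's private data.
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

# Five devices, each holding private data drawn from y = 2*x.
devices = []
for _ in range(5):
    X = rng.normal(size=(50, 1))
    y = 2 * X[:, 0] + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

global_w = np.zeros(1)
for _ in range(10):
    updates = []
    for X, y in devices:
        w = local_update(global_w, X, y)
        w += rng.normal(scale=0.05, size=w.shape)  # obscure the raw update
        updates.append(w)
    global_w = np.mean(updates, axis=0)            # noise averages out

print(global_w)   # close to [2.0]; no device ever revealed its data
```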
Complementary to federated learning is differential privacy, which allows data to be collected in such a way that a specific contributor’s data cannot be extracted from the aggregated information (indeed, it is impossible to tell whether a specific user contributed to the aggregate at all). Differential privacy has been adopted by Apple and Google.
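The classic construction here is the Laplace mechanism: add noise, scaled to the query’s sensitivity divided by the privacy budget epsilon, to the true answer. A minimal sketch (the dataset and epsilon value below are illustrative):

```python
# Laplace mechanism for an epsilon-differentially-private count.
import numpy as np

rng = np.random.default_rng()

def private_count(values, predicate, epsilon=0.5):
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1   # adding/removing one user changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 62, 57, 19, 44]          # hypothetical user data
print(private_count(ages, lambda a: a >= 40))    # noisy answer near 4
```

The noisy answer is useful in aggregate, yet statistically hides whether any single person is in the dataset.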
Finally, blockchain can enhance all of these technologies. A blockchain can make the data immutable, securely track its access and transfer, enforce smart contracts governing its use, and distribute royalties across the supply chain, from the “owner” of the data to the end users (businesses, medical researchers, governments and so on).
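As a small illustration of the immutability piece, here is a toy hash-chained access log (no consensus or smart contracts, so not a real blockchain): each record commits to the previous one, so tampering with any entry breaks every later hash and is immediately detectable.

```python
# Toy tamper-evident access log using a hash chain.
import hashlib, json, time

def add_record(chain, event):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"event": event, "time": time.time(), "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append(record)

def verify(chain):
    for i, rec in enumerate(chain):
        body = {k: rec[k] for k in ("event", "time", "prev")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["hash"] != expected:
            return False
        if i > 0 and rec["prev"] != chain[i - 1]["hash"]:
            return False
    return True

log = []
add_record(log, "owner granted read access to researcher_A")
add_record(log, "researcher_A queried encrypted dataset")
assert verify(log)
log[0]["event"] = "owner granted admin access"    # tampering...
assert not verify(log)                            # ...is detected
```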
It is important for us not to become complacent about the use of our data. By continuing to push for privacy and personal control of our data, we will increase the demand for these technologies. This will make it possible for researchers to advance them faster, for companies like Bitfury to bring them to market sooner, and for governments to make them the standard in the near future.
Your data belongs to you. You should be able to freely and safely use it as you wish. At Bitfury, it is our goal to make that your new reality.
Fabrizio Del Maffeo
Head of Artificial Intelligence, Bitfury