Analysis

December 12, 2018

The brave new data-protected world

Europe’s General Data Protection Regulation could open surprising new prospects for AI innovation on the continent


Carly Minsky

14 min read

In the near future, every individual will be part of a national project to generate artificial intelligence. Each individual will be paid for their contributions, and yet, they won’t have to make any active effort to participate. They will play their part just going about their everyday activities and interactions.

This prospect, if it were to manifest, could arise from the most unlikely of sources: a regime designed to protect individuals and their personal data. Running counter to complaints that Europe’s new General Data Protection Regulation will stifle technological innovation across the continent, a handful of optimistic entrepreneurs see a huge opportunity to boost Europe’s work in artificial intelligence, while empowering individuals to control their data.

The thesis is certainly arguable, with academics quick to point out potential stumbling blocks and the majority of venture capitalists unwilling to bet on Europe’s ability to compete with China and Silicon Valley on AI innovation. But it’s not merely hypothetical; at least two startups are already laying the foundations for what they believe will evolve into a sophisticated but responsible data market in Europe.

Hedge fund analyst-turned-startup-founder Ben Falk has been supported by both Entrepreneur First and GovTech accelerator Public.io to develop his company, Yo-Da. In the first phase, the company has delivered a tool for automating data subject access requests - the provision under GDPR which enables individuals to ask companies what personal data is held on them, what the data is used for, and which third parties the data is shared with. GDPR also allows an individual to request a copy of their data, and/or that it be deleted or transferred somewhere else.

The rudimentary service is a perfect example of the consumer-focused tools many predicted would pop up to help individuals exercise their data rights. It’s not revolutionary, but it facilitates what would otherwise be an onerous series of manual requests. It also collects valuable intelligence about a company’s GDPR compliance when it responds to data subject requests - information that internal compliance officers and regulators would like to get their hands on.

But behind the initial strategy are far grander ambitions: to redistribute the invaluable power over data, to advance data-intensive technology innovation in Europe, and to ensure individuals receive the benefits produced by their data.

Excerpt from Yo-Da’s Data Protection Bill of Rights

We believe that personal data is a systemically important raw material in the 21st century, and that by strengthening citizen control over their data, the efficiency and fairness of socioeconomic outcomes will improve.

We believe the current unequal distribution of data will only serve to reinforce current socioeconomic and political inequality as AI proliferates, and we seek to arrest that trend.

We believe in a world where artificial intelligence technology is ubiquitous, data is power.

Falk is not alone in seeing a new opportunity. Another UK-based startup, People.io, is also pitched as a service to give people control over the “access, use and value of their personal data” - a “firewall for people”. Both initiatives use GDPR as the springboard from which data control and value can be systematically returned to individuals, but the methods and the longer-term goals diverge.

People.io limits companies’ access to personal data by gamifying data input on the side of the individual and then offering companies direct targeted access to relevant individuals without supplying any personal data. Nothing in this model of data control suggests broader implications for artificial intelligence, like those referenced in Yo-Da’s manifesto.

The case for Europe’s AI advantage

It’s not immediately obvious how automating data subject access requests and selling compliance intelligence will enhance Europe’s technological innovation overall, but Falk is a visionary. Yo-Da’s GDPR tool is just the first essential step towards setting up what he calls ‘data agencies’ to represent individuals and maximise the value of their personal data.

With Yo-Da pre-emptively positioned as the go-to service for retrieving personal data from for-profit companies, it will evolve into the data-analogue of a publishing house, Falk says. The data agencies will leverage an economy of scope - that is, the ability to combine datasets which commercial rivals currently keep siloed - maximising the value of an individual’s personal data and increasing the utility and quality of data available on the open market.

Quite reasonably, the GDPR discourse so far has highlighted the challenges the regulation has created for companies using personal data to develop better products and services. In the long run, Falk believes, the trade-off will be worthwhile not only for individuals but for all market participants.

“The GDPR may indeed impede business use of valuable personal data, but that's sort of the point of the legislation,” acknowledges Falk. “Our goal is to maximise consumer use of their own personal data, helping people derive value from the data that refers back to them. So we are approaching the problem differently.

“In the short run, businesses will absorb much of the cost as we shift to a new, healthier ecosystem. In the long run, businesses will also benefit from a deeper, more efficient, and more liquid market for personal data.”

Creating intelligence requires a huge amount of resources, from hardware and computing power to training and testing data. Even after the successful production and deployment of a tool, it is only useful alongside operations to collect and feed in relevant data. The true value of intelligence hinges on the power to derive valuable insights from information. And the degree of value in turn relies on the quality, relevance and scope of the data available.

In spite of the world-leading research in artificial intelligence coming out of Europe’s universities and academic institutions, the continent is largely discounted as a real contender in the 'race' for AI supremacy. Kai-Fu Lee, venture capitalist, former president of Google China, and author of AI Superpowers: China, Silicon Valley, and the New World Order, told Sifted that Europe has none of the factors it needs to compete with China and the US, namely an effective entrepreneurial and venture capital ecosystem; a legacy of technology collaboration across networks; and huge technology companies which can quickly and effectively implement new AI tools and methods.

“I left out Europe [from public commentary] because I didn’t think there was a good chance for it to take even a bronze medal in this AI competition,” he explains.

Lee says that Yo-Da’s plan to take data control away from tech giants and create a new data market is “theoretically possible”, but unlikely. “I do believe it protects people better but I don’t believe it monetises better.” He is convinced that the likes of Google and Facebook are too big to fail - economic forces that European regulators and data activists couldn’t possibly reckon with. Notwithstanding GDPR, Lee dismisses Falk’s ambitions to claw back data control from tech giants as ‘naive’, claiming: “It’s a capitalistic world, Google and Facebook are money-printing machines.”

But this discounts the strength of Europe’s cultural shift towards personal data ownership and transparency, which in turn has been driving regulatory developments like GDPR and action against tech companies.

“As the race for AI nationalism accelerates, and Europe increasingly sees their personal data market as an advantage, we believe regulators will restrict the access Chinese AI companies have to European data sets in order to preserve that competitive advantage,” counters Falk.

Regulatory resistance

On one hand, data agencies would make life easier for data protection supervisory authorities, providing a smaller number of contact points to monitor all companies. But on the other hand, the European regulatory attitude on data protection is not particularly consistent with Yo-Da’s grand scheme.

The European data protection supervisor Giovanni Buttarelli describes GDPR as a “deliberate attempt to change incentives in the [personal data] market” but explains “we don’t like the idea of a market for personal data for three reasons.”

“First, you cannot trade your data rights away, it remains your personal freedom so you can’t sell to someone without compromising the European Union data protection framework. Second, a set of personal data often does not just concern one person, so an individual cannot assert property rights over data which can also identify other individuals. And third, even if you do allow people to sell personal data, given the enormous and growing imbalances in digital dividends between individuals and big tech companies, I cannot see how fewer market forces will improve the position of individuals.”

The challenge to define data ownership for data about or created by more than one individual is also highlighted by researcher Dr Otto Kässi, who specialises in econometrics of online labour markets, based at the Oxford Internet Institute.

He says: “I do not think that the European legislators have yet properly tested what ‘personal’ data means in the context of online platforms. For example, there is an argument that our data on Facebook and Twitter is not ‘our’ data in the same sense that my bicycle is my bicycle. My interaction with my Facebook friends, on the other hand, is co-created by myself, my friends and Facebook. It is really hard for me to take out ‘my’ part of the data and sell it to someone else. As far as I can see, a large chunk of personal data that is of any value to online data intermediaries is of this variety.”

The problem for Yo-Da is that, according to Kässi and others, online platforms have a reasonable case to claim ownership over this type of personal data, since it is their technology which has enabled and directed individuals’ interactions that created the data.

Falk doesn’t buy the argument. He responds: “While it is true that some data will relate to more than one person, and thus is effectively co-owned, that need not impede the exercise of consumer rights. Indeed, we have a long-standing legal precedent in the field of intellectual property where copyrights on other information goods are co-owned... All the people who contributed to the creation of that information good have some ownership and control over its use.

“So while it is accurate that until now, there has not been clear ownership over personal data, the GDPR changes that completely.”

Nonetheless, questions remain over both the viability of, and regulatory openness to, the data agencies which Falk hopes to set up, and which he believes will catapult Europe into the AI race.

The European data protection supervisor does endorse a number of proposed solutions for managing personal data ownership, including a radical initiative launched by Tim Berners-Lee, the inventor of the web and a data activist. His new project - titled Solid - aims to create an alternative web where data protection is built in by design through “data vaults”, allowing users to directly switch data access permissions on and off as they use applications. Chinese VC Kai-Fu Lee is also marginally more optimistic about the prospect of a totally disruptive innovation like this eventually creating a sophisticated European data economy and boosting AI innovation, but insists that no project will do so by trying to undermine the concentration of power in tech giants.

Ultimately, Buttarelli is happy to acknowledge that securing Europe’s place as a world-leader in AI will always be a lower priority than protecting individuals and fundamental rights.

“We need to define what we mean by artificial intelligence,” he says. “If it is artificial intelligence which gives more power to individuals to take control of their digital selves and to benefit wider society then perhaps Europe is ready to be a leader. But if on the other hand artificial intelligence means concentrating power in the hands of fewer companies, then this is not a route that the EU should go down to be a leader.”

How important is the local data market?

Even if Yo-Da did succeed in radically changing the infrastructure around the personal data market in Europe, the hypothesis that this will generate a significant advantage for Europe’s artificial intelligence sector is still up for debate.

Alex van Someren, an early stage investor with global VC firm Amadeus Capital Partners, contests the assumption that data aggregation performed by data agencies would significantly improve AI results in Europe.

“The implication that somehow more data equals victory and that AI algos are going to be magically better for example in China because there are more people there… it’s technically known as false,” he argues. “The fact is that AI algorithms are increasingly evolving to be effective on smaller volumes of data all the time, and furthermore, very large volumes of data tend to refine the very last decimal places of accuracy of algorithms but they don’t significantly affect the first digit or two. It’s just fear, uncertainty and doubt - this idea that China will dominate because it simply can aggregate all the data. It is a soft power politics point rather than a factual reality.”

In fact, this line of reasoning may support Falk’s thesis after all, since his claim is not that Yo-Da’s automated data request tool and eventual data agencies will produce more data, but rather that they will optimise the value of data already collected.

Most stakeholders agree that GDPR may result in less data collected and processed on European citizens, but the datasets that are created will be higher quality. Nicholas Borsotto, lead economist in The Good Technology Collective based in Berlin, explains that this favourable result is likely to be realised whether or not Yo-Da succeeds in its data agencies vision.

“Article 5 from GDPR puts special importance on keeping your data updated, organised and properly collected,” he points out. “Most important companies have to prove this, which will definitely lead to not only better quality data but more streamlined databases and data collection pipelines. With this in mind (and the further constraints on using data collected by third parties) it will lead to better quality data being the standard... We will be moving to less intensive but more detailed databases being exchanged. Deeper not Broader.”

With only one exception, all the experts interviewed for this article - from Oxford professors to venture capitalists - agreed that the quality and the depth of local data available in Europe would have a significant impact on innovation in certain AI subsectors where algorithms need to be trained or tested in region-specific contexts. Visual data for autonomous vehicles was the most commonly named example, while healthcare data was noted as a particular challenge in Europe, since it is an area where there is a real need for local data, but the use of personal data is restricted due to its sensitivity.

In contrast, one venture capitalist arrived at a wholly different conclusion: that GDPR really has opened a viable opportunity for data agencies and similar initiatives, and that these could generate a higher quality data market in Europe, but that the quality of local data is not hugely relevant to AI innovation.

“I think they [data agencies] are likely to proliferate,” says Simon King, a VC at Octopus Ventures. “But in terms of data quality, and how it is affected by them - to some extent I am less sure that it matters. One of the interesting trends in AI and machine learning is the ability to take advantage of unstructured data, where so much of the time has to be spent creating new synthetic data to plug the gaps in existing data, and cleaning the data in the first place.”

Most personal data collected or created online is in the form of discrete structured data, including metadata about online behaviour and user-inputted information. King believes that access to high quality data of this sort is already less important than the ability to reliably process unstructured data. GDPR and Yo-Da’s data agencies may enforce a new standard of data accuracy and quality for personal data collected online, but they won’t suddenly improve the usability of unstructured data.

With technology trends accelerating, evaporating and turning sharp corners all the time, any hypothetical claim about a single game-changing development, like a tool for personal data ownership, has to be met with a healthy dose of scepticism. A key takeaway, however, from all the responses to Yo-Da’s master plan, is that the claim that Europe’s data protection regime will stifle innovation also deserves interrogation.