Machine learning complicates effects of new EU rules on personal data
EU regulation arrives as technology industry is being transformed by machine learning
The GDPR provisions on personal data apply to companies outside the EU which process information about individuals inside the union.
You may already have become aware of the General Data Protection Regulation (GDPR). The office of the Irish Data Protection Commissioner, via business briefings and media advertising, is increasingly highlighting this new European Union regulation, which comes into effect on May 25th.
The GDPR preamble asserts: “The protection of natural persons in relation to the processing of personal data is a fundamental right.” The key theme is that each of us owns our own data. Any company must therefore explicitly request permission to use any of our personal data, explaining why it would like to do so, and for how long. If we so agree, we can later withdraw our permission at any time. We also can ask any company to tell us whether they already have any personal information about us and, if so, why and for what purpose. All of these rights must be provided to us by each company free of charge.
One consequence is that each company must know, and document, what information (if any) they have about each individual. This may be a particular challenge for large, established corporations, since data about individuals may be spread across different business units and multiple databases, spreadsheets, offsite backup copies, and even paper archives. For some time, many companies have struggled to build a “unified view” of each of their customers, even though achieving this would facilitate cross-selling of products from different divisions. Now they will have a legal requirement to do so.
For start-ups and young companies, GDPR compliance may thus provide a temporary advantage over larger competitors. With fewer customers and less complex databases, such companies should find tracking individuals’ information less of a challenge.
Start-ups will have strong business motivations for compliance, in addition to the legal requirement. If they sell to other companies, then GDPR compliance will likely become a condition of doing business. If they intend to ultimately exit by acquisition, then poor GDPR compliance will threaten the exit price offered by an acquirer.
Some industries may be particularly challenged by GDPR. The digital advertising industry has led the global adoption of ‘big data’ technology and ‘mining’ of data. Software algorithms attempt to predict customer purchasing behaviour from various scraps of data snatched across multiple sources. Even if the fragments are anonymous, when assembled together they may be sufficient to digitally fingerprint an individual.
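How a few harmless-looking fragments can fingerprint a person is easy to sketch. The snippet below is purely illustrative, with invented data and field names: an ad network holds an “anonymous” record of postcode, birth year and gender, and a public register with names attached turns out to contain exactly one matching entry.

```python
# Hypothetical sketch: fragments that are anonymous in isolation can
# uniquely identify someone once combined and linked to another source.
ad_network_log = {"postcode": "D04", "birth_year": 1980, "gender": "F"}

# An auxiliary dataset (e.g. a public register) with names attached.
public_register = [
    {"name": "Alice", "postcode": "D04", "birth_year": 1980, "gender": "F"},
    {"name": "Brid",  "postcode": "D04", "birth_year": 1975, "gender": "F"},
    {"name": "Cara",  "postcode": "T12", "birth_year": 1980, "gender": "F"},
]

# Join the two sources on the shared "anonymous" attributes.
keys = ("postcode", "birth_year", "gender")
matches = [person for person in public_register
           if all(person[k] == ad_network_log[k] for k in keys)]

if len(matches) == 1:
    # A single match means the anonymous log is anonymous no longer.
    print("Re-identified:", matches[0]["name"])
```

With richer attribute sets, real-world linkage of this kind needs surprisingly few fields to narrow a population to one individual.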
GDPR threatens such corporate behaviour since personal data cannot be shared without our prior consent. But GDPR also offers an alternative for advertisers: obtain consumer consent, and you can then target individual advertisements based on factually accurate data.
Will consumers be more loyal, and buy more, from companies which ask permission to use personal data, than from those companies which are not fastidiously GDPR compliant? The EU believes GDPR will nurture more purchasing loyalty, but one wonders whether lower prices, and especially free services, will always trump personal concerns about privacy.
Will the various data-protection offices across the EU, and their oversight body, the European Data Protection Board, aggressively pursue firms that are not GDPR compliant? If so, will fissures then open up in various industry segments as established firms struggle to comply, whilst nimble GDPR-compliant start-ups exploit the gaps? We should expect the national regulators to start showing their hands later this year.
Interestingly, the GDPR provisions also apply to companies outside the EU which process information about individuals inside the EU. Thus multinational companies cannot escape GDPR obligations merely by being based outside the EU while trading over the internet into it.
Even more intriguingly, GDPR does not stipulate nationality. Thus as an example, it would appear that a US citizen who is in the EU could bring a case against a company in the US offering a service or product over the internet if that company was not GDPR compliant. GDPR also does not appear to stipulate residency: our example US citizen might be in the EU only temporarily.
Meanwhile, the entire global technology industry is being transformed by machine learning. Machine-learning algorithms require training. A very large number of examples, with the corresponding correct outcomes, are given to a nascent algorithm, which then tunes itself to recognise patterns in this data. Thereafter, if the algorithm has sufficiently learned by example, it can apply its deductions and inferences to new data which it encounters. Depending on the application, personal data – even if anonymous – may well have been used in training for machine learning.
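The train-then-infer pattern described above can be sketched in miniature. This is a toy nearest-neighbour classifier with invented data, not any real company’s system: “training” stores labelled examples, and prediction applies them to a customer the algorithm has never seen.

```python
# Minimal sketch of "learning by example": a 1-nearest-neighbour classifier.
# All names and figures are illustrative assumptions, not real data.

def train(examples):
    # For nearest-neighbour, training simply means memorising the
    # labelled examples that predictions will later be compared against.
    return list(examples)

def predict(model, point):
    # Classify new data by its closest training example (squared distance).
    def distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], point))
    return min(model, key=distance)[1]

# Each example pairs features (age, monthly spend) with the correct outcome.
training_data = [
    ((25, 40.0), "low spender"),
    ((52, 310.0), "high spender"),
    ((31, 55.0), "low spender"),
]

model = train(training_data)
print(predict(model, (48, 290.0)))  # nearest to the "high spender" example
```

The point GDPR sharpens is that each tuple in `training_data` may derive from a real person’s records, even after the name has been stripped away.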
With GDPR, it may be challenging to explain to an ordinary consumer why and how their personal data might be used to train algorithms to infer outcomes for others. Even if the consumer then consents, the right to withdraw this permission may require that a machine-learning algorithm “unlearn”, and so forget, how an individual’s specific data adapted its learning.
Learning can also be biased by the precise collection of training examples used, subsequently leading to unexpected prejudice in the inferences deduced. GDPR asserts that machine processing must be fair to all individuals, regardless of background, and thus any hidden predilections could cause legal exposure.
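A toy calculation shows how a skewed collection of training examples produces prejudiced inferences. The loan-history records below are invented for illustration: one postcode is well represented, the other contributes a single case, yet both yield an “approval rate” stated with equal confidence.

```python
# Illustrative sketch: a lopsided training sample bakes bias into inferences.
# All records are hypothetical; note only one example comes from "T12".
history = [
    {"postcode": "D04", "repaid": True},
    {"postcode": "D04", "repaid": True},
    {"postcode": "D04", "repaid": False},
    {"postcode": "T12", "repaid": False},
]

def repayment_rate(postcode):
    # Infer a rate for a postcode from whatever examples happen to exist.
    outcomes = [r["repaid"] for r in history if r["postcode"] == postcode]
    return sum(outcomes) / len(outcomes)

print(repayment_rate("D04"))  # about 0.67, from three examples
print(repayment_rate("T12"))  # 0.0, a sweeping inference from one case
```

Everyone in the under-sampled postcode is scored zero on the strength of a single record: a hidden predilection of exactly the kind that could create legal exposure under GDPR’s fairness requirement.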
Each of us owning our own data sounds simple. The consequences, however, are subtle.