Attempts to keep online data private a futile endeavour

Our relatively recent and rapid transformation into a data-driven, data-devouring society comes as no big surprise to Jeffrey Ullman, a pioneering figure in the field of data science.

“There’s a sense in which all of this was inevitable from the time that Gordon Moore described Moore’s Law,” says Ullman, emeritus Stanford W Ascherman Professor of Computer Science at Stanford University in California, whose students have included Google co-founder Sergey Brin.

“Throughout my career, key parameters have been doubling every two years, which led to unimaginably large capabilities. When I started out, my first computer had, I think, 8,000 bytes of memory. Today Facebook has close to a petabyte of main memory.”

He pauses, then concedes: “It’s kind of unimaginable.” He adds, “Of course, what was very hard to grasp was what the consequences of the numbers were.”

Some of the consequences have resulted in new computing sectors such as cloud computing, and Ullman's far-reaching knowledge of data science prompted the National College of Ireland to appoint him chair of the advisory board to its Cloud Competency Centre. That role sees him occasionally visit Dublin – as he did earlier this summer – for meetings, talks and debates.

When he started his research and then academic career following engineering degrees from Columbia and Princeton, “we didn’t have, for example, a really good estimate of how fast the machines have to be and how much memory has to be available in order for, say, a computer to beat a grandmaster at chess. It turns out it took, by today’s standards, not too much,” he says.

“We didn’t really have an idea of how much progress could be made with how much capability, but it looks like we’re on a very good trajectory. There’s a lot of very interesting things becoming possible with today’s capabilities and no reason to believe 10 years from now, when things may be 30 to 40 times better and faster, even more amazing things will [not] be possible.”

Interesting things

As the field has grown and expanded, Ullman notes that defining exactly what the term “data science” encompasses has become more difficult. He likes to keep it broad and simple, seeing data science as a subset of computer science, not a separate area.

“To me it is just applications of the stuff computer scientists have been doing all along. Often it’s defined as applications to the classical areas of science ranging from physics, biology and medicine to the social sciences – and a lot of interesting things have been happening there.”

While data science uses mathematics and statistics as tools, it isn’t part of those fields. “Data science itself to me is simply one of the interesting branches of computer science.”

For better or for worse – and Ullman comes down firmly and sometimes controversially on the side of better – we all live in a data-saturated world full of those consequences, many of which make regular headlines, at times giving data science a bad rap.

On the plus side, the ability to gather and analyse mind-boggling levels of data enables the boom in artificial intelligence, the growing capabilities of autonomous vehicles, and exciting advances in using personalised medicine to better attack cancer.

On the downside, line up the risk of data breaches, the increased capabilities for secretive surveillance, the power of social media companies and the misuse and abuse of personal data, such as happened when Facebook data ended up with Cambridge Analytica and potentially affected the outcome of national elections.

But Ullman says that in some ways, he can’t see what all the fuss was about with Cambridge Analytica.

“Except that it had the unfortunate outcome of electing Trump, the thing makes perfect sense. People respond to certain arguments and knowing which argument to get them to vote your way or see things your way – why not?”

Trying to keep online data private “is King Canute trying to hold back the tide, once you have the capability of finding out what people care about.”

He argues that “advertisers have from day one been trying to understand what makes people respond to their viewpoint. It’s done all the time and the advances in computer capability and the kinds of facilities it enables, like Google and Facebook, is just making more of this possible.

“But it’s not really a new phenomenon that people try to influence other people, and do that often by trying to know about them. It’s just become easier to do what people have been trying to do all along.”

His own views on privacy come down hard on the side of the businesses that wish to use data.

“My personal preference would be to redefine privacy. Basically, you have to assume that anything about you will be known and used by people around you. It’s sort of the same way things were in a village 300 years ago, right? Your neighbours knew what you were up to. You really had no privacy in the sense that was only possible with the rise of large cities where people could move and be anonymous.”

Right to be forgotten

He especially hates the notion of a “right to be forgotten” which – as the European Court of Justice has mandated – allows links to certain types of untrue or outdated information to be removed from search engines such as Google or Bing.

“The right to be forgotten is a very stupid idea. For example, in the United States we have a government database of sexual predators. Well, what if the predators don’t want to be on that database and say, ‘take it down, Google’? It’s ridiculous [note: it is questionable whether the ECJ interpretation of a “right to be forgotten” would permit the removal of this type of information].

“I understand if there’s some video in the past of someone getting drunk, well I can see that . . . but the counterargument is, well there are things that people should know about you. If you have defaulted on a loan, I should have the right to know that before I give you money.

“There’s going to be safety in the herd. I mean everybody is probably going to have a video of them doing something embarrassing somewhere and if I am basing whether I hire somebody on that, my choices are going to be pretty limited.”

He adds: “I don’t think it’s quite the problem that some people are claiming. It is a trade-off. Google is a very powerful tool for improving people’s lives and if you don’t let them have access to all the facts . . . what the right to be forgotten does is it enables people to alter the facts and in particular forces Google to reflect a world that isn’t factual so that is, I think, a serious mistake.”

He says he has “mixed feelings” about industry regulation, but comes across as mostly against it.

“Regulation is a mixed bag. I don’t generally like the idea of coming down on the side of privacy in the way we have interpreted it since the rise of large cities. Even in the Cambridge Analytica case, while it led to a bad outcome I think the principle of trying to understand what people think and what they respond to and learning about this – I don’t really have a problem with that.”

Even though by signing up to do Cambridge Analytica’s personality quiz, people also – mostly unknowingly – gave the company access to their contacts’ data without their permission?

“I understand that many Facebook users didn’t understand what they were giving consent to. But you never have to respond to someone’s argument that you don’t want to. If they were able to learn something about you, even if you didn’t want them to be able to learn about it, and they present you with an argument, you can always say I don’t agree with that.”

Regulatory bodies

There are many counterarguments to this, including that people were not told how their data was being used, that advertisements were not (as required by law) tagged as having been funded by political lobby groups, that third parties had their data gathered without their permission, all of which have been of concern to national regulatory bodies in the US, UK and EU . . . but Ullman – whose Stanford web page carries a number of his own self-described “old curmudgeon” opinion pieces – won’t be swayed from his viewpoint.

“Regulation in general is tricky, especially when it is done by people who don’t understand as much as the people who are running the companies that are being regulated. For example, our loss of net neutrality could be a disastrous matter. It gives too much power to the dumbest companies – the ones whose role is basically transporting bits from one place to another, and less power to the Netflixes and the Googles and the Facebooks that are actually changing our world, mostly for the better.”

Changing the subject, then: what developments and applications in data science seem most promising, and intrigue him most?

“I would say probably the most valuable thing that could happen would be targeted medicine”, such as bespoke treatments for cancer. “It’s a very exciting area, targeting people’s genome. It requires access to large amounts of data because if it only works on 1 per cent of people, you need to have thousands and thousands of genomes available [to understand how the individual can be targeted],” he says.

“That’s the interesting thing about technology – you never even imagine something is possible until somebody does it.”