Big data is watching you

Businesses are mining huge volumes of data to find out where we are, what we’re like and what we’re doing. In the first of a series, we ask if big data is a benefit or a threat

Graphic: Dearbhla Kelly/ITPM. Sources: Peter Sondergaard/Gartner, EMC and IDC

The friendly face of big data lives in upstate New York and is named Watson. It is an IBM computer, the size of a room, that beat two human champions at the television quiz show Jeopardy! in 2011. The computer entered the game with the ability to evaluate the equivalent of 200 million pages of information – that’s about a million books’ worth, including encyclopedias, dictionaries, Wikipedia, journals and literary works – in seconds. It was also able to understand questions phrased in natural language.

An artificial-intelligence phenomenon such as Watson is possible because we are at an intersection where computers are incredibly powerful and software can now analyse enormous volumes of data to uncover useful meaning.

At the same time, we live in a world where endless digital sources of information – desktop PCs, smartphones, tablet computers, sensors, cameras and other devices – produce a nonstop tsunami of data.

All this information contains detail, with patterns, anomalies and connections, that could have practical applications across everything from business to medicine, from computer security to crime prevention, from climate prediction to financial markets, and from social services to personal services. But it also can be revealing – perhaps too revealing, or incorrectly revealing – about each of us.

“It’s a very new way to think about data and information,” says Prof Mark Keane, the head of computer science at University College Dublin and a former director of Science Foundation Ireland. Big data “really forces a new way to think about things”.

The sheer volume of data we now generate and store is mind-boggling.

“Stored digital content is doubling every two years, reaching one zettabyte” – a million billion megabytes – “last year. Just think about that for a moment. That’s the equivalent of 4.9 quadrillion books,” Art Coviello, the chairman of the security company RSA, told a conference in San Francisco this year. (A quadrillion is a one followed by 15 zeros.)
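The scale of those figures is easier to check with a little arithmetic. The sketch below assumes decimal (SI) units, in which a megabyte is 10^6 bytes and a zettabyte is 10^21 bytes, and takes the "doubling every two years" rate at face value. The function name and the ten-year horizon are illustrative choices, not figures from Coviello's talk.

```python
# Decimal (SI) storage units, expressed in bytes.
MEGABYTE = 10**6
ZETTABYTE = 10**21

# How many megabytes fit in one zettabyte: 10**15, a one followed by 15 zeros.
megabytes_per_zettabyte = ZETTABYTE // MEGABYTE

# Size of an average "book" implied by the quoted comparison
# (one zettabyte = 4.9 quadrillion books).
bytes_per_book = ZETTABYTE / 4.9e15


def projected_zettabytes(start_zb, years, doubling_period_years=2):
    """Stored content after `years`, assuming it keeps doubling at a fixed rate."""
    return start_zb * 2 ** (years / doubling_period_years)


print(megabytes_per_zettabyte)
print(round(bytes_per_book))
print(projected_zettabytes(1, 10))
```

On these assumptions each "book" comes to roughly 200 kilobytes, which is plausible for plain text, and one zettabyte of stored content would grow to 32 zettabytes after a decade of uninterrupted doubling.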

He noted that this year more than a billion things, including cars, sensors, PCs, smartphones and even vending machines, all producing raw data, will be connected to the internet. In less than a decade, analysts suggest, as many as 200 billion objects will be online.

Much of that data flow gets stored by businesses, research centres, utility companies, health facilities, government departments and other organisations, but less than 1 per cent of it is currently being analysed, according to the technology analysis company IDC. That is set to increase, launching the era of big data.

The challenge, Keane notes, is to get at the nuggets of useful knowledge hidden in all that detail. That need has driven growth in new software and services in the information-technology industry, with all the big technology firms, including Microsoft, Oracle, EMC and IBM, as well as many smaller ones, lining up their own solutions.

This in turn is expected to generate millions of jobs worldwide for people with the training to help organisations mine all that data. Peter Sondergaard, the head of research at the technology analysis company Gartner, predicts big data will spur the creation of 4.4 million jobs by 2015.

Face to face
Everyone who uses the internet comes face to face with big data every time they type a search query into Google. The company’s algorithms, or mathematical formulas, have been refined to the point where unimaginable quantities of data records can be searched in an instant. You enter your search term, and a list of websites, images, blogs, documents and other items is returned in a blink. That’s big data in action.

In many ways, says Keane, Google has pioneered the way people think about big data by turning upside down many assumptions about how to find answers.

It used to be that computer scientists would direct computers at problems by trying to define what they should look for: an answer that is either black or white.

A big data approach looks for shades of grey, a computing challenge that until now would have been considerably more difficult. Google decided that searching for patterns in huge amounts of data – the more the better – could produce greater insights. The more data that could be collected and examined, the greater the number of meaningful patterns that could be detected – which, in the case of a search engine, could indicate whether a web page was more useful, and thus given a higher rank, or less useful.

Big-data projects in business, science, medicine or social sciences all work the same way: take huge amounts of data and hunt for patterns and anomalies. The trick that makes it all work is in creating data analytics – software tools that can find and understand those patterns. Only now is the combination of computing power and cheaper hardware making this possible as a general tool rather than as something limited to governments, very big business and researchers.
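To give a flavour of what hunting for anomalies can mean in practice, here is a minimal sketch. It is not any particular vendor's analytics tool: the readings are invented, and the rule it applies (flag anything more than two standard deviations from the average) is just one simple, common choice among many.

```python
from statistics import mean, pstdev


def find_anomalies(readings, threshold=2.0):
    """Return (position, value) pairs lying more than `threshold`
    standard deviations away from the mean of the readings."""
    average = mean(readings)
    spread = pstdev(readings)
    if spread == 0:
        return []  # every reading is identical, so nothing stands out
    return [
        (position, value)
        for position, value in enumerate(readings)
        if abs(value - average) / spread > threshold
    ]


# Invented hourly usage figures with one obvious spike.
hourly_usage = [10, 11, 9, 10, 12, 10, 11, 48, 10, 9]
print(find_anomalies(hourly_usage))
```

Real systems apply the same idea to billions of records and use far more robust statistics, but the principle is the same: establish what is normal, then look for what is not.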

“When you get to certain amounts of data you start to see things you don’t see with smaller bits of data,” says Keane. He found this held true in an experiment to see whether the frequency with which certain words appeared on news sites could predict movements in financial markets. To his surprise, he found they could: verbs and nouns used by financial commentators in about 18,000 articles converged to align with changes in stock markets.
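A toy version of that kind of experiment might look like the sketch below. It is not Keane's method, which the article does not describe in detail. The four "articles" and the index movements are made up, and it measures only how often a single word appears and how closely that tracks the market figure, using a standard Pearson correlation.

```python
from math import sqrt


def word_frequency(text, word):
    """Share of the words in `text` that are exactly `word`."""
    tokens = text.lower().split()
    return tokens.count(word) / len(tokens) if tokens else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Invented data: one headline per day and that day's index change in per cent.
articles = [
    "shares fall as investors fear losses",
    "markets rise on strong earnings",
    "stocks fall and fall again amid fear",
    "shares rise as confidence returns",
]
index_change = [-1.2, 0.8, -2.1, 0.9]

fall_frequency = [word_frequency(article, "fall") for article in articles]
r = pearson(fall_frequency, index_change)
print(round(r, 2))
```

A value of r close to -1 means the word appears more often on days when the index drops. With four invented data points that proves nothing; the point of the big-data approach is that, across thousands of articles, such a relationship can become statistically meaningful.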

Scientists are using similar pattern-searching techniques to explore our DNA to better understand illnesses and find treatments; to analyse the particle collisions in the Large Hadron Collider at Cern to find mysterious particles, such as the Higgs boson, that might help us understand how our universe was formed; and to examine complex weather patterns to improve storm forecasting and so perhaps save lives.

Big-data analytics are behind IBM’s Smarter Cities initiative in Dublin, where researchers hope that looking at traffic patterns, or electricity and water usage, or the movement of people across the city, might enable the development of more efficient public transport and energy usage, or the management of a flu epidemic.

Businesses want to use big data to provide better customer service by understanding customer behaviour on a website or in a shop, deliver advertisements that people will respond to, or track and monitor objects in warehouses and transport.

Healthcare services hope big data can improve patient care in hospitals through detailed monitoring, enable better analysis of scans and X-rays, improve diagnosis and enable long-distance care.

Big data is spurring interest, and a wide range of well-funded projects, at EU level as well as within Ireland. The EU’s Seventh Framework Programme for Research and Technological Development, for example, has allocated €50 million to numerous projects, including many in the big-data and analytics field.

Ireland’s research and business communities are involved with several EU-level initiatives, including a new €3 million project called the Big Data Public Private Forum, aimed at developing a big-data strategy for Europe. Researchers at the Digital Enterprise Research Institute at NUI Galway, which aims to understand the rapidly expanding big-data aspects of the internet, are heading Irish participation.

Big data and analytics also make up a strategic focus of the Government’s new €300 million investment in seven research centres here, with the goals of making Ireland a global leader in big-data technologies and generating many high-level jobs.

International leader
On the other hand, at the business rather than the research level in Ireland, a recent survey by the Irish-based data-centre services provider Interxion showed we may have a way to go to be an international leader in the area. Just 56 per cent of IT professionals surveyed here said that big data would become a business priority in the next five years, compared with an EU average of 76 per cent. And twice as many IT departments in Ireland (42 per cent) said that they found big data a “significant challenge” compared with UK counterparts (21 per cent).

There is a dark side to big data, too. Privacy advocates are alarmed at how much detailed information is produced by and about individuals and organisations, and held by companies, government divisions or law-enforcement agencies.

A mobile phone, for example, is a tracking device for an individual. Correlate all the data generated as phones check in with local phone masts and you know not just where one person was during the day but where virtually the entire population, including children, was.

And every time we use the web, especially with the advent of social media, where we leave highly detailed, personal information about ourselves and others, we leave a trail of digital footprints that can be collected and analysed with increasingly sophisticated tools.

Such databases can be merged and correlated to reveal even more detailed information as well as patterns, of varying accuracy, that describe what a person or organisation is actually doing versus what it might appear to be doing.

Even when data is anonymised, or separated from any immediate and obvious defining detail that could link it to an individual, journalists and researchers have been able to identify individuals once numerous sets of data were correlated. That suggests our understanding of big-data technologies and their potential uses lags behind their implementation. Individually and collectively, we are already producing streams of daily data that could be used in ways we cannot foresee.
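How correlating data sets can undo anonymisation is easiest to see with a small, entirely fictional example. In the sketch below, the records, the field names and the choice of area, birth year and sex as the linking details are all assumptions made for illustration. The "anonymised" list carries no names, yet anyone whose combination of details is unique in a second, named list can be picked out.

```python
# Fictional "anonymised" records: no names, but some descriptive detail remains.
anonymised_records = [
    {"area": "D04", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"area": "D08", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"area": "D08", "birth_year": 1982, "sex": "M", "diagnosis": "migraine"},
]

# Fictional public list that does carry names.
public_list = [
    {"name": "Person A", "area": "D04", "birth_year": 1975, "sex": "F"},
    {"name": "Person B", "area": "D08", "birth_year": 1982, "sex": "M"},
    {"name": "Person C", "area": "D08", "birth_year": 1982, "sex": "M"},
]

LINKING_FIELDS = ("area", "birth_year", "sex")


def linking_key(record):
    """The combination of details shared by both data sets."""
    return tuple(record[field] for field in LINKING_FIELDS)


def reidentify(records, named_list):
    """Pair a record with a name only when exactly one named person matches."""
    names_by_key = {}
    for person in named_list:
        names_by_key.setdefault(linking_key(person), []).append(person["name"])

    matches = []
    for record in records:
        candidates = names_by_key.get(linking_key(record), [])
        if len(candidates) == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches


print(reidentify(anonymised_records, public_list))
```

Here only Person A is exposed, because two people share the other combination of details. Each extra data set added to the mix tends to shrink those groups until more individuals stand alone.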

“There are going to be good things and bad things about big data,” Prof Mark Keane says. That makes it all the more important for us to understand its concepts and realities, so we can approach this rush of information with an informed balance of confidence and caution.