Insights into the trails we leave in the digital world

Book review: Big Data In Practice by Bernard Marr

Big Data in Practice
Author: Bernard Marr
ISBN-13: 9781119231387
Publisher: Wiley
Guideline Price: €35.99

Everything we do in the digital world leaves a trail, information that corporations find invaluable in determining our habits. Assessing information quickly or, increasingly, in real time can have profound effects on the success of a business as it aligns its products or service offerings to meet the requirements and preferences of consumers.

Data is growing at an exponential rate. According to the author of this insightful book, by 2020 it is predicted that about 1.7 megabytes of new information will be created every second, for every human being on the planet. This includes not only data from the tens of millions of messages we send via email, text and social media but also from the one trillion digital photos we take each year.

On top of that, consider the amount of data we have from all of the sensors that surround us. Many smartphones now have sensors to tell us where we are (GPS), how fast we are moving (accelerometers), what the weather is like (barometers) and even what force we are using to press the touch screen (touch sensors).

Turning this information into insight is big business and Marr explores how advances in technology have facilitated this. In the past, there were practical limits to the amount of data that could be processed on one site. The more data there was, the slower the system became. Now, so-called distributed computing means huge amounts of data can be stored and analysed across different servers, each performing a small part of the analysis.
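For readers who want to see the idea in miniature, the following is a rough sketch rather than anything taken from the book. It assumes a simple word count as the analysis, and it uses threads on a single machine to stand in for separate servers. The function names (split_into_chunks, count_words, distributed_word_count) are invented for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_workers):
    """Deal the records out into n_workers roughly equal chunks."""
    return [records[i::n_workers] for i in range(n_workers)]


def count_words(chunk):
    """The small part of the analysis each worker performs on its own chunk."""
    counts = Counter()
    for message in chunk:
        counts.update(message.lower().split())
    return counts


def distributed_word_count(records, n_workers=4):
    """Fan the chunks out to workers, then merge their partial results."""
    chunks = split_into_chunks(records, n_workers)
    # Threads stand in for servers here (an assumption for illustration only).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


messages = [
    "big data in practice",
    "data is growing",
    "big business",
]
totals = distributed_word_count(messages, n_workers=2)
```

No single worker sees the whole data set, yet the merged totals match what one machine would have produced on its own. That is why adding servers, rather than buying a bigger one, became a practical way to scale.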

Google a pioneer

Google was a pioneer of this technology. About 1,000 computers, we are told, are involved in a single search query, which takes an average of 0.2 seconds to complete. We currently search 3.5 billion times a day on Google alone.

Distributed computing tools such as Hadoop manage the storage and analysis of Big Data across connected databases and servers and are available to rent via the software-as-a-service model, bringing powerful analytics into the hands of businesses on modest budgets.

Algorithms are growing more powerful too. They can now understand spoken words, translate them into written text and analyse that text for content, meaning and sentiment.

This is the context Marr presents for the main body of his book, which contains 45 case studies of how companies are using analytics to drive results. The studies range from retailers such as Walmart and Amazon, to manufacturers such as Rolls-Royce and GE, to public services such as Transport for London. Almost all of the studies are of either US or UK corporations.

We learn about Walmart’s Social Genome Project, which monitors public social media conversations to attempt predictions of what people will buy. An initiative called Shopycat predicts how people’s shopping habits are influenced by their friends, also using social media data.

Walmart’s so-called Data Café in Arkansas monitors 200 streams of internal and external data in real time. Timely analysis means the retailer can respond rapidly to emerging trends. An unexpected drop in sales of one particular line, for example, quickly highlighted a pricing error that might otherwise have gone undetected for some time. The corporation claims the Data Café has led to a reduction in the time it takes to identify a problem and propose a solution from more than two weeks to just 20 minutes.
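The review does not say how the Data Café spots such a drop, so the following is only an illustrative sketch of one common approach: compare the latest sales figure with its recent history and flag it if it falls far below the usual range. The function name, the three-standard-deviation threshold and the sample figures are all assumptions made for the example, not details from Walmart or the book.

```python
from statistics import mean, stdev


def is_unexpected_drop(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard
    deviations below the mean of `history` (hypothetical rule)."""
    if len(history) < 2:
        return False  # not enough history to judge what is normal
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any fall at all is out of pattern.
        return latest < baseline
    return (baseline - latest) / spread > threshold


# Invented hourly unit sales for one product line.
hourly_sales = [102, 98, 105, 99, 101, 97, 103]

normal_hour = is_unexpected_drop(hourly_sales, 99)
suspicious_hour = is_unexpected_drop(hourly_sales, 40)
```

With these made-up figures, an hour selling 99 units passes unremarked while an hour selling 40 is flagged. A flag of this kind is what would prompt a person to look for a cause such as a pricing error.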

Accommodation website Airbnb uses a machine-learning platform called Aerosolve, which incorporates dynamic pricing tips that mimic hotel and airline pricing models. The platform analyses the images in hosts' photos and has found that listings with cosy bedrooms are more successful than those with stylish living rooms. It also automatically divides cities into micro-neighbourhoods.

The insights gained from the feedback allow Airbnb to ensure it concentrates efforts on signing up landlords in popular destinations at peak times and structures pricing so the use of its global client properties is optimised.

Among the other interesting case studies, we learn how the introduction of the Oyster smartcard ticketing system in London has enabled a huge amount of data to be collected about precise journeys that are being taken. Transport for London now has a clearer picture than ever of how people are moving around the city, right down to individual journeys. This creates insights about load profiles, planned interchanges and potential retail offerings at stations.