Chris Horn: Predictions using big data a hot theme for the tech sector

In 1805 Francis Beaufort from Navan invented the scale of wind strength now named after him. He had been shipwrecked at the age of 15, and devoted his career to the development of maritime charts.

In 1831 a naval captain, Robert Fitzroy, sought Beaufort's suggestions on which scientist to bring with him on a charting expedition to South America. Beaufort introduced the young captain of HMS Beagle to Charles Darwin, who subsequently wrote On The Origin of Species based on his research during the voyage.

Fitzroy was a keen observer of the weather. He invented a barometer and wrote a widely used maritime manual on weather indicators. In 1854 Fitzroy was appointed as chief of a new government department to collect and analyse weather data at sea. In due course this became the UK Meteorological Office.

Fitzroy fastidiously built a nationwide network of weather observers: local harbour masters, lighthouse keepers and ships' captains. Each day he collated by hand literally hundreds of weather reports to produce a three-day maritime forecast for Ireland and Britain.

Tragically, in 1859 Captain Thomas Taylor of the passenger steam clipper Royal Charter chose to ignore Fitzroy's predictions and technology. His ship sank in a storm off the coast of Anglesey with major loss of life, having just left Dún Laoghaire for Liverpool, near the end of a long voyage which had originated in Melbourne, Australia.

Other ships had heeded Fitzroy’s warnings for that evening on the Irish Sea, and avoided the storm. The resultant public outcry did much to enhance Fitzroy’s status as a weather forecaster.

Fitzroy’s weather predictions are an early example of “big data” and Fitzroy was arguably the first ever “data scientist”. While weather forecasting was the first application of big data, it was soon followed by political opinion polling.

Famously, the US general interest magazine Literary Digest conducted straw polls amongst its readership on a series of US presidential elections from 1920 to 1932, by the simple expedient of asking readers to return a postcard.

These polls always accurately predicted the winner. Buoyed by their success, the editors decided that for the 1936 presidential election they would go really “big” so as to obtain the most data possible.

They polled not only their own readership but also registered automobile owners and telephone users, reaching some ten million potential respondents. Their forecast then proved disastrously incorrect! Worse still, their competitor Gallup accurately predicted the election based on a comparatively tiny sample of just 50,000. The Literary Digest methodology had concentrated only on the wealthy and middle classes, missing blue-collar workers who had suffered during the Great Depression.

Big data is not always good data.
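
The Literary Digest effect can be reproduced with a small simulation. The sketch below is illustrative only: the share of "listed" voters (those reachable through car or telephone registers) and the support rates in each group are invented assumptions, not historical figures, and the function name simulate_polls is hypothetical. It compares a very large sample drawn only from listed voters with a much smaller sample drawn at random from everyone.

```python
import random


def simulate_polls(seed=1936, population_size=200_000):
    """Compare a large biased sample with a small random one.

    All rates below are made-up assumptions for illustration.
    """
    rng = random.Random(seed)

    # Each voter is a pair: (listed in car/phone registers, supports incumbent).
    population = []
    for _ in range(population_size):
        listed = rng.random() < 0.40               # assumed: 40% are listed
        support_rate = 0.40 if listed else 0.75    # assumed group preferences
        population.append((listed, rng.random() < support_rate))

    all_votes = [vote for _, vote in population]
    listed_votes = [vote for listed, vote in population if listed]

    # Big but biased: 50,000 respondents, all drawn from listed voters.
    big_biased = rng.sample(listed_votes, 50_000)
    # Small but representative: 2,000 respondents drawn from everyone.
    small_random = rng.sample(all_votes, 2_000)

    return {
        "true_share": sum(all_votes) / len(all_votes),
        "big_biased": sum(big_biased) / len(big_biased),
        "small_random": sum(small_random) / len(small_random),
    }


if __name__ == "__main__":
    for name, share in simulate_polls().items():
        print(f"{name}: {share:.1%}")
```

Under these assumptions the large sample calls the election for the wrong candidate, while the small random sample lands close to the true share. Adding more respondents from the same skewed lists would only make the wrong answer more precise.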

Political data scientists today are increasingly accurate. Nate Silver, a US baseball league analyst, successfully predicted the outcome in each of the 50 US states during the last presidential election in 2012. His simultaneously published book, The Signal and the Noise, explains how mathematical analysis can extract meaningful trend indicators, and became a nationwide bestseller.

Nevertheless, just a couple of weeks ago he publicly blogged on how he had “screwed up on Donald Trump”, having wrongly predicted Trump’s demise as the Republican candidate for the forthcoming US presidential election.

Predictions using big data are a hot theme for the technology industry. Digital advertising is the largest application. The preferences and behaviour of literally hundreds of millions of people – including you and me – are fastidiously tracked by computers in the hope that judiciously placed adverts on our individual smartphones and devices will influence our purchasing behaviour.

But there are other application areas. For example, IBM’s Watson prediction technology is being applied to areas such as healthcare, the legal profession, and scientific research. Big data is big money.

However, big data, data science and prediction have yet to be successfully applied to the technology industry itself. Thomas J Watson, after whom IBM Watson and its prediction technology are (perhaps curiously!) named, predicted in 1943 that the world would need maybe five computers at most.

Modern technology market predictions are frequently almost as misleading. In 2012 a number of market analysts confidently predicted that the Microsoft Windows phone would rapidly become the dominant smartphone. In late 2012, International Data Corporation confidently asserted that worldwide PC shipments would rebound in the second half of 2013.

In 2014, Gartner predicted that overall spending on information technology would grow by 2.4 per cent in 2015, from $3.8 trillion. In practice, 2015 spending on IT was down 5.8 per cent to $3.5 trillion.

Accurate predictions of technology trends are critical, since investment and confidence in the sector are very strongly influenced by such market analysts. Funding for start-ups and innovation is driven by published market trends.

The industry almost expects the predictions of analysts to be self-fulfilling: because analysts make specific assertions, the industry responds by tuning its investment strategies accordingly. But predictions can still be horribly incorrect.

“Eat your own dog food” is a well-known adage urging companies to use and test their own products. Perhaps an Irish data scientist can successfully apply big data to analyse the way the wind blows in the global technology industry, and so produce more accurate forecasts for the industry itself.