Reviewing your data quality is more important now than ever
What many companies don’t seem to understand is that when data is of bad quality, it quickly becomes a liability, not an asset
Research finds almost half of newly created data records were of bad quality. Photograph: iStock
There is no doubt that data is one of the most valuable assets an organisation can have. Some argue that it has now surpassed oil as being the world’s most valuable resource.
This makes it impossible to ignore the potential of big data trends like artificial intelligence and machine learning, as they aim to disrupt the fabric of our social and working lives. Already, companies know us better than we know ourselves, police departments have futuristic capabilities to predict crime, and new disruptive business models are emerging that focus on data, not on physical products.
However, under the glossy veneer there is a reality about this data-driven future that is not so much hidden as ignored. You might expect to hear of vulnerabilities in data privacy or data security, but much more fundamental and exponentially more destructive is the fact that most data is of bad quality.
Take a Good Hard Look at your Data
In one of the first studies of its kind, Dave Sammon, Tom Redman and I worked with 75 executives to measure the quality of their company data. Recently published in the Harvard Business Review, our research found that almost half of newly created data records were of bad quality and had at least one critical error.
Furthermore, only 3 per cent of participating companies had data quality scores that were rated as “acceptable”. How would your company or department rank?
For those wishing to tackle data quality within their business, one very simple place to start is the Friday Afternoon Measurement. Created by Dr Redman, it can be completed within two hours – even on a Friday afternoon when everybody is rushing out the door. And it will give you a very good indicator of your exposure to poor quality data and the money being wasted in the process.
Here’s how to do it: First, assemble the last 100 data records your group used or created. This could be anything from customer orders to engineering drawings. Identify 10-15 critical data points within the record and create a new spreadsheet with this data, adding a column for perfect records.
Secondly, ask two or three colleagues familiar with the data to join you for a two-hour meeting. These typically take place on a quiet Friday afternoon – hence the name.
Inspect the data, record by record, marking obvious errors in red. Aim to spend a maximum of 30 seconds per record. Tally the number of perfect records: this is your data quality score.
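For teams that keep their review notes electronically rather than in a spreadsheet, the tally can be sketched in a few lines of Python. This is only an illustrative sketch, not part of Dr Redman's method: the function name `data_quality_score`, the record identifiers and the field names are all made up for the example, and it assumes each reviewed record is stored alongside the set of fields your group marked in red.

```python
def data_quality_score(flagged_by_record):
    """Percentage of reviewed records in which no field was marked in red.

    flagged_by_record maps a record identifier to the set of critical
    data points the reviewers flagged as wrong (an assumed layout).
    """
    if not flagged_by_record:
        raise ValueError("no reviewed records supplied")
    # A record is "perfect" only when nothing at all was flagged in it.
    perfect = sum(1 for flagged in flagged_by_record.values() if not flagged)
    return 100.0 * perfect / len(flagged_by_record)


# Hypothetical review results for four customer orders.
reviewed = {
    "order-001": set(),
    "order-002": {"postcode"},
    "order-003": set(),
    "order-004": {"customer_name", "delivery_date"},
}

score = data_quality_score(reviewed)
print(f"Data quality score: {score:.0f}%")  # prints "Data quality score: 50%"
```

With the full sample of 100 records, the percentage is simply the count of perfect records, which is why the exercise works just as well with a pen and a red marker.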
This score can be used to evaluate just how much your data is costing you. Our study found that data quality scores vary widely, from 0 to 99 per cent, and that no sector or government agency is immune to the threat of poor data quality.
If your employees spend their time manually correcting data from one system and inputting it to another, consider this your wake-up call.
What many companies don’t seem to understand is that when data is of bad quality, it quickly becomes a liability, not an asset. From a financial standpoint, bad data has been estimated to cost the US economy $3 trillion each year.
More worrying is that bad data can and does cost lives. On the morning of March 14th, 2017, Ireland woke up to the news that the coast guard helicopter, Rescue 116, had crashed into an island, costing the lives of all four crew on board. The Air Accident Investigation Unit preliminary accident report highlighted the fact that the island into which Rescue 116 crashed was not in the database of a key safety system in use on Irish coast guard helicopters.
Data completeness is a major factor in data quality and this tragic instance calls into question the quality of official map data in use by Irish coast guard helicopters. What is worse is the fact that this concern about data quality was logged some four years prior to the tragedy.
While this may sound like a one-off tragedy, the fact is that we are exposed to bad data on a daily basis. When you check your bills, how often have you been incorrectly charged? How many times have letters destined for you ended up at addresses you left years ago?
No wonder your instinct is to second-guess reports at work. We need to ask ourselves why we have such a high tolerance for bad data and for dealing continually with the fallout, rather than proactively working to prevent it.
For some companies, data quality is a hard sell. When competing for resources against projects such as AI, data quality has had very little success. This is counterintuitive, as the success of an AI project is fully dependent on the quality of your data, not to mention that the data quality project would guarantee a higher return.
Many businesses just don’t know where to start. Look out for cases where the responsibility for data has been misplaced in the hands of technology people. They will do what they were hired to do and implement new technologies, which more often than not fail. Data should be treated first and foremost as an asset.
There are so many new data technologies available, but they are all built on the premise that your data is worth using. Reviewing your data quality is more important now than ever. If you want to add your data to our study and gain expert support, by all means send us your results.
Dr Tadhg Nagle is a lecturer in information systems at Cork University Business School, UCC, and director of data business in the Irish Management Institute