Big Data: Don’t put the chart before the horse

Before jumping into data science, business executives need to work out what they’re trying to accomplish with big data, and data scientists need to think more like business execs

What's the character trait which sets apart good data scientists from bad? One might assume analytical skills, curiosity, even a head for numbers. In fact, according to EMC's CTO Bill Schmarzo – aka the "Dean of Big Data" – it's humility.

“Everyone is asking the same question: what does a data scientist look like?” he says. “I can send a person to school, teach them statistics, computer programming, maths, and a lot of different analytic techniques, but they have to be curious about the world around them in order to be a good data scientist.”

Even more important, however, says Schmarzo, is one’s ability to listen. “We find the number one trait distinguishing good data scientists from average ones is humility. Humble people tend to listen more and talk less. When a good data scientist is engaging with business users, he is asking questions and truly listening to what the business users are saying, instead of trying to show them how smart he is.”

Data scientists tend to become so enamoured with the analytical models they build that they don’t think through what those models might mean for a business.


“We try to help the data scientists think about presenting their results in a way that’s meaningful,” says Schmarzo. “In many cases that means making things much simpler and thinking like a business exec. What is the business trying to accomplish? What decisions are trying to be made and how do the results of a data analysis help them do that? Instead of presenting a chart with all kinds of models and scores first, the question must first be asked: what do you want to find out?”

Big questions

Businesses are still making a lot of mistakes by jumping into data science without properly thinking it through. It is still a burgeoning discipline and, as at the dawn of the internet age, when every company decided it needed a blog only to discover it had nothing to say, firms are becoming conscious of the potential value of big data before asking what exactly it can do for them.

“Companies are making a lot of mistakes,” says Schmarzo. “And some of those mistakes are almost necessary. In the past, I have come out pretty harsh with some firms that start with the technology before they know what to do with it, but in reality there are benefits to being familiar with the tech before knowing how to leverage it to your advantage.”

Still, Schmarzo stresses that it is better to do things the other way around.

“If you don’t know what it can do, you won’t know what to do with it, right? A lot of companies waste time and money installing expensive software, putting data on it, hiring data scientists and hoping for some magic to happen. When it doesn’t, they realise they’ve learnt a lot more about the tech but nothing new about their business.”

The right approach is to work out which decisions the business needs to support and what data is needed to support them.

Every company thinks it needs a big data strategy, says Schmarzo. “We tell them no, you don’t. What you need is a business strategy that incorporates big data. When you flip that bit, the conversation becomes more productive.”

Big data of Things

According to Schmarzo, the Internet of Things, in its simplest sense, is just new data.

“IoT is a truly exciting new realm for tech and it will help make more informed business decisions in so many cases – predictive maintenance on parts for trucks, cars, jet engines, wind turbines, decisions about load balancing, demand forecasting, capacity planning etc. The list of things we’ll be able to do goes on and on.

“There are a bunch of really solid business-use cases where that wealth, or granularity, of data will help us do things better, faster, and more accurately. But ultimately, the IoT is just another data source to be considered and if you don’t know what kind of decision you’re trying to make, gathering the data will be a total waste of time. If you know what kind of decisions you’re trying to make, then I can tell you what data you’ll need and what analytical models to use, not the other way around.”