Is Data the New Oil?
Here is a guest blog by Kevin Gray.
Kevin has been a marketing scientist for more than 25 years and his background covers dozens of product and service categories and over 50 countries. In addition to Advanced Analytics and R&D, his responsibilities have encompassed business development, questionnaire design, report writing, management and various other facets of marketing research. He is also President of Cannon Gray, a marketing science and analytics consultancy. He has more than 30 years’ experience in marketing research with Nielsen, Kantar, McCann and TIAA-CREF.
We’re told that data are the new oil. But oil is useless unless we know how to use it.
I take the position that there are no “natural” resources. Raw materials are raw materials until they’re put into use, and how they are used determines their value. This, in turn, can change over time.
“Data are different!” we may hear. Not in my opinion. Nor is the use of data to make decisions new. Crows can count and use this innate ability to make some decisions. Paleoanthropologists have discovered evidence of humans keeping records thousands of years ago. Ancient civilizations used data and the pyramids could not have been built without data. Data have been a part of military decision-making for centuries and merchants have always used data, even if crudely and unsystematically. Even astrology has some empirical basis – there are stars, after all.
There are many ways data can be used to make decisions. One way is to study change over time. In a business context, we may study stock prices, sales, and operating profit of a company for the past decade. Physicians are also interested in keeping an eye on our weight, blood pressure and cholesterol levels and how they vary over time. Physicians and exercise enthusiasts can expand greatly on this list. Whether a trend is upward, downward or flat can be good, bad or irrelevant, and this depends on the circumstances and how knowledge of the trend will be used by decision makers.
Another way to use data is to compare groups we believe to be important on the basis of numbers we believe indicate important things about these groups, whether they be consumers, companies, or other entities. Believe is significant here since beliefs most strongly held are often the least accurate. We tend to become ego-involved and emotional when cool heads and detachment should prevail.
Though there are deterministic models grounded in first principles, most data cannot be fully explained, thus the error terms in statistical models. We can only guess at the mechanism(s) that gave rise to the data. Statisticians often are loose in their phrasing and attribute errors in their model to chance (“random error”) but what this really means is they don’t know why the results of their model depart from expectation in certain ways. Measurement error, omitted variables, model misspecification and a whole host of other reasons can cause lack of fit or render the model useless when applied to new data.
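To make the omitted-variable point concrete, here is a minimal Python sketch on synthetic data. Everything in it is an assumption for illustration: the variable names, the coefficients, the seed and the story attached to them. It shows how leaving out a variable that is correlated with the one we measured can inflate the measured variable's apparent effect, even though the model's "random error" looks innocent.

```python
import random

def slope(x, y):
    """Ordinary least squares slope from regressing y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def residuals(x, y):
    """What is left of y after removing its straight-line relationship with x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 2000

# Hypothetical setup: z is a driver we failed to measure (say, distribution),
# x is the driver we did measure (say, advertising), y is the outcome (sales).
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # x is correlated with z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# With z omitted, x absorbs part of the credit that belongs to z.
naive = slope(x, y)

# Removing z from both x and y first gives the effect of x with z held fixed.
adjusted = slope(residuals(z, x), residuals(z, y))

print("true effect of x: 1.0")
print(f"estimate with z omitted: {naive:.2f}")
print(f"estimate with z included: {adjusted:.2f}")
```

Under these assumptions the estimate with z omitted should land near 2, roughly double the true effect, while the adjusted estimate should land near 1. In real data we rarely know what z is, which is the point of the paragraph above.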
Of course, a lot of data aren’t analyzed in depth and not all have to be. For example, bad economic news may cause the stock market to suddenly tank. We may want to have a closer look but we might be reasonably certain that the bad news caused the sudden drop. But not always. Making these sorts of snap judgments can prove very costly, especially when we’re making a lot of them. We need to consider the costs of a more thorough investigation and the costs of being wrong.
Estimating the financial impact of renovating a plant may not require rocket science, but judgments regarding the effectiveness of our marketing activities, for example, are seldom clear cut. There are many highly correlated variables whose independent effects are difficult or impossible to pry apart. A relationship between two variables may depend on other variables (moderation). It may go through other variables (mediation). Relationships may be curvilinear, not straight line (linear in the vernacular). There may be lagged relationships in which the effect of one variable on another does not become evident until much later. Cause and effect can be hard to disentangle. For instance, marketing budgets are driven in part by past sales, but past sales are, in part, driven by past marketing activity.
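The lagged-relationship case is easy to illustrate with a short Python sketch. The data are synthetic and the setup is an assumption made purely for illustration: weekly spend affects sales three weeks later (the hypothetical TRUE_LAG), and nothing else is going on. Real marketing data are far messier, for the reasons given above.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged_correlations(driver, outcome, max_lag):
    """Correlation between driver[t - lag] and outcome[t] for each lag."""
    result = {}
    for lag in range(max_lag + 1):
        earlier_driver = driver[: len(driver) - lag]
        later_outcome = outcome[lag:]
        result[lag] = pearson(earlier_driver, later_outcome)
    return result

rng = random.Random(42)  # fixed seed so the sketch is repeatable
TRUE_LAG = 3             # assumption: spend takes three weeks to show up in sales
WEEKS = 200

spend = [rng.gauss(100, 10) for _ in range(WEEKS)]
sales = []
for t in range(WEEKS):
    # Before TRUE_LAG weeks have passed, use average spend as the driver.
    driver = spend[t - TRUE_LAG] if t >= TRUE_LAG else 100
    sales.append(500 + 2.0 * driver + rng.gauss(0, 5))

corrs = lagged_correlations(spend, sales, max_lag=6)
best_lag = max(corrs, key=corrs.get)

for lag, r in corrs.items():
    print(f"lag {lag}: correlation {r:+.2f}")
print(f"strongest relationship at lag {best_lag}")
```

Looking only at same-week spend and sales (lag 0) would suggest that the spend does very little, even though in this made-up world it is the main driver. Note that this sketch deliberately ignores the feedback loop mentioned above, in which past sales also drive budgets.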
Moreover, consumer behavior is not always easy to predict. Remember that perfect birthday gift to a close friend or loved one that totally flopped? People can behave in similar ways for similar reasons. But they can also do the same things for different reasons, different things for the same reasons and different things for different reasons. It is also important to understand that neuro-scientists are still unsure how humans make decisions.
Consumer behavior can differ by product category and purchase occasion (e.g., for self or other family member). It can also change over time, as events such as a job promotion, relocation, marriage or birth of a child can have a dramatic impact on consumers’ behavior. Marketing can influence behavior and that, of course, is its purpose. However, it can influence it in ways we hadn’t intended and that are undesirable, encouraging bargain hunting and eroding brand equity, for example. Marketers seldom know exactly why individual consumers behave the way they do even when they are able to predict it quite accurately, but knowing the why can be immensely helpful to marketers. New Product Development is one example.
Let’s return to management decisions. In the heat of battle, many of us have gotten into bad habits, and one is confusing tactical and strategic decisions. Not all decisions must be made instantaneously, despite what we hear from salespeople in the blogosphere. Haste can make tons of waste. Marketing research, in fact, has historically made its main contributions at the strategic level rather than tactical level, thus the word research. In my opinion, to become the new oil data will need to be more fully utilized at the strategic level of decision making, not just used more extensively for automated machine-to-machine communication (M2M) purposes.
The data | information | knowledge | wisdom hierarchy still makes plenty of sense.
The human brain is essentially the same as it was 25,000 years ago. Hadoop and Support Vector Machines have yet to play a role in natural selection. We are cave people with computers and balance sheets, and have a lot more to learn about how to use data to make decisions.
Generally speaking, when assessing an argument I use two broad criteria:
Is the argument internally consistent?
Is it supported by actual empirical evidence?
Some positions that appear compelling refute themselves upon examination. Claims that seem logical are also frequently thought to have been proven. This is a common mistake. Logic is not evidence, which is why scientists test hypotheses against evidence. Managers will need to think more like scientists if data are really to become the new oil…and not the new snake oil.
I’d like to offer a few specific tips on how managers can use data and analytics to make better decisions, based on my experience as a marketing researcher and data scientist. I’ve phrased them as questions so you can use them as a checklist if you wish.
Who will be using the data, how will they be used and when will they be used?
How clean are the data?
Do I know what they really mean, i.e., do I clearly understand what the rows and columns in the data represent?
How current are the data?
What other data do I need to provide a more complete picture? There will nearly always be something important missing!
Is a pattern (e.g., difference or trend) I’ve spotted strong enough to have decision relevance?
How do I know the pattern isn’t a fluke? How confident am I that I truly understand what caused it?
If it is important from a business perspective, which decisions might it affect and how would it affect them?
How do I later assess the quality (e.g., profitability) of my decisions? Related to this, are the KPIs I’m tracking really meaningful? Do they actually predict business performance or have I merely assumed they do? Again, logic is not evidence.
If I’m looking at the results of a mathematical or statistical model, do I have a good enough grasp of how it works to be able to interpret it correctly? Are there competing models that have been considered or should be considered?
Can the model results be used in simulations, for instance, sales forecasts under different hypothetical marketing scenarios?
Am I a binary thinker or have I learned to think in terms of conditional probabilities? Whaaaat??? An example of the former is “Do we go with this or not?” and an example of the latter is “If we do A and B, then C will probably happen, but we cannot be certain.”
How well do we (e.g., senior management) truly understand how we make decisions?
How might the data or results of analytics be used by others politically within my organization?
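For the checklist question about whether a pattern is a fluke, one common approach (among several) is a permutation test: shuffle the group labels many times and see how often chance alone produces a gap as large as the one observed. The Python sketch below is a minimal illustration under stated assumptions. The sales figures are made up, and the function name and its parameters are hypothetical, not from any particular library.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Share of random relabelings with a mean gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any real link between label and value
        gap = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if gap >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical weekly sales for stores without and with a promotion.
control = [102, 98, 101, 97, 103, 99, 100, 104, 96, 100]
promoted = [105, 101, 107, 99, 106, 103, 102, 108, 100, 104]

p = permutation_p_value(control, promoted)
print(f"observed gap: {mean(promoted) - mean(control):.1f}")
print(f"permutation p-value: {p:.3f}")
```

A small p-value suggests the gap would be unusual if the labels were irrelevant. It says nothing about what caused the gap or whether the gap is large enough to matter for a decision, so the other questions in the checklist still apply.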
There are long lists of cognitive biases and there has been a lot of hype about behavioral economics and non-conscious decision-making. Loss aversion and confirmation bias, in particular, have generated a lot of buzz. I suspect many of these human imperfections would have been of little surprise to cave people of earlier generations, and eminent psychologist Gerd Gigerenzer has a different take on decision heuristics. These reservations aside, used judiciously, awareness of these biases can help decision makers become better decision makers. So, my final question is: Have you considered how some of these biases may have influenced your answers to my earlier questions?
I hope this has been interesting and helpful!