Data Isn’t Everything
Guest blog by Dr. Prashanth Southekal. Prashanth is an experienced technology professional who understands what it takes to run efficient technology based solutions, processes, and organizations. He brings over 20 years of experience in Information Management from companies such as SAP AG, Accenture, Deloitte, P&G, and General Electric. Prashanth has published 2 books on Information Management and he is currently working on his 3rd book – “Data for Business Performance”.
The current wave of excitement about data and data-related technologies might lead one to think that data is a panacea for poor organizational performance. Despite all the attention on data, including millions of dollars spent on data management, business intelligence, and analytics projects, many organizations still struggle to gain value from the investment. According to a survey by the Economist, 73% of respondents said they trust their intuition over data when it comes to decision-making. Many enterprises believe that having a Chief Data Officer (CDO) in their organization makes them a data-driven enterprise. But a recent Gartner report says that only 50% of CDOs are successful in their posts.
While data definitely has the potential to improve organizational performance, it has some limitations too. Hence, it would be prudent to know some of the limitations of data or rather situations where even quality data doesn’t add much value to the enterprise.
Data is normally obscured and can be biased
Most data that is analyzed in enterprises is structured data stored in databases. This data is transformed from its unstructured natural format into a structured format after the raw data is gathered, curated, and finally stored. The structured format is driven either by the application (including the database) or by an individual’s predispositions and experience. For example, in activity based costing (ABC) analysis, if the application (and the database) can capture only the start and end time, but not the actual effort of the activity, then reporting and analytics on the activity effort will never be possible. So the data context is either pre-determined or distorted. This means that the “raw data” that is captured, curated, and stored is not only obscure, but can also be biased. According to leading statistician Nate Silver, “There is no such thing as unbiased data. Bias is the natural state of all data”.
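The ABC example can be sketched in a few lines of Python. This is only an illustration under assumed names: the `ActivityRecord` schema, the activity, and the timestamps are all hypothetical. It shows that a schema limited to start and end times can report elapsed time but says nothing about effort.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActivityRecord:
    """Hypothetical schema: the application stores only start and end times."""
    activity: str
    start: datetime
    end: datetime

    def elapsed_hours(self) -> float:
        # Elapsed wall-clock time is the only duration this schema supports.
        return (self.end - self.start).total_seconds() / 3600


# Hypothetical record: an approval that sat in a queue for two days.
record = ActivityRecord(
    activity="invoice approval",
    start=datetime(2024, 1, 8, 9, 0),
    end=datetime(2024, 1, 10, 9, 0),
)

print(record.elapsed_hours())          # 48.0
# The effort actually spent inside those 48 hours was never captured,
# so there is no field to report on.
print(hasattr(record, "effort_hours"))  # False
```

Whatever the real effort was, it is lost at capture time; no amount of downstream analytics can recover a value the structure never recorded.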
Data doesn’t always translate into actions and results
Even if the data quality is good and unbiased, translating data into insights, strategies, decisions, and actions depends on the organization structure, proper training, empowerment of staff to take actions, among other aspects. While data relies on logic, decisions are often made based on emotion. This means pure logic alone can never drive actions from insights.
The image in this post is a real example from one of the main streets in Calgary, AB, Canada, where the gasoline price per liter at Shell (87.9c) is 6 cents less than the price at Esso (93.9c). Although the competitor’s price is right in front of the attendant at the Esso gas station, he cannot change the gasoline prices because he needs approval from his manager. Nothing is more frustrating than having the most timely and accurate data and insights, but still not being able to take any action. No value is created by data and insights if they are not acted upon. According to Thomas Edison, “the value of an idea lies in the using of it”. Benjamin Franklin said, “never confuse motion with action”.
Relevancy of data is invariably a function of time and space
Quality data today may not be relevant at a future time or in a different space or jurisdiction, primarily due to changing business needs and government regulations. For example, within an enterprise, shipment in plant A could be based on delivery priority while shipment in plant B could be based on customer type. So for plant B, delivery priority data is irrelevant or unnecessary. However, the relevancy of data is often misunderstood, and many organizations spend a lot of time and effort managing data that is unnecessary. This is a perennial problem in information management: researchers Martha Feldman and James March reported back in 1981 that managers often ask for data and information that they don’t use.
Data has the potential to cause analysis-paralysis
Presently we have capabilities to generate, capture, and process huge amounts of data. According to Eric Schmidt, Chairman of Google, every two days we create as much data as we did from the dawn of civilization up until 2003. According to IBM, every day, we create 2.5 quintillion bytes of data (1 quintillion = 1 followed by 18 zeros). This situation will result in more challenges in getting quality data and ultimately deriving meaningful information. According to William McKnight, author of Strategies for Gaining a Competitive Advantage with Data, “It’s not just spitting out information for the sake of it, it’s actually trying to connect the dots between previous transactions, current transactions, and potential future transactions.” In a survey by Oracle, over 300 C-level executives said their organization is collecting and managing 85% more business data today than it was two years ago. However 47% of them said their organization cannot interpret and translate the information into actionable insights. While data is important, it is the right data that matters.
Not everything that can be counted counts, and not everything that counts can be counted.
Albert Einstein, Physicist
Stakeholders’ perceptions typically precede metadata ontology
A data entity might be consumed in different ways by different stakeholders. For example, while the telephone number field might be used by the sales agent to make customer calls, the tax analyst might potentially use the area code within the telephone number to get the tax rates as per jurisdictions. This means the actual use of the telephone number field is more than its intended use thereby making metadata ontology challenging. In addition, in most cases the boundaries between data and information are not always clear. What is data to one person might be information to someone else and vice versa. To a crude oil commodities trader for example, slight changes in the sea of values coming from the exchanges might act as information for taking appropriate action. But to anyone else, they would look like raw meaningless data.
Data management is expensive and time consuming
While businesses strive for quality data to derive insights, getting and managing quality data is expensive. Data is created, stored, processed, shared, aggregated, cleansed, replicated (to DR sites), and archived, and all these activities take time and money. According to research by Dr. Howard Rubin of MIT, 92% of the cost of running a business in the financial services sector is related to data. Once data quality is improved, it must be governed throughout the data lifecycle, as it is estimated that data quality degrades at 7% per annum. So if organizations need quality data, the data management initiative should be seen as a continuous improvement initiative at the enterprise level (not at the LoB or function level) and not as a one-off project. Data management is a marathon, not a sprint.
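To get a feel for what a 7% annual degradation implies, here is a rough Python sketch. It assumes the decay compounds year over year and that nothing is done to maintain the data; both are assumptions made for illustration, since the estimate itself does not say how the 7% is measured. The function name `quality_after` is likewise made up for this sketch.

```python
def quality_after(years: int, initial: float = 1.0, annual_decay: float = 0.07) -> float:
    """Share of records still accurate after `years`, assuming compounding decay."""
    return initial * (1 - annual_decay) ** years


# Assumption: a data set that starts fully accurate and is never maintained.
for year in (1, 3, 5, 10):
    print(f"year {year:>2}: {quality_after(year):.1%} of records still accurate")
```

Under these assumptions, a little under half of the records would still be accurate after ten years, which is why a one-off cleanup project does not hold.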
Data might distort innovation
Data sheds light on past events, which no one can change. Seth Kahan, author of “Getting Change Right”, uses the analogy of driving a car to describe data-driven decision making. He says, “Making decisions just on data is like driving your car only by looking in the rear view mirror. During tough times, leaders tend to depend upon the past to make their decisions as they want to be certain about what they are doing. The more certainty an organization wants, the more they go backwards. But the past only shows where you’ve been, not where you are going or should be going”. According to Lara Lee and Daniel Sobol of Harvard Business School, “Data reveals what people do, but not why they do it. Understanding the why is critical to innovation”.
Data is never real-time
Though many companies talk about performing real-time analytics on data, data is never real time. The term real-time analytics is an oxymoron. Why? There is always a time lag between data origination and capture. This time lag can be a few microseconds in plant/SCADA/PoS systems, or it might be months before the data is formatted, cleansed, validated, curated, and committed in the databases of IT/OLTP systems. On top of this, data is consolidated from diverse systems and aggregated before analytics operations are performed on the BI/OLAP data set. So the time lag is extended further, from microseconds to months or even years. Even though analytics on aggregated data is quite different from analytics on streaming data, i.e. data originating from social media, IoT devices, and sensors, there is still a time lag between data origination and analysis in both cases. With aggregated data the time lag might be days, weeks, or months before you can analyze, while with streaming data it could be minutes or hours. Inherently, data is “historical” (and unstructured), so “real-time analytics” in the strict sense does not exist. Finally, even if you manage to get data in real time, the analysis will be on a single record. To perform trend, prescriptive, or predictive analytics, you need a significant number of data records, and this means meaningful data analytics can never be real time.
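The last point, that a single record cannot support trend analysis, can be illustrated with a small Python sketch. The `trend_slope` helper and the sensor readings are hypothetical, and an ordinary least-squares slope stands in here for "trend analytics" in general.

```python
from typing import Optional, Sequence, Tuple


def trend_slope(points: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Least-squares slope of value over time; None when it is undefined."""
    n = len(points)
    if n < 2:
        return None  # a single record carries no trend
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    spread = sum((t - mean_t) ** 2 for t, _ in points)
    if spread == 0:
        return None  # all readings share one timestamp
    return sum((t - mean_t) * (v - mean_v) for t, v in points) / spread


# Hypothetical sensor readings as (time, value) pairs.
print(trend_slope([(0.0, 20.0)]))                            # None
print(trend_slope([(0.0, 20.0), (1.0, 22.0), (2.0, 24.0)]))  # 2.0
```

The slope only becomes defined once several records have accumulated, and accumulating them takes time, which is exactly the lag described above.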
Data has no relevance for first time events
In today’s uncertain and volatile business situations, businesses are forced to take many actions for the first time, so naturally no data is available for these first-time events. For example, a company in the US might try to enter a new market in Asia, say India. The company has no data on whether its products or services will work in the Indian market. Wal-Mart, for instance, did a thorough analysis of the South Korean market before it entered the country in 1998, yet it misread local preferences: while Wal-Mart marketed items like electronics, South Koreans preferred to spend their money on food and drinks, and the company withdrew from the country in 2006. Another example of the lack of data for first-time events is when an enterprise decides to outsource its IT services. An IT service provider might have a great track record of delivering services from India. But analyzing data from the provider’s engagements with other enterprises does not help your enterprise much in deciding whether to outsource, because the service model for each business enterprise is very contextual and unique. Basically, there is no reliable data for first-time business ventures, and businesses have to rely on intuition, consultation with people familiar with the environment, and computer “what-if” simulations.
Data can mislead decision making
Data can be used to mislead decision making in three main ways: KPIs, graphs, and sample size. Firstly, incomplete KPIs are a common way of being misled. Analytics invariably works most effectively at the enterprise level. This means that analytics in a business enterprise needs to be implemented with a core set of enterprise-wide, LoB dependent KPIs. This is important given that LoBs within an enterprise often have conflicting goals, and any KPI using data at the LoB level might give a distorted picture of the performance of the enterprise. For example, the marketing LoB might present a KPI that shows a percentage increase in customer loyalty. While this KPI might be a positive indicator of performance for the marketing LoB, the financial KPIs will be adversely affected by the increased campaign costs. So data in KPIs might not give a complete and accurate picture of the performance of the enterprise. Basically, data and KPIs can be used to mislead the organization if the enterprise doesn’t have a core set of KPIs that are LoB dependent.
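A toy Python calculation makes the loyalty example concrete. Every figure below is hypothetical and chosen only to show how a KPI that looks good for one LoB can coincide with a worse enterprise-level result; `pct_change` is a helper defined just for this sketch.

```python
# All figures are hypothetical, chosen only to illustrate the conflict.
before = {"loyal_customers": 1000, "revenue": 500_000, "campaign_cost": 20_000}
after = {"loyal_customers": 1100, "revenue": 520_000, "campaign_cost": 60_000}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Marketing LoB view: customer loyalty went up.
loyalty_change = pct_change(before["loyal_customers"], after["loyal_customers"])

# Enterprise view: revenue net of campaign cost went down.
net_before = before["revenue"] - before["campaign_cost"]
net_after = after["revenue"] - after["campaign_cost"]
net_change = pct_change(net_before, net_after)

print(f"customer loyalty KPI:          {loyalty_change:+.1f}%")  # +10.0%
print(f"revenue net of campaign cost:  {net_change:+.1f}%")      # -4.2%
```

Reported alone, the first number tells a success story; read next to the second, it tells a different one.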
The second way of being misled by data comes from graphs. Graphs are sometimes deliberately used to mislead, for various reasons within the company. This may happen because the designer chooses to give readers the impression of better performance or results than is actually the case. In other cases, the reader may be misled by a poor choice of chart, where the graph can be manipulated through scales, axes, infographics, and so on.
The third common source of misleading data is the choice of data source or sample size. “Tell me what you want to hear, and I will provide data appropriately” is a common joke in business transformation projects. So data and statistics are often used to twist an argument in someone’s favour. A 2009 investigative survey at the University of Edinburgh, UK found that 33.7 percent of scientists surveyed admitted to questionable research practices, including modifying results to improve outcomes, subjective data interpretation, withholding analytical details, and dropping observations because of gut feelings.
There are also numerous cases where data is manipulated for business gains. For example, in 2007, Colgate was ordered by the Advertising Standards Authority (ASA) of the UK to abandon its claim: “More than 80 percent of Dentists recommend Colgate”. The claim, which was based on surveys of dentists and hygienists carried out by the manufacturer, was found to be misrepresentative as it allowed the participants to select one or more toothpaste brands. The ASA stated that the claim “… would be understood by readers to mean that 80 percent of dentists recommend Colgate over and above other brands, and the remaining 20 percent would recommend different brands.” But upon further analysis, the ASA understood that another competitor’s brand was recommended almost as much as the Colgate brand by the dentists surveyed.
Finally, the ASA concluded that the claim was misleading as it implied 80 percent of dentists recommend Colgate toothpaste in preference to all other brands. The ASA also stated that the scripts used for the survey informed the participants that the research was being performed by an independent research company, which was false.
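The mechanics behind the Colgate ruling can be reproduced with a small Python sketch. The responses below are invented for illustration and are not the survey’s actual data; “Brand B” and “Brand C” are placeholder names. The point is only that when respondents may select several brands, “80 percent recommend X” does not mean “80 percent prefer X”.

```python
from collections import Counter

# Hypothetical responses; each dentist could name more than one brand.
responses = [
    {"Colgate", "Brand B"},
    {"Colgate", "Brand B"},
    {"Colgate", "Brand B", "Brand C"},
    {"Colgate"},
    {"Brand B"},
]

total = len(responses)
mentions = Counter(brand for answer in responses for brand in answer)

# Share of dentists who mentioned each brand at all.
for brand in sorted(mentions):
    print(f"{brand}: mentioned by {mentions[brand] / total:.0%}")

# Share of dentists who recommended Colgate and nothing else.
colgate_only = sum(1 for answer in responses if answer == {"Colgate"}) / total
print(f"Colgate only: {colgate_only:.0%}")
```

In this made-up sample, Colgate and Brand B are each mentioned by 80% of respondents, while only 20% recommend Colgate exclusively, so the headline figure is true and misleading at the same time.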
So what is the advice?
In today’s interconnected world, nobody makes decisions in a vacuum. So data very much matters to business, and data is the fuel on which today’s enterprises run. But as explained above, there are also cases when investing time and effort in building a data-driven enterprise might be futile or even harmful. Data management initiatives are likely to be ineffective when:
There is no senior management commitment to valuing data as a shared enterprise asset (not at the LoB or function level);
There is no enterprise-wide vision for running and sustaining a data management initiative;
The insights from data cannot be translated into actions quickly;
The relevancy of data changes constantly depending on time, space, and stakeholder preferences;
There is a need for unbiased details of the business processes and activities.
Thank you very much for your time. As always, I am sharing my thoughts to learn from my network. Let me know your thoughts and feel free to share this article in your network if you deem fit.
Prashanth Southekal, PhD, PMP