Improving Your Data's Health is Nothing to Sneeze At!
The hype around data-driven decision making grows each year, thanks to the wide availability of data and analytics methods to glean better associations, predictions and recommendations.
However, one of the largest hurdles to successful implementation is the difficulty of obtaining the reliable data that analytics requires. A New York Times (NYT) article on the challenges of data wrangling noted that “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
In the operations arena, many strides have been made to consolidate master data through Enterprise Resource Planning (ERP) systems and Business Intelligence (BI) solutions, allowing for a single source of truth and providing more real-time updates. Still, companies struggle with data quality and consistency. We see this time and again, regardless of the size of the client, their industry, or the system(s) they use.
So how do you know if the data at your firm is indeed unruly and unhealthy? The list below outlines three telltale signs:
1. Inconsistency
The first data question on any analytics project should be “Where can I get the data?” Here is where data inconsistency often rears its ugly head. If your business has inconsistent data, you will get different results depending on: (1) whom you ask about the source of the data, (2) when the data is pulled, and (3) which source it is pulled from. Consistency within a single data source is just as important as consistency across data sources.
2. Inaccessibility
While it may be easy to identify the source of your data, can you actually touch the data and sample it whenever you need it? Good, clean data is of no benefit to your business if it is held under lock and key, or if you have to jump through hoops to get to it. By the time the access process completes, the data may already be outdated. Master and transactional data should be easily accessible to a wide base of users, with appropriate rights, who can gain value from it. When it is not, a heavy-handed IT department is often the cause.
3. Incompleteness
Imagine you’ve found the right source of data and you have access to it. There is often a moment of truth when you start reviewing and analyzing the data and find that what is supposed to be there isn’t. Master and transactional data sets with large chunks missing are almost as worthless as no data at all. In our experience, companies run into this issue when no one has been identified as responsible for the data.
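The first and third signs lend themselves to a quick automated check. What follows is a minimal sketch in Python, not a definitive implementation: it assumes records are plain dictionaries, and the `sku`, `unit_cost` and `lead_time_days` fields, the sample values, and the helper names `missing_rate` and `find_mismatches` are all hypothetical, chosen only for illustration.

```python
def missing_rate(records, fields):
    """Return, for each field, the share of records where it is absent or blank."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


def find_mismatches(source_a, source_b, key, fields):
    """List values that disagree between two pulls of the same data.

    Only records present in source_a are checked; records that exist
    solely in source_b are not reported by this sketch.
    """
    index_b = {r[key]: r for r in source_b}
    mismatches = []
    for rec_a in source_a:
        rec_b = index_b.get(rec_a[key])
        if rec_b is None:
            mismatches.append((rec_a[key], "<record>", "present", "absent"))
            continue
        for field in fields:
            if rec_a.get(field) != rec_b.get(field):
                mismatches.append(
                    (rec_a[key], field, rec_a.get(field), rec_b.get(field))
                )
    return mismatches


# Hypothetical sample data: the same item master pulled from two systems.
erp_pull = [
    {"sku": "A100", "unit_cost": 4.50, "lead_time_days": 14},
    {"sku": "B200", "unit_cost": 7.25, "lead_time_days": None},
    {"sku": "C300", "unit_cost": 3.10, "lead_time_days": 21},
]
bi_pull = [
    {"sku": "A100", "unit_cost": 4.50, "lead_time_days": 14},
    {"sku": "B200", "unit_cost": 7.75, "lead_time_days": None},
]

checked_fields = ["unit_cost", "lead_time_days"]
completeness = missing_rate(erp_pull, checked_fields)
disagreements = find_mismatches(erp_pull, bi_pull, "sku", checked_fields)

print(completeness)
print(disagreements)
```

In this made-up example, one of three lead times is missing, one unit cost differs between the two pulls, and one item is absent from the second pull. A check like this will not fix the data, but it turns “the data feels unreliable” into a specific list that someone can be asked to resolve.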
Deriving business insights from unhealthy data can be disastrous for an organization. Consider two initiatives, with both tactical and strategic aims, that data issues can encumber: Advanced Analytics and Control Towers. Nothing deflates either of these high-floating initiatives more easily than data health issues.
Advanced Analytics: These projects require heavy-duty efforts to solve complex data-driven problems such as predictive analytics, supply chain simulation (what-ifs), supply chain optimization (what’s best cost-wise, service-wise, etc.), and other mathematical and statistical modeling (demand forecasting, etc.). Both the volume and the quality of data these efforts require are substantial. Firms that spend more time questioning the data than analyzing it will not get the insights they need until they solve their data quality problems.
Control Tower: The data revolution allows companies to monitor the execution of operations in near real time. Poor data will limit the effectiveness that business leaders crave when implementing a control tower. Visibility into metrics, KPIs, KPPs, etc. is worth nothing if the data cannot be trusted. Additionally, disparate data sources create an “apples vs. oranges” situation in which you cannot trust the global view. If you attempt to implement a control tower or monitoring system on more than a single source of data, you will not be able to rely on the information it presents. Solve the data health issues before you go looking for insights.
To improve the health of our data, we need to get back to basics. It’s important to look at how data is treated across the business; this includes making sure that data is being stored correctly and that the right information is being entered in the first place.
It is also necessary to create, maintain and use a data dictionary, a valuable tool for documenting your data. In it you can identify and define all data elements and specify validation rules for each data field. In addition, processes need to be in place for documenting and reviewing all identified data quality issues. Finally, data owners (both the single person and the single system) must be clearly documented and communicated across the business. These small steps to improving the health of data can lead to a huge leap forward in trusting the business decisions made with it.
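To make the idea concrete, here is one way a data dictionary entry with a validation rule and a named owner might look in code. This is only a sketch under stated assumptions: the `FieldSpec` class, the `validate` helper, the field names, the owner titles and the rules themselves are hypothetical and would be replaced by whatever your business actually defines.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FieldSpec:
    """One data dictionary entry: definition, ownership and validation rule."""
    name: str
    definition: str
    owner: str                       # the single accountable person (hypothetical)
    system: str                      # the single owning system (hypothetical)
    is_valid: Callable[[Any], bool]  # validation rule for this field


# Hypothetical dictionary for a small item master.
DATA_DICTIONARY = [
    FieldSpec("sku", "Unique item identifier", "Item master lead", "ERP",
              lambda v: isinstance(v, str) and len(v) > 0),
    FieldSpec("unit_cost", "Standard cost per unit", "Finance analyst", "ERP",
              lambda v: isinstance(v, (int, float)) and v > 0),
    FieldSpec("lead_time_days", "Supplier lead time in days", "Procurement lead", "ERP",
              lambda v: isinstance(v, int) and v >= 0),
]


def validate(record, dictionary):
    """Return (field, value, owner) for every field that breaks its rule."""
    issues = []
    for spec in dictionary:
        value = record.get(spec.name)
        if not spec.is_valid(value):
            issues.append((spec.name, value, spec.owner))
    return issues


# Hypothetical records: one with a negative cost, one that passes every rule.
bad_record = {"sku": "A100", "unit_cost": -4.5, "lead_time_days": 14}
good_record = {"sku": "B200", "unit_cost": 7.25, "lead_time_days": 10}

bad_issues = validate(bad_record, DATA_DICTIONARY)
good_issues = validate(good_record, DATA_DICTIONARY)

print(bad_issues)
print(good_issues)
```

Keeping the owner next to the rule means every issue that surfaces already carries the name of the party responsible for resolving it, which supports the documenting and reviewing process described above.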
Author Mike Romeri is a managing partner at OPS Rules, a leading analytics and optimization firm focused on helping companies identify and capture hidden opportunities in their manufacturing operations and supply chains.