Data Dictionaries: Unsung Heroes of Digital Transformation
I was working with a cross-functional information technology and operational technology team from one of the world’s largest makers of home goods. We were discussing how to drive a digital transformation by delivering detailed insights on production data to optimize processes. They wanted to push their data into our manufacturing productivity platform quickly and get started. They asked, “When can we start using data and seeing results?”
My response? Let’s build a data dictionary first.
After working with hundreds of manufacturers across industries at various stages in their digital transformation journey, I have learned that the data dictionary is an unsung hero. A data dictionary can decrease time to value, increase trust in data, provide traceability, and improve maintainability. Creating a data dictionary can take a week or up to a month, depending on the complexity of the data and the pace of decision-making. That said, not taking the time to create a data dictionary can sabotage a digital transformation by building the entire transformation effort on an unstable data foundation, which undermines both the authority and accuracy of data-driven decisions.
Why Do I Need One?
A data dictionary for a manufacturing digital transformation project is a document that serves as a crosswalk and provides essential context. A good data dictionary includes tag-to-asset mapping, expected values, computed fields, other metadata, and term definitions. These elements are required to integrate data from the factory floor into a data platform effectively and efficiently, and to make strategic use of that data.
The data dictionary provides the foundation for transparency and traceability by containing all key mapping information, definitions, formulas, metadata, and expectations. In a sense, a data dictionary combines elements of a factory data flow schematic, a data schema, and a glossary.
The size and complexity of a data dictionary depend on how many data inputs you need to track, the format of the data inputs (streaming versus discrete), and the relationships between the data inputs.
Sounds complicated? We build ours in simple spreadsheets and use terms anyone in the manufacturing company can understand.
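As a rough illustration of how little structure is needed, here is a minimal Python sketch that reads a spreadsheet exported as CSV into a lookup keyed by tag. The column names, tags, assets, units, and ranges are all hypothetical and chosen only for this example; a real dictionary would use whatever columns the team agrees on.

```python
import csv
import io

# Hypothetical data dictionary exported from a spreadsheet as CSV.
# Every tag, asset, unit, and range below is made up for illustration.
DICTIONARY_CSV = """tag,asset,description,unit,expected_min,expected_max
OVEN1.TEMP,Oven 1,Chamber temperature,degC,150,250
OVEN1.SPEED,Oven 1,Conveyor speed,m/min,0.5,3.0
PRESS2.FORCE,Press 2,Ram force,kN,0,400
"""


def load_dictionary(csv_text):
    """Return a dict keyed by tag, with expected ranges converted to floats."""
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["expected_min"] = float(row["expected_min"])
        row["expected_max"] = float(row["expected_max"])
        entries[row["tag"]] = row
    return entries


data_dictionary = load_dictionary(DICTIONARY_CSV)
entry = data_dictionary["OVEN1.TEMP"]
print(entry["asset"], entry["unit"], entry["expected_min"], entry["expected_max"])
```

Keeping the source of truth in a spreadsheet means anyone in the plant can read and edit it, while a small loader like this lets software consume the same definitions.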
More than just a list of tags, assets, and terms, data dictionaries serve multiple purposes:
Creating a shared understanding: When a group creates a data dictionary together, the process aligns everyone's understanding of what data is important and why.
Capturing “tribal knowledge”: A data dictionary captures “tribal knowledge” about manufacturing operations that might otherwise have been locked inside people’s heads.
Accelerating user onboarding: The dictionary can also make it easy to add new people to data teams and data projects by giving them a single document with all the information they need to understand the foundations of the project.
Simplifying course corrections and definition changes: A data dictionary serves as a solid baseline, making it easier for everyone to understand where changes have been made in data definitions, metrics, goals, and workflow schematics.
Identifying gaps in existing data: Building a data dictionary forces teams to identify data gaps early in a project. They can then confirm that the necessary KPIs can be calculated, or start capturing and correctly formatting the missing data sources and types.
Enabling validation: A data dictionary is an essential artifact for validating key hypotheses of a digital transformation process and resulting outcomes, providing a baseline for comparing the actual (what was implemented) versus the expected (what was planned).
Enabling scale: By creating an easy-to-replicate blueprint for the data that drives a digital transformation, a data dictionary significantly simplifies and accelerates scale-out to multiple plants (e.g., by ensuring a single tagging structure for all PLC data).
Enabling maintenance at scale: A data dictionary makes it possible to maintain and continuously improve the data side of digital transformation by allowing for the creation of new tags or fields and automatically sharing these updates across the entire manufacturing company.
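To make the validation point concrete, the following minimal sketch compares incoming readings against the expected values recorded in a data dictionary. The tags, ranges, and readings are hypothetical, and a production check would likely be more nuanced (for example, handling units or timestamps).

```python
# Hypothetical expected ranges (low, high) taken from a data dictionary.
EXPECTED = {
    "OVEN1.TEMP": (150.0, 250.0),
    "PRESS2.FORCE": (0.0, 400.0),
}


def validate(readings, expected):
    """Return (tag, value, problem) tuples for readings that need attention."""
    issues = []
    for tag, value in readings:
        if tag not in expected:
            # The tag was never documented, so nobody agreed on what it means.
            issues.append((tag, value, "tag not in dictionary"))
            continue
        low, high = expected[tag]
        if not low <= value <= high:
            issues.append((tag, value, "outside expected range"))
    return issues


# Made-up readings: one normal, one out of range, one from an undocumented tag.
readings = [("OVEN1.TEMP", 198.2), ("OVEN1.TEMP", 310.0), ("MIXER3.RPM", 42.0)]
issues = validate(readings, EXPECTED)
for issue in issues:
    print(issue)
```

The same check serves two purposes: it flags data that contradicts what was planned, and it surfaces tags that exist on the floor but were never added to the dictionary.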
How to Build a Data Dictionary
Building a data dictionary requires upfront effort and commitment, but it reduces rework, facilitates validation, and increases the value of data assets. Representatives from OT, IT, process engineering, quality management, and data science must all be involved and must buy into the process.
Here are the key steps we take to build a data dictionary:
Identify and gather the right people: Our most successful projects begin by gathering individuals from across IT and OT. We have learned that working only with the individual(s) who can provide data access is a recipe for failure. You need to include the people who will use the data to make decisions. By bringing the right people together at the start, you are less likely to miss user requirements or to need significant changes to your data architecture later.
Decide what you need to measure: Select specific, precise measurements that can influence critical outcomes and work backward. If you want to measure shift productivity, data required might be the volume of goods produced, downtime, quality levels, and volume of scrap or breakage. By deciding what you need to measure and breaking it down into data elements, you can then work backward to gather it.
Write clear definitions of required data: Agree on clear definitions for each type of data you want to collect. Include a description of the data and its context, such as where the data is obtained and why it is crucial. Different people inside a manufacturing company may have different ideas about how to define processes and measurements. For example, we frequently see teams at five similar plants, each with its own definition of overall equipment effectiveness (OEE). Creating consistent definitions ensures everyone agrees on how to describe the data.
Map out the sources of the data and how that data is collected and reported: PLCs, historians, MES, and quality control systems all collect different types of data in various formats. Many plants have undertaken IIoT projects, adding different sensors to the mix. It is crucial to understand all the sources of the data you need and to identify gaps where you must add sensors or data collection capabilities.
Tag-to-asset mapping: A vital part of this exercise is mapping the relationship between individual tags (e.g., a field such as temperature) and the machine associated with each tag. A well-defined and well-maintained tag list and hierarchy are ideal, and creating the data dictionary can reveal where gaps or conflicts exist in tag mapping. More importantly, it can reveal where teams can start creating value with the data they already have while the missing pieces are filled in.
Define your data architecture: Once you understand what pieces of data are essential for your digital transformation project, then you can define how all those pieces of data must fit together to capture and calculate the information you want to measure. This step often requires iteration. The data dictionary is essential for keeping track of how data from different sources is related, edge cases discovered during data projects, and new fields or required data transformations.
Create a glossary of terms: The glossary must include not only what is measured but also important terms that add context without being measured themselves. This should provide a better contextual understanding of the project and what you hope to achieve with it.
Share and solicit feedback and reviews: By exposing your data dictionary to a broader audience, you will expose overlooked problems and generate suggestions for improvements. Sharing the data dictionary educates the rest of your team about the digital transformation project and how it will be run. Ambitious, curious employees will want to learn new things and see the project as a way to get better at their jobs.
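Several of the steps above (working backward from a measurement, mapping data sources, and tag-to-asset mapping) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the KPI names, data elements, assets, and tags are hypothetical, and `None` is used here as an arbitrary marker for data that is not yet collected.

```python
# Hypothetical KPI definitions: each KPI lists the data elements it needs.
KPI_REQUIREMENTS = {
    "shift_productivity": ["units_produced", "downtime_minutes", "scrap_units"],
    "scrap_rate": ["units_produced", "scrap_units"],
}

# Hypothetical mapping of data elements to (asset, tag); None marks a gap.
ELEMENT_SOURCES = {
    "units_produced": ("Line 1 Packer", "L1.PACK.COUNT"),
    "scrap_units": ("Line 1 Inspection", "L1.QC.REJECTS"),
    "downtime_minutes": None,  # not yet collected anywhere
}


def find_gaps(kpis, sources):
    """Map each KPI to the data elements that have no source yet."""
    return {
        kpi: [element for element in elements if sources.get(element) is None]
        for kpi, elements in kpis.items()
    }


gaps = find_gaps(KPI_REQUIREMENTS, ELEMENT_SOURCES)
ready = [kpi for kpi, missing in gaps.items() if not missing]
print("Ready now:", ready)
print("Blocked:", {kpi: missing for kpi, missing in gaps.items() if missing})
```

A report like this shows both sides of the exercise: which measurements the team can start using immediately, and which sensors or collection capabilities still need to be added.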
A Data Dictionary Is a Living Document
Every manufacturing company changes all the time. Sometimes changes are sudden. Sometimes they are planned or gradual. Conditions on the shop floor change. A mechanical failure that used to be rare may become more commonplace as machines age. A new piece of software may be installed that then must be mapped into the dictionary. A plant may upgrade the firmware on systems, changing the output format of data. New employees may join and need to quickly understand a data project.
For these reasons and more, data dictionaries must be living documents that are reviewed and revised on a regular cadence. To be effective, data dictionaries must walk the fine line between providing authority and guidance and being responsive to changes.
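One simple way to support a regular review is to compare the current version of the dictionary with the previous one. The sketch below assumes each version is held as a plain Python dict keyed by tag; the tags, units, and the firmware scenario are hypothetical examples, not a prescribed workflow.

```python
def diff_dictionaries(old, new):
    """Summarize added, removed, and changed entries between two versions."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(tag for tag in set(old) & set(new) if old[tag] != new[tag])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical previous version of the dictionary.
v1 = {
    "OVEN1.TEMP": {"unit": "degF", "asset": "Oven 1"},
    "PRESS2.FORCE": {"unit": "kN", "asset": "Press 2"},
}

# Hypothetical current version: a unit changed (e.g., after a firmware
# upgrade) and a new tag was added for newly installed equipment.
v2 = {
    "OVEN1.TEMP": {"unit": "degC", "asset": "Oven 1"},
    "PRESS2.FORCE": {"unit": "kN", "asset": "Press 2"},
    "MIXER3.RPM": {"unit": "rpm", "asset": "Mixer 3"},
}

changes = diff_dictionaries(v1, v2)
print(changes)
```

A summary of what was added, removed, or redefined gives reviewers a short list to discuss instead of rereading the whole document at every review.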
Beyond the words, the data definitions, the tagging, and the schemas, building data dictionaries is a powerful way to prepare a manufacturing organization for the cultural challenges it must overcome for digital transformation or data-driven projects to succeed. Building a shared understanding of what you want to measure is the first step toward building a consensus on how to improve those measurements. If you want your digital transformation project to succeed, if you want to minimize the risk of failure, and if you want to create a positive working environment around your project, create a data dictionary before you do anything else.
Beth Crane is vice president of data, Sight Machine.