A quip often attributed to Mark Twain goes, “Everybody talks about the weather, but no one ever does anything about it.” The same might be said about junk data. It is a persistent blight that limits our ability to make data-driven decisions that improve our businesses. To remain competitive and profitable, businesses need decisions informed by quality data more than ever. If your data is ‘junk’, you can never become a data-driven organization.
Sometimes junk data accumulates because organizations lack disciplined business processes. They may also lack the analytics framework needed to monitor those processes and ensure they are followed. Most dimensional or categorical data is entered either by people or by automated systems. When either one fails, records are assigned inconsistent values, and reporting accurately on that junk data becomes nearly impossible.
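One way an analytics framework might monitor whether categorical data entry follows the agreed process is to compare incoming values against a governed list of allowed values. The Python sketch below is illustrative only: it assumes records are plain dictionaries and that the allowed list is known up front, and the function name `unexpected_values` and the sample data are hypothetical.

```python
from collections import Counter


def unexpected_values(rows, field, allowed):
    """Count values of `field` that fall outside the governed list `allowed`.

    Hypothetical helper: returns {value: count} for every nonconforming value.
    """
    counts = Counter(row.get(field) for row in rows)
    return {value: n for value, n in counts.items() if value not in allowed}


# Hypothetical orders where the same region was keyed in three different ways.
orders = [
    {"region": "West"},
    {"region": "west"},
    {"region": "W."},
    {"region": "West"},
]
print(unexpected_values(orders, "region", {"West", "East", "N/A"}))
# {'west': 1, 'W.': 1}
```

Run regularly, a check like this turns “the process was not followed” into a concrete list of values someone can fix at the source.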
Another cause of junk data is the lack of a planned, governed decision framework. Many companies focus their analytics efforts on one specific ‘use case’ at a time, so each project becomes a silo built for a specific user or purpose. Because these independent data repositories are not built to a common standard, their data structures vary widely from one solution to the next.
A governed decision framework, by contrast, eliminates siloed solutions and has these characteristics:
1. User-friendly
You shouldn’t need to be a data scientist to be able to use and understand the data.
2. Formatting
Dates, numbers, percentages, and dimensional values should be formatted consistently throughout the data set. If one column shows ‘96.5%’, another percentage column in the same data set should not show ‘.965’.
3. Trustworthy
Aggregated figures should reconcile with the detail transactions they summarize, so end users know they can trust the information and insights they obtain.
4. Completeness
If you have dimensional data such as a ‘region’ code, every row of data should have a value. If the value doesn’t apply, show something like ‘N/A’ instead of leaving it blank. A blank value doesn’t tell the end user whether the field was left out accidentally or on purpose.
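The formatting, trustworthiness, and completeness checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: it assumes records are plain dictionaries, that ‘N/A’ is the agreed placeholder, and that summary totals are keyed by the same dimension as the detail rows. All function names and sample values are hypothetical.

```python
import math
from collections import defaultdict

PLACEHOLDER = "N/A"  # assumed convention for "value does not apply"


def fill_missing(rows, field, placeholder=PLACEHOLDER):
    """Completeness: replace a missing or blank `field` with an explicit placeholder."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is untouched
        value = new_row.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            new_row[field] = placeholder
        cleaned.append(new_row)
    return cleaned


def format_percent(fraction, decimals=1):
    """Formatting: render a fraction such as 0.25 as '25.0%' in every column."""
    return f"{fraction * 100:.{decimals}f}%"


def reconcile(detail_rows, summary, key, amount, tol=1e-9):
    """Trustworthiness: compare reported totals with totals rebuilt from detail rows.

    Returns {key: (reported_total, detail_total)} for every key that disagrees.
    """
    detail_totals = defaultdict(float)
    for row in detail_rows:
        detail_totals[row[key]] += row[amount]
    mismatches = {}
    for k in set(detail_totals) | set(summary):
        reported = summary.get(k, 0.0)
        actual = detail_totals.get(k, 0.0)
        if not math.isclose(reported, actual, rel_tol=tol, abs_tol=tol):
            mismatches[k] = (reported, actual)
    return mismatches


# Hypothetical sample data: two sales rows are missing a region.
sales = [
    {"region": "West", "amount": 100.0},
    {"region": "", "amount": 40.0},
    {"region": None, "amount": 10.0},
]
sales = fill_missing(sales, "region")
summary = {"West": 100.0, "N/A": 45.0}  # a report whose 'N/A' total is off

print(format_percent(0.25))                           # 25.0%
print(reconcile(sales, summary, "region", "amount"))  # {'N/A': (45.0, 50.0)}
```

Rebuilding totals from the detail rows, rather than trusting a stored summary, is what lets a mismatch surface as soon as a report drifts from its source transactions.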
Prospective clients often tell Preferred Strategies that they want to ‘clean up’ their data before they start an analytics initiative. Our experience has been that until a company embarks on an analytics initiative (and experiences the frustration of junk data), it will never have the process discipline required to enforce the creation of reliable, trustworthy data. If your company is ready to build a governed decision framework that will help clean up your data and provide a solid foundation for analytics excellence, contact us to see how we can help. In other words, don’t just talk about the weather; do something about it.