Are You Ready to Tackle Your Junk Data?

By Ben Harrison  |  September 23, 2021


Mark Twain is often quoted as saying, “Everybody talks about the weather, but no one ever does anything about it.” Much the same could be said of junk data. It is a blight that perpetually limits our ability to make the data-driven decisions that improve our businesses. To remain competitive and profitable, businesses need to be informed by quality data more than ever. If your data is ‘junk’, you can never become a data-driven organization.


Sometimes junk data accumulates because organizations lack disciplined business processes. They may also lack the analytics framework needed to monitor those processes and ensure they are followed. Most dimensional or categorical data is generated either manually by human input or by automated systems. When either fails, data is assigned inconsistent values, and reporting on inconsistent junk data is nearly impossible.


Another cause of junk data is the lack of a planned, governed decision framework. Many companies’ analytics efforts focus on one specific ‘use case’ at a time. Each project becomes a silo built for a specific user or purpose. These independent repositories of data are not built to a common standard, so data structures vary widely from one solution to the next.


By contrast, a governed decision framework eliminates siloed solutions and has these characteristics:


1. User-friendly

You shouldn’t need to be a data scientist to be able to use and understand the data.


2. Formatting

Dates, numbers, percentages, and dimensional values should be formatted the same throughout the data set. If you use ‘96.5%’ in one column, another percentage column in the same data set should not read ‘.965’.


3. Trustworthy

Aggregated figures should reconcile with the detail transactions so that end users know they can trust the information and insights they obtain.


4. Completeness

If you have dimensional data such as a ‘region’ code, every row of data should have a value. If the value doesn’t apply, show something like ‘N/A’ instead of leaving it blank. A blank value doesn’t tell the end user whether the field was accidentally or purposefully left out.
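
For readers who want to see what these rules might look like in practice, here is a minimal Python sketch of the formatting, trustworthiness, and completeness checks. It is only an illustration, not a description of any particular product: the sample rows, the column names (`region`, `on_time_pct`, `amount`), and the helper functions are all hypothetical, and it assumes a bare number such as ‘.965’ is meant as a fraction of 100%.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names are assumptions for illustration.
rows = [
    {"region": "West", "on_time_pct": "96.5%", "amount": 120.0},
    {"region": "", "on_time_pct": ".965", "amount": 80.0},
    {"region": "East", "on_time_pct": "88.0%", "amount": 50.0},
]


def normalize_percent(value):
    """Formatting: return a string like '96.5%' for either '96.5%' or '.965'."""
    text = value.strip()
    if text.endswith("%"):
        number = float(text[:-1])
    else:
        # Assumption: a value without a percent sign is a fraction (0.965 = 96.5%).
        number = float(text) * 100
    return f"{number:.1f}%"


def clean_row(row):
    """Completeness and formatting: fill blank regions and unify percentages."""
    cleaned = dict(row)
    cleaned["region"] = row["region"].strip() or "N/A"
    cleaned["on_time_pct"] = normalize_percent(row["on_time_pct"])
    return cleaned


def totals_by_region(detail_rows):
    """Sum the detail amounts for each region."""
    totals = defaultdict(float)
    for row in detail_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def mismatched_regions(summary, detail_rows, tolerance=0.005):
    """Trustworthiness: list regions whose reported total differs from the detail."""
    detail = totals_by_region(detail_rows)
    regions = set(summary) | set(detail)
    return sorted(
        region
        for region in regions
        if abs(summary.get(region, 0.0) - detail.get(region, 0.0)) > tolerance
    )


cleaned_rows = [clean_row(row) for row in rows]
# A hypothetical summary report in which the East total is overstated.
reported = {"West": 120.0, "N/A": 80.0, "East": 55.0}
print(cleaned_rows)
print(mismatched_regions(reported, cleaned_rows))
```

In this sketch the blank region becomes ‘N/A’, both percentage styles end up in one format, and any region whose reported total does not match the detail rows is flagged for review rather than silently corrected.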


Prospective clients often tell Preferred Strategies that they want to ‘clean up’ their data before they start an analytics initiative. In our experience, until a prospective client embarks on an analytics initiative (and experiences the frustration of junk data), they will never have the process discipline required to enforce the creation of reliable, trustworthy data. If your company is ready to build a governed decision framework that will help clean up your data and provide a solid foundation for analytics excellence, contact us to see how we can help. In other words, don’t just talk about the weather; do something about it.


About the Author

Ben Harrison

Ben is an experienced business analyst with a demonstrated history of working in the construction and process industries.
