Why does data quality matter? Put simply, it can be the difference between a good business decision and a bad one made on flawed data. Computers only perform the actions they are programmed to perform; they cannot think for themselves. The well-known computer science adage "garbage in, garbage out" certainly applies here.
There are many reasons why a company might record data, but one of the most important is to make effective business decisions. To be used in effective decision making, data must be trusted. That's worth repeating: you must have trust and confidence in your data, and that trust can be lost with just a few data quality issues. If data is not trusted, it won't be believed, and it will not influence your business decisions.
Another reason to focus on data quality is that it makes the data easier to analyse. Scientists and analysts should always check the data they include in their output, but when they find a problem, it takes time to track down and fix the cause. That is time they are not spending on analytical output.
Here are some of the questions we like to ask when trying to gauge the level of data quality:
- Do you know for sure that all the data you have captured is correct? If there have been any bugs introduced, do you go back and fix or delete that data?
- Does your business have a consistent, easy-to-understand labelling scheme and hierarchy for all the events you capture?
- For events generated by external sources, do you have a consistent and easy-to-understand hierarchy? Are all of the external events labelled as such? Do you capture the lowest level of detail required for each source?
- Are ALL the meaningful events for your business being recorded and generating descriptive data?
- When labelling the same piece of data do you always use the same rules, techniques and hierarchies?
- If you have different hierarchies based on the same data, are they always labelled correctly?
- Do you have unique names for ALL of your KPIs? If you have different variations of a KPI do they have different names?
- Have you created comprehensive documentation on the data you capture?
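Several of the questions above come down to whether your labels follow one consistent convention. As a rough illustration, a convention like this can be checked programmatically; the lowercase-with-underscores pattern and the set of allowed prefixes below are assumptions for the sketch, not a recommendation for your schema:

```python
import re

# Hypothetical naming convention: lowercase words separated by underscores,
# with the first word drawn from a known set of business areas (an assumption).
ALLOWED_PREFIXES = {"checkout", "search", "account", "marketing"}
EVENT_NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)*$")

def audit_event_names(event_names):
    """Return (name, problem) pairs for names that break the convention."""
    problems = []
    for name in event_names:
        if not EVENT_NAME_PATTERN.match(name):
            problems.append((name, "not lowercase_with_underscores"))
        elif name.split("_")[0] not in ALLOWED_PREFIXES:
            problems.append((name, "unknown prefix"))
    return problems

print(audit_event_names(["checkout_started", "SearchClicked", "promo_seen"]))
# [('SearchClicked', 'not lowercase_with_underscores'), ('promo_seen', 'unknown prefix')]
```

Running a check like this over your event catalogue is a quick way to turn "do we always use the same rules?" from a gut feeling into a yes-or-no answer.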
If you answered no to any of the questions above, it's possible your data is not 100% accurate. In the best case, your analysts and scientists are already aware of the issues and applying fixes. But analysts and scientists are an expensive resource, and you are effectively wasting it if they have to do this work on a regular basis. Not to mention the inaccurate KPIs, reports and analysis they may be producing from data they don't know is incorrect.
How can this be fixed? Here are some of our tips:
- Rule-based consistency. Whenever you create descriptive data around an event, apply consistent rules to the data being captured. This makes extracting and analysing that data much easier for your analysts and scientists.
- Help the teams around your business that are generating data. An example of this is marketing teams generating the descriptive data for events from external marketing channels. Create something that can programmatically build the descriptive data for them and check it for quality issues. This makes the marketers' job easier and reduces the chance of data quality issues.
- Whenever there is a change to the way data is captured, involve a member of the data team. They can recommend how to apply the change, they know when it is going to happen, and they can be part of the QA process for the change.
- Create reports that alert you whenever there are notable changes in your data. If an error does slip through, these reports should catch it.
- Document. Try to keep comprehensive documentation on the events you capture and the descriptive data around them.
- Maintain a log of data quality issues that details what the fix was and when it was applied.
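The first tip, rule-based consistency, can be as simple as running every event payload through one shared set of rules before it is stored. Here is a minimal sketch; the field names (`event_name`, `country`, `timestamp`) and the rules themselves are assumptions for illustration, not a real schema:

```python
from datetime import datetime

def _is_iso_date(value):
    """True if the string parses as an ISO-8601 date/time."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False

# One shared rulebook, applied to every event the same way.
RULES = {
    "event_name": lambda v: isinstance(v, str) and v.islower(),
    "country":    lambda v: isinstance(v, str) and len(v) == 2 and v.isupper(),
    "timestamp":  lambda v: isinstance(v, str) and _is_iso_date(v),
}

def validate_event(event):
    """Return (field, problem) pairs; an empty list means the event passes."""
    problems = []
    for field, rule in RULES.items():
        if field not in event:
            problems.append((field, "missing"))
        elif not rule(event[field]):
            problems.append((field, "failed rule"))
    return problems

good = {"event_name": "signup", "country": "GB", "timestamp": "2023-01-05T10:00:00"}
bad = {"event_name": "Signup", "country": "gbr"}
print(validate_event(good))  # []
print(validate_event(bad))
# [('event_name', 'failed rule'), ('country', 'failed rule'), ('timestamp', 'missing')]
```

The important design point is that the rules live in one place, so every team that produces events is held to the same standard.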
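For the second tip, helping teams such as marketing, one option is a small helper that builds campaign labels from structured inputs instead of asking people to hand-type them. The channel/campaign/variant scheme and the separator characters below are assumptions for the sketch:

```python
# Hypothetical label scheme: channel_campaign_variant, lowercase,
# spaces within a part collapsed to hyphens.
VALID_CHANNELS = {"email", "social", "search", "display"}

def build_campaign_label(channel, campaign, variant):
    """Build a consistent campaign label, rejecting unknown channels."""
    channel = channel.strip().lower()
    if channel not in VALID_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")

    def clean(part):
        return "-".join(part.strip().lower().split())

    return f"{channel}_{clean(campaign)}_{clean(variant)}"

print(build_campaign_label("Email", "Spring Sale", "v2"))
# email_spring-sale_v2
```

Because the labels are generated rather than typed, typos and one-off formats never enter the data in the first place, and the quality check is built into the tool.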
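The alerting tip can start very simply: compare today's figure for a metric against its recent history and flag large deviations. The 3-standard-deviation threshold below is an assumption; a production monitor would also account for seasonality and trend:

```python
from statistics import mean, stdev

def notable_change(history, today, sigmas=3.0):
    """True if today's value deviates from the historical mean
    by more than `sigmas` standard deviations."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return today != mu
    return abs(today - mu) > sigmas * sd

# Daily event counts for the last week (made-up numbers).
history = [1040, 980, 1005, 1020, 995, 1010, 990]
print(notable_change(history, 1015))  # False: within normal variation
print(notable_change(history, 420))   # True: worth an alert
```

Even a crude check like this catches the most damaging class of error, an event that silently stops firing or suddenly doubles, long before a quarterly report does.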