Why does data quality matter? Put simply, it can be the difference between a good business decision and a bad one made on flawed data. Computers only perform the actions they are programmed to perform; they cannot think for themselves. The well-known computer science adage "garbage in, garbage out" certainly applies here.
There are many reasons why a company might record data, but one of the most important is to make effective business decisions. To be used in effective decision making, data must be trusted. That's worth repeating: you must have trust and confidence in your data, and that trust can be lost with just a few data quality issues. If data is not trusted, it won't be believed, and it will not influence your business decisions.
Another reason to focus on data quality is that it makes the data easier to analyse. Scientists and analysts should always check the data they include in their output, but when they find a problem, it takes time to track down and fix the cause. That is time they are not spending on analytical output.
Here are some of the questions we like to ask when trying to gauge the level of data quality:
- Do you know for sure that all the data you have captured is correct? If there have been any bugs introduced, do you go back and fix or delete that data?
- Does your business have a consistent, easy-to-understand labelling scheme and hierarchy for all the events you capture?
- For events generated by external sources, do you have a consistent and easy-to-understand hierarchy? Are all of the external events labelled as such? Do you capture the lowest level of detail required for each source?
- Are ALL the meaningful events for your business being recorded and generating descriptive data?
- When labelling the same piece of data do you always use the same rules, techniques and hierarchies?
- If you have different hierarchies based on the same data, are they always labelled correctly?
- Do you have unique names for ALL of your KPIs? If you have different variations of a KPI do they have different names?
- Have you created comprehensive documentation on the data you capture?
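Several of the questions above come down to whether your labels follow one consistent convention. As a rough illustration, a convention like this can be checked programmatically; the lowercase-with-underscores pattern and the set of allowed prefixes below are assumptions for the sketch, not a recommendation for your schema:

```python
import re

# Hypothetical naming convention: lowercase words separated by underscores,
# with the first word drawn from a known set of business areas (an assumption).
ALLOWED_PREFIXES = {"checkout", "search", "account", "marketing"}
EVENT_NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)*$")

def audit_event_names(event_names):
    """Return (name, problem) pairs for names that break the convention."""
    problems = []
    for name in event_names:
        if not EVENT_NAME_PATTERN.match(name):
            problems.append((name, "not lowercase_with_underscores"))
        elif name.split("_")[0] not in ALLOWED_PREFIXES:
            problems.append((name, "unknown prefix"))
    return problems

print(audit_event_names(["checkout_started", "SearchClicked", "promo_seen"]))
# [('SearchClicked', 'not lowercase_with_underscores'), ('promo_seen', 'unknown prefix')]
```

Running a check like this over your event catalogue is a quick way to turn "do we always use the same rules?" from a gut feeling into a yes-or-no answer.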
If you answered no to any of the questions above, it's possible your data is not 100% accurate. In the best case, your analysts and scientists are already aware of the issues and applying fixes. But analysts and scientists are an expensive resource, and you are effectively wasting it if they have to do this work on a regular basis. Not to mention the inaccurate KPIs, reports and analysis they may be producing from data they don't know is incorrect.
How can this be fixed? Here are some of our tips:
- Rule-based consistency. Whenever you create descriptive data around an event, apply consistent rules to the data being captured. This makes extracting and analysing that data much easier for your analysts and scientists.
- Help the teams around your business that are generating data. An example of this is marketing teams generating the descriptive data for events from external marketing channels. Create something that can programmatically build the descriptive data for them and check it for quality issues. This makes the marketers' job easier and reduces the chance of data quality issues.
- Whenever there is a change to the way data is captured, involve a member of the data team. They can recommend how to apply the change, they know when it is going to happen, and they can be part of the QA process for the change.
- Create reports that alert you whenever there are notable changes in your data. If an error does slip through, these reports should catch it.
- Document. Try to keep comprehensive documentation on the events you capture and the descriptive data around them.
- Maintain a log of data quality issues that details what the fix was and when it was applied.
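The first tip, rule-based consistency, can be as simple as running every event payload through one shared set of rules before it is stored. Here is a minimal sketch; the field names (`event_name`, `country`, `timestamp`) and the rules themselves are assumptions for illustration, not a real schema:

```python
from datetime import datetime

def _is_iso_date(value):
    """True if the string parses as an ISO-8601 date/time."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False

# One shared rulebook, applied to every event the same way.
RULES = {
    "event_name": lambda v: isinstance(v, str) and v.islower(),
    "country":    lambda v: isinstance(v, str) and len(v) == 2 and v.isupper(),
    "timestamp":  lambda v: isinstance(v, str) and _is_iso_date(v),
}

def validate_event(event):
    """Return (field, problem) pairs; an empty list means the event passes."""
    problems = []
    for field, rule in RULES.items():
        if field not in event:
            problems.append((field, "missing"))
        elif not rule(event[field]):
            problems.append((field, "failed rule"))
    return problems

good = {"event_name": "signup", "country": "GB", "timestamp": "2023-01-05T10:00:00"}
bad = {"event_name": "Signup", "country": "gbr"}
print(validate_event(good))  # []
print(validate_event(bad))
# [('event_name', 'failed rule'), ('country', 'failed rule'), ('timestamp', 'missing')]
```

The important design point is that the rules live in one place, so every team that produces events is held to the same standard.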
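For the second tip, helping teams such as marketing, one option is a small helper that builds campaign labels from structured inputs instead of asking people to hand-type them. The channel/campaign/variant scheme and the separator characters below are assumptions for the sketch:

```python
# Hypothetical label scheme: channel_campaign_variant, lowercase,
# spaces within a part collapsed to hyphens.
VALID_CHANNELS = {"email", "social", "search", "display"}

def build_campaign_label(channel, campaign, variant):
    """Build a consistent campaign label, rejecting unknown channels."""
    channel = channel.strip().lower()
    if channel not in VALID_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")

    def clean(part):
        return "-".join(part.strip().lower().split())

    return f"{channel}_{clean(campaign)}_{clean(variant)}"

print(build_campaign_label("Email", "Spring Sale", "v2"))
# email_spring-sale_v2
```

Because the labels are generated rather than typed, typos and one-off formats never enter the data in the first place, and the quality check is built into the tool.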
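The alerting tip can start very simply: compare today's figure for a metric against its recent history and flag large deviations. The 3-standard-deviation threshold below is an assumption; a production monitor would also account for seasonality and trend:

```python
from statistics import mean, stdev

def notable_change(history, today, sigmas=3.0):
    """True if today's value deviates from the historical mean
    by more than `sigmas` standard deviations."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return today != mu
    return abs(today - mu) > sigmas * sd

# Daily event counts for the last week (made-up numbers).
history = [1040, 980, 1005, 1020, 995, 1010, 990]
print(notable_change(history, 1015))  # False: within normal variation
print(notable_change(history, 420))   # True: worth an alert
```

Even a crude check like this catches the most damaging class of error, an event that silently stops firing or suddenly doubles, long before a quarterly report does.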