As an analytical management consulting company, we spend a lot of time helping clients validate data during data warehouse implementations, predictive modeling projects, or other analytics initiatives. Data validation is an important part of any software project, but we have seen clients spend unnecessary hours (or days) chasing down the last mile for data perfection against validation sources that are deemed "The Gold Standard." The Gold Standard is usually an internal report that everyone throughout the organization knows, loves, and trusts. It has gained mythical powers of correctness and no one can question its tyrannical rule. The new data model will not be production ready unless it matches this report line-for-line. If you want to use the Gold Standard, consider the following:
- Does the Gold Standard use the same source data system(s)? Believe it or not we have seen cases where the Gold Standard does not even use the same source system. It may go through several manual "massages" via MS Access or Excel before it is delivered to the enterprise. Or it might come from a "frozen" data set that is much different from how the data appears in the authoritative system that the new data model uses today.
- Does the Gold Standard implement the same business logic vetted by cross-functional users for the new data model? Many times one person put together the Gold Standard, but he or she never really vetted the business logic through an appropriate quorum of cross-functional business data owners, e.g. Data Governance. It may not have been sufficiently documented either, so you may need to reverse engineer the report before deciding to use it.
- What is an acceptable margin of error? We have mentioned this before (See Missing the Point), but if the reports aren't a true apples-to-apples comparison, what is an acceptable margin of error? 1%, 5%, 10%, 20%? We have seen data validation reports with 1000's of lines that are a few percentage points off. We all know that there are differences in how the reports are created, yet the client still wants an explanation for each line that is different. This can take days to do and is it really worth the effort?
- Does the Gold Standard have a business purpose that informs decisions today? A lot of times these Gold Standards take on a life of their own and re-implementing them in the new data model to test for validity is somewhat pointless in itself – think TPS reports from the movie Office Space. In other words, if this information isn't really that valuable, should you really be validating against it in the first place?
Ultimately there can be no Gold Standard for data. Instead of pegging the value of data to a manual data process or business logic that might no longer be relevant nor clearly understood, organizations need to embrace the notion of free-floating data currency. Data's value is only as important as the current strategic initiatives and business processes of the organization that will ultimately help make better decisions that reduce costs and/or tedious labor, generate more revenue, or better service clients.