Data Quality Is More Important Than Ever

Data quality, once the focus of just a few data stewards, has become a business and IT challenge of unprecedented scale. Not only must business users be more confident than ever in the data they use, but IT must now address data quality everywhere in the enterprise, including:

  • Systems of Record – Transaction and other source systems
  • Consolidated Data Stores – Purpose-built data warehouses, marts, cubes, and operational data stores
  • Virtualized Data – Shared views and data services
  • Visualization and Analysis Solutions – Business intelligence, reporting, analytics, etc.

Data Quality Strategy Reaches Far Beyond ETL and the Data Warehouse

Traditionally, data quality efforts have focused on consolidated data alone, using a number of tools and techniques to “clean up” data on its way into the warehouse.  However, this is no longer good enough.

Gartner points out this wider need in its 2010 Data Quality Magic Quadrant, where it sized the market for data quality tools at nearly three quarters of a billion dollars annually in 2009 and forecast double-digit annual growth over the next five years.

Data Virtualization Improves Quality of Virtualized Data

The Composite Data Virtualization Platform improves the quality of your virtualized data, complementing the data quality strategies, processes, and tools you use for your systems of record, consolidated data stores, and visualization and analysis solutions.

The Composite Data Virtualization Platform provides a number of mechanisms and techniques to improve the quality of your virtualized data, including validation, standardization, cleansing, enrichment, and more, as seen below.

[Figure: Data Quality – Data Virtualization Improves Quality of Virtualized Data]
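
To make these mechanisms concrete, the sketch below shows the kinds of validation, standardization, and enrichment rules a virtual view might apply before data reaches consumers.  It is a minimal Python sketch, not Composite’s actual API; the field names, formats, and lookup table are illustrative assumptions.

    from datetime import datetime

    # Hypothetical lookup used by the enrichment step.
    REGION_LOOKUP = {"US": "Americas", "CA": "Americas", "DE": "EMEA", "JP": "APAC"}


    def validate(record):
        """Return a list of validation errors; an empty list means the record passes."""
        errors = []
        if not record.get("customer_id"):
            errors.append("missing customer_id")
        if record.get("order_total", 0) < 0:
            errors.append("negative order_total")
        return errors


    def standardize(record):
        """Normalize formats so every consumer sees the same representation."""
        out = dict(record)
        out["country"] = (out.get("country") or "").strip().upper()[:2]
        out["email"] = (out.get("email") or "").strip().lower()
        # Canonicalize dates to ISO format, assuming sources use one of two layouts.
        raw = out.get("order_date", "")
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
            try:
                out["order_date"] = datetime.strptime(raw, fmt).date().isoformat()
                break
            except ValueError:
                continue
        return out


    def enrich(record):
        """Add derived attributes consumers would otherwise compute themselves."""
        out = dict(record)
        out["region"] = REGION_LOOKUP.get(out.get("country", ""), "UNKNOWN")
        return out


    def virtual_view(rows):
        """Apply the shared rules once, in the virtual layer, for every consumer."""
        for row in rows:
            if validate(row):  # skip (or route elsewhere) records that fail validation
                continue
            yield enrich(standardize(row))


    if __name__ == "__main__":
        source_rows = [
            {"customer_id": "C100", "country": " us ", "email": "A@Example.COM",
             "order_date": "03/15/2011", "order_total": 250.0},
            {"customer_id": "", "country": "DE", "email": "b@example.com",
             "order_date": "2011-03-15", "order_total": 99.0},  # fails validation
        ]
        for clean_row in virtual_view(source_rows):
            print(clean_row)

Because the rules run in the virtual layer, every consumer of the view sees the same validated, standardized result without a separate cleansing pass of its own.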

Data Virtualization Eliminates Many Root Causes of Poor Data Quality

In his white paper, Effecting Data Quality Improvement through Data Virtualization, David Loshin, president of Knowledge Integrity, Inc. (http://www.knowledge-integrity.com/) and a recognized thought leader and expert consultant in data quality, master data management, and business intelligence, describes how data virtualization helps overcome four major causes of poor data quality:

  • Structural and semantic inconsistency – Differences in the formats, structures, and semantics presumed by downstream data consumers can lead to conflicting conclusions drawn from similar analyses.  Composite data virtualization lets you transform sources so your consumers get normalized data with common semantics, eliminating this confusion (a minimal sketch of this kind of mapping follows the list).
  • Inconsistent validations – Data validation is applied inconsistently at various points in the business processes, with varying impacts downstream.  Composite data virtualization lets all your applications share the same validated, virtualized data so you get the consistency you need.
  • Replicated functionality – Repeatedly applying the same (or similar) data cleansing and identity resolution logic to the same data increases costs without ensuring consistency.  Composite data virtualization lets you develop and share common data quality rules so you don’t have to reinvent the wheel with each new requirement.  And because these rules are controlled centrally, you avoid building separate governance systems to automate controls that Composite already provides.
  • Data entropy – Multiple copies of the same data lead to more data silos in which data quality continues to degrade, especially when levels of service for consistency and synchronization are not defined or not met.  Composite data virtualization reduces the number of data copies required, thereby mitigating data entropy.
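
As an illustration of the first two points, the sketch below federates two hypothetical sources with different field names, formats, and units into one canonical customer view, so the mapping and standardization logic lives in one place rather than being replicated by each consumer.  The source schemas, field names, and conversions are assumptions made for the example, not Composite functionality.

    # Resolve structural and semantic differences between two hypothetical
    # sources in the virtual layer, so every consumer sees one canonical shape.
    CANONICAL_FIELDS = ("customer_id", "customer_name", "annual_revenue_usd")


    def from_crm(row):
        """CRM stores the id as 'cust_no' and revenue in thousands of USD."""
        return {
            "customer_id": str(row["cust_no"]),
            "customer_name": row["name"].title(),
            "annual_revenue_usd": row["revenue_k_usd"] * 1_000,
        }


    def from_erp(row):
        """ERP uses upper-case column names and reports revenue in whole USD."""
        return {
            "customer_id": str(row["CUSTOMER_ID"]),
            "customer_name": row["CUSTOMER_NAME"].title(),
            "annual_revenue_usd": row["ANNUAL_REV"],
        }


    def canonical_customer_view(crm_rows, erp_rows):
        """Federate both sources into one schema; no copies are persisted."""
        for row in crm_rows:
            yield from_crm(row)
        for row in erp_rows:
            yield from_erp(row)


    if __name__ == "__main__":
        crm = [{"cust_no": 7, "name": "ACME CORP", "revenue_k_usd": 1200}]
        erp = [{"CUSTOMER_ID": "7781", "CUSTOMER_NAME": "globex inc", "ANNUAL_REV": 450_000}]
        for customer in canonical_customer_view(crm, erp):
            assert set(customer) == set(CANONICAL_FIELDS)
            print(customer)

When the mapping changes, it changes once in the shared view, and reporting, analytics, and operational consumers all pick up the corrected semantics immediately – the consistency described in the points above.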

By eliminating these root causes, data virtualization enables you to meet your data quality goals more effectively.