The quality of the data records in your datasets can be expressed according to six key criteria, and a useful quality management system will allow you to assess the quality of your data in the following areas:
♦ Completeness. Concerned with missing data, that is, with fields in your dataset that have been left empty or whose default values have been left unchanged. (For example, a date field whose default setting of 01/01/1900 has not been edited.)
♦ Conformity. Concerned with data values of a similar type that have been entered in a confusing or unusable manner, e.g. numerical data that includes or omits a comma separator ($1,000 versus $1000).
♦ Accuracy. Concerned with the general exactness of the data in a dataset. It is typically verified by comparing the dataset with a reliable reference source, for example, dictionary files that contain product reference data.
♦ Consistency. Concerned with the occurrence of dissimilar types of data record in a dataset created for a single data type, e.g. the combination of personal and business information in a dataset intended for business data only.
♦ Integrity. Concerned with the recognition of significant associations among records in a dataset. For example, a dataset may contain records for two or more individuals in a household but provide no means for the organization to recognize or use this information.
♦ Duplication. Concerned with data records that duplicate one another’s information, that is, with identifying redundant records in the dataset.
(The list above is not absolute; the characteristics above are sometimes described with other terminology, such as redundancy or timeliness.)
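As a rough illustration of the completeness and conformity criteria, the following Python sketch flags fields that are empty or still hold an unedited default, and amounts that do not follow a single format. It is not how Informatica Data Quality implements these checks; the function names, the sample records, and the choice of "no comma separator" as the canonical amount format are all assumptions made for the example.

```python
import re

# Assumption: 01/01/1900 is the unedited default, as in the example above.
DEFAULT_DATE = "01/01/1900"

# Hypothetical conformity rule: dollar amounts without a thousands separator.
CANONICAL_AMOUNT = re.compile(r"^\$\d+(\.\d{2})?$")


def is_incomplete(value, default=None):
    """Return True when a field is empty or still holds its default value."""
    if value is None or value.strip() == "":
        return True
    return default is not None and value.strip() == default


def conforms(value):
    """Return True when an amount matches the canonical format."""
    return CANONICAL_AMOUNT.match(value) is not None


def normalize_amount(value):
    """Rewrite '$1,000' as '$1000' so both spellings share one format."""
    return value.strip().replace(",", "")


# Illustrative records, not real data.
records = [
    {"name": "Ann", "joined": "03/15/2012", "balance": "$1,000"},
    {"name": "Bob", "joined": "01/01/1900", "balance": "$1000"},
    {"name": "", "joined": "07/04/2010", "balance": "$250.00"},
]

# Indexes of records with a missing name or an unedited default date.
incomplete = [
    i for i, r in enumerate(records)
    if is_incomplete(r["name"]) or is_incomplete(r["joined"], DEFAULT_DATE)
]

# Indexes of records whose balance is not in the canonical format.
nonconforming = [i for i, r in enumerate(records) if not conforms(r["balance"])]
```

Here the second and third records would be reported as incomplete, and the first as nonconforming until its balance is passed through `normalize_amount`.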
The accuracy factor differs from the other five factors in the following respect: whereas (for example) a pair of duplicate records may be visible to the naked eye, it can be very difficult to tell simply by “eye-balling” whether a given data record is incorrect. Data Quality’s capabilities include sophisticated tools for identifying and resolving cases of data inaccuracy.
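The contrast between duplication and accuracy can be sketched in a few lines of Python. Duplicates can be found by looking at the dataset alone, while inaccurate values only show up when each record is compared with a trusted reference. The reference dictionary, field names, and sample orders below are hypothetical stand-ins for a product dictionary file, and the functions are a minimal sketch rather than the product's own matching logic.

```python
# Hypothetical reference data standing in for a product dictionary file.
REFERENCE_PRODUCTS = {"P-100": "Widget", "P-200": "Gasket"}


def find_duplicates(records):
    """Return records that exactly repeat an earlier record."""
    seen = set()
    duplicates = []
    for record in records:
        key = (record["code"], record["product"])
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
    return duplicates


def find_inaccurate(records, reference):
    """Return records whose code is unknown or whose name disagrees
    with the reference source."""
    problems = []
    for record in records:
        expected = reference.get(record["code"])
        if expected is None or expected != record["product"]:
            problems.append(record)
    return problems


# Illustrative records, not real data.
orders = [
    {"code": "P-100", "product": "Widget"},
    {"code": "P-200", "product": "Gaskit"},   # misspelled product name
    {"code": "P-100", "product": "Widget"},   # exact duplicate of the first
    {"code": "P-300", "product": "Valve"},    # code absent from the reference
]

duplicates = find_duplicates(orders)
inaccurate = find_inaccurate(orders, REFERENCE_PRODUCTS)
```

Note that the misspelled "Gaskit" looks plausible on its own; only the lookup against the reference reveals it as inaccurate.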
The data quality issues above relate not simply to poor-quality data, but also to data whose value is not being maximized. For example, duplicate householder information may not require any amendment per se — but it may indicate potentially profitable or cost-saving relationships among consumers or product lines that your organization is not exploiting.
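One simple way to surface the household relationships described above is to group records on a normalized address. The sketch below assumes that a shared address, compared after lowercasing and collapsing whitespace, is a good enough proxy for a household; real matching would usually need more robust rules. The field names and sample customers are invented for the example.

```python
from collections import defaultdict


def household_key(record):
    """Normalize an address by lowercasing and collapsing whitespace."""
    return " ".join(record["address"].lower().split())


def group_households(records):
    """Map each normalized address to its residents, keeping only
    addresses shared by more than one record."""
    groups = defaultdict(list)
    for record in records:
        groups[household_key(record)].append(record["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Illustrative records, not real data.
customers = [
    {"name": "Ann Lee", "address": "12 Oak St"},
    {"name": "Tom Lee", "address": "12  oak st"},
    {"name": "Raj Rao", "address": "9 Elm Rd"},
]

households = group_households(customers)
```

Neither of the two Lee records needs correcting, but grouping them exposes a relationship the organization could act on.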
Every organization’s data needs are different, and the prevalence and relative priority of data quality issues will differ from one organization and one project to the next. These six characteristics represent a critically useful method for measuring data quality and resolving the issues that prevent your organization from maximizing the potential of its data.