International Metadata Standards and Enterprise Data Quality Metadata Systems
Speaker
Ted Habermann
The HDF Group
Well-documented data quality is critical when scientists and decision-makers need to combine multiple datasets from different disciplines and instruments to address scientific questions or make difficult decisions. Standardized data quality metadata could be very helpful in these situations, but many efforts to develop data quality standards falter because approaches to measuring and reporting data quality are so diverse. The “one size fits all” paradigm generally does not work well here.
The ISO data quality standard (ISO 19157) was recently endorsed by the U.S. Federal Geographic Data Committee. Rather than seeking to align different quality measurement systems (a daunting task), the standard focuses on systematically describing how data quality is measured. ISO 19157 also introduces the idea of standard data quality measures that are documented in a shared repository and used to describe consistently how data quality is measured across an enterprise. The standard recommends properties for these measures, including unique identifiers, references, illustrations, and examples. A metadata record can then reference a measure by its unique identifier and reuse it, adding details (and references) that describe how the measure was applied to a particular dataset.
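The shared-repository idea described above can be illustrated with a minimal sketch. It is not an implementation of ISO 19157: the class names, field names, identifier, and values are all hypothetical and are not taken from the standard's schema. It only shows the pattern of documenting a measure once and referring to it by identifier from a dataset's metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QualityMeasure:
    """A shared description of how a quality value is measured (hypothetical fields)."""
    identifier: str          # unique identifier used by metadata records
    name: str
    definition: str
    references: tuple = ()   # documents that define or explain the measure
    examples: tuple = ()     # worked examples of applying the measure


@dataclass
class QualityReport:
    """A dataset-level result that points at a shared measure by identifier."""
    measure_id: str
    value: float
    unit: str
    evaluation_details: str  # how the measure was applied to this dataset
    citations: list = field(default_factory=list)


class MeasureRegistry:
    """An in-memory stand-in for an enterprise repository of measures."""

    def __init__(self):
        self._measures = {}

    def register(self, measure):
        # Identifiers must stay unique so that references are unambiguous.
        if measure.identifier in self._measures:
            raise ValueError(f"duplicate measure identifier: {measure.identifier}")
        self._measures[measure.identifier] = measure

    def resolve(self, report):
        # Look up the full measure description that a report refers to.
        return self._measures[report.measure_id]


# Made-up measure and report, for illustration only.
registry = MeasureRegistry()
measure = QualityMeasure(
    identifier="example:horizontal-rmse",
    name="Horizontal root mean square error",
    definition="RMSE of horizontal position against reference points.",
    references=("Hypothetical enterprise measurement guide",),
)
registry.register(measure)

report = QualityReport(
    measure_id="example:horizontal-rmse",
    value=2.5,
    unit="m",
    evaluation_details="Compared against 40 surveyed check points.",
    citations=["Hypothetical validation report for this dataset"],
)

print(registry.resolve(report).name)
```

Because the report stores only the identifier, the definition, references, and examples are maintained in one place, and every dataset that uses the measure describes it in the same way.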
A second important new feature of ISO 19157 is the inclusion of citations to existing papers or reports that describe the quality of a dataset. This capability allows users to find this information in a single location, i.e., the dataset metadata, rather than searching the web or other catalogs. This presentation will describe these and other capabilities of ISO 19157, with examples of how they can be used to describe data quality, and will compare these approaches with those of other standards.