Empowering Earth Science Communities to Share Data Through Guided Metadata Improvement
Speakers
Matthew Jones
NCEAS
Matt directs the Informatics program at NCEAS, which focuses both on supporting efficient synthesis through scientific computing and on building new advanced infrastructure to support data sharing, preservation, analysis, and modeling. Matt is the Director of the DataONE program, a global network of interoperable data repositories, and of the NSF Arctic Data Center. In addition to his data infrastructure work at NCEAS, Matt helps to build the NCEAS Learning Hub, with an emphasis on teaching data science and reproducible research.
Matt’s career has focused on improving data science infrastructure to support cross-disciplinary and synthetic science, principally through the development of open source software for data repositories, metadata systems, and reproducible analysis and modeling.
Matt has an M.S. in Zoology from the University of Florida, where he focused on the ecology of plant-animal interactions, and a B.A. from Dartmouth College.
Lindsay Powers
The HDF Group
Ted Habermann
The HDF Group
Earth Science communities can improve the discoverability, use, and understanding of their data by improving the completeness and consistency of their metadata. Despite the potential for a great payoff, resources to invest in this work are often limited. We are working with DataONE Member Repositories to quantitatively evaluate their metadata and to identify specific strategies for improving its completeness and consistency. We have developed an iterative, guided process that helps repositories efficiently improve their metadata so that it better serves their own communities and supports sharing data across disciplines. This community-specific approach focuses on each community’s metadata requirements, and also provides guidance on adding other metadata concepts that make the metadata more effective for multiple uses, including data discovery, data understanding, and data re-use. The end goal of this work is to help communities improve their metadata over time, based on their own requirements.
We will present the results of a baseline analysis of more than 25 diverse metadata collections from established data repositories representing communities across the earth and environmental sciences. The baseline analysis describes the current state of the metadata in these collections and highlights areas for improvement. We compare these collections to identify exemplar practitioners that can provide guidance to other communities.
In addition, we are building web-based tools on top of a common metadata evaluation library. The library can be incorporated into community tools such as metadata editors and repository platforms, and can also form the core of a metadata completeness reporting service integrated within specific partner information systems such as the DataONE Coordinating Repository services and the Mercury Online Metadata Editor. This aspect of the project is still forthcoming, and we will discuss our plans for it.
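As a rough illustration of the kind of measurement a completeness evaluation might make, the following minimal Python sketch scores metadata records against a list of recommended concepts. It is not the project's library: the concept names, the representation of a record as a dictionary, and the function names are all hypothetical, chosen only to show the idea.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Hypothetical community recommendation; these concept names are illustrative
# assumptions and do not come from any specific metadata standard.
RECOMMENDED_CONCEPTS: List[str] = [
    "title", "abstract", "creator", "keywords",
    "temporal_extent", "spatial_extent",
]


def is_present(value: Any) -> bool:
    """Treat a concept as present only if it carries non-empty content."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def record_completeness(record: Mapping[str, Any],
                        concepts: Iterable[str]) -> float:
    """Fraction of the recommended concepts found in a single record."""
    concepts = list(concepts)
    if not concepts:
        return 0.0
    found = sum(1 for c in concepts if is_present(record.get(c)))
    return found / len(concepts)


def collection_report(records: Iterable[Mapping[str, Any]],
                      concepts: Iterable[str]) -> Dict[str, float]:
    """Fraction of records in a collection that contain each concept."""
    records = list(records)
    concepts = list(concepts)
    if not records:
        return {c: 0.0 for c in concepts}
    return {
        c: sum(1 for r in records if is_present(r.get(c))) / len(records)
        for c in concepts
    }
```

A per-concept report of this kind shows which concepts a collection most often omits, while the per-record score supports comparison across collections that share a recommendation.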