Community for Data Integration Data Management Working Group (CDI DMWG)

The U.S. Geological Survey produces a vast number of valuable datasets every year that are used to advance science. Thousands of scientists in every science strategy work to develop, analyze, and publish papers on data collected by the Survey. However, the lifecycle of a dataset does not end with a given scientist or project. The ability to integrate multiple datasets for analysis and reuse extends a dataset's value beyond the reasons for which it was originally collected.

Data collection and analysis are only part of the foundation of science. Data integration is another key component needed to answer more complicated questions in science. However, before data can be integrated, they must meet certain standards that define the data lifecycle.

There is an underlying assumption in USGS that the majority of data are available and poised for integration. This is simply not the case for most data. In most offices and programs, scientists and managers lack guidelines and standards to help ensure that relevant and critical documentation is captured before, during, and after data collection.

Scientists spend needless time and money reproducing datasets that have already been collected, because they are unable to locate pre-existing collections. Historical analyses cannot be conducted because relevant datasets are missing necessary contextual information. In addition, the USGS lacks critical measures that oblige its scientists, who work in the public sector, to make datasets available.

In the current business model, it is difficult to find data within the Survey, much less to access and understand them. The promotion process for research-grade scientists emphasizes publishing, yet overlooks the critical point that the data themselves are of enormous value and should be preserved, described, and made available.

Good data management is a prerequisite for data integration. The CDI DMWG will develop mechanisms for incorporating data management into USGS science and develop ways to educate scientists about its value. The group seeks to elevate the practice of data management so that it is seen as a critical partner in the pursuit of science in USGS.