USGS Data Management

Manage Quality
[Figure: The USGS Science Data Lifecycle diagram. Elements shown: Plan, Acquire, Process, Analyze, Preserve, Publish/Share, Manage Quality, Describe (Metadata, Documentation), Backup & Secure.]

Data Management: Manage Quality

Data-quality management is the process of applying protocols and methods to ensure that data are properly collected, handled, processed, used, and maintained at every stage of the scientific data lifecycle.

QA and QC

Key Points

  • QA refers to defect prevention, whereas QC refers to defect detection and repair; a 'defect' in the Manage Quality context is any data issue that negatively affects quality or fitness for use.
  • Quality Assurance Plans (QAPs) are used to set data-quality objectives and establish criteria for validation, including training, methods, equipment and software, data structure requirements, and individual value assessments.
  • Use well-documented methods for data acquisition and set quality criteria for all data.
  • BEFORE data collection begins, develop the data schema that defines the structure and properties of the data that will be captured or entered, edited, and stored.
  • Implement data domains as lookup tables to standardize acceptable values.
  • Perform periodic data assessments during the project cycle to discover errors prior to project completion.
  • Data quality indicators should be stored with the data values, in separate fields, to allow potential data users to determine which values are fit for specific uses.
  • Document quality management in quality assurance or data management plans for the project, as well as in the metadata record for the data.
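Two of the key points above, defining a data schema before collection and implementing data domains as lookup tables, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a USGS-prescribed method: the field names (`site_id`, `habitat`, `water_temp_c`), the habitat codes, and the helper `validate_record` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data domain (lookup table): code -> description.
# Only the codes listed here are acceptable values for the "habitat" field.
HABITAT_DOMAIN = {"FOR": "Forest", "WET": "Wetland", "GRS": "Grassland"}


@dataclass(frozen=True)
class FieldSpec:
    """One field in the schema: its name, expected type, and optional domain."""
    name: str
    dtype: type
    domain: Optional[frozenset] = None


# Hypothetical schema, written down before any data are collected.
SCHEMA = (
    FieldSpec("site_id", str),
    FieldSpec("habitat", str, frozenset(HABITAT_DOMAIN)),
    FieldSpec("water_temp_c", float),
)


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record (empty means it passes)."""
    problems = []
    for spec in SCHEMA:
        if spec.name not in record:
            problems.append(f"{spec.name}: missing")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.dtype):
            problems.append(f"{spec.name}: expected {spec.dtype.__name__}")
        elif spec.domain is not None and value not in spec.domain:
            problems.append(f"{spec.name}: {value!r} not in domain")
    return problems
```

Running such a check at data-entry time is a QA measure, because it keeps out-of-domain values from being stored in the first place; running the same check against an existing dataset would be QC.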

The widely used acronyms QA (Quality Assurance) and QC (Quality Control) are often used interchangeably, but they mean very different things. QA refers to defect prevention, whereas QC refers to defect detection and repair. In a data context, a 'defect' is any data issue that negatively affects fitness for use, such as a numeric value error, an incorrect classification term, a gap in a data series, or a failed data transformation. Generally, QA is considered and applied before and during data collection or acquisition, whereas QC is applied after the data are in hand.
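As one illustration of defect detection, a gap in a regularly sampled data series can be found by comparing consecutive timestamps against the expected sampling interval. The sketch below assumes evenly spaced observations; the function name `find_gaps` and the hourly interval in the usage lines are hypothetical choices, not part of any USGS tool.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_step):
    """Return (before, after) pairs where consecutive timestamps are
    farther apart than expected_step."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > expected_step
    ]


# Usage: hourly readings with the 02:00 observation missing.
readings = [
    datetime(2017, 11, 30, 0),
    datetime(2017, 11, 30, 1),
    datetime(2017, 11, 30, 3),
]
gaps = find_gaps(readings, timedelta(hours=1))
```

Here `gaps` holds a single pair bracketing the missing observation, which can then be investigated and documented.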

Quality Assurance Plans

Yes, you can plan ahead for high-quality data! A Quality Assurance Plan (QAP) defines the criteria and processes that will ensure and verify that data meet specific data-quality objectives throughout the Data Lifecycle. Some agencies and organizations (for example, the U.S. Environmental Protection Agency, USEPA) require a QAP as part of a research proposal before they will fund a project. Like the data management plan (DMP), the QAP (if it is a separate document) should be revised as needed during the project to reflect the reality of the data workflow and activities.

Quality Assurance (QA) - Preventing Data Issues

Preventing the creation of defective data is the most effective means of ensuring the ultimate quality of your data products and of the research that depends on those data. QA refers to using written criteria, methods, and processes that will ensure the production of data that meet a specified quality standard.

Quality Control (QC) - Detecting and Repairing Data Issues

Quality control (QC) of data refers to the application of methods or processes that determine whether data meet overall quality goals and defined quality criteria for individual values. To determine whether data are 'good' or 'bad', or to what degree they are so, one must have a set of quality goals and specific criteria against which the data are evaluated. Rapid data scanning methods can be used to tag records or sets of records that meet or fail to meet a particular criterion. Remember that QC is a partner to QA: when errors are found, a way to prevent them via QA might also be revealed.
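A rapid scan of this kind might look like the following sketch, which tags each record against a simple range criterion and stores the result in a separate quality-indicator field rather than altering the measured value. The field name, the `_qc` suffix, the flag vocabulary, and the acceptable range are all assumptions made for illustration; real criteria would come from the project's QAP.

```python
def flag_records(records, field, low, high):
    """Return copies of the records with a '<field>_qc' flag added.

    Flags: 'pass' if low <= value <= high, 'fail' if outside that range,
    'missing' if the field is absent or None. Input records are not modified.
    """
    flagged = []
    for rec in records:
        value = rec.get(field)
        if value is None:
            flag = "missing"
        elif low <= value <= high:
            flag = "pass"
        else:
            flag = "fail"
        flagged.append({**rec, f"{field}_qc": flag})
    return flagged


# Usage: hypothetical water-temperature readings in degrees Celsius.
raw = [
    {"site_id": "A1", "water_temp_c": 12.5},
    {"site_id": "A2", "water_temp_c": 99.0},
    {"site_id": "A3", "water_temp_c": None},
]
checked = flag_records(raw, "water_temp_c", low=0.0, high=40.0)
```

Keeping the flag in its own field leaves the original value intact, so data users can decide for themselves which records are fit for a given purpose.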

Documenting Data Quality

Describing your data, like managing quality, is a cross-cutting element of the USGS Science Data Lifecycle. In addition to data quality indicators stored within your dataset, quality-management documentation may take the form of a QAP or of sections within the data management plan (DMP) that cover specific quality goals and criteria, along with any quality assessment summaries and notes on how data were modified to meet the content needs of your project. The FGDC metadata standard includes sections specifically reserved for Data Quality Information.
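To give a sense of what a Data Quality Information section can hold, the sketch below assembles one with Python's standard `xml.etree.ElementTree` module. The element short names (`dataqual`, `attracc`, `attraccr`, `logic`, `complete`, `lineage`, `procstep`, `procdesc`, `procdate`) reflect the FGDC Content Standard for Digital Geospatial Metadata as best understood here and should be checked against the standard itself; the function name `build_dataqual` and the report text are hypothetical, and the fragment is not a complete or validated metadata record.

```python
import xml.etree.ElementTree as ET


def build_dataqual(accuracy_report, consistency_report,
                   completeness_report, process_steps):
    """Build a data-quality element from plain-text reports.

    process_steps is a list of (description, date) pairs, with the date
    given as a YYYYMMDD string.
    """
    dataqual = ET.Element("dataqual")
    attracc = ET.SubElement(dataqual, "attracc")
    ET.SubElement(attracc, "attraccr").text = accuracy_report
    ET.SubElement(dataqual, "logic").text = consistency_report
    ET.SubElement(dataqual, "complete").text = completeness_report
    lineage = ET.SubElement(dataqual, "lineage")
    for description, date in process_steps:
        procstep = ET.SubElement(lineage, "procstep")
        ET.SubElement(procstep, "procdesc").text = description
        ET.SubElement(procstep, "procdate").text = date
    return dataqual


# Usage with placeholder text; real reports would summarize the project's QA/QC.
element = build_dataqual(
    "Values outside the accepted range were flagged in a separate field.",
    "All habitat codes were checked against the project lookup table.",
    "One hourly observation is missing from the series.",
    [("Range check applied to water temperature.", "20171130")],
)
xml_text = ET.tostring(element, encoding="unicode")
```

Summaries of the QC checks run on a dataset are natural candidates for these report fields, so the metadata record and the in-dataset quality indicators tell the same story.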

Responsibilities for Data and Information Quality

Responsibility for quality work and work products is reflected in the Code of Conduct for Department of the Interior staff (poster), which specifically addresses ensuring the highest level of data quality in scientific and scholarly information products:

"I will be responsible for the quality of the data I use or create and the integrity of the conclusions, interpretations, and applications I make. I will adhere to appropriate quality assurance and quality control standards, and not withhold information because it might not support the conclusions, interpretations, and applications I make."

As stated in the USGS Information Quality Guidelines:

"The USGS provides unbiased, objective scientific information upon which other entities may base judgments. Since its inception in 1879, the USGS has maintained comprehensive internal and external procedures for ensuring the quality, objectivity, utility, and integrity of data, analyses, and scientific conclusions. ... Information Quality ... covers all information produced by the USGS in any medium, including data sets, web pages, maps, audiovisual presentations in USGS-published information products, or in publications of outside entities."

What the U.S. Geological Survey Manual Requires:

General Policies that apply to Data Quality within the USGS [Links Verified November 30, 2017]

USGS Fundamental Science Practices [Links Verified November 30, 2017]
