Data Management: Manage Quality
Data-quality management is a process in which protocols and methods are employed to ensure that data are properly collected, handled, processed, used, and maintained at all stages of the scientific data lifecycle.
QA and QC
- QA refers to defect prevention, whereas QC refers to defect detection and repair; a 'defect' in the Manage Quality context is any data issue that negatively affects quality or fitness for use.
- Quality Assurance Plans (QAPs) are used to set data-quality objectives and establish criteria for validation, including training, methods, equipment and software, data structure requirements, and individual value assessments.
- Use well-documented methods for data acquisition and set quality criteria for all data.
- BEFORE data collection begins, develop the data schema that defines the structure and properties of the data that will be captured or entered, edited, and stored.
- Implement data domains as lookup tables to standardize acceptable values.
- Perform periodic data assessments during the project cycle to discover errors prior to project completion.
- Data quality indicators should be stored with the data values, in separate fields, to allow potential data users to determine which values are fit for specific uses.
- Document quality management in quality assurance or data management plans for the project, as well as in the metadata record for the data.
The widely used acronyms QA (Quality Assurance) and QC (Quality Control) are often used interchangeably, but they mean very different things. QA refers to defect prevention, whereas QC refers to defect detection and repair. In a data context, a 'defect' is any data issue that negatively affects fitness for use, such as a numeric value error, an incorrect classification term, a gap in a data series, or a failed data transformation. Generally, QA is considered and applied before and during data collection or acquisition, whereas QC is applied after the data are in hand.
Quality Assurance Plans
Yes, you can plan ahead for high-quality data! A Quality Assurance Plan (QAP) is used to define the criteria and processes that will ensure and verify that data meet specific data-quality objectives throughout the Data Lifecycle. Some agencies and organizations (for example, USEPA) require a QAP as part of a research proposal, before funding a project. Like the Data Management Plan (DMP), the QAP (if it is a separate document) should be revised as needed during the project timeline to reflect the actual data workflow and activities.
Quality Assurance (QA) - Preventing Data Issues
Preventing the creation of defective data is the most effective means of ensuring the ultimate quality of your data products and the research that depends upon those data. QA refers to using written criteria, methods, and processes that will ensure the production of data that meet a specified quality standard.
Quality by Design
Having a plan for how to store, enter, edit, and manipulate data BEFORE data collection will save time and directly affect your ability to use those data. By starting with a conceptual design (or schema) of the data, you can ensure that you have considered all of the data you intend to store, the data types they represent, the relationships between different chunks of data, and the data domains that will support the primary data you collect.
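As one way to picture a schema that is designed before collection begins, the following minimal sketch uses Python's built-in sqlite3 module. The tables, column names, and allowable ranges (a stream-temperature project with a land-use lookup table) are hypothetical and chosen only for illustration; a real project would take them from its own quality objectives.

```python
import sqlite3

# Hypothetical schema: every table, column, and limit below is an
# illustrative assumption, not part of any USGS standard.
SCHEMA = """
CREATE TABLE land_use (                 -- data domain (lookup table)
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE observation (
    obs_id        INTEGER PRIMARY KEY,
    site_id       TEXT NOT NULL,
    obs_date      TEXT NOT NULL CHECK (obs_date LIKE '____-__-__'),
    water_temp_c  REAL CHECK (water_temp_c BETWEEN -5 AND 50),
    land_use_code TEXT NOT NULL REFERENCES land_use(code)
);
"""


def build_database():
    """Create an in-memory database with the schema and domain values."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO land_use VALUES (?, ?)",
        [("FOR", "Forest"), ("URB", "Urban")],
    )
    conn.commit()
    return conn


def try_insert(conn, row):
    """Insert one observation; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observation "
                "(site_id, obs_date, water_temp_c, land_use_code) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_database()
    print(try_insert(db, ("S1", "2021-06-15", 12.5, "FOR")))   # valid row
    print(try_insert(db, ("S1", "2021-06-15", 120.0, "FOR")))  # out of range
    print(try_insert(db, ("S1", "2021-06-15", 12.5, "XYZ")))   # unknown domain code
```

Because the constraints live in the schema, defective rows are rejected at entry time (QA) instead of being discovered later (QC).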
Domain Management and Reference Data
Terms used to classify or describe data elements can help or hurt the usefulness of the dataset. Data domains and reference data define the allowable values for an attribute and are often implemented as lookup tables or as drop-down boxes on forms. Terms that are descriptive (such as color and size) are relative, whereas terms that are used for classification (such as ecoregion or land use category) are more discrete.
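A data domain can be sketched in a few lines of Python as a lookup table that maps accepted spellings to one canonical term. The land-use categories and the function names below are hypothetical; the point is only that values outside the domain are reported and never silently stored.

```python
# Hypothetical domain of land-use categories, for illustration only.
# Keys are normalized (lowercase) spellings; values are canonical terms.
LAND_USE_DOMAIN = {
    "forest": "Forest",
    "urban": "Urban",
    "agriculture": "Agriculture",
    "wetland": "Wetland",
}


def standardize(term, domain=LAND_USE_DOMAIN):
    """Return the canonical domain value, or None if the term is not allowed."""
    if term is None:
        return None
    return domain.get(term.strip().lower())


def check_column(values, domain=LAND_USE_DOMAIN):
    """Split a column into standardized values and rejected (index, value) pairs."""
    clean, rejected = [], []
    for index, value in enumerate(values):
        canonical = standardize(value, domain)
        if canonical is None:
            rejected.append((index, value))
        else:
            clean.append(canonical)
    return clean, rejected


if __name__ == "__main__":
    clean, rejected = check_column(["Forest", " urban ", "Forrest", "WETLAND"])
    print(clean)
    print(rejected)
```

Rejected entries keep their original position and spelling so that a reviewer can decide whether the value is a typo or a term that should be added to the domain.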
Quality Control (QC) - Detecting and Repairing Data Issues
Quality control (QC) of data refers to the application of methods or processes that determine whether data meet overall quality goals and defined quality criteria for individual values. In order to determine whether data are 'good' or 'bad' - or to what degree they are so - one must have a set of quality goals and specific criteria against which data are evaluated. Rapid data scanning methods can be used to tag records or sets of records that meet or fail to meet a particular criterion. Remember that QC is a partner to QA, because when errors are found, a way to prevent them via QA might also be revealed.
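The idea of scanning records against defined criteria and tagging the ones that fail can be sketched as follows. The criteria, field names, and the temperature range are assumptions made up for this example; real criteria would come from the project's quality goals.

```python
def tag_records(records, criteria):
    """Return copies of the records with a 'qc_failed' list of failed criteria.

    `criteria` maps a criterion name to a function that returns True
    when a record passes that criterion.
    """
    tagged = []
    for record in records:
        failed = [name for name, passes in criteria.items() if not passes(record)]
        tagged.append({**record, "qc_failed": failed})
    return tagged


# Hypothetical criteria for a water-temperature dataset.
CRITERIA = {
    "temp_present": lambda r: r.get("water_temp_c") is not None,
    # A missing value is reported by temp_present, so it is not also
    # counted as out of range here.
    "temp_in_range": lambda r: r.get("water_temp_c") is None
    or -5 <= r["water_temp_c"] <= 50,
    "site_present": lambda r: bool(r.get("site_id")),
}

if __name__ == "__main__":
    sample = [
        {"site_id": "S1", "water_temp_c": 12.5},
        {"site_id": "S1", "water_temp_c": 75.0},
        {"site_id": "", "water_temp_c": None},
    ]
    for row in tag_records(sample, CRITERIA):
        print(row)
```

Tagging the records, not deleting them, keeps the original values available for repair and makes recurring failures visible, which in turn can suggest a QA measure that prevents them.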
Data Quality Assessment and Review
Project staff should perform periodic data assessments during the project cycle to discover errors prior to project completion. These reviews do not need to be overly complicated, but instead serve as an opportunity to keep your data management plan, quality goals and metrics, and metadata up to date, and to generate documentation about adherence to your quality plan. Data from outside sources need to be assessed for quality issues prior to use. Real-time and streaming data processes include some level of quality control.
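A periodic assessment can be as simple as a short summary that is rerun at each review. The sketch below reports two illustrative measures, gaps in a date series and per-field completeness; the function names and the daily time step are assumptions for this example only.

```python
from datetime import date, timedelta


def find_gaps(dates, step=timedelta(days=1)):
    """Return (first_missing, last_missing) pairs for gaps in a date series."""
    ordered = sorted(set(dates))
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > step:
            gaps.append((previous + step, current - step))
    return gaps


def completeness(records, fields):
    """Return the fraction of records with a non-missing value for each field."""
    total = len(records)
    return {
        field: (sum(1 for r in records if r.get(field) is not None) / total)
        if total
        else 0.0
        for field in fields
    }


if __name__ == "__main__":
    observed = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5)]
    print(find_gaps(observed))
    rows = [{"a": 1, "b": None}, {"a": 2, "b": 3}]
    print(completeness(rows, ["a", "b"]))
```

Saving the output of each run alongside the review date gives a simple record of adherence to the quality plan over the life of the project.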
Using Data Quality Indicators
The quality of individual measurement or observation data should not be hidden in metadata or documentation associated with a dataset. Rather, indicators of quality or usability can and should be stored with the data themselves in separate fields or columns. That allows potential data users to avoid validating unusual data that have already been justified, and to determine which values are fit for specific uses.
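To illustrate storing quality indicators in their own column next to the values they describe, here is a minimal sketch using Python's csv module. The flag codes (A, V, Q, R), their meanings, and the column names are hypothetical; a project would define and document its own codes.

```python
import csv
import io

# Hypothetical flag codes; a real project would document its own set.
FLAGS = {
    "A": "accepted",
    "V": "unusual value, verified as correct",
    "Q": "questionable",
    "R": "rejected",
}

ROWS = [
    {"site_id": "S1", "water_temp_c": "12.5", "temp_flag": "A"},
    {"site_id": "S1", "water_temp_c": "31.0", "temp_flag": "V"},
    {"site_id": "S2", "water_temp_c": "48.9", "temp_flag": "Q"},
]


def write_csv(rows):
    """Write the values and their quality flags as adjacent CSV columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["site_id", "water_temp_c", "temp_flag"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fit_for_use(csv_text, allowed_flags):
    """Return the temperatures whose flag is in the set the user accepts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        float(row["water_temp_c"])
        for row in reader
        if row["temp_flag"] in allowed_flags
    ]


if __name__ == "__main__":
    text = write_csv(ROWS)
    print(text)
    print(fit_for_use(text, {"A", "V"}))
```

In this sketch the "V" flag tells a later user that the unusual value of 31.0 has already been checked, so it does not need to be validated again, while each user can still choose which flags are acceptable for a given analysis.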
Documenting Data Quality
Describing your data, like managing quality, is a cross-cutting element of the USGS Science Data Lifecycle. In addition to using data quality indicators within your dataset, quality-management documentation may take the form of a QAP or sections within the DMP about specific quality goals and criteria, along with any quality assessment summaries and notes on massaging data to meet the content needs of your project. The FGDC metadata standard includes sections specifically reserved for Data Quality Information.
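As a rough illustration of what a Data Quality Information section can hold, the sketch below assembles a small XML fragment with Python's xml.etree.ElementTree. The element short names (dataqual, attracc, attraccr, logic, complete, lineage, procstep, procdesc, procdate) follow the FGDC CSDGM naming as understood here and should be checked against the standard before use; the function name and all text values are invented placeholders, and a complete record requires many more elements.

```python
import xml.etree.ElementTree as ET


def build_dataqual(attribute_accuracy, logic, completeness, steps):
    """Build a partial data-quality fragment from plain-text summaries.

    `steps` is a list of (description, date) pairs, with dates written
    as YYYYMMDD strings. This is a sketch, not a complete or validated
    metadata record.
    """
    root = ET.Element("dataqual")
    attracc = ET.SubElement(root, "attracc")
    ET.SubElement(attracc, "attraccr").text = attribute_accuracy
    ET.SubElement(root, "logic").text = logic
    ET.SubElement(root, "complete").text = completeness
    lineage = ET.SubElement(root, "lineage")
    for description, step_date in steps:
        procstep = ET.SubElement(lineage, "procstep")
        ET.SubElement(procstep, "procdesc").text = description
        ET.SubElement(procstep, "procdate").text = step_date
    return root


if __name__ == "__main__":
    # All values below are invented placeholders.
    fragment = build_dataqual(
        "Temperatures outside the expected range were flagged for review.",
        "Land-use codes were checked against the project lookup table.",
        "Two days of record are missing from the series.",
        [
            ("Applied range and domain checks.", "20210615"),
            ("Reviewed flagged values with field staff.", "20210701"),
        ],
    )
    print(ET.tostring(fragment, encoding="unicode"))
```

Writing quality-assessment summaries into the metadata in this way keeps them with the dataset, alongside any per-value indicators stored in the data themselves.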
Responsibilities for Data and Information Quality
Responsibilities for quality work and work products are reflected within the Code of Conduct for Department of the Interior staff, specifically to ensure the highest level of data quality in scientific and scholarly information products:
"I will be responsible for the quality of the data I use or create and the integrity of the conclusions, interpretations, and applications I make. I will adhere to appropriate quality assurance and quality control standards, and not withhold information because it might not support the conclusions, interpretations, and applications I make."
As stated in the USGS Information Quality Guidelines:
"The USGS provides unbiased, objective scientific information upon which other entities may base judgments. Since its inception in 1879, the USGS has maintained comprehensive internal and external procedures for ensuring the quality, objectivity, utility, and integrity of data, analyses, and scientific conclusions. ... Information Quality ... covers all information produced by the USGS in any medium, including data sets, web pages, maps, audiovisual presentations in USGS-published information products, or in publications of outside entities."
What the U.S. Geological Survey Manual Requires:
General Policies that apply to Data Quality within the USGS [Links Verified November 30, 2017]
USGS Fundamental Science Practices [Links Verified November 30, 2017]