Metadata describe information about a dataset, such that a dataset can be understood, re-used, and integrated with other datasets. Information described in a metadata record includes where the data were collected, who is responsible for the dataset, why the dataset was created, and how the data are organized. Metadata generally follow a standard format, making it easier to compare datasets and to transfer files electronically.
Why Do We Need Metadata?
- Data are not complete without a metadata record.
- Use metadata to understand and re-use data.
- Document everything about the data in the metadata record.
- Use mandated Federal metadata standards and tools to create metadata.
- Validate metadata to ensure they follow metadata standards.
- Share metadata with catalogs to improve discovery and access to the data.
- Metadata are an important component of a USGS data release.
Metadata are crucial for any use or reuse of data. No one can responsibly re-use or interpret data without accompanying metadata that explain how and why the dataset was created, where the data are geographically located, and how the data are structured and what they mean.
There are many uses for metadata, even beyond the simple discovery of datasets. Metadata can be used for understanding data, analysis and synthesis, maintaining longevity of a dataset for an organization, tracking the progress of a research project, and demonstrating the return on investment for research at an institution.
For more information about metadata as it pertains to the USGS data release process, visit Metadata for Scientific Data FAQs.
How to create a metadata record:
- Getting started
- Creating metadata records
- Validating metadata records
- My metadata is created, what’s next?
1. Getting started
Gather content for the metadata record
- Understand what goes into a metadata record (e.g., title, abstract, methods, and keywords).
- Use the Metadata Questionnaire [PDF] or Metadata in Plain Language to gather content for a metadata record, or use a metadata creation tool, which will ask you the same questions about your data.
What does a metadata record look like?
Federal agencies are mandated by Executive Order 12906 to use metadata standards endorsed by the Federal Geographic Data Committee (FGDC), such as the FGDC Content Standard for Digital Geospatial Metadata (FGDC-CSDGM) and the International Organization for Standardization (ISO) 19115 series of standards.
Both FGDC-CSDGM and ISO require metadata to be formatted in Extensible Markup Language (XML), although a stylesheet can be applied to the XML to make it easier to read. Learn more about XML for Advanced Users.
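To make the nested XML structure more concrete, the following Python sketch builds a few elements of the identification section of an FGDC-CSDGM record with the standard library's xml.etree.ElementTree. It assumes the CSDGM short element names (metadata, idinfo, citeinfo, and so on) and uses placeholder values. A complete record needs many more elements (for example, time period, keywords, and the metadata reference section), so treat this as an illustration of the nesting, not as a usable record. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_skeleton(title: str, origin: str, pubdate: str,
                   abstract: str, purpose: str) -> ET.Element:
    """Build a partial FGDC-CSDGM record containing a citation and a description."""
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    # Citation elements in the order CSDGM lists them: originator, date, title.
    ET.SubElement(citeinfo, "origin").text = origin
    ET.SubElement(citeinfo, "pubdate").text = pubdate
    ET.SubElement(citeinfo, "title").text = title
    # Description: what the data are (abstract) and why they were collected (purpose).
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    ET.SubElement(descript, "purpose").text = purpose
    return metadata


# Placeholder content for illustration only.
record = build_skeleton(
    title="Example dataset title",
    origin="Example originator",
    pubdate="2020",
    abstract="A short description of what the data are.",
    purpose="Why the data were collected.",
)
ET.indent(record)  # add line breaks and indentation (Python 3.9+)
print(ET.tostring(record, encoding="unicode"))
```

A metadata creation tool builds this structure for you; the sketch only shows why the record reads as nested sections when viewed without a stylesheet.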
Examples of FGDC-CSDGM metadata records are available for different types of information products. Each record can be viewed in its native XML or with a stylesheet applied for easier reading.
The following is an example of a metadata record in ISO 19115-2. Note that it may contain only some sections of the ISO standard.
- Alaska Data Integration Working Group (ADIwg) [XML]
2. Creating metadata records
The following free tools create or edit FGDC-CSDGM metadata in XML. For a wider selection of tools, see the FGDC Metadata Tools. For a list of tools for the ISO metadata standard, refer to the FGDC ISO Metadata Editor Review.
- USGS Online Metadata Editor (OME) - An online form that lets USGS staff create FGDC-CSDGM metadata by answering simple questions about their data. Suitable for both biological and non-biological datasets. Log in to start new records or to upload and edit existing ones. Save completed or in-progress records to finish later, or download them directly to your computer.
- USGS MetadataWizard - A Python toolbox in Esri ArcGIS Desktop for creating FGDC-CSDGM metadata for geospatial datasets. Through a semi-automated workflow, the tool ingests geospatial files and creates and updates metadata records in Esri’s 10.x software. Best for geospatial data (e.g., raster and shapefiles) and tabular data (e.g., Esri geodatabase or database file). Comma-separated values (CSV) files can be used but must first be converted into an Esri format.
- USGS MetadataWizard 2.x - A cross-platform desktop application, modeled on the original MetadataWizard, for creating FGDC-CSDGM metadata. This version has no Esri dependencies and supports additional tabular data file formats.
- USGS TKME - A Windows tool for creating FGDC-CSDGM metadata that can be configured for the Biological Data Profile and other extensions. The program is closely aligned with the Metadata Parser and can be configured for French and Spanish.
- USDA Metavist - A desktop editor for creating FGDC-CSDGM geospatial metadata. Includes the Biological Data Profile (version 1.6). Produced and maintained by the USDA Forest Service. Download the USGS Alaska Science Center (ASC) Metavist User Guide [PDF] to learn more about the tool and ASC best practices for authors.
- Microsoft XML Notepad - A simple, intuitive interface for browsing and editing XML files. It does not produce FGDC-CSDGM records automatically, but it allows easy editing and validation of existing metadata records. See Advanced Users to learn how to configure this tool.
Tips for creating metadata records
- Gather all information together, especially if several people hold information you need.
- Use information that has already been developed.
- Re-use text from grant or funding proposals (e.g., abstract, purpose, and dates).
- Reference the data dictionary that was used during data collection and processing to complete the Entity & Attribute section of a CSDGM metadata record.
- Choose a descriptive title for your dataset that incorporates who, what, where, why, and scale.
- Example: Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps between 1961-1983
- Choose keywords wisely: Consider all of the possible interpretations of your word choices and use a thesaurus to add descriptive terms you may not have otherwise selected.
- Placement of the DOI for the dataset in a CSDGM metadata record
- The DOI should go in the primary <onlink> in the Citation Information section.
- Enter the DOI as a URL in the format https://doi.org/10.5066/ABCD123, not in the format doi:10.5066/ABCD123. If the DOI is not entered as a URL, catalogs such as the USGS Science Data Catalog and Data.gov will reject your metadata record.
- Placement of the DOI for the related publication in a CSDGM record
- The related publication is usually cited as a Larger Work Citation in the metadata. The Larger Work Citation has its own <onlink> field, and this is the correct location for the publication's DOI.
- Enter the DOI as a URL in the format https://doi.org/10.3133/ABCD123, not in the format doi:10.3133/ABCD123. If the DOI is not entered as a URL, catalogs such as the USGS Science Data Catalog and Data.gov will reject your metadata record.
- Include as many details as you can in the metadata record for future users of the data.
- Review your metadata for completeness and accuracy.
- Ask someone unfamiliar with the project to review your metadata objectively.
- Check for clarity and omissions.
- Use the best practices described in the Systems Level Applications or Collections [PDF] for large data systems or when describing "collections" of datasets.
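The two DOI placement rules above can be sketched in Python as follows. This is a minimal illustration that assumes the CSDGM element paths idinfo/citation/citeinfo for the dataset citation and lworkcit/citeinfo for the Larger Work Citation. The DOIs are the placeholder values from the text, and the helper names doi_to_url and set_onlink are hypothetical. Because CSDGM places Online Linkage (<onlink>) before Larger Work Citation (<lworkcit>) within a citation, the sketch inserts a new <onlink> ahead of any existing Larger Work Citation.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading "doi:" label or an existing doi.org URL prefix.
_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return a DOI in the https://doi.org/ URL form that catalogs expect."""
    bare = _DOI_PREFIX.sub("", doi.strip())
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + bare


def set_onlink(citeinfo: ET.Element, doi: str) -> None:
    """Write the DOI URL into the first (primary) <onlink> of a <citeinfo> element."""
    onlink = citeinfo.find("onlink")
    if onlink is None:
        onlink = ET.Element("onlink")
        lworkcit = citeinfo.find("lworkcit")
        if lworkcit is not None:
            # Keep CSDGM order: <onlink> comes before <lworkcit>.
            citeinfo.insert(list(citeinfo).index(lworkcit), onlink)
        else:
            citeinfo.append(onlink)
    onlink.text = doi_to_url(doi)


# Placeholder record with a dataset citation and a Larger Work Citation.
record = ET.fromstring(
    "<metadata><idinfo><citation><citeinfo>"
    "<title>Example dataset</title>"
    "<lworkcit><citeinfo><title>Example publication</title></citeinfo></lworkcit>"
    "</citeinfo></citation></idinfo></metadata>"
)
dataset_cite = record.find("idinfo/citation/citeinfo")
set_onlink(dataset_cite, "doi:10.5066/ABCD123")  # dataset DOI
set_onlink(dataset_cite.find("lworkcit/citeinfo"), "doi:10.3133/ABCD123")  # publication DOI
print(ET.tostring(record, encoding="unicode"))
```

Normalizing the DOI in one function means a record cannot end up with the doi: form in one citation and the URL form in another.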
3. Validating metadata records
You must validate metadata to ensure that the record has been created properly and that all required elements have been filled in. Validation compares the XML metadata record against the metadata standard to confirm that the record conforms to the structure of the standard. For FGDC-CSDGM metadata, see the best practices in Checking Metadata with Data [PDF]. Many metadata creation and editing tools (such as OME and MetadataWizard) validate automatically, so a separate validation may not be necessary.
- USGS Metadata Parser – A tool that validates XML metadata records against the FGDC-CSDGM standard and reports any errors. Suitable for geospatial and non-geospatial datasets. Users can view XML metadata records in easy-to-read formats (HTML, text). It is multilingual (English, French, and Spanish) and can be configured for the Biological Data Profile and other extensions. Advanced users can learn how to Run MP from the Command Line window [PDF].
- Microsoft XML Notepad – The tool offers the ability to validate records but requires a schema package. See Advanced Users to validate metadata.
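Before running a full validator, a short script can flag elements that are missing or empty. The Python sketch below checks a record against a brief, hypothetical checklist of commonly required CSDGM elements. It does not test element order, data types, or conditional ("mandatory if applicable") requirements, so it is not a substitute for the USGS Metadata Parser or schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, partial checklist of CSDGM element paths. This is NOT the
# full standard; use the Metadata Parser for the authoritative check.
REQUIRED_PATHS = [
    "idinfo/citation/citeinfo/origin",
    "idinfo/citation/citeinfo/pubdate",
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
    "idinfo/timeperd",
    "idinfo/status",
    "idinfo/keywords",
    "idinfo/accconst",
    "idinfo/useconst",
    "metainfo/metd",
    "metainfo/metc",
    "metainfo/metstdn",
    "metainfo/metstdv",
]


def missing_elements(xml_text: str) -> list[str]:
    """Return the checklist paths that are absent or empty in the record."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED_PATHS:
        elem = root.find(path)
        # An element counts as filled if it has non-blank text or child elements.
        if elem is None or (not (elem.text or "").strip() and len(elem) == 0):
            missing.append(path)
    return missing


# Placeholder record: the title is filled in, the originator is blank.
sample = (
    "<metadata><idinfo><citation><citeinfo>"
    "<title>Example dataset</title><origin> </origin>"
    "</citeinfo></citation></idinfo></metadata>"
)
for path in missing_elements(sample):
    print("missing or empty:", path)
```

The list[str] annotation requires Python 3.9 or later.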
4. My metadata is created, what’s next?
- USGS policy requires a formal review of the data and metadata if intended as a USGS data release.
- Package your data and metadata together whenever possible since the metadata record is critical to understanding the data.
- Work with your organization to identify how metadata should be shared, or visit Publish and Share for more information. Sharing metadata improves the discoverability, accessibility, and reuse of the data. The USGS Science Data Catalog is the approved mechanism for serving USGS metadata to catalogs such as data.doi.gov, data.gov, and geoplatform.gov.
Advanced Users
Microsoft XML Notepad - An XML editor that can help you create and edit metadata records directly in XML. The software is free to download but available only for Windows.
- Instructions for using XML Notepad [PDF]
- Sample Starter Template [XML] - A starter metadata record that can be filled in with content.
- MetadataWizard Stylesheet [XSL]: Use XML Notepad with this stylesheet to display metadata in an easy-to-read form. See Section 5 of the PDF "Instructions for using XML Notepad."
- Find and Correct Errors: Use a schema package to check that the metadata record is correct according to the FGDC-CSDGM standard. After downloading the schemas, configure XML Notepad to point to their location on your local computer. The schemas help identify some errors, but you must still run a validation tool on the final metadata record.
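Outside XML Notepad, a record can also be associated with a stylesheet through an xml-stylesheet processing instruction, which some web browsers use to render the XML with the stylesheet applied (browsers may restrict stylesheets loaded from local files). The Python sketch below adds that instruction right after the XML declaration. The function name add_stylesheet_pi and the file name metadata_stylesheet.xsl are hypothetical, and the href value is assumed not to contain double quotes.

```python
def add_stylesheet_pi(xml_text: str, href: str) -> str:
    """Insert an xml-stylesheet processing instruction into an XML document string."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # leave an existing stylesheet reference untouched
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    if xml_text.startswith("<?xml "):
        # The XML declaration must come first, so the PI goes right after it.
        decl_end = xml_text.index("?>") + 2
        return xml_text[:decl_end] + "\n" + pi + xml_text[decl_end:]
    return pi + "\n" + xml_text


# Placeholder record text for illustration.
record_xml = '<?xml version="1.0" encoding="UTF-8"?>\n<metadata><idinfo/></metadata>'
print(add_stylesheet_pi(record_xml, "metadata_stylesheet.xsl"))
```

The stylesheet only changes how the record is displayed; the XML content, and therefore its validity, is unchanged.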
EML to CSDGM-BDP Transform [XSL] - This transform converts metadata in the Ecological Metadata Language (EML) standard to the FGDC-CSDGM Biological Data Profile. After the transformation, validate the metadata record and check that the content was transferred adequately.
What the U.S. Geological Survey Manual Requires:
The USGS Survey Manual chapter SM 502.7 Fundamental Science Practices: Metadata for USGS Scientific Information Products including Data provides metadata requirements for USGS scientific information products and scientific data that are Bureau-approved for release.
SM 502.7 further specifies that metadata must accompany all USGS scientific data and other information products. Metadata records are to be developed in a standardized way that enables users to understand the context and to evaluate the usefulness of the data or information product. Metadata records for scientific data must comply with standards such as the FGDC Content Standard for Digital Geospatial Metadata, the International Organization for Standardization suite of standards, or other USGS-endorsed FGDC standards. A minimum of one metadata review by a qualified reviewer is required for all USGS scientific data and other information products approved for release.
The USGS Survey Manual chapter SM 502.8 Fundamental Science Practices: Review and Approval of Scientific Data for Release discusses when metadata requirements apply for release of scientific data.
SM 502.8 further specifies that scientific data approved for release must comply with the metadata requirements described in SM 502.7, and that the metadata must be deposited in and shared through the USGS Science Data Catalog. Reviews of the data and the associated metadata are required, and these reviews must be documented in the internal USGS Information Product Data System (IPDS).
For additional guidance, please refer to the Fundamental Science Practices FAQ: Metadata for USGS Scientific Data.