USGS Data Management

Data Citation

Data citation is the practice of citing a dataset in the same way that books or journal articles are referenced in research publications. Historically, datasets were cited inconsistently as sources in their own right, or not cited at all. However, researchers and institutions are beginning to recognize the importance of data citation as more data producers cite other people's datasets as well as their own. In general, data citation is a good practice that benefits researchers, data repositories and stewards, the scientific community, and the general public.

Why Cite Your Data?

Key Points

  • Data citation is the emerging practice of giving a reference to a specific dataset.
  • Gives credit and accountability to the data producer and the data steward.
  • Reduces the danger of plagiarism.
  • Allows others to easily locate and access the dataset.
  • Increases the chance of discovery and potential reuse of the dataset.
  • Allows the impact of the dataset to be easily tracked through publications.
  • Main components of a dataset citation are the author(s), year, title, archive/distributor, access date, version number, and a persistent identifier or locator.
  • Examples of persistent identifiers are UUID, OID, and LSID.
  • Examples of locators are URL addresses, directories, or registered locators (DOIs, ARKs, Handles, PURLs, or XRIs).

Data citation is important for a number of reasons. First, citing datasets gives the researcher proper credit and serves as recognition of scholarly effort. It also gives credit to the data stewards and repositories that manage the data, presumably for the long term. Data citation also creates accountability for creators and stewards of the dataset and reduces the danger of plagiarism once the dataset itself has been properly cited.

Second, data citation allows others to more easily locate and access a researcher's dataset for the purposes of replicating or verifying their results, which is good scientific practice. Additionally, easy location and access can facilitate discovery and encourage possible reuse of the dataset.

Lastly, the practice of data citation creates a formalized system of recognition and reward for data producers by treating datasets as citable contributions to the scientific community. Data citation allows the impact of a dataset to be tracked through the publications that cite it. Citing data formally in publications can increase the transparency of data production and encourage the production of more high-quality datasets.

Data Citation Standards

To support proper data citation, several institutions and organizations have created standards for citing datasets. The mechanics of citing datasets are generally similar to the citation of journal articles and other publications. The author(s), year, title, archive/distributor, and access date are the most obvious components of a data citation.

However, datasets can be more difficult to cite because they can be more dynamic in terms of content and version. For example, a dataset can consist of multiple versions of the raw data, or it can be part of a larger dataset. The dataset itself can change over time as researchers modify or add more data. Therefore, a dataset needs a persistent identifier or locator that can be added to the citation in order to better track the dataset.

Persistent Identifiers and Locators

Datasets should have an identifier and a locator. A persistent identifier is a unique, Web-compatible alphanumeric code that points to a specific dataset that will be preserved for the long term. A dataset identifier names the dataset itself, for example by its title, file name, or an object ID code. Examples of identifiers are UUID (Universally Unique Identifier), OID (Object Identifier), and LSID (Life Sciences Identifier).

A dataset locator helps find the location of the dataset. Examples of locators are URL addresses, directories, or registered locators. A registered locator is a unique code that points to the specific dataset, which is usually stored separately from the metadata. Examples of data locators are DOI (Digital Object Identifier), ARK (Archival Resource Key), Handles, URLs, PURL, and XRI. See Preserve > Persistent Identifiers for more information.
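As a rough illustration of the distinction above, the following Python sketch labels a string as a DOI, a URL, or a UUID and turns a DOI into a resolvable link. The function names `classify` and `doi_to_url` and the DOI pattern are assumptions made for this example, not part of any USGS tool, and the pattern is deliberately loose.

```python
import re
import uuid

# Assumption: a loose pattern for modern DOIs ("10.", registrant code, "/", suffix).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def _strip_doi_prefix(value: str) -> str:
    """Remove surrounding whitespace and an optional leading 'doi:' label."""
    text = value.strip()
    if text.lower().startswith("doi:"):
        text = text[4:]
    return text


def classify(value: str) -> str:
    """Return a rough label: 'doi', 'url', 'uuid', or 'unknown'."""
    text = _strip_doi_prefix(value)
    if DOI_PATTERN.match(text):
        return "doi"
    if text.lower().startswith(("http://", "https://")):
        return "url"
    try:
        uuid.UUID(text)  # raises ValueError if the string is not a UUID
        return "uuid"
    except ValueError:
        return "unknown"


def doi_to_url(doi: str) -> str:
    """Turn a bare or 'doi:'-prefixed DOI into an https://doi.org link."""
    return "https://doi.org/" + _strip_doi_prefix(doi)


print(classify("doi:10.3334/NSIDC/gla01"))
print(doi_to_url("doi:10.3334/NSIDC/gla01"))
```

A registered locator such as a DOI stays stable even if the underlying URL changes, which is why the sketch resolves it through the doi.org proxy rather than storing a direct link.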

Best Practices

Example Data Citations for USGS Released Data

Moschetti, M.P., 2017, Database of earthquake ground motions from 3-D simulations on the Salt Lake City segment of the Wasatch fault zone, Utah: U.S. Geological Survey data release,

McLeod, J.M., Jelks, Howard, Pursifull, Sandra, and Johnson, N.A., 2016, Characterizing the early life history of an imperiled freshwater mussel (Ptychobranchus jonesi): U.S. Geological Survey data release,

Barber, L.B., Weber, A.K., LeBlanc, D.R., Hull, R.B., Sunderland, E.M., and Vecitis, C.D., 2017, Poly- and perfluoroalkyl substances in contaminated groundwater, Cape Cod, Massachusetts, 2014-2015 (ver. 1.1, March 24, 2017): U.S. Geological Survey data release,

Example Data Citation for Non-USGS Data

The following example of a dataset citation is from the Earth Science Information Partners (ESIP).

Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, J. Minster, J. Spinhirne, and R. Thomas. 2003. GLAS/ICESat L1A Global Altimetry Data V018, 15 October to 18 November 2003. National Snow and Ice Data Center. dataset accessed 2011-07-21 at doi:10.3334/NSIDC/gla01.

  • A typical data citation format:
    • Core required elements of data citation:
      • Author or Principal Investigator
        • The data creator.
      • Release Date/Year of publication
        • The year of release for a completed dataset.
      • Title of data source
        • The formal title, which should generally describe the dataset.
      • Version/Edition number
        • The version of the dataset used in the publication.
      • Archive and/or distributor
        • The organization that manages the data, ideally over a long period of time.
      • Locator/identifier
      • Access date and time
        • An indication of when the data was accessed, because data can be changed or modified over time.
    • Other elements that can be included if relevant:
      • Format of the data
      • 3rd party producer
      • Subset of the data used
      • Editor or contributor
      • Publication place
      • Data within a larger work
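
The core elements listed above can be assembled into a citation string programmatically. The sketch below is one possible layout, loosely modeled on the ESIP example; the class name `DataCitation`, its field names, and the element order are assumptions for illustration, and the style required by an editor or publisher takes precedence.

```python
from dataclasses import dataclass
from typing import List, Optional


def _sentence(text: str) -> str:
    """End a citation element with exactly one period."""
    return text.rstrip(".") + "."


@dataclass
class DataCitation:
    authors: str
    year: int
    title: str
    archive: str
    locator: str
    version: Optional[str] = None
    accessed: Optional[str] = None  # access date, e.g. "2011-07-21"

    def missing_core_elements(self) -> List[str]:
        """List the core elements that have not been filled in."""
        core = ("authors", "year", "title", "version",
                "archive", "locator", "accessed")
        return [name for name in core if not getattr(self, name)]

    def format(self) -> str:
        """Join the available elements in a fixed, illustrative order."""
        parts = [_sentence(self.authors), f"{self.year}.", _sentence(self.title)]
        if self.version:
            parts.append(_sentence(f"Version {self.version}"))
        parts.append(_sentence(self.archive))
        if self.accessed:
            parts.append(_sentence(f"Accessed {self.accessed} at {self.locator}"))
        else:
            parts.append(_sentence(self.locator))
        return " ".join(parts)


# Values taken from the ESIP example; the version is already part of the title.
example = DataCitation(
    authors=("Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, "
             "J. Minster, J. Spinhirne, and R. Thomas"),
    year=2003,
    title=("GLAS/ICESat L1A Global Altimetry Data V018, "
           "15 October to 18 November 2003"),
    archive="National Snow and Ice Data Center",
    locator="doi:10.3334/NSIDC/gla01",
    accessed="2011-07-21",
)
print(example.format())
print(example.missing_core_elements())
```

Keeping the elements as separate fields, rather than as one string, makes it easier to re-render the same citation in a different publisher style and to check which core elements are still missing.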

  • Best Practices to Support Data Citation
    • Assign persistent identifiers to your datasets.
      • If possible, assign a new identifier to each new version of the dataset.
    • Use applications that support metadata creation for your dataset.
      • Good metadata associated with a dataset is important for access and potential reuse.
      • Examples of metadata applications:
      • See Describe > Metadata under "Tools" for more information.
    • Use standardized keywords that describe your data.
    • Archive the dataset with journal publishers and data repositories during the publication process.
    • When citing a dataset in a paper:
      • Use the citation style required by the editor or publisher. If there is no standard, follow a typical format and adapt it to match the style for textual publications.
      • Notify the data repository that holds the dataset so they can add a link to the dataset in your paper.
    • Encourage other data producers to cite their datasets and make their data available for reuse.
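
The practice of giving each dataset version its own identifier can be illustrated with a small Python sketch. In practice, identifiers such as DOIs are assigned through a registration service; the function `version_identifier` and the example.org locator below are hypothetical and only demonstrate the principle that each version maps to a distinct, stable identifier.

```python
import uuid


def version_identifier(dataset_locator: str, version: str) -> uuid.UUID:
    """Derive a repeatable UUID from a dataset locator and a version label.

    uuid5 is name-based: the same inputs always yield the same UUID,
    and a different version label yields a different one.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{dataset_locator}#version={version}")


# Hypothetical locator, used only for illustration.
base = "https://example.org/datasets/groundwater-survey"
v1_0 = version_identifier(base, "1.0")
v1_1 = version_identifier(base, "1.1")
print(v1_0 != v1_1)  # each version gets its own identifier
```

Because each version has its own identifier, a citation can point to exactly the data used in a publication even after the dataset has been revised.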


  • USGS - Digital Object Identifier Creation Tool
  • Description:
    USGS Core Science Analytics, Synthesis, and Libraries and DOE’s Oak Ridge National Laboratory Mercury Consortium established a Digital Object Identifier service for USGS and Mercury data/metadata projects. The DOI creation tool is offered through the California Digital Library's UC3 EZID, which enables producers of digital objects to obtain and manage persistent identifiers for their digital content.

Recommended Reading