FSP FAQs: Release of Scientific Data
Updated February 2018
Note: Terms used in these FAQs will be referred to by their acronyms (in parentheses) as follows: U.S. Geological Survey (USGS), Survey Manual (SM), Office of Science Quality and Integrity (OSQI), Fundamental Science Practices (FSP), Bureau Approving Officials (BAOs), Office of Science and Technology Policy (OSTP), Office of Management and Budget (OMB), Information Product Data System (IPDS), National Water Information System (NWIS), Freedom of Information Act (FOIA), Data Management Plan (DMP).
These frequently asked questions (FAQs) supplement SM 502.8 and apply to USGS scientific data only. Additional guidance and information on release of data products are available at https://www.usgs.gov/datamanagement/share/datarelease.php.
Updates and additions to these FAQs will be posted as they occur and will be marked with the month and year of the change. Questions about FSP policies and procedures that are not addressed here should be directed to the FSP Advisory Committee or to a BAO in the OSQI.
- What are the review and approval requirements for releasing scientific data to the public?
- What Federal Government policies require the release of scientific data and how does the USGS intend to meet these requirements?
- How do I cite and reference the data supporting my publication? (Revised February 2018)
- What is a USGS author's obligation when data collected by an outside source are used (with permission) in a USGS information product and have not been publicly released by the data collector, and who is responsible for releasing the data?
- If a non-USGS lead author does not release data collected by using Federal funds, is the USGS coauthor responsible for providing public access to those data?
- Who owns the data collected during research or produced as an information product on behalf of the USGS?
- What are some examples of a dataset and a database?
- What outlets are available for releasing data?
- Can I log in and enter data into the USGS ScienceBase before those data have been approved for release?
- How are raw data handled?
- What about using non-Federal data repositories to provide or host the required public access to my data?
- Where do I get a Digital Object Identifier (DOI) for USGS data that will be released?
- What are the policy requirements for USGS authors regarding use of "unpublished data" or "personal communication" (including written, oral, or verbal communication) when citing data used to support scholarly publications?
- Where can I find additional guidance related to releasing USGS scientific data?
1. What are the review and approval requirements for releasing scientific data to the public?
Data intended for public release are subject to USGS FSP review, approval, and release requirements. These requirements include one data review and one metadata review, followed by Bureau approval documented in the IPDS as described in SM 502.8. Data are never placed in the IPDS; only the documentation of the required data review and metadata review, and of any necessary reconciliation, is placed in the IPDS as part of the approval package. Data are approved for release by Science Center Directors or their designees. USGS scientific data are considered noninterpretive. However, any scholarly publication associated with the data that describes the process used to create the data must be peer reviewed and approved by BAOs in the OSQI if it is interpretive and previously unpublished (refer to SM 205.18). Additional information about USGS scientific data is available at https://www2.usgs.gov/fsp/interpretive_definitions_and_examples.asp.
2. What Federal Government policies require the release of scientific data and how does the USGS intend to meet these requirements?
The OSTP's February 22, 2013, memorandum "Increasing Access to the Results of Federally Funded Scientific Research" requires all Federal Government agencies with a research budget greater than $100 million to develop and implement a plan to support increased public access to the results of federally funded research. Agencies must ensure that the public can read, download, and analyze, in digital form, final peer-reviewed manuscripts or final published documents within a timeframe that is appropriate for each type of research conducted or sponsored by the agency. Further, OSTP requires that digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding be stored and be publicly accessible for search, retrieval, and analysis. Additionally, the OMB's May 9, 2013, memorandum M-13-13, "Open Data Policy: Managing Information as an Asset," requires agencies to collect or create information in a way that supports downstream information processing and dissemination activities, including using machine-readable and open formats, data standards, and common core and extensible metadata for all new information creation and collection efforts. For the requirements for review and approval of USGS scientific data prior to release, refer to SM 502.8.
The Bureau's Web page "Public Access to Results of Federally Funded Research at the U.S. Geological Survey" describes how the Bureau intends to meet these OSTP and OMB requirements and includes a link to the USGS Public Access Plan. The USGS Public Access Plan requires that the digital data upon which scholarly conclusions in USGS-funded publications are based be made available no later than the time those conclusions are published, in conformance with applicable USGS Data Management policies (https://www2.usgs.gov/datamanagement/policyreferences.php).
3. How do I cite and reference the data supporting my publication?
Include one or more introductory statements and an in-text citation in the body of the publication, and a complete bibliographic reference for the data source in the references section. See the detailed guidance for "Data Associated with a Publication" and the example data citations in the section titled "Citing Your Data" on the USGS Data Management website.
4. What is a USGS author's obligation when data collected by an outside source are used (with permission) in a USGS information product and have not been publicly released by the data collector, and who is responsible for releasing the data?
Data used in USGS science information products should be made widely available to help ensure the accuracy, validity, and reproducibility of the scientific results. For the data associated with their research, USGS scientists must ensure that it is properly documented how the data were collected, where the data will reside, who will release the data, and how the data will be released. This data release information must be described in the DMPs that are included in the associated research project plans (refer to guidance on developing DMPs). If the party collecting the data is another Federal agency, that agency has the primary responsibility for releasing the data according to its own requirements. Refer to https://www2.usgs.gov/fsp/guide_to_datareleases.asp#publishing for six scenarios that describe data release obligations according to the roles of USGS scientists and project funding arrangements.
Provisions for handling proprietary data and information, that is, data that cannot be released to the public for specific reasons, are found in SM 502.5. Before signing any cooperative or collaborative agreement to use proprietary data, the author must ensure that data release is discussed with the data collector; decisions about releasing these data should also be reflected in the DMP.
Data that are part of a USGS science information product or that are used in interpretive work are subject to FOIA requests. If the data are considered Federal records, the USGS must comply with the requirements for responding to FOIA requests (refer to https://www2.usgs.gov/foia/). Contact the USGS FOIA Officer for additional guidance.
5. If a non-USGS lead author does not release data collected by using Federal funds, is the USGS coauthor responsible for providing public access to those data?
The OSTP and OMB requirements apply to data collected by using Federal funds. Regardless of authorship, if the research was federally funded, the funding agency is responsible for providing public access to those data. If the research was not federally funded, the non-USGS lead author has discretion over releasing the data to the public. However, the "new normal" throughout most of the scientific publishing community is to release the data upon which scholarly conclusions are based. Major publishers and journals, including Science, Nature, the American Geophysical Union, Elsevier, and Wiley, require access to those data as a condition of publication.
6. Who owns the data collected during research or produced as an information product on behalf of the USGS?
Data collected on behalf of the USGS or by using USGS funds belong to the USGS and not to the individual who collected the data (for example, a USGS employee, student, emeritus or other volunteer, or contractor). If a USGS employee is under contract with a cooperator to collect data funded by the cooperator, the DMP should specify the data ownership, the distribution rights for USGS use of the data, the data preservation responsibilities, and the party responsible for providing the data to the public (refer to SM 502.6 and to guidance on developing DMPs).
7. What are some examples of a dataset and a database?
Aggregated data received from an analytical laboratory for field samples, and measurements made directly during fieldwork, are both examples of datasets. If several datasets are combined into a searchable product or defined system, that product or system is an example of a database, regardless of whether a formal database management system is used. A geologic map includes a geospatial dataset; when this dataset is combined with other regional datasets, the result is another example of a database. NWIS is a database, and data retrieved from NWIS (such as a table of data) are a dataset.
8. What outlets are available for releasing data?
The preferred path for USGS data release is through USGS data repositories or portals, such as ScienceBase, NWIS, or Biodata, or via USGS Science Center Web pages as long as those Web pages reside on a trusted digital repository at the Science Center. The goal is to ensure that the USGS maintains the authoritative copy of the data it releases. The USGS has guidance available on acceptable digital repositories for releasing USGS data at https://www2.usgs.gov/fsp/acceptable_repositories_digital_assets.asp. This guidance includes a list of repositories that will be updated as additional repositories are deemed acceptable.
9. Can I log in and enter data into the USGS ScienceBase before those data have been approved for release?
Yes. Under the manage option in ScienceBase, users can enter data and keep the data private, that is, available only to them and others within the USGS. Data can be added to ScienceBase at any time and can be used as a resource during the data review process prior to release. Once the data are approved for release, the same manage option can be used to make the data public.
10. How are raw data handled?
"Raw data" refers to digital and nondigital data that are unprocessed and unverified. Examples include field observations and unaltered output from sensors. Retention of raw data is important in support of reproducible science and for recovering from processing errors. Raw data must be archived according to the USGS records disposition schedule and can be released as either provisional or approved data according to the USGS policy on data release (SM 502.8). Raw data may also be subject to FOIA requirements. In the event such data are requested, contact the USGS FOIA Officer for additional guidance.
11. What about using non-Federal data repositories to provide or host the required public access to my data?
Use of non-Federal repositories is acceptable as described at https://www2.usgs.gov/fsp/acceptable_repositories_digital_assets.asp. The authoritative copy of the data, however, must be hosted on USGS servers (refer to FAQ 8 above) or by a federally maintained data service (refer to SM 502.9). Where such a repository has an established agreement with the USGS, the arrangements, including any hosting agreement, need to be clearly spelled out in the DMP. In all cases, a metadata record as described in SM 502.7, including a Digital Object Identifier (DOI) link back to the data source, must be included in the USGS Science Data Catalog, regardless of where the data reside or are hosted.
12. Where do I get a Digital Object Identifier (DOI) for USGS data that will be released?
For specific guidance on DOIs, refer to https://www2.usgs.gov/datamanagement/preserve/persistentIDs.php.
13. What are the policy requirements for USGS authors regarding use of "unpublished data" or "personal communication" (including written, oral, or verbal communication) when citing data used to support scholarly publications?
In accordance with the USGS Public Access Plan, effective October 1, 2016, all supporting digital research data approved for release for final accepted manuscripts or final publications must be freely available for public access at the same time as or before the official publication date. Exceptions are allowed for special circumstances, such as location data for endangered species, location data pertaining to homeland security or privacy issues, and data mentioned in the text but not used as a basis for the conclusions. Thus, USGS authors may no longer use "unpublished data" or "personal (written, oral, or verbal) communication" as in-text citations for data used to support the results and conclusions of their scholarly publications. However, citations to unpublished data are allowed when they refer to examples that support or contradict findings but are not essential to the results and conclusions of the publication. Citations to written communications are also allowed to identify the source of data included in a data table that is part of a publication or in a data release associated with a publication. Such a citation identifies where and how the data originated (name, affiliation, written commun., date).
14. Where can I find additional guidance related to releasing USGS scientific data?
Additional guidance is available on the USGS Data Management Web page at http://www.usgs.gov/datamanagement and the FSP Web page at https://www2.usgs.gov/fsp/guide_to_datareleases.asp.