Fundamental Science Practices (FSP)
Guidance on Documenting Revisions to Data Releases
This guidance describes a formal revision process for datasets and associated metadata that have been released as USGS information products and require change.
This guidance does not cover USGS-approved databases or web data services for data that are expected to change continually or on a schedule, with additions and updates made over time. Examples of these systems or services include National Water Information System (NWIS), USA National Phenology Network (USA-NPN), and Biodiversity Information Serving Our Nation (BISON). These data products have processes in place for evaluating data quality before data are uploaded.
Revision of a data release is warranted, for example, when an error is detected and needs to be corrected (deleted, changed) for future use of the data. When correcting data errors, changes are made to the data only where needed, but no other alterations are made to either the structure or content. Another example case for revision is the release of data in stages in order to meet project timelines, so that the amount of data provided in an information product increases through subsequent versions.
If data are corrected or added as part of the revision, the data release must again be reviewed for quality and accuracy, and the modifications must be documented as described below. For substantial or major revisions (defined below, in the “Version Numbering” section), the review process is documented in the Information Product Data System (IPDS).
The revision process is described below for the following four cases:
Following the process descriptions for these cases, guidance is provided for assigning revision numbers.
If an error is found that is not in the data itself, such as a misspelling in a data header or a site location name, replace or update the erroneous file and update the metadata record and any additional documentation to reflect the update.
If an error is found in the data, the author corrects the data release. If the error is large enough to affect outcomes of future data use, a new data release record is created in the IPDS. A new version of the corrected data indicates that the revised version (as opposed to previous versions) is current. The landing page includes a description of the error and points users to the new version of the data. Previous versions of the data are preserved in case they are needed to understand previous uses, and in accordance with records management disposition schedules and litigation hold requirements. The revision process results in a new IPDS record, an updated metadata record, updates to the online documentation including a revision history, and a new incremental version number (for example, version 1.1; refer to “Version Numbering” below). All review requirements apply to the new version. Once revised data are released, the citation is revised to reflect the new incremental version of the data, as shown in the example citations below.
If the error could affect existing USGS scientific conclusions, consult your local Bureau Approving Official in the Office of Science Quality and Integrity (https://internal.usgs.gov/fsp/toolbox/approvingofficials.html) for guidance.
Examples of the citation change on the data release landing page:
Note that the title and digital object identifier (DOI) do not change but that the citation changes by adding version information. Additionally, the year of the publication may change.
On the landing page of the data release, include text reflecting the revision:
First release: 2012
Additionally, there is a revision history text file available that explains exactly what changed in each revision. (See the revision history file in the example provided below for appending new data.)
The addition of data to released datasets, such as updating a data release with data from a new time period, place, or new field activity, requires most of the same steps as an original data release. In addition to the inclusion of new data, errors in previously released data may also be corrected. The following are required when new data are added: a new IPDS record, updated citation, updated metadata record, updated revision history, and notation on the landing page reflecting the new version.
NOTE: The new IPDS data release record is used to ensure requirements of SM 502.7 and SM 502.8 have been met. A new digital object identifier is not created. That is, the existing DOI should be used for the revised data release.
For an example, see Pendleton, E.A., Ackerman, S.D., Baldwin, W.E., Danforth, W.W., Foster, D.S., Thieler, E.R., and Brothers, L.L., 2016, High-resolution geophysical data collected along the Delmarva Peninsula, 2014, USGS Field Activity 2014-002-FA (ver. 4.0, October 2016): U.S. Geological Survey data release, https://doi.org/10.5066/F7MW2F60.
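As an illustration only, the version annotation seen in the example citation above, "(ver. 4.0, October 2016)", could be composed as in the following Python sketch. The function names `version_note` and `cite` are hypothetical helpers, not part of any USGS tool, and the citation layout is inferred solely from the single example above. The sketch assumes that the original release (version 1.0) carries no annotation, as stated in the version numbering guidance.

```python
def version_note(major: int, minor: int, month: str, year: int) -> str:
    """Return the parenthetical version annotation for a citation.

    The original release (version 1.0) is cited without any annotation,
    so an empty string is returned in that case.
    """
    if (major, minor) == (1, 0):
        return ""
    return f"(ver. {major}.{minor}, {month} {year})"


def cite(authors: str, pub_year: int, title: str, doi_url: str, note: str = "") -> str:
    """Assemble a citation string; the title and DOI stay the same across versions."""
    title_part = f"{title} {note}" if note else title
    return f"{authors}, {pub_year}, {title_part}: U.S. Geological Survey data release, {doi_url}."


# Hypothetical usage with placeholder author and title values.
note = version_note(4, 0, "October", 2016)
print(note)  # (ver. 4.0, October 2016)
print(cite("Author, A.B.", 2016, "Example data title", "https://doi.org/10.5066/F7MW2F60", note))
```

Only the annotation (and possibly the publication year) differs between versions in this sketch; the title and DOI are passed through unchanged.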
There are cases in which the data structure is modified to allow the inclusion of new data types through the addition of new tables or fields. The extended structure is then considered a new version. These revisions are appropriate for data releases that are stand-alone research products, rather than the data foundations of scientific reports. In this case, the requirements include a new IPDS record, updated citation, updated metadata record, updated revision history, and text on the landing page reflecting the new version. Changes reflect a new version of the data release (for example, version 2.0; refer to the “Version Numbering” section below).
NOTE: The new IPDS data release record is used to ensure the requirements of SM 502.7 and SM 502.8 have been met. A new DOI is not created. That is, the existing DOI should be used for the extended data structure.
When data with identified errors are corrected and replaced (for example, as a new incremental version), the version with errors is no longer publicly offered but may be made available to future users on request. Because the erroneous data may have been used to support conclusions in a publication or a policy decision, there may be future consequences; therefore, it is essential to preserve the original data, with errors intact, for example in a dark archive (an offline location for preservation). The filename and accompanying documentation must make clear that the data are deprecated. This provides a snapshot in time of the data in terms of provenance, while ensuring that they are not recommended for future use. If size constraints make archiving a full copy impractical, some other process should be provided for making the original data available.
Version numbers consist of two parts, a major and a minor component, separated by a period. In the example “version 1.2,” the number to the left of the period, “1,” is the major component and represents the number of separate major revisions. The number to the right of the period, “2,” is the minor component and represents the number of separate substantial revisions.
The original release is considered version 1.0, although no version annotation is used. Either the major or the minor component of the version number will be incremented when a revision is released. When a major revision is released, the major component increases by one number and the minor component is reset to zero (0). Substantial revisions (see definition below), regardless of how many, do not trigger a change in the major component of the version number. For example, if the data release was revised on seven separate occasions for substantial revisions, the version number will be 1.7.
Minor revisions, which are too insignificant to affect the use or interpretation of the data, include, but are not limited to, correcting misspelled words in data or metadata and improving the presentation of ancillary information on data landing pages. There is no version numbering system for these types of minor revisions and therefore no need to develop a version history document.
Using a ScienceBase (sciencebase.gov) data release page as an example, a minor revision could involve correcting a misspelled word in the title or in the abstract. In another example, the author may revise one of the contacts listed on the landing page. In other words, the data are not changed in a minor revision.
Action: No new version number required.
Substantial revisions are corrections to the data or metadata that are large enough to affect outcomes of future data use. Such errors typically involve missing or incorrect data values, but could also be missing or unclear annotations in table headings or in metadata records. Substantial revisions might also improve the usability or interpretation of the product content, for example by modifying a polygon shapefile to shift a line slightly so that a western boundary is consistent with another, recently released polygon shapefile.
An example of a substantial revision would be correcting a geospatial file in which a small number of negative longitudes were entered as positive numbers. The revision would change the incorrect longitude values to negative numbers. In a substantial revision, some of the data are changed.
Action: Create a new minor component number (for example, version 1.0 is changed to version 1.1).
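The longitude example above can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: it assumes every site in the dataset lies in the western hemisphere (so any positive longitude is an error), and the field names `site`, `latitude`, and `longitude` are hypothetical. In keeping with the guidance on correcting errors, only the erroneous values are changed, the input is left intact, and the affected rows are reported so that they can be described in the revision history.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def correct_longitude_signs(records: List[Record]) -> Tuple[List[Record], List[int]]:
    """Return corrected copies of the records and the indexes of changed rows.

    Assumes all sites are in the western hemisphere, so a positive
    longitude is treated as a sign error. No other field is altered.
    """
    corrected: List[Record] = []
    changed_rows: List[int] = []
    for index, record in enumerate(records):
        fixed = dict(record)  # copy, so the original data stay intact
        if fixed["longitude"] > 0:
            fixed["longitude"] = -fixed["longitude"]
            changed_rows.append(index)
        corrected.append(fixed)
    return corrected, changed_rows


# Hypothetical sample data: site B has a longitude entered as positive.
sites = [
    {"site": "A", "latitude": 38.5, "longitude": -75.1},
    {"site": "B", "latitude": 38.6, "longitude": 75.2},
]
corrected, changed_rows = correct_longitude_signs(sites)
print(changed_rows)  # [1]
```

Returning the list of changed rows, rather than silently overwriting values, makes it easier to state exactly what changed in each revision.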
Major revisions include changes in the data structure and updates that add or modify substantial amounts of data. Also included are large corrections to data, for example, correcting a data file in which many data values are consistently incorrect as a result of improper processing.
An example of a major revision is a new release of a bathymetry grid after an error was detected in the processing step that applied tide corrections. In a major revision, the data are significantly and substantially changed.
Action: Create a new major component number (for example, version 1.0 is changed to version 2.0).
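The three actions described above can be summarized in a short Python sketch. The function `next_version` is a hypothetical helper written for illustration; it is not part of the IPDS or any USGS system. It assumes version numbers are written as "major.minor" and that the revision type has already been determined by the author and reviewers.

```python
def next_version(current: str, revision: str) -> str:
    """Return the version number that follows `current` for a revision type.

    `revision` is one of "minor", "substantial", or "major", using the
    labels from the guidance.
    """
    major_text, minor_text = current.split(".")
    major, minor = int(major_text), int(minor_text)

    if revision == "minor":
        # Minor revisions are not versioned: the number stays as it is.
        return f"{major}.{minor}"
    if revision == "substantial":
        # Substantial revisions increment only the minor component.
        return f"{major}.{minor + 1}"
    if revision == "major":
        # Major revisions increment the major component and reset the minor to 0.
        return f"{major + 1}.0"
    raise ValueError(f"unknown revision type: {revision!r}")


# The original release is version 1.0; seven substantial revisions give 1.7.
version = "1.0"
for _ in range(7):
    version = next_version(version, "substantial")
print(version)  # 1.7
print(next_version(version, "major"))  # 2.0
```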