A data dictionary is used to catalog and communicate the structure and content of data, and provides meaningful descriptions for individually named data objects.
Data Dictionaries & Metadata
Data dictionary information can be used to fill in Entity & Attribute / Feature Catalog sections of formal metadata. If you are working with data dictionary information within formal metadata, these tools can help.
What's in a Data Dictionary?
Data dictionaries store and communicate metadata about data in a database, a system, or data used by applications. A useful introduction to data dictionaries is provided in the University of Wisconsin Data Services video, Data Dictionaries. Data dictionary contents can vary but typically include some or all of the following:
- A listing of data objects (names and definitions)
- Detailed properties of data elements (data type, size, nullability, optionality, indexes)
- Entity-relationship (ER) and other system-level diagrams
- Reference data (classification and descriptive domains)
- Missing data and quality-indicator codes
- Business rules, such as for validation of a schema or data quality
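The items listed above can be pictured as fields on a single dictionary entry. The following is a minimal sketch in Python, not a standard format: the class name `DataElement`, its field names, and the `site_type` example with its codes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """One data dictionary entry (hypothetical layout, for illustration)."""
    name: str                         # object name
    definition: str                   # plain-language meaning
    data_type: str                    # e.g. "text", "integer", "date"
    size: Optional[int] = None        # maximum length, if any
    nullable: bool = True             # may the value be absent?
    indexed: bool = False
    domain: List[str] = field(default_factory=list)  # allowed reference values
    missing_code: Optional[str] = None               # code that means "no data"

    def validate(self, value) -> List[str]:
        """Return the rule violations for one value (empty list if valid)."""
        if value is None or value == self.missing_code:
            return [] if self.nullable else [f"{self.name}: value is required"]
        problems = []
        if self.size is not None and len(str(value)) > self.size:
            problems.append(f"{self.name}: longer than {self.size} characters")
        if self.domain and value not in self.domain:
            problems.append(f"{self.name}: {value!r} is not in the domain")
        return problems


# Hypothetical element with a small reference domain and a missing-data code.
site_type = DataElement(
    name="site_type",
    definition="Kind of monitoring site",
    data_type="text",
    size=2,
    nullable=False,
    domain=["ST", "GW", "LK"],
    missing_code="-9",
)

print(site_type.validate("ST"))      # passes every rule
print(site_type.validate("STREAM"))  # too long and outside the domain
```

Keeping the domain, the missing-data code, and the validation rule on the same entry means one definition serves both as documentation and as a data quality check.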
How Data Dictionaries are Used
- Documentation - provide data structure details for users, developers, and other stakeholders
- Communication - equip users with a common vocabulary and definitions for shared data, data standards, and data flow and exchange, and help developers gauge the impacts of schema changes
- Application Design - help application developers create forms and reports with proper data types and controls, and ensure that navigation is consistent with data relationships
- Systems Analysis - enable analysts to understand overall system design and data flow, and to find where data interact with various processes or components
- Data Integration - provide clear definitions of data elements, giving the contextual understanding needed when deciding how to map one data system to another, or whether to subset, merge, stack, or transform a dataset for a specific use
- Decision Making - assist in planning data collection, project development, and other collaborative efforts
Data Dictionaries are for Sharing
For groups of people working with similar data, having a shared data dictionary facilitates standardization by documenting common data structures and providing the precise vocabulary needed for discussing specific data elements. Shared dictionaries ensure that the meaning, relevance, and quality of data elements are the same for all users. Data dictionaries also provide information needed by those who build systems and applications that support the data. Lastly, if there is a common, vetted, and documented data resource, it is not necessary to produce separate documentation for each implementation.
Examples of Shared USGS Data Dictionaries
Examples of non-USGS Data Dictionaries
Start your data dictionary in the Planning stage and keep it up to date
Plan ahead for storing data at the start of any project by developing a schema or data model as a guide to data requirements. As required and optional data elements are identified, add them to the data dictionary. When data structures change, update the dictionary. Try to use naming conventions appropriate to the system or subject area. The easiest path is to adopt and cite a data standard, thus avoiding the need to provide and manage your own documentation.
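One way to notice when a dictionary has fallen behind the data is to compare the documented element names with the names actually present in a table or file. The sketch below assumes both are available as simple lists of names; the function name `compare_to_dictionary` and the column names are hypothetical.

```python
def compare_to_dictionary(documented, actual):
    """Compare documented element names with the names found in the data.

    Returns names that are in the data but not the dictionary
    ("undocumented") and names that are in the dictionary but no longer
    in the data ("stale").
    """
    documented_set = set(documented)
    actual_set = set(actual)
    return {
        "undocumented": sorted(actual_set - documented_set),
        "stale": sorted(documented_set - actual_set),
    }


# Hypothetical example: one column was added and one was dropped
# since the dictionary was last updated.
documented = ["site_id", "site_type", "sample_date"]
actual = ["site_id", "sample_date", "water_temp_c"]

report = compare_to_dictionary(documented, actual)
print(report)
```

A check like this could be run whenever a data structure changes, so that new elements are described and retired ones are removed from the dictionary.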
The Alaska Science Center Research Data Management Plan [PDF] has excellent examples of a Data Description Form and other forms to capture metadata before, during, and at the end of a project.
Data dictionaries can reveal poorly designed data structures and object naming decisions
For both data reviewers and data users, the data dictionary can reveal potential credibility problems within the data. Poor table organization and object naming can severely limit data understandability and ease-of-use, incomplete data definitions can render an otherwise stellar dataset virtually useless, and failure to keep the dictionary up to date with the actual data structures suggests a lack of data stewardship. Although getting critical feedback about their data may be initially troublesome for some data creators, developing good data design and description habits is worth the effort and ultimately benefits everyone who will use the data.
Learn more about naming conventions and find guides to writing column descriptions at Best Practices for Data Dictionary Definitions and Usage and Captain Obvious' Guide to Column Descriptions - Data Dictionary Best Practices.
Making a Data Dictionary
Most database management systems (DBMS) have built-in, active data dictionaries and can generate documentation as needed (SQL Server, Oracle, MySQL). The same is true when designing data systems using computer-aided software engineering (CASE) tools. The open source Analyzer tool for MS Access can be used to document Access databases and Access-connected data (SQL Server, Oracle, and others). Finally, use the Data Dictionary - Blank Template to manually create a simple data dictionary in Excel.
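As an illustration of reading a built-in data dictionary, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is an assumption made for the example because it needs no server; it is not one of the systems named above, and each of those exposes its catalog differently. The `site` table and the `describe_table` helper are hypothetical.

```python
import sqlite3


def describe_table(conn, table):
    """Build simple data dictionary rows from SQLite's own catalog."""
    # PRAGMA arguments cannot be bound as parameters, so quote the name.
    quoted = table.replace('"', '""')
    rows = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {
            "name": name,
            "data_type": col_type,
            # Reflects only whether NOT NULL was declared on the column.
            "nullable": not bool(notnull),
            "default": default,
            "primary_key": pk > 0,
        }
        for _cid, name, col_type, notnull, default, pk in rows
    ]


# Hypothetical table, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site ("
    " site_id INTEGER PRIMARY KEY,"
    " site_name TEXT NOT NULL,"
    " elevation_m REAL)"
)

entries = describe_table(conn, "site")
for entry in entries:
    print(entry)
```

The catalog supplies names, types, and constraints, but not definitions. Those still have to be written by someone who knows the data.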
For information on creating a data dictionary in a formal metadata file (Entity and Attribute section) refer to the Metadata page.
- Data Acquisition Methods - check the data dictionary when acquiring data from external sources
- Data and File Formats - capture file, table, and field names and properties in a data dictionary
- Data Modeling - gather data requirements and use design standards to help build data dictionaries
- Data Standards - use a standard that includes a fully defined data structure
- Data Templates - use a template for a predefined schema and data dictionary
- Domains - include domains (reference lists, lookup tables) as part of the dictionary information
- Naming Conventions - apply a consistent approach to create meaningful table and field names; consider a similar naming convention for files and folders
- Organize Files and Data - include the name and description of data files in the metadata and associate the file names with tables in the data dictionary
- University of Wisconsin Data Services. Data Dictionaries [Video]. [Link Verified January 24, 2018]
- Northwest Environmental Data-Network. Best Practices for Data Dictionary Definitions and Usage [PDF]. [Link Verified February 8, 2018]
- Encyclopedia Britannica. Dictionary and Kinds of Dictionaries. [Link Verified January 26, 2018]
- What is a SQL Server Data Dictionary. [Link Verified March 7, 2018]
- The Data Dictionary (Oracle). [Link Verified March 7, 2018]
- Computer-aided Software Engineering (CASE). [Link Verified March 8, 2018]
- Analyzing Systems using Data Dictionaries. [Link Verified March 7, 2018]
- 10 Ways Data Dictionary Increases Software Developers Productivity. [Link Verified February 8, 2018]
- Captain Obvious' Guide to Column Descriptions - Data Dictionary Best Practices. [Link Verified February 8, 2018]
- DOI. 2008. Data Quality Management Guide [PDF]. [Link Verified April 24, 2018]
Examples, Tools and Templates
- Entity/Attribute metadata for: Knight, R.R., Cartwright, J.M., and Ladd, D.E., 2016, Streamflow and fish community diversity data for use in developing ecological limit functions for the Cumberland Plateau, northeastern Middle Tennessee and southwestern Kentucky, 2016: U.S. Geological Survey Data Release: http://dx.doi.org/10.5066/F7JH3J83. [Link Verified January 26, 2018]
- JPL, 2008, Planetary Science Data Dictionary, JPL D-7116, Rev. F (Corresponds to Database Build pdscat1r71), https://pds.jpl.nasa.gov/documents/psdd/PSDDmain_1r71.pdf. [Link Verified February 8, 2018]
- National Water Information System (NWIS). Search Criteria and Codes. [Link Verified January 26, 2018]
- USDA, Ag Data Commons Data Submission Manual v1.3. Data Dictionary Blank Template. [Link Verified February 8, 2018]
- 24 Data Dictionary Tools. [Link Verified March 7, 2018]