Clark Family Library

Data Science (DS) Guide

This guide will introduce you to data resources, as well as how to cite data, work with data, and ask questions about data.

Why Do We Cite Data?

The National Institutes of Health (NIH) issued the Data Management and Sharing Policy (effective January 25, 2023) to promote the sharing of scientific data: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html. Data citation encourages collaboration by helping researchers find and use one another's datasets. For data users, data citation supports the reproducibility of research and allows others to easily locate and access the data. It also promotes transparency and encourages the creation of high-quality datasets. For data producers, it gives the creators of datasets appropriate credit. Finally, data citation makes it possible to track and measure how data are reused, providing a comprehensive view of their influence and importance.

How to Cite Data

The format for a data citation depends on which style guide the publisher follows. Although different style guides and publications have their own formats for data citation, the following components are usually required:

  • Author(s): Responsible Party for the Data
  • Date of Publication
  • Title of Dataset
  • Version, when appropriate
  • Publisher or Repository
  • Persistent Locator/Identifier (e.g., DOI)
  • Date Accessed, when appropriate
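The components listed above can be treated as a simple record. The sketch below is a minimal illustration, assuming a generic field order that does not follow any particular style guide; the class name `DatasetCitation` and all of the example values (author, title, repository, and the DOI `10.5555/example`) are hypothetical. For a real citation, use the format your publisher or repository requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical record of the components commonly required in a data citation."""
    authors: str                      # responsible party for the data
    published: str                    # date of publication (e.g. a year)
    title: str                        # title of the dataset
    publisher: str                    # publisher or repository
    identifier: str                   # persistent identifier, e.g. a DOI URL
    version: Optional[str] = None     # included only when appropriate
    accessed: Optional[str] = None    # included only when appropriate

    def format(self) -> str:
        """Join the components in a generic order (not a specific style guide)."""
        title = self.title
        if self.version:
            title += f" (Version {self.version})"
        parts = [
            f"{self.authors} ({self.published}).",
            f"{title}.",
            f"{self.publisher}.",
            self.identifier,
        ]
        text = " ".join(parts)
        if self.accessed:
            text += f" (accessed {self.accessed})"
        return text


# Hypothetical example values; replace them with the real dataset's details.
example = DatasetCitation(
    authors="Doe, J.",
    published="2023",
    title="Example Survey Data",
    publisher="Example Repository",
    identifier="https://doi.org/10.5555/example",
    version="2.1",
)
print(example.format())
```

Keeping optional fields such as version and access date as `None` by default mirrors the "when appropriate" notes in the list: they appear only when supplied.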

Hint: If you have a DOI, you can use the CrossCite DOI Citation Formatter or the DataCite citation formatter to create citations in a variety of citation styles.
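Formatters like these are, as far as we understand, built on DOI content negotiation: a request to `https://doi.org/<DOI>` with an `Accept: text/x-bibliography; style=<style>` header asks the registration agency for a formatted reference instead of the landing page. The sketch below assumes that service behavior; the function names and the DOI `10.5555/example` are hypothetical, and `fetch_citation` needs network access and a DOI that actually exists.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a DOI content-negotiation request for a formatted citation."""
    url = f"https://doi.org/{doi}"
    # text/x-bibliography asks for a formatted reference string;
    # `style` names a citation style such as "apa" (assumed to be supported).
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text (requires network access)."""
    request = build_citation_request(doi, style)
    # urllib follows the doi.org redirect and keeps the Accept header.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Hypothetical DOI; building the request does not contact the network.
request = build_citation_request("10.5555/example")
print(request.full_url, request.get_header("Accept"))
```

Separating request construction from sending makes it easy to inspect what will be asked for before making a network call. Always check the returned text against your style guide, since automated output can need manual correction.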

Most data repositories will provide a suggested citation for their datasets. Some will also request that you cite the related publication(s) along with the data. Follow the most appropriate format while meeting the requirements of the data creators and repositories.

This video covers tips for citing datasets in your projects.

Attributing and Citing Code

Coding is a highly collaborative process, and the programming community widely accepts collaboration and the reuse of code to build on existing projects. However, it is important to attribute any code that you reference while working through your problems and projects, and to cite any code that you reuse from other programmers. A code citation should include the following fields:

  1. Author or Creator (The entity or entities responsible for creating and maintaining the code)
  2. Date of Publication (The date the code was first published or released to the public)
  3. Title (The title of the code or software package or a brief description of the code if a title is missing)
  4. Publisher (Entity responsible for hosting the code)
  5. URL/DOI (Where on the web the code can be found)

The most transparent way to attribute code within your own work is to add comments that explain your process and explicitly reference attributions and citations where appropriate.
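As an illustration, an attribution comment can record the five code citation fields listed above directly above the reused code. In the minimal sketch below, the author, date, hosting site, URL, and the `moving_average` function are all hypothetical placeholders, not a real source; replace each field with the actual details of the code you reuse.

```python
# Attribution (hypothetical example; replace every field with the real source):
#   Author:    Jane Doe
#   Date:      2021-06-01 (first published)
#   Title:     "moving-average snippet" (no formal title; brief description)
#   Publisher: GitHub (site hosting the code)
#   URL/DOI:   https://github.com/example-user/example-repo
# Changes made here: renamed variables and added input validation.
def moving_average(values: list[float], window: int) -> list[float]:
    """Return the simple moving average of `values` over `window` items."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


print(moving_average([1, 2, 3, 4], 2))
```

Noting what you changed, as in the last comment line, helps readers tell the original author's work apart from your own modifications.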

Citation Styles