
Publishing and Sharing Data

Why Should I Make My Data Public?

Increasingly, funders require that the data underlying a publication be made publicly available. Sharing data is also good practice within the research lifecycle, regardless of funding requirements. Most venues for making data public assign a persistent identifier, such as a Digital Object Identifier (DOI), which can be used in a citation and helps the researcher track the impact of the data. Published data can be cited by others and listed on your CV.

Data Repositories

Selecting a Repository

A data repository is a long-term storage platform that is carefully stewarded to ensure preservation of and/or access to the data. Before you begin selecting a specific data repository, be sure to determine:

  • If your funder requires you to submit your research to a specific repository
  • If there is an established, well-known discipline- or subject-specific repository for your field or type of research

When selecting a repository, it is strongly recommended that you start the process early, preferably even before you begin your research project. This will allow you to know what will be required when you do begin the curation and deposit process. In particular, if a repository requires researchers to use a specific metadata schema or file type, knowing this at the beginning of your project will allow you to customize your data collection procedures and tools.

When choosing a specific repository, please consider the following questions:

  • What types of data does the repository accept?
  • Do you have to follow a specific deposit procedure? If so, what is it?
  • Do they require researchers to use a specific metadata schema?
  • Do they require researchers to convert their files to or use a specific file type?
  • Is there a fee or other cost required to deposit data to the repository?
  • Do they provide a persistent identifier, such as a DOI? (Please see the National Science Foundation’s Dear Colleague Letter: Effective Practices for Data.)
  • How discoverable is deposited research data? Are the data sets openly available? Is the repository indexed by Google and other search engines?
  • Do they provide usage information? Download statistics?
  • If appropriate for your data, does the repository provide embargo or other restricted access options?
  • Do they have a rights or intellectual property policy statement? Are you required to transfer any rights in order to deposit your data in the repository?
  • What type of use license or rights statement do they provide along with the data when making it available to researchers?
  • Do they provide assurances for long term preservation and management of the repository?

Finding a Data Repository

If you are not sure which repository to use, re3data.org provides information on over 2000 data repositories. Researchers are also welcome to request a meeting for help in finding an appropriate repository for their data.

Using ScholarWorks

Albertsons Library provides data deposit and publishing services through ScholarWorks, Boise State’s institutional repository. To learn more about depositing your data with the Library, please visit the ScholarWorks FAQs page.

Data Citations

Data Citation Principles

In 2013, the Amsterdam Manifesto on Data Citation Principles was published, detailing the concept of data as a citable product of research in 8 short statements.

In short,

  1. Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
  2. Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
  3. In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
  4. A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community. [An example of a persistent identifier might be a DOI or digital object identifier]
  5. Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
  6. Unique identifiers, and metadata describing the data, and its disposition, should persist — even beyond the lifespan of the data they describe. [See #4]
  7. Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
  8. Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Source: https://www.force11.org/datacitation
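Principle 7 mentions fixity: evidence that the data retrieved later is exactly what was originally cited. One common way to record fixity is a checksum stored alongside the citation metadata. The following is a minimal sketch in Python, assuming SHA-256 as the digest; the helper names file_checksum and matches_recorded_checksum are hypothetical and are not prescribed by the principles or by any particular repository.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_checksum(path, recorded):
    """True if the file's current checksum equals the recorded one."""
    # Normalize whitespace and case, since published checksums vary in form.
    return file_checksum(path) == recorded.strip().lower()
```

A researcher could record the checksum at deposit time and compare it again after downloading the data set; a mismatch suggests the file differs from the cited version.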

Data Set Citing Basics

Basic Items to Include:

  • Author: Name of the individual, group, or organization responsible for the creation of the data set.
  • Title: Name of the data set.
  • Format: Notation that this is a data set as opposed to a journal article, book, or website.
  • Location: City/State of the organization/institution that produced the data set.
  • Date: Year the data set was released/published. Also, in some cases, the date you accessed the data set.
  • Version: If multiple versions of the data set are available, include the version number of the data set you used.
  • Unique Identifier: A unique identifier to link back to the specific data set (examples include: DOI, PURL, repository ID number, etc.).
  • Distributor: Name of the organization/site providing access to the data set.

The order and formatting of these pieces of information will vary according to different citation styles, journal publishers, and data repositories.

DATA SET CITATION EXAMPLES

APA

Format:

Author/Rightsholder. (Year). Title of data set (Version number) [Description of form]. Location: Name of producer.

or

Author/Rightsholder. (Year). Title of data set (Version number) [Description of form]. Retrieved from http://

Example:

Advanced Cooperative Arctic Data and Information Service (ACADIS). (2010). LiDAR (DEM) NIMS grid Barrow, Alaska 2010 [Data set]. Retrieved from https://www.aoncadis.org/dataset/lidar_dem_nims_grid_barrow_alaska_2010.html

Example:

Zhang, G., Parker, P., Li, B., Li, H., & Wang, J. (2012). The genome of Darwin’s Finch (Geospiza fortis). GigaScience. [Data set]. doi: 10.5524/100040

See more information from the APA Style Blog here: http://blog.apastyle.org/apastyle/2013/12/how-to-cite-a-data-set-in-apa-style.html

Note: Some data sources will provide additional citation information and help.
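The two APA patterns above can be assembled mechanically from the basic citation items. The following is a rough sketch in Python, not an official APA tool: the function name format_apa_dataset and its parameters are hypothetical, and rendering the version as "(Version X)" is an assumption about how the "(Version number)" slot is filled.

```python
def format_apa_dataset(author, year, title, version=None, form="Data set",
                       location=None, producer=None, url=None):
    """Assemble an APA-style data set citation from its parts."""
    # Drop a trailing period (e.g. "Zhang, G.") so it is not doubled.
    citation = f"{author.rstrip('.')}. ({year}). {title}"
    if version:
        citation += f" (Version {version})"
    citation += f" [{form}]."
    if location and producer:
        # First pattern: "Location: Name of producer."
        citation += f" {location}: {producer}."
    elif url:
        # Second pattern: "Retrieved from http://..."
        citation += f" Retrieved from {url}"
    return citation


print(format_apa_dataset(
    "Advanced Cooperative Arctic Data and Information Service (ACADIS)",
    2010,
    "LiDAR (DEM) NIMS grid Barrow, Alaska 2010",
    url="https://www.aoncadis.org/dataset/lidar_dem_nims_grid_barrow_alaska_2010.html",
))
```

The call uses the fields of the ACADIS example. Always check the result against the target journal's or repository's own citation guidance, since the order and punctuation can differ.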

MLA

Format:

MLA has not yet developed specific rules for dataset citations, so follow the rules for a general website.

Example:

Tweedie, Craig E., and Steven Oberbauer. Kite Aerial Photography NIMS Grid Barrow, Alaska 2013. (Data set). Barrow, AK: Advanced Cooperative Arctic Data and Information Service, 2010. Web. 7 Apr. 2014. <https://www.aoncadis.org/dataset/kite_aerial_photography_nims_grid_barrow_alaska_2013.html>

Example:

Zhang, G., D. Lambert, and J. Wang. Genomic Data from Adelie Penguin (Pygoscelis adeliae). (Data set). Gigascience, 2011. Web. 17 Apr. 2014. <http://dx.doi.org/10.5524/100006>

Note: Some data sources will provide additional citation information and help.