Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances. In line with Open Access to research articles themselves, PLOS strongly believes that to best foster scientific progress, the underlying data should be made freely available for researchers to use, wherever this is legal and ethical. Data availability allows replication, reanalysis, new analysis, interpretation, or inclusion into meta-analyses, and facilitates reproducibility of research, all providing a better ‘bang for the buck’ out of scientific research, much of which is funded from public or nonprofit sources. Ultimately, all of these considerations aside, our viewpoint is quite simple: ensuring access to the underlying data should be an intrinsic part of the scientific publishing process.
PLOS journals have requested data be available since their inception, but we believe that providing more specific instructions for authors regarding appropriate data deposition options, and providing more information in the published article as to how to access data, is important for readers and users of the research we publish. As a result, PLOS is now releasing a revised Data Policy that will come into effect on March 1, 2014, in which authors will be required to include a data availability statement in all research articles published by PLOS journals; the policy can be found below. This policy was developed after extensive consultation with PLOS in-house professional and external Academic Editors and Editors in Chief, who are practicing scientists from a variety of disciplines.
We now welcome input from the larger community of authors, researchers, patients, and others, and invite you to comment before March. We encourage you to contact us collectively at firstname.lastname@example.org; feedback via Twitter and other sources will also be monitored. You may also contact individual PLOS journals directly.
Theo Bloom, Editorial Director for Biology
Emma Ganley, Senior Editor, PLOS Biology
Margaret Winker, Senior Research Editor, PLOS Medicine
for the PLOS Data Group
We thank all the members of the PLOS Data Policy team, PLOS staff, and Academic Editors and Editors in Chief for all their invaluable contributions to this policy and process. We particularly thank Emma Veitch, Senior Editor, PLOS ONE, for her leadership in bringing this policy to fruition.
Image Credit: jonathangray.com
PLOS Data Policy
from March 1, 2014
PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (see note 1 below).
When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS’s policy. The data availability statement will be published with the article if accepted.
Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors’ institutions and funders, or in extreme cases to retract the publication.
Methods acceptable to PLOS journals with respect to data sharing are listed below, accompanied by guidance for authors as to what must be indicated in their data availability statement and how to follow best practices in reporting. If authors did not collect data themselves but used another source, this source must be credited as appropriate.
Authors who have questions or difficulties with the policy, or readers who have difficulty accessing data, are encouraged to contact the relevant journal office or email@example.com.
Acceptable data-sharing methods:
Data deposition (strongly recommended): All data and related metadata underlying the findings reported in a submitted manuscript should be deposited in an appropriate public repository, unless already provided as part of the submitted article. Repositories may be either subject-specific (where these exist) and accept specific types of structured data, or generalist repositories that accept multiple data types, such as Dryad. Guidance on acceptable repositories is included in note 2 below. The Data Availability Statement must specify that data are deposited publicly and list the name(s) of repositories along with digital object identifiers (DOIs) or accession numbers for the relevant datasets. In some cases authors may not be able to obtain DOIs or accession numbers until the manuscript is accepted; in these cases, the authors must provide these numbers at acceptance. In all other cases, these numbers must be provided at submission.
Data in supporting information files:
For smaller datasets and certain data types, authors may upload data as supporting information files accompanying the manuscript. Authors should take care to maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets are preferable to PDF when providing tabulated data).
If data deposition or provision in supporting information is not ethical or legal (e.g., underlying data pose privacy or legal concerns, or include human participants; see note 3 below), the following two methods may be acceptable alternatives, subject to case-by-case evaluation:
Data made available to all interested researchers upon request. The Data Availability Statement must specify “Data available on request” and identify the group to which requests should be submitted (e.g., a named data access committee or named ethics committee). The reasons for restrictions on public data deposition must also be specified. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.
Data available from third party. In the case of a primary dataset that was not originally generated by the authors of the submitted manuscript, appropriate data sharing may require that interested researchers obtain third-party data independently from the named original source. In this case, the Data Availability Statement must state the source of the data with full citation and, if the dataset cannot be provided, indicate “Data available from (named source).” The reasons for restrictions on public data deposition must also be specified.
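The statement requirements for the sharing methods above can be sketched as a small checklist in code. This is only an illustration, not a PLOS tool: the class, the field names, the method labels, and the `identifiers_pending` flag (standing in for the case where DOIs or accession numbers cannot be obtained until acceptance) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical labels for the four sharing methods described in the policy.
METHODS = {"repository", "supporting_information", "on_request", "third_party"}


@dataclass
class DataAvailabilityStatement:
    method: str
    repositories: List[str] = field(default_factory=list)  # repository names
    identifiers: List[str] = field(default_factory=list)   # DOIs or accession numbers
    identifiers_pending: bool = False  # assumed flag: numbers only issued at acceptance
    request_group: str = ""            # e.g. a named data access committee
    source_citation: str = ""          # full citation of a third-party source
    restriction_reason: str = ""       # why public deposition is not possible


def missing_items(stmt, accepted=False):
    """Return the items the statement still lacks for its sharing method."""
    if stmt.method not in METHODS:
        return ["recognized sharing method"]
    missing = []
    if stmt.method == "repository":
        if not stmt.repositories:
            missing.append("repository name(s)")
        # Identifiers may be deferred, but only until the manuscript is accepted.
        may_defer = stmt.identifiers_pending and not accepted
        if not stmt.identifiers and not may_defer:
            missing.append("DOIs or accession numbers")
    elif stmt.method == "on_request":
        if not stmt.request_group:
            missing.append("named group that handles requests")
    elif stmt.method == "third_party":
        if not stmt.source_citation:
            missing.append("full citation of the data source")
    # Both restricted-access methods must explain the restriction.
    if stmt.method in {"on_request", "third_party"} and not stmt.restriction_reason:
        missing.append("reason for restricting public deposition")
    return missing


example = DataAvailabilityStatement(method="repository", repositories=["Dryad"])
print(missing_items(example))
```

A check like this covers only the presence of the required items; whether a restriction is acceptable remains subject to the case-by-case evaluation described above.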
Unacceptable data access restrictions: PLOS journals will not consider manuscripts where the following factors influence ability to share data:
- Authors will not share data because of personal interests, such as patents or potential future publications.
- The conclusions depend solely on the analysis of proprietary data (e.g., data owned by commercial interests, or copyrighted data). If proprietary data are used, the manuscript must include an analysis of public data that validates the conclusions so others can reproduce the analysis and build on the findings.
1. Definition of data that must be shared
PLOS defines the “minimal dataset” to consist of the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Core descriptive data, methods, and study results should be included within the main paper, regardless of data deposition. PLOS does not accept references to “data not shown”. Authors who have datasets too large for sharing via repositories or uploaded files should contact the relevant journal for advice.
2. Guidance on data repositories
PLOS requires authors to comply with field-specific standards for preparation and recording of data and to select repositories appropriate to their field, for example deposition of microarray data in ArrayExpress or GEO; deposition of gene sequences in GenBank, EMBL or DDBJ; and deposition of ecological data in Dryad. Authors are encouraged to select repositories that meet accepted criteria as trustworthy digital repositories, such as the criteria of the Centre for Research Libraries or the Data Seal of Approval. Large, international databases are more likely to persist than small, local ones. Copyright licensing for data held in repositories may be unclear. If authors use repositories with stated licensing policies, the policies should not be more restrictive than CC-BY.
3. Guidance on sharing datasets that derive from clinical studies or other work involving human participants
For studies involving human participants, data must be handled so as to not compromise study participants’ privacy. PLOS recommends that researchers follow established guidance and applicable local laws in ensuring they do not compromise participant privacy. Resources which researchers may consult for guidance include:
Steps necessary to protect privacy may include de-identification, blocking portions of the database, or license agreements directed specifically at privacy concerns. Authors should indicate, as part of the ethics statement, the ways in which the study participants’ privacy was preserved. If license agreements apply, authors should note the process necessary for other researchers to obtain a license.
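As a rough illustration of the de-identification step mentioned above, the sketch below drops direct identifiers from tabular records and replaces participant IDs with salted hash pseudonyms. The column names, the `participant_id` key, and the 12-character pseudonym length are hypothetical choices for the example. Removing direct identifiers in this way may not be enough on its own to prevent re-identification, so it does not replace the established guidance and local laws referred to above.

```python
import hashlib

# Hypothetical column names; a real dataset defines its own direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth"}


def pseudonym(participant_id, salt):
    """Derive a stable pseudonym from an ID using a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return digest[:12]


def deidentify(records, salt):
    """Drop direct identifiers and replace participant IDs with pseudonyms."""
    cleaned = []
    for record in records:
        # Keep every column that is not listed as a direct identifier.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["participant_id"] = pseudonym(str(record["participant_id"]), salt)
        cleaned.append(row)
    return cleaned
```

The same salt yields the same pseudonym for a given participant, so records can still be linked within the shared dataset. The salt itself would need to be kept private by the data holder.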