Can you really deposit your data in a public repository?
Sometimes it is difficult to determine whether publishing the data you have at hand is the right thing to do. Reasons for hesitation might be that you have not yet used the data in a publication and don’t want to be scooped, that the data contains personal information about patients, or that the data was collected or produced in a collaboration.
- Do you have the rights or permissions to publish the data?
- Is the data commercially sensitive?
- Does the data contain confidential/restricted information?
- Who controls the data?
- Certain repositories offer solutions for depositing data that needs to be under restricted access. This allows data to be findable even when it cannot be published openly. One example is the European Genome-phenome Archive (EGA), which can be used to deposit potentially identifiable genetic and phenotypic human data.
- Many repositories provide the option to put an embargo on a deposited dataset. This might be useful if you prefer to use the data in a publication before making it available for others to use.
- Establish an agreement outlining the controllership of the data and each collaborator’s rights and responsibilities.
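The pairing of concerns and options above can be sketched as a small lookup. This is only an illustration: the concern labels and the `suggest_options` function are hypothetical names chosen here, not part of any repository's tooling.

```python
# Hypothetical concern labels mapped to the options described above.
OPTIONS = {
    "sensitive_data": (
        "Deposit under restricted access, e.g. in EGA for "
        "potentially identifiable human data."
    ),
    "unpublished_results": (
        "Deposit with an embargo until you have used the data in a publication."
    ),
    "collaboration": (
        "Agree on controllership, rights and responsibilities "
        "with all collaborators first."
    ),
}


def suggest_options(concerns):
    """Return the suggested option for each recognised concern, in input order."""
    return [OPTIONS[c] for c in concerns if c in OPTIONS]


if __name__ == "__main__":
    for option in suggest_options(["unpublished_results", "collaboration"]):
        print("-", option)
```

Unrecognised concerns are silently skipped in this sketch; in practice they are a reason to ask your data steward or legal office rather than to proceed.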
Which repository should you use to publish your data?
Once you have completed your experiments and performed quality control of your data, it is good scientific practice to share the data in a public repository. Publishing your data is often required by funders and publishers.
The most suitable repository will depend on the data type and your discipline.
- What type of data are you planning to publish?
- Does the repository need to provide solutions for restricted access for sensitive data?
- Do you have the rights to publish the data via the repository?
- How sustainable is the repository: will the data remain public over time?
- How FAIR is the repository?
- Does the funding agency have specific requirements regarding data sharing?
- What are the repository’s policies concerning licences and data reuse?
- Discipline-specific repositories: If a discipline-specific repository recognised by the community exists, it should be your first choice, since discipline-specific repositories often increase the FAIRness of the data.
- The EMBL-EBI’s data submission wizard will help you choose a suitable repository based on your data type.
- Lists of discipline-specific, community-recognised repositories can be found in the following links:
- General-purpose and institutional repositories: For other cases, a repository that accepts data of different types and disciplines should be considered. It could be a general-purpose repository or a centralised repository provided by your institution or university.
How do you prepare your data for publication?
Once you have decided to publish your data, a few preparations are needed to get the data ready for repository submission.
- What repository should you choose?
- What file formats should be used for the data?
- How is the data uploaded?
- What metadata do you need to provide?
- Under which licence should the data be published?
- To find a repository, see “Which repository should you use to publish your data?” above.
- Repositories generally have information about data formats, metadata requirements and how data can be uploaded under a section called “submit”, “submit data”, “for submitters” or something similar. Read this section in detail.
- To ensure reusability, data should be released with a clear and accessible data usage licence. We suggest making your data available under licences that permit free reuse of data, e.g. a Creative Commons licence such as CC0 or CC-BY. The EUDAT licence selector wizard can help you select suitable licences for your data. Note that sequence data submitted to, for example, ENA are implicitly free to reuse by others, as specified in the INSDC Standards and policies.
- See the corresponding page for more detailed information about metadata management, licences and data transfer.
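As a rough illustration of the preparation steps above, the following sketch checks a dataset description for gaps before submission. The field names, the `DatasetRecord` class and the `missing_items` function are assumptions made for this example. Each repository defines its own required metadata, so consult its submission pages for the real list. The licences are written as SPDX identifiers, and version 4.0 of CC-BY is likewise an assumption.

```python
from dataclasses import dataclass, field

# Licences suggested in the text, written as SPDX identifiers (assumed versions).
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a dataset to be deposited."""
    title: str = ""
    creators: list = field(default_factory=list)
    description: str = ""
    licence: str = ""
    file_names: list = field(default_factory=list)


def missing_items(record):
    """Return the names of fields that still need attention before submission."""
    problems = []
    if not record.title.strip():
        problems.append("title")
    if not record.creators:
        problems.append("creators")
    if not record.description.strip():
        problems.append("description")
    if record.licence not in OPEN_LICENCES:
        problems.append("licence")
    if not record.file_names:
        problems.append("file_names")
    return problems


if __name__ == "__main__":
    draft = DatasetRecord(
        title="Example dataset",
        creators=["A. Researcher"],
        file_names=["counts.tsv"],
    )
    print("Still missing:", missing_items(draft))
```

A check like this only catches empty fields; whether the file formats and metadata meet a given repository's standards still has to be verified against that repository's submission guidelines.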
Relevant tools and resources
|Tool or resource||Description||Tags||Registry|
|dbGaP||The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans||data publication researcher IT support human data|
|Dryad||Open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data||data publication biomol sim|
|ELIXIR Deposition Databases for Biomolecular Data||List of discipline-specific deposition databases recommended by ELIXIR.||data publication researcher data manager IT support|
|EMBL-EBI's data submission wizard||EMBL-EBI's wizard for finding the right EMBL-EBI repository for your data.||data publication researcher data manager|
|FAIRsharing||A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies.||metadata data publication policy officer data manager researcher micro biotech|
|FigShare||Data publishing platform||data publication biomol sim|
|GA4GH data security toolkit||Principled and practical framework for the responsible sharing of genomic and health-related data.||data publication policy officer data manager IT support human data|
|GitHub||Versioning system, used for sharing code, as well as for sharing of small data||data publication data organisation IT support data manager|
|GitLab||GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Self-host GitLab on your own servers, in a container, or on a cloud provider.||data organisation data publication IT support data manager|
|Mendeley data||Multidisciplinary, free-to-use open repository specialized for research data||data publication biomol sim|
|OpenScienceFramework||Free and open source project management tool that supports the entire research lifecycle: planning, execution, reporting, archiving, and discovery||data publication biomol sim|
|Repository Finder||Repository Finder can help you find an appropriate repository to deposit your research data. The tool is hosted by DataCite and queries the re3data registry of research data repositories.||data publication researcher data manager|
|Scientific Data's Recommended Repositories||List of repositories recommended by Scientific Data; contains both discipline-specific and general repositories.||data publication researcher data manager IT support|
|The European Genome-phenome Archive (EGA)||EGA is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects||data publication human data policy officer|
|Wellcome Open Research - Data Guidelines||Wellcome Open Research requires that the source data underlying the results are made available as soon as an article is published. This page provides information about data you need to include, where your data can be stored, and how your data should be presented.||data publication researcher data manager|
|Zenodo||Generalist research data repository built and developed by OpenAIRE and CERN||data publication biomol sim|