How do you find appropriate standard metadata for datasets or samples?

Description

There are multiple standards for different types of data, ranging from generic dataset descriptions (e.g. DCAT, Dublin Core, (bio)schema.org) to standards for specific data types (e.g. MIABIS for biosamples). Two questions therefore become relevant: how to find a suitable metadata standard, and how to find an appropriate repository for depositing your data.
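As an illustration of a generic dataset description, here is a minimal sketch of a schema.org `Dataset` record serialised as JSON-LD. The dataset name, description and keywords are hypothetical, and the helper `dataset_jsonld` is introduced only for this example; a real record would usually carry more properties (creators, identifiers, distribution) as required by your community or repository.

```python
import json


def dataset_jsonld(name, description, keywords, license_url):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        "license": license_url,
    }


# Hypothetical dataset, used only to show the shape of the record.
record = dataset_jsonld(
    name="Example leaf transcriptome",
    description="RNA-seq reads from a hypothetical drought experiment.",
    keywords=["transcriptomics", "plants"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(record, indent=2))
```

Domain-specific standards typically add fields on top of such a generic description rather than replace it.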

Considerations

  • Decide at the beginning of the project which repositories are recommended for your data types.
    • Note that you can use several repositories if you have different data types.
    • Distinguish between generic repositories (e.g. Zenodo) and repositories specific to a data type or technique (e.g. EBI repositories).

Solutions

  • If you have a repository in mind:
    • Go to the repository website and check the “help”, “guide” or “how to submit” tab to find information about required metadata.
    • On the repository website, go through the submission process (try to submit some dummy data) to identify the metadata requirements. For instance, if you are considering publishing your transcriptomic data in ArrayExpress, you can prepare your metadata spreadsheet with the Annotare 2.0 submission tool at the beginning of the project.
    • Be aware that data-type-specific repositories usually have checklists for metadata. For example, the European Nucleotide Archive provides sample checklists that can also be downloaded as a spreadsheet.
  • If you don’t know yet which repository you will use, look for the minimum information recommended in your community for your type of data (i.e. “Minimum Information …your topic”, e.g. MIAME, MINSEQE or MIAPPE), or for other metadata standards, in the registries listed under “Relevant tools and resources”.
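Once you have a repository checklist or a minimum-information standard, you can check your own metadata spreadsheet against it before submission. The sketch below assumes a tab-separated sheet and a hypothetical set of mandatory fields (`sample_name`, `organism`, `collection_date`); the actual field names come from the checklist you download, not from this example.

```python
import csv
import io

# Hypothetical mandatory fields; a real checklist defines its own.
REQUIRED_FIELDS = ["sample_name", "organism", "collection_date"]


def missing_metadata(sheet_text, required=REQUIRED_FIELDS):
    """Return {row_number: [missing fields]} for rows lacking required values."""
    problems = {}
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        # A field counts as missing if the column is absent or the cell is blank.
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            problems[row_number] = missing
    return problems


# Small in-memory sheet; the second sample has no organism.
sheet = (
    "sample_name\torganism\tcollection_date\n"
    "S1\tArabidopsis thaliana\t2021-05-04\n"
    "S2\t\t2021-05-05\n"
)
print(missing_metadata(sheet))  # {2: ['organism']}
```

Running such a check early in the project makes it easier to collect the missing values while the samples are still at hand.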

How do you find appropriate vocabularies or ontologies?

Description

Vocabularies and ontologies are meant for describing concepts and relationships within a knowledge domain. Used wisely, they can enable both humans and computers to understand your data. There is no clear-cut division between the terms “vocabulary” and “ontology”, but the latter is more commonly used when dealing with complex (and perhaps more formal) collections of terms.

There are many vocabularies and ontologies to be found on the web. Finding a suitable one can be both difficult and time-consuming.

Considerations

  • Check whether you really need to find a suitable ontology or vocabulary yourself. Perhaps the repository where you are about to submit your data has recommendations? Or the journal where you plan to publish your results?
  • Understand your goal in sharing data. Which formal requirements (e.g. from a funder or publisher) need to be fulfilled? Which parts of your data would benefit the most from adopting ontologies?
  • Learn the basics about ontologies. This will be helpful when you search for terms in ontologies and want to understand how terms are related to one another.
  • Accept that one ontology may not be sufficient to describe your data. It is very common that you have to combine terms from more than one ontology.
  • Accept terms that are good enough. Sometimes you cannot find a term that perfectly matches what you want to express. Choosing the best available term is often better than not choosing a term at all. Note that the same concept may also be present in multiple ontologies.
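To illustrate combining terms from more than one ontology, the sketch below annotates a hypothetical sample with terms from NCBITaxon, UBERON and PATO, and expands each compact identifier (CURIE) to an OBO Foundry-style PURL. The helper `curie_to_obo_iri` and the annotation layout are assumptions made for this example; ontologies outside the OBO Foundry may use a different IRI pattern.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_obo_iri(curie):
    """Expand a CURIE such as 'UBERON:0002107' to an OBO Foundry-style PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


# Hypothetical sample: each field keeps the human-readable label next to the
# ontology term, and the terms come from three different ontologies.
sample_annotation = {
    "organism": {"label": "Homo sapiens", "term": "NCBITaxon:9606"},
    "tissue": {"label": "liver", "term": "UBERON:0002107"},
    "sex": {"label": "male", "term": "PATO:0000384"},
}

for field, value in sample_annotation.items():
    print(field, value["label"], curie_to_obo_iri(value["term"]))
```

Keeping both the label and the term identifier lets humans read the record while computers resolve the exact concept.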

Solutions

Relevant tools and resources

Tool or resource | Description | Tags
BioSamples | BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. | metadata, plants
BioStudies | The BioStudies database holds descriptions of biological studies and links to data from these studies in other databases. | metadata, plants
COPO | Portal for scientists to broker more easily rich metadata alongside data to public repos. | metadata, researcher, plants
Data Curation Centre Metadata list | List of metadata standards | metadata, researcher, data manager
EMBL-EBI Ontology Lookup Service | EMBL-EBI’s web portal for finding ontologies | metadata, data manager, researcher
FAIRDOMHub | Data, model and SOPs management for projects, from preliminary data to publication, support for running SBML models etc. (public SEEK instance) | storage, researcher, nels, metadata, micro biotech
FAIRsharing | A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies. | metadata, data publication, policy officer, data manager, researcher, micro biotech
IDPO | Intrinsically disordered proteins ontology | IDP, metadata
Linked Open Vocabularies (LOV) | Web portal for finding ontologies | metadata, data manager, researcher
MCPD | The Multi-Crop Passport Descriptor is the metadata standard for plant genetic resources maintained ex situ by genebanks. | metadata, researcher, IT support, policy officer, plants
MCPD | The Multi-crop Passport Descriptors are an international standard to facilitate germplasm passport information exchange | metadata, plants
MIADE | Minimum Information About Disorder Experiments (MIADE) standard | metadata, researcher, data manager, IDP
MIAPPE | Minimum Information About a Plant Phenotyping Experiment | metadata, researcher, data manager, plants
MIGS/MIMS | Minimum Information about a (Meta)Genome Sequence | metadata, researcher, data manager, marine, micro biotech
MIxS | Minimum Information about any (x) Sequence | metadata, researcher, data manager, marine
Ontobee | A web portal to search and visualise ontologies | metadata, data manager, researcher
OTP | One Touch Pipeline (OTP) is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata. | human data, metadata, DMP, data analysis
RDA Standards | Directory of standard metadata, divided into different research areas | metadata, researcher, data manager
Research Object Crate (RO-Crate) | RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. | metadata, storage, data organisation, data manager, researcher, micro biotech
RightField | RightField is an open-source tool for adding ontology term selection to Excel spreadsheets | researcher, metadata, data manager, micro biotech
Schemapedia | Web portal for finding ontologies | metadata, data manager, researcher
The Genomic Standards Consortium (GSC) | Minimum Information about any (x) Sequence | metadata, researcher, IT support, policy officer, human data
The Open Biological and Biomedical Ontology (OBO) Foundry | Collaborative effort to develop interoperable ontologies for the biological sciences | metadata, data manager, researcher
UniProt | Comprehensive resource for protein sequence and annotation data | metadata, researcher, IDP, micro biotech