Tool assembly: Plant Genomics

What is the plant genomics tool assembly?

The plant genomics tool assembly is a toolkit for managing plant genomics and genotyping data throughout their life cycle, with a particular focus on ensuring traceability of the biological material to enable interoperability with plant phenotyping data. To enable this, the same persistent identifiers must be used in both the genotyping and phenotyping experiments. It is recommended that the biological plant material is accurately described using rich metadata and stored in a central repository. The tool assembly also provides guidance on how users should structure their analysis results in the form of VCF files to achieve a higher degree of interoperability.
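As a minimal sketch of what "the same persistent identifiers in both experiments" means in practice, the snippet below compares the material identifiers used in a genotyping dataset with those used in a phenotyping dataset. The function name `compare_material_ids` and the identifier values are hypothetical and chosen only for illustration; they are not part of any tool in this assembly.

```python
def compare_material_ids(genotyping_ids, phenotyping_ids):
    """Group material identifiers by the dataset(s) in which they appear."""
    geno, pheno = set(genotyping_ids), set(phenotyping_ids)
    return {
        "shared": sorted(geno & pheno),            # usable for cross-linking
        "genotyping_only": sorted(geno - pheno),   # no phenotyping counterpart
        "phenotyping_only": sorted(pheno - geno),  # no genotyping counterpart
    }


# Hypothetical identifiers, shaped like BioSamples accessions.
result = compare_material_ids(
    ["SAMEA0000001", "SAMEA0000002"],
    ["SAMEA0000002", "SAMEA0000003"],
)
print(result)
```

Only the identifiers reported under `shared` can link genotyping results to phenotyping observations; the other two groups point to material that is described in one experiment but not the other.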

Who can use the plant genomics tool assembly?

This tool assembly can be used by any researcher who produces plant genomic or genotyping data and wants to ensure that the data comply with the FAIR principles.

How can you access the plant genomics tool assembly?

All the components of this tool assembly are publicly available, but most require registration. Anyone can therefore access the tool assembly, provided they register for each tool that requires it.

For what purpose can you use the plant genomics tool assembly?

Figure 1. The plant genomics tool assembly: tools and resources used in managing plant genomics and genotyping data.

Data management planning

The general principles to be considered are described in the Plant Sciences domain page.

Data Stewardship Wizard (DSW) is a human-friendly tool for collaborative editing of machine-actionable DMPs. The DSW Plant Sciences project template, available to researchers on ELIXIR’s DSW instance, can be used for any plant sciences project. When creating the DMP project, choose the option “From Project Template” and search for the “Plant Sciences” template.

Metadata collection and tracking

Accurate documentation of the plant biological materials and samples is critical for interoperability, and should comply with the MIAPPE standard. This information should be submitted to BioSamples, with MIAPPE compliance validated using BioSamples’ plant-miappe.json template, available on the sample validation page. Sample descriptions can be submitted to BioSamples as early as the data collection stage; at the latest, they must accompany the submission of genomic data to the European Nucleotide Archive (ENA) or of genotyping data to the European Variation Archive (EVA). The complete timeline for submitting plant biological material to BioSamples and the resulting genotyping experiment results to ENA and EVA should look like this:

1. Register plant biological material information to BioSamples.
2. Submit sequencing reads to ENA (using BioSamples IDs to identify the material).
3. Check whether the reference genome assembly used is available in INSDC (that is, whether a GCF/GCA accession number exists).
   a. If yes, proceed to submit the VCF file at step 4; if no, proceed to step 3b.
   b. Submit the reference genome assembly to INSDC (NCBI GenBank / EMBL-EBI ENA / DDBJ) and wait until an accession number is issued, then proceed to step 4.
4. Submit the VCF file to EVA (using BioSamples IDs to identify the material and the GCF/GCA accession for the reference genome assembly).

Note: Metadata associated with a single sample registered with BioSamples can only be updated from the original account.
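The submission timeline above can be sketched as a small decision function. This is an illustrative sketch only: the function names `has_insdc_accession` and `submission_plan` are hypothetical, and the check only tests whether a string is shaped like a GCA/GCF accession (prefix, underscore, nine digits, dot, version). It does not query INSDC to confirm that the accession actually exists.

```python
import re

# Shape of an INSDC assembly accession: GCA_ or GCF_, nine digits, ".", version.
ASSEMBLY_ACCESSION = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def has_insdc_accession(accession):
    """Return True if the value is shaped like a GCA/GCF accession."""
    return bool(accession) and ASSEMBLY_ACCESSION.match(accession) is not None


def submission_plan(assembly_accession=None):
    """List the submission steps, in order, for one genotyping experiment."""
    steps = [
        "Register plant biological material in BioSamples",
        "Submit sequencing reads to ENA using BioSamples IDs",
    ]
    if not has_insdc_accession(assembly_accession):
        # Step 3b: the reference assembly must be in INSDC before the VCF goes to EVA.
        steps.append("Submit reference genome assembly to INSDC and wait for an accession")
    steps.append("Submit VCF file to EVA using BioSamples IDs and the GCA/GCF accession")
    return steps


# Hypothetical accession value, used only to show the call.
for number, step in enumerate(submission_plan("GCA_000000000.1"), start=1):
    print(number, step)
```

Calling `submission_plan()` without an accession adds the extra assembly submission step before the EVA step, mirroring the "if no" branch of step 3.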

The Plant Genomics and Phenomics Research Data Repository, FAIRDOM-SEEK instances such as FAIRDOMHub, or Recherche Data Gouv can be used to manage and share experimental metadata as well as data.

Data processing and analysis

Reference genomes for genome assembly and annotation should be obtained from Ensembl Plants or PLAZA, if available. Genetic variant data must be produced in the VCF format and validated using the EVA vcf-validator (https://github.com/EBIvariation/vcf-validator). Use only sequence identifiers that match the identifiers in the reference genome assembly. To ensure interoperability of VCF files, the VCF meta-information lines should be used; see the Plant sciences page for more details.
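The identifier requirement can be illustrated with a quick pre-check run before full validation. The sketch below is not a replacement for the EVA vcf-validator: the function name `check_vcf_contigs` is hypothetical, the header parsing is deliberately simplified (it splits `##contig` fields on commas and would mishandle quoted values containing commas), and the accession, lengths, and sequence names in the example are made up.

```python
def check_vcf_contigs(vcf_text, reference_ids):
    """Report sequence IDs in a VCF that are undeclared or absent from the reference."""
    reference_ids = set(reference_ids)
    declared, used = set(), set()
    has_reference_line = False
    for line in vcf_text.splitlines():
        if line.startswith("##contig=<"):
            # Extract the ID field from e.g. ##contig=<ID=1,length=1000>
            fields = line[len("##contig=<"):].rstrip(">").split(",")
            for field in fields:
                if field.startswith("ID="):
                    declared.add(field[3:])
        elif line.startswith("##reference="):
            has_reference_line = True
        elif line and not line.startswith("#"):
            # First tab-separated column of a data line is CHROM.
            used.add(line.split("\t", 1)[0])
    return {
        "has_reference_line": has_reference_line,
        "undeclared": sorted(used - declared),
        "not_in_reference": sorted((used | declared) - reference_ids),
    }


# Made-up VCF content: "chr2" does not match the reference identifier "2".
example = "\n".join([
    "##fileformat=VCFv4.3",
    "##reference=GCA_000000000.1",
    "##contig=<ID=1,length=1000>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr2\t200\t.\tC\tT\t.\tPASS\t.",
])
report = check_vcf_contigs(example, {"1", "2"})
print(report)
```

In this example the second data line uses `chr2`, which is neither declared in a `##contig` line nor present among the reference identifiers, so it appears in both `undeclared` and `not_in_reference`.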

All sequencing data collected in plant genotyping experiments should be submitted to ENA together with metadata compliant with the GSC MIxS plant associated checklist. Final results of such studies, in the form of VCF files, should be submitted to EVA. Researchers are also encouraged to submit supplemental data that complements these two data types to the Plant Genomics and Phenomics Research Data Repository or Recherche Data Gouv.

More information

Links to FAIR Cookbook

FAIR Cookbook is an online, open and live resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable; in one word FAIR.

Improving dataset maturity - MIAPPE-compliant submission to EMBL-EBI databases

Tools and resources on this page

Tool or resource	Description	Related pages	Registry
BioSamples	BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry.	Biodiversity Plant sciences Virology Data interlinking	Tool info Standards/Databases Training
Data Stewardship Wizard	Publicly available online tool for composing smart data management plans DSW@IFB learning.DSW DS-Wizard ELIXIR-Norway BioData.pt Data Stewardship Wizard SciLifeLab DS-Wizard DS Wizard ELIXIR Slovenia	CSC FAIRtracks Plant Phenomics Plant sciences Data management plan GDPR compliance	Tool info Training
Ensembl Plants	Open-access database of full genomes of plant species.		Standards/Databases Training
European Nucleotide Archive (ENA)	A record of sequence information scaling from raw sequencing reads to assemblies and functional annotation	Galaxy Biodiversity Epitranscriptome data Human pathogen genomics Microbial biotechnology Single-cell sequencing Virology Data brokering Data interlinking Data publication Project data managemen...	Tool info Standards/Databases Training
European Variation Archive (EVA)	Open-access database of all types of genetic variation data from all species.	Plant sciences Data interlinking	Tool info Standards/Databases Training
FAIRDOMHub	Data, model and SOPs management for projects, from preliminary data to publication, support for running SBML models, etc. (public SEEK instance)	NeLS Plant Phenomics Microbial biotechnology Plant sciences Data discoverability Documentation and meta...	Standards/Databases Training
MIAPPE	Minimum Information About a Plant Phenotyping Experiment	Plant Phenomics Plant sciences Machine actionability Documentation and meta...	Standards/Databases Training
Plant Genomics and Phenomics Research Data Repository	A repository for plant genomics and phenomics research data, including data from the German Plant Phenotyping Network (DPPN) and the European Plant Phenotyping Network (EPPN).	Plant Phenomics Agroecology	Tool info Standards/Databases
PLAZA	Access point for plant comparative genomics, centralizing genomic data produced by different genome sequencing initiatives.		Standards/Databases Training