Tool assembly: Plant Phenomics
What is the plant phenomics tool assembly and who can use it?
The plant phenomics tool assembly covers the whole life cycle of experimental plant phenotyping data. It uses the concepts of the MIAPPE (Minimum Information About a Plant Phenotyping Experiment) standard: (i) experiment description, including organisation, objectives and location; (ii) biological material description and identification; and (iii) trait (phenotypic and environmental) description, including measurement methodology. A more detailed overview of the MIAPPE standard is available, as well as the full specifications.
The plant phenomics tool assembly helps everyone in charge of plant phenotyping data management to enable:
- the integration of phenotyping data with other omics data: see the general principles on the Plant Sciences domain page;
- the findability of their data in plant-specific (e.g. FAIDARE) or generic search portals (e.g. Google Dataset Search);
- the long term reusability of their data.
How can you access the plant phenomics tool assembly?
All the components of the plant phenomics tool assembly are publicly available and listed below, but many of them require registration.
Data management planning
The general principles to be considered are described in the Plant Sciences domain page and in particular in its section dedicated to plant phenotyping data. In a nutshell:
- the phenotyping data must be described following the MIAPPE data standard;
- special attention should be given to the identification and description of the biological material and the observation variables.
The general principles for data management planning and available tools are described in the RDMkit data management plan page. The knowledge model of the data management planning application Data Stewardship Wizard (DSW) was reviewed for compliance with the needs of the Plant Sciences community.
File based data collection
The metadata and description of your experiments should be filled in using a MIAPPE template. Note that there is a readme that fully describes each field, its type and whether it is optional or mandatory. All fields should be present in the file you are using, even if you leave the optional ones empty. This will allow standard processing and validation using dedicated tools.
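The rule that every template field must be present, even when an optional one is left empty, can be checked mechanically. The following is a minimal Python sketch of such a check, not one of the dedicated tools mentioned above. The field names are a small illustrative subset and the mandatory flags are assumptions made for the example; the template readme remains the authoritative list.

```python
import csv
import io

# Illustrative subset of template fields; names and mandatory flags are
# assumptions for this sketch, not the authoritative MIAPPE list.
EXPECTED_FIELDS = {
    "Investigation unique ID": True,
    "Investigation title": True,
    "Investigation description": False,
    "Study unique ID": True,
    "Study title": True,
    "Study description": False,
}


def check_sheet(csv_text, expected=EXPECTED_FIELDS):
    """Return (missing_columns, empty_mandatory) for a CSV metadata sheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # A column is missing if it is absent from the header, optional or not.
    missing = [name for name in expected if name not in header]
    empty_mandatory = []
    for row_number, row in enumerate(reader, start=1):
        for name, mandatory in expected.items():
            if mandatory and name in header and not (row.get(name) or "").strip():
                empty_mandatory.append((row_number, name))
    return missing, empty_mandatory


# Made-up sheet: the optional "Study description" column is absent and the
# mandatory "Study title" cell is empty.
SAMPLE = (
    "Investigation unique ID,Investigation title,Investigation description,"
    "Study unique ID,Study title\n"
    "INV1,Drought trial,,STU1,\n"
)

missing, empty_mandatory = check_sheet(SAMPLE)
print("Missing columns:", missing)
print("Empty mandatory cells:", empty_mandatory)
```

The check reports an absent optional column separately from an empty mandatory cell, because the first breaks standard processing of the file while the second is a content error.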
Experimental data gathering and management
Systems for file based data collection
- FAIRDOM-SEEK is an open source web-based data sharing platform used as a repository or a catalog. It is being deployed as several instances ranging from confidential project data sharing platforms (INRAE/AGENT, VIB) to public repositories like FAIRDOMHub. It is MIAPPE compliant through the integration of MIAPPE metadata at the investigation, study and assay levels. It can be used for project based early data sharing, in preparation for long term data storage, but also as a preservation tool for raw data.
- pISA-tree is a data management solution developed to contribute to the reproducibility of research and analyses. A hierarchical set of batch files is used to create a standardized nested directory tree and associated files for research projects.
- COPO is a data management platform specific to plant sciences.
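The idea of a standardized nested directory tree, following the investigation, study and assay levels mentioned above, can be illustrated with a short Python sketch. This is not how pISA-tree itself works (it relies on batch files). The folder names and the `README.txt` placeholder are hypothetical choices made for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: one investigation with two studies, each holding assays.
# Folder names and the README placeholder are assumptions for this sketch.
LAYOUT = {
    "investigation_drought": {
        "study_greenhouse_2023": ["assay_imaging", "assay_biomass"],
        "study_field_2023": ["assay_yield"],
    }
}


def create_tree(root, layout):
    """Create the nested folders and a placeholder README at every level."""
    created = []
    for investigation, studies in layout.items():
        for study, assays in studies.items():
            for assay in assays:
                assay_dir = Path(root) / investigation / study / assay
                assay_dir.mkdir(parents=True, exist_ok=True)
                created.append(assay_dir)
    # Collect all directories first, then add a metadata placeholder to each.
    for directory in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        (directory / "README.txt").touch()
    return created


with tempfile.TemporaryDirectory() as tmp:
    assay_dirs = create_tree(tmp, LAYOUT)
    relative = sorted(p.relative_to(tmp).as_posix() for p in assay_dirs)
    readme_count = len(list(Path(tmp).rglob("README.txt")))
    print(relative)
    print("README files:", readme_count)
```

Generating the tree from a single layout description keeps every project consistent, which is the property that makes later automated processing possible.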
High throughput dedicated systems
- PHIS, the open-source Phenotyping Hybrid Information System based on OpenSILEX, manages and collects data from phenotyping and high throughput phenotyping experiments on a day to day basis. It can store, organize and manage highly heterogeneous data (e.g. images, spectra, growth curves) at multiple spatial and temporal scales (leaf to canopy level) originating from multiple sources (field, greenhouse). It unambiguously identifies all objects and traits in an experiment and establishes their relations via ontologies and semantics that apply to both field and controlled conditions. Its ontology-driven architecture is a powerful tool for integrating and managing data from multiple experiments and platforms, for creating relationships between objects and for enriching datasets with knowledge and metadata. It is MIAPPE and BrAPI compliant, and naming conventions are recommended for users to declare their resources. Several experimental platforms use PHIS to manage their data, and PHIS instances dedicated to sharing resources (projects, genetic resources, variables) also exist to allow the sharing of studied concepts.
- PIPPA, the PSB Interface for Plant Phenotype Analysis, is the central web interface and database that provides the tools for managing the plant imaging robots on the one hand, and for analysing images and data on the other. The database supports all MIAPPE fields, which are accessible through the BrAPI endpoints. Experiment pages are marked up with Bioschemas to improve findability on Google.
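Marking up a page with Bioschemas generally means embedding a schema.org description as JSON-LD in the page. The Python sketch below shows what such markup could look like for an experiment page. It assumes the schema.org `Dataset` type; which type and properties PIPPA actually uses is not stated here, and all values are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, url, identifier, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
    }


def as_script_tag(markup):
    """Wrap the JSON-LD in the script element that crawlers read from a page."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are hypothetical placeholders.
markup = dataset_jsonld(
    name="Maize drought experiment",
    description="Image-based phenotyping of maize under water deficit.",
    url="https://example.org/experiments/42",
    identifier="https://example.org/experiments/42",
    keywords=["plant phenotyping", "maize", "drought"],
)
tag = as_script_tag(markup)
print(tag)
```

Search engines that harvest structured data read the embedded script element, so the experiment can be indexed as a dataset rather than as an ordinary web page.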
Data processing and analysis
It is important to keep in mind the difference between data processing and data analysis.
- Processing provides the tools and procedures to transform primary data, such as imaging or observational data, into data of appropriate quality that can be processed further.
- Analysing, on the other hand, is concerned with extracting information from the processed data for the purpose of supporting knowledge acquisition. Some analysis tools dedicated to plant phenotyping experiments are registered in bio.tools, for example: Plant 3D, LeafNet, PlantCV, Phenomenal 3D.
The data collected and annotated can be shared in trustworthy repositories under clear conditions of access to the data. As no global central repository exists for phenotyping data, the Plant Science research community combines the use of scattered trustworthy repositories and of centralized search tools.
Dataverse is an open source research data repository software used by several research institutes around the globe to publicly share heterogeneous datasets. In Europe, it is used, among others, by the Portuguese DMPortal, the German Jülich data portal, and the French Recherche Data Gouv (previously Data.INRAE) research communities. Its main strength is its flexibility, as the mandatory metadata are focused on publication information such as title, abstract, authors and keywords. It can therefore host any data type, which is both a strength and a weakness, as shared good practices are necessary to ensure the reusability and findability of published phenomic data.
e!DAL-PGP is a comprehensive research data repository hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, mainly focused on sharing highly valuable and large genomics and phenomics datasets. It is the first productive instance based on the open source e!DAL infrastructure software and is part of the de.NBI/ELIXIR Germany services. All provided datasets are FAIR compliant and citable via a persistent DOI. Through the widely established LifeScience AAI (formerly known as ELIXIR AAI), the submission procedure is open to all ELIXIR-associated users. The key feature of e!DAL-PGP is its user-friendly, simple and FAIR-compliant data submission and internal review procedure. The repository has no general limit on the type or size of datasets. Comprehensive documentation, including guidelines, code snippets for technical integration and videos, is available on the project website.
Zenodo is a data publication service supported by the European Commission and focused on research data, including supplemental material such as software, tables, figures or slides. A deposit is therefore usually associated with the publication of a research paper, book chapter or presentation. The Zenodo data submission form lets you describe every data file with a set of technical metadata based on the DataCite metadata schema, which is necessary to assign a persistent DOI to every dataset. The Zenodo infrastructure is hosted at CERN and can publish datasets of up to 50 GB for free; larger datasets require a specific support request. A further valuable feature of Zenodo is its connection to GitHub, which makes it possible to assign a DOI to a specific version (or commit) of a hosted software repository. This preserves software scripts and improves the reproducibility of research workflows and results, which is often a challenge, especially for older research publications.
Machine actionable data sharing
BrAPI (the Breeding API) is a MIAPPE-compliant web service specification implemented by several deposition databases. Their endpoints can be validated using the BrAPI validator BRAVA. The BrAPI project provides documentation and training material to support its usage.
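As an illustration of machine-actionable access, the Python sketch below builds a request for the BrAPI version 2 `studies` call and reads study names from the response. It assumes a server that exposes its calls under `/brapi/v2`. The host `brapi.example.org` and the sample response values are made up, and `fetch_studies` is shown for completeness but is not called, since it needs network access to a real BrAPI server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def studies_url(base_url, page=0, page_size=10):
    """Build the URL of the paginated BrAPI v2 studies call."""
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{base_url.rstrip('/')}/brapi/v2/studies?{query}"


def study_names(payload):
    """Map studyDbId to studyName from a BrAPI v2 response."""
    data = payload.get("result", {}).get("data", [])
    return {item["studyDbId"]: item.get("studyName") for item in data}


def fetch_studies(base_url, page=0, page_size=10):
    """Perform the HTTP call (needs network access and a real BrAPI server)."""
    with urlopen(studies_url(base_url, page, page_size), timeout=30) as response:
        return json.load(response)


# Hand-written sample shaped like a BrAPI v2 response; values are made up.
SAMPLE_RESPONSE = {
    "metadata": {
        "pagination": {"currentPage": 0, "pageSize": 2, "totalCount": 2, "totalPages": 1}
    },
    "result": {
        "data": [
            {"studyDbId": "S1", "studyName": "Greenhouse drought 2023"},
            {"studyDbId": "S2", "studyName": "Field trial 2023"},
        ]
    },
}

url = studies_url("https://brapi.example.org/", page=0, page_size=2)
names = study_names(SAMPLE_RESPONSE)
print(url)
print(names)
```

Because every compliant server answers with the same response structure, the same client code can be pointed at different repositories by changing only the base URL.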
Plant phenotyping data reuse relies on rich metadata that follow the MIAPPE specifications and are annotated with proper ontologies. Most of the important ontologies are registered on FAIRsharing: use this search example.
- AgroPortal is a vocabulary and ontology repository for agronomy and related domains.
- FAIDARE (FAIR Data-finder for Agronomic Research) is a portal facilitating discoverability of public data on plant biology from a federation of established data repositories.
Relevant tools and resources
|Tool or resource||Description||Related pages||Registry|
|AgroPortal||Browser for ontologies for agricultural science based on NCBO BioPortal.||Plant sciences Documentation and metadata||Tool info Standards/Databases|
|BioSamples||BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry.||Plant sciences Plant Genomics||Tool info Standards/Databases Training|
|BrAPI||Specification for a standard API for plant data: plant material, plant phenotyping data||Data Steward: infrastructure Plant sciences|
|COPO||Portal for scientists to broker more easily rich metadata alongside data to public repos.||Documentation and metadata Researcher Plant sciences Machine actionability Plant Genomics||Tool info Standards/Databases|
|Crop Ontology||The Crop Ontology compiles concepts to curate phenotyping assays on crop plants, including anatomy, structure and phenotype.||Researcher Data Steward: research Data Steward: infrastructure Plant sciences||Standards/Databases Training|
|Data INRAE||Dataverse for life sciences and agronomic related data||Plant sciences Plant Genomics Researcher Data Steward: research||Standards/Databases|
|Data Stewardship Wizard||Publicly available online tool for composing smart data management plans||Data management plan Researcher Data Steward: research Data Steward: infrastructure NeLS TSD Plant Genomics||Tool info Training|
|e!DAL-PGP||Plant Genomics and Phenomics Research Data Repository||Plant sciences Plant Genomics Researcher Data Steward: research Data Steward: infrastructure Data publication Documentation and metadata||Standards/Databases|
|EURISCO||European Search Catalogue for Plant Genetic Resources||Plant sciences Researcher Data Steward: research||Tool info|
|FAIDARE||FAIDARE is a tool for searching data across distinct databases that implement BrAPI.||Researcher Data Steward: research Plant sciences IFB Plant Genomics||Tool info|
|FAIRDOM-SEEK||A data Management Platform for organising, sharing and publishing research datasets, models, protocols, samples, publications and other research outcomes.||Data storage Data Steward: infrastructure NeLS Microbial biotechnology IFB Machine actionability Plant Genomics||Tool info Training|
|GnpIS||A multispecies integrative information system dedicated to plant and fungi pests. It allows researchers to access genetic, phenotypic and genomic data. It is used by both large international projects and the French National Research Institute for Agriculture, Food and Environment.||Plant sciences||Tool info Standards/Databases|
|ISA4J||Open source software library that can be used to generate an ISA-TAB export from in-house data sets. These comprise, e.g., local databases or local file-system-based experimental data.||Plant sciences Machine actionability||Tool info|
|MIAPPE||Minimum Information About a Plant Phenotyping Experiment||Documentation and metadata Researcher Data Steward: research Plant sciences Plant Genomics||Standards/Databases Training|
|Multi-Crop Passport Descriptor (MCPD)||The Multi-Crop Passport Descriptor is the metadata standard for plant genetic resources maintained ex situ by genbanks.||Documentation and metadata Researcher Data Steward: infrastructure Data Steward: policy Plant sciences Plant Genomics||Standards/Databases|
|PHIS||The open-source Phenotyping Hybrid Information System (PHIS) manages and collects data from plant phenotyping and high throughput phenotyping experiments on a day to day basis.||Plant sciences IFB||Training|
|pISA-tree||A data management solution for intra-institutional organization and structured storage of life science project-associated research data, with emphasis on the generation of adequate metadata.||Microbial biotechnology Researcher Data Steward: research Data organisation Documentation and metadata Plant Genomics||Tool info|
|Zenodo||Generalist research data repository built and developed by OpenAIRE and CERN||Data publication Biomolecular simulation data Bioimaging data||Standards/Databases Training|
|PIPPA||PIPPA, the PSB Interface for Plant Phenotype Analysis, is the central web interface and database that provides the tools for the management of the plant imaging robots on the one hand, and the analysis of images and data on the other hand.||Plant sciences Data Steward: research Researcher Data Steward: infrastructure||Tool info|