Your domain: Agroecology

Introduction

Agroecology is a transdisciplinary field that integrates principles from agriculture, ecology, and environmental sciences to study and promote sustainable farming systems. It focuses on the interactions between plants, soil, water, biodiversity, and climate, aiming to optimise agricultural productivity while strengthening ecosystem resilience and conserving biodiversity. In the context of modern challenges such as climate change, soil degradation, and biodiversity loss, agroecology seeks solutions that prioritise ecological balance and resource efficiency.

Agroecological research generates diverse and complex datasets, spanning field observations, soil and plant analysis, remote sensing, genomic studies, and environmental monitoring. Effective Research Data Management (RDM) is essential to ensure that these datasets remain accessible, interoperable, and reusable, facilitating collaboration across disciplines and institutions.

This page focuses on RDM best practices for the life science aspects of agroecology and supports alignment with the FAIR principles (Findable, Accessible, Interoperable, Reusable). Given the breadth of agroecology, readers may also find relevant RDM guidance in existing RDMkit domain pages, in particular Your Domain - Biodiversity and Your Domain - Plant Sciences, which cover overlapping data types, standards, and workflows commonly encountered in agroecological research. Topics related to social and economic sciences are outside the scope of this page but are acknowledged as complementary aspects of agroecology research.

Data collection

Description

Agroecology studies often combine field observations, sensor measurements, remote sensing, laboratory analyses, and management records across multiple spatial and temporal scales (plot, farm, landscape; days, seasons, years). This leads to datasets that are heterogeneous, context-dependent, and sensitive to local conditions (soil, climate, practices). Robust data collection workflows are therefore essential to ensure that datasets remain comparable, reusable, and suitable for integration with external sources.

Considerations

What do you need to define to ensure measurements are comparable across sites and time?
Which information is essential to interpret observations in their local context?
How will you represent spatial and temporal variability consistently?
What quality assurance steps do you need to prevent or detect errors and inconsistencies during data capture?
Are any data sensitive, and if so, what restrictions do you need to enable responsible reuse?

Solutions

The following practices help you ensure agroecology data remain comparable across sites and reusable:

Use standardised protocols and templates to harmonise sampling and field observations across teams and sites (e.g. by maintaining shared, versioned methods in protocols.io, or aligning practices with long-term monitoring initiatives such as LTER protocols).
Capture a minimum set of contextual information during collection so measurements remain interpretable later (e.g. where/when/how the observation was made).
Maintain a simple field-level data dictionary (variables, units, and codes) to avoid inconsistencies between teams and seasons.
Apply lightweight QA/QC during collection, for example calibration checks, validation rules, and clear missing-value conventions. More information is available at Your task - Data Quality.
Keep traceability from raw to derived data, documenting transformations and any scripts or tools used. More information is available at Your task - Data Provenance.
Use consistent naming and structures from the start to support later integration and reuse. More information is available at Your task - Data Organisation.
Handle sensitive information responsibly, separating identifying information where needed and documenting access conditions (e.g. precise farm locations or rare species occurrence locations).
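The data dictionary, contextual-information, and QA/QC practices above can be combined in a small validation step run when field sheets are entered. The following is a minimal sketch only: the variable names, ranges, codes, and the `NA` missing-value convention are hypothetical and would need to be replaced by the project's own dictionary.

```python
import csv
import io

# Hypothetical field-level data dictionary: variable -> unit and allowed range or codes.
DATA_DICTIONARY = {
    "soil_ph": {"unit": "pH", "min": 3.0, "max": 10.0},
    "soil_moisture": {"unit": "% vol", "min": 0.0, "max": 100.0},
    "tillage": {"codes": {"NT", "RT", "CT"}},  # assumed codes: no-till, reduced, conventional
}
REQUIRED_CONTEXT = ("plot_id", "date", "method")  # where / when / how
MISSING = "NA"  # assumed project-wide missing-value code


def validate_row(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_CONTEXT:
        if (row.get(field) or "") in ("", MISSING):
            problems.append(f"missing context field: {field}")
    for name, rule in DATA_DICTIONARY.items():
        value = row.get(name) or ""
        if value == MISSING:
            continue  # explicitly recorded as missing, which is allowed
        if value == "":
            problems.append(f"{name}: empty cell, use {MISSING}")
        elif "codes" in rule:
            if value not in rule["codes"]:
                problems.append(f"{name}: unknown code {value!r}")
        else:
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{name}: not a number: {value!r}")
                continue
            if not rule["min"] <= number <= rule["max"]:
                problems.append(
                    f"{name}: {number} outside {rule['min']}-{rule['max']} {rule['unit']}"
                )
    return problems


def validate_csv(text):
    """Map 1-based data row numbers to their problems; clean rows are omitted."""
    report = {}
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        problems = validate_row(row)
        if problems:
            report[i] = problems
    return report


SAMPLE = """plot_id,date,method,soil_ph,soil_moisture,tillage
P01,2024-05-02,probe,6.4,23.5,NT
P02,2024-05-02,probe,64,NA,NT
P03,,probe,5.9,31.0,XX
"""

if __name__ == "__main__":
    for row_number, problems in validate_csv(SAMPLE).items():
        print(row_number, problems)
```

In this sample, the second row is flagged for an out-of-range pH (a likely missing decimal point) and the third for a missing date and an undefined tillage code, while the explicit `NA` is accepted. Keeping the dictionary in one place means the same rules can be reused across teams and seasons.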

Metadata and data interoperability

Description

In agroecological research, documenting data and ensuring data interoperability are essential for making data FAIR. However, achieving consistent and comprehensive documentation poses several challenges:

Agroecological research involves a wide range of data types, including genomic sequences, phenotypic traits, geospatial data, and environmental observations. This diversity makes it difficult to apply a one-size-fits-all approach to metadata and data interoperability.
Agroecology integrates data from life sciences, environmental science, and agronomy. The absence of standardised terminology and metadata practices across these disciplines can lead to inconsistencies.
Inadequate metadata or inconsistent documentation can hinder the long-term usability of datasets, reducing their value for future research and reproducibility.

Considerations

What metadata standards are most relevant for the type of agroecological data you collect (e.g. genomic, environmental, geospatial)?
How can you ensure consistent data documentation across multidisciplinary research teams?
What controlled vocabularies and ontologies should you use to enhance data interoperability?
Which data formats best support long-term preservation and reusability?

Solutions

The following practices help improve interoperability and reuse of agroecology datasets across disciplines and infrastructures:

Adopt community metadata standards where possible, selecting those that best match your data types and community expectations. For example, guidance on recommended standards and how they map to common agroecology data types is collected via the AgroServ FAIRSharing collection, which links out to discipline- and technology-specific best practices.
Use AgriSchemas to implement a lightweight schematisation approach with a low degree of formalisation. Unlike complex OWL-based ontologies, AgriSchemas extends Schema.org and Bioschemas to provide a pragmatic “farm-to-fork” pathway for sharing large, semi-structured datasets.
Leverage the successful mapping of Minimum Information about Plant Phenotyping Experiment (MIAPPE) to AgriSchemas. This allows researchers to bridge the gap between genomic data and phenotypic observations, such as investigating gene function based on expression measured in field trials.
Use controlled vocabularies and ontologies to standardise terminology, especially for describing environments, experimental conditions, traits, organisms, and management variables. This improves consistency within a project and makes it easier to integrate datasets across studies and infrastructures. Useful resources include:
- KnetMiner: A knowledge discovery platform that integrates diverse biological datasets into searchable knowledge graphs, supporting AgriSchemas and enabling exploratory research through APIs, graph queries, and interactive analysis.
- AgriSchemas: Provides open-source ETL utilities and data conversion pipelines to transform raw data into compliant RDF and Neo4j datasets.
- AgroPortal: a curated portal for agronomy and related domains (e.g., plant sciences, nutrition, biodiversity), supporting ontology search, browsing, and semantic annotation.
- The Environment Ontology (EnvO): a widely used ontology for environmental and habitat descriptors, often used to harmonise site and environmental context.
- Crop Ontology: provides standardised trait and variable descriptions for crops, commonly used in plant breeding and phenotyping contexts.
- Ontology Lookup Service: a general-purpose service for searching and resolving terms across many biomedical and life science ontologies, useful when projects span multiple domains.
- AgroServ FAIRSharing collection: complements the ontology resources above by pointing to recommended community standards, and project-relevant guidance across life sciences and beyond.
Keep identifiers consistent across datasets for sites, plots, samples, and observations, so that data from different sources can be reliably linked and integrated.
- See also: Your task - Identifiers
Choose formats that support long-term reuse and integration, and document format choices early in the project (including version and conventions used).
- CSV (Comma-Separated Values): A simple, human-readable format for tabular data.
- NetCDF: A format suited for climate and environmental datasets, supporting multi-dimensional data structures.
- ISA-Tab: A format for describing complex field experiments that keeps sample - measurement - datafile relationships clear across many sites, seasons, and analytical methods (soil chemistry, crop traits, biodiversity surveys, omics, remote sensing, etc.).
- For a more detailed overview of appropriate formats, see Your task - Data organisation.
Make transformations and harmonisation steps explicit, especially when integrating multi-source data (field, lab, sensors, remote sensing), to support reproducibility and cross-study reuse. More information on this can be found in Your task - Data Provenance.
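As an illustration of the lightweight, Schema.org-based approach mentioned above, a dataset description can be generated as JSON-LD next to the data files. This is a minimal sketch that uses only generic Schema.org `Dataset` properties; it does not reproduce any AgriSchemas-specific types, and the dataset name, identifier URL, place, and variables are hypothetical.

```python
import json


def dataset_jsonld(name, description, identifier, license_url, variables, start, end, place):
    """Build a Schema.org Dataset description as a JSON-LD-ready dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
        "temporalCoverage": f"{start}/{end}",  # ISO 8601 interval
        "spatialCoverage": {"@type": "Place", "name": place},
        "variableMeasured": [
            {"@type": "PropertyValue", "name": v["name"], "unitText": v["unit"]}
            for v in variables
        ],
    }


# Hypothetical example values, for illustration only.
EXAMPLE = dataset_jsonld(
    name="Cover crop trial soil measurements",
    description="Plot-level soil pH and moisture from a multi-season cover crop trial.",
    identifier="https://example.org/dataset/cover-crop-trial",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    variables=[
        {"name": "soil_ph", "unit": "pH"},
        {"name": "soil_moisture", "unit": "% vol"},
    ],
    start="2022-03-01",
    end="2024-10-31",
    place="Example experimental farm",
)

if __name__ == "__main__":
    print(json.dumps(EXAMPLE, indent=2))
```

Where a controlled vocabulary term exists for a variable (for example in Crop Ontology or EnvO), its identifier could be added to each `PropertyValue` via the Schema.org `propertyID` property so that the variable names are not the only link between datasets.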

Data sharing and publication

Description

Agroecology datasets are often valuable beyond the original study, for example for meta-analyses, modelling, long-term monitoring, and synthesis across sites. However, publishing agroecology data can be challenging because datasets are heterogeneous, collected across multiple partners, and may include sensitive elements. Sharing data with appropriate documentation, licensing, and access conditions helps maximise reuse while supporting responsible data handling.

Considerations

What can you share openly, and what requires controlled access, aggregation, or other safeguards?
Which repository or platform best fits the data type and community practices?
What documentation do you need so others can interpret your data outside its original context?
How will you make datasets citable, with stable identifiers and clear attribution?
Which licence and reuse conditions are appropriate for your intended audiences and stakeholders?

Solutions

The following practices help agroecology datasets become discoverable, citable, and reusable:

Publish data in a suitable repository or platform, prioritising disciplinary services where possible, and ensuring long-term availability. The following repositories and data networks provide open-access platforms that support collaboration, standardisation, and FAIR data practices across multiple scientific domains:
- KnetMiner SPARQL Endpoint: Maintains a showcase data endpoint allowing researchers to use the SPARQL graph query language to answer complex, multi-domain questions across previously siloed sources like ENSEMBL and PubMed.
- knetgraph-gene-traits: A proof-of-concept project demonstrating how standardised knowledge graphs enable flexible analytical approaches like Gene Set Enrichment Analysis (GSEA).
- Global Biodiversity Information Facility (GBIF): GBIF is a biodiversity data repository, providing open access to species occurrence records, ecological datasets, and conservation-relevant information. By publishing and integrating biodiversity data in GBIF, researchers contribute to global biodiversity assessments and ecological research.
- AgroDataCube: This platform aggregates spatial agricultural datasets, combining open and derived data to support precision agriculture and agri-food applications. Researchers can use AgroDataCube to analyse spatial patterns in farming, climate interactions, and land use.
- Agricultural Collaborative Research Outcomes System (AgCROS): AgCROS fosters data collaboration in agriculture, allowing researchers to share, compare, and integrate datasets to enhance agricultural productivity, sustainability, and resilience. The platform supports open-access data integration across different agroecological domains.
- Plant Genomics and Phenomics Research Data Repository (PGP): PGP provides specialised data hosting for plant genomics and phenomics research, ensuring that datasets on crop adaptation, resilience, and functional traits are findable and accessible for agricultural and environmental studies.
- Crop Nutrient Data: Crop Nutrient Data is a comprehensive database of field trial measurements on soil and crop nutrient concentrations. It enables researchers to explore, compare, and reuse standardised nutrient datasets across locations and crops, supporting improved nutrient management strategies and more sustainable agriculture practices.
- International Soil Reference and Information Centre (ISRIC): ISRIC provides global soil datasets, tools, and resources for soil information management and mapping. By using ISRIC’s data, researchers can study soil properties, fertility, and sustainability at a regional and global scale.
- Tool for Agroecology Performance Evaluation (TAPE): Developed by the Food and Agriculture Organization (FAO), TAPE is a structured framework to assess the multidimensional performance of agroecological systems. It is designed for researchers, practitioners, and policymakers to facilitate the adoption and scaling up of agroecological practices.
- FAIDARE: Developed by INRA-URGI, FAIDARE is a data portal that improves the findability and accessibility of datasets held in open international federations of information systems for plant research. It accesses the individual repositories through BrAPI, which implements MIAPPE.
Provide sufficient documentation for reuse, including methodology, context, variable definitions, and known limitations. More information is available at Your Task - Documentation and metadata.
Ensure datasets are citable, for example by assigning a persistent identifier (e.g. DOI) and providing a recommended citation. More information is available at Your Task - Identifiers.
Choose a clear licence to remove ambiguity for reusers and to support responsible reuse. More information is available at Your Task - Licensing.
Manage sensitive data responsibly, especially where sharing could expose precise farm locations or commercially sensitive management records. More information is available at Your Task - Data sensitivity, Your Task - Ethical Aspects, Your Task - GDPR compliance and Your Task - Data Security.
Ensure environmental context is preserved by following the FAIRAgro collaboration models, which focus on capturing complex field trial variables such as soil descriptions.
Prepare data for integration with generative AI. Structuring data with AgriSchemas can support the development of agentic AI systems that ease knowledge discovery through natural language interfaces.
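For the sensitive-data practice above, one common safeguard is to publish generalised coordinates while keeping precise locations in a separate, access-controlled table. The sketch below snaps coordinates to the centre of a grid cell; the 0.1 degree cell size (roughly 11 km north-south) and the field names `obs_id`, `lat`, `lon`, and `farm_name` are assumptions for illustration, and the appropriate precision depends on the project's own risk assessment.

```python
import math


def generalise_coordinate(value, cell_size):
    """Snap a coordinate (decimal degrees) to the centre of its grid cell."""
    return (math.floor(value / cell_size) + 0.5) * cell_size


def generalise_location(lat, lon, cell_size=0.1):
    """Return (lat, lon) generalised to the given cell size, rounded to 6 decimals."""
    return (
        round(generalise_coordinate(lat, cell_size), 6),
        round(generalise_coordinate(lon, cell_size), 6),
    )


def split_record(record, cell_size=0.1):
    """Return (public, restricted) versions of one observation record.

    The public version carries generalised coordinates and states their precision;
    the restricted version keeps the identifying fields, linked by obs_id.
    """
    lat, lon = generalise_location(record["lat"], record["lon"], cell_size)
    hidden = ("lat", "lon", "farm_name")
    public = {key: value for key, value in record.items() if key not in hidden}
    public.update({"lat": lat, "lon": lon, "location_precision_deg": cell_size})
    restricted = {"obs_id": record["obs_id"]}
    restricted.update({key: record[key] for key in hidden})
    return public, restricted


if __name__ == "__main__":
    observation = {
        "obs_id": "OBS-0001",
        "farm_name": "Example Farm",
        "lat": 51.9871,
        "lon": 5.6643,
        "species": "Alauda arvensis",
    }
    public, restricted = split_record(observation)
    print(public)
    print(restricted)
```

Recording the applied precision in the public record (here `location_precision_deg`) documents the access condition alongside the data, so reusers know the coordinates were deliberately coarsened.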

Your tasks

Documentation and metadata: How to document and describe your data.
Data quality: How to ensure high quality of research data.
Data provenance: How to record information about data provenance.
Data organisation: Best practices to name and organise research data.
Data sensitivity: How to identify the sensitivity of different research data types.
GDPR compliance: How to protect your research data, and how to make research data compliant with GDPR.
Ethical aspects: How to handle aspects of research data management that can raise ethical issues.

More information

Links to FAIRsharing

FAIRsharing is a curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies.

Agroecology collection

Tools and resources on this page

Tool or resource	Description	Related pages	Registry
Agricultural Collaborative Research Outcomes System (AgCROS)	A data repository for agricultural research data, providing a platform for sharing and accessing datasets related to agriculture and food systems.
AgriSchemas	Extends Schema.org and Bioschemas to provide a pragmatic, "farm-to-fork" pathway for the automatic integration of large, semi-structured datasets in exploratory research.		Tool info
AgroDataCube	AgroDataCube is a comprehensive data platform that provides access to a wide range of agricultural data, including soil, weather, crop, and management information, to support research and decision-making in agriculture.
AgroPortal	Browser for ontologies for agricultural science based on NCBO BioPortal.	Plant Phenomics Documentation and meta...	Tool info Standards/Databases
Bioschemas	Bioschemas aims to improve the Findability on the Web of life sciences resources such as datasets, software, and training materials	Plant Phenomics Enzymology and biocata... Intrinsically disorder... Virology Data discoverability Machine actionability Documentation and meta...	Tool info Standards/Databases Training
Crop Nutrient Data	Crop Nutrient Data is a comprehensive database that provides information on the nutrient content of various crops.
Crop Ontology	The Crop Ontology compiles concepts to curate phenotyping assays on crop plants, including anatomy, structure and phenotype.	Plant sciences	Standards/Databases Training
CSV (Comma-Separated Values)	Plain-text tabular file format where each row is a record and fields are separated by commas; widely supported for exchanging simple tables		Standards/Databases Training
FAIDARE	FAIDARE is a tool for searching data across distinct databases that implement BrAPI.	Plant Phenomics Plant sciences	Tool info Training
Global Biodiversity Information Facility (GBIF)	Global Biodiversity Information Facility (GBIF) is an international network and data platform that provides open access to biodiversity data from sources worldwide to support research and conservation.	Biodiversity	Tool info Standards/Databases Training
ISA-Tab	Tab-delimited (TSV) metadata format based on the ISA model to describe investigations, studies, and assays, capturing experimental design, sample metadata, protocols, and sample-to-data relationships		Standards/Databases Training
KnetMiner	A software to make biological search more integrated, intuitive and intelligent, enabling a better way to discover and share new insights.	Machine actionability	Tool info Standards/Databases Training
Minimum Information about Plant Phenotyping Experiment (MIAPPE)	MIAPPE is an open and community-driven data standard designed to harmonise data from plant phenotyping experiments.	Plant Genomics Plant Phenomics Plant sciences Machine actionability Documentation and meta...	Standards/Databases Training
NetCDF	Machine-independent data model and file format for storing and sharing array-oriented scientific data with self-describing metadata		Tool info Standards/Databases Training
Ontology Lookup Service	EMBL-EBI's web portal for finding ontologies	FAIRtracks Bioimaging data Enzymology and biocata... Health data Documentation and meta...	Tool info Standards/Databases Training
Plant Genomics and Phenomics Research Data Repository	A repository for plant genomics and phenomics research data, including data from the German Plant Phenotyping Network (DPPN) and the European Plant Phenotyping Network (EPPN).	Plant Genomics Plant Phenomics	Tool info Standards/Databases
Schema.org	Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.	Enzymology and biocata... Machine learning Data discoverability Machine actionability Documentation and meta...	Tool info Standards/Databases Training
The Environment Ontology (EnvO)	An ontology for expressing environmental terms	Microbial biotechnology
Tool for Agroecology Performance Evaluation (TAPE)	The Tool for Agroecology Performance Evaluation (TAPE) is a software tool designed to assess and evaluate the performance of agroecological practices. It provides a framework for analysing various aspects of agroecology, including environmental, economic, and social factors, to support sustainable agricultural practices.