
Marine metagenomics

Introduction

The marine metagenomics domain is characterized by large datasets that require substantial storage and High-Performance Computing (HPC) to run complex, memory-intensive analysis pipelines. These datasets are therefore difficult for typical end-users to handle and are beyond the resources of many service providers. Sharing metagenomics datasets in compliance with the FAIR principles, so that they can be reused, hinges on recording rich metadata about every step from sampling to data analysis.

Managing marine metagenomic metadata

Description

Metagenomics is a highly complex process that encompasses several steps, including sampling, isolation of DNA, generation of sequencing libraries, sequencing, pre-processing of raw data, taxonomic and functional profiling using reads, assembly, binning, refinement of bins, generation of MAGs, taxonomic classification of MAGs, and archiving of raw or processed data. To comply with the FAIR principles, you need to collect metadata about all these steps.
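One way to keep track of whether every step has metadata is to hold one record per step and check the set against the workflow. The following is a minimal sketch in Python, not a prescribed schema: the step identifiers paraphrase the list above, and `StepRecord`, `missing_steps`, and the tool name `example-assembler` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Step identifiers paraphrase the workflow described above (illustrative names).
WORKFLOW_STEPS = [
    "sampling",
    "dna_isolation",
    "library_generation",
    "sequencing",
    "raw_data_preprocessing",
    "read_based_profiling",
    "assembly",
    "binning",
    "bin_refinement",
    "mag_generation",
    "mag_taxonomic_classification",
    "archiving",
]


@dataclass
class StepRecord:
    """Metadata captured for a single workflow step (hypothetical structure)."""
    step: str
    tool: str = ""
    version: str = ""
    parameters: Dict[str, str] = field(default_factory=dict)


def missing_steps(records: List[StepRecord]) -> List[str]:
    """Return the workflow steps that have no metadata record yet, in workflow order."""
    recorded = {r.step for r in records}
    return [s for s in WORKFLOW_STEPS if s not in recorded]


# Made-up example: only two of the steps have been documented so far.
records = [
    StepRecord("sampling"),
    StepRecord("assembly", tool="example-assembler", version="0.0.0"),
]
print(missing_steps(records))
```

Running a check like this before archiving makes gaps visible while the information can still be recovered from lab notebooks or pipeline logs.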

Moreover, in marine metagenomics, it is also necessary to characterize the marine environment of the sample, including its geolocation and the physico-chemical properties of the water.
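The environmental description can be captured in a small structured record and sanity-checked before submission. The sketch below assumes a simplified set of fields; the field names (`latitude_deg`, `depth_m`, `salinity_psu`, and so on), the class `SampleEnvironment`, and the sample values are hypothetical and are not taken from any metadata standard, so they should be mapped to the checklist required by your target archive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleEnvironment:
    """Geolocation and water properties of one sample (illustrative fields only)."""
    latitude_deg: float
    longitude_deg: float
    depth_m: float
    temperature_c: Optional[float] = None
    salinity_psu: Optional[float] = None

    def problems(self) -> List[str]:
        """List basic plausibility and completeness issues for this record."""
        issues = []
        if not -90.0 <= self.latitude_deg <= 90.0:
            issues.append("latitude outside [-90, 90]")
        if not -180.0 <= self.longitude_deg <= 180.0:
            issues.append("longitude outside [-180, 180]")
        if self.depth_m < 0:
            issues.append("depth must be non-negative")
        if self.temperature_c is None:
            issues.append("temperature not recorded")
        if self.salinity_psu is None:
            issues.append("salinity not recorded")
        return issues


# Made-up sample values, used only to show the check.
sample = SampleEnvironment(
    latitude_deg=69.65, longitude_deg=18.96, depth_m=5.0, temperature_c=4.2
)
print(sample.problems())
```

Keeping units in the field names is a deliberate choice here: it avoids ambiguity when records from different cruises or instruments are merged.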

Solutions

Tools and resources for analyzing metagenomics datasets

Description

The field of marine metagenomics has expanded rapidly, with many statistical and computational tools and databases developed to explore the huge influx of data. You need to be able to choose between the multiple bioinformatics techniques, tools, and methodologies available for performing each step of a typical metagenomics analysis, while ensuring that your choice conforms to the best practices for the domain. Moreover, you need access to HPC facilities with the capacity to execute the analysis and store the resulting data, and should therefore be aware of which computing infrastructures are available to you, and at what cost.

Considerations

  • Are there particular characteristics of your dataset that would restrict the choice of applicable tools?
  • Are the recommended tools freely available?
    • If not, can you afford the software licensing cost?
    • If not, are there freely available alternatives?
  • Does your institution have its own HPC facilities, and what are the access conditions?
  • Does your country have a research HPC infrastructure, and what are the access conditions?

Solutions

More information


Relevant tools and resources

| Tool or resource | Description | Related pages | Registry |
|---|---|---|---|
| MIGS/MIMS | Minimum Information about a (Meta)Genome Sequence | Documentation and metadata; Researcher; Data steward research; Microbial biotechnology | FAIRsharing |
| MIxS | Minimum Information about any (x) Sequence | Documentation and metadata; Researcher; Data steward research; Plant Genomics Assembly | FAIRsharing; TeSS |