Microbial biotechnology

Introduction

The Microbial Biotechnology domain is a very broad field that encompasses the application of microorganisms to the development of useful products and processes. As such, there is a very wide variety of experimental tools, approaches and, ultimately, data that arise in this field. A convenient way to organise microbial biotechnology is by the stages of the engineering life cycle drawn from the related field of synthetic biology.

Storing and Sharing Data and appropriate solutions

Here, we adopt the stages of design, build and test to categorise the various approaches available for the management of data in microbial biotechnology. There are important data standards to consider and various ways to manage, store and share data. Ultimately, the ideal scenario is that data is captured in a standard format and then uploaded to a repository to ensure that it is Findable, Accessible, Interoperable and Reusable (FAIR). However, for the biotechnology field, data standards are still under development or missing completely and there are still gaps in database provision for some data types.

Due to the interdisciplinary nature of the field, data arising from studies in microbial biotechnology relate to both computational studies, such as modelling and simulation, and the results of wet-lab based studies used for the construction and experimental characterisation of microbial systems. Given the breadth, scope and rapid development of the field of microbial biotechnology, this guide is by no means exhaustive.

Please get in touch with further suggestions for relevant standards and data sharing tools that can make this guide more complete. Sites such as FAIRsharing can provide a wealth of information about standards that may be appropriate for a given data type but are not mentioned in this brief guide.

Design

Description

The design for a system in microbial biotechnology essentially involves two interrelated exercises: (i) identification of the biological entities/hosts that will be used to develop the product in question; (ii) identification of the genetic modifications/circuitry/constructs necessary to modify the host, if appropriate. The design stage may also include optional approaches: (iii) metabolic engineering of biosynthetic pathways; (iv) mathematical modelling to aid the design of the system.

In this section, the data management considerations and solutions surrounding the exercises outlined above will be discussed.

Biological hosts

Considerations

  • The recording of taxonomic and genetic data must be considered carefully as part of the design stage
  • Metadata surrounding the host is essential, such as where it was isolated, growth conditions, recommended protocols etc.
  • Genetic information relating to strains and any modifications needs to be kept track of as modifications are made
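
One way to keep the considerations above together in practice is a small structured record per strain. The following is a minimal sketch only: the class names, field names and example values are hypothetical and do not follow any published standard; it simply shows taxonomy, isolation metadata and a modification history travelling together, with each derived strain pointing back to its parent.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Modification:
    """One genetic change applied to a strain (field names are hypothetical)."""
    description: str
    date: str            # ISO 8601 date on which the change was made
    performed_by: str


@dataclass
class HostStrain:
    """Minimal host record: taxonomy, isolation metadata, modification history."""
    strain_id: str
    species: str
    isolation_source: str
    growth_conditions: str
    parent_id: Optional[str] = None
    modifications: List[Modification] = field(default_factory=list)

    def derive(self, new_id: str, change: Modification) -> "HostStrain":
        """Return a new record that keeps the full history and names its parent."""
        return HostStrain(
            strain_id=new_id,
            species=self.species,
            isolation_source=self.isolation_source,
            growth_conditions=self.growth_conditions,
            parent_id=self.strain_id,
            modifications=self.modifications + [change],
        )

    def to_json(self) -> str:
        """Serialise the record, including nested modifications, as JSON."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values for illustration only.
wild_type = HostStrain(
    strain_id="LAB-0001",
    species="Bacillus subtilis",
    isolation_source="placeholder: soil sample",
    growth_conditions="placeholder: LB medium, 37 C",
)
engineered = wild_type.derive(
    "LAB-0002",
    Modification("placeholder: reporter integrated", "2024-01-15", "A. Researcher"),
)
print(engineered.to_json())
```

Deriving a new record for each change, rather than editing the parent in place, keeps the lineage of a strain recoverable from the records alone.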

Solutions

Genetic parts, devices and systems

Considerations

  • Format of designs may vary depending on the application, whether this be at the sequence level or an entire system
  • Consider existing management tools that can help visualise and modify genetic designs
  • How can the information about characterisation of genetic constructs assist in the selection of parts and modelling designs?
  • Consider how you will record metadata at each point in the design process

Solutions

  • Sequences are characterised as parts, which can be found in repositories such as the iGEM Parts Registry, the Joint BioEnergy Institute’s Inventory of Composable Elements (JBEI-ICE) (Ham et al., 2012) and SynBioHub, or retrieved from standard genetic databases such as ENA and GenBank. At this point it may be desirable to state which host the designed device is intended to be expressed in and the intended method of replication in that host, for example cloned on a particular plasmid or integrated into the host chromosome.

  • You can manage the design stage using genetic computer-aided design tools such as Benchling, where information can be shared within small teams. Benchling supports a number of data standards, including FASTA, GenBank and SBOL1. Sometimes FASTA will be the most relevant format, for example when sending a sequence for DNA synthesis. Formats such as GenBank, DICOM-SB (Sainz de Murieta, Bultelle and Kitney, 2016) or SBOL may be more applicable where additional information, such as functional annotation, needs to be shared. SBOL 2.0 and higher captures more than the genetics of a system: interactions between components in the design can be specified, information about RNA and proteins can be included, and the provenance of a design can be recorded. Experimental information relating to the build and test of a system can also be captured and shared.
  • SBOL data can be created using tools such as Benchling (SBOL1 only), SBOLDesigner (Zhang et al., 2017) and ShortBOL, to name but a few. A more comprehensive list of SBOL tools can be found on the SBOL Standard website.

  • Once the design is complete, you can share it via a repository such as the iGEM Parts Registry, SynBioHub, JBEI-ICE or Addgene. Information about its performance can be included, ranging from experimental results such as fluorescence curves to predicted performance based on modelling. It is recommended to use standard figures that can be easily understood. SBOL Visual is a good example of a graphical standard; it uses standard shapes to represent different genetic parts, which can help clarify a complex synthetic construct. SBOL Visual diagrams can be created using tools such as VisBOL.

  • More generally, the Investigation/Study/Assay (ISA) model can be used in systems biology, life sciences, environmental and biomedical domains to structure research outputs. The ISA-Tab format provides a framework for capturing these data in tab-delimited text files.

  • Platforms such as SEEK, built on technologies such as ISA, support a large range of systems and synthetic biology projects. SEEK provides a web-based resource for sharing scientific research datasets, models or simulations, and processes. SEEK can be installed locally; alternatively, FAIRDOMHub, a version of SEEK hosted by FAIRDOM, is available for general community use. RightField provides a mechanism for capturing metadata using easy-to-use spreadsheets.
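
As noted in the list above, FASTA is often the most relevant format when a sequence is sent for DNA synthesis. The sketch below illustrates why it is also the most limited: a FASTA record holds only an identifier, an optional free-text description and the sequence itself, so functional annotation has to travel in a richer format such as GenBank or SBOL. The function name `to_fasta`, the part identifier and the sequence are made up for illustration.

```python
def to_fasta(identifier: str, sequence: str, description: str = "", width: int = 70) -> str:
    """Render one DNA sequence as a FASTA record with fixed-width sequence lines."""
    # Remove any whitespace and normalise to upper case before validating.
    cleaned = "".join(sequence.split()).upper()
    invalid = set(cleaned) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    header = f">{identifier} {description}".rstrip()
    # Slice the sequence into chunks of at most `width` characters.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([header] + lines) + "\n"


# Made-up placeholder sequence, not a real part.
placeholder_sequence = "atgc" * 20 + "taa"
record = to_fasta("demo_part_001", placeholder_sequence, "placeholder coding sequence")
print(record, end="")
```

Validating the alphabet before export is a cheap check that catches stray characters introduced by copy and paste before a sequence leaves the design tool.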

Metabolic engineering designs and enzyme data

Considerations

  • How can designs regarding metabolic pathways be accurately represented and stored?
  • Enzymes have specific data standards that should be considered when accessing and recording their data
  • How can assay data and functional information be collected and recorded?

Solutions

Model based designs

Considerations

  • What tools and standards need to be considered when building mathematical models to aid the design of genetic systems?
  • How can the models be shared via repositories and made available in a way that makes results replicable?

Solutions

Build

Description

The build stage in the microbial biotechnology and/or synthetic biology life cycle can involve any of a wide range of experimental techniques. Because these techniques are so varied, the data and metadata to be shared are very difficult to standardise. The current method of sharing information about the building of microbial systems is to write a detailed free-text description in the materials and methods section of a scientific paper.

Considerations

  • Capturing the information about the build process involves collecting the information arising from DNA amplification, DNA preparation and purification, primer design, restriction enzyme analysis, gel electrophoresis and DNA sequencing to name but a few techniques.
  • If using a protein expression device, the intended vector for its replication in a given host will need to be named.
  • The cloning strategy used to assemble the protein expression device and the vector will also need to be specified and shared.
  • The design information about the vector DNA or RNA sequence should be shared via public databases such as ENA or GenBank.
  • The information about how the “final system” was built is highly variable, depending on the DNA synthesis and/or assembly approach used. Consider ways to share this information

Solutions

  • Various DNA synthesis companies build DNA from a computer specification of the sequence, and there is also a variety of experimental approaches for assembling DNA molecules. Information about the approach used can be shared as free text attached to a design in SBOL format and uploaded to a repository that supports SBOL2 and above, such as SynBioHub.

  • To the authors’ knowledge, no proposed standard is able to capture this diverse set of data. Currently, from a pragmatic point of view, the best a data manager can do is to make sure that data is captured in some form from the lab scientist and grouped together with as much metadata as possible. The metadata standards for a build exercise are still to be defined and so are at the discretion of the data manager.

  • Once grouped together in a free form, the data can be archived along with the metadata using a file compression format. The COMBINE archive format may also be useful.

  • SBOL versions 2.0 and above provide a data standard that allows grouped build data to be associated with design data for a part, device or system, along with a minimal amount of metadata.

  • Similarly, research object bundles, and more recently RO-Crates, can be used to gather together build data and test data with information about the overall study.
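
The pragmatic approach described above, grouping build data with whatever metadata is available and archiving it in a compressed form, can be sketched with the Python standard library alone. This is an illustrative assumption, not a COMBINE archive or an RO-Crate: the function name `archive_build_data`, the `manifest.json` layout and the metadata keys are all hypothetical.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, Iterable


def archive_build_data(files: Iterable[Path], metadata: Dict[str, str], archive_path: Path) -> Dict:
    """Zip the given files with a manifest holding metadata and SHA-256 checksums.

    Files are stored under their base name, so names must be unique.
    """
    manifest = {"metadata": metadata, "files": []}
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            data = Path(path).read_bytes()
            manifest["files"].append({
                "name": Path(path).name,
                "sha256": hashlib.sha256(data).hexdigest(),
                "bytes": len(data),
            })
            archive.writestr(Path(path).name, data)
        # The manifest travels inside the archive next to the data it describes.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest


# Demonstration with a throwaway file; contents and metadata are placeholders.
with tempfile.TemporaryDirectory() as workdir:
    gel_notes = Path(workdir) / "gel_notes.txt"
    gel_notes.write_text("placeholder: lane 1 ladder, lane 2 digest")
    result = archive_build_data(
        [gel_notes],
        {"construct": "placeholder-construct", "operator": "A. Researcher"},
        Path(workdir) / "build_record.zip",
    )
    print(json.dumps(result, indent=2))
```

Recording a checksum per file lets a later reader confirm that the archived build data has not changed since it was grouped.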

Test

Description

The test phase of a biotechnological study is the most variable in terms of the types of data produced. The types of experiments carried out to test a microbial system are highly dependent on the intended function of the system under construction. Some common approaches include, at the simplest level, characterising the growth of an organism at various scales and in different growth regimes, and assaying the production of the desired product.
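
As a small illustration of the simplest kind of test data mentioned above, the sketch below estimates a specific growth rate from optical density readings by fitting a straight line to ln(OD) against time. It rests on assumptions that are not part of this guide: the readings are taken during exponential growth only, they are blank-corrected and within the linear range of the instrument, and time is measured in hours. The function name and the synthetic readings are hypothetical.

```python
import math
from typing import Sequence


def specific_growth_rate(times_h: Sequence[float], od: Sequence[float]) -> float:
    """Least-squares slope of ln(OD) against time, in units of 1/h."""
    if len(times_h) != len(od) or len(times_h) < 2:
        raise ValueError("need at least two paired measurements")
    if any(value <= 0 for value in od):
        raise ValueError("optical densities must be positive")
    log_od = [math.log(value) for value in od]
    mean_t = sum(times_h) / len(times_h)
    mean_y = sum(log_od) / len(log_od)
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    return sxy / sxx


# Synthetic readings generated from OD = 0.05 * exp(0.5 * t); not real data.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
readings = [0.05 * math.exp(0.5 * t) for t in times]
mu = specific_growth_rate(times, readings)
doubling_time_h = math.log(2) / mu
print(f"mu = {mu:.3f} per hour, doubling time = {doubling_time_h:.2f} h")
```

Whatever analysis is used, storing the raw readings and their units alongside the derived value lets others repeat or revise the calculation.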

Considerations

  • What types of experiments, e.g. organism growth, organism characterisation, will you undertake to test your microbial system? What types of data result from those experiments? Will you combine multi-omics assays in your study?
    • Is there a reporting guideline for the type of data you are generating?
    • Will you re-use existing testing protocols or generate and share your own protocols?

Solutions

Bibliography

Field, D. et al. (2008) ‘The minimum information about a genome sequence (MIGS) specification’, Nature biotechnology, 26(5), pp. 541–547. doi: 10.1038/nbt1360.

Ham, T. S. et al. (2012) ‘Design, implementation and practice of JBEI-ICE: an open source biological part registry platform and tools’, Nucleic acids research, 40(18), p. e141. doi: 10.1093/nar/gks531.

Hecht, A. et al. (2018) ‘A minimum information standard for reproducing bench-scale bacterial cell growth and productivity’, Communications biology, 1, p. 219. doi: 10.1038/s42003-018-0220-6.

Kuwahara, H. et al. (2017) ‘SBOLme: a Repository of SBOL Parts for Metabolic Engineering’, ACS synthetic biology, 6(4), pp. 732–736. doi: 10.1021/acssynbio.6b00278.

Maloy, S. R. and Hughes, K. T. (2007) ‘Strain Collections and Genetic Nomenclature’, Methods in Enzymology, pp. 3–8. doi: 10.1016/s0076-6879(06)21001-2.

Parte, A. C. et al. (2020) ‘List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ’, International journal of systematic and evolutionary microbiology, 70(11), pp. 5607–5612. doi: 10.1099/ijsem.0.004332.

Sainz de Murieta, I., Bultelle, M. and Kitney, R. I. (2016) ‘Toward the First Data Acquisition Standard in Synthetic Biology’, ACS synthetic biology, 5(8), pp. 817–826. doi: 10.1021/acssynbio.5b00222.

Sarkans, U. et al. (2018) ‘The BioStudies database—one stop shop for all data supporting a life sciences study’, Nucleic Acids Research, pp. D1266–D1270. doi: 10.1093/nar/gkx965.

Spidlen, J. et al. (2021) ‘Data File Standard for Flow Cytometry, Version FCS 3.2’, Cytometry. Part A: the journal of the International Society for Analytical Cytology, 99(1), pp. 100–102. doi: 10.1002/cyto.a.24225.

‘Standards for Reporting Enzyme Data: The STRENDA Consortium: What it aims to do and why it should be helpful’ (2014) Perspectives in Science, 1(1-6), pp. 131–137. doi: 10.1016/j.pisc.2014.02.012.

Tellechea-Luzardo, J. et al. (2020) ‘Linking Engineered Cells to Their Digital Twins: A Version Control System for Strain Engineering’, ACS synthetic biology, 9(3), pp. 536–545. doi: 10.1021/acssynbio.9b00400.

Ten Hoopen, P. et al. (2015) ‘Marine microbial biodiversity, bioinformatics and biotechnology (M2B3) data reporting and service standards’, Standards in genomic sciences, 10, p. 20. doi: 10.1186/s40793-015-0001-5.

Zhang, M. et al. (2017) ‘SBOLDesigner 2: An Intuitive Tool for Structural Genetic Design’, ACS synthetic biology, 6(7), pp. 1150–1160. doi: 10.1021/acssynbio.6b00275.

Relevant tools and resources

Tool or resource | Description | Related pages | Registry
Access to Biological Collection Data Schema (ABCD) | A standard schema for primary biodiversity data | - | -
Addgene | A searchable repository with a focus on plasmids | - | -
ArrayExpress | A repository of array based genomics data | - | bio.tools, TeSS
ATCC | Biological materials resource including cell-lines, strains and genomics tools | - | bio.tools
BacDive | A searchable database for bacteria specific information | - | bio.tools
Bacillus Genetic Stock Center (BGSC) | A repository specific to Bacillus strains | - | -
Benchling | R&D Platform for Life Sciences | - | -
Biodiversity Information Standards (TDWG) | Biodiversity Information Standards (TDWG), historically the Taxonomic Databases Working Group, work to develop biodiversity information standards | - | -
BioModels | A repository of mathematical models for application in biological sciences | - | bio.tools, TeSS
BioStudies | A database hosting datasets from biological studies. Useful for storing or accessing data that is not compliant for mainstream repositories. | Documentation and metadata Plant sciences | bio.tools, TeSS
BRENDA | Database of enzyme and enzyme-ligand information, across all taxonomic groups, manually extracted from primary literature and extended by text mining procedures | - | bio.tools, FAIRsharing, TeSS
CellRepo | A version management tool for modifying strains | - | -
Cellular Microscopy Phenotype Ontology (CMPO) | An ontology for expressing cellular (or multi-cellular) terms with applications in microscopy | - | TeSS
ChEBI | Dictionary of molecular entities focused on 'small' chemical compounds | - | bio.tools, FAIRsharing, TeSS
COmputational Modeling in BIology NEtwork (COMBINE) | An initiative to bring together various formats and standards for computational models in biology | - | -
DNA Data Bank of Japan (DDBJ) | A database of DNA sequences | - | -
European Nucleotide Archive (ENA) | A record of sequence information scaling from raw sequencing reads to assemblies and functional annotation | Plant Genomics Assembly | bio.tools, FAIRsharing, TeSS
FAIRDOM-SEEK | Data, model and SOPs management for projects, from preliminary data to publication, support for running SBML models etc. | Data storage Data steward infrastructure NeLS assembly IFB - France | bio.tools, TeSS
FAIRDOMHub | Data, model and SOPs management for projects, from preliminary data to publication, support for running SBML models etc. (public SEEK instance) | Data storage Researcher NeLS assembly Documentation and metadata | FAIRsharing
FAIRsharing | A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies. | Documentation and metadata Data publication Data steward policy Data steward research Researcher Existing data | FAIRsharing, TeSS
Freegenes | Repository of IP-free synthetic biological parts | - | -
GenBank | A database of genetic sequence information. GenBank may also refer to the data format used for storing information around genetic sequence data. | - | bio.tools, TeSS
Gene Expression Omnibus (GEO) | A repository of MIAME-compliant genomics data from arrays and high-throughput sequencing | - | -
iGEM Parts Registry | A collection of standard biological parts to which all entrants in the iGEM competition must submit their parts | - | -
Image Data Resource (IDR) | A repository of image datasets from scientific publications | Data publication Documentation and metadata Data transfer OMERO | FAIRsharing
International Nucleotide Sequence Database Collaboration (INSDC) | A collaborative database of genetic sequence datasets from DDBJ, EMBL-EBI and NCBI | - | -
International Society for the Advancement of Cytometry (ISAC) | Data standards and formats for reporting flow cytometry data | - | -
International Union of Biochemistry and Molecular Biology (IUBMB) | Resource for naming standards in biochemistry and molecular biology | - | -
ISA-tools | Open source framework and tools helping to manage a diverse set of life science, environmental and biomedical experiments using the Investigation Study Assay (ISA) standard | Data steward infrastructure Data steward research | FAIRsharing
IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN) | A collaborative resource from IUPAC and IUBMB for naming standards in biochemistry | - | -
JBEI-ICE | A registry platform for biological parts | - | -
List of Prokaryotic names with Standing in Nomenclature (LPSN) | A database of prokaryote specific biodiversity information | - | -
MetabolomeXchange | A repository of genomics data relating to the study of the metabolome | - | bio.tools
MIGS/MIMS | Minimum Information about a (Meta)Genome Sequence | Documentation and metadata Researcher Data steward research Marine metagenomics | FAIRsharing
National Center for Biotechnology Information (NCBI) | Online database hosting a vast amount of biotechnological information including nucleic acids, proteins, genomes and publications. Also boasts integrated tools for analysis. | - | -
NCBI Taxonomy | NCBI's taxonomy browser is a database of biodiversity information | - | -
NCIMB | Hosts information relating to strains, cultures and more | - | -
protocols.io | A secure platform for developing and sharing reproducible methods. | - | -
Research Object Crate (RO-Crate) | RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. | Documentation and metadata Data storage Data organisation Data steward research Researcher | FAIRsharing
RightField | RightField is an open-source tool for adding ontology term selection to Excel spreadsheets | Researcher Documentation and metadata Data steward research Identifiers | bio.tools
SBOL Visual | A standard library of visual glyphs used to represent SBOL designs and interactions. | - | -
SBOLDesigner | A CAD tool to create SBOL designs through the use of SBOL Visual glyphs. | - | -
ShortBOL | A scripting language for creating Synthetic Biology Open Language (SBOL) in a more abstract way. | - | -
Standards for Reporting Enzyme Data (STRENDA) | Resource of standards for reporting enzyme data | - | -
SynBioHub | A searchable design repository for biological constructs | - | bio.tools
Synthetic Biology Open Language (SBOL) | An open standard for the representation of in silico biological designs and their place in the Design-Build-Test-Learn cycle of synthetic biology. | - | -
Systems Biology Markup Language (SBML) | An open format for computational models of biological processes | - | -
The Environment Ontology (EnvO) | An ontology for expressing environmental terms | - | -
UniProt | Comprehensive resource for protein sequence and annotation data | Documentation and metadata Researcher Intrinsically disordered proteins | bio.tools, FAIRsharing, TeSS
VisBOL | A JavaScript library for the visualisation of SBOL. | - | -