Your domain: Bioimaging data

Introduction

Bioimaging specialists are acquiring an ever-growing amount of data: images, associated metadata, etc. However, image data management often does not receive the attention it requires, or is avoided altogether because it is considered a burdensome task. At the same time, storing images on personal computers or USB keys is no longer an option, assuming it ever was! Data volume is increasing exponentially, and it is not only the acquired images that need storing: processed images will often be generated and will need to be kept alongside the originals. It is critical to proactively identify where the data will be stored, for how long, who will cover the cost of the hardware, and who will cover the cost of managing the infrastructure. All the stakeholders (biologists, facility managers, data analysts, IT support, etc.) need to be involved in the preliminary discussions to ensure that the requirements are understood and met.

What constitutes bioimage data

An image is much more than a collection of zeros and ones. The image file contains the binary data representing the pixels on screen, but it is usually also packed with useful metadata. Alongside the obvious keys indicating how to interpret the zeros and ones, you can find a lot of acquisition metadata, e.g. hardware/instrument used, settings used, etc.

The number of proprietary image formats is very large and keeps increasing. It is challenging to support so many proprietary file formats, i.e. to read them and extract their metadata. The Bio-Formats library currently supports over 150 different file formats. The Dataset Structure Table shows the extension of the files to read and indicates the structure of the image itself, e.g. single file, multiple files, one image file and a companion file, etc.

Data management challenges

The number of files and their size could be extremely large. Deleting/misplacing a file could invalidate the study itself, preventing its reuse.
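One way to notice a deleted, misplaced or altered file early is to keep a checksum manifest of the dataset and compare against it later. The following is a minimal sketch using only the Python standard library; the function names and the manifest layout (relative path mapped to SHA-256 digest) are assumptions made for illustration, not part of any tool mentioned on this page.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in blocks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_to_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing or changed relative to a saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(name for name in manifest if name not in current),
        "changed": sorted(
            name
            for name in manifest
            if name in current and current[name] != manifest[name]
        ),
    }
```

The manifest can be saved next to the dataset (for example as JSON) and re-checked after each migration or transfer.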

Managing images quickly becomes a larger problem: not only the binary files need to be handled, but also the associated metadata. Several efforts have been made, and are still ongoing, to capture those metadata. Understanding and capturing the metadata is critical for many reasons, for example analysis and the detection of possible faults in acquisition systems. It is important to decide how much detail will be recorded, since this could dramatically increase the metadata volume and therefore the effort required to capture the metadata.

The collection of images could be:

data acquired within a facility;
data acquired in another facility (commissioned work or external guest user) and “transported” by the users to their facility;
slides scanned.

After acquisition, data are usually moved to more permanent storage with different levels of permissions. This depends on the facility policies and could prevent collaborative work. Users will also adopt their own “organisation” conventions. This could make it very difficult to find or understand the data when, for example, the data are migrated to a new location or when the researcher who acquired the data leaves the lab.

Standard (meta)data formats

Description

Unlike other domains, the bioimaging community has not yet agreed on a single standard data format which is generated by all acquisition systems. Instead, the images described above are most frequently collected in proprietary file formats (PFFs) defined by hardware vendors. Currently, there are several hundred such formats that the researchers may encounter. These formats combine critical acquisition metadata with the multidimensional binary data but are often optimized for quickly writing the data to disk. Tools and strategies are outlined below to ease working with this data.

Considerations

When purchasing a microscope, consider carefully how the resulting files will be processed. If open source tools will be used, proprietary file formats may require a time-consuming conversion. Discuss with your vendor if an open format is available.
If data from multiple vendors is to be combined, a similar conversion may be necessary to make the data comparable.
Imaging data brings special considerations due to the large, often continuous nature of the data. Single terabyte-scale files are not uncommon. Sharing these can require special infrastructure, like a data management server (described below) or a cloud-native format (described below). One goal of such infrastructure is to enable the selective (i.e. interactive) zooming of your image data without the need to download the entire volume, thereby reducing your bandwidth usage and costs.
Importantly, most acquisition systems produce proprietary file formats. Understanding how well they are supported by the imaging community could be a key factor in a successful study. Will it be possible to analyse or view the image using open-source software? Will it be possible to deposit the images to public repositories when published? The choice of a proprietary file format could prevent the use of any tools that are not related to the acquisition system.

Solutions

Vendor libraries: Some vendors provide open source libraries for parsing their proprietary file formats. See libCZI from Zeiss.

Open source translators: Members of the community have developed multi-format translators that can be used to access your data on-the-fly, i.e. the original format is preserved and no file is written to disk. This implies that you will need to perform this translation each time you access your data and, depending on the size of the image(s), you could run out of memory. Translation libraries include:

Bio-Formats (Java) - supports over 150 file formats
OpenSlide (C) - primarily for whole-slide imaging (WSI) formats
AICSImageIO (Python) - wraps vendor libraries and Bio-Formats to support a wide range of formats in Python
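To judge whether on-the-fly translation risks running out of memory, it can help to estimate the uncompressed size of an image from its dimensions before loading it. The sketch below is plain arithmetic: the function names and the safety factor of 2 are assumptions for illustration, and the pixel type names follow those used in the OME data model.

```python
# Bytes per pixel for some common pixel types (names as in the OME data model).
BYTES_PER_PIXEL = {"uint8": 1, "uint16": 2, "uint32": 4, "float": 4, "double": 8}


def uncompressed_size_bytes(size_x, size_y, size_z=1, size_c=1, size_t=1,
                            pixel_type="uint16"):
    """Size of the full pixel array in bytes, ignoring compression and metadata."""
    return size_x * size_y * size_z * size_c * size_t * BYTES_PER_PIXEL[pixel_type]


def fits_in_memory(image_bytes, available_bytes, safety_factor=2.0):
    """Rough check: leave headroom because processing often needs extra copies.

    The safety factor is an illustrative assumption, not a measured value.
    """
    return image_bytes * safety_factor <= available_bytes


# Example: a 2048 x 2048 time lapse with 50 z-sections, 3 channels, 100 time points.
size = uncompressed_size_bytes(2048, 2048, 50, 3, 100, "uint16")
print(f"{size / 2**30:.1f} GiB")  # about 117.2 GiB
print(fits_in_memory(size, available_bytes=16 * 2**30))  # False on a 16 GiB machine
```

When the estimate exceeds the available memory, reading the image plane by plane or tile by tile, or converting it to a chunked format, is usually the safer route.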

Permanent conversion: An alternative is to permanently convert your data to

OME-Files - The Open Microscopy Environment (OME) has developed an open format, “OME-TIFF”, to which you can convert your data. The Bio-Formats library (above) comes with a command-line tool, bfconvert, that can be used to convert files to OME-TIFF.
The bioformats2raw and raw2ometiff toolchain provided by Glencoe Software allows the more performant conversion of your data, but requires an extra intermediate copy of the data. If you have available space, the toolchain could also be an option to consider.
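The two conversion routes above can be scripted. The sketch below only builds the command lines and runs them when the tools are installed; the file names are hypothetical, the helper names are assumptions, and no optional flags are used. It relies on the basic usage of each tool: an input path followed by an output path.

```python
import shutil
import subprocess
from pathlib import Path


def bfconvert_command(source: Path, target: Path) -> list[str]:
    """One-step conversion; bfconvert picks the output format from the extension,
    so a target ending in .ome.tiff yields an OME-TIFF."""
    return ["bfconvert", str(source), str(target)]


def two_step_commands(source: Path, intermediate: Path, target: Path) -> list[list[str]]:
    """Two-step conversion through an intermediate Zarr directory.

    The intermediate copy needs extra disk space until the second step finishes.
    """
    return [
        ["bioformats2raw", str(source), str(intermediate)],
        ["raw2ometiff", str(intermediate), str(target)],
    ]


def run_if_available(command: list[str]) -> bool:
    """Run the command only if its executable is on PATH; return whether it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


# Hypothetical file names, for illustration only.
for step in two_step_commands(Path("scan.mrxs"), Path("scan_zarr"), Path("scan.ome.tiff")):
    print(" ".join(step))
```

Keeping the command construction separate from execution makes it easy to log exactly what was run for each dataset.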

Cloud (or “object”) storage: If you are storing your data in the cloud, you will likely need a different file format since most current image file formats are not suitable for cloud storage. OME is currently developing a next-generation file format (NGFF) that you can use.
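The reason chunked, cloud-friendly formats allow selective zooming is that an image is stored as many small, independently addressable pieces, so a viewer fetches only the pieces that overlap the region on screen. The sketch below illustrates that idea in plain Python; it is not the NGFF API, and the function names, image size and chunk size are assumptions for illustration.

```python
from itertools import product


def chunks_for_region(region, chunk_shape):
    """Return the grid indices of every chunk overlapping a requested region.

    region: one (start, stop) pair per axis in pixel coordinates, stop exclusive.
    chunk_shape: chunk length along each axis.
    """
    per_axis = []
    for (start, stop), size in zip(region, chunk_shape):
        if stop <= start:
            return []  # empty region, nothing to fetch
        first = start // size
        last = (stop - 1) // size
        per_axis.append(range(first, last + 1))
    return list(product(*per_axis))


def total_chunks(image_shape, chunk_shape):
    """Number of chunks needed to store the whole image."""
    count = 1
    for extent, size in zip(image_shape, chunk_shape):
        count *= -(-extent // size)  # ceiling division
    return count


# A hypothetical 100000 x 80000 pixel slide (y, x) stored in 1024 x 1024 chunks.
needed = chunks_for_region(region=[(2000, 3000), (5000, 6500)], chunk_shape=(1024, 1024))
print(len(needed), "of", total_chunks((100000, 80000), (1024, 1024)), "chunks")
```

In this example only 6 of 7742 chunks have to be transferred to display the requested region.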

Metadata: If metadata are stored separately from the image data, the format of the metadata should follow the subject-specific standards regarding the schema, vocabulary or ontologies and storage format used such as:

OME model: XML-based representation of microscopy data.
4DN-BINA-OME-QUAREP (NBO-Q).
REMBI.
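As an illustration of what the XML-based OME model looks like in practice, the sketch below parses a small OME-XML fragment with the Python standard library and extracts the image dimensions and channel names. The fragment is hand-written for this example and far from a complete metadata record; the function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2016-06 OME schema.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Hand-written, minimal OME-XML fragment for illustration only.
SAMPLE_OME_XML = """
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint16"
            SizeX="512" SizeY="256" SizeZ="10" SizeC="2" SizeT="1"
            PhysicalSizeX="0.65">
      <Channel ID="Channel:0:0" Name="DAPI" SamplesPerPixel="1"/>
      <Channel ID="Channel:0:1" Name="GFP" SamplesPerPixel="1"/>
    </Pixels>
  </Image>
</OME>
"""


def summarise_images(ome_xml: str) -> list[dict]:
    """Return one summary dictionary per Image element in the OME-XML text."""
    root = ET.fromstring(ome_xml.strip())
    summaries = []
    for image in root.findall("ome:Image", OME_NS):
        pixels = image.find("ome:Pixels", OME_NS)
        summaries.append({
            "id": image.get("ID"),
            "name": image.get("Name"),
            "type": pixels.get("Type"),
            "shape": {axis: int(pixels.get(f"Size{axis}")) for axis in "XYZCT"},
            "channels": [c.get("Name") for c in pixels.findall("ome:Channel", OME_NS)],
        })
    return summaries


for summary in summarise_images(SAMPLE_OME_XML):
    print(summary)
```

Real files carry much more acquisition metadata (instrument, objective, detector settings), which is where specifications such as NBO-Q extend the core model.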

(Meta)Data collection

Description

The acquisition of bioimaging data takes place in various environments. The (usually) light or electron microscope may be in a core facility, in a research lab or even remotely in a different institution. Regardless of where the instrument is located, the acquired imaging data is likely to be stored, at least temporarily, on a local, vendor-specific PC next to the acquisition system, due to the complexity and size of the data. This is often unavoidable in order to securely store the data as quickly as the acquisition process produces it.

Due to the scale of data, keeping track of the image data and the associated data and metadata is essential, particularly in life sciences and medical fields. Organising, storing, sharing, publishing image data and metadata can be very challenging.

Considerations

Consider using an image management software platform. Image management software platforms offer a way to centralize, organize, view, distribute and track all of your digital images. They allow you to take control over how your images are managed, used and shared within research groups.
When evaluating an image management software platform, check if it allows you to:
- Control the access you wish to give to your data and how you wish to work, e.g. only the PI can view and annotate your data, or you can choose to work on a project with some collaborators.
- Access data from anywhere via either Web or Desktop clients and API.
- Store the metadata with your images. For example, analytical results can be linked to your imaging data and can be easily findable.
- Add value to your imaging data by for example linking them to external resources like ontologies.
- Make your data publicly available and move gradually towards FAIRness.
Try to avoid storing bioimaging data in the local system’s PC.
If possible, make a transfer to central storage mandatory. If not possible, enable automation of data backup to central storage.
Consider support for minimal standards (metadata schemas, file formats, etc.) in your domain.
Consider reusing existing data.
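Automating the transfer from the acquisition PC to central storage can be as simple as a scheduled script that copies new files and verifies each copy. The following is a minimal sketch using only the Python standard library; the function names are assumptions, the directory paths are supplied by the caller, and a production setup would more likely rely on the facility's own transfer tooling.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def transfer_to_central(local_root: Path, central_root: Path) -> list[str]:
    """Copy files that are absent from, or differ in, central storage.

    Each copy is verified by checksum; the relative paths copied are returned.
    Local files are left in place so they can be removed by a separate,
    deliberate clean-up step.
    """
    copied = []
    for source in sorted(local_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(local_root)
        target = central_root / relative
        if target.exists() and file_digest(target) == file_digest(source):
            continue  # already transferred and intact
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        if file_digest(target) != file_digest(source):
            raise IOError(f"Checksum mismatch after copying {relative}")
        copied.append(relative.as_posix())
    return copied
```

Because files that are already present and identical are skipped, the function can be run repeatedly, for example at the end of every acquisition session.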

Solutions

Agnostic platforms that can be used to bridge between domain data include:
- iRODS.
- b2share.
Image-specific data management platforms include:
- OMERO - broad support for a large number of imaging formats.
- Cytomine-IMS - image specific.
- XNAT - medical imaging platform, DICOM-based.
- MyTARDIS - largely file-system based platform handling the transfer of data.
- BisQue - resource for management and analysis of 5D biological images.
Platforms like OMERO, b2share also allow you to publish the data associated with a given project.
Metadata standards can be found at the Metadata Standards Directory Working Group.
Ontology resources are available at:
- Zooma - Resource to find ontology mapping for free text terms.
- Ontology Lookup Service - Ontology lookup service.
- BioPortal - Biomedical ontologies.
Existing data can be found by using the following resources:
- LINCS.
- Research Data repositories Registry.
Find software tools, image databases for benchmarking, and training materials for bioimage analysis in the BIII registry.

Data publication and archiving

Description

Public data archives are an essential component of biological research. However, publishing image data and metadata can be very challenging for multiple reasons, just to mention a few: limited infrastructure for some domains, data support, sparse data.

Bioimaging tools and resources lag behind what is available for sequencing, for example, mainly due to the limited infrastructure capable of hosting the data. There are a few ongoing efforts to bridge that gap.

Two distinct types of resources should be considered:

Data archives (“storage”): provide long-lasting storage for data and metadata and make those data easily accessible to the community.
Added-value archives: store enhanced, curated data, typically aimed at a scientific community.

Considerations

If you only need to make your data available online and have limited metadata associated, consider publishing in a Data archive.
If your data should be considered as a reference dataset, consider an Added-value archive.
Select and choose the repositories based on the following characteristics:
- Storage vs Added-value resources.
- Images format support.
- Supported licenses e.g. CC0 or CC-BY license. For example the Image Data Resource (IDR) uses Creative Commons Licenses for submitted datasets and encourages submitting authors to choose.
- Which types of access are required for the users, e.g. download only; browse, search and view data and metadata; API access.
  - Does an entry have an accession number, e.g. idr-xxx, EMPIAR-#####?
  - Does an entry have a DOI (Digital Object Identifier)?

Solutions

Comparative table of some repositories that can be used to deposit imaging data:

Repository	Type	Data Restrictions	Data Upload Restrictions	DOI	Cost
BioImageArchive	Archive	No PHI data	2TB	Yes	Free
Dryad	Archive	No PHI data	300GB	Yes	over 50GB (*)
EMPIAR	Added-value	Electron microscopy imaging data	None	Yes	Free
Image Data Resource (IDR)	Added-value	Cell/Tissue imaging data, no PHI data	None	Yes	Free
SSBD:database	Added-value	Biological dynamics imaging data	None	---	Free
SSBD:repository	Archive	Biological dynamics imaging data	None	---	Free
Zenodo	Archive	None	50GB per dataset	Yes	Free

PHI: Protected health information.
(*) unless submitter is based at member institution.
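The table can also be used programmatically to shortlist repositories for a given dataset. The sketch below encodes only the Type, Data Upload Restrictions and DOI columns as listed above; the class and function names are assumptions, sizes are interpreted as decimal units (1 GB = 10^9 bytes), and a "---" in the DOI column is treated as "no DOI listed". Domain restrictions (for example, electron microscopy only) and costs are not encoded and still need to be checked by hand.

```python
from dataclasses import dataclass
from typing import Optional

GB = 10**9   # assumption: decimal units
TB = 10**12


@dataclass(frozen=True)
class Repository:
    name: str
    kind: str                          # "Archive" or "Added-value"
    upload_limit_bytes: Optional[int]  # None means no limit listed in the table
    has_doi: bool                      # False where the table shows "---"


REPOSITORIES = [
    Repository("BioImageArchive", "Archive", 2 * TB, True),
    Repository("Dryad", "Archive", 300 * GB, True),
    Repository("EMPIAR", "Added-value", None, True),
    Repository("Image Data Resource (IDR)", "Added-value", None, True),
    Repository("SSBD:database", "Added-value", None, False),
    Repository("SSBD:repository", "Archive", None, False),
    Repository("Zenodo", "Archive", 50 * GB, True),
]


def candidates(dataset_bytes, kind=None, require_doi=False):
    """Names of repositories whose listed limits accommodate the dataset."""
    matches = []
    for repo in REPOSITORIES:
        if kind is not None and repo.kind != kind:
            continue
        if repo.upload_limit_bytes is not None and dataset_bytes > repo.upload_limit_bytes:
            continue
        if require_doi and not repo.has_doi:
            continue
        matches.append(repo.name)
    return matches


# A 500 GB dataset that needs a DOI and only basic archiving.
print(candidates(500 * GB, kind="Archive", require_doi=True))
```

Repository limits and policies change over time, so the values should be checked against each repository's current documentation before relying on them.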

Your tasks

Data management plan

How to write a Data Management Plan (DMP).

Data organisation

Best practices to name and organise research data.

Data publication

How to prepare data and find repositories for publication.

Existing data

How to find and reuse existing data.

Data transfer

How to transfer data files.

Licensing

How to license research data.

Documentation and metadata

How to document and describe your data.

Data storage

How to find appropriate storage solutions.

Tool assembly

OMERO

OMERO is a software platform for managing, sharing and analysing image data.

XNAT-PIC

XNAT for Preclinical Imaging Centers (XNAT-PIC) is a set of tools to store, process and share preclinical imaging studies, built on top of the XNAT imaging informatics platform.

More information

Links to FAIR Cookbook

FAIR Cookbook is an online, open and live resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable; in one word FAIR.

Depositing IMI EUBOPEN High-Content Screening data to EBI BioImage Archive

Depositing Covid-19 image data to BioImage Archive

Training

RDMbites for using REMBI

Tools and resources on this page

Tool or resource	Description	Related pages	Registry
4DN-BINA-OME-QUAREP (NBO-Q)	Rigorous record-keeping and quality control are required to ensure the quality, reproducibility and value of imaging data. The 4DN Initiative and BINA have published light Microscopy Metadata Specifications that extend the OME Data Model, scale with experimental intent and complexity, and make it possible for scientists to create comprehensive records of imaging experiments. The Microscopy Metadata Specifications have been adopted by QUAREP-LiMi and are being revised in QUAREP-LiMi in collaboration with instrument manufacturers	OMERO	Standards/Databases
AICSImageIO	Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python
b2share	Store and publish your research data. Can be used to bridge between domains		Standards/Databases
bfconvert	The bfconvert command line tool can be used to convert files between supported formats.
BIII	The BioImage Informatics Index is a registry of software tools, image databases for benchmarking, and training materials for bioimage analysis		Tool info
Bio-Formats	Bio-Formats is a software tool for reading and writing image data using standardized, open formats		Tool info Training
bioformats2raw	Java application to convert image file formats, including .mrxs, to an intermediate Zarr structure compatible with the OME-NGFF specification.
BioImageArchive	The BioImage Archive stores and distributes biological images that are useful to life-science researchers.	Data publication	Standards/Databases
BioPortal	A comprehensive repository of biomedical ontologies	Health data Documentation and meta...	Tool info Standards/Databases Training
BisQue	Resource for management and analysis of 5D biological images		Tool info
Cytomine-IMS	Image Data management
Dryad	Open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data	Biomolecular simulatio... Data publication	Standards/Databases
EMPIAR	Electron Microscopy Public Image Archive is a public resource for raw, 2D electron microscopy images. You can browse, upload and download the raw images used to build a 3D structure	OMERO Structural Bioinformatics Data publication	Tool info Standards/Databases Training
Image Data Resource (IDR)	A repository of image datasets from scientific publications	OMERO Microbial biotechnology	Tool info Standards/Databases
iRODS	Integrated Rule-Oriented Data System (iRODS) is open source data management software for a cancer genome analysis workflow.	TransMed Data storage	Tool info
MyTARDIS	A file-system based platform handling the transfer of data
OMERO	OMERO is an open-source client-server platform for managing, visualizing and analyzing microscopy images and associated metadata	Galaxy OMERO	Tool info Training
Ontology Lookup Service	EMBL-EBI's web portal for finding ontologies	FAIRtracks Health data Documentation and meta...	Tool info Standards/Databases Training
OpenSlide	C library that provides a simple interface to read whole-slide images (also known as virtual slides)
raw2ometiff	Java application to convert a directory of tiles to an OME-TIFF pyramid. This is the second half of iSyntax/.mrxs => OME-TIFF conversion.
SSBD:database	Added-value database for biological dynamics images		Standards/Databases
SSBD:repository	An open data archive that stores and publishes bioimaging and biological quantitative datasets
XNAT	Open source imaging informatics platform. It facilitates common management, productivity, and quality assurance tasks for imaging and associated data.	TransMed XNAT-PIC Cancer data
Zenodo	Generalist research data repository built and developed by OpenAIRE and CERN	FAIRtracks Plant Phenomics Biomolecular simulatio... Plant sciences Single-cell sequencing Data publication Identifiers	Standards/Databases Training
Zooma	Find possible ontology mappings for free text terms in the ZOOMA repository.		Tool info Training

National resources

Tools and resources tailored to users in different countries.

Tool or resource	Description	Related pages	Registry
Technology Hotels	More than 130 Technology Hotels offer access to high-end technology and expertise in the field of bioimaging, bioinformatics, genomics, medical imaging, metabolomics, phenotyping, proteomics, structural biology, and/or systems biology.	Human data Proteomics Researcher Compliance monitoring ...