Tool assembly: Galaxy

What is Galaxy?

Galaxy is a well-known open-source platform for FAIR data analysis that enables users to:

access and collect data from reference databases, external repositories and other data sources;
use tools from various domains that can be plugged into workflows through its graphical web interface;
run code in interactive environments (RStudio, Jupyter…) along with other tools or workflows;
manage data by sharing and publishing results, workflows, and visualisations;
capture the metadata of data analyses, thus ensuring their reproducibility.

Galaxy helps scientists perform accessible, reproducible, and transparent computational analyses. The Galaxy Community actively helps to improve the ecosystem and to share scientific discoveries.

Who can use Galaxy?

Galaxy provides open infrastructure that is ready to use for researchers worldwide. All you need is a web browser and an account on a public server.

What can you use Galaxy for?

Galaxy can be used at different stages of the data life cycle, from data collection through to reuse.

Collect

Access to databases

UniProt
InterMine
OMERO
OmicsDI
Copernicus
UCSC genome browser (tutorial)
NCBI datasets
International Nucleotide Sequence Database Collaboration (INSDC)
European Nucleotide Archive (ENA)
PDB
3rd-party databases

Customised data access

Data libraries
BYOD (POSIX, WebDAV, Dropbox, ...)
On-demand reference data
Deferred data from remote locations

LIMS integration

Connect to sequencing facilities
Rich API for integration with LIMS
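
As an illustration of the API-based integration listed above, the following minimal sketch prepares (but does not send) a request that would create a history through Galaxy's REST API. The server URL, API key, and history name are placeholders, and the helper name `build_create_history_request` is hypothetical. It assumes the `/api/histories` endpoint and the `x-api-key` header; a real LIMS integration would more likely rely on a client library such as BioBlend.

```python
import json
import urllib.request


def build_create_history_request(base_url, api_key, name):
    """Prepare a POST request that would create a Galaxy history.

    Hypothetical helper for illustration only: the request is built
    but never sent, so no network access is needed to run this sketch.
    """
    payload = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/histories",
        data=payload,
        method="POST",
        headers={
            "x-api-key": api_key,  # assumed authentication header
            "Content-Type": "application/json",
        },
    )


request = build_create_history_request(
    "https://galaxy.example.org",  # placeholder server URL
    "YOUR_API_KEY",                # placeholder API key
    "Sequencing run from LIMS",    # placeholder history name
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would submit it to the server. Keeping construction and submission separate makes the payload easy to inspect before anything is sent.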

Process

Data transformation

Data transformation tools
Quality control
Data cleaning
Annotation
Interactive Tools (OpenRefine, RStudio, Jupyter Notebook)

Import workflows

Metadata handling

Provenance tracking
Automatic metadata enrichment
Bulk (meta)data manipulation

Analyse

2,900 domain-specific tools

Preserve

Export artefacts

Workflows
History
Datasets

Formats

Archive file
BioCompute Object
Research Object Crate (RO-Crate)
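
To give a feel for what an RO-Crate contains, the sketch below hand-builds a minimal `ro-crate-metadata.json` document following the general shape of the RO-Crate 1.1 specification. It is not the output of Galaxy's export; the function name `minimal_ro_crate`, the file names, and all values are hypothetical, and a complete crate would also declare a licence and richer provenance.

```python
import json


def minimal_ro_crate(name, description, date_published, files):
    """Build a minimal RO-Crate 1.1 style metadata document as a dict.

    Hypothetical helper: it describes the files but does not check
    that they exist or copy them into a crate directory.
    """
    graph = [
        {   # metadata descriptor: points at the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset: the crate itself
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # one data entity per packaged file
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


crate = minimal_ro_crate(
    name="Example analysis",                      # placeholder values
    description="Workflow and results exported for illustration.",
    date_published="2024-01-01",
    files=["workflow.ga", "results/table.tsv"],
)
print(json.dumps(crate, indent=2))
```

The printed JSON would be saved as `ro-crate-metadata.json` at the top of the archive, next to the files it describes.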

Export to remote sources

FTP
Dropbox
S3 Bucket
AWS
GDrive
Nextcloud
WebDAV
Google Cloud Storage

Share artefacts

Datasets
Histories
Workflows
Visualizations
GA4GH Beacon (WIP)
DRS server

Shareability

RBAC (Role-Based Access Control)

One user
A group of users
Public

Reuse

Account cleaning

Storage dashboard to manage quota
Bulk (permanent) delete
Quota temporarily extendable
Multiple quotas per object storage (WIP)

Import artefacts

Histories (own, shared by others)
Workflows from the WorkflowHub

Your tasks

Data analysis

How to make data analysis FAIR.

Data organisation

Best practices to name and organise research data.

Data publication

How to prepare data and find repositories for publication.

Data quality

How to ensure high quality of research data.

Existing data

How to find and reuse existing data.

Identifiers

How to use identifiers for research data.

Machine actionability

How to make machine-actionable (meta)data.

Documentation and metadata

How to document and describe your data.

More information

Training

Galaxy Training Network

Galaxy Mentor Network

Training in TeSS

Tools and resources on this page

Tool or resource	Description	Related pages	Registry
European Nucleotide Archive (ENA)	A record of sequence information ranging from raw sequencing reads to assemblies and functional annotation	Plant Genomics Biodiversity Epitranscriptome data Human pathogen genomics Microbial biotechnology Single-cell sequencing Virology Data brokering Data interlinking Data publication Project data managemen...	Tool info Standards/Databases Training
International Nucleotide Sequence Database Collaboration	The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations.	Biodiversity Microbial biotechnology Plant sciences Data publication	Training
International Nucleotide Sequence Database Collaboration (INSDC)	A collaborative database of genetic sequence datasets from DDBJ, EMBL-EBI and NCBI	Biodiversity Microbial biotechnology Plant sciences Data publication	Tool info Training
OMERO	OMERO is an open-source client-server platform for managing, visualizing and analyzing microscopy images and associated metadata	OMERO Bioimaging data	Tool info Training
OmicsDI	Omics Discovery Index (OmicsDI) provides a knowledge discovery framework across heterogeneous omics data (genomics, proteomics, transcriptomics and metabolomics)	Data interlinking Existing data Machine actionability	Tool info Standards/Databases Training
PDB	The Protein Data Bank (PDB)	Intrinsically disorder... Structural Bioinformatics Data publication	Tool info Training
Research Object Crate (RO-Crate)	RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations.	Microbial biotechnology Plant sciences Data provenance	Standards/Databases Training
UniProt	Comprehensive resource for protein sequence and annotation data	Enzymology and biocata... Intrinsically disorder... Proteomics Single-cell sequencing Structural Bioinformatics Machine actionability	Tool info Standards/Databases Training
WorkflowHub	WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows.	Biodiversity Data analysis Data provenance	Tool info Standards/Databases Training