What is Galaxy?
Galaxy is a well-known open-source platform for FAIR data analysis that enables users to:
- access and collect data from reference databases, external repositories and other data sources;
- use tools from various domains that can be plugged into workflows through its graphical web interface;
- run code in interactive environments (RStudio, Jupyter…) along with other tools or workflows;
- manage data by sharing and publishing results, workflows, and visualizations;
- capture the metadata of data analyses, thus ensuring their reproducibility.
Galaxy helps scientists perform accessible, reproducible, and transparent computational analyses. The Galaxy community actively helps improve the ecosystem and shares scientific discoveries.
Who can use Galaxy?
Galaxy also provides open, ready-to-use infrastructure for researchers worldwide. All you need is a web browser and an account on a public server.
What can you use Galaxy for?
Galaxy can be used at different stages of the data life cycle, from data collection through to reuse.
Collect
Access to databases
- UniProt
- InterMine
- OMERO
- OmicsDI
- Copernicus
- UCSC Genome Browser (tutorial)
- NCBI datasets
- International Nucleotide Sequence Database Collaboration (INSDC)
- European Nucleotide Archive (ENA)
- PDB
- 3rd-party databases
Customised data access
- Data libraries
- BYOD (POSIX, WebDAV, Dropbox, ...)
- On-demand reference data
- Deferred data from remote locations
LIMS integration
- Connect to sequencing facilities
- Rich API for integration with LIMS
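As a rough illustration of what a LIMS-side integration could look like, the sketch below prepares (without sending) an HTTP request that asks a Galaxy server to create a new history for an incoming sequencing run. It uses only the Python standard library. The server URL, API key, and history name are placeholders, and the helper name `build_history_request` is hypothetical; check the API documentation of your Galaxy server before relying on the endpoint or payload shown here.

```python
import json
import urllib.request


def build_history_request(base_url: str, api_key: str, name: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that creates a Galaxy history.

    Assumptions: the server exposes its REST API under /api and accepts
    the user's API key in the x-api-key header.
    """
    url = base_url.rstrip("/") + "/api/histories"
    payload = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace with a real server and key before sending.
request = build_history_request(
    "https://galaxy.example.org/", "YOUR_API_KEY", "Sequencing run 42"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

In practice, a client library such as BioBlend wraps calls like this one, so a LIMS integration rarely needs to build requests by hand.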
Process
Data transformation
- Data transformation tools
- Quality control
- Data cleaning
- Annotation
- Interactive Tools (OpenRefine, RStudio, Jupyter Notebook)
Import workflows
Metadata handling
- Provenance tracking
- Automatic metadata enrichment
- Bulk (meta)data manipulation
Analyse
Preserve
Export artefacts
- Workflows
- History
- Datasets
Formats
- Archive file
- BioCompute Object
- Research Object Crate (RO-Crate)
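To give a feel for the RO-Crate format, the sketch below builds a minimal `ro-crate-metadata.json` document following the RO-Crate 1.1 layout: a metadata descriptor, a root dataset, and one entry per file. This is not Galaxy's export code, and a crate exported by Galaxy carries much more information (workflow, provenance, parameters). The function name `minimal_ro_crate` and the example file names are hypothetical, and recommended root properties such as `license` and `datePublished` are left out for brevity.

```python
import json


def minimal_ro_crate(name: str, description: str, files: list[str]) -> dict:
    """Return a minimal RO-Crate 1.1 metadata document as a dict."""
    graph = [
        {
            # Metadata descriptor: identifies this file and the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: describes the crate as a whole.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One data entity per packaged file.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = minimal_ro_crate(
    "Example analysis",
    "Illustrative crate with hypothetical file names.",
    ["results.tsv", "workflow.ga"],
)
print(json.dumps(crate, indent=2))
```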
Export to remote sources
- FTP
- Dropbox
- S3 Bucket
- AWS
- Google Drive
- Nextcloud
- WebDAV
- Google Cloud Storage
Share
Share artefacts
- Datasets
- Histories
- Workflows
- Visualizations
- GA4GH Beacon (WIP)
- DRS server
Shareability
- RBAC (Role-Based Access Control)
- One user
- A group of users
- Public
Reuse
Account cleaning
- Storage dashboard to manage quota
- Bulk (permanent) delete
- Quota temporarily extendable
- Multiple quotas per object storage (WIP)
Import artefacts
- Histories (own, shared by others)
- Workflows from the WorkflowHub
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
European Nucleotide Archive (ENA) | A record of sequence information ranging from raw sequencing reads to assemblies and functional annotation | Plant Genomics Human pathogen genomics Microbial biotechnology Single-cell sequencing Data brokering Data publication Project data managemen... | Tool info Standards/Databases Training |
International Nucleotide Sequence Database Collaboration | The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations. | Microbial biotechnology Plant sciences Data publication | |
International Nucleotide Sequence Database Collaboration (INSDC) | A collaborative database of genetic sequence datasets from DDBJ, EMBL-EBI and NCBI | Microbial biotechnology Plant sciences Data publication | Tool info |
OMERO | OMERO is an open-source client-server platform for managing, visualizing and analyzing microscopy images and associated metadata | OMERO Bioimaging data | Tool info Training |
OmicsDI | Omics Discovery Index (OmicsDI) provides a knowledge discovery framework across heterogeneous omics data (genomics, proteomics, transcriptomics and metabolomics) | Existing data Machine actionability | Tool info Standards/Databases Training |
PDB | The Protein Data Bank (PDB) | Intrinsically disorder... Structural Bioinformatics Data publication | Tool info Training |
Research Object Crate (RO-Crate) | RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. | Microbial biotechnology Data provenance | Standards/Databases |
UniProt | Comprehensive resource for protein sequence and annotation data | Intrinsically disorder... Proteomics Single-cell sequencing Structural Bioinformatics Machine actionability | Tool info Standards/Databases Training |
WorkflowHub | WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows. | Data analysis Data provenance | Tool info Standards/Databases Training |