Your role: Data Steward: infrastructure
As an infrastructure data steward, I liaise with the people involved in IT infrastructure: technicians, application managers and other service providers inside and outside my research institute. My task is to translate the requirements of policies and science into suitable IT solutions and tools, and to provide advice. I implement IT infrastructure solutions, give access to data and software, and may also perform hands-on work in a research project.
- Identify the requirements of an adequate data infrastructure and tool landscape that fits with research data management (RDM) policies
- Ensure the compliance of the data infrastructure and tool landscape with codes of conduct and regulations
- Align the data infrastructure and tool landscape to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles and the principles of Open Science, and facilitate and support FAIR data
- Identify the requirements of and provide access to data infrastructure for RDM for researchers
- Make an inventory of data infrastructure and tools that fit the researchers' RDM needs
- Liaise on and align the management of data infrastructure and tools inside and outside the organisation
- Facilitate the availability of local data infrastructure and tools for FAIR data and long-term archiving of data
Institutes across Europe have started hiring professional data stewards. An infrastructure-oriented data steward is expected to be competent in the following areas:
- Advise and assist researchers on short- and long-term actions for data infrastructure and tools
- Continuously monitor data infrastructure and tools available inside and outside the institute, in close collaboration with the responsible (IT) department
- Assess technical knowledge status of researchers and relevant stakeholders (including other data stewards), and if needed give training in technical RDM skills
- Communicate with researchers, technical staff and support staff about data infrastructure and tools
- Apply knowledge of data encryption, data protection and data security protocols
- Translate ethical and technical requirements for data infrastructure and tools to technological measures, while understanding the research requirements and limitations
- Translate the FAIR data principles into data infrastructure and tool requirements
- Advise researchers and stakeholders (including other data stewards) on archiving solutions, including (meta)data standards
How to make data analysis FAIR.
How to make research data compliant with GDPR.
How to transfer data files.
How to use identifiers for research data.
How to find appropriate storage solutions.
Best practices to name and organise research data.
How to make machine-actionable (meta)data.
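As a small illustration of translating the FAIR principles into something a tool can consume, the sketch below builds a machine-actionable dataset description as Schema.org JSON-LD. It is a minimal example, not a complete metadata profile: the helper name `dataset_jsonld` and all values in the sample record (title, identifier, licence URL, keywords) are hypothetical placeholders, and a real deployment would follow the metadata profile chosen by the institute or community.

```python
import json


def dataset_jsonld(name, description, identifier, license_url, keywords):
    """Build a Schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; the property names used here
    (name, description, identifier, license, keywords) are Schema.org
    Dataset properties.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
        "keywords": list(keywords),
    }


# All values below are made-up placeholders, not a real dataset.
record = dataset_jsonld(
    name="Example phenotyping dataset",
    description="Placeholder description of a research dataset.",
    identifier="https://example.org/dataset/123",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["plant sciences", "phenotyping"],
)

# Serialise to a JSON string that can be stored next to the data or
# embedded in a landing page.
jsonld_text = json.dumps(record, indent=2)
print(jsonld_text)
```

The resulting JSON can be embedded in a dataset landing page inside a `<script type="application/ld+json">` element so that web crawlers and registries can read it.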
Relevant tools and resources
|Tool or resource||Description||Related pages||Registry|
|Arvados||With Arvados, bioinformaticians run and scale compute-intensive workflows, developers create biomedical applications, and IT administrators manage large compute and storage resources.||Data Steward: policy Researcher Data analysis|
|Aspera Fasp||With fast file transfer and streaming solutions built on the award-winning IBM FASP protocol, IBM Aspera software moves data of any size across any distance||Data transfer|
|Beacon||The Beacon protocol defines an open standard for genomics data discovery.||Researcher Data Steward: research Human data||Tool info Training|
|BIII||The BioImage Informatics Index is a registry of software tools, image databases for benchmarking, and training materials for bioimage analysis||Data analysis||Tool info|
|Bioconda||Bioconda is a bioinformatics channel for the Conda package manager||Data analysis||Tool info Training|
|Bioschemas.org||Bioschemas aims to improve the Findability on the Web of life sciences resources such as datasets, software, and training materials||Machine actionability||Training|
|Bitbucket||Git based code hosting and collaboration tool, built for teams.||Data organisation Data Steward: research||Standards/Databases|
|Box||Cloud storage and file sharing service||Data storage Data transfer||Training|
|BrAPI||Specification for a standard API for plant data: plant material, plant phenotyping data||Plant sciences||Training|
|Castor||Castor is an EDC system for researchers and institutions. With Castor, you can create and customize your own database in no time. Without any prior technical knowledge, you can build a study in just a few clicks using our intuitive Form Builder. Simply define your data points and start collecting high quality data, all you need is a web browser.||Identifiers Data Steward: research||Tool info|
|Common Workflow Language (CWL)||An open standard for describing workflows that are built from command line tools||Researcher Data analysis||Standards/Databases Training|
|Conda||Open source package management system||Data analysis||Training|
|Cookiecutter||A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python package project template.||Data organisation Data Steward: research|
|Crop Ontology||The Crop Ontology compiles concepts to curate phenotyping assays on crop plants, including anatomy, structure and phenotype.||Researcher Data Steward: research Plant sciences||Standards/Databases Training|
|cURL||command line tool and library for transferring data with URLs||Data transfer|
|DAISY||Data Information System to keep sensitive data inventory and meet GDPR accountability requirement.||Data Steward: policy Human data Data protection TransMed||Tool info|
|Data Catalog Vocabulary (DCAT)||DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.||Machine actionability|
|Data Stewardship Wizard||Publicly available online tool for composing smart data management plans||Data management plan Researcher Data Steward: research NeLS TSD||Tool info Training|
|DATAVERSE||Open source research data repository software.||Data storage Researcher Data Steward: research IFB||Training|
|dbGaP||The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans||Data publication Researcher Human data||Tool info Standards/Databases Training|
|Docker||Docker is software for running applications in virtualized environments called containers. It is linked to DockerHub, a library for sharing container images||Data analysis||Standards/Databases Training|
|DropBox||Cloud storage and file sharing service||Data storage Data transfer|
|e!DAL||Electronic data archive library is a framework for publishing and sharing research data||Data storage||Tool info|
|e!DAL-PGP||Plant Genomics and Phenomics Research Data Repository||Plant sciences Plant Genomics Researcher Data Steward: research Data publication||Standards/Databases|
|ELIXIR Deposition Databases for Biomolecular Data||List of discipline-specific deposition databases recommended by ELIXIR.||Data publication Researcher Data Steward: research COVID-19 Data Portal NeLS IFB CSC||Standards/Databases|
|EUPID||EUPID provides a method for identity management, pseudonymisation and record linkage to bridge the gap between multiple contexts.||Data Steward: policy Human data|
|FAIRDOM-SEEK||Data, model and SOPs management for projects, from preliminary data to publication, support for running SBML models etc.||Data storage NeLS Microbial biotechnology IFB Machine actionability||Tool info Training|
|FileZilla||A free FTP solution||Data transfer|
|Free-IPA||FreeIPA is an integrated Identity and Authentication solution for Linux/UNIX networked environments.||TransMed|
|GA4GH Data Security Toolkit||Principled and practical framework for the responsible sharing of genomic and health-related data.||Data publication Data Steward: policy Data Steward: research Human data Sensitive data|
|GA4GH Genomic Data Toolkit||Open standards for genomic data sharing.||Data Steward: research Human data|
|GA4GH Regulatory and Ethics toolkit||Framework for Responsible Sharing of Genomic and Health-Related Data||Data protection Sensitive data Data Steward: policy Data Steward: research Human data|
|Galaxy||Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.||NeLS Marine Metagenomics Data analysis Researcher IFB||Tool info Training|
|Git||Distributed version control system designed to handle everything from small to very large projects||Data organisation Data Steward: research||Training|
|GitHub||Versioning system, used for sharing code, as well as for sharing of small data||Data publication Data organisation Data Steward: research||Standards/Databases Standards/Databases Training|
|GitLab||GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Self-host GitLab on your own servers, in a container, or on a cloud provider.||Data organisation Data publication Data Steward: research||Standards/Databases Training|
|Globus||Globus lets you efficiently, securely, and reliably transfer data directly between systems separated by an office wall or an ocean. Focus on your research and offload your data transfer headaches to Globus||Data transfer|
|Identifiers.org||The Identifiers.org Resolution Service provides consistent access to life science data using Compact Identifiers. Compact Identifiers consist of an assigned unique prefix and a local provider designated accession number (prefix:accession).||Identifiers Data Steward: research||Tool info Standards/Databases|
|Informed Consent Ontology||The Informed Consent Ontology (ICO) is an ontology for the informed consent and informed consent process in the medical field.||Data Steward: policy Human data||Standards/Databases|
|iRODS||Integrated Rule-Oriented Data System (iRODS) is open source data management software for a cancer genome analysis workflow.||Data storage TransMed Bioimaging data||Tool info|
|ISA-tools||Open source framework and tools helping to manage a diverse set of life science, environmental and biomedical experiments using the Investigation Study Assay (ISA) standard||Data Steward: research Microbial biotechnology Machine actionability||Standards/Databases|
|Jupyter||Jupyter notebooks allow sharing of code and documentation||Data analysis||Training|
|Keycloak||Keycloak is an open source identity and data access management solution.||TransMed||Training|
|LUMI||EuroHPC world-class supercomputer||Data analysis Researcher CSC||Tool info|
|maDMP - Research Bridge||Machine-Actionable Data Management Plan: webinar (2016) on making a good data management plan.||Data management plan|
|Microsoft Azure||Cloud storage and file sharing service from Microsoft||Data storage Data transfer|
|Microsoft OneDrive||Cloud storage and file sharing service from Microsoft||Data storage|
|Molgenis||Molgenis is a modular web application for scientific data. Molgenis provides researchers with user friendly and scalable software infrastructures to capture, exchange, and exploit the large amounts of data that is being produced by scientific organisations all around the world.||Identifiers Data Steward: research||Tool info|
|Multi-Crop Passport Descriptor (MCPD)||The Multi-Crop Passport Descriptor is the metadata standard for plant genetic resources maintained ex situ by genbanks.||Documentation and metadata Researcher Data Steward: policy Plant sciences||Standards/Databases|
|NextCloud||As a fully on-premises solution, Nextcloud Hub provides the benefits of online collaboration without the compliance and security risks.||Data storage Data transfer|
|Nextflow||Nextflow is a framework for data analysis workflow execution||Data analysis||Tool info Training|
|OMERO||OMERO is an open-source client-server platform for managing, visualizing and analyzing microscopy images and associated metadata||Documentation and metadata Data Steward: research Data storage OMERO Bioimaging data||Tool info Training|
|ONTOMATON||OntoMaton facilitates ontology search and tagging functionalities within Google Spreadsheets.||Researcher Data Steward: research Documentation and metadata Identifiers|
|OpenEBench||ELIXIR benchmarking platform to support community-led scientific benchmarking efforts and the technical monitoring of bioinformatics resources||Data analysis Data Steward: research||Tool info|
|OwnCloud||Cloud storage and file sharing service||Data storage Data transfer Data analysis|
|REDCap||REDCap is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data in any environment, it is specifically geared to support online and offline data capture for research studies and operations.||Identifiers Data Steward: research Data quality||Tool info|
|REMS||REMS (Resource Entitlement Management System), developed by CSC, is a tool that can be used to manage researchers’ access rights to datasets.||TransMed||Tool info Training|
|Research Data Management Platform (RDMP)||Data management platform for automated loading, storage, linkage and provision of data sets||Data storage||Tool info|
|RStudio||RStudio notebooks allow sharing of code and documentation||Data analysis Researcher||Tool info Training|
|Schema.org||Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.||Machine actionability||Standards/Databases Training|
|Scientific Data's Recommended Repositories||List of repositories recommended by Scientific Data, containing both discipline-specific and general repositories.||Data publication Researcher Data Steward: research|
|semares||All-in-one platform for life science data management, semantic data integration, data analysis and visualization||Researcher Data Steward: research Documentation and metadata Data analysis Data storage|
|Singularity||Singularity is a container platform.||Data analysis TSD||Training|
|Snakemake||Snakemake is a framework for data analysis workflow execution||Data analysis||Tool info Training|
|The Genomic Standards Consortium (GSC)||Minimum Information about any (x) Sequence||Documentation and metadata Researcher Data Steward: policy Human data||Standards/Databases|
|WinSCP||WinSCP is a popular SFTP client and FTP client for Microsoft Windows. Copy files between a local computer and remote servers using FTP, FTPS, SCP, SFTP, WebDAV or S3 file transfer protocols.||Data transfer|
|ENA upload tool||The program submits experimental data and respective metadata to the European Nucleotide Archive (ENA).||Data Steward: research Researcher|
|PIPPA||PIPPA, the PSB Interface for Plant Phenotype Analysis, is the central web interface and database that provides the tools for managing the plant imaging robots and for analysing images and data.||Plant sciences Data Steward: research Researcher|
|Belnet||Belnet is the privileged partner of higher education, research and administration for connectivity. We provide high-bandwidth internet access and related services for our specific target groups.||Data Steward: research Researcher Data transfer|
|Flemish Supercomputing Center (VSC)||VSC is Flanders’ most highly integrated high-performance research computing environment, providing world-class services to government, industry, and researchers.||Data Steward: research Data analysis Data storage|
|e!DAL-PGP||Plant Genomics and Phenomics Research Data Repository||Data storage Documentation and metadata Researcher Data Steward: research Plant sciences Plant Genomics|
| ||Data management platform for organising, sharing and publishing research datasets, models, protocols, samples, publications and other research outcomes.||Data storage Documentation and metadata Researcher Data Steward: research|
|Red Española de Supercomputación||The Spanish Supercomputing Network’s mission is to offer the supercomputing and data management resources and services necessary for the development of innovative and high-quality scientific and technological projects, through competitive calls based on the scientific excellence of the projects to be developed.||Researcher Data Steward: research|
| ||Spanish academic and research network that provides advanced communication services to the scientific community and national universities.||Researcher Data Steward: research|
| ||The national aggregator of open access repositories. This platform brings together all the Spanish digital infrastructures in which open access research results are published and/or deposited.||Researcher Data Steward: research|
| ||Open data portal of the Spanish government. A meeting point for the various actors that make up the open data ecosystem.||Researcher Data Steward: research|
|Chipster||Chipster is a user-friendly analysis software for high-throughput data such as RNA-seq and single cell RNA-seq. It contains analysis tools and a large reference genome collection.||CSC Researcher Data analysis|
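
The Identifiers.org entry in the tool table describes Compact Identifiers of the form `prefix:accession`. The sketch below shows one way such an identifier could be split and turned into a resolver URL. It is an illustration under stated assumptions: the function names are hypothetical, the regular expression is a deliberately loose check rather than the official grammar, and lowercasing the prefix is an assumption made here for consistency. Whether a given prefix is actually registered is not verified.

```python
import re

# Base URL of the Identifiers.org resolution service.
RESOLVER = "https://identifiers.org/"

# Loose, illustrative pattern (an assumption, not the official grammar):
# a prefix of letters, digits, dots or underscores, a colon, then a
# non-empty accession without whitespace.
_COMPACT_ID = re.compile(r"^(?P<prefix>[A-Za-z0-9._]+):(?P<accession>\S+)$")


def split_compact_identifier(compact_id):
    """Return (prefix, accession) for a 'prefix:accession' string."""
    match = _COMPACT_ID.match(compact_id.strip())
    if match is None:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    # Lowercasing the prefix is an assumption made for this sketch.
    return match.group("prefix").lower(), match.group("accession")


def resolver_url(compact_id):
    """Build the resolver URL for a compact identifier."""
    prefix, accession = split_compact_identifier(compact_id)
    return f"{RESOLVER}{prefix}:{accession}"


print(resolver_url("taxonomy:9606"))
# prints https://identifiers.org/taxonomy:9606
```

A check of this kind could run when metadata is submitted, so that malformed identifiers are rejected before they reach an archive.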