Description

As an infrastructure data steward, I liaise with the people involved in the IT infrastructure: technicians, application managers and other service providers inside and outside my research institute. My task is to translate the requirements of policies and science into suitable IT solutions and tools, and to provide advice. I implement IT infrastructure solutions, give access to data and software, and may also perform hands-on work in a research project.

Focus

  • Identify the requirements of an adequate data infrastructure and tool landscape that fits with research data management (RDM) policies
  • Ensure the compliance of the data infrastructure and tool landscape with codes of conduct and regulations
  • Align the data infrastructure and tool landscape to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles and the principles of Open Science, and facilitate and support FAIR data
  • Identify the requirements of and provide access to data infrastructure for RDM for researchers
  • Make an inventory of data infrastructure and tools that fit the researchers' RDM needs
  • Liaise about and align the management of data infrastructure and tools inside and outside the organisation
  • Facilitate the availability of local data infrastructure and tools for FAIR data and long-term archiving of data
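
As a rough illustration of the inventory task above, the following Python sketch stores a handful of tools with their tags and looks up the tools that carry a given set of tags. The entries and tags are copied from the table of tools and resources on this page; the names Tool, INVENTORY and find_tools are hypothetical and only serve this example. It is a minimal sketch, not a recommended inventory format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One inventory entry (hypothetical structure for this sketch)."""
    name: str
    description: str
    tags: frozenset


# A few entries taken from the table of tools and resources on this page.
INVENTORY = [
    Tool("Box", "Cloud storage and file sharing service",
         frozenset({"storage", "IT support", "transfer"})),
    Tool("iRODS", "Open source data management software",
         frozenset({"storage", "IT support"})),
    Tool("NextCloud", "On-premises online collaboration solution",
         frozenset({"storage", "IT support", "transfer"})),
    Tool("Galaxy", "Web-based platform for data intensive biomedical research",
         frozenset({"nels", "data analysis", "researcher", "IT support"})),
    Tool("Snakemake", "Framework for data analysis workflow execution",
         frozenset({"IT support", "data analysis"})),
]


def find_tools(inventory, required_tags):
    """Return the sorted names of tools that carry every required tag."""
    required = set(required_tags)
    return sorted(tool.name for tool in inventory if required <= tool.tags)


if __name__ == "__main__":
    # Tools suitable for both storing and transferring data.
    print(find_tools(INVENTORY, {"storage", "transfer"}))
```

In practice such an inventory would more likely live in a shared spreadsheet or catalogue than in code; the point is only that consistent tags make it possible to match tools to RDM needs.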

Learning path

Institutes across Europe have started hiring professional data stewards. An infrastructure-oriented data steward is expected to be competent in the following areas:

  • Advise and assist researchers on short- and long-term actions for data infrastructure and tools
  • Continuously monitor data infrastructure and tools available inside and outside the institute, in close collaboration with the responsible (IT) department
  • Assess the technical knowledge of researchers and relevant stakeholders (including other data stewards) and, if needed, give training in technical RDM skills
  • Communicate with researchers, technical staff and support staff about data infrastructure and tools
  • Apply knowledge of data encryption, data protection and data security protocols
  • Translate ethical and technical requirements for data infrastructure and tools to technological measures, while understanding the research requirements and limitations
  • Translate the FAIR data principles into data infrastructure and tool requirements
  • Advise researchers and stakeholders (including other data stewards) on archiving solutions, including (meta)data standards
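
Advice on archiving solutions often touches on how to confirm that archived files have not changed over time. One common approach, assumed here for illustration and not prescribed by this page, is to record a checksum for every file at deposit time and compare against it later. The sketch below uses only the Python standard library; the function names sha256_of, build_manifest and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed or modified since the manifest."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A manifest built at deposit time can be stored next to the data; running changed_files later returns an empty list when nothing has been added, removed or altered.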

If you want to become competent in these areas or build capacity in your institution, then the following training resources might be useful:

Common problems

Resources

Relevant tools and resources

Tool or resource | Description | Tags
Beacon | The Beacon protocol defines an open standard for genomics data discovery. | researcher, data manager, IT support, human data
Bioconda | Bioconda is a bioinformatics channel for the Conda package manager. | IT support, data analysis
Bitbucket | Git-based code hosting and collaboration tool, built for teams. | data organisation, data manager, IT support
Box | Cloud storage and file sharing service. | storage, IT support, transfer
BrAPI | Specification for a standard API for plant data: plant material, plant phenotyping data. | IT support, plants
Conda | Open source package management system. | IT support, data analysis
Crop Ontology | The Crop Ontology compiles concepts to curate phenotyping assays on crop plants, including anatomy, structure and phenotype. | researcher, data manager, IT support, plants
DAISY | Data Information System to keep a sensitive data inventory and meet GDPR accountability requirements. | IT support, policy officer, human data, data protection
DATAVERSE | Open source research data repository software. | storage, researcher, data manager, IT support
dbGAP | The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans. | data publication, researcher, IT support, human data
Docker | Docker is software for the execution of applications in virtualized environments called containers. It is linked to DockerHub, a library for sharing container images. | IT support, data analysis
DropBox | Cloud storage and file sharing service. | storage, IT support, transfer
DS-Wizard | Data Stewardship Wizard. | DMP, researcher, data manager, IT support, nels
e!DAL | Electronic data archive library is a framework for publishing and sharing research data. | storage, IT support
e!DAL-PGP | Plant Genomics and Phenomics Research Data Repository. | plants, researcher, data manager, IT support
ELIXIR Deposition Databases for Biomolecular Data | List of discipline-specific deposition databases recommended by ELIXIR. | data publication, researcher, data manager, IT support
EUPID | EUPID provides a method for identity management, pseudonymisation and record linkage to bridge the gap between multiple contexts. | IT support, policy officer, human data
GA4GH data security toolkit | Principled and practical framework for the responsible sharing of genomic and health-related data. | data publication, policy officer, data manager, IT support, human data
GA4GH regulatory and ethics toolkit | Framework for Responsible Sharing of Genomic and Health-Related Data. | data protection, sensitive, policy officer, data manager, IT support, human data
Galaxy | Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses. | nels, data analysis, researcher, IT support
Git | Distributed version control system designed to handle everything from small to very large projects. | data organisation, data manager, IT support
GitHub | Versioning system, used for sharing code, as well as for sharing of small data. | data publication, data organisation, IT support, data manager
GitLab | GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Self-host GitLab on your own servers, in a container, or on a cloud provider. | data organisation, data publication, IT support, data manager
Informed Consent Ontology | The Informed Consent Ontology (ICO) is an ontology for the informed consent and informed consent process in the medical field. | IT support, policy officer, human data
iRODS | Integrated Rule-Oriented Data System (iRODS) is open source data management software for a cancer genome analysis workflow. | storage, IT support
ISA-tools | Open source framework and tools helping to manage a diverse set of life science, environmental and biomedical experiments using the Investigation Study Assay (ISA) standard. | IT support, data manager, micro biotech
Jupyter | Jupyter notebooks allow sharing of code and documentation. | IT support, data analysis
maDMP - Research Bridge | Machine-Actionable Data Management Plan. Webinar (2016) on making a good data management plan. | DMP, IT support
MCPD | The Multi-Crop Passport Descriptor is the metadata standard for plant genetic resources maintained ex situ by genebanks. | metadata, researcher, IT support, policy officer, plants
Microsoft Azure | Cloud storage and file sharing service from Microsoft. | storage, IT support, transfer
Microsoft OneDrive | Cloud storage and file sharing service from Microsoft. | storage, IT support
NextCloud | As a fully on-premises solution, Nextcloud Hub provides the benefits of online collaboration without the compliance and security risks. | storage, IT support, transfer
Nextflow | Nextflow is a framework for data analysis workflow execution. | IT support, data analysis
ONTOMATON | OntoMaton facilitates ontology search and tagging functionalities within Google Spreadsheets. | researcher, data manager, IT support
OpenEBench | ELIXIR benchmarking platform to support community-led scientific benchmarking efforts and the technical monitoring of bioinformatics resources. | data analysis, data manager, IT support
OwnCloud | Cloud storage and file sharing service. | storage, IT support, transfer, data analysis
Research Data Management Platform (RDMP) | Data management platform for automated loading, storage, linkage and provision of data sets. | storage, IT support
RStudio | RStudio notebooks allow sharing of code and documentation. | data analysis, IT support, researcher
Scientific Data's Recommended Repositories | List of repositories recommended by Scientific Data; contains both discipline-specific and general repositories. | data publication, researcher, data manager, IT support
FAIRDOM-SEEK | Data, model and SOPs management for projects, from preliminary data to publication, support for running SBML models etc. | storage, IT support, nels, micro biotech
Singularity | Singularity is a container platform. | IT support, data analysis
Snakemake | Snakemake is a framework for data analysis workflow execution. | IT support, data analysis
The Genomic Standards Consortium (GSC) | Minimum Information about any (x) Sequence. | metadata, researcher, IT support, policy officer, human data