Tool assembly: TransMed

What is the TransMed data and computing tool assembly?

The TransMed data and computing tool assembly is an infrastructure provided by ELIXIR Luxembourg for clinical and translational projects. It provides tools for managing ongoing projects, which often require the management of cohort recruitment and the processing of samples, data and metadata. This entails GDPR-compliant and secure collection, storage, curation, standardisation, integration and analysis of clinical data and associated molecular, imaging and sensor/mobile data and metadata.

The TransMed tool assembly is also a blueprint showing how a collection of tools can be combined to support data lifecycle management in clinical and translational projects.

Who can use the TransMed data and computing tool assembly?

All researchers can use tools in the TransMed assembly individually or in combination depending on their project needs. Most of the tools in the TransMed assembly are open-source and can be re-used. ELIXIR Luxembourg provides know-how transfer and training on the tool assembly upon request from researchers and data steward organisations. To make a request please contact info@elixir-luxembourg.org.

Additionally, ELIXIR Luxembourg provides hosting of the TransMed assembly. Hosting of tools and data is free of charge for national users. For international users, hosting of data (up to 10TB) is free on the basis that the data is shared with the wider research community under an appropriate access model, such as controlled access. Charges for hosting tools and large datasets for international users are evaluated on a case-by-case basis. Please contact info@elixir-luxembourg.org for details.

For what purpose can the TransMed assembly be used?

Figure 1. TransMed data and computing tool assembly.

Data management planning

Translational biomedicine projects often deal with sensitive data from human subjects. Therefore, data management planning for this type of project needs to take data protection and GDPR compliance into account.

Typically, a TransMed project involves multiple (clinical) study sites and can contain several cohorts. During the planning phase, the project dataflow and the data and metadata collected prospectively or retrospectively need to be documented. Projects can use the Data Information Sheet (DISH) to map the project dataflow and collect the metadata necessary for GDPR-compliant processing. In addition, a data protection impact assessment needs to be performed, taking into account partner roles, responsibilities and the information collected via the DISH. For this purpose the TransMed assembly uses the Data Information System (DAISY), which indexes all information collected by the DISH and provides a repository to accumulate GDPR-required project documentation such as ethics approvals, consent templates, subject information sheets and, ultimately, the project data management plan. The TransMed assembly also includes the risk management tool MONARC, which can be used to perform Data Protection Impact Assessments (DPIAs). DPIAs are a requirement of the GDPR for projects dealing with sensitive human data.

Data collection, transfer and storage

For projects involving patient recruitment, the TransMed assembly provides the Smart Scheduling System (SMASCH), which tracks the availability of resources in clinics and manages patient visits. Pseudonymised clinical data and patient surveys are then collected with the state-of-the-art electronic data capture (EDC) system REDCap through a battery of electronic case report forms (eCRFs). Imaging data from the clinics are deposited into the dedicated imaging platform XNAT. Omics data, in both raw and derived form, can be deposited into the data provenance system iRODS. Data files can be transferred via various encrypted communication options, as outlined in the Data transfer section of the RDMkit. The TransMed assembly most typically uses (S)FTP, Aspera FASP and ownCloud. Data is also encrypted at rest, both at the hardware level and at the file level, using either open-source utilities such as gpg or commercial options such as Aspera FASP.
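As an illustration of file-level encryption before transfer, the following minimal Python sketch records a SHA-256 checksum of a file and then encrypts it for a recipient by calling the gpg command line. It is not the assembly's actual procedure: the checksum step, the function names and the `.gpg` naming are assumptions made for this example, and it presumes that gpg is installed and the recipient's public key has already been imported.

```python
from __future__ import annotations

import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def encrypted_name(source: Path) -> Path:
    """Hypothetical naming convention: append '.gpg' to the file name."""
    return source.with_name(source.name + ".gpg")


def gpg_encrypt_command(source: Path, recipient: str) -> list[str]:
    """Build (but do not run) a gpg public-key encryption command."""
    return [
        "gpg", "--batch", "--yes",
        "--output", str(encrypted_name(source)),
        "--encrypt", "--recipient", recipient,
        str(source),
    ]


def encrypt_for_transfer(source: Path, recipient: str) -> tuple[Path, str]:
    """Record a checksum of the plaintext, then encrypt the file with gpg."""
    checksum = sha256_of(source)
    # Raises CalledProcessError if gpg fails, e.g. when the key is unknown.
    subprocess.run(gpg_encrypt_command(source, recipient), check=True)
    return encrypted_name(source), checksum
```

The encrypted file can then be sent over any of the transfer channels listed above, and the recipient can compare the checksum after decryption to confirm that the file arrived intact.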

Data curation and harmonisation

To facilitate cross-cohort and cross-study interoperability, data needs to be curated and harmonised upon collection. For this purpose the TransMed assembly uses a variety of open standards and tools. For data quality and cleansing the assembly uses OpenRefine, which provides an intuitive interface for generating facets of data that help researchers identify quality issues and outliers. It also makes data correction easy while keeping it traceable. For data Extraction, Transformation and Loading (ETL) the assembly uses Talend Open Studio (for complex and reusable ETLs) as well as R and Python (for ad-hoc and simple transformations). To evaluate and improve the FAIRness of datasets, the assembly follows the recipes in the FAIR Cookbook developed by the FAIRplus consortium. For standard data models and ontologies, the assembly follows the recommendations in the FAIR Cookbook recipe for selecting terminologies and ontologies.
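The kind of ad-hoc curation step described above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the `SEX_MAPPING` vocabulary, the function names and the choice of the 1.5 × IQR rule for outliers are hypothetical and are not taken from the assembly or from OpenRefine.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical mapping from site-specific codes to one harmonised vocabulary.
SEX_MAPPING = {"m": "male", "male": "male", "f": "female", "female": "female"}


def facet(values):
    """Count distinct values, similar in spirit to a text facet."""
    return Counter(values)


def harmonise_sex(raw):
    """Map a raw code to the harmonised term, or None if it is unrecognised."""
    return SEX_MAPPING.get(raw.strip().lower())


def iqr_outliers(numbers):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in numbers if x < low or x > high]


if __name__ == "__main__":
    raw_sex = ["M", "male", " f ", "Female", "unknown"]
    print(facet(raw_sex))                       # shows inconsistent spellings
    print([harmonise_sex(v) for v in raw_sex])  # None marks values to review
    print(iqr_outliers([60, 62, 65, 66, 68, 70, 72, 75, 640]))
```

Values that map to `None` or that are flagged as outliers are candidates for manual review rather than automatic correction, which keeps the curation traceable.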

Data integration and analysis

TransMed projects usually require different data types from different cohorts to be integrated into one data platform for exploration, sub-setting and integrated analysis for hypothesis generation. The TransMed assembly includes several such tools. Ada Discovery Analytics (Ada) is a web-based tool that provides a performant and highly configurable system for secured integration, visualization, and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources. The assembly also includes tools for specific data types, such as Atlas, which integrates features from various OHDSI applications for Electronic Health Record data in OMOP-CDM format into a single cohesive experience. tranSMART is a tool that provides easy integration of phenotypic/clinical data with molecular data, together with a “drag-and-drop” data exploration interface.
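Conceptually, integration and sub-setting amount to joining records from different sources on a shared pseudonymised identifier and then filtering the result. The following Python sketch shows that idea only; it does not reflect how Ada, Atlas or tranSMART work internally, and the field names (`subject_id`, `cohort`, `gene_x_expression`) are hypothetical.

```python
def integrate(clinical, molecular):
    """Inner-join two lists of records on the pseudonymised subject ID."""
    by_subject = {row["subject_id"]: row for row in molecular}
    merged = []
    for row in clinical:
        match = by_subject.get(row["subject_id"])
        if match is not None:
            # Combine the clinical and molecular fields into one record.
            merged.append({**row, **match})
    return merged


def subset(records, **criteria):
    """Keep the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    clinical = [
        {"subject_id": "S1", "cohort": "A", "age": 61},
        {"subject_id": "S2", "cohort": "B", "age": 55},
    ]
    molecular = [{"subject_id": "S1", "gene_x_expression": 2.4}]
    integrated = integrate(clinical, molecular)
    print(subset(integrated, cohort="A"))
```

Subjects without a molecular record are dropped by the inner join; a platform intended for exploration would more likely keep them and mark the missing values.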

Data stewardship

To facilitate the findability of data, the TransMed assembly provides a Data Catalog tool that supports the indexing, search and discovery of studies, data sets and samples accumulated in the context of projects from different sites and cohorts. The catalog implements a controlled-access model through integration with REMS. An audit trail of data access is achieved by integrating DAISY into the access process. The catalog tool can be integrated with various identity management systems such as Keycloak, Life Science Login (LS Login) or Free-IPA.
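The combination of controlled access and audit trailing can be pictured with a small Python sketch: every access request is checked against the granted entitlements and logged, whether or not it succeeds. This is a conceptual illustration only. The `AccessRegistry` class and its fields are hypothetical and do not correspond to the REMS or DAISY APIs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRegistry:
    """Hypothetical in-memory store of entitlements and access events."""

    entitlements: set = field(default_factory=set)  # (user, dataset) pairs
    audit_log: list = field(default_factory=list)

    def grant(self, user, dataset):
        """Record that a user has been approved for a dataset."""
        self.entitlements.add((user, dataset))

    def request(self, user, dataset):
        """Check an access request and log it, whether granted or denied."""
        allowed = (user, dataset) in self.entitlements
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "granted": allowed,
        })
        return allowed


if __name__ == "__main__":
    registry = AccessRegistry()
    registry.grant("alice", "cohort-a-omics")
    print(registry.request("alice", "cohort-a-omics"))
    print(registry.request("bob", "cohort-a-omics"))
    print(len(registry.audit_log))
```

Logging denied requests as well as granted ones is a design choice of this sketch: it gives the audit trail a complete picture of who attempted to reach which dataset.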

Your tasks

Compliance monitoring & measurement

How to measure compliance with data management regulations and standards.

Data storage

How to find appropriate storage solutions.

Documentation and metadata

How to document and describe your data.

Data organisation

Best practices to name and organise research data.

Data analysis

How to make data analysis FAIR.

Data sensitivity

How to identify the sensitivity of different research data types.

GDPR compliance

How to protect your research data, and how to make research data compliant with GDPR.

Data management plan

How to write a Data Management Plan (DMP).

Your domain

Human data

Data management solutions for human data.

More information

Tools and resources on this page

Tool or resource	Description	Related pages	Registry
Ada Discovery Analytics (Ada)	Ada is a performant and highly configurable system for secured integration, visualization, and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources.
Atlas	Free, publicly available web-based, open-source software application developed by the OHDSI community to support the design and execution of observational analyses to generate real world evidence from patient level observational data.		Tool info Training
DAISY	Data Information System to keep sensitive data inventory and meet GDPR accountability requirement.	Human data GDPR compliance	Tool info Training
Data Catalog	Unique collection of project-level metadata from large research initiatives in a diverse range of fields, including clinical, molecular and observational studies. Its aim is to improve the findability of these projects following FAIR data principles.		Training
FAIR Cookbook	FAIR Cookbook is an online resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable (FAIR)	Health data Human pathogen genomics Compliance monitoring ...	Tool info Training
Free-IPA	FreeIPA is an integrated Identity and Authentication solution for Linux/UNIX networked environments.
iRODS	Integrated Rule-Oriented Data System (iRODS) is open source data management software for a cancer genome analysis workflow.	Bioimaging data Data storage	Tool info
Keycloak	Keycloak is an open source identity and data access management solution.		Training
Life Science Login (LS Login)	An authentication service from EOSC-Life	IFB NeLS TSD
MONARC	A risk assessment tool that can be used to do Data Protection Impact Assessments	Human data
OHDSI	Multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics. All our solutions are open-source.	Toxicology data Data quality	Tool info
OMOP-CDM	OMOP is a common data model for the harmonisation of observational health data.	Cancer data Data quality
OpenRefine	Data curation tool for working with messy data	Data quality Machine actionability	Training
REDCap	REDCap is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data in any environment, it is specifically geared to support online and offline data capture for research studies and operations.	Cancer data Health data Data quality Identifiers	Tool info Training
REMS	REMS (Resource Entitlement Management System), developed by CSC, is a tool that can be used to manage researchers’ access rights to datasets.		Tool info Training
SMASCH	The SMASCH (Smart Scheduling) system is a web-based tool designed for longitudinal clinical studies requiring recurrent follow-up visits of the participants. SMASCH controls and simplifies scheduling for a large database of patients. SMASCH is also used to organise the daily planning (delegation of tasks) for the different medical professionals such as doctors, nurses and neuropsychologists.	Health data
Talend	Talend is an open source data integration platform.
tranSMART	Knowledge management and high-content analysis platform enabling analysis of integrated data for the purposes of hypothesis generation, hypothesis validation, and cohort discovery in translational research.	Data storage	Tool info
XNAT	Open source imaging informatics platform. It facilitates common management, productivity, and quality assurance tasks for imaging and associated data.	XNAT-PIC Bioimaging data Cancer data