Tool assembly: TransMed
What is the TransMed data and computing tool assembly?
The TransMed data and computing tool assembly is an infrastructure provided by ELIXIR Luxembourg for clinical and translational projects. It provides tools for managing ongoing projects, which often require the management of cohort recruitment and the processing of samples, data and metadata. This entails GDPR-compliant and secure collection, storage, curation, standardisation, integration and analysis of clinical data and associated molecular, imaging and sensor/mobile data and metadata.
The TransMed tool assembly is also a blueprint showing how a collection of tools can be combined to support data lifecycle management in clinical and translational projects.
Who can use the TransMed data and computing tool assembly?
All researchers can use tools in the TransMed assembly individually or in combination, depending on their project needs. Most of the tools in the TransMed assembly are open source and can be reused. ELIXIR Luxembourg provides know-how transfer and training on the tool assembly upon request from researchers and data steward organisations. To make a request, please contact firstname.lastname@example.org.
Additionally, ELIXIR Luxembourg provides hosting of the TransMed assembly. Hosting of tools and data is free of charge for national users. For international users, hosting of data (up to 10 TB) is free on the condition that the data is shared with the wider research community under an appropriate access model, such as controlled access. Charges for hosting tools and large datasets for international users are evaluated on a case-by-case basis; please contact email@example.com for details.
For what purpose can the TransMed assembly be used?
Data management planning
Translational biomedicine projects often deal with sensitive data from human subjects. Data management planning for this type of project therefore needs to take data protection and GDPR compliance into account.
Typically, a TransMed project involves multiple (clinical) study sites and can contain several cohorts. During the planning phase, the project's dataflow and the data/metadata collected prospectively or retrospectively need to be documented. Projects can use the Data Information Sheet (DISH) to map the project dataflow and collect the metadata necessary for GDPR-compliant processing. In addition, a data protection impact assessment needs to be performed, taking into account partner roles, responsibilities and the information collected via the DISH. For this purpose, the TransMed assembly uses the Data Information System (DAISY), which indexes all information collected by DISH and provides a repository for GDPR-required project documentation such as ethics approvals, consent templates, subject information sheets and, ultimately, the project data management plan. The TransMed assembly also includes the risk management tool MONARC, which can be used to perform Data Protection Impact Assessments (DPIAs). DPIAs are a requirement of the GDPR for projects dealing with sensitive human data.
Data collection, transfer and storage
For projects involving patient recruitment, the TransMed assembly provides the Smart Scheduling System (SMASCH), which tracks the availability of resources in clinics and manages patient visits. Pseudonymised clinical data and patient surveys are then collected by the state-of-the-art electronic data capture (EDC) system REDCap through a battery of electronic case report forms (eCRFs). Imaging data from the clinics are deposited into the dedicated imaging platform XNAT. Omics data, in both raw and derived form, can be deposited in the data provenance system iRODS. Data files can be transferred via various encrypted communication options, as outlined in the Data transfer section of the RDMkit. The TransMed assembly most typically uses (S)FTP, Aspera FASP and ownCloud. Data is also encrypted at rest, both at the hardware level and at the file level, using either open-source utilities such as gpg or commercial options such as Aspera FASP.
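To make the notion of pseudonymised data concrete, the following is a minimal sketch of one common approach: deriving a stable pseudonym from a subject identifier with a keyed hash (HMAC-SHA256). This is an illustration only and not a description of how REDCap or any other TransMed tool pseudonymises data; the function name `pseudonymise`, the key and the example records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(subject_id: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from subject_id using HMAC-SHA256.

    The same subject_id and key always give the same pseudonym, so visits
    can be linked without exposing the original identifier.
    """
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical key for illustration; a real key would be generated securely
# and stored separately from the pseudonymised data.
key = b"example-key-not-for-production"

# Hypothetical clinical records as they might look before pseudonymisation.
records = [
    {"subject_id": "SITE1-0001", "visit": 1, "systolic_bp": 128},
    {"subject_id": "SITE1-0002", "visit": 1, "systolic_bp": 141},
    {"subject_id": "SITE1-0001", "visit": 2, "systolic_bp": 124},
]

# Replace the direct identifier with the pseudonym and drop the original.
pseudonymised = [
    {
        "pseudonym": pseudonymise(r["subject_id"], key),
        **{k: v for k, v in r.items() if k != "subject_id"},
    }
    for r in records
]

for row in pseudonymised:
    print(row)
```

Using a keyed hash rather than a plain hash means that someone without the key cannot recompute pseudonyms from guessed identifiers. Whoever holds the key can still re-identify subjects, so the output remains pseudonymised rather than anonymised data.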
Data curation and harmonisation
To facilitate cross-cohort/cross-study interoperability, data needs to be curated and harmonised upon collection. For this purpose, the TransMed assembly uses a variety of open standards and tools. For data quality and cleansing, the assembly uses OpenRefine, which provides an intuitive interface for generating facets of data that help researchers identify quality issues and outliers. It also enables traceable yet easy data correction. For data extraction, transformation and loading (ETL), the assembly uses Talend Open Studio (for complex and reusable ETLs) as well as R and Python (for ad hoc and simple transformations). To evaluate and improve the FAIRness of datasets, the assembly follows the recipes in the FAIR Cookbook developed by the FAIRplus consortium. For standard data models and ontologies, the assembly follows the recommendations in the FAIR Cookbook recipe for selecting terminologies and ontologies.
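As an illustration of the kind of ad hoc transformation that might be written in Python, the sketch below harmonises two small CSV extracts from different sites into one common schema. Everything in it is assumed for the example: the site names, the column mappings, the sex codes and the target schema are hypothetical and do not come from any TransMed project.

```python
import csv
import io

# Hypothetical mapping from site-specific column names to a common schema.
COLUMN_MAP = {
    "site_a": {"pid": "subject_id", "wt_kg": "weight_kg", "sex": "sex"},
    "site_b": {"patient": "subject_id", "weight_lb": "weight_lb", "gender": "sex"},
}
# Hypothetical harmonised vocabulary for the sex variable.
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}
LB_TO_KG = 0.45359237


def harmonise(rows, site):
    """Rename columns, convert units and normalise codes for one site."""
    mapping = COLUMN_MAP[site]
    harmonised_rows = []
    for row in rows:
        rec = {mapping[k]: v.strip() for k, v in row.items() if k in mapping}
        if "weight_lb" in rec:
            # Convert pounds to kilograms so all sites share one unit.
            rec["weight_kg"] = round(float(rec.pop("weight_lb")) * LB_TO_KG, 1)
        else:
            rec["weight_kg"] = round(float(rec["weight_kg"]), 1)
        rec["sex"] = SEX_CODES.get(rec["sex"].lower(), "unknown")
        rec["site"] = site  # keep provenance of each record
        harmonised_rows.append(rec)
    return harmonised_rows


# Hypothetical site extracts, inlined so the example is self-contained.
site_a_csv = "pid,wt_kg,sex\nA1,70.5,F\n"
site_b_csv = "patient,weight_lb,gender\nB1,154,male\n"

harmonised = (
    harmonise(csv.DictReader(io.StringIO(site_a_csv)), "site_a")
    + harmonise(csv.DictReader(io.StringIO(site_b_csv)), "site_b")
)

for rec in harmonised:
    print(rec)
```

Keeping the mappings in plain dictionaries makes the transformation easy to review and version alongside the data. For larger or reusable pipelines, a dedicated ETL tool would likely be a better fit.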
Data integration and analysis
TransMed projects usually require different data types from different cohorts to be integrated into one data platform for exploration, sub-setting and integrated analysis for hypothesis generation. The TransMed assembly includes several such tools. Ada is a web-based tool that provides a performant and highly configurable system for secured integration, visualization, and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources. The assembly also includes tools for specific data types, such as ATLAS, which integrates features from various OHDSI applications for Electronic Health Record data in OMOP format into a single cohesive experience. tranSMART is a tool that provides easy integration of phenotypic/clinical data with molecular data, along with a “drag-and-drop” data exploration interface.
To facilitate the findability of data, the TransMed assembly provides a Data/Sample Catalog tool that supports the indexing, search and discovery of studies, data sets and samples accumulated in the context of projects from different sites and cohorts. The catalog implements a controlled-access model through integration with AAI REMS. Audit trailing of data access is achieved by integrating the DAISY tool into the access process. The catalog tool can be integrated with various identity management systems such as Keycloak, ELIXIR-AAI or Free-IPA.
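The idea of integrating clinical and molecular data and then sub-setting it can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Ada, ATLAS or tranSMART work internally; the records, field names and the helper functions `integrate` and `subset` are hypothetical.

```python
from collections import defaultdict

# Hypothetical clinical records from two cohorts.
clinical = [
    {"subject_id": "S1", "cohort": "cohort_a", "diagnosis": "case", "age": 64},
    {"subject_id": "S2", "cohort": "cohort_a", "diagnosis": "control", "age": 58},
    {"subject_id": "S3", "cohort": "cohort_b", "diagnosis": "case", "age": 71},
]

# Hypothetical molecular measurements keyed by the same subject identifier.
molecular = [
    {"subject_id": "S1", "gene": "GENE_X", "expression": 8.2},
    {"subject_id": "S2", "gene": "GENE_X", "expression": 5.1},
    {"subject_id": "S3", "gene": "GENE_X", "expression": 7.9},
    {"subject_id": "S4", "gene": "GENE_X", "expression": 6.0},  # no clinical record
]


def integrate(clinical_rows, molecular_rows):
    """Attach each subject's molecular measurements to their clinical record."""
    by_subject = defaultdict(list)
    for m in molecular_rows:
        by_subject[m["subject_id"]].append(
            {"gene": m["gene"], "expression": m["expression"]}
        )
    return [
        {**c, "measurements": by_subject.get(c["subject_id"], [])}
        for c in clinical_rows
    ]


def subset(rows, **criteria):
    """Keep only records whose fields match all of the given criteria."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


integrated = integrate(clinical, molecular)
cases = subset(integrated, diagnosis="case")

# Simple integrated summary: mean expression across the selected sub-cohort.
values = [m["expression"] for r in cases for m in r["measurements"]]
mean_expression = sum(values) / len(values)
print(len(cases), "cases, mean expression:", round(mean_expression, 2))
```

Measurements without a matching clinical record (subject `S4` here) are dropped by this join. A real integration platform would typically report such unmatched records so that curators can investigate them.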
Relevant tools and resources
| Tool or resource | Description | Related pages | Registry |
|---|---|---|---|
| Ada Discovery Analytics (Ada) | Ada is a performant and highly configurable system for secured integration, visualization, and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources. | Data analysis | |
| Atlas | Free, publicly available, web-based, open-source software application developed by the OHDSI community to support the design and execution of observational analyses to generate real-world evidence from patient-level observational data. | Data Steward: research, Researcher | Tool info, Training |
| DAISY | Data Information System to keep a sensitive data inventory and meet the GDPR accountability requirement. | Data Steward: infrastructure, Data Steward: policy, Human data, Data protection | Tool info |
| Data Catalog | Unique collection of project-level metadata from large research initiatives in a diverse range of fields, including clinical, molecular and observational studies. Its aim is to improve the findability of these projects following FAIR data principles. | Documentation and metadata | |
| ELIXIR-AAI | The ELIXIR Authentication and Authorisation Infrastructure (AAI). | Sensitive data, NeLS, TSD | Training |
| FAIR Cookbook | FAIR Cookbook is an online resource for the Life Sciences with recipes that help you to make and keep data Findable, Accessible, Interoperable and Reusable (FAIR). | Compliance monitoring & measurement, Data Steward: research | |
| Free-IPA | FreeIPA is an integrated Identity and Authentication solution for Linux/UNIX networked environments. | Data Steward: infrastructure | |
| iRODS | Integrated Rule-Oriented Data System (iRODS) is open source data management software for a cancer genome analysis workflow. | Data storage, Data Steward: infrastructure, Bioimaging data | Tool info |
| Keycloak | Keycloak is an open source identity and data access management solution. | Data Steward: infrastructure | Training |
| MONARC | A risk assessment tool that can be used to do Data Protection Impact Assessments. | Data protection, Data Steward: policy, Human data | Standards/Databases |
| OHDSI | Multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics. All our solutions are open-source. | Researcher, Data Steward: research, Data analysis, Data storage, Toxicology data | Tool info |
| OMOP-CDM | OMOP is a common data model for the harmonisation of observational health data. | | |
| OpenStack | OpenStack is an open source cloud computing infrastructure software project and is one of the three most active open source projects in the world. | Data storage, Data analysis, IFB | Training |
| REMS | REMS (Resource Entitlement Management System), developed by CSC, is a tool that can be used to manage researchers’ access rights to datasets. | Data Steward: infrastructure | Tool info, Training |
| SMASCH | SMASCH (Smart Scheduling) system is a web-based tool designed for longitudinal clinical studies requiring recurrent follow-up visits of the participants. SMASCH controls and simplifies scheduling for a large database of patients. SMASCH is also used to organise the daily planning (delegation of tasks) for different medical professionals such as doctors, nurses and neuropsychologists. | Data organisation | |
| Talend | Talend is an open source data integration platform. | Data Steward: research, Researcher | |
| tranSMART | Knowledge management and high-content analysis platform enabling analysis of integrated data for the purposes of hypothesis generation, hypothesis validation, and cohort discovery in translational research. | Researcher, Data Steward: research, Data analysis, Data storage | Tool info |
| XNAT | Open source imaging informatics platform. It facilitates common management, productivity, and quality assurance tasks for imaging and associated data. | Researcher, Data analysis, XNAT-PIC, Bioimaging data | |