
Your domain: Human data

Introduction

When you do research on data derived from human individuals (hereafter human data), there are additional aspects that must be considered during the data life cycle. Many of the topics discussed on this page refer to the General Data Protection Regulation (GDPR), as it is a central piece of legislation that affects essentially all research that takes place in the European Union (EU) using human data, as well as research with data of individuals residing in the EU. Much of the information on this page applies generally to working with human data. An additional focus is on human genomic data and the sharing of such information for research purposes.

Planning for projects with human data

Description

When working with human data, you must follow established research ethical guidelines and legislation. Preferably, planning for these aspects should be done before you start handling personal data. In some cases, such as under the GDPR, this planning is a legal requirement.

Considerations

  • Have you got an ethical permit for your research project?
  • The acquisition of data must be legal.
    • Receiving data/samples directly from data subjects requires informed consent in most cases.
      • An informed consent is an agreement from the research subject to participate in and share personal data for a particular purpose. It shall describe the purpose and any risks involved (along with any mitigations to minimise those risks) in such a way that the research subject can make an informed choice about participating. It should also state under what circumstances the data can be used for the initial purpose, as well as for later re-use by others.
        • Consider adopting a formalised, machine-readable description of data use conditions. This will greatly improve the possibilities to make the data FAIR later on.
      • Informed consent should be acquired for several reasons:
        • It is a cornerstone of research ethics. Regardless of legal obligations, it is important to ask for informed consent, as it is good research ethics practice and maintains trust in research.
        • Legislation on ethical permission to perform research on human subjects demands informed consent in many cases.
        • Personal data protection legislation might have informed consent as one legal basis for processing the personal data.
        • Note that the content of an informed consent, as defined by one piece of legislation, might not live up to the demands of another piece of legislation. For example, an informed consent that is good enough for an ethical permit, might not be good enough for the demands of the GDPR.
    • Receiving data from a collaborator must be covered by a contract. Ensure that detailed provisions on data use, retention, re-use and publication are included in the agreements (Data Use agreement, Consortium agreement, Data Sharing agreement, …). This also applies to samples you receive from a collaborator. The related contract (e.g. Material Transfer Agreement - MTA) should cover the use of human data generated from these samples. An incomplete legal framework for the data use can require lengthy legal amendments and can result in your inability to comply with requirements set out by your funder or targeted publisher.
    • Receiving data from a repository also comes with certain use restrictions. These are either defined in the license attributed to the data or defined in a dataset specific access policy and terms of service of the repository.
  • Personal data protection legislation:
    • Within the EU. If you are performing human data research in the EU, or your data subjects are located in the EU, then you must adhere to the General Data Protection Regulation - GDPR.
      • Requirements for research that fall under the GDPR are outlined in the RDMkit GDPR compliance page.
      • Attributes of the data determine its sensitivity, and sensitivity affects the considerations for data handling. The RDMkit Data Sensitivity page provides guidance on determining and reducing data sensitivity.
    • Outside the EU. For countries outside the EU, the International Compilation of Human Research Standards lists relevant legislation.
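
The machine-readable description of data use conditions mentioned above could look roughly like the sketch below. It uses a small, illustrative subset of Data Use Ontology (DUO) terms; the codes and labels should be checked against the current DUO release, and the record layout, the function name `describe_use_conditions` and the dataset identifier are assumptions made for this example, not a standard format.

```python
import json

# Illustrative subset of DUO terms (assumption: verify against the current DUO release).
DUO_TERMS = {
    "DUO:0000042": "general research use",
    "DUO:0000006": "health or medical or biomedical research",
    "DUO:0000007": "disease specific research",
}


def describe_use_conditions(dataset_id, permission, modifiers=()):
    """Build a small machine-readable record of the consented data use."""
    if permission not in DUO_TERMS:
        raise ValueError(f"unknown permission code: {permission}")
    return {
        "dataset": dataset_id,  # hypothetical identifier
        "permission": {"code": permission, "label": DUO_TERMS[permission]},
        "modifiers": list(modifiers),  # free-text here; real records would use coded terms
    }


record = describe_use_conditions(
    "example-dataset-001",
    "DUO:0000007",
    modifiers=["disease: hypothetical-condition"],
)
print(json.dumps(record, indent=2))
```

Recording the conditions at consent time, rather than reconstructing them later from free-text consent forms, is what makes the later matching of access requests against use conditions feasible.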

Solutions

Processing and analysing human data

Description

For human data, it is very important to use technical and procedural measures to ensure that the information is kept secure. There might exist legal obligations to document and implement measures to ensure an adequate level of security.

Considerations

  • Establish adequate Information security measures. This should be done for all types of research data, but is even more important for human data.
    • Information security is usually described as containing three main aspects - Confidentiality, Integrity, and Availability.
      • Confidentiality is about measures to ensure that data is kept confidential from those that do not have rights to access the data.
      • Integrity is about measures to ensure that data is not corrupted or destroyed.
      • Availability is about measures to ensure that data can be accessed by those that have a right to access it, when they need to access it.
    • Information security measures are both procedural and technical.
    • The information security measures that need to be established should be defined at the planning stage (see above), as part of a risk assessment such as the GDPR Data Protection Impact Assessment. The assessment should identify information security risks and define measures to mitigate those risks.
    • Contact the IT or Information security office at your institution to get guidance and support to address these issues.
    • ISO/IEC 27001 is an international information security standard adopted by data centres of some universities and research institutes.
  • Check whether there are local/national tools and platforms suited to handle human data.
    • Local research infrastructures have established compute and/or storage solutions with strong information security measures tailored for working on human data. The RDMkit national resources page lists the sensitive data support facilities available in various countries. Contact your institute or your ELIXIR node for guidance.
    • There are also emerging alternative approaches to analyse sensitive data, such as doing “distributed” computation, where defined analysis workflows are used to do analysis on datasets that do not leave the place where they are stored.
  • Take data quality into account. When processing human data, data quality is a very important aspect to consider because it can influence the results of the research. Especially in the healthcare sector, some of the data that is used for research was not collected for research purposes, and therefore it is not guaranteed to have sufficient quality. Check the RDMkit Data Quality page to learn more about how to assess the quality of health data.
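
The "distributed" computation approach mentioned above can be illustrated with a minimal sketch: each site computes an aggregate locally, and only that aggregate is sent to a coordinator. The class and function names are hypothetical and the values are made up. Real federated analysis platforms add authentication, approved workflows and disclosure controls (for example, minimum group sizes) that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate that leaves a site; it contains no individual-level records."""
    n: int
    total: float


def local_summary(values):
    """Runs inside each site, next to the data."""
    return SiteSummary(n=len(values), total=float(sum(values)))


def combine(summaries):
    """Runs at the coordinator: pooled mean computed from per-site aggregates only."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no records at any site")
    return sum(s.total for s in summaries) / n


# Made-up measurements held at two different sites.
site_a = [5.1, 4.9, 5.4]
site_b = [6.0, 5.8]

pooled_mean = combine([local_summary(site_a), local_summary(site_b)])
print(round(pooled_mean, 2))
```

Note that aggregates over very small groups can still reveal information about individuals, so this pattern reduces rather than removes the need for a risk assessment.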

Solutions

  • EUPID is a tool that allows researchers to generate unique pseudonyms for patients that participate in rare disease studies.
  • RD-Connect Genome Phenome Analysis Platform is a platform to improve the study and analysis of Rare Diseases.
  • DisGeNET is a platform containing collections of genes and variants associated to human diseases.
  • PMut is a platform for the study of the impact of pathological mutations in protein structures.
  • IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes.
  • BoostDM is a method to score all possible point mutations in cancer genes for their potential to be involved in tumorigenesis.
  • Cancer Genome Interpreter is designed to identify tumor alterations that drive the disease and detect those that may be therapeutically actionable.
  • The GA4GH Data Security Toolkit and the GA4GH Genomic Data Toolkit provide policies and standards for the secure transfer and processing of human genomics data. GA4GH standards are often implemented in multiple tools. For example, the Crypt4GH data encryption standard is implemented in SAMtools and is also provided by the EGA as a standalone utility, Crypt4GH.
  • GA4GH’s Cloud Workstream is a more recent initiative and focuses on keeping data in secure cloud environments and meanwhile bringing computational analysis to the data.
  • The ERPA is a Web-based tool allowing users to create and manage a register of personal data processing activities (ROPA).
  • OTP is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata.
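
As a rough illustration of what generating pseudonyms involves, the sketch below derives a stable pseudonym from a patient identifier with a keyed hash (HMAC-SHA-256). This is a generic technique chosen for the example and is not the method used by EUPID or any other tool listed above; the function name, key and identifiers are hypothetical. Pseudonymised data generally remains personal data under the GDPR, because whoever holds the key can re-link it.

```python
import hashlib
import hmac


def pseudonym(patient_id: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym; the same id and key always give the same result."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical key; in practice it would be generated randomly and stored
# separately from the research data, under restricted access.
project_key = b"replace-with-a-randomly-generated-secret"

p1 = pseudonym("patient-0001", project_key)
p2 = pseudonym("patient-0001", project_key)
p3 = pseudonym("patient-0002", project_key)
print(p1 == p2, p1 == p3)
```

Using a key, rather than a plain hash of the identifier, is a deliberate choice: identifiers are often short and guessable, so an unkeyed hash could be reversed by simply hashing all candidate identifiers.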

Preserving human data

Description

It is a good ethical practice to ensure that data underlying research is preserved, preferably in a way that adheres to the FAIR principles. There might also exist legal obligations to preserve the data. With human data, you have to take extra precautions into account when doing this.

Considerations

  • Depositing data in an international repository
    • To make the data as accessible as possible according to the FAIR principles, deposit the data in an international repository under controlled access whenever possible; see the section Sharing and reusing of human data below.
  • Legal obligations for preserving research data
    • In some countries there are legal obligations to preserve research data long-term, e.g. for ten years.
    • Even if the data has been deposited in an international repository, this might not live up to the requirements of the law.
    • The legal responsibility for preserving the data would in most cases lie with the research institution where you perform your research. You should consult the Research Data and/or IT support functions of your institution.
  • Information security
    • The solutions you use need to provide information security measures that are appropriate for storing personal data, see the section Processing and Analysing human data above. Note that the providers of the solutions must be made aware that there are probably extra information security measures needed for long-term storage of this type of data.
  • Regardless of where your data is preserved long-term, do ensure that it is associated with proper metadata according to community standards, to promote FAIR sharing of the data.
  • Planning for long-term storage
    • Do address these issues of long-term preservation and data publication as early as possible, preferably already at the planning stage. If you are relying on your research institution to provide a solution, it might need time to plan for this.
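
One common way to monitor the integrity of files in long-term storage is to record a checksum at deposit time and recompute it periodically. The sketch below shows this with SHA-256 from the Python standard library; the file name and content are placeholders, and the helper names are assumptions for this example. It detects accidental corruption, and does not by itself protect against deliberate tampering by someone who can also alter the recorded checksum.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Compare the current checksum with the one recorded at deposit time."""
    return sha256_of(path) == recorded_checksum


with tempfile.TemporaryDirectory() as workdir:
    archived = Path(workdir) / "example_dataset.bin"  # placeholder file
    archived.write_bytes(b"example content, not real data")
    recorded = sha256_of(archived)  # stored alongside the metadata

    ok_before = is_unchanged(archived, recorded)
    archived.write_bytes(b"example content, silently altered")
    ok_after = is_unchanged(archived, recorded)

print(ok_before, ok_after)
```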

Solutions

  • GA4GH Data Security Toolkit
  • ISO/IEC 27001 is an international information security standard adopted by data centres of some universities and research institutes.
  • The European Genome-phenome Archive (EGA) is an international service for secure archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical studies and healthcare centres. All services are free of charge. The EGA stores the data and metadata long-term, with no end date for the service. The data is backed up in two separate geographical locations. Storage is GDPR-compliant thanks to the use of the GA4GH encryption standard, and is continuously kept up to date. National repositories working as Federated EGA nodes are available in some countries, such as Sweden, Norway, Finland, Germany and Spain. These may address specific additional national legal needs that are not covered by European regulation.
  • DPIA Knowledge Model is a DSW knowledge model guiding users through a set of questions to collect information necessary for a research project Data Protection Impact Assessment (DPIA).

Sharing and reusing of human data

Description

To make human data reusable for others, it must be discoverable, stored in a safe way, and it must be clear under what circumstances it can be reused.

Considerations

  • Selecting suitable access modes for sharing human data:
    • Human data often carries restrictions to its use and it would need to be shared in a manner that obeys such restrictions. There are three access modes for sharing research data:
      • Open access: Data is shared publicly. Open-access is a rarely used access mode for the sharing of human data. To use open-access researchers need to ensure that the shared data cannot be traced back to individual study participants. In other words the data needs to be anonymised, which is difficult in practice.
      • Registered access: Data is shared with researchers whose “researcher” status has been vouched for by their institution and who agree to abide by the data usage policies of the repositories that serve the shared data. Datasets that are shared via registered access would typically have no restrictions besides the condition that the data is to be used for research.
      • Controlled access: Data can only be shared with researchers whose research is reviewed and approved by a Data Access Committee (DAC). Typically, researchers who are or were involved in the primary collection of the data form the DAC. Use conditions for controlled access can be numerous and include allowed research topics, allowed geographical regions, and allowed recipients, e.g. non-profit organisations.
  • Publishing human data:
    • It is highly recommended that human data is shared under controlled access. There are emerging models of sharing data through repositories under federated models.
  • Transferring human data:
    • Transferring human data has to be done in a secure way in order to avoid breaches of privacy. Encrypting human data while it is being transferred protects it if the transfer is intercepted by an external party.
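
The three access modes described above can be summarised as a small decision helper. This is a deliberately simplified sketch: the function name and its two parameters are assumptions for this example, and a real decision also depends on consent wording, legal advice and repository policy.

```python
from enum import Enum


class AccessMode(Enum):
    OPEN = "open access"
    REGISTERED = "registered access"
    CONTROLLED = "controlled access"


def suggest_access_mode(anonymised: bool, restricted_beyond_research_use: bool) -> AccessMode:
    """Map two coarse properties of a dataset to one of the three access modes."""
    if restricted_beyond_research_use:
        # Use conditions (topics, regions, recipients) need review by a DAC.
        return AccessMode.CONTROLLED
    if anonymised:
        # Only data that cannot be traced back to participants can be public.
        return AccessMode.OPEN
    # The only condition is research use by vouched-for researchers.
    return AccessMode.REGISTERED


print(suggest_access_mode(anonymised=False, restricted_beyond_research_use=True).value)
```

In line with the recommendation above, a cautious default for human data is controlled access, so treating a dataset as restricted when in doubt is the safer choice.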

Solutions

  • The European Genome-phenome Archive (EGA) is an international service for secure archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical studies and healthcare centres. All services are free of charge. The EGA platform offers secure data sharing that complies with European law. Data treatment is FAIR-compliant: data is discoverable on the EGA website and shareable with other researchers through authorisation and authentication protocols. The right to allow access to any dataset belongs to the data controllers (and not to the EGA), who are responsible for signing a Data Access Agreement (DAA) with researchers requesting access to their data. Templates of the legal documents are provided. The EGA hosts data from all around the world and distributes it where and when the data controllers permit.
  • dbGaP and JGA are other international data repositories, based in the USA and Japan respectively, that adopt a controlled-access model based on their national regulations. Due to specific requirements of the European GDPR, it may not be possible to deposit EU subjects’ data in these repositories.
  • The Beacon project is a GA4GH initiative that enables genomic and clinical data sharing across federated networks. A Beacon is defined as a web-accessible service that can be queried for information about a specific allele with no reference to a specific sample or patient, thereby reducing privacy risks.
  • The Data Use Ontology (DUO) is an international standard, which provides codes to represent data use restrictions for controlled access datasets.
  • Crypt4GH is a Python tool to encrypt, decrypt or re-encrypt files, according to the GA4GH encryption file format.
  • HumanMine is an integrative database of Homo sapiens genomic data, that integrates many types of human data and provides a powerful query engine, export for results, analysis for lists of data and FAIR access via web services.
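
The idea behind a Beacon, as described above, is that a query about a specific allele returns only whether it has been observed, with no reference to samples or patients. The toy sketch below illustrates that principle with an in-memory set. It is not the Beacon API or protocol; the class names and the allele coordinates are made up for illustration.

```python
from typing import NamedTuple


class Allele(NamedTuple):
    chromosome: str
    position: int
    reference: str
    alternate: str


class ToyBeacon:
    """Answers presence/absence queries without exposing sample-level data."""

    def __init__(self, observed_alleles):
        # Only the set of observed alleles is kept; sample identifiers are never stored.
        self._observed = frozenset(observed_alleles)

    def exists(self, query: Allele) -> bool:
        return query in self._observed


# Made-up coordinates, for illustration only.
beacon = ToyBeacon([Allele("1", 12345, "A", "G"), Allele("7", 67890, "C", "T")])

found = beacon.exists(Allele("1", 12345, "A", "G"))
missing = beacon.exists(Allele("1", 12345, "A", "T"))
print(found, missing)
```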

Related pages

More information

With Data Stewardship Wizard (DSW), you can create, plan, collaborate, and bring your data management plans to life with a tool trusted by thousands of people worldwide — from data management pioneers, to international research institutes.

Tool or resource | Description | Related pages | Registry
BBMRI-ERIC's ELSI Knowledge Base | The ELSI Knowledge Base is an open-access resource platform that aims at providing practical know-how for responsible research. | Ethical aspects, GDPR compliance |
Beacon | The Beacon protocol defines an open standard for genomics data discovery. | | Tool info, Standards/Databases, Training
BoostDM | BoostDM is a method to score all possible point mutations (single base substitutions) in cancer genes for their potential to be involved in tumorigenesis. | | Tool info
Cancer Genome Interpreter | Cancer Genome Interpreter (CGI) is designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. | | Tool info
Consent Clauses for Genomic Research | A resource for researchers when drafting consent forms so they can use language matching cutting-edge GA4GH international standards. | |
Crypt4GH | A Python tool to encrypt, decrypt or re-encrypt files, according to the GA4GH encryption file format. | | Training
DAISY | Data Information System to keep sensitive data inventory and meet GDPR accountability requirement. | TransMed, GDPR compliance | Tool info, Training
Data Agreement Wizard (DAWID) | The Data Agreement Wizard is a tool developed by ELIXIR-Luxembourg to facilitate data sharing agreements. | GDPR compliance |
Data Use Ontology (DUO) | DUO allows datasets to be semantically tagged with restrictions on their usage. | Ethical aspects | Standards/Databases, Training
dbGaP | The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans. | | Tool info, Standards/Databases, Training
DisGeNET | A discovery platform containing collections of genes and variants associated to human diseases. | Toxicology data | Tool info, Standards/Databases, Training
DPIA Knowledge Model | A DSW knowledge model guiding users through a set of questions to collect information necessary for a research project Data Protection Impact Assessment (DPIA). | GDPR compliance |
ERPA | Web-based tool allowing users to create and manage a register of personal data processing activities (ROPA). | GDPR compliance |
EU General Data Protection Regulation | Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). | TSD, Data sensitivity |
EUPID | EUPID provides a method for identity management, pseudonymisation and record linkage to bridge the gap between multiple contexts. | |
GA4GH Data Security Toolkit | Principled and practical framework for the responsible sharing of genomic and health-related data. | |
GA4GH Genomic Data Toolkit | Open standards for genomic data sharing. | |
GA4GH Regulatory and Ethics toolkit | Framework for Responsible Sharing of Genomic and Health-Related Data. | Ethical aspects |
HumanMine | HumanMine integrates many types of human data and provides a powerful query engine, export for results, analysis for lists of data and FAIR access via web services. | | Tool info, Standards/Databases, Training
Informed Consent Ontology (ICO) | The Informed Consent Ontology (ICO) is an ontology for the informed consent and informed consent process in the medical field. | Ethical aspects | Standards/Databases
International Compilation of Human Research Standards | The International Compilation of Human Research Standards enumerates over 1,000 laws, regulations, and guidelines (collectively referred to as standards) that govern human subject protections in 133 countries, as well as standards from a number of international and regional organizations. | |
IntoGen | IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes. | | Tool info
ISO/IEC 27001 | International information security standard. | Compliance monitoring ... |
JGA | The Japanese Genotype-phenotype Archive (JGA) is a service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. | |
MONARC | A risk assessment tool that can be used to do Data Protection Impact Assessments. | TransMed |
OTP | One Touch Pipeline (OTP) is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata. | | Tool info
PMut | Platform for the study of the impact of pathological mutations in protein structures. | | Tool info
Privacy Impact Assessment Tool (PIA) | The open source PIA software helps to carry out data protection impact assessment. | | Tool info
RD-Connect Genome Phenome Analysis Platform | The RD-Connect GPAP is an online tool for diagnosis and gene discovery in rare disease research. | | Tool info, Training
The European Genome-phenome Archive (EGA) | EGA is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. | CSC, TSD, Data publication | Tool info, Standards/Databases, Training
Tryggve ELSI Checklist | A list of Ethical, Legal, and Societal Implications (ELSI) to consider for research projects on human subjects. | NeLS, TSD, GDPR compliance |

Tools and resources tailored to users in different countries.

Tool or resource | Description | Related pages | Registry
BioMedIT | A secure IT network for the responsible processing of health-related data. | Data analysis, Data sensitivity |
Federated EGA Finland | FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR). | The European Genome-phenome Archive (EGA), CSC, Researcher, Data Steward, Data sensitivity, Data publication, Existing data |
Findata | The Health and Social Data Permit Authority. Findata offers services and enables secure and efficient utilisation of data materials containing health and social data. | CSC, Researcher, Data Steward, Data sensitivity, Existing data |
Fingenious | Finnish Biobank Cooperative (FINBB) connects researchers to Finnish biomedical research. Via Fingenious® services the researcher can connect to all Finnish public biobanks. | CSC, Researcher, Data Steward, Data sensitivity |
Sensitive Data Services for Research | CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web-user interfaces accessible from the user’s own computer. | CSC, Researcher, Data Steward, Data sensitivity, Data analysis, Data storage, Data publication |
23 Things for Research Data Management tool | Shared reference tool for knowledge on data management. | Data management plan, Compliance monitoring ... |
BBMRI catalogue | Biobanking Netherlands makes biosamples, images and data findable, accessible and usable for health research. | Researcher, Data analysis, Existing data, Data storage |
cBioPortal for Cancer Genomics | cBioPortal provides a web-based resource for researchers to explore, visualize, analyze, and share multidimensional cancer genomic datasets, as well as other studies involving multidimensional genomic data. | Researcher, Data analysis, Existing data, Data storage |
CBS, Statistics Netherlands | The national statistical office, Statistics Netherlands (CBS), provides reliable statistical information and data in the life sciences and health domain. | Researcher, Existing data |
Dutch COVID-19 Data Support Programme | To support investigators and health care professionals with tools and services in their search for ways to overcome the pandemic and its health consequences. | Researcher, Existing data |
Handbook for Adequate Natural Data Stewardship | Guidelines on data stewardship and practical toolbox for researchers at Dutch University Medical Centres (UMCs). | Researcher, Data management plan, Compliance monitoring ... |
Health-RI Service Catalogue | Health-RI provides a set of tools and services available to the biomedical research community. | Researcher, Data analysis, Existing data, Data storage |
RIVM Health and Healthcare Data | The Dutch National Institute for Public Health and the Environment (RIVM), together with other organisations, provides numbers and explanation on relevant topics, to prevent duplication of data collection. | Researcher, Existing data |
Technology Hotels | More than 130 Technology Hotels offer access to high-end technology and expertise in the field of bioimaging, bioinformatics, genomics, medical imaging, metabolomics, phenotyping, proteomics, structural biology, and/or systems biology. | Bioimaging data, Proteomics, Researcher, Compliance monitoring ... |
Federated EGA Norway node | Federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD. | The European Genome-phenome Archive (EGA), Data sensitivity, Existing data, Data publication, TSD |
HUNTCloud | The HUNT Cloud, established in 2013, aims to improve and develop the collection, accessibility and exploration of large-scale information. HUNT Cloud offers cloud services and lab management. It is a key service that has established a framework for data protection, data security, and data management. HUNT Cloud is owned by NTNU and operated by HUNT Research Centre at the Department of Public Health and Nursing at the Faculty of Medicine and Health Sciences. | Data analysis, Data sensitivity, Data storage |
Norwegian COVID-19 Data Portal | The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers. | Data sensitivity, Existing data, Data publication |
RETTE | System for Risk and compliance. Processing of personal data in research and student projects at UiB. | Data security, GDPR compliance, Data sensitivity, Policy maker, Data Steward |
SAFE | SAFE (secure access to research data and e-infrastructure) is the solution for the secure processing of sensitive personal data in research at the University of Bergen. SAFE is based on the “Norwegian Code of conduct for information security in the health and care sector” (Normen) and ensures confidentiality, integrity, and availability are preserved when processing sensitive personal data. Through SAFE, the IT department offers a service where employees, students and external partners get access to dedicated resources for processing of sensitive personal data. | Data analysis, Data sensitivity, Data storage |
TSD | The TSD – Service for Sensitive Data, is a platform for collecting, storing, analysing and sharing sensitive data in compliance with the Norwegian privacy regulation. TSD is developed and operated by UiO. | Data analysis, Data sensitivity, Data storage, TSD |
Federated EGA Sweden node | Secure archiving and sharing of genetic and phenotypic data resulting from Swedish biomedical research projects. | The European Genome-phenome Archive (EGA), Data sensitivity, Existing data, Data publication |
Human Data Guidelines | Guidelines as well as further information on legal considerations when working with human biomedical data. | Data sensitivity |
Swedish Pathogens Portal | The Swedish Pathogens Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing. | COVID-19 Data Portal, Data sensitivity, Existing data, Data publication |