Introduction

When you do research on data derived from human individuals, there are some extra aspects that you need to consider during the data life cycle. Many of the topics discussed on this page refer to the EU General Data Protection Regulation (GDPR), as it is a central piece of legislation that affects basically all research done on human subjects in the EU and on individuals residing in the EU. Much of the information on this page is of a general nature when it comes to working with human data. An additional focus is on human genomic data and the sharing of such information for research purposes.

Planning for, and collection of, human research data

Description

To do research on human data you must follow established research ethical guidelines and legislation. Planning for these aspects should preferably be done before you start handling the personal data, and in some cases the law even demands it, as with the GDPR.

Considerations

  • Have you got an ethical permit for your research project?
    • To get an ethical permit you have to apply for an ethical review by an ethical review board.
      • The legislation that governs this differs between countries. Do seek advice from your research institute.
    • In most cases you should get informed consents from your research subjects.
      • An informed consent is an agreement from the research subject to participate in and share personal data for a particular purpose. It shall describe the purpose and any risks involved (along with any mitigations to minimize those risks) in such a way that the research subject can make an informed choice about participating. It should also state under what circumstances the data can be used for the initial purpose, as well as for later re-use by others.
        • Consider describing data use conditions using a machine-readable formalized description such as DUO. This will greatly improve the possibilities to make the data FAIR later on.
      • Informed consent should be acquired for several reasons:
        • It is a cornerstone of research ethics. Even if there is no legal obligation to acquire informed consent, it is bad research ethics not to do so, and it harms trust in research.
        • Legislation on ethical permission to perform research on human subjects demands informed consent in many cases.
        • Personal data protection legislation might have informed consent as one legal basis for processing the personal data.
        • Note that the content of an informed consent, as defined by one piece of legislation, might not live up to the demands of another piece of legislation. For example, an informed consent that is good enough for an ethical permit, might not be good enough for the demands of the GDPR.
    • The Global Alliance for Genomics and Health (GA4GH) has recommendations for these issues in their GA4GH regulatory and ethical toolkit.
  • Personal data protection legislation
    • If you are performing research in the EU on human research subjects, or research on human subjects who are in the EU, you must adhere to the General Data Protection Regulation (GDPR).
      • See Data protection for more information on this law.
      • The sensitivity of your data affects what considerations you have to make when handling it, see Determining the sensitivity of your data for more information.
      • For some sensitive data you have to perform a Data Protection Impact Assessment (DPIA). In general, any biomedical research on human subjects will need to do this.
    • Outside EU
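As a rough illustration of the machine-readable data use conditions mentioned in the considerations above, the following sketch records the permitted use from a consent as a DUO term and checks a request against it. The class `ConsentRecord`, the function `permits`, and the matching rules are hypothetical simplifications made for this example; real access decisions rest with the responsible committee, and the DUO identifiers should be verified against the current DUO release.

```python
from dataclasses import dataclass
from typing import Optional

# DUO permission terms used in this sketch (verify against the current DUO release).
GRU = "DUO:0000042"  # general research use
HMB = "DUO:0000006"  # health or medical or biomedical research
DS = "DUO:0000007"   # disease specific research


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of what the informed consent allows for a dataset."""
    dataset_id: str
    permission: str                 # one of GRU, HMB, DS
    disease: Optional[str] = None   # only meaningful when permission is DS


def permits(record: ConsentRecord, request_code: str,
            request_disease: Optional[str] = None) -> bool:
    """Simplified check: is the requested use within the consented use?"""
    if record.permission == GRU:
        # General research use covers the narrower research categories too.
        return request_code in (GRU, HMB, DS)
    if record.permission == HMB:
        # Biomedical consent covers biomedical and disease-specific research.
        return request_code in (HMB, DS)
    if record.permission == DS:
        # Disease-specific consent covers research on that disease only.
        return (request_code == DS
                and record.disease is not None
                and request_disease == record.disease)
    return False


asthma_cohort = ConsentRecord("dataset-1", DS, disease="asthma")
print(permits(asthma_cohort, DS, "asthma"))    # request matches the consented disease
print(permits(asthma_cohort, DS, "diabetes"))  # request is outside the consent
```

Recording the conditions as codes at consent time means the same record can later be attached to the dataset's metadata when it is deposited.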

Solutions

Processing and analysing human research data

Description

For human data it is very important to use technical and procedural measures to ensure that the information is kept secure. There might exist legal obligations to document and implement measures to ensure an adequate level of security.

Considerations

  • Establish adequate Information security measures. This should be done for all types of research data, but is even more important for human data.
    • Information security is usually described as containing three main aspects: Confidentiality, Integrity, and Accessibility (often called Availability).
      • Confidentiality is about measures to ensure that data is kept confidential from those that do not have rights to access the data.
      • Integrity is about measures to ensure that data is not corrupted or destroyed.
      • Accessibility is about measures to ensure that data can be accessed by those that have a right to access it, when they need to access it.
    • Information security measures are both procedural and technical.
    • Which information security measures need to be established should be defined at the planning stage (see above) as part of a risk assessment, e.g. a GDPR Data Protection Impact Assessment. This should identify information security risks and define measures to mitigate those risks.
    • Contact the IT or Information security office at your institution to get guidance and support to address these issues.
    • ISO/IEC 27001 is an international information security standard adopted by data centres of some universities and research institutes.
  • Locating tools and platforms suited to handle human data
    • Local research infrastructures might have established compute and/or storage solutions with strong information security measures tailored for working on human data (see some examples below). Contact your institute or your ELIXIR node for guidance.
    • There are also emerging alternative approaches to analyse sensitive data, such as doing “distributed” computation, where defined analysis workflows are used to do analysis on datasets that do not leave the place where they are stored.
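One common technical measure for the integrity aspect described above is to record a checksum for each data file and verify it later. The following is a minimal sketch using SHA-256 from the Python standard library; the function names and the example file name are assumptions made for illustration, and a checksum only detects changes, it does not prevent them.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, expected_digest: str) -> bool:
    """True if the file's current digest matches the recorded one."""
    return sha256_of_file(path) == expected_digest


if __name__ == "__main__":
    sample = Path("example_data.bin")  # hypothetical file name
    sample.write_bytes(b"example content")
    recorded = sha256_of_file(sample)      # store this alongside the data
    print(is_unchanged(sample, recorded))  # verify before each analysis
    sample.unlink()
```

The recorded digest should be kept separately from the file it describes, so that accidental corruption of one does not silently affect the other.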

Solutions

  • EUPID is a tool that allows researchers to generate unique pseudonyms for patients that participate in rare disease studies.
  • RD-Connect Genome Phenome Analysis Platform is a platform to improve the study and analysis of Rare Diseases.
  • DisGeNET is a platform containing collections of genes and variants associated with human diseases.
  • PMut is a platform for the study of the impact of pathological mutations in protein structures.
  • IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes.
  • BoostDM is a method to score all possible point mutations in cancer genes for their potential to be involved in tumorigenesis.
  • Cancer Genome Interpreter is designed to identify tumor alterations that drive the disease and detect those that may be therapeutically actionable.
  • GA4GH data security toolkit
  • GA4GH Cloud workstream
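Pseudonymisation, as offered by tools like EUPID, replaces direct identifiers with stable codes. The sketch below shows one generic approach, a keyed hash (HMAC-SHA256) from the Python standard library. It is not EUPID's actual method, and the function names, the example identifier, and the 16-character truncation are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def new_secret_key() -> bytes:
    """Generate a random 256-bit key; store it separately from the data."""
    return secrets.token_bytes(32)


def pseudonymise(patient_id: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    mac = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest to keep pseudonyms short (illustrative choice).
    return mac.hexdigest()[:length]


if __name__ == "__main__":
    key = new_secret_key()
    first = pseudonymise("patient-0001", key)   # hypothetical identifier
    second = pseudonymise("patient-0001", key)
    print(first == second)  # same identifier and key give the same pseudonym
```

Because the same identifier and key always give the same pseudonym, records can be linked across datasets without exposing the identifier. Anyone holding the key can recompute the mapping, so the key needs the same protection as the identifiers themselves.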

Preserving human research data

Description

It is good research ethical practice to ensure that data underlying research is preserved, preferably in a way that adheres to the FAIR principles. There might also exist legal obligations to preserve the data. With human data you have to take extra precautions when doing this.

Considerations

  • Depositing data in an international repository
    • To make the data as accessible as possible according to the FAIR principles, deposit the data in an international repository under controlled access whenever possible; see the section Sharing & reusing of human research data below.
  • Legal obligations for preserving research data
    • In some countries there are legal obligations to preserve research data long-term, e.g. for ten years.
    • Even if the data has been deposited in an international repository, this might not live up to the requirements of the law.
    • The legal responsibility for preserving the data would in most cases lie with the research institution where you perform your research. You should consult the Research Data and/or IT support functions of your institution.
  • Information security
    • The solutions you use need to provide information security measures that are appropriate for storing personal data, see the section Processing and Analysing human research data above. Note that the providers of the solutions must be made aware that there are probably extra information security measures needed for long-term storage of this type of data.
  • Regardless of where your data is preserved long-term, do ensure that it is associated with proper metadata according to community standards, to promote FAIR sharing of the data.
  • Planning for long-term storage
    • Do address these issues of long-term preservation and data publication as early as possible, preferably already at the planning stage. If you are relying on your research institution to provide a solution, it might need time to plan for this.

Solutions

Sharing & reusing of human research data

Description

To make human research data reusable for others, it must be discoverable, stored in a safe way, and it must be clear under what circumstances it can be reused.

Considerations

  • Selecting suitable access modes for sharing human data
    • Human data often carries restrictions on its use, and it must be shared in a manner that respects such restrictions. There are three access modes for sharing research data:
      • Open access: Data is shared publicly. Open access is rarely used for sharing human data. To use open access, researchers need to ensure that the shared data cannot be traced back to individual study participants. In other words, the data needs to be anonymised, which is difficult in practice.
      • Registered access: Data is shared with researchers whose “researcher” status has been vouched for by their institution and who agree to abide by the data usage policies of the repositories that serve the shared data. Datasets shared via registered access typically have no restrictions besides the condition that the data is to be used for research.
      • Controlled access: Data can only be shared with researchers whose research is reviewed and approved by a data access committee (DAC). Typically, researchers who were involved in the primary collection of the data form the DAC. Use conditions for controlled access can be manifold and include allowed research topics, allowed geographical regions, and allowed recipients, e.g. non-profit organisations.
  • Publishing Human Research Data
    • It is highly recommended that Human Research Data is shared under controlled access. There are emerging models of sharing data through repositories under federated models.
    • The European Genome-phenome Archive (EGA) is the prime repository for human genomic and phenotypic data. The EGA applies a controlled access model.

Solutions

  • (Federated) EGA is a data repository that adopts the controlled-access model for serving human data.
    • It will be a distributed network of repositories for sharing human -omics data and phenotypes, that gathers metadata of -omics data collections stored in national or regional archives and makes them discoverable across the EGA network. The federated EGA will consist of a Central EGA (currently EMBL-EBI and CRG), which offers submission, long-term data archiving and data distribution to the international scientific community, and the federated EGA nodes. The primary motivation for establishing a federated EGA node is to enable the discovery and access of data that for consent or other reasons is required to be archived within the relevant jurisdiction. Publicly shareable metadata about studies/datasets archived at the federated EGA nodes will be shared with Central EGA to enable discovery. These nodes will offer the same APIs and interfaces as the Central EGA, and will provide independent data distribution to users.
  • dbGaP and JGA are international data repositories that adopt a controlled-access model similar to that of the EGA. While data from these repositories can be used by researchers in the EU, it may not be possible, due to GDPR restrictions, to deposit EU subjects’ data in these international repositories.
  • Beacon
    • The Beacon Project is a Global Alliance for Genomics & Health (GA4GH) initiative that enables genomic and clinical data sharing across federated networks. A Beacon is defined as a web-accessible service that can be queried for information about a specific allele with no reference to a specific sample or patient, thereby reducing privacy risks.
  • GA4GH Data Use Ontology (DUO) is an international standard that provides codes to represent data use restrictions for controlled-access datasets.
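The Beacon idea described in the list above, a service that answers whether a specific allele has been observed without referring to any sample or patient, can be illustrated with a toy in-memory version. This is a sketch only: `ToyBeacon` and `Allele` are hypothetical names, the field names are loosely modelled on allele query parameters, the example coordinates are made up, and a real Beacon is a web service with a richer request and response format.

```python
from typing import Iterable, NamedTuple


class Allele(NamedTuple):
    """A specific allele: chromosome, position, reference and alternate bases."""
    reference_name: str
    start: int
    reference_bases: str
    alternate_bases: str


class ToyBeacon:
    """Answers only whether an allele has been observed in the collection."""

    def __init__(self, observed: Iterable[Allele]) -> None:
        # Only the set of observed alleles is kept; no sample or patient link.
        self._observed = frozenset(observed)

    def query(self, allele: Allele) -> dict:
        # The response carries a yes/no answer and nothing about individuals.
        return {"exists": allele in self._observed}


beacon = ToyBeacon([Allele("17", 1000, "G", "A")])  # made-up coordinates
print(beacon.query(Allele("17", 1000, "G", "A")))
print(beacon.query(Allele("17", 1000, "G", "T")))
```

Restricting the answer to a single boolean is what keeps the privacy risk lower than returning genotypes or sample-level records.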

Relevant tools and resources

| Tool or resource | Description | Tags | Registry |
| BBMRI-ERIC's ELSI Knowledge Base | The ELSI Knowledge Base is an open-access resource platform that aims at providing practical know-how for responsible research. | data protection, sensitive, policy officer, data manager, human data | |
| Beacon | The Beacon protocol defines an open standard for genomics data discovery. | researcher, data manager, IT support, human data | |
| BIONDA | BIONDA is a free and open-access biomarker database, which employs various text mining methods to extract structured information on biomarkers from abstracts of scientific publications. | storage, researcher, human data | |
| BoostDM | BoostDM is a method to score all possible point mutations (single base substitutions) in cancer genes for their potential to be involved in tumorigenesis. | data analysis, human data | |
| Cancer Genome Interpreter | Cancer Genome Interpreter (CGI) is designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. | data analysis, human data | |
| Consent Clauses for Genomic Research | A resource for researchers when drafting consent forms so they can use language matching cutting-edge GA4GH international standards. | human data | |
| DAISY | Data Information System to keep sensitive data inventory and meet GDPR accountability requirement. | IT support, policy officer, human data, data protection | |
| Data Use Ontology | DUO allows datasets to be semantically tagged with restrictions on their usage. | data manager, researcher, human data | |
| DAWID | The Data Agreement Wizard is a tool developed by ELIXIR-Luxembourg to facilitate data sharing agreements. | data protection, policy officer, human data | |
| dbGaP | The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans. | data publication, researcher, IT support, human data | |
| DisGeNET | A discovery platform containing collections of genes and variants associated with human diseases. | data analysis, human data, researcher | |
| EU General Data Protection Regulation | Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). | data protection, policy officer, human data | |
| EUPID | EUPID provides a method for identity management, pseudonymisation and record linkage to bridge the gap between multiple contexts. | IT support, policy officer, human data | |
| GA4GH data security toolkit | Principled and practical framework for the responsible sharing of genomic and health-related data. | data publication, policy officer, data manager, IT support, human data | |
| GA4GH regulatory and ethical toolkit | Framework for Responsible Sharing of Genomic and Health-Related Data. | data protection, sensitive, policy officer, data manager, IT support, human data | |
| HumanMine | HumanMine integrates many types of human data and provides a powerful query engine, export for results, analysis for lists of data and FAIR access via web services. | data organisation, data manager, researcher, human data, data analysis | |
| Informed Consent Ontology | The Informed Consent Ontology (ICO) is an ontology for the informed consent and informed consent process in the medical field. | IT support, policy officer, human data | |
| International Compilation of Human Research Standards | The International Compilation of Human Research Standards enumerates over 1,000 laws, regulations, and guidelines (collectively referred to as standards) that govern human subject protections in 133 countries, as well as standards from a number of international and regional organizations. | human data | |
| IntoGen | IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes. | data analysis, human data | |
| ISO/IEC 27001 | International information security standard. | data protection, policy officer, human data | |
| MONARC | A risk assessment tool that can be used to do Data Protection Impact Assessments. | data protection, policy officer, human data | |
| OTP | One Touch Pipeline (OTP) is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata. | human data, metadata, DMP, data analysis | |
| PAA | PAA is an R/Bioconductor tool for protein microarray data analysis aimed at biomarker discovery. | data analysis, researcher, human data | |
| PMut | Platform for the study of the impact of pathological mutations in protein structures. | data analysis, human data | |
| Privacy Impact Assessment Tool | Privacy Impact Assessment Tool is software that allows you to carry out a Privacy Impact Assessment (PIA) independently. | data protection, policy officer, human data | |
| RD-Connect Genome Phenome Analysis Platform | The RD-Connect GPAP is an online tool for diagnosis and gene discovery in rare disease research. | researcher, human data | |
| The European Genome-phenome Archive (EGA) | EGA is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. | data publication, human data, policy officer | |
| The Genomic Standards Consortium (GSC) | Minimum Information about any (x) Sequence. | metadata, researcher, IT support, policy officer, human data | |
| Tryggve ELSI Checklist | A list of Ethical, Legal, and Societal Implications (ELSI) to consider for research projects on human subjects. | sensitive, policy officer, data manager, human data | |

Training materials on the management of human-subject data