Introduction
When you do research on data derived from human individuals (hereafter human data), there are additional aspects that must be considered throughout the data life cycle. Many of the topics discussed on this page refer to the General Data Protection Regulation (GDPR), as it is a central piece of legislation that affects essentially all research using human data that takes place in the European Union (EU) or that involves data of individuals residing in the EU. Much of the information on this page applies generally to working with human data; an additional focus is on human genomic data and the sharing of such information for research purposes.
Planning for projects with human data
Description
When working with human data, you must follow established research ethics guidelines and legislation. Planning for these aspects should preferably be done before you start handling personal data; in some cases, such as under the GDPR, this is a legal requirement.
Considerations
- Have you got an ethical permit for your research project?
- To get an ethical permit, you have to apply for an ethical review by an ethical review board.
- The legislation that governs this differs between countries. Do seek advice from your research institute.
- The Global Alliance for Genomics and Health (GA4GH) has recommendations for these issues in their GA4GH regulatory and ethical toolkit, see for instance the Consent Clauses for Genomic Research.
- The acquisition of data must be legal.
- Receiving data/samples directly from data subjects requires informed consent in most cases.
- An informed consent is an agreement from the research subject to participate in and share personal data for a particular purpose. It shall describe the purpose and any risks involved (along with any mitigations to minimise those risks) in such a way that the research subject can make an informed choice about participating. It should also state under what circumstances the data can be used for the initial purpose, as well as for later re-use by others.
- Consider adopting a formalised, machine-readable description of data use conditions. This will greatly improve the possibilities of making the data FAIR later on.
- Informed consent should be obtained for several reasons:
- It is a cornerstone of research ethics. Regardless of legal obligations, asking for informed consent is good research ethics practice and maintains trust in research.
- Legislation on ethical permits for research on human subjects demands informed consent in many cases.
- Personal data protection legislation might have informed consent as one legal basis for processing the personal data.
- Note that the content of an informed consent, as defined by one piece of legislation, might not live up to the demands of another piece of legislation. For example, an informed consent that is good enough for an ethical permit, might not be good enough for the demands of the GDPR.
- Receiving data from a collaborator must be covered by a contract. Ensure that the agreements (Data Use Agreement, Consortium Agreement, Data Sharing Agreement, …) include detailed provisions on data use, retention, re-use and publication. This also applies to samples you receive from a collaborator: the related contract (e.g. a Material Transfer Agreement, MTA) should cover the use of human data generated from these samples. An incomplete legal framework for data use can require lengthy legal amendments and can leave you unable to comply with requirements set by your funder or intended publisher.
- Receiving data from a repository also comes with certain use restrictions. These are either defined in the license attributed to the data or defined in a dataset specific access policy and terms of service of the repository.
- Personal data protection legislation:
- Within the EU. If you are performing human data research in the EU, or your data subjects are located in the EU, then you must adhere to the General Data Protection Regulation - GDPR.
- Requirements for research that fall under the GDPR are outlined in the RDMkit GDPR compliance page.
- Attributes of the data determine its sensitivity, and sensitivity affects the considerations for data handling. The RDMkit Data Sensitivity page provides guidance on determining and reducing data sensitivity.
- Outside the EU. For countries outside the EU, the International Compilation of Human Research Standards lists relevant legislation.
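The machine-readable data use conditions mentioned above can be illustrated with a small sketch. It assumes that the consented conditions are recorded as Data Use Ontology (DUO) term identifiers; the four identifiers below are quoted from memory and should be checked against the current DUO release. The class names, field names and the matching logic are hypothetical and heavily simplified. In practice a Data Access Committee makes the decision, and a structure like this only helps to pre-screen requests.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# A handful of DUO terms (assumed identifiers; verify against the DUO release you use).
GENERAL_RESEARCH_USE = "DUO:0000042"
BIOMEDICAL_RESEARCH = "DUO:0000006"
DISEASE_SPECIFIC = "DUO:0000007"
NON_COMMERCIAL_ONLY = "DUO:0000018"


@dataclass(frozen=True)
class DataUseConditions:
    """Conditions attached to a dataset (hypothetical structure)."""
    permission: str                        # one primary DUO permission term
    modifiers: FrozenSet[str] = frozenset()
    disease: Optional[str] = None          # only used with DISEASE_SPECIFIC


@dataclass(frozen=True)
class AccessRequest:
    """What a requester intends to do (hypothetical structure)."""
    purpose: str                           # "general", "biomedical" or "disease"
    disease: Optional[str] = None
    commercial: bool = False


def is_compatible(conditions: DataUseConditions, request: AccessRequest) -> bool:
    """Return True if the request does not obviously violate the conditions."""
    if NON_COMMERCIAL_ONLY in conditions.modifiers and request.commercial:
        return False
    if conditions.permission == GENERAL_RESEARCH_USE:
        return True
    if conditions.permission == BIOMEDICAL_RESEARCH:
        return request.purpose in {"biomedical", "disease"}
    if conditions.permission == DISEASE_SPECIFIC:
        return request.purpose == "disease" and request.disease == conditions.disease
    # Unknown permission term: refuse by default.
    return False
```

Recording the conditions at consent time in such a coded form means that the same record can later be attached to the dataset when it is deposited, instead of being re-derived from free-text consent forms.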
Solutions
- Tryggve ELSI Checklist is a list of Ethical, Legal, and Societal Implications (ELSI) to consider for research projects on human subjects.
- DAISY is a software tool from ELIXIR for keeping records of data processing activities in research projects.
- Data Agreement Wizard (DAWID) is a software tool from ELIXIR that generates tailor-made data sharing agreements.
- Privacy Impact Assessment Tool (PIA) is a software tool for carrying out Data Protection Impact Assessments.
- MONARC is a risk assessment tool that can be used to carry out Data Protection Impact Assessments.
- Data Use Ontology (DUO)
- Informed Consent Ontology (ICO)
- GA4GH Regulatory and Ethics toolkit
- EU General Data Protection Regulation
- BBMRI-ERIC’s ELSI Knowledge Base contains a glossary, agreement templates and guidance.
Processing and analysing human data
Description
For human data, it is very important to use technical and procedural measures to ensure that the information is kept secure. There might exist legal obligations to document and implement measures to ensure an adequate level of security.
Considerations
- Establish adequate Information security measures. This should be done for all types of research data, but is even more important for human data.
- Information security is usually described as having three main aspects: Confidentiality, Integrity, and Availability.
- Confidentiality is about measures to ensure that data is kept confidential from those who do not have the right to access it.
- Integrity is about measures to ensure that data is not corrupted or destroyed.
- Availability is about measures to ensure that data can be accessed by those who have a right to access it, when they need to access it.
- Information security measures are both procedural and technical.
- Which information security measures need to be established should be defined at the planning stage (see above) as part of a risk assessment, e.g. the GDPR Data Protection Impact Assessment. The assessment should identify information security risks and define measures to mitigate them.
- Contact the IT or Information security office at your institution to get guidance and support to address these issues.
- ISO/IEC 27001 is an international information security standard adopted by data centres of some universities and research institutes.
- Check whether there are local/national tools and platforms suited to handle human data.
- Local research infrastructures have established compute and/or storage solutions with strong information security measures tailored for working on human data. The RDMkit national resources page lists the sensitive data support facilities available in various countries. Contact your institute or your ELIXIR node for guidance.
- There are also emerging alternative approaches to analyse sensitive data, such as doing “distributed” computation, where defined analysis workflows are used to do analysis on datasets that do not leave the place where they are stored.
- The GA4GH is developing standards for this in its GA4GH Cloud workstream.
- Take data quality into account. When processing human data, data quality is a very important aspect to consider because it can influence the results of the research. Especially in the healthcare sector, some of the data that is used for research was not collected for research purposes, and therefore it is not guaranteed to have sufficient quality. Check the RDMkit Data Quality page to learn more about how to assess the quality of health data.
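As one concrete illustration of the integrity aspect described above, the sketch below records a SHA-256 checksum of a data file and later checks that the file has not changed. It uses only the Python standard library; the function names are hypothetical, and a checksum of this kind detects accidental corruption but is not by itself a complete information security measure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Compare the current digest of the file with a previously recorded one."""
    return sha256_of(path) == recorded_digest.lower()
```

A typical use would be to store the digest alongside the dataset when it is first received, and to call `is_unchanged` after each copy or transfer.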
Solutions
- EUPID is a tool that allows researchers to generate unique pseudonyms for patients that participate in rare disease studies.
- RD-Connect Genome Phenome Analysis Platform is a platform to improve the study and analysis of Rare Diseases.
- DisGeNET is a platform containing collections of genes and variants associated to human diseases.
- PMut is a platform for the study of the impact of pathological mutations in protein structures.
- IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes.
- BoostDM is a method to score all possible point mutations in cancer genes for their potential to be involved in tumorigenesis.
- Cancer Genome Interpreter is designed to identify tumor alterations that drive the disease and detect those that may be therapeutically actionable.
- GA4GH’s Data Security Toolkit and Genomic Data Toolkit provide policies and standards for the secure transfer and processing of human genomics data. GA4GH standards are often implemented in multiple tools. For example, the Crypt4GH data encryption standard is implemented in SAMtools and is also provided as a utility, Crypt4GH, by the EGA.
- GA4GH’s Cloud Workstream is a more recent initiative that focuses on keeping data in secure cloud environments while bringing computational analysis to the data.
- ERPA is a web-based tool that allows users to create and manage a register of personal data processing activities (ROPA).
- OTP is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata.
Preserving human data
Description
It is good ethical practice to ensure that the data underlying research is preserved, preferably in a way that adheres to the FAIR principles. There might also be legal obligations to preserve the data. With human data, you have to take extra precautions when doing this.
Considerations
- Depositing data in an international repository
- To make the data as accessible as possible according to the FAIR principles, deposit the data in an international repository under controlled access whenever possible; see the section Sharing and reusing of human data below.
- Legal obligations for preserving research data
- In some countries there are legal obligations to preserve research data long-term, e.g. for ten years.
- Even if the data has been deposited in an international repository, this might not live up to the requirements of the law.
- The legal responsibility for preserving the data would in most cases lie with the research institution where you perform your research. You should consult the Research Data and/or IT support functions of your institution.
- Information security
- The solutions you use need to provide information security measures that are appropriate for storing personal data, see the section Processing and Analysing human data above. Note that the providers of the solutions must be made aware that there are probably extra information security measures needed for long-term storage of this type of data.
- Regardless of where your data is preserved long-term, do ensure that it is associated with proper metadata according to community standards, to promote FAIR sharing of the data.
- Planning for long-term storage
- Do address these issues of long-term preservation and data publication as early as possible, preferably already at the planning stage. If you are relying on your research institution to provide a solution, it might need time to plan for this.
Solutions
- GA4GH Data Security Toolkit
- ISO/IEC 27001 is an international information security standard adopted by data centres of some universities and research institutes.
- The European Genome-phenome Archive (EGA) is an international service for secure archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical studies and healthcare centres. All services are free of charge. The EGA stores the data and metadata long-term, with no end date for the service. The data is backed up in two geographically separate locations. Storage is GDPR-compliant, relying on the GA4GH encryption standard, and is continuously kept up to date. National repositories operating as Federated EGA nodes are available in some countries, such as Sweden, Norway, Finland, Germany and Spain. These may address additional national legal requirements not covered by European regulation.
- DPIA Knowledge Model is a DSW knowledge model guiding users through a set of questions to collect information necessary for a research project Data Protection Impact Assessment (DPIA).
Sharing and reusing of human data
Description
To make human data reusable for others, it must be discoverable, stored in a safe way, and it must be clear under what circumstances it can be reused.
Considerations
- Selecting suitable access modes for sharing human data:
- Human data often carries restrictions on its use and needs to be shared in a manner that respects them. There are three access modes for sharing research data:
- Open access: Data is shared publicly. Open-access is a rarely used access mode for the sharing of human data. To use open-access researchers need to ensure that the shared data cannot be traced back to individual study participants. In other words the data needs to be anonymised, which is difficult in practice.
- Registered access: Data is shared with researchers, whose “researcher” status has been vouched for by their institution and who agree to abide by data usage policies of repositories that serve the shared data. Datasets that are shared via registered-access would typically have no restrictions besides the condition that data is to be used for research.
- Controlled access: Data can only be shared with researchers whose research is reviewed and approved by a Data Access Committee (DAC). Typically, researchers who are or were involved in the primary collection of the data form the DAC. Use conditions for controlled access can be numerous and include allowed research topics, allowed geographical regions, and allowed recipients (e.g. non-profit organisations).
- Publishing human data:
- It is highly recommended that human data is shared under controlled access. There are emerging models of sharing data through repositories under federated models.
- Transferring human data:
- Transferring human data has to be done in a secure way in order to avoid breaches of privacy. Encrypting human data during transfer protects it if it is intercepted by an external party.
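The three access modes described above can be summarised as a small decision function. This is only a sketch of the rules as stated on this page: the class and field names are hypothetical, and real repositories implement these checks through their own authentication and authorisation services.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    OPEN = "open"
    REGISTERED = "registered"
    CONTROLLED = "controlled"


@dataclass(frozen=True)
class Requester:
    # Field names are hypothetical, chosen for this illustration.
    vouched_by_institution: bool = False   # "researcher" status confirmed by the institution
    accepted_usage_policy: bool = False    # agreed to the repository's data usage policies
    approved_by_dac: bool = False          # request approved by the Data Access Committee


def may_access(mode: AccessMode, requester: Requester) -> bool:
    """Apply the access rule that corresponds to the dataset's access mode."""
    if mode is AccessMode.OPEN:
        # Open access: shared publicly, so the data itself must be anonymised.
        return True
    if mode is AccessMode.REGISTERED:
        return requester.vouched_by_institution and requester.accepted_usage_policy
    if mode is AccessMode.CONTROLLED:
        return requester.approved_by_dac
    raise ValueError(f"unknown access mode: {mode!r}")
```

Note that under controlled access the decisive input is the DAC approval, which is made per request and per dataset, whereas registered access depends only on the status of the researcher.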
Solutions
- The European Genome-phenome Archive (EGA) is an international service for secure archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical studies and healthcare centres. All services are free of charge. The EGA platform offers secure data sharing that complies with European law. Data treatment is FAIR-compliant: data is discoverable on the EGA website and shareable with other researchers through authorisation and authentication protocols. The right to grant access to any dataset belongs to the data controllers (and not to the EGA), who are responsible for signing a Data Access Agreement (DAA) with researchers requesting access to their data. Templates of the legal documents are provided. The EGA hosts data from all around the world and distributes it where and when the data controllers permit.
- dbGaP and JGA are other international data repositories, based in the USA and Japan respectively, that adopt a controlled-access model based on their national regulations. Due to specific requirements of the European GDPR, it may not be possible to deposit EU subjects’ data in these repositories.
- The Beacon project is a GA4GH initiative that enables genomic and clinical data sharing across federated networks. A Beacon is defined as a web-accessible service that can be queried for information about a specific allele with no reference to a specific sample or patient, thereby reducing privacy risks.
- The Data Use Ontology (DUO) is an international standard, which provides codes to represent data use restrictions for controlled access datasets.
- Crypt4GH is a Python tool to encrypt, decrypt or re-encrypt files, according to the GA4GH encryption file format.
- HumanMine is an integrative database of Homo sapiens genomic data, that integrates many types of human data and provides a powerful query engine, export for results, analysis for lists of data and FAIR access via web services.
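To make the Beacon idea described above more concrete, the toy sketch below answers only the question of whether a given allele has been observed, without returning any sample or patient information. It is not the GA4GH Beacon API, which is a web service with its own request and response schemas; the class names, field names and example alleles are all made up for illustration.

```python
from typing import Iterable, NamedTuple


class Allele(NamedTuple):
    # Hypothetical field names for this illustration.
    reference_name: str      # chromosome, e.g. "17"
    position: int            # coordinate of the variant
    reference_bases: str
    alternate_bases: str


class ToyBeacon:
    """Holds a set of observed alleles and reveals only presence or absence."""

    def __init__(self, observed: Iterable[Allele]):
        # Only alleles are stored; no link to samples or individuals is kept.
        self._observed = frozenset(observed)

    def exists(self, query: Allele) -> bool:
        return query in self._observed


# Made-up example data, not taken from any real dataset.
beacon = ToyBeacon([
    Allele("17", 1000, "G", "A"),
    Allele("2", 2500, "T", "C"),
])
```

Because the response is limited to presence or absence, a query such as `beacon.exists(Allele("17", 1000, "G", "A"))` discloses far less than sharing the underlying genotypes, although it does not remove privacy risks entirely.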
More information
Links to DSW
With Data Stewardship Wizard (DSW), you can create, plan, collaborate, and bring your data management plans to life with a tool trusted by thousands of people worldwide — from data management pioneers, to international research institutes.
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
BBMRI-ERIC's ELSI Knowledge Base | The ELSI Knowledge Base is an open-access resource platform that aims at providing practical know-how for responsible research. | Ethical aspects GDPR compliance | |
Beacon | The Beacon protocol defines an open standard for genomics data discovery. | Data discoverability | Tool info Standards/Databases Training |
BoostDM | BoostDM is a method to score all possible point mutations (single base substitutions) in cancer genes for their potential to be involved in tumorigenesis. | | Tool info |
Cancer Genome Interpreter | Cancer Genome Interpreter (CGI) is designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. | | Tool info |
Consent Clauses for Genomic Research | A resource for researchers when drafting consent forms so they can use language matching cutting-edge GA4GH international standards | ||
Crypt4GH | A Python tool to encrypt, decrypt or re-encrypt files, according to the GA4GH encryption file format. | | Training |
DAISY | Data Information System to keep sensitive data inventory and meet GDPR accountability requirement. | TransMed GDPR compliance | Tool info Training |
Data Agreement Wizard (DAWID) | The Data Agreement Wizard is a tool developed by ELIXIR-Luxembourg to facilitate data sharing agreements. | GDPR compliance | |
Data Use Ontology (DUO) | DUO allows datasets to be semantically tagged with restrictions on their usage. | Ethical aspects | Standards/Databases Training |
dbGaP | The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans. | | Tool info Standards/Databases Training |
DisGeNET | A discovery platform containing collections of genes and variants associated to human diseases. | Toxicology data | Tool info Standards/Databases Training |
DPIA Knowledge Model | A DSW knowledge model guiding users through a set of questions to collect information necessary for a research project Data Protection Impact Assessment (DPIA). | GDPR compliance | |
ERPA | Web-based tool allowing users to create and manage a register of personal data processing activities (ROPA). | GDPR compliance | |
EU General Data Protection Regulation | Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). | TSD Data sensitivity | |
EUPID | EUPID provides a method for identity management, pseudonymisation and record linkage to bridge the gap between multiple contexts. | ||
GA4GH Data Security Toolkit | Principled and practical framework for the responsible sharing of genomic and health-related data. | ||
GA4GH Genomic Data Toolkit | Open standards for genomic data sharing. | ||
GA4GH Regulatory and Ethics toolkit | Framework for Responsible Sharing of Genomic and Health-Related Data | Ethical aspects | |
HumanMine | HumanMine integrates many types of human data and provides a powerful query engine, export for results, analysis for lists of data and FAIR access via web services. | | Tool info Standards/Databases Training |
Informed Consent Ontology (ICO) | The Informed Consent Ontology (ICO) is an ontology for the informed consent and informed consent process in the medical field. | Ethical aspects | Standards/Databases |
International Compilation of Human Research Standards | The International Compilation of Human Research Standards enumerates over 1,000 laws, regulations, and guidelines (collectively referred to as standards) that govern human subject protections in 133 countries, as well as standards from a number of international and regional organizations | ||
IntoGen | IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes. | | Tool info |
ISO/IEC 27001 | International information security standard | Compliance monitoring ... | |
JGA | The Japanese Genotype-phenotype Archive (JGA) is a service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. | ||
MONARC | A risk assessment tool that can be used to do Data Protection Impact Assessments | TransMed | |
OTP | One Touch Pipeline (OTP) is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata. | | Tool info |
PMut | Platform for the study of the impact of pathological mutations in protein structures. | | Tool info |
Privacy Impact Assessment Tool (PIA) | The open source PIA software helps to carry out data protection impact assessments. | | Tool info |
RD-Connect Genome Phenome Analysis Platform | The RD-Connect GPAP is an online tool for diagnosis and gene discovery in rare disease research. | | Tool info Training |
The European Genome-phenome Archive (EGA) | EGA is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. | CSC TSD Data publication | Tool info Standards/Databases Training |
Tryggve ELSI Checklist | A list of Ethical, Legal, and Societal Implications (ELSI) to consider for research projects on human subjects. | NeLS TSD GDPR compliance | |
National resources
Tools and resources tailored to users in different countries.
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
BioMedIT | A secure IT network for the responsible processing of health-related data. | Data analysis Data sensitivity | |
Federated EGA Finland | FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR). See also: The European Genome-phenome Archive (EGA) | CSC Researcher Data Steward Data sensitivity Data publication Existing data | |
Findata | The Health and Social Data Permit Authority. Findata offers services and enables secure and efficient utilisation of data materials containing health and social data. | CSC Researcher Data Steward Data sensitivity Existing data | |
Fingenious | Finnish Biobank Cooperative (FINBB) connects researchers to Finnish biomedical research. Via Fingenious® services the researcher can connect to all Finnish public biobanks. | CSC Researcher Data Steward Data sensitivity | |
Sensitive Data Services for Research | CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web-user interfaces accessible from the user’s own computer. | CSC Researcher Data Steward Data sensitivity Data analysis Data storage Data publication | |
23 Things for Research Data Management tool | Shared reference tool for knowledge on data management. | Data management plan Compliance monitoring ... | |
BBMRI catalogue | Biobanking Netherlands makes biosamples, images and data findable, accessible and usable for health research. | Researcher Data analysis Existing data Data storage | |
cBioPortal for Cancer Genomics | cBioPortal provides a web-based resource for researchers to explore, visualize, analyze, and share multidimensional cancer genomic datasets, as well as other studies involving multidimensional genomic data. | Researcher Data analysis Existing data Data storage | |
CBS, Statistics Netherlands | The national statistical office, Statistics Netherlands (CBS), provides reliable statistical information and data in the life sciences and health domain. | Researcher Existing data | |
Dutch COVID-19 Data Support Programme | To support investigators and health care professionals with tools and services in their search for ways to overcome the pandemic and its health consequences. | Researcher Existing data | |
Handbook for Adequate Natural Data Stewardship | Guidelines on data stewardship and practical toolbox for researchers at Dutch University Medical Centres (UMCs). | Researcher Data management plan Compliance monitoring ... | |
Health-RI Service Catalogue | Health-RI provides a set of tools and services available to the biomedical research community. | Researcher Data analysis Existing data Data storage | |
RIVM Health and Healthcare Data | The Dutch National Institute for Public Health and the Environment (RIVM), together with other organisations, provides numbers and explanation on relevant topics, to prevent duplication of data collection. | Researcher Existing data | |
Technology Hotels | More than 130 Technology Hotels offer access to high-end technology and expertise in the field of bioimaging, bioinformatics, genomics, medical imaging, metabolomics, phenotyping, proteomics, structural biology, and/or systems biology. | Bioimaging data Proteomics Researcher Compliance monitoring ... | |
Federated EGA Norway node | Federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD. See also: The European Genome-phenome Archive (EGA) | Data sensitivity Existing data Data publication TSD | |
HUNTCloud | The HUNT Cloud, established in 2013, aims to improve and develop the collection, accessibility and exploration of large-scale information. HUNT Cloud offers cloud services and lab management. It is a key service that has established a framework for data protection, data security, and data management. HUNT Cloud is owned by NTNU and operated by HUNT Research Centre at the Department of Public Health and Nursing at the Faculty of Medicine and Health Sciences. | Data analysis Data sensitivity Data storage | |
Norwegian COVID-19 Data Portal | The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers. | Data sensitivity Existing data Data publication COVID-19 Data Portal | |
RETTE | System for Risk and compliance. Processing of personal data in research and student projects at UiB. | Data security GDPR compliance Data sensitivity Policy maker Data Steward | |
SAFE | SAFE (secure access to research data and e-infrastructure) is the solution for the secure processing of sensitive personal data in research at the University of Bergen. SAFE is based on the “Norwegian Code of conduct for information security in the health and care sector” (Normen) and ensures confidentiality, integrity, and availability are preserved when processing sensitive personal data. Through SAFE, the IT department offers a service where employees, students and external partners get access to dedicated resources for processing of sensitive personal data. | Data analysis Data sensitivity Data storage | |
TSD | The TSD – Service for Sensitive Data, is a platform for collecting, storing, analysing and sharing sensitive data in compliance with the Norwegian privacy regulation. TSD is developed and operated by UiO. | Data analysis Data sensitivity Data storage TSD | |
Federated EGA Sweden node | Secure archiving and sharing of genetic and phenotypic data resulting from Swedish biomedical research projects. See also: The European Genome-phenome Archive (EGA) | Data sensitivity Existing data Data publication | |
Human Data Guidelines | Guidelines as well as further information on legal considerations when working with human biomedical data. | Data sensitivity | |
Swedish Pathogens Portal | The Swedish Pathogens Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing. | COVID-19 Data Portal Data sensitivity Existing data Data publication | |