
Your tasks: Data sensitivity

Is your data sensitive?

Description

In general, data can be categorised into two types: sensitive data and non-sensitive data. Non-sensitive data can be shared openly without risk of harm. The term sensitive data is used when making data publicly available could put people, organisations, countries, and/or ecosystems at risk. This could be, for example, personal or commercial information, but also information about the habitat, geographical location, and breeding grounds of endangered or vulnerable species. Such sensitive data must be protected against unauthorized access, so you should be cautious when dealing with sensitive or potentially sensitive information. It is important to identify, at an early stage of the data management process, at which point data becomes sensitive and which parts of (existing or newly generated) data are sensitive. What is considered sensitive information is usually regulated by national laws and may differ between countries, so it is important to take into consideration both global and local regulations and policies.

Considerations

  • If you deal with any information about individuals from the EU, you are bound by the EU General Data Protection Regulation (GDPR). In the GDPR, such data is called “personal data”.
  • In the context of the GDPR, “special category data” is a subclass of “personal data” whose misuse is potentially even more harmful, and the GDPR prescribes very strict rules for dealing with this data. Article 9 of the GDPR defines the special categories as personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, as well as genetic data, biometric data used to uniquely identify a person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. Confusingly, these special categories are sometimes colloquially called “sensitive data”. Note that this page is concerned with the broader definition of “sensitive data”.
  • Information in Life Science projects is for the most part categorised under health and genetic data and is considered special category data under the GDPR.
  • You need to assess whether or not your dataset contains attributes that can lead to the identification of a person. Note that attributes that are not identifying on their own can be identifying in combination. See the definitions in the “How can you de-identify your data?” section.
  • You need to know the de-identification status of your data. Life Science research data rarely contains directly identifying attributes. Research data would typically be pseudonymised or anonymised. If you work with personal data, you must understand the difference between these two (see under de-identification below).
  • For some studies there is a cohort owner, often a clinical party or a trusted third party that can map study participant keys back to names and surnames. Such data is considered pseudonymous.
  • If there are no means to map the data back to individuals, then the data is considered anonymous and is out of the scope of the GDPR.
  • You should keep in mind that anonymising data is a notoriously difficult task. Does your dataset contain a wide array of attributes, or exhibit unique traits/patterns such that one can reasonably expect that no more than a dozen people in the world share them? In that case, you cannot assume that it is anonymous. Such data runs the risk of being linked back to individuals through various technical means. You need to take into account that the technical means to identify people may be more powerful in the future than they are right now; that is, data that is anonymous right now may not be anonymous forever.
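The point about combinations of attributes can be illustrated with a small sketch. The records, field names, and helper functions below are all hypothetical and invented for illustration; the sketch only counts how many records are unique for each subset of attributes and is not a substitute for a proper re-identification risk assessment.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; all field names and values are invented.
records = [
    {"birth_year": 1980, "postcode": "1011", "diagnosis": "asthma"},
    {"birth_year": 1980, "postcode": "1012", "diagnosis": "asthma"},
    {"birth_year": 1975, "postcode": "1011", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postcode": "1012", "diagnosis": "diabetes"},
]


def unique_record_count(rows, attributes):
    """Count rows whose combination of values for `attributes` occurs only once."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    return sum(1 for n in counts.values() if n == 1)


def uniqueness_report(rows, attributes):
    """Map every non-empty subset of `attributes` to its number of unique rows."""
    report = {}
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            report[subset] = unique_record_count(rows, subset)
    return report


report = uniqueness_report(records, ["birth_year", "postcode"])
for subset, n_unique in report.items():
    print(subset, n_unique)
```

In this toy dataset neither birth year nor postcode singles anyone out on its own, but together they make every record unique.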

Solutions

  • Identify the legislation and regulations that you are expected to follow. Your institution’s website may give you hints on where you can look for information about data sensitivity.
  • If you cannot determine if your data is sensitive, contact someone with expert knowledge in that area.

How can you de-identify your data?

Description

Data anonymization is the process of irreversibly modifying personal data in such a way that subjects cannot be identified directly or indirectly by anyone, including the study team. If data are anonymized, no one can link data back to the subject.

Pseudonymization is a process where identifying fields within data records are replaced by artificial identifiers called pseudonyms or pseudonymized IDs. Pseudonymization ensures that no one can link data back to the subject, apart from nominated members of the study team who are able to link pseudonyms to identifying records, such as name and address.

Data anonymization involves modifying a dataset so that it is impossible to identify a subject from their data. Pseudonymization involves replacing identifying data with artificial IDs, for example, replacing a healthcare record ID with an internal participant ID only known to a named clinician working in the study.
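As a minimal sketch of the pseudonymization step described above, the following Python replaces identifying fields with a random participant ID and keeps the mapping in a separate key table. The function name, field names, and sample records are assumptions made for illustration; in practice the key table would be stored separately and securely by the nominated person, and the remaining fields may still contain indirect identifiers.

```python
import secrets


def pseudonymise(rows, identifying_fields, prefix="P"):
    """Split records into shareable rows and a separately held key table.

    Hypothetical helper: the random pseudonym carries no information
    about the person it replaces.
    """
    shareable, key_table = [], {}
    for row in rows:
        pseudonym = prefix + secrets.token_hex(8)
        while pseudonym in key_table:  # guard against a very unlikely collision
            pseudonym = prefix + secrets.token_hex(8)
        # The key table links the pseudonym back to the identifying fields.
        key_table[pseudonym] = {f: row[f] for f in identifying_fields}
        # The shareable row keeps everything else plus the pseudonym.
        cleaned = {k: v for k, v in row.items() if k not in identifying_fields}
        cleaned["participant_id"] = pseudonym
        shareable.append(cleaned)
    return shareable, key_table


# Invented sample data.
participants = [
    {"name": "Ada Example", "address": "1 Sample Street", "diagnosis": "asthma"},
    {"name": "Bob Example", "address": "2 Sample Street", "diagnosis": "diabetes"},
]
shareable, key_table = pseudonymise(participants, ["name", "address"])
print(shareable)
```

Because the key table still exists, the result is pseudonymous rather than anonymous.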

Considerations

  • Both anonymization and pseudonymization are approaches that comply with the GDPR.
  • Simply removing identifiers cannot guarantee data anonymity. A dataset may contain unique traits/patterns that could identify individuals. An example of this would be recording two potentially unrelated attributes such as the instance of a rare disease and country of residence, where there is only a single case of this disease in this country.
  • Data that is anonymous currently may not be anonymous in the future. Future datasets on the same individual may disclose their identity.
  • Anonymization techniques can sometimes damage the statistical properties of the data, for example, translating current participant age into an age range.
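The age-range example can be sketched as follows. The function name and the band width of ten years are assumptions chosen for illustration, not a recommended setting.

```python
def age_to_range(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Invented ages for illustration.
ages = [31, 34, 38, 52]
bands = [age_to_range(a) for a in ages]
print(bands)  # ['30-39', '30-39', '30-39', '50-59']
```

After generalisation, statistics such as the exact mean age can no longer be computed from the shared data, which is the kind of loss in statistical properties mentioned above.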

Solutions

  • An example of pseudonymization is where participants in a study are assigned a non-identifying ID and all identifying data (such as name and address) are removed from the metadata to be shared. The mapping of this ID to personal data is held separately and securely by a named researcher who will not share this data.
  • There are well-established data anonymization approaches, such as k-anonymity, l-diversity, and differential privacy.
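To make two of these terms concrete, the sketch below measures k-anonymity and the simplest ("distinct") form of l-diversity on a toy table. The function name, column names, and records are hypothetical; the code only measures these properties and does not transform data to achieve them.

```python
from collections import defaultdict


def k_and_l(rows, quasi_identifiers, sensitive_attribute):
    """Return (k, l) for a table.

    k: size of the smallest group of rows sharing the same quasi-identifier values.
    l: smallest number of distinct sensitive values found within any such group.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive_attribute])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


# Invented sample data.
table = [
    {"age_range": "30-39", "country": "NO", "diagnosis": "asthma"},
    {"age_range": "30-39", "country": "NO", "diagnosis": "diabetes"},
    {"age_range": "40-49", "country": "NO", "diagnosis": "asthma"},
    {"age_range": "40-49", "country": "NO", "diagnosis": "asthma"},
]
k, l = k_and_l(table, ["age_range", "country"], "diagnosis")
print(k, l)  # 2 1
```

Here every combination of quasi-identifiers is shared by at least two records (k = 2), yet everyone in the 40-49 group has the same diagnosis (l = 1), so knowing that a person is in that group still reveals their diagnosis.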

Tools and resources on this page

Tool or resource Description Related pages Registry
EU General Data Protection Regulation

Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).

TSD Human data
National resources
BioMedIT

A secure IT network for the responsible processing of health-related data.

Human data Data analysis
Federated EGA Finland

FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR).

The European Genome-phenome Archive (EGA)
CSC Researcher Data Steward Data publication Existing data Human data
Findata

The Health and Social Data Permit Authority. Findata offers services and enables secure and efficient utilisation of data materials containing health and social data.

CSC Researcher Data Steward Existing data Human data
Fingenious

Finnish Biobank Cooperative (FINBB) connects researchers to Finnish biomedical research. Via Fingenious® services, researchers can connect to all Finnish public biobanks.

CSC Researcher Data Steward Human data
Sensitive Data Services for Research

CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web-user interfaces accessible from the user’s own computer.

CSC Researcher Data Steward Data analysis Data storage Data publication Human data
Luxembourg Covid-19 data portal

The Luxembourgish COVID-19 Data Portal acts as a collection of links and provides information to support researchers to utilise Luxembourgish and European infrastructures for data sharing.

Educloud Research

Educloud Research is a platform provided by the Centre for Information Technology (USIT) at the University of Oslo (UiO). This platform provides access to a work environment accessible to collaborators from other institutions or countries. This service provides a storage solution and a low-threshold HPC system that offers batch job submission (SLURM) and interactive nodes. Data up to the red classification level can be stored/analysed.

Data analysis Data storage
Federated EGA Norway node

The federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD.

The European Genome-phenome Archive (EGA)
Human data Existing data Data publication TSD
HUNTCloud

The HUNT Cloud, established in 2013, aims to improve and develop the collection, accessibility and exploration of large-scale information. HUNT Cloud offers cloud services and lab management. It is a key service that has established a framework for data protection, data security, and data management. HUNT Cloud is owned by NTNU and operated by HUNT Research Centre at the Department of Public Health and Nursing at the Faculty of Medicine and Health Sciences.

Human data Data analysis Data storage
Nettskjema

Nettskjema is a solution for designing and managing data collections using online forms and surveys. It can be used for collecting sensitive data and offers a high degree of security and privacy.

TSD
Norwegian COVID-19 Data Portal

The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers.

Human data Existing data Data publication
RETTE

System for Risk and compliance. Processing of personal data in research and student projects at UiB.

Human data Data security GDPR compliance Policy maker Data Steward
SAFE

SAFE (secure access to research data and e-infrastructure) is the solution for the secure processing of sensitive personal data in research at the University of Bergen. SAFE is based on the “Norwegian Code of conduct for information security in the health and care sector” (Normen) and ensures confidentiality, integrity, and availability are preserved when processing sensitive personal data. Through SAFE, the IT department offers a service where employees, students and external partners get access to dedicated resources for processing of sensitive personal data.

Human data Data analysis Data storage
TSD

The TSD – Service for Sensitive Data, is a platform for collecting, storing, analysing and sharing sensitive data in compliance with the Norwegian privacy regulation. TSD is developed and operated by UiO.

Human data Data analysis Data storage TSD
usegalaxy.no

Galaxy is an open-source, web-based platform for data-intensive biomedical research. This instance of Galaxy is coupled with NeLS for easy data transfer.

Galaxy
Data analysis Existing data Data publication NeLS
Federated EGA Sweden node

Secure archiving and sharing of genetic and phenotypic data resulting from Swedish biomedical research projects.

The European Genome-phenome Archive (EGA)
Human data Existing data Data publication
Human Data Guidelines

Guidelines as well as further information on legal considerations when working with human biomedical data.

Human data
NBIS Data Management Consultation

Free consultation service regarding data management questions in life science research.

Data management plan Data publication
Swedish Pathogens Portal

The Swedish Pathogens Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing.

COVID-19 Data Portal Human data Existing data Data publication