Data life cycle: Preserving
What is data preservation?
Data preservation consists of a series of activities necessary to ensure the safety, integrity and accessibility of data for as long as necessary, even decades. It is more than data storage and backup, since data can be stored and backed up without being preserved. Preservation prevents data from becoming unavailable or unusable over time through steps such as the following.
- Ensure data safety and integrity.
- Change the file format (format migration) and update software to make sure that they do not become outdated or obsolete.
- Replace hardware and other storage media (such as paper or magnetic tape) to avoid degradation.
- Ensure that data is organised and described with appropriate metadata and documentation, so that it remains understandable and reusable.
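One common way to monitor data integrity over time is to record a checksum for every file and re-check it periodically. The following is a minimal sketch of that idea, not a prescribed procedure: the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical, and SHA-256 is assumed as the checksum algorithm (a preservation service may use a different one).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List manifest entries that are now missing or have a different checksum."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built at deposit time could be stored next to the data (for example as a JSON file) and compared against the files after every copy, media change or migration. An empty result from `changed_files` suggests that the files recorded in the manifest are unchanged.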
Why is data preservation important?
There are several important reasons to preserve research data.
- Guarantee that your data can be verified and reproduced for several years after the end of the project.
- Allow the reuse of the data in the future for different purposes, such as teaching or further research.
- Meet the requirements of funders, publishers, institutions and organisations, which may mandate a specific preservation period for certain data.
- Preserve data that have significant value for an organisation, a nation, the environment or society as a whole.
What should be considered for preserving data?
Not all data should be preserved. Preservation should be applied to an appropriate selection of data, since it requires considerable effort and cost. Common criteria for selecting the data to preserve for a certain amount of time are:
- data required to be preserved by funder, publisher and institution policies (usually, data should be preserved for at least 5 or 10 years after the end of the project);
- data that must be preserved to meet legal or ethical requirements (e.g. clinical trial data);
- data that are unique or cannot be easily regenerated (e.g. raw data, analysis workflow);
- data that will probably be reused in the future;
- data of great value for society (scientifically, historically or culturally).
Data preservation must be done by experts and dedicated services. Preservation of digital information requires planning, policies, resources (time, funds, people) and the right technology to ensure that the data stays functional and can be accessed (see ISO Standards for quality, preservation and integrity of information). Special long-term data repositories, where the data is actively maintained and information integrity is monitored, should therefore be used for digital preservation. It is best to consider the following options.
- Contact the IT department or the library or the data center of your institution.
- Check if national services are available.
- Choose trustworthy research data repositories or deposition databases, based on your data type. Some repositories are publicly accessible and also allow you to publish your data.
When preparing data for preservation, several requirements need to be fulfilled.
- Do not include data that are temporary or mutable.
- Provide clear, self-explanatory documentation.
- Include information about provenance.
- Include sufficient licensing information.
- Ensure that data is well organised.
- Ensure that a consistent naming scheme is used.
- Use standard, open file formats instead of proprietary ones.
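Some of these requirements can be checked automatically before deposit. The sketch below illustrates one possible check and makes several assumptions that are not part of this guidance: the naming scheme (lowercase letters and digits separated by hyphens or underscores), the lists of temporary and proprietary file extensions, and the function names `check_file` and `check_dataset` are all hypothetical and should be adapted to the conventions of your project and repository.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Assumed naming scheme: lowercase alphanumerics joined by "-" or "_",
# followed by one or more extensions (e.g. "2021-03-01_sample-a.csv").
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:[_-][a-z0-9]+)*(?:\.[a-z0-9]+)+$")

# Illustrative, non-exhaustive extension lists (assumptions).
TEMPORARY_SUFFIXES = {".tmp", ".bak", ".swp"}
PROPRIETARY_SUFFIXES = {".xls", ".doc", ".sav", ".psd"}


def check_file(relative_path: str) -> List[str]:
    """Return the list of problems found for a single file path."""
    path = PurePosixPath(relative_path)
    suffix = path.suffix.lower()
    problems = []
    if suffix in TEMPORARY_SUFFIXES or path.name.endswith("~"):
        problems.append("temporary file")
    if suffix in PROPRIETARY_SUFFIXES:
        problems.append("proprietary format")
    if not NAME_PATTERN.match(path.name):
        problems.append("name breaks naming scheme")
    return problems


def check_dataset(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Return only the files that have at least one problem."""
    report = {p: check_file(p) for p in paths}
    return {p: problems for p, problems in report.items() if problems}
```

Calling `check_dataset` on the list of relative file paths in a dataset returns an empty dictionary when no file is flagged. Documentation, provenance and licensing information still need to be reviewed by a person, since a script like this only inspects file names.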
If you need to preserve non-digital data (e.g. paper), consider whether digitising the data is feasible, or consult the data management support services in your institution.
If you need to preserve materials, such as micro-organisms, biomaterials or biomolecules, consult with data management support services in your institution to find appropriate centers or biobanks.
- Best practices to name and organise research data.
- Data protection: How to make research data compliant to GDPR.
- Data publication: How to prepare data and find repositories for publication.
- Documentation and metadata: How to document and describe your data.
- Data storage: How to find appropriate storage solutions.
- Identifiers: How to use identifiers for research data.
- Licensing: How to license research data.