Data life cycle: Collecting

What is data collection?

Data collection is the process of gathering information about specific variables of interest, using either instrumentation or other methods (e.g. questionnaires, patient records). Data collection methods depend on the field and research subject, but in all cases it is important to ensure data quality.

You can also reuse existing data in your project. These can be individual datasets collected earlier, reference data from curated resources, or consensus data such as reference genomes. For more information, see Reuse in the data life cycle.

Why is data collection important?

Apart from being the source of information on which you build your findings, the collection phase lays the foundation for the quality of both the data and its documentation. It is important that the decisions made regarding quality measures are implemented, and that the collection procedures are appropriately recorded.

What should be considered for data collection?

Appropriate tools, or an integration of multiple tools (also called a tool assembly or ecosystem), can help you with data management and documentation during data collection. Suitable tools include Electronic Lab Notebooks (ELNs), Electronic Data Capture (EDC) systems and Laboratory Information Management Systems (LIMS). Online platforms for collaborative research and file sharing services can also be used as ELNs or data management systems.

Independently of the tools you use, consider the following while collecting data.

  • Capture provenance, e.g. of samples, researchers and instruments.
  • Ensure data quality, whether the data are generated by yourself or by another infrastructure or facility that specialises in generating such data.
  • Check options for reusing data instead of generating new data.
  • Define the experimental design including a collection plan (e.g. repetitions, controls, randomisation) in advance.
  • Calibrate the instruments.
  • Check data protection and security issues if you work with sensitive or confidential data.
  • Check permissions or consent if you work with human-related data.
  • Define how to store the data, e.g. format and volume.
  • Find a suitable repository to store the data.
  • Identify suitable metadata standards.
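
As an illustration of the provenance and quality points above, the following is a minimal Python sketch of one possible approach: writing a small JSON "sidecar" file next to each collected data file. The function names (`sha256_of`, `write_provenance`), the field names and the `.provenance.json` suffix are all hypothetical choices made for this example; they do not follow any particular metadata standard, and an ELN, EDC system or LIMS would normally capture this information for you.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(data_file: Path, sample_id: str, operator: str,
                     instrument: str, calibrated_on: str) -> Path:
    """Write a JSON sidecar file describing how data_file was collected.

    All field names below are illustrative assumptions, not a standard.
    """
    record = {
        "file": data_file.name,
        "size_bytes": data_file.stat().st_size,
        "sha256": sha256_of(data_file),  # lets others verify file integrity
        "sample_id": sample_id,
        "operator": operator,
        "instrument": instrument,
        "instrument_calibrated_on": calibrated_on,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    # e.g. run01.csv -> run01.csv.provenance.json in the same directory
    sidecar = data_file.with_name(data_file.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling, for example, `write_provenance(Path("run01.csv"), "S-001", "J. Doe", "plate-reader-2", "2024-01-15")` (all values invented for illustration) would create `run01.csv.provenance.json` alongside the data file. The checksum makes it possible to detect later whether the file has changed since collection, and the remaining fields record who collected the data, from which sample, and with which instrument.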

More information

With Data Stewardship Wizard (DSW), you can create, plan, collaborate on, and bring your data management plans to life. The tool is trusted by thousands of people worldwide, from data management pioneers to international research institutes.