Data life cycle: Collecting
What is data collection?
Data collection is the process of gathering information about specific variables of interest, using either instrumentation or other methods (e.g. questionnaires, patient records). Collection methods depend on the field and research subject, but in every case it is important to ensure data quality.
You can also reuse existing data in your project. These can be individual, previously collected datasets, reference data from curated resources, or consensus data such as reference genomes. For more information see Reuse in the data life cycle.
Why is data collection important?
Apart from being the source of information on which you build your findings, the collection phase lays the foundation for the quality of both the data and its documentation. It is important that the decisions made about quality measures are implemented and that the collection procedures are appropriately recorded.
What should be considered for data collection?
Appropriate tools, or an integration of multiple tools (also called a tool assembly or ecosystem), can help you with data management and documentation during data collection. Suitable tools include Electronic Lab Notebooks (ELNs), Electronic Data Capture (EDC) systems and Laboratory Information Management Systems (LIMS). Online platforms for collaborative research and file sharing services can also serve as ELNs or data management systems.
Regardless of the tools you use, consider the following while collecting data.
- Capture provenance, e.g. of samples, researchers and instruments.
- Ensure data quality, whether the data are generated by you or by a specialised infrastructure or facility.
- Check options for reusing data instead of generating new data.
- Define the experimental design including a collection plan (e.g. repetitions, controls, randomisation) in advance.
- Calibrate the instruments.
- Check data protection and security issues if you work with sensitive or confidential data.
- Check permissions or consent if you work with human-related data.
- Define how to store the data, e.g. format and volume.
- Find a suitable repository to store the data.
- Identify suitable metadata standards.
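Some of the considerations above, such as randomising the collection plan and capturing provenance, can be partly automated. The following is a minimal Python sketch, not a prescribed method: the function names, the JSON field names and the `.provenance.json` sidecar naming are all hypothetical choices made for illustration. An ELN, EDC system or LIMS will usually provide its own mechanisms for this.

```python
import hashlib
import json
import random
from datetime import datetime, timezone
from pathlib import Path


def randomised_run_order(sample_ids, seed):
    """Return the samples in a shuffled order that can be reproduced from the seed."""
    order = list(sample_ids)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(order)
    return order


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(data_file, sample_id, researcher, instrument, calibrated_on):
    """Write a JSON sidecar file describing how a data file was collected.

    The field names are illustrative assumptions, not a metadata standard.
    """
    data_file = Path(data_file)
    record = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "sample_id": sample_id,
        "researcher": researcher,
        "instrument": instrument,
        "instrument_calibrated_on": calibrated_on,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming convention: <data file name>.provenance.json
    sidecar = data_file.with_name(data_file.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Recording the seed together with the run order makes the randomisation reproducible, and the checksum lets you verify later that the stored file is the one that was originally collected. In a real project, the field names should follow the metadata standard you identified for your domain.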
Data organisation: Best practices to name and organise research data.
Data quality: How to ensure high quality of research data.
Existing data: How to find and reuse existing data.
Identifiers: How to use identifiers for research data.
Documentation and metadata: How to document and describe your data.
Sensitive data: How to identify the sensitivity of different research data types.
Data storage: How to find appropriate storage solutions.
Data provenance: How to record information about data provenance.