How can you ensure data protection and security during collection, storage, and transfer?
Data collection may consist of the re-use of existing data and/or the generation of new data.
For data to be considered valid and reliable, data collection should occur consistently and systematically throughout the course of the research project. Within disciplines, there are established methodologies, procedures and techniques that help researchers ensure the high quality of the data they collect. In general, important aspects of data collection include:
- Standardisation: codebooks & protocols
- Structure / organisation of the data
- Data quality assurance methods
- Documentation & metadata
- Storage & protection
Systematic data collection is essential for ensuring the reproducibility of research. When data is collected in a consistent and organized manner, the quality and reliability of the research improve, and the data is easier for others to share and reproduce. High-quality data also contributes to making data FAIR (Findable, Accessible, Interoperable, and Reusable), as well-organized and well-documented data is more likely to be reused effectively. The principles of making data FAIR are discussed in detail under the topic FAIR Principles.
Data Collection Tools
The tools used in research to collect data are immensely diverse, so we will not provide an exhaustive overview here. From an RDM perspective, what matters most about a data collection tool is where it stores the data you collect and in which format. The storage location is particularly important when you are working with personal data. For example, privacy legislation in the United States is very different from the European General Data Protection Regulation (GDPR). Hence, personal data collected at a Dutch research institute may not simply be stored on American servers. Keep this in mind when you are deciding which tool to use for your data collection.
If you are collecting personal data and you decide to use a tool for which no contract exists between VU Amsterdam and the provider of the software or tool, a service agreement and a processing agreement must be drawn up. Contact the 🔒 privacy champion of your faculty for more information and a model processing agreement.
Questionnaire tools
The Faculty of Behavioural and Movement Sciences (FGB) has developed a document with tips for the safe use of the questionnaire tools Qualtrics and Survalyzer. Although the document was written for FGB researchers specifically, it can also be helpful for others. Consult this document if you need a questionnaire tool to collect your data.
Data Collection in Collaboration
Some research projects involve the participation of multiple organisations or institutes and may even include cross-border cooperation. When data is collected by several organisations, a Data Management Plan should specify who is responsible for which part of the data collection and storage. It should also describe how each data collection relates to the research goal(s). Describing this precisely will help you to determine whether a consortium agreement or joint controller agreement is necessary. The table below gives a general example of such a specification:
Data Stage | Dataset description | Responsible organization for collection | Data origin | Data purpose |
---|---|---|---|---|
Raw data | Community level surveys | Vrije Universiteit Amsterdam | Amsterdam, The Hague, Rotterdam | Identifying perceived problems, System responsiveness |
Raw data | Trials & Focus Group Interviews | London School of Hygiene and Tropical Medicine (LSHTM) | Germany, Switzerland | Trials to evaluate programs on . . ., Focus Group interviews to identify barriers to . . . |
Raw data | Pollution measurements using fish | Oceanographic Institute of Sweden | Coastal waters, Northeast Spain | Establish pollution levels of plastic |
Data Collection Protocols
Regardless of the field of study or preference for defining data (quantitative, qualitative), accurate data collection is essential to maintaining the integrity of research. Both the selection of appropriate data collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their correct use reduce the likelihood of errors.
There are two approaches to reducing and/or detecting errors in data, both of which help to preserve the integrity of your data and ensure scientific validity. These are:
- Quality assurance - activities that take place before data collection begins
- Quality control - activities that take place during and after data collection
Quality assurance precedes data collection and its main focus is ‘prevention’ (i.e., forestalling problems with data collection). Prevention is the most cost-effective way to ensure the integrity of data collection. This proactive approach is best demonstrated by a standardised protocol, laid out in a comprehensive and detailed procedures manual for data collection.
Although quality control activities (detection/monitoring and action) occur during and after data collection, their details should be carefully documented in the procedures manual. A clearly defined communication structure is a necessary precondition for monitoring and tracking down errors. Quality control also identifies the responses, or ‘actions’, needed to correct faulty data collection practices and to minimise future occurrences.
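As an illustration of the detection side of quality control, the sketch below checks each data record against a codebook and lists every deviation, so that problems can be followed up with corrective actions. It is a minimal Python sketch under stated assumptions: the variable names, ranges and allowed values in `CODEBOOK` are hypothetical and would in practice come from your own codebook and procedures manual.

```python
from typing import Any

# Hypothetical codebook: each variable maps to the rules it must satisfy.
# In a real project these rules come from your own codebook / procedures manual.
CODEBOOK: dict[str, dict[str, Any]] = {
    "participant_id": {"type": str},
    "age": {"type": int, "min": 18, "max": 99},
    "condition": {"type": str, "allowed": {"control", "treatment"}},
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems: list[str] = []
    for name, rule in CODEBOOK.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing value")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(
                f"{name}: expected {rule['type'].__name__}, got {type(value).__name__}"
            )
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: {value} above maximum {rule['max']}")
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
    # Variables that the codebook does not define are also worth flagging.
    for name in record:
        if name not in CODEBOOK:
            problems.append(f"{name}: not defined in the codebook")
    return problems


if __name__ == "__main__":
    records = [
        {"participant_id": "P001", "age": 34, "condition": "control"},
        {"participant_id": "P002", "age": 150, "condition": "placebo"},
    ]
    for index, record in enumerate(records):
        for problem in check_record(record):
            print(f"record {index}: {problem}")
```

Running such a check after each data collection session, rather than only at the end of the project, makes it easier to trace an error back to the session or person that produced it.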
Some sources for protocols:
- HANDS Handbook for Adequate Natural Data Stewardship by the Federation of Dutch University Medical Centers (UMCs)
- Protocols.io - an open access repository of protocols
- Protocols Online - website with protocols available on the internet, sorted by discipline.
- Springer Protocols - free and subscribed protocols collected by Springer.
Storage During Research
VU Amsterdam offers several options to store your research data. The choice of a specific option may depend on factors such as:
- Does the project involve multiple organisations or departments?
- The sensitivity of the data: does it involve personal data or copyrighted / commercial data?
- Are there any research partners with whom data need to be shared?
- Are any commercial parties involved?
- Does the research project involve multiple locations (inside or maybe even outside the EU)?
- Will there be (lab) devices producing data that need to be stored as well?
- What will be the volume of the data?
- Will there be lots of interactions with the data (using software/tools)?
Storage options may take several forms, for example:
- Local storage on computers, networks or servers;
- Cloud storage offered by the VU;
- Locations where physical data samples are stored (fridges, lockers, etc.).
Researchers, including PhD candidates, can choose from multiple options, some of which are listed below. More information about these storage options is available via their respective links. The Storage finder is a tool that suggests a number of storage options suitable for your research. For more individual guidance, please get in touch with the Research Data Management Support Desk, particularly when you are working with commercial, personal or otherwise sensitive data, or when you have a complex IT setup.
Standard services offered by the VU
VU IT offers several services for employees to store their files. Examples are:
🔒 OneDrive: personal storage for all VU employees and part of the Microsoft 365 platform. OneDrive allows you to store files locally and in the Microsoft cloud, and to share folders and documents with colleagues. Since this is personal storage, tied to someone’s personal VU account, we don’t usually recommend storing research data in OneDrive: if the account holder leaves the VU, the account and all the data on it disappear.
🔒 Teams. Faculties, divisions and departments have their own Team - part of the Microsoft 365 platform - where they store shared documents and where they can interact and chat. Projects may also request a project team. But note that Teams is not always the best location to store your research data and has several limitations, especially when it comes to working with non-Microsoft file formats, large volumes of data, interacting with data, and collaborating with partners outside of the VU. Contact the RDM Support Desk to find out more about the suitability of Teams for your project.
🔒 Surfdrive: a personal cloud storage service for the Dutch education and research community, offering staff, researchers and students an easy way to store, synchronise and share files in the secure and reliable SURF community cloud. All users receive up to 500 GB of storage space. Surfdrive is automatically offered to all VU employees. Since Surfdrive is personal storage, like OneDrive, we do not usually recommend it for research data.
Research data-specific storage options
The options above are standard data storage options at the VU to which all employees have access. But the VU also offers storage specifically for research data. Some of these options are hosted locally at the VU, while others are SURF cloud services. When selecting a cloud-based service, remember to check where the data will be hosted. If the research project involves sensitive data, it may be necessary to choose a cloud-based option that guarantees that the data will stay on servers within the EEA.
🔒 SciStor (short for ‘Storage for Scientists’): This is storage hosted by IT for Research (ITvO) and allows for inexpensive storage of large volumes of data. There are various levels of security possible and various ways to get access to the files. SciStor is only intended for ongoing research, not for archiving.
Yoda (short for Your Data) is a cloud storage service hosted by SURF and is suitable for storing large-scale and sensitive datasets. Yoda also supports collaborating on projects inside and outside the VU, and adding contextual information (metadata) about your dataset as you go. Yoda is usually the best choice if your research data are very sensitive.
🔒 Research Drive is a cloud storage service hosted by SURF for research projects, and is suitable for collaboration inside and outside the VU, for storing sensitive data, and for large-scale research projects. You can also encrypt data in Research Drive using several tools. You can request storage space in Research Drive via a 🔒 web form in the selfservice portal (VU employees only). Research Drive is the best choice if you need to manage access rights at folder level. More general information about Research Drive can be found here, and its wiki pages, including tutorials, are here.
There are differences between Research Drive and Yoda and each one may support certain projects better than others. The storage finder can help you to get an idea of what would be the best choice for your project, but get in touch with the RDM Support Desk for more details.
Sending research data to partners
Some projects may require data sharing with partners. Although Research Drive and Yoda support sharing data throughout the project, some data may only need to be sent to a partner once. There are several secure options for sending data to research partners:
🔒 Surf Filesender: a cloud service that allows you to send files of up to 1 terabyte to other researchers, and encrypted files of up to 250 GB.
🔒 Zivver: all employees of Vrije Universiteit Amsterdam can use Zivver, an encryption programme that lets you send emails and data (sensitive or otherwise) securely. Attachments are also encrypted and can be up to 5 TB in size. Specific information on how to get and use Zivver is available on the selfservice portal. General explanations on how to use it are available on the Zivver website.
What is Data Protection?
Protection from what? From whom? When, and why? Before we talk about data protection, let us first consider security. More often than not, ‘security’ is regarded as a fixed state. In reality, security is a judgement: an assessment of the level of protection against a certain threat that you consider adequate for dealing with that threat. Whether that judgement is accurate depends on the value of the data and the quality of the protective measures.
The question for you as a researcher is ‘when are the measures that you take secure enough?’. In order to answer this, please be aware that there are three entities that have an opinion about what is ‘secure enough’, namely: the law, the University, and you yourself as the data processor.
The University has a Security Baseline that sets a norm for the level of protection of every application it uses. The Baseline is based on international standards. For each application, the University assesses for which purposes its security is adequate.
The legal requirements for the processing of personal data can be found in the section ‘GDPR and Privacy’ under Plan & Design. There are additional laws and regulations as well, and you are expected to be familiar with these, especially laws regulating medical and criminal research.
What you personally consider to be secure might be very different from what your colleagues, the Faculty or the University consider secure enough, and the norms vary with the kinds of data processed by different researchers and Faculties of the VU. Very generally speaking, there are three points of protection to consider:
- Protection against data loss, for which you need periodic backups.
- Protection against data leakage, for which you need to consider all storage places and their access points.
- Protection of data integrity, for which you need version control and synchronisation management.
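For the backup and data integrity points, one common technique is to record a checksum (here SHA-256) for every file and compare the checksums of the original folder and its copy: a missing or changed checksum reveals a failed or incomplete backup, or a file that was modified unexpectedly. The Python sketch below uses only the standard library; the function names are illustrative assumptions, not part of any VU or SURF tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        file.relative_to(root).as_posix(): sha256_of(file)
        for file in sorted(root.rglob("*"))
        if file.is_file()
    }


def compare_manifests(original: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from the backup, changed in it, or only present in it."""
    return {
        "missing": sorted(set(original) - set(backup)),
        "changed": sorted(k for k in original.keys() & backup.keys() if original[k] != backup[k]),
        "extra": sorted(set(backup) - set(original)),
    }
```

For example, `compare_manifests(build_manifest(Path("data")), build_manifest(Path("backup/data")))` lists the differences between a working folder and its backup; the folder names are hypothetical. Checksums detect accidental or unexpected changes, but they do not replace version control, and they offer no protection against leakage.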
The security of your protection measures depends on the threat you face. We often think of threats as active, and motivated by bad intentions. But most common forms of data loss are accidental and most leakage is caused by trusting others. In reality, devices just get lost or break down, people download malware by accident, and each one of us forgets to save a document at times or gets confused about which version was last updated.
In all cases, protection starts with oversight of where your data is stored and processed. If you forget that you temporarily stored it in a certain place, you have lost oversight of where that data is. The opposite is also true: if you know where your data is, you have insight into the level of security of the space in which you store it. Protection therefore begins with organising your work in a reliable manner and thinking through your steps.
For example, if your data is on your laptop and synchronised with your phone, then it is stored in two places. Perhaps this is enough of a backup, perhaps not: if you put both devices in the same bag and you lose the bag, you have no backup. A backup to online storage might be a good solution, but it might also mean that your data leaks via the internet or via a storage provider who sells the data, and your behavioural data, for profit. Most importantly, there is no absolute security. It is best to consider your personal behaviour and then think of scenarios that are more or less likely to happen and what would impact you most. If you frequently work in public places, make it a habit to lock your device each time you leave it. If you often eat and drink at your desk, it is better to work with a separate keyboard to protect your laptop from the unavoidable coffee shower. Do you save your respondents’ contact details on your personal phone? Then protect it with a PIN.
Here are some basic protection guidelines:
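- Data are very difficult to erase completely: deleting a file or emptying the recycle bin usually leaves the data recoverable. You have probably never truly erased data yourself.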
- Decide how to back up data and test it before you rely on it.
- Do not give others your log-in credentials. If you have done so, for example because family members use your work device, change your password.
- Do not reuse passwords, and do not base them on your birthday, initials, street name or hobby.
- Encryption sounds secure, but it fails completely without good password management.
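Good password management usually means using a password manager. If you need to create a strong random password yourself, for instance for an encrypted archive, Python’s standard `secrets` module draws from a cryptographically secure source, unlike the `random` module, which is not suitable for security purposes. The sketch below is a minimal example; the 20-character default and the 12-character minimum are assumptions for illustration, not a VU policy.

```python
import secrets
import string

# Pool of characters to draw from: upper/lower-case letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

# Assumed minimum length for this sketch; check your institution's own password policy.
MIN_LENGTH = 12


def generate_password(length: int = 20) -> str:
    """Return a random password built with the cryptographically secure `secrets` module."""
    if length < MIN_LENGTH:
        raise ValueError(f"password length must be at least {MIN_LENGTH}")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Store the result in a password manager rather than in a plain-text file.
    print(generate_password())
```

A randomly generated password is only useful if it is stored safely, which is why the comment points to a password manager rather than a note or spreadsheet.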
Data Protection
There can be many reasons why the data of a project needs to be kept protected:
- Sensitivity of the data collected
- Protection of the research data from competition
- Commercial reasons / Intellectual property
- Etc.
There are also many levels of security that may be implemented, depending on the needs. Sometimes a password-protected cloud-based server will be enough. In extreme cases, encryption may be needed, and it may also be needed when data is transmitted between researchers or organisations. Contact the RDM Support Desk to discuss the available options; where sensitive data is concerned, they may connect you to legal experts. Check the Data Storage topic for links to more information on campus solutions and cloud-based options.
Safe Transportation and Transfer
It is important to protect your data during the entire data life cycle. To find out whether your data are secure during all stages of your research, think about your data flow: where do your data originate and where do they go? If data need to be transported from one physical place to another, or transferred from one device to another, this should happen in a secure way.
Transferring digital data
Online connection on campus
If data collection takes place through a certain measurement device (e.g. MRI scanner, EEG scanner, eye tracker), the data need to be transferred from the measurement device to the storage location that you will use during your research project. Make sure that this transfer takes place in a secure way and also make a plan for the data on the measurement device; find out whether they need to be destroyed or can remain there.
Online connection outside campus (with and without VUnetID)
If you are doing fieldwork outside the campus and you have reliable and secure internet access, it is a good idea to upload the data to a storage location that is regularly backed up and secure, in order to prevent data loss. If you have a VUnetID, you can for example use:
- SURFdrive to store your data in a secure cloud service
- SURFfilesender to send your data to a colleague or consortium partner, who can store your data in an appropriate place
You can find more information about each of these storage options in the Data Storage topic.
If you need to receive data from colleagues in your project who don’t have access to these tools (e.g. because they are students, don’t work for a Dutch educational institution, or have no VUnetID), SURFdrive, SURFfilesender and Edugroepen can also be used:
- SURFdrive: you can set up a ‘File drop’ folder. By sharing the link to this folder with the researchers who need to upload documents, you enable them to upload files anonymously to this folder. These users only have upload rights, no view or download rights. The folder can be protected with a password, which you should preferably share with the uploaders through a different channel.
- SURFfilesender: as a SURFfilesender user, you can send a voucher to someone who doesn’t have access to this tool. This person can use this voucher to send documents to you. These files can be encrypted.
- 🔒 Zivver is an email plugin with which you can encrypt emails and attachments.
Offline data outside campus
If you are doing fieldwork in an area with limited internet access, you might use a portable device to initially store your data during the phase of data collection, such as a USB drive or an external hard drive. These data can be transferred to a storage location that is connected to the internet (e.g. G-drive, SURFdrive) later. Please make sure that the data on such portable devices are secured, by using encryption (and by transporting them safely by using a lockable briefcase or backpack).
Transporting physical data
If physical objects need to be transported, you should check with the data manager at your department (if available) what options are available. Special briefcases that can be locked or secure backpacks may need to be used to keep informed consent forms or other sensitive data objects (USB drives etc.) secure during transport.
Data transportation and transfer across borders
Some countries have rules controlling the movement of encryption technology across their borders. If you need to travel with an encrypted laptop to secure your data, for example during fieldwork abroad, please keep this in mind. If you need to transfer data in and out of such countries, please get advice on encryption and secure transportation from the IT Service Desk.
If you have general questions about how to protect your data when transporting or transferring them, you can contact the IT Service Desk. In case of complex situations for which you need tailored support, you can consult the IT Relationship Manager representing the research domain, who can request capacity at IT for setting up an information security plan. Such a plan is usually based on documents which need to be completed beforehand, like a Data Protection Impact Assessment and a Data Classification. Please note that IT-capacity for tailored support is a paid service for which budget needs to be reserved.