Data Management Plan
When planning research, carefully consider and document how data will be collected and processed during the project, who has access to the data and who is responsible for them, what happens to the data after the project ends, and so on. To do this, create a data management plan (DMP) and follow it throughout the project. A dedicated tool can help you write the plan, for example DMPonline (Digital Curation UK).
I WHAT TYPE OF DATA TO COLLECT AND HOW TO DESCRIBE THEM?
- I collect the data myself
- I (re)use my previously collected data
- I use public open data (Estonian Open Government Data Portal)
- I (re)use data collected by others (re3data)
- I buy the data
Keep in mind
- which version of data you reuse or purchase
- what if the author of the data uploads a new version
- store the version used and the vendor documentation on your server
- check copyrights, licenses, restrictions (access, reuse)
- check machine readability and interoperability with the planned information system
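The advice above on keeping the exact version of reused or purchased data can be sketched in Python. This is a minimal sketch, not a prescribed procedure: the function names, the record fields and the example values are assumptions made for illustration. A checksum stored with the file makes it easy to notice later that the author has uploaded a different version.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_dataset_version(path: Path, source: str, version: str, licence: str) -> dict:
    """Build a small provenance record for a reused or purchased data file.

    The field names are hypothetical; adapt them to your own documentation.
    """
    return {
        "file": path.name,
        "source": source,
        "version": version,
        "licence": licence,
        "retrieved": date.today().isoformat(),
        "sha256": sha256_of(path),
    }
```

The record can be saved next to the data file, for example as a JSON file, together with the vendor documentation.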
How will the data be collected or created
- name the existing standard procedures and methods
- are there any data standards available
- how to ensure data quality (availability, integrity, confidentiality)
- how do you handle errors (input errors, problematic values)
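One possible way to handle input errors and problematic values is to report them instead of silently dropping or correcting them. The sketch below assumes tabular data in CSV form; the function name, the column name and the valid range used in the example are hypothetical.

```python
import csv
import io


def validate_rows(csv_text: str, column: str, low: float, high: float):
    """Split rows into valid rows and (line number, reason) problem reports."""
    valid, problems = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        raw = (row.get(column) or "").strip()
        if raw == "":
            problems.append((line_no, "missing value"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((line_no, f"not a number: {raw!r}"))
            continue
        if not low <= value <= high:
            problems.append((line_no, f"out of range: {value}"))
            continue
        valid.append(row)
    return valid, problems


# Hypothetical example: ages are expected to lie between 0 and 120.
sample = "id,age\n1,34\n2,\n3,abc\n4,250\n5,61\n"
valid, problems = validate_rows(sample, "age", 0, 120)
```

Keeping the problem reports with the data documents which values were questioned and why.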
Data description
- data types (experiment, observation data, survey data, video files, etc.)
- how new data integrates with existing data
- which data deserve long-term preservation
- if some datasets are subject to copyright or intellectual property rights, show that you have permission to use the data
II HOW TO STORE AND SECURE YOUR DATA
Data formats
- point out and explain the data formats you have chosen
- use open formats
- use standard formats
- use machine-readable formats
- find out if the format allows automatic metadata insertion
- check if the repositories support the selected formats
Recommended data formats:
File Formats. Open Data Handbook
File Formats. Data Archiving and Networked Services
Estimate the data volume at the end of the project. The volume affects several aspects:
- preservation
- access
- backup
- data exchange
- hardware and software
- technical support
- expenses
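A rough volume estimate can be worked out with simple arithmetic. The sketch below is only an illustration: the function name and all input numbers are assumptions, and the default of three copies follows the 3-2-1 backup rule described later in this guide.

```python
def estimate_storage_gb(files_per_month: int, avg_file_mb: float,
                        months: int, copies: int = 3) -> dict:
    """Estimate the primary data volume and the total including backup copies."""
    primary_gb = files_per_month * avg_file_mb * months / 1000  # 1 GB = 1000 MB
    return {"primary_gb": primary_gb, "total_gb": primary_gb * copies}


# Hypothetical project: 200 files per month, 50 MB each, for 36 months.
estimate = estimate_storage_gb(200, 50, 36)
# primary_gb is 360.0 and total_gb is 1080.0 for these inputs
```

The total, not the primary volume, is the figure to use when planning backup capacity and expenses.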
Organization of data
- be systematic and consistent
- naming files: simple, logical, without abbreviations or with standard abbreviations (countries, languages, units of measurement, methods)
- abbreviations in one language throughout
- file organization (options: project name, time, place, collector, material type, format, version)
- folder structure should be hierarchical, simple, logical, short
- copying files to multiple locations is not a good practice; store in one location, create shortcuts
- use a version control system (Git)
- use a cloud-based code repository (GitHub)
- metadata (who is responsible for adding metadata)
Article: Data Organization in Spreadsheets
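A consistent naming rule is easier to follow when file names are generated rather than typed by hand. The sketch below shows one possible convention; the order of the elements (project, place, date, material type, version), the date format and the function name are assumptions, not a standard.

```python
import re
from datetime import date


def build_file_name(project: str, place: str, collected: date,
                    material: str, version: int, extension: str) -> str:
    """Compose a file name from fixed elements in a fixed order."""
    def clean(part: str) -> str:
        # Keep ASCII letters and digits only, so names stay portable.
        return re.sub(r"[^A-Za-z0-9]+", "", part)

    parts = [clean(project), clean(place), collected.strftime("%Y%m%d"),
             clean(material), f"v{version:02d}"]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


# Hypothetical example values.
name = build_file_name("SoilSurvey", "Tartu", date(2023, 5, 17), "interview", 2, ".CSV")
# name is "SoilSurvey_Tartu_20230517_interview_v02.csv"
```

Writing the date as year, month, day makes the files sort chronologically in any file browser.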
Data documentation
- Use this guide for data documentation:
Siiri Fuchs, & Mari Elisa Kuusniemi. (2018, December 4). Making a research project understandable - Guide for data documentation (Version 1.2). Zenodo. DOI: http://doi.org/10.5281/zenodo.1914401
- include a README text file with the data files; it should contain as much information as possible about the data files so that others can understand the data. Create one README file for each database and always name it README.txt or README.md (Markdown), not readme, ABOUT, etc.
The README.txt file should contain the following information:
- title of the dataset
- dataset overview (abstract)
- file structure and relationships between files
- methods of data collection
- software and versions used
- standards
- specific information about data (units of measurement, explanations of abbreviations and codes, etc.)
- possibilities and limitations of data reuse
- contact information for the uploader of the dataset
Guidelines for creating a README file
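The list of README contents above can be turned into a reusable skeleton. The following is a minimal sketch: the section titles are taken from the list, while the function names and the "TODO" placeholder are assumptions.

```python
README_SECTIONS = [
    "Title of the dataset",
    "Dataset overview (abstract)",
    "File structure and relationships between files",
    "Methods of data collection",
    "Software and versions used",
    "Standards",
    "Specific information about data (units, abbreviations, codes)",
    "Possibilities and limitations of data reuse",
    "Contact information for the uploader of the dataset",
]


def readme_skeleton(filled: dict) -> str:
    """Return README text with every section; unfilled ones are marked TODO."""
    lines = []
    for section in README_SECTIONS:
        lines.append(section.upper())
        lines.append(filled.get(section, "TODO"))
        lines.append("")
    return "\n".join(lines)


def missing_sections(filled: dict) -> list:
    """List the sections that still have no text."""
    return [s for s in README_SECTIONS if not filled.get(s, "").strip()]
```

Calling missing_sections before depositing the data gives a quick check that no part of the README was forgotten.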
Metadata
- administrative metadata, project details (ID, funder, rights and licences)
- technical metadata (hardware and software, instruments, tools, access rights)
- descriptive metadata (author, title, abstract, subject terms)
- DataCite Metadata Framework (mandatory, recommended, optional metadata) on DataCite Estonia Consortium webpage
- metadata standards indicate which fields should be filled: Directory of Metadata standards
- free online EXIF viewer: shows all hidden metadata of document, audio, video, e-book, spreadsheet and image files
- controlled metadata dictionaries and classifications tell you what to write in these fields, using standard terminology. BARTOC (Basel Register of Thesauri, Ontologies & Classifications)
Examples:
- Estonian Subject Thesaurus
- Agrovoc thesaurus
- Mammal Species of the World
- JACS education subject classifications
- GeoNames
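A metadata record can be checked for completeness before it is submitted. The sketch below assumes the six mandatory properties of the DataCite Metadata Schema 4.x (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType); verify the list against the current schema version. The dictionary layout and the function name are simplifications for illustration and not the official DataCite serialization.

```python
# Mandatory properties, assumed from DataCite Metadata Schema 4.x.
MANDATORY = ["Identifier", "Creator", "Title", "Publisher",
             "PublicationYear", "ResourceType"]


def missing_mandatory(record: dict) -> list:
    """List mandatory properties that are absent or empty in the record."""
    return [key for key in MANDATORY if not record.get(key)]


# Hypothetical record with the publisher left empty.
record = {
    "Identifier": "10.1234/example",  # placeholder DOI, not a real one
    "Creator": ["Example, Author"],
    "Title": "Example survey data",
    "Publisher": "",
    "PublicationYear": 2023,
    "ResourceType": "Dataset",
}
# missing_mandatory(record) returns ["Publisher"]
```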
III ARE YOU PERMITTED TO GIVE ACCESS TO YOUR DATA AFTER THE END OF THE PROJECT? WHO ACCESSES THEM, UNDER WHICH CONDITIONS AND FOR HOW LONG?
Secure storage, backup, transfer and recovery
The goal is to maintain data quality:
- availability and accessibility
- integrity (correctness, completeness and timeliness)
- confidentiality (only available to authorized persons or systems, key management, storage of log files)
Storage:
- cloud environments
- central servers
- sensitive data servers
- hard disk drive
- external hard drive
- mobile devices
Backup: creating a copy of the current state of data and/or programs so that, after a security incident, they can be restored to a known good state
- maintaining and backing up the master file
- the 3-2-1 rule (keep 3 copies of your data on 2 different storage media, with 1 copy off-site)
- who is responsible, especially for mobile devices
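The 3-2-1 rule can be checked mechanically against a list of where the copies are kept. This is a minimal sketch; the class name, its fields and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    location: str   # where the copy is kept, e.g. "office"
    medium: str     # kind of storage, e.g. "external drive" or "cloud"
    off_site: bool  # True if the copy is stored away from the workplace


def satisfies_3_2_1(copies: list) -> bool:
    """True if there are at least 3 copies, on 2 kinds of media, 1 off-site."""
    media = {c.medium for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and any(c.off_site for c in copies)


# Hypothetical backup plan.
plan = [
    DataCopy("laptop", "internal disk", False),
    DataCopy("office", "external drive", False),
    DataCopy("university cloud", "cloud", True),
]
# satisfies_3_2_1(plan) is True; without the cloud copy it is False
```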
Carry out a risk analysis: what if ....
- IT systems are down
- power outages, water and fire accidents
- the device is lost or stolen
- malware is discovered in devices
- a team member leaves or dies, etc.
Risk weighing (probability and losses)
Risk assessment: threats and their likelihood, weaknesses, measures
Information security standard ISO/IEC 27001
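Risk weighing by probability and losses can be illustrated with a simple expected-loss calculation. The sketch below is an assumption about how the weighing might be done, not a method prescribed by ISO/IEC 27001; the function name, the probabilities and the loss figures are invented for the example.

```python
def rank_risks(risks: dict) -> list:
    """Sort risks by expected loss (probability x loss), highest first."""
    scored = [(name, prob * loss) for name, (prob, loss) in risks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical yearly probabilities and losses in euros.
risks = {
    "device lost or stolen": (0.10, 5000),
    "power outage": (0.50, 200),
    "malware": (0.05, 20000),
}
ranking = rank_risks(risks)
# The order is: malware, device lost or stolen, power outage
```

A rare event with a large loss can outrank a frequent event with a small one, which is why both factors are weighed together.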
Access to data, information security
- management of access rights (same for all, contractual rights, temporary labor rights)
- storing log files
- pseudonymization, encryption, key management
- data exchange, personal data, third countries
- organizational and physical security: training of new employees, possible problems with departing staff, internal rules of procedure, fire safety, locking the doors
- who is responsible for information security?
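Pseudonymization with key management can be sketched with a keyed hash. This is one possible technique (HMAC-SHA256 from the Python standard library), not the only one; the function name, the example key and the truncation to 16 characters are assumptions. Whether the result is sufficient for a given dataset depends on the legal requirements that apply to it.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (assumption)


# Hypothetical key; in practice generate it randomly and store it
# separately from the data, with access limited to authorized persons.
key = b"replace-with-a-random-secret"
code = pseudonymize("participant-001@example.org", key)
```

The same identifier always yields the same code under the same key, so records can still be linked, while the original identifier cannot be read from the code without the key.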
Long-term preservation
FAIR Data
- what data has long-term value? Preserving and sharing it for reuse
- preparing data for sharing, FAIR data
- repository selection
How to make data findable (F)
- the data have a persistent identifier (DOI); see DataCite Estonia
- metadata is in the DataCite registry
- use standard metadata such as Dublin Core, or other standards
- machine-readable metadata
- data and relevant metadata are in separate files but linked
- keywords and subject terms
- version management
How to make the data accessible (A)
- choose the repository where the data is stored
- which data is open access, i.e. open data
- which data will remain closed and for what reason
- metadata must be open even when the data is not open (exceptions like rare species location)
- technical metadata: required software (version), instrument specifications, software tools
How to make data interoperable with other computer systems (I)
- mainly the task of the repository
- what data and metadata standards, controlled vocabularies and taxonomies are used
- description of data types: if not standard, how interoperability is ensured
- linking to other data, metadata, and specifications
- data exchange standards
How to ensure data reusability (R)
- partly a task of the repository
- is it raw, cleaned or processed data
- embargo period, grounds
- licenses
- citing: DataCite Citation Formatter
- standard metadata, which (domain) standards are used
- provenance of the data (who, when, what, where, published)
- which software version is used
- how long is the data available for re-use
- data quality assurance (availability, integrity, confidentiality)
- suggestions who might need this data (in README.txt)
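A dataset citation can be assembled from the metadata already collected. The sketch below assumes the commonly used pattern "Creator (PublicationYear). Title. Publisher. Identifier" with an optional version; check the required style with the repository or journal. The function name and the abbreviated author names are assumptions; the example values come from the documentation guide cited earlier in this text.

```python
def format_citation(creators: list, year: int, title: str,
                    publisher: str, doi: str, version: str = "") -> str:
    """Build a dataset citation: Creator (Year). Title (Version). Publisher. DOI."""
    names = "; ".join(creators)
    version_part = f" (Version {version})." if version else "."
    return f"{names} ({year}). {title}{version_part} {publisher}. https://doi.org/{doi}"


citation = format_citation(
    ["Fuchs, S.", "Kuusniemi, M. E."],
    2018,
    "Making a research project understandable - Guide for data documentation",
    "Zenodo",
    "10.5281/zenodo.1914401",
    version="1.2",
)
```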
Data sharing
- is the data shared in a repository, as supplementary data to an article, or as a separate data article in a data journal
- in which repository is the data stored
- who might find this data useful
- how do you share your data (as open data, or available on request)
- when do you share (immediately, after publication of the article, after an embargo period)
- is the data linked to a publication
- link to your ORCID account
Access restrictions
- which data is open access, open data
- which data will remain closed and for what reason
- any encrypted data
- authentication, who gives access rights
- whether you need to create a user account under certain terms
Who will be responsible for data management
- by positions
- principal investigator (PI): Data Management Policy, DMP, contracts, costs, training
- researchers: follow and improve DMP, data management, problem solving
- data manager: training, consulting, information security, backup, hardware and software
- laboratory assistant, support staff: according to their tasks
- by workflow
- who is responsible for data collection, documentation, metadata, data security, etc.
See also: TU Delft RD Policy
Planned costs
- costs are mainly related to manpower, hardware and software
- guides, training, legal and/or DPO consultation, translation services, APC
- data collection: purchase of data, transcription of recorded interviews
- digitization and OCR: hardware and software, manpower
- software development or software purchase, user licenses
- hardware: computers, servers, instruments, field work equipment
- data analysis: hardware and software, outsourced services
- data storage and backup: predictable data volume, rule 3-2-1
- long-term storage of data: preparation for sharing (formatting), anonymisation
- data storage in a repository
- partner meetings, conferences
- project data manager
- as a guideline, consider about 5% of the project budget
Data management plan examples and instructions:
DMP Tuuli Public DMP templates
Digital Curation UK Example DMPs and guidance
Digital Curation UK: Data Management Plans
Public Data Management Plans created with the DMPTool - RIO
Public DMPs: Royal Danish Library / Technical University of Denmark
Public DMPs: DMPTool
Research Data Netherlands: The what, why and how of data management planning
Source: Data Management Plan (DMP), University of Tartu Library
ASK ABOUT RESEARCH DATA MANAGEMENT, STORAGE, DATA MANAGEMENT PLAN AND REPOSITORY SELECTION
Katrin Bobrov
katrin.bobrov@taltech.ee
620 3551