Why is it relevant?

Citizen Observatories generate data that need to be managed so that they can be discovered and accessed, but also preserved and curated. Good data management principles and practices maximise the value and benefit of data by ensuring that they remain robust, useful, up to date, understandable and long-lasting, both for current research purposes and for future uses. This helps ensure that data of different origins and types can be integrated into scientific models and eventually feed applications and decision support tools.

How can this be done?

You can adopt a set of data management principles for your Citizen Observatory, such as those developed and adopted by GEOSS, to enhance discoverability, accessibility, usability, curation and secure preservation of the data. This involves the elaboration of a data management plan – a time-consuming process that forces you to anticipate required practices and to recognise the need to plan for the resources to put the plan into practice. Below you can find a set of data management principles and practices that will help you to manage your data in the most effective and appropriate way.

The following principles are based on the GEOSS data management principles and are presented here in a tailored version for Citizen Observatory data:

Data discoverability: To make data discoverable, metadata describing the data should be prepared and published in a catalogue so that search engines can find them. Metadata should also state how the data should be accessed, used, understood and processed, preferably via formal, structured metadata based on open standards. To avoid losing information and creating confusion, metadata should be produced from the start.

Data access: Data should not be kept in silos but should be accessible via online services, including, at minimum, direct download but preferably user-customisable services for visualisation and computation. Do not wait until your data are perfect. Instead, data should be made available in advance of quality control and flagged in the metadata as unchecked. Afterwards, the quality-controlled data and the results of quality control should also be shared. The conditions for use, including licences, should be decided upon and clearly stated in the metadata that describe the data. Moreover, the use conditions of sensitive information (e.g. the location of endangered species) need to be carefully chosen and indicated.
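As an illustration of the data access principle, the following minimal Python sketch shows one way a dataset's metadata could carry its licence and be flagged as unchecked until quality control has taken place. The field names (quality_status, quality_report, contains_sensitive_locations) and the example URL are assumptions made for this sketch; they are not taken from GEOSS or from any metadata standard.

```python
import json


def make_metadata(title, licence, quality_checked=False, sensitive=False):
    """Build a small metadata record (hypothetical field names)."""
    return {
        "title": title,
        "licence": licence,
        # Data can be shared before quality control, but must say so.
        "quality_status": "checked" if quality_checked else "unchecked",
        "contains_sensitive_locations": sensitive,
    }


def mark_quality_checked(metadata, report_url):
    """Return a copy of the record updated after quality control."""
    updated = dict(metadata)
    updated["quality_status"] = "checked"
    updated["quality_report"] = report_url  # where the QC results are shared
    return updated


if __name__ == "__main__":
    record = make_metadata("Air quality observations", "CC-BY-4.0")
    print(json.dumps(record, indent=2))
    checked = mark_quality_checked(record, "https://example.org/qc-report")
    print(json.dumps(checked, indent=2))
```

In practice these fields would be mapped onto whichever metadata standard your community uses, rather than kept as free-form keys.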

Data format: Data should be distributed using encodings that are widely accepted in the target user community. The use of open standards will lower the access barrier.

The generation of data should be guided by scientists and eventually documented in peer-reviewed scientific publications that describe the origin and processing history of raw observations and derived products, together with their results and outcomes. During this process, persistent, resolvable identifiers should be assigned to the data.

Acknowledgement: Data contributors should receive acknowledgement for the use of their data if they express a desire for that. Personal information should be kept secure and managed in conformance with the GDPR.

Curation: Data should be protected from loss and preserved for future use. The cost of preservation should not be underestimated and needs to be planned ahead. If the data curator cannot continue, transfer procedures should be activated.

Data should be periodically verified to ensure integrity, authenticity and readability. Data should be kept up to date in accordance with reviews, and reprocessed as needed.
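Periodic integrity verification can be sketched with file checksums. The Python example below is a minimal illustration, assuming the data are stored as files in a single folder: it records a SHA-256 digest per file and later reports files that are missing or have changed. The function names are hypothetical, and the sketch covers integrity only, not authenticity or readability.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file name in the folder to its current checksum."""
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def verify(folder: Path, manifest: dict) -> list:
    """Return the names of files that are missing or whose checksum changed."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

The manifest would be created when the data are archived and stored separately from them; running verify on a schedule then gives an early warning of silent corruption or accidental edits.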

Managing Citizen Observatory data should begin by planning the processes and steps for managing data: from the collection of data; the data model used; the tools needed to collect it; the metadata recorded; the means for storing, sharing and accessing it; and the visualisation, reuse and preservation of the data. You can do this by drafting a Data Management Plan (DMP).

A DMP must also take into account a common issue in citizen science projects: the treatment of personal and sensitive information, which in this case can come from the collection of personal data or from the location of people, protected species or private properties. Privacy settings should be flexible, allowing citizens to opt in to programmes that track authorship of the data they collect. Authorship is used in quality control estimations and to acknowledge published contributions.
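The handling of personal and sensitive information described above can be sketched as a publication step. The Python example below is a minimal illustration under stated assumptions: the Observation fields, the share_authorship opt-in flag and the coordinate-rounding rule are all hypothetical choices made for this sketch, not a prescribed method. It removes the observer's name unless they opted in, and coarsens the coordinates of species listed as sensitive.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Observation:
    observer: Optional[str]
    species: str
    lat: float
    lon: float
    share_authorship: bool = False  # hypothetical opt-in flag


def prepare_for_publication(obs, sensitive_species, decimals=1):
    """Return a copy of the observation that is safe to publish."""
    # Keep the observer's name only if they opted in to authorship tracking.
    observer = obs.observer if obs.share_authorship else None
    if obs.species in sensitive_species:
        # Coarsen the location; one decimal place is roughly 11 km of latitude.
        lat, lon = round(obs.lat, decimals), round(obs.lon, decimals)
    else:
        lat, lon = obs.lat, obs.lon
    return replace(obs, observer=observer, lat=lat, lon=lon)
```

The original, unmodified record would still be kept securely for quality control; only the prepared copy is shared. The appropriate level of coarsening depends on the species and the local context, and should be decided in the DMP.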

Some useful tools are available to facilitate the creation of a DMP, both for Citizen Observatory data and for other types of data: for example, DMPTool, OpenAIRE ARGOS, easyDMP or DMPOnline, which also includes many real DMPs as concrete examples. Having a DMP in place will ensure that you think about data management-related issues from the start. This way, you will be prepared and will know the associated budget needs.

Another good practice in data management is the selection and provision of appropriate metadata for describing data (information about the data). Providing adequate metadata both for the individual observations and for the overall data set will simplify sharing operations and allow data repositories to work together. This also helps scientists to understand the data collected and makes the data usable.

The work done in the Citizen Science COST Action CA15212 has led to the definition and evolution of the Data Standard for Public Participation in Scientific Research (PPSR Core), which includes metadata models for describing projects, datasets and observations.

CDM

Example from the GBIF initiative

The Global Biodiversity Information Facility (GBIF, GBIF.org) is a good example of data management at full scale. Its associated services aggregate data from the GBIF network of participants and publishers, many of which are citizen science initiatives. Their data management rules and conventions support thousands of different datasets drawn from hundreds of institutions around the world. All of the descriptions of datasets in GBIF.org rely on metadata, that is, information about the data, expressed in the open-source Ecological Metadata Language (EML) standard. Each Darwin Core Archive includes an EML file as one of its components. Common standards are the main enabler for bringing together the hundreds of millions of primary biodiversity records in the GBIF index.

Useful Resources

PRINCIPLES: The GEOSS Data Management Principles build on the GEOSS Data Sharing Principles in the sense that they outline what is required in terms of data management to allow data to be promptly shared as Open Data.

TOOL: The DMP Tool is a free, open-source, online application that helps researchers create data management plans.

TOOL: DMPOnline: The DMP Roadmap helps you to create, review and share data management plans that meet institutional and funder requirements. It is provided by the Digital Curation Centre (DCC).

TOOL: ARGOS is an open service that simplifies the creation, management, validation, monitoring and maintenance of Data Management Plans.

TOOL: easyDMP is a web service that allows a user to create, share and manage data management plans by guiding the researcher through a set of questions tailored to the recommendations of different funding agencies and research authorities. The resulting document can then be attached to the user's proposal.

SCIENTIFIC PAPER: The paper “Citizen Science 2.0: Data Management Principles to Harness the Power of the Crowd” addresses the challenges of engaging citizen scientists in the context of research projects.

SCIENTIFIC PAPER: The “Study on the Data Management of Citizen Science: From the Data Life Cycle Perspective”, published in Data and Information Management, analyses the lifecycle and data management processes of over 1000 citizen science and Citizen Observatory projects, identifying common themes and best practices.

SCIENTIFIC PAPER: The Advice Note 1 from UKEOF’s series of Data Advice Notes highlights the principles of good data and information management, and suggests policies and procedures for data managers.

STANDARD: The Data Standard for Public Participation in Scientific Research (PPSR Core) is a set of global, transdisciplinary data and metadata standards for use in Public Participation in Scientific Research (Citizen Science) projects. PPSR Core is maintained by the Citizen Science Association (citizenscience.org) working group for Data & Metadata.

This work by parties of the WeObserve consortium is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.