Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Yi-Ming Gan (ymgan@naturalsciences.be)
Received: 25 Oct 2024 | Published: 28 Oct 2024
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Gan Y-M, Benson A, Mayorga E, Pye J, Formel S (2024) What Matters for an occurrenceID and What Is an occurrenceID That Matters? Biodiversity Information Science and Standards 8: e140268. https://doi.org/10.3897/biss.8.140268
In the Darwin Core data standard (
While data providers tend to track data on tangible individual components (e.g., species, location, sample), generating "Occurrence records" typically requires pivoting and/or joining the tables associated with these components. Maintaining a stable and persistent occurrenceID for an Occurrence record created through data transformation is not an easy task. This is especially true for long-term monitoring datasets, where the underlying tables used to generate Occurrence records are continuously updated.
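As a rough illustration of the problem described above, the following minimal sketch joins two component tables into Occurrence records and derives each occurrenceID from the component keys, so that regenerating the records after new rows are added yields the same IDs for existing records. The table layouts, the dataset code `demo`, the `example.org` namespace, and the functions `meaningful_id`, `opaque_id`, and `build_occurrences` are all hypothetical and are not taken from any use case in this abstract.

```python
import uuid

# Hypothetical component tables, roughly as a provider might keep them
# in a spreadsheet or a simple relational database.
samples = {
    "S1": {"station": "ST01", "eventDate": "2023-05-01"},
    "S2": {"station": "ST02", "eventDate": "2023-05-02"},
}
counts = [
    {"sample": "S1", "taxon": "Calanus finmarchicus", "count": 12},
    {"sample": "S2", "taxon": "Calanus finmarchicus", "count": 3},
]

# Hypothetical namespace; a real dataset would choose one and never change it.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def meaningful_id(dataset, sample, taxon):
    """Human-readable ID composed from the component keys."""
    return f"{dataset}:{sample}:{taxon.replace(' ', '_')}"


def opaque_id(dataset, sample, taxon):
    """Opaque but deterministic alternative: a name-based UUID (version 5)."""
    return str(uuid.uuid5(NAMESPACE, meaningful_id(dataset, sample, taxon)))


def build_occurrences(dataset, samples, counts):
    """Join counts to samples and emit one Occurrence record per count row."""
    records = []
    for row in counts:
        sample = samples[row["sample"]]
        records.append({
            "occurrenceID": meaningful_id(dataset, row["sample"], row["taxon"]),
            "locationID": sample["station"],
            "eventDate": sample["eventDate"],
            "scientificName": row["taxon"],
            "individualCount": row["count"],
        })
    return records


occurrences = build_occurrences("demo", samples, counts)
```

Because each ID depends only on the keys of the underlying rows, it does not depend on row order or on when the export is run. The sketch also shows the fragility: if a sample code or a taxon name is later corrected in the source tables, the derived occurrenceID changes with it unless the provider stores the previously issued ID.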
Additionally, most ecological data collectors focus on the primary use of the data, not on its long-term integration and accessibility. The Occurrence concept is required only in data exchange formats; it is not needed in ecological data management practices. The disconnect between the practical data management needs of data collectors and the abstractions required for data exchange raises challenges, particularly given the increasing expectation for globally unique and persistent occurrenceIDs.
This presentation will explore the difficulties that data providers and managers face in creating and managing occurrenceIDs, especially those who manage data using basic systems such as spreadsheets and simple relational databases. Maintaining stable and persistent identifiers for inherently artificial constructs like Occurrences within the original, component-based data structure can pose significant challenges. We will explore why data providers often prefer meaningful identifiers as occurrenceIDs. We will unpack different use cases and examine how and why occurrenceIDs were constructed for each one. Through this discussion, we hope to spark a conversation that informs future data modeling efforts and addresses the inherent artificiality of Occurrences.
biodiversity data standard, Darwin Core, occurrence, data modeling, identifier
Yi-Ming Gan
SPNHC-TDWG 2024
We would like to thank various members of the Standardizing Marine Biological Data community who participated in the discussions around the topic of occurrenceID as outlined in this abstract. We appreciate the input from the community, including the co-authors, as well as Margaret O'Brien and Dean Pentcheff.
This abstract was improved for flow and grammar using ChatGPT and Gemini.