Biodiversity Information Science and Standards:
Conference Abstract
Corresponding author: Michael J Elliott (mielliott@ufl.edu)
Received: 27 Jul 2022 | Published: 01 Aug 2022
© 2022 Michael Elliott, Jorrit Poelen, Jose Fortes
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Elliott MJ, Poelen JH, Fortes JA (2022) Signed Citations: Making citations of digital scientific content persistent. Biodiversity Information Science and Standards 6: e90911. https://doi.org/10.3897/biss.6.90911
Digital data are a foundation of 21st-century science. To maintain a stable foundation, the FAIR Guiding Principles (
We propose signed citations, i.e., customary data citations extended to include a standards-based, secure, unique, and fixed-length digital content signature. A content signature is a code that is unique to the data it identifies and can be reliably recomputed from the data itself. For example, the signature of a dataset could be the SHA-256 hash (
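A minimal Python sketch of a content signature and a signed citation, assuming SHA-256 as the hash function, as in the example above. The `hash://sha256/` prefix, the function names `content_signature` and `sign_citation`, and the layout of the signed citation are illustrative assumptions, not part of a published citation standard.

```python
import hashlib


def content_signature(data: bytes) -> str:
    """Return the SHA-256 content signature of `data` as a hash URI.

    The "hash://sha256/" prefix is an assumed notation; the 64 hex digits
    are the standard SHA-256 digest of the bytes.
    """
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def sign_citation(citation: str, data: bytes) -> str:
    """Append the content signature of `data` to a customary citation.

    This layout (citation text, a space, then the signature) is hypothetical.
    """
    return f"{citation} {content_signature(data)}"


if __name__ == "__main__":
    dataset = b"abc"  # stand-in for the bytes of a cited dataset
    print(sign_citation("Author A (2022) Example dataset.", dataset))
    # Identical bytes always give the same signature ...
    assert content_signature(dataset) == content_signature(b"abc")
    # ... and any change to the bytes gives a different one.
    assert content_signature(dataset) != content_signature(b"abd")
```

Because the signature depends only on the bytes, anyone holding a copy of the data can recompute it and check that it matches the citation, regardless of where the copy came from.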
If a content signature registry is available that links content signatures to one or more (possibly temporary) known content locations, then content signatures can themselves be used to locate the data they identify. That is, registries make content signatures “resolvable” just like URLs and DOIs. Additionally, signed citations are location- and storage-medium-agnostic, so cited data can be copied as many times as necessary to ensure content persistence across current and future storage media and data networks. As a result, content signatures can help to scalably store, locate, access, and independently verify content across new and existing data repositories, search engines, and registries (such as those within services offered by Zenodo, DataONE, and the Software Heritage archive) without baking any time-sensitive information (e.g., URLs or references to specific infrastructures) into the citation.
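The resolve-then-verify step described above can be sketched as follows, assuming an in-memory registry and a caller-supplied `fetch` function. `SignatureRegistry`, `register`, `resolve`, and the `example.org` mirror URLs are hypothetical; a real registry would be a networked service. The sketch shows that a registry only needs to point at candidate locations: each retrieved copy is checked against the signature, so stale or altered copies are simply skipped.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def sha256_signature(data: bytes) -> str:
    """Content signature as a hash URI (the prefix is an assumed notation)."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


class SignatureRegistry:
    """Hypothetical in-memory registry linking each content signature to one
    or more known, possibly temporary, content locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, signature: str, location: str) -> None:
        """Record that a copy of the identified content may be found at `location`."""
        self._locations.setdefault(signature, []).append(location)

    def resolve(
        self, signature: str, fetch: Callable[[str], Optional[bytes]]
    ) -> Optional[bytes]:
        """Return the first retrieved copy whose recomputed signature matches.

        Locations that are unreachable (fetch returns None) or that hold
        altered bytes are skipped; None means no verified copy was found.
        """
        for location in self._locations.get(signature, []):
            data = fetch(location)
            if data is not None and sha256_signature(data) == signature:
                return data
        return None


if __name__ == "__main__":
    cited = b"species,count\nA,2\n"  # stand-in for a cited dataset
    # Simulated mirrors (hypothetical URLs): one is gone, one holds altered
    # bytes, one holds an intact copy.
    mirrors: Dict[str, Optional[bytes]] = {
        "https://mirror-a.example.org/data.csv": None,
        "https://mirror-b.example.org/data.csv": b"species,count\nA,3\n",
        "https://mirror-c.example.org/data.csv": cited,
    }
    registry = SignatureRegistry()
    signature = sha256_signature(cited)
    for url in mirrors:
        registry.register(signature, url)
    print(registry.resolve(signature, mirrors.get) == cited)  # True
```

Since verification happens on the client side, the registry itself does not have to be trusted: it can only fail to find a copy, not substitute a different one without detection.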
Signed citations can also be used to reliably identify complex data networks and knowledge graphs. By embedding content signatures inside content and then citing that content with a signed citation, a secure (unforgeable, irrevocable, self-verifying) link is formed between the cited content and the content identified by the embedded signatures. Such links form secure data graphs that are annotatable and machine-traversable and that support both manual and automated discovery, which is vital to findability according to the FAIR guidelines (
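One way to picture the secure data graphs described above is a content-addressed store in which some content (here, a JSON manifest) embeds the signatures of other content. The JSON shape with a `links` field and the names `put` and `traverse` are assumptions made for illustration; the sketch only shows that starting from a single cited signature, a program can walk and verify everything reachable from it.

```python
import hashlib
import json
from typing import Dict, List, Set


def sha256_signature(data: bytes) -> str:
    """Content signature as a hash URI (the prefix is an assumed notation)."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def put(store: Dict[str, bytes], data: bytes) -> str:
    """Add bytes to a content-addressed store and return their signature."""
    signature = sha256_signature(data)
    store[signature] = data
    return signature


def traverse(store: Dict[str, bytes], root: str) -> List[str]:
    """Walk the graph reachable from `root`, verifying every node.

    Nodes that parse as JSON objects may list signatures of other content
    under a hypothetical "links" key; all other content is treated as a leaf.
    Raises ValueError if any stored bytes do not match their signature.
    """
    visited: Set[str] = set()
    order: List[str] = []
    stack = [root]
    while stack:
        signature = stack.pop()
        if signature in visited:
            continue
        data = store[signature]
        if sha256_signature(data) != signature:
            raise ValueError(f"content does not match {signature}")
        visited.add(signature)
        order.append(signature)
        try:
            node = json.loads(data)
        except ValueError:  # includes JSONDecodeError and UnicodeDecodeError
            continue  # not JSON: a leaf
        if isinstance(node, dict):
            stack.extend(node.get("links", []))
    return order


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    table = put(store, b"species,count\nA,2\n")
    image = put(store, b"<image bytes>")
    manifest = put(store, json.dumps({"links": [table, image]}).encode())
    # Citing only the manifest's signature pins both linked files.
    print(len(traverse(store, manifest)))  # 3
```

Altering a linked file changes its signature, so `traverse` raises an error; altering a link inside the manifest changes the manifest's own signature, so it no longer matches the citation.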
Our proposal originates from our earlier work on reliable dataset identifiers (
Keywords: citation standards, data persistence, verification, provenance
Presenting author: Michael J Elliott
Presented at: TDWG 2022
This work was funded by grants from the National Science Foundation (Michael Elliott and Jose Fortes were funded by DBI 202765, Jorrit Poelen was funded by OAC 1839201) and the AT&T Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the AT&T Foundation.