Biodiversity Information Science and Standards : Conference Abstract
Corresponding author: Nicky Nicolson (nicky.nicolson@brunel.ac.uk)
Received: 30 Apr 2019 | Published: 13 Jun 2019
© 2019 Nicky Nicolson, Alan Paton, Sarah Phillips, Allan Tucker
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Nicolson N, Paton A, Phillips S, Tucker A (2019) Integrating Collector and Author Roles Across Specimen and Publication Datasets. Biodiversity Information Science and Standards 3: e35866. https://doi.org/10.3897/biss.3.35866
This work builds on the outputs of a collector data-mining exercise applied to GBIF-mobilised herbarium specimen metadata, which uses unsupervised learning (clustering) to identify collectors from the minimal metadata associated with field-collected specimens (the Darwin Core terms recordedBy, eventDate and recordNumber). Here, we outline methods to integrate these data-mined collector entities (a large-scale dataset, aggregated from multiple sources and created programmatically) with a dataset of author entities from the International Plant Names Index (a smaller-scale, single-source dataset, created via editorial management). The integration process asserts a generic "scientist" entity with activities in different stages of the species description process: collecting and name publication. We present techniques to investigate specialisations in content (taxa of study) and in activity stage, examining whether individuals focus on collecting, name publication, or both. Finally, we discuss generalisations of this initially herbarium-focussed data mining and record linkage process to enable applications in a wider context, particularly in zoological datasets.
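As an illustration only, the integration step described above can be sketched as a simple record linkage between collector and author entities. This is a minimal sketch under stated assumptions, not the method used in this work: the `Collector` and `Author` record shapes, the name-key blocking rule (surname plus first initial), and the `max_gap_years` tolerance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
import re
import unicodedata


@dataclass(frozen=True)
class Collector:
    """Hypothetical data-mined collector entity (a cluster of specimen records)."""
    collector_id: str
    name: str        # representative recordedBy string for the cluster
    first_year: int  # earliest eventDate year in the cluster
    last_year: int   # latest eventDate year in the cluster


@dataclass(frozen=True)
class Author:
    """Hypothetical author entity from an editorially managed names index."""
    author_id: str
    name: str
    first_year: int  # earliest name publication year
    last_year: int   # latest name publication year


def name_key(name):
    """Reduce a name to (surname, first initial), ignoring case and accents.

    Accepts 'Surname, Forenames' or 'Forenames Surname' (an assumed convention).
    """
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in ascii_name:
        surname, _, forenames = ascii_name.partition(",")
    else:
        parts = ascii_name.split()
        if not parts:
            return ("", "")
        surname, forenames = parts[-1], " ".join(parts[:-1])
    letters = re.findall(r"[A-Za-z]", forenames)
    return (surname.strip().lower(), letters[0].lower() if letters else "")


def link(collectors, authors, max_gap_years=10):
    """Pair collectors and authors sharing a name key and nearby activity years."""
    authors_by_key = {}
    for author in authors:
        authors_by_key.setdefault(name_key(author.name), []).append(author)
    links = []
    for collector in collectors:
        for author in authors_by_key.get(name_key(collector.name), []):
            # Gap between the two activity ranges; zero when they overlap.
            gap = max(collector.first_year - author.last_year,
                      author.first_year - collector.last_year, 0)
            if gap <= max_gap_years:
                links.append((collector.collector_id, author.author_id))
    return links


def activity_stages(collectors, authors, links):
    """Label each entity by the stage(s) of the description process it appears in."""
    linked_collectors = {c for c, _ in links}
    linked_authors = {a for _, a in links}
    stages = {}
    for collector in collectors:
        both = collector.collector_id in linked_collectors
        stages[collector.collector_id] = "collecting and publication" if both else "collecting only"
    for author in authors:
        both = author.author_id in linked_authors
        stages[author.author_id] = "collecting and publication" if both else "publication only"
    return stages
```

Blocking on a name key keeps the comparison cheap for a large collector dataset, and the date-range check guards against linking namesakes active in different eras. A fuller treatment would presumably also compare the taxa associated with each entity.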
Nicky Nicolson