Biodiversity Information Science and Standards :
Standards
Corresponding author: Lissa Breugelmans (lissa.breugelmans@plantentuinmeise.be)
Academic editor: Gail Kampmeier
Received: 06 Oct 2023 | Accepted: 02 Nov 2023 | Published: 29 Nov 2023
© 2023 Lissa Breugelmans, Maarten Trekels
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Breugelmans L, Trekels M (2023) Implementation Experience Report for the Developing Latimer Core Standard: The DiSSCo Flanders use-case. Biodiversity Information Science and Standards 7: e113766. https://doi.org/10.3897/biss.7.113766
TDWG, collection descriptions, dashboard, natural science collections
Natural science collections are a primary resource for mapping out the world’s biodiversity through the long-term preservation of collected specimens (
To accomplish this goal, it is important to facilitate interoperability between major registries holding information on the collections and institutions, for example, the Global Registry of Scientific Collections (GRSciColl), Index Herbariorum, the registry of the Consortium of European Taxonomic Facilities (CETAF registry) and the Distributed System of Scientific Collections (DiSSCo). The development of the Latimer Core standard is aimed at increasing the FAIRness (Findable, Accessible, Interoperable and Reusable) of data on collections (
This implementation experience report is initiated from the DiSSCo Flanders*
Overview of the DiSSCo Flanders consortium. Participating partner institutions: Flanders Marine Institute (VLIZ), Ghent University (UGhent), Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), University of Antwerp (UAntwerp), Royal Zoological Society of Antwerp (KMDA), Botanic Garden Meise (MeiseBG), Katholieke Universiteit Leuven (KULeuven), Research Institute for Nature and Forest (INBO), Vrije Universiteit Brussel (VUB), The Belgian Association of Botanic Gardens and Arboreta (V.B.T.A.), University of Namur (UNamur), Université Libre de Bruxelles (ULB), Royal Museum for Central Africa (RMCA) and Royal Belgian Institute of Natural Sciences (RBINS). Figure by Frederik Leliaert under CC BY 4.0.
Based on the earlier work of the Natural Collection Descriptions (NCD) group, the Biodiversity Information Standards (TDWG) Collection Descriptions Interest Group is developing the Latimer Core standard (
In order to facilitate the development of the standard, the approach was taken to create a GitHub issue*
During the development phase, it was clear early on in the process that a need existed to implement real-world examples using the standard. Wikibase*
As stated above, the DiSSCo Flanders project aims to obtain high-level information on the natural science collections held by its participating institutions. This information consists of quantitative data on the overall size of the collections, as well as size by taxonomic group, preservation type, stratigraphic age, geographic region and level of digitisation (
The data were extracted from the original survey spreadsheets*
Visualisation of the DiSSCo Flanders data model. Figure by Lissa Breugelmans under CC BY 4.0.
Screenshot of the landing page of the DiSSCo Flanders PowerBI dashboard. Figure by Lissa Breugelmans under CC BY 4.0.
Although the survey was developed with the (preliminary) standard in mind, some characteristics of its design made data extraction, import into the SQL database and analysis more time-consuming than necessary. Data on the collections were gathered at two levels. At the lower level, collections were subdivided by biogeographical origin and the following metrics were recorded: number of objects digitised, number of objects not digitised (documented), number of objects not digitised (not documented) and total number of objects. At the higher level, collections were grouped across all geographic origins and the same metrics were recorded, together with: number of objects with images, number of type specimens and number of specimens per MIDS level (Minimum Information on a Digital Specimen,
In general, most of the data could be relatively easily mapped to the Latimer Core terms*
While implementing the standard for the first time, it was unclear where to map the terrestrial-freshwater-marine origins of the specimens, as well as the geographical concepts that were used to describe units smaller than continents, but larger than countries or regions. In the meantime, however, an additional class, EcologicalContext (properties biomeType and biogeographicRealm), has been added to address this gap.
Several terms were defined as potentially multi-value (JSON array) fields. However, when building the PowerBI dashboard, we could not find a way to join tables through multi-value fields to extract the necessary data; PowerBI queries are constructed through its graphical user interface (GUI), using its own query language. Therefore, we added a single-value field to each referenced table that refers back to the parent table (e.g. a new field ofObjectGroup in the MeasurementOrFact table replaces the hasMeasurementOrFact field in the ObjectGroup table). We are unsure whether the decision to work with multi-value fields was made for specific reasons (performance-related or other), but allowing the relationship field to be placed in the other table might increase flexibility.
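The inversion of a multi-value field into a single-value back-reference can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the project: the row layout, the identifiers (og-1, mof-1, ...), the measurement values and the helper invert_multi_value_links are all hypothetical; only the field names hasMeasurementOrFact and ofObjectGroup are taken from the text. The sketch also assumes that each measurement belongs to exactly one object group.

```python
import json

# Hypothetical ObjectGroup rows: hasMeasurementOrFact holds a JSON array
# of MeasurementOrFact identifiers (a multi-value field).
object_groups = [
    {"id": "og-1", "hasMeasurementOrFact": json.dumps(["mof-1", "mof-2"])},
    {"id": "og-2", "hasMeasurementOrFact": json.dumps(["mof-3"])},
]

# Hypothetical MeasurementOrFact rows with no link back to their parent.
measurements = [
    {"id": "mof-1", "measurementType": "objects digitised", "measurementValue": 300},
    {"id": "mof-2", "measurementType": "total objects", "measurementValue": 1200},
    {"id": "mof-3", "measurementType": "total objects", "measurementValue": 50},
]


def invert_multi_value_links(parents, children):
    """Return copies of the child rows with a single-value ofObjectGroup field.

    Assumes each child is referenced by at most one parent; if several
    parents list the same child, the last one encountered wins.
    """
    parent_of = {}
    for parent in parents:
        for child_id in json.loads(parent["hasMeasurementOrFact"]):
            parent_of[child_id] = parent["id"]
    # Children that no parent references get ofObjectGroup = None.
    return [{**child, "ofObjectGroup": parent_of.get(child["id"])} for child in children]


linked = invert_multi_value_links(object_groups, measurements)
```

With the back-reference in place, the two tables can be related through an ordinary one-to-many join on ObjectGroup.id and MeasurementOrFact.ofObjectGroup, which is the kind of relationship a GUI-driven tool can typically express.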
For the temporalCoverage class, the Latimer Core documentation suggests leaving the property EndDate blank when the collecting period is still ongoing. There is, however, no term defined for the case in which the period is unknown, which might lead to confusion.
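The ambiguity can be made concrete with a short sketch. The function below is hypothetical and not part of Latimer Core; it simply applies the convention that a blank end date means the collecting period is ongoing, and the argument names and example year are assumptions made for illustration.

```python
from typing import Optional


def describe_collecting_period(start_date: Optional[str], end_date: Optional[str]) -> str:
    """Describe a period, reading a blank end date as 'still ongoing'."""
    start = start_date or "unknown"
    if end_date:
        return f"{start} to {end_date}"
    return f"{start} to present (ongoing)"


# A collection that is still being added to: the end date is left blank.
ongoing = describe_collecting_period("1995", None)

# A collection whose end date is simply not known: the record looks identical,
# so a consumer applying the convention cannot tell the two cases apart.
unknown = describe_collecting_period("1995", None)
```

Both calls return the same description, which is exactly the confusion noted above: the data alone do not distinguish an ongoing period from an unknown one.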
Finally, it would be useful to define controlled vocabularies for the classes and properties that are newly defined for the Latimer Core standard, in order to further enhance interoperability of the data. For certain properties, the use of the controlled vocabulary might be recommended but not mandatory, in order to allow for flexibility.
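A recommended but non-mandatory vocabulary could, for instance, be checked as in the following sketch. Everything here is an assumption made for illustration: the vocabulary values are borrowed from the terrestrial-freshwater-marine distinction mentioned earlier and are not an official Latimer Core vocabulary, and check_term is a hypothetical helper.

```python
import warnings

# Hypothetical recommended vocabulary for a property such as biomeType.
RECOMMENDED_BIOME_TYPES = {"terrestrial", "freshwater", "marine"}


def check_term(value: str, vocabulary: set, mandatory: bool = False) -> bool:
    """Check a value against a vocabulary.

    Returns True when the (case-insensitive) value is in the vocabulary.
    Otherwise raises ValueError if the vocabulary is mandatory, or emits a
    warning and returns False if it is only recommended.
    """
    if value.strip().lower() in vocabulary:
        return True
    message = f"'{value}' is not in the vocabulary {sorted(vocabulary)}"
    if mandatory:
        raise ValueError(message)
    warnings.warn(message)
    return False
```

For example, check_term("Marine", RECOMMENDED_BIOME_TYPES) returns True, whereas an unlisted value only triggers a warning unless mandatory=True is passed, which keeps the flexibility described above.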
The DiSSCo Flanders use case surveyed the content of regional Flemish collections. The smaller research collections and living plant collections typically had limited or no online representation of their content, and for many collections even a rough inventory was lacking. The standardised survey ensured that the content of the collections can be compared with each other. It also made it possible to represent the collections graphically through a PowerBI dashboard, which is instrumental in increasing the visibility of the collections for scientists and policy-makers.
Although the survey design proved to be suboptimal with respect to the current version of the Latimer Core standard*
From the DiSSCo Flanders use case, four recommendations can be formulated. First, the suboptimal design of the survey shows that there is a clear need for guidance on performing this kind of exercise. Future surveys in other consortia and institutions would benefit from a design blueprint for the survey. This is, however, an endeavour that should be undertaken at a larger scale and that comes with many problems and pitfalls. Large-scale infrastructures, such as the future DiSSCo infrastructure in Europe or the iDigBio (Integrated Digitised Biocollections) initiative in the United States, have a key role to play in providing tests at a larger scale. The tools and training material created through this effort should be disseminated and maintained by these infrastructures. Second, it is advisable to further develop controlled vocabularies for the newly adopted classes and properties in order to maximise the interoperability of the data. Third, in order to make the data available on a worldwide scale, the Latimer Core standard should be implemented in the main collection registries (e.g. GRSciColl, CETAF registry). Finally, making it easy for institutions to publish a Latimer Core record once, in a single registry, would spare collections the redundant work of filling out and updating their records in several places.
The authors are grateful for the discussions around the implementation with the TDWG Collections Descriptions Interest Group. The authors also sincerely appreciate the time and effort invested by the reviewers, Barbara Thiers and Thomas McElrath, and the technical editor, Gail Kampmeier, whose insightful remarks and thoughtful suggestions significantly improved the quality of the manuscript.
The work presented in this report was funded by the Research Foundation – Flanders (FWO) as part of the Flemish contribution to the DiSSCo Research Infrastructure under grant n° I001721N (DiSSCo Flanders project).
The Wikibase cloud environment was updated over time, going from an experimental set-up to a service provided by Wikimedia Germany. This resulted in updated URLs for the wikibases. Currently, the sandbox wikibase is hosted at https://tdwg-cd.wikibase.cloud/. A more up-to-date version of the standard is implemented at https://latimer-core.wikibase.cloud/
The original survey is located at https://zenodo.org/records/6511351
The populated MySQL database can be found at https://doi.org/10.5281/ZENODO.8214927
This interactive dashboard provides an overview of the nature and size of the collections that each institute houses. It also provides information on the geographic origin of the specimens in the collections and on the degree to which the collections are digitised.
This dashboard will be integrated in the website of DiSSCo Flanders*
The dashboard can be accessed directly through the following link.