Biodiversity Information Science and Standards : Conference Abstract
Machine-Actionable Metadata in Practice: Lessons From Automating FAIR Assessment in Plant-Pollinator Datasets
Debora P Drucker‡, Filipi Miranda Soares§, Jorrit Poelen|,¶, José Augusto Salim#
‡ Embrapa, Campinas, Brazil
§ INRAE, MISTEA, Montpellier, France
| UC Santa Barbara, Santa Barbara, CA, United States of America
¶ Ronin Institute, Montclair, NJ, United States of America
# Universidade de São Paulo, São Paulo, Brazil
Open Access

Abstract

Plant-pollinator interactions play a pivotal role in ecosystem functioning and sustainable agriculture. However, plant-pollinator datasets are scattered across various networks and country-specific initiatives and stored in isolated silos, making them difficult for scientists and decision-makers to access. By promoting the adoption of Findable, Accessible, Interoperable, and Reusable (FAIR) data standards (Wilkinson et al. 2016) across multiple initiatives worldwide, we are working to overcome the fragmented nature of these datasets and make data on plant-pollinator interactions widely available.

As the biodiversity community advances towards FAIR data, machine-actionable metadata has emerged as a critical enabler for scalable data assessment, discovery, and reuse. However, while FAIR principles emphasize machine-readability, many datasets are still evaluated manually or lack structured metadata entirely, limiting their integration into global platforms. This study shares practical insights from the WorldFAIR Agricultural Biodiversity Case Study, in which we operationalized machine-actionable FAIR metadata for the review of plant-pollinator interaction datasets (Drucker et al. 2024).

We developed a semi-automated workflow to assist in evaluating datasets against the FAIR principles using tools from the Global Biotic Interactions initiative (GloBI, Poelen et al. 2014). The GloBI bots "Nomer" and "Elton" can read structured metadata from standard vocabularies such as Darwin Core (DwC), Ecological Metadata Language (EML), and the Plant-Pollinator Interactions (PPI) vocabulary. Nomer focuses on taxonomic alignment with several taxonomic catalogues, such as GBIF Backbone and Catalogue of Life. Elton extracts species interactions from datasets of various structures and formats, including DwC-Archives.
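To make the idea of taxonomic alignment concrete, the following is a minimal sketch and not Nomer's actual implementation or interface. It assumes a tiny in-memory catalogue; the identifiers (`EX:1`, `EX:2`), the field names, and the match labels are hypothetical and chosen only for illustration.

```python
# Hypothetical in-memory stand-in for a taxonomic catalogue.
# Real catalogues (e.g., GBIF Backbone, Catalogue of Life) are far larger
# and use their own identifier schemes; the IDs below are invented.
CATALOGUE = {
    "apis mellifera": "EX:1",
    "bombus terrestris": "EX:2",
}


def align_name(name, catalogue=CATALOGUE):
    """Align a provided name with the catalogue using exact matching
    after normalizing case and whitespace."""
    key = " ".join(name.strip().lower().split())
    taxon_id = catalogue.get(key)
    match_type = "EXACT" if taxon_id is not None else "NONE"
    return {"providedName": name, "matchType": match_type, "taxonId": taxon_id}


provided_names = ["Apis  mellifera", "Bombus terrestris ", "Apis melifera"]
results = [align_name(n) for n in provided_names]

for r in results:
    print(r["providedName"], "->", r["matchType"], r["taxonId"])
```

In this sketch the misspelled name "Apis melifera" is left unmatched rather than guessed, so that it can be flagged for human review.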

By relying on machine-readable metadata, the bots were able to flag inconsistencies, suggest improvements, and generate repeatable reports across pilot projects in Argentina, Brazil, the African continent, Kenya, Colombia, East Africa, Central Asia, and the USA (for example, Elton et al. 2025). This helped researchers assess dataset interoperability without needing full access to the raw data, a crucial feature given legal and institutional access constraints. To make the data review report readable for researchers, GloBI's bots produce a document resembling a data publication, complete with a title, authors, publication date, abstract, introduction, and other relevant sections. 
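The kind of check described above can be sketched as follows. This is an illustrative example under stated assumptions, not the code used by GloBI's bots: the required field names, the list of accepted interaction types, and the report wording are all hypothetical.

```python
# Hypothetical set of fields a reviewer might require per interaction record.
REQUIRED_FIELDS = ("sourceTaxonName", "interactionTypeName", "targetTaxonName")

# Hypothetical controlled list of interaction types accepted by the review.
KNOWN_INTERACTION_TYPES = {"pollinates", "visits flowers of"}


def review(records):
    """Return (record_number, message) review notes for a list of records."""
    notes = []
    for number, record in enumerate(records, start=1):
        # Flag required fields that are absent or blank.
        for field in REQUIRED_FIELDS:
            if not (record.get(field) or "").strip():
                notes.append((number, f"missing {field}"))
        # Flag interaction types outside the controlled list.
        interaction_type = (record.get("interactionTypeName") or "").strip()
        if interaction_type and interaction_type not in KNOWN_INTERACTION_TYPES:
            notes.append(
                (number, f"unrecognized interaction type: {interaction_type}")
            )
    return notes


def summarize(records, notes):
    """Build a one-line, repeatable summary of the review."""
    return f"{len(records)} records reviewed, {len(notes)} review notes"


records = [
    {"sourceTaxonName": "Apis mellifera",
     "interactionTypeName": "pollinates",
     "targetTaxonName": "Coffea arabica"},
    {"sourceTaxonName": "Bombus terrestris",
     "interactionTypeName": "likes",
     "targetTaxonName": "Solanum lycopersicum"},
    {"sourceTaxonName": "",
     "interactionTypeName": "visits flowers of",
     "targetTaxonName": "Passiflora edulis"},
]

notes = review(records)
summary = summarize(records, notes)
print(summary)
for number, message in notes:
    print(f"record {number}: {message}")
```

Because the checks depend only on the structured records, rerunning the review on the same input yields the same notes, which is what makes such reports repeatable.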

Our results underscore the transformative role of machine-actionable metadata in biodiversity data governance. Automating FAIR assessments not only increases transparency and repeatability but also accelerates the integration of datasets into platforms like GloBI. While human expertise remains essential, tools like Nomer and Elton demonstrate that FAIR assessment can evolve beyond checklists to become dynamic, scalable, and integrated into the data lifecycle. Resources and code are openly available in the online repositories Zenodo*3 and GitHub*2, and are summarized in the GloBI platform*1.

To help alleviate the burden of manually reviewing data as part of scientific publication review, we propose deploying domain-specific, automated data review processes that enable researchers to better understand how to make their data easier to review and reuse. Recognizing that publishing reusable, integrated data remains mostly a manual process, we recommend that plant-pollinator and species interaction datasets be registered with one or more infrastructures (e.g., GloBI, GBIF) to benefit from the domain-specific data review services they offer. We suggest that data publishers continue to collaborate on building, maintaining, and improving similar infrastructures to assess and increase the quality and FAIRness of published scientific data.

Through the adoption of standards such as Ecological Metadata Language, Darwin Core, the Plant-Pollinator Interactions vocabulary, and the Relation Ontology, we aim to enhance the understanding of how plant-pollinator interactions contribute to sustaining life on Earth while ensuring that data are findable, accessible, interoperable, and reusable (FAIR) for further research and analysis.

Keywords

biotic interactions, interoperability, data review, data publication

Presenting author

Debora P Drucker

Presented at

Living Data 2025

Acknowledgements

WorldFAIR Agricultural Biodiversity Case Study collaborators

Grant title

This work was developed as part of the Global cooperation on FAIR data policy and practice (WorldFAIR), funded by the EC HORIZON-WIDERA-2021-ERA-01-41 Coordination and Support Action under Grant Agreement No. 101058393, and we acknowledge the support provided by the project partners and resources.

Conflicts of interest

The authors have declared that no competing interests exist.

References

Endnotes