Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Roselyn Gabud (rsgabud@up.edu.ph)
Received: 10 Sep 2023 | Published: 11 Sep 2023
© 2023 Roselyn Gabud, Nelson Pampolina, Vladimir Mariano, Riza Batista-Navarro
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Gabud R, Pampolina N, Mariano V, Batista-Navarro R (2023) Extracting Reproductive Condition and Habitat Information from Text Using a Transformer-based Information Extraction Pipeline. Biodiversity Information Science and Standards 7: e112505. https://doi.org/10.3897/biss.7.112505
Understanding the biology that underpins the natural regeneration of plant species, in order to plan effective reforestation, is a complex task. It can be aided by access to databases that contain long-term and wide-scale geographical information on species distribution, habitat, and reproduction. Although there exist widely used biodiversity databases that contain structured information on species and their occurrences, such as the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA), the bulk of knowledge about biodiversity remains embedded in textual documents. Unstructured information can be made more accessible and useful for large-scale studies if tools and services automatically extract meaningful information from text and store it in structured formats, e.g., open biodiversity databases, ready to be consumed for analysis (
We aim to enrich biodiversity occurrence databases with information on species reproductive condition and habitat, derived from text. In previous work, we developed unsupervised approaches to extract habitats and their associated locations, and reproductive conditions and their associated temporal expressions (
In this work, we implement an information extraction (IE) pipeline composed of a named entity recognition (NER) tool and our hybrid relation extraction (RE) tool. The NER tool is a transformer-based language model that was pretrained on scientific text and then fine-tuned using COPIOUS (Conserving Philippine Biodiversity by Understanding big data;
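The two-stage structure described above (NER first, then RE over the recognised entities) can be illustrated with a minimal sketch. It is not the authors' implementation: a small hand-written lexicon stands in for the fine-tuned transformer NER model, and a same-sentence co-occurrence rule stands in for the hybrid RE tool. The label names, relation names, lexicon entries, function names, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the document
    label: str  # hypothetical entity type
    start: int  # character offset where the mention begins
    end: int    # character offset where the mention ends


# Assumption: a toy lexicon replaces the transformer-based NER model.
LEXICON = {
    "HABITAT": ["lowland forest", "mangrove"],
    "LOCATION": ["Example Island"],
    "REPRODUCTIVE_CONDITION": ["flowering", "fruiting"],
    "TEMPORAL": ["March", "April"],
}

# Assumption: only these (head, tail) type pairs form a relation.
RELATION_TYPES = {
    ("HABITAT", "LOCATION"): "habitat_at_location",
    ("REPRODUCTIVE_CONDITION", "TEMPORAL"): "condition_at_time",
}


def recognise_entities(text):
    """Stage 1 (NER stand-in): case-insensitive whole-word lexicon lookup."""
    entities = []
    for label, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                entities.append(Entity(m.group(0), label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def extract_relations(text, entities):
    """Stage 2 (RE stand-in): link compatible entities in the same sentence."""
    relations = []
    # Naive sentence splitter: a run of non-terminators plus one terminator.
    for sent in re.finditer(r"[^.!?]+[.!?]?", text):
        in_sent = [e for e in entities if sent.start() <= e.start < sent.end()]
        for head, tail in product(in_sent, repeat=2):
            name = RELATION_TYPES.get((head.label, tail.label))
            if name:
                relations.append((name, head.text, tail.text))
    return relations


def run_pipeline(text):
    """Chain the two stages: entities first, then relations between them."""
    return extract_relations(text, recognise_entities(text))


if __name__ == "__main__":
    # Invented sentence, not taken from any corpus.
    sample = ("Species A grows in lowland forest on Example Island. "
              "Flowering was observed in March.")
    for relation in run_pipeline(sample):
        print(relation)
```

Each resulting tuple pairs a habitat with a location, or a reproductive condition with a temporal expression, which is the kind of structured output that could then be attached to an occurrence record. In a real pipeline the lexicon lookup would be replaced by model predictions, and the co-occurrence rule by a more discriminating relation extractor.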
relation extraction, biodiversity
Roselyn Gabud
TDWG 2023