Biodiversity Information Science and Standards : Conference Abstract
Corresponding author: Barnaby E Walker (b.walker@kew.org)
Received: 11 Jun 2019 | Published: 18 Jun 2019
© 2019 Barnaby Walker, Tarciso Leão, Steven Bachman, Eve Lucas, Eimear Nic Lughadha
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Walker B, Leão T, Bachman S, Lucas E, Nic Lughadha E (2019) Addressing Uncertainties in Machine Learning Predictions of Conservation Status. Biodiversity Information Science and Standards 3: e37147. https://doi.org/10.3897/biss.3.37147
Extinction risk assessments are increasingly important to many stakeholders (
The wide range of sources of species occurrence records can lead to data quality issues, such as missing, imprecise, or mistaken information. These issues may be compounded in databases that aggregate information from multiple sources: many such records derive from field observations (78% for plant species in GBIF;
Machine learning models based on species occurrence records have been reported to predict the conservation status of species with high accuracy. However, given the black-box nature of some of the better-performing machine learning models, it is unclear how well these accuracies hold beyond the data on which the models were trained. Practices for training machine learning models differ between studies, and more interrogation of these models is required if we are to know how much to trust their predictions.
To address these problems, we compare predictions made by a machine learning model trained on specimen occurrence records that have undergone either minimal or more thorough cleaning with predictions based on records from an expert-curated database. We then explore different techniques for interrogating machine learning models and quantifying the uncertainty in their predictions.
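The abstract does not name the techniques used to quantify uncertainty. As one illustration of the general idea, the following minimal sketch uses a bootstrap ensemble: a very simple classifier is refit on resampled training data, and the share of refits that label a species as threatened serves as a rough measure of how stable the prediction is. Everything here is an assumption made for illustration: the toy data, the single predictor (log10 of range size), the one-threshold classifier, and the function names. None of it is taken from the authors' method.

```python
import random

# Toy training data, invented for illustration only:
# (log10 of range size in km^2, 1 = threatened, 0 = not threatened)
TRAIN = [
    (1.2, 1), (1.8, 1), (2.1, 1), (2.6, 1), (3.0, 1), (3.6, 1),
    (3.1, 0), (3.9, 0), (4.2, 0), (4.8, 0), (5.1, 0), (5.6, 0),
]


def fit_stump(rows):
    """Return the threshold t with the fewest training errors for the
    rule 'threatened if x < t'. Ties go to the lowest threshold."""
    xs = sorted({x for x, _ in rows})
    midpoints = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    candidates = [xs[0] - 1.0] + midpoints + [xs[-1] + 1.0]

    def errors(t):
        return sum((x < t) != bool(y) for x, y in rows)

    return min(candidates, key=errors)


def bootstrap_thresholds(rows, n_models=200, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(rows, k=len(rows))) for _ in range(n_models)]


def vote_fraction(thresholds, x):
    """Fraction of bootstrap models that predict 'threatened' for x."""
    return sum(x < t for t in thresholds) / len(thresholds)


thresholds = bootstrap_thresholds(TRAIN)
for x in (1.5, 3.35, 5.5):
    share = vote_fraction(thresholds, x)
    print(f"log10 range = {x}: threatened in {share:.0%} of bootstrap models")
```

A vote share near 0% or 100% means the prediction barely changes when the training data are resampled, while a share near 50% flags a species whose predicted status depends heavily on which records happened to be in the training set. The same resampling idea applies to more complex models, at higher computational cost.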
Keywords: IUCN Red List, machine learning, natural history collections, uncertainty, conservation assessment
Presenting author: Barnaby E Walker
Presented at: Biodiversity_Next 2019