Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Steve Kelling (stk2@cornell.edu)
Received: 31 Mar 2018 | Published: 18 May 2018
© 2018 Steve Kelling
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Kelling S (2018) An Integrated Data Quality System for Species Observations. Biodiversity Information Science and Standards 2: e25395. https://doi.org/10.3897/biss.2.25395
Species-level observational data comprise the largest and fastest-growing part of the Global Biodiversity Information Facility (GBIF). The largest single contributor of species observations is eBird, which so far has contributed more than 361 million records to GBIF. eBird engages a vast network of human observers (citizen scientists) to report bird observations, with the goal of estimating the range, abundance, habitat preferences, and trends of bird species at high spatial and temporal resolutions across each species’ entire life cycle. Since its inception, eBird has worked to improve the quality of its observations, primarily in two areas: standardized data collection and data accuracy.
In this presentation I will review how this is done in eBird.
Standardized Data Collection. eBird gathers observations in a way that matches how bird watchers typically observe birds. The unit of data collection is the “checklist”: a list of zero or more species with a count of individuals for each species observed. Participants choose the location where they made their observations and submit their checklists via mobile apps (50% of all submissions) or the website (50% of all submissions). All checklists are submitted in a standard format that identifies where, how, and with whom the observations were made. The mobile apps precisely record the location, the track taken, and the distance traveled while observing. The start time and duration of each survey are also recorded. All observers must state whether they reported every bird they detected and identified, which allows analysts to infer the absence of species that were not reported. All data are stored within an Oracle data management framework.
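To make the absence inference concrete, here is a minimal Python sketch. It assumes a simplified checklist record; the `Checklist` class, its field names, and the `detection_history` function are hypothetical illustrations and do not reflect eBird's actual schema or code. The idea it shows is that a species missing from a checklist counts as a zero only when the observer declared the checklist complete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checklist:
    """Hypothetical, simplified checklist record (not eBird's actual schema)."""
    checklist_id: str
    latitude: float
    longitude: float
    start_time: str        # ISO 8601 string, e.g. "2017-05-01T06:30"
    duration_minutes: int
    distance_km: float
    complete: bool         # True if every detected and identified species was reported
    counts: Dict[str, int] = field(default_factory=dict)  # species -> individuals


def detection_history(checklists: List[Checklist], species: str) -> Dict[str, Optional[int]]:
    """Map each checklist ID to a count for `species`.

    A reported species keeps its count. An unreported species becomes 0 only
    on a complete checklist; on an incomplete one the value is None, because
    absence cannot be inferred.
    """
    history: Dict[str, Optional[int]] = {}
    for checklist in checklists:
        if species in checklist.counts:
            history[checklist.checklist_id] = checklist.counts[species]
        elif checklist.complete:
            history[checklist.checklist_id] = 0
        else:
            history[checklist.checklist_id] = None
    return history
```

In this sketch an incomplete checklist yields `None` rather than `0`, so the two cases stay distinguishable for any later analysis.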
Data Accuracy. The most significant data quality challenge for species observations is detecting organisms and correctly identifying them to species. This involves handling both false positives (misidentifying an observed organism) and false negatives (failing to report a species that was present). The most egregious false positives can be identified as anomalies that fall outside the norm of occurrence for a species at a particular time or place. However, false positives can also be misidentifications of common species. These challenges are addressed by:
In 2017, 4,107,757 observations, representing 4.6% of all eBird records submitted, were flagged for review by the data-driven filters. Of these records, 57.4% were validated and 42.6% were invalidated.
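The abstract does not describe how the data-driven filters work internally. As one possible illustration of flagging records that fall outside the norm of occurrence, the Python sketch below assumes a lookup table of maximum expected counts per species, region, and month. The `Observation` type, the `limits` table, and the `flag_for_review` function are all hypothetical and are not eBird's implementation.

```python
from typing import Dict, List, NamedTuple, Tuple


class Observation(NamedTuple):
    """Hypothetical observation record used only for this sketch."""
    species: str
    region: str
    month: int   # 1-12
    count: int


# (species, region, month) -> largest count accepted without review.
# Assumed structure; real thresholds would be derived from historical data.
MaxCounts = Dict[Tuple[str, str, int], int]


def flag_for_review(observations: List[Observation], limits: MaxCounts) -> List[Observation]:
    """Return the observations whose count exceeds the expected maximum.

    A species with no entry for that region and month gets a limit of 0,
    so any positive count of it is flagged.
    """
    flagged = []
    for obs in observations:
        limit = limits.get((obs.species, obs.region, obs.month), 0)
        if obs.count > limit:
            flagged.append(obs)
    return flagged
```

Flagged records would then go to review, where they are either validated or invalidated. For scale, the rounded percentages above imply roughly 89 million records submitted in 2017 (4,107,757 / 0.046), of which about 2.36 million flagged records were validated and about 1.75 million were invalidated.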
Keywords: data quality, observational data, eBird, citizen science
Presenting author: Steve Kelling
Funding: The National Science Foundation (ABI sustaining: DBI-1356308)