Biodiversity Information Science and Standards : Conference Abstract
Print
Conference Abstract
Darwin Core Spatial Processor (DwCSP): a Fast Biodiversity Occurrences Curator
expand article infoJulien Troudet, Fred Legendre, Régine Vignes-Lebbe
‡ Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, Paris, France
Open Access

Abstract

Primary biodiversity data, or occurrence data, are being produced at an increasing rate and are used in numerous studies (Hampton et al. 2013, La Salle et al. 2016). This data avalanche is a remarkable opportunity but it comes with hurdles. First, available software solutions are rare for very large datasets and those solutions often require significant computer skills (Gaiji et al. 2013), while most biologists are not formally trained in bioinformatics (List et al. 2017). Second, large datasets are heterogeneous because they come from different producers and they can contain erroneous data (Gaiji et al. 2013). Hence, they need to be curated. In this context, we developed a biodiversity occurrence curator designed to quickly handle large amounts of data through a simple interface: the Darwin Core Spatial Processor (DwCSP). DwCSP does not require the installation or use of third-party software and has a simple graphical user interface that requires no computer knowledge. DwCSP allows for the data enrichment of biodiversity occurrences and also ensures data quality through outlier detection. For example, the software can enrich a tabulated occurrence file (Darwin Core for instance) with spatial data from polygon files (e.g., Esri shapefile) or a Rasters file (geotiff). The speed of the enriching procedures is ensured through multithreading and optimized spatial access methods (R-Tree indexes). DwCSP can also detect and tag outliers based on their geographic coordinates or environmental variables. The first type of outlier detection uses a computed distance between the occurrence and its nearest neighbors, whereas the second type uses a Mahalanobis distance (Mahalanobis 1936). One hundred thousand occurrences can be processed by DwCSP in less than 20 minutes and another test on forty million occurrences was completed in a few days on a recent personal computer. DwCSP has an English interface including documentation and will be available as a stand-alone Java Archive (JAR) executable that works on all computers having a Java environment (version 1.8 and onward).

Keywords

biodiversity occurrences, software, curration, spatial data, outliers

Presenting author

Régine Vignes-Lebbe, http://www.infosyslab.fr/?q=fr/equipe/rvl

References