Biodiversity Information Science and Standards
Conference Abstract
Corresponding author: Eric R Sokol (
Received: 24 Aug 2022 | Published: 24 Aug 2022
© 2022 Eric Sokol, Colin Smith, Margaret O'Brien
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Sokol ER, Smith CA, O'Brien M (2022) Leveraging ecocomDP as a Flexible Intermediate Data Pattern to Expose NEON Biodiversity Data in GBIF. Biodiversity Information Science and Standards 6: e93915.
The Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON) have been developing a flexible intermediate data design pattern for ecological community data called “ecocomDP”, which is intended to promote FAIR data principles. Specifically, this effort will enhance the discoverability of and access to biodiversity data from NEON and EDI data holdings, including data from the United States Long Term Ecological Research (USLTER) program (
Workflow to format and submit data to GBIF using ecocomDP as a flexible intermediate data format. All data packages stored in the EDI data repository are assigned digital object identifiers (DOIs), version controlled, and accessible through the EDI data portal. Investigators can (a) submit biodiversity datasets to the EDI repository for publication, which can then be (b) converted to the ecocomDP format using functions in the ecocomDP R library. NEON biodiversity data products can be (c) converted to the ecocomDP format using mappings available in the ecocomDP R library, and we are developing a process to submit the converted data to the EDI data repository. The ecocomDP data packages can then be (d) converted to Darwin Core Archives (DwC-A) that are stored in the EDI repository, which are then (e) submitted to GBIF as sampling event datasets.
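As a rough illustration of step (d), the sketch below reshapes a few long-format observation records into a Darwin Core event table and a linked occurrence table. It is not the ecocomDP R library's implementation. The input column names (`observation_id`, `event_id`, `location_id`, `datetime`, `taxon_id`, `value`, `unit`), the sample records, and the function name `to_dwca_tables` are assumptions made for illustration. Only the output field names are standard Darwin Core terms.

```python
# Hypothetical taxon lookup (taxon_id -> scientific name); illustrative only.
taxa = {
    "t1": "Aedes vexans",
    "t2": "Culex pipiens",
}

# Hypothetical long-format observation records, one row per taxon per event.
observations = [
    {"observation_id": "o1", "event_id": "e1", "location_id": "SITE_A",
     "datetime": "2019-06-01", "taxon_id": "t1", "value": 3, "unit": "individuals"},
    {"observation_id": "o2", "event_id": "e1", "location_id": "SITE_A",
     "datetime": "2019-06-01", "taxon_id": "t2", "value": 12, "unit": "individuals"},
    {"observation_id": "o3", "event_id": "e2", "location_id": "SITE_B",
     "datetime": "2019-06-02", "taxon_id": "t1", "value": 1, "unit": "individuals"},
]


def to_dwca_tables(rows, taxon_names):
    """Split observation rows into an event core and an occurrence extension."""
    events = {}        # one record per unique sampling event
    occurrences = []   # one record per observation, linked by eventID
    for row in rows:
        # Keep the first record seen for each event; later rows reuse it.
        events.setdefault(row["event_id"], {
            "eventID": row["event_id"],
            "eventDate": row["datetime"],
            "locationID": row["location_id"],
        })
        occurrences.append({
            "occurrenceID": row["observation_id"],
            "eventID": row["event_id"],
            "scientificName": taxon_names[row["taxon_id"]],
            "organismQuantity": row["value"],
            "organismQuantityType": row["unit"],
            "basisOfRecord": "HumanObservation",
        })
    return list(events.values()), occurrences


events, occurrences = to_dwca_tables(observations, taxa)
print(len(events), "events;", len(occurrences), "occurrences")
```

Each sampling event appears once in the core table, and every occurrence points back to it through `eventID`. This event-plus-occurrence layout is what a sampling event dataset relies on.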
EDI now has more than 70 data packages reformatted to the ecocomDP model, and has nearly finished developing a conversion of that intermediate format to a Darwin Core Archive (DwC-A, event core) format (
Taxonomic group | NEON data product ID | DOI for 2022 data release |
Breeding land birds | DP1.10003.001 | |
Ground beetles | DP1.10022.001 | |
Herptile bycatch from ground beetle sampling | DP1.10022.001 | |
Small mammals | DP1.10072.001 | |
Mosquitoes | DP1.10043.001 | |
Terrestrial plants | DP1.10058.001 | |
Ticks | DP1.10093.001 | |
Tick pathogens | DP1.10092.001 | |
Fishes | DP1.20107.001 | |
Macroinvertebrates | DP1.20120.001 | |
Microalgae | DP1.20166.001 | |
Zooplankton | DP1.20219.001 | |
The overall goal of this effort is to provide an automated, modular workflow with complete provenance to submit NEON and EDI datasets to GBIF, built so that datasets can be updated as new samples are collected and the data are published. Such a submission pipeline will provide a standardized process to expose biodiversity data in GBIF from two continental-scale networks: NEON and the U.S. National Science Foundation's Long Term Ecological Research network. Further, the modularity of the workflow will allow independent researchers to adapt the tools developed in this effort to their own data archiving and publishing needs.
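One way to picture a modular workflow that carries provenance forward is a chain of independent conversion steps, each recording the identifier of the package it was derived from. The sketch below is only a conceptual illustration under that assumption. The `Package` class, the `make_step` and `run_pipeline` helpers, and the identifier `source.1.1` are hypothetical and do not correspond to EDI, NEON, or GBIF software.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Package:
    package_id: str                      # hypothetical identifier
    data_format: str                     # e.g. "raw", "ecocomDP", "DwC-A"
    derived_from: Tuple[str, ...] = ()   # identifiers of all upstream packages


Step = Callable[[Package], Package]


def make_step(new_format: str) -> Step:
    """Build a conversion step that records its input package as provenance."""
    def step(pkg: Package) -> Package:
        return Package(
            package_id=f"{pkg.package_id}:{new_format}",
            data_format=new_format,
            derived_from=pkg.derived_from + (pkg.package_id,),
        )
    return step


def run_pipeline(pkg: Package, steps: List[Step]) -> Package:
    """Apply each step in order; steps can be swapped or reused independently."""
    for step in steps:
        pkg = step(pkg)
    return pkg


source = Package("source.1.1", "raw")
result = run_pipeline(source, [make_step("ecocomDP"), make_step("DwC-A")])
print(result.data_format, result.derived_from)
```

When a source package is revised (for example, a hypothetical `source.1.2`), rerunning the same steps yields a new derived package whose `derived_from` chain points at the new revision. A researcher with a different target format can replace the final step without touching the earlier ones.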
Keywords: National Ecological Observatory Network, US Long Term Ecological Research program, USLTER, FAIR data
Presenting author: Eric R. Sokol
Presented at: TDWG 2022
The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle. This material is based in part upon work supported by the National Science Foundation through the NEON Program.