Biodiversity Information Science and Standards : Conference Abstract
Supporting Biodiversity Dataset Preparation: An Introduction to the TaiBIF Open Data Toolkit
Melissa Jean-Yi Liu‡,§, Jhu-Jyun Jhang, Daphne Hoh, Chun-I Chang, Mao-Ning Tuanmu
‡ Taiwan Biodiversity Information Facility, Biodiversity Research Centre, Academia Sinica, Taipei, Taiwan
§ Global Biodiversity Information Facility, Taipei, Taiwan
Open Access

Abstract

The Darwin Core standard (DwC; Darwin Core Task Group 2009) and the new data model of the Global Biodiversity Information Facility (GBIF) provide a flexible set of biodiversity data fields that accommodates various thematic datasets, such as taxonomic checklists, sampling events, eDNA metabarcoding, and ecological survey data. However, this flexibility can make it challenging for data providers to get started, and mapping original data fields to DwC terms often leads to frustration. Additionally, while data cleaning is crucial for enhancing data quality (Chapman 2005), it requires significant expertise and effort, which may hinder the mobilization of high-quality data. To address these common pain points in data mobilization, the Taiwan Biodiversity Information Facility (TaiBIF) developed the TaiBIF Open Data Toolkit (ODT) by integrating various thematic dataset templates, DwC terms, the Nansen Legacy Excel Template Generator, Excel data editing interfaces (e.g., filter, cell editing, AutoFill), the GBIF Data Validator, and OpenRefine. We combined the strengths of each tool to create a straightforward workflow that runs from data sheet generation (Figs 1, 2) through data validation (Fig. 3) and data cleaning to dataset packaging (Fig. 4). We hope the toolkit aligns with the needs of data publishers and facilitates a smoother and more user-friendly process of data management and publishing.
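To make the field-mapping and validation steps concrete, the following is a minimal sketch of the general idea and not the toolkit's own implementation. The source column names, the `FIELD_MAP` dictionary, the helper functions, and the sample rows are all hypothetical; the choice of required terms and the latitude range check are assumptions made for illustration only.

```python
# Hypothetical mapping from a provider's column headers to Darwin Core terms.
FIELD_MAP = {
    "record_id": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "record_type": "basisOfRecord",
}

# Terms treated as required in this sketch (an assumption for illustration).
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def to_darwin_core(row):
    """Rename mapped columns to Darwin Core terms; unmapped columns are dropped."""
    return {FIELD_MAP[key]: value.strip() for key, value in row.items() if key in FIELD_MAP}


def find_issues(record):
    """Return a list of human-readable issues for one Darwin Core record."""
    issues = [f"missing {term}" for term in REQUIRED_TERMS if not record.get(term)]
    lat = record.get("decimalLatitude")
    if lat:
        try:
            if not -90 <= float(lat) <= 90:
                issues.append("decimalLatitude out of range")
        except ValueError:
            issues.append("decimalLatitude is not a number")
    return issues


# Made-up sample rows: the second one has a blank name and an invalid latitude.
rows = [
    {"record_id": "occ-1", "species": "Pica serica", "date": "2024-03-01",
     "lat": "25.04", "lon": "121.61", "record_type": "HumanObservation"},
    {"record_id": "occ-2", "species": " ", "date": "2024-03-02",
     "lat": "95.0", "lon": "121.61", "record_type": "HumanObservation"},
]

for row in rows:
    record = to_darwin_core(row)
    print(record["occurrenceID"], find_issues(record) or "ok")
```

Keeping the mapping in a plain dictionary separates the renaming step from the checks, so the same checks can be reused for any source sheet once its headers are mapped.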

Figure 1.

Steps 1 and 2: Choose a dataset type and generate the data template (screenshot from TaiBIF ODT).

Figure 2.

Steps 3 and 4: Choose the core and extensions, then edit the data (screenshot from TaiBIF ODT).

Figure 3.

Step 5: Validate the dataset. Records with issues are shown in red (screenshot from TaiBIF ODT).

Figure 4.

Step 6: Clean and export the dataset. This step provides features such as bulk changes, text filters, and text facets (screenshot from TaiBIF ODT).

Keywords

data template, data cleaning, data validation, biodiversity data tool

Presenting author

Jhu-Jyun Jhang

Presented at

SPNHC-TDWG 2024

Acknowledgements

This project is funded by the Taiwan Ecological Network and the Forestry and Nature Conservation Agency, Ministry of Agriculture, Taiwan.

Conflicts of interest

The authors have declared that no competing interests exist.

References
