Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Jocelyn Pender (pender.jocelyn@gmail.com)
Received: 27 Jun 2019 | Published: 04 Jul 2019
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation: Pender J (2019) Flora Prepper: Preparing floras for morphological parsing and integration. Biodiversity Information Science and Standards 3: e37743. https://doi.org/10.3897/biss.3.37743
The increased availability of digital floras and the application of optical character recognition (OCR) to digitized texts have resulted in exciting opportunities for flora data mining. For example, the software package CharaParser has been developed for the semantic annotation of morphological descriptions from taxonomic treatments (
Here I present a pilot project, implemented in Python, that applies text mining and natural language processing (NLP) approaches to marking up floras. I will report on the success of the project and summarize lessons learned, especially in relation to previous flora markup projects. Annotation of existing flora documents is an essential step towards building next-generation floras (i.e., mash-ups and enhanced floras as platforms) and enables automated trait extraction. Building an easy-to-use access point to modern text mining and NLP techniques for botanical literature will allow more flexible and responsive flora annotation, and is an important step towards realizing botanical data integration goals.
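The abstract does not say which text mining or NLP techniques the project uses. As one hedged illustration of what a "preparation" step might look like, the sketch below labels flora text blocks so that morphological descriptions could be routed to a parser such as CharaParser. It assumes a from-scratch multinomial Naive Bayes classifier with add-one (Laplace) smoothing; the class `NaiveBayesBlockClassifier`, the function `tokenize`, the three labels, and the toy training blocks are all hypothetical and are not taken from the project itself.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text block and keep only alphabetic word tokens (digits dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesBlockClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words, add-one smoothing."""

    def fit(self, blocks, labels):
        self.class_counts = Counter(labels)          # blocks per label (for priors)
        self.word_counts = defaultdict(Counter)      # token counts per label
        for block, label in zip(blocks, labels):
            self.word_counts[label].update(tokenize(block))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_blocks = len(labels)
        return self

    def predict(self, block):
        """Return the label with the highest log posterior for one text block."""
        tokens = tokenize(block)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.n_blocks)      # log prior
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue                         # skip words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hypothetical training blocks (not real flora data).
training = [
    ("Leaves alternate, blades ovate, margins serrate, petioles 2-5 mm.", "morphology"),
    ("Flowers 5-merous; petals white, obovate; stamens 10.", "morphology"),
    ("Moist woods, stream banks, and shaded slopes; 0-1500 m.", "habitat"),
    ("Dry meadows, rocky slopes, and open forests.", "habitat"),
    ("Rosa nutkana C. Presl, Reliq. Haenk. 2: 58. 1835.", "nomenclature"),
    ("Synonym: Rosa macdougalii Holzinger.", "nomenclature"),
]
blocks, labels = zip(*training)
clf = NaiveBayesBlockClassifier().fit(blocks, labels)

print(clf.predict("Petals pink, obovate; stamens numerous."))  # → morphology
```

A probabilistic bag-of-words classifier is used here only because it is short and needs no external libraries; a real pipeline would need far more labelled blocks, and other classifiers or rule-based markup could serve the same role.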
Keywords: Flora, text mining, natural language processing, Python, annotation
Presenting author: Jocelyn Pender
Presented at: Biodiversity_Next 2019