Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Pieter Huybrechts (pieterhuy@gmail.com)
Received: 17 Aug 2023 | Published: 18 Aug 2023
© 2023 Pieter Huybrechts
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Huybrechts P (2023) Big Data for Beginners. Biodiversity Information Science and Standards 7: e111301. https://doi.org/10.3897/biss.7.111301
With an increasing number of datasets being published and made available through global aggregators such as the Global Biodiversity Information Facility (GBIF), new opportunities have opened up to answer research questions that previously could not be considered. Techniques for large-scale data integration offer benefits for the biodiversity research community (
However, while these hurdles and bottlenecks are very real, several of them have solutions with a low cost of entry. The aim of this presentation is to encourage the community to explore ambitious queries, to combine and examine all available data in its totality, and to break down specific technical barriers by providing a practical overview that helps researchers maximise the power of large-scale data processing in their work.
While big data processing may seem daunting, tools that are accessible to users without a background in big data exist for both local workstations and cloud computing services, for instance Databricks Community Edition or Apache Arrow, and they allow for scalable data processing at low cost. Using these resources, researchers can incorporate larger datasets into existing protocols and, by doing so, uncover patterns and insights that would otherwise be impossible to obtain from smaller subsets of the ever-expanding, complex body of biodiversity occurrence data.
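As a rough illustration of how low the cost of entry can be, the following Python sketch counts occurrence records per species by streaming a tab-delimited file one row at a time, so memory use stays flat however large the file is. It is a minimal sketch under stated assumptions, not the method of the presentation: it uses only the Python standard library as a stand-in for the tools named above, the function name `count_records_per_species` is hypothetical, and the column names (`gbifID`, `species`, `countryCode`) are assumed to follow a GBIF-style occurrence download.

```python
import csv
import io
from collections import Counter


def count_records_per_species(lines, delimiter="\t"):
    """Stream delimited occurrence records and count rows per species.

    `lines` is any iterable of text lines (an open file works), so only one
    row is held in memory at a time, regardless of the size of the input.
    """
    # QUOTE_NONE: treat quote characters as ordinary text (assumed unquoted export).
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        species = row.get("species") or ""
        if species:  # skip records that are not identified to species level
            counts[species] += 1
    return counts


# Tiny in-memory stand-in for a multi-gigabyte occurrence download (made-up rows).
sample = io.StringIO(
    "gbifID\tspecies\tcountryCode\n"
    "1\tVulpes vulpes\tBE\n"
    "2\tVulpes vulpes\tNL\n"
    "3\t\tBE\n"
    "4\tParus major\tBE\n"
)
counts = count_records_per_species(sample)
print(counts.most_common())
```

In practice, `sample` would be replaced by an open file handle, for example `open(path, encoding="utf-8", newline="")`. Columnar tools such as Apache Arrow apply the same idea of processing data in batches rather than loading it all at once, and are typically much faster on large files.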
Keywords: data integration, biodiversity data
Pieter Huybrechts
TDWG 2023