Biodiversity Information Science and Standards : Conference Abstract
|
Corresponding author: Joel Sachs (joel.sachs@agr.gc.ca)
Received: 29 Jul 2019 | Published: 08 Aug 2019
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation: Sachs J, Pender J, Lujan-Toro B, Macklin J, Haase P, Malik R (2019) Using Wikidata and Metaphactory to Underpin an Integrated Flora of Canada. Biodiversity Information Science and Standards 3: e38627. https://doi.org/10.3897/biss.3.38627
|
We are using Wikidata and Metaphactory to build an Integrated Flora of Canada (IFC). IFC will be integrated in two senses: First, it will draw on multiple existing flora (e.g. Flora of North America, Flora of Manitoba, etc.) for content. Second, it will be a portal to related resources such as annotations, specimens, literature, and sequence data.
Background
We had success using Semantic Media Wiki (SMW) as the platform for an on-line representation of the Flora of North America (FNA). We used Charaparser (
Migrating to Wikidata/Wikibase
We decided to explore a migration from SMW to Wikibase for three main reasons: simplified workflow; triple level provenance; and sustainability.
Simplified workflow: Our workflow for our FNA-based portal includes Natural Language Processing (NLP) of coarse-grained XML to get the fine-grained XML, transforming this XML for input into SMW, and a custom SMW skin for displaying the data. We consider the coarse-grained XML to be canonical. When it changes (because we find an error, or we improve our NLP), we have to re-run the transformation, and re-load the data, which is time-consuming. Ideally, our presentation would be based on API calls to the data itself, eliminating the need to transform and re-load after every change.
Provenance: Wikidata's provenance model supports having multiple, conflicting assertions for the same character trait, which is something that inevitably happens when floristic data is integrated.
Sustainability: Wikidata has strong support from the Wikimedia Foundation, while SMW is increasingly seen as a legacy system.
Wikibase vs. Wikidata
Wikidata, however, is not a suitable home for the Integrated Flora of Canada. It is built upon a relatively small number of community curated properties, while we have ~4500 properties for the Asteraceae family alone. The model we want to pursue is to use Wikidata for a small group of core properties (e.g. accepted name, parent taxon, etc.), and to use our own instance of Wikibase for the much larger number of specialized morphological properties (e.g. adaxial leaf colour, leaf external texture, etc.) Essentially, we will be running our own Wikidata, over which we would exercise full control.
Metaphactory
Metaphactory is a suite of middleware and front-end interfaces for authoring, managing, and querying knowledge graphs, including mechanisms for faceted search and geospatial visualizations. It is also the software (together with Blazegraph) behind the Wikidata Query Service. Metaphactory provides us with a SPARQL endpoint; a templating mechanism that allows each taxonomic treatment to be rendered via a collection of SPARQL queries; reasoning capabilities (via an underlying graph database) that permit the organization of over 42,000 morphological properties; and a variety of search and discovery tools.
There are a number of ways in which Wikidata and Metaphactory can work together, and we are still exploring questions such as: Will provenance be managed via named graphs, or via the Wikidata snak model?; How will data flow between the two platforms? Etc. We will report on our findings to date, and invite collaboration with related Wikimedia-based projects.
Jocelyn Pender
Biodiversity_Next 2019