Biodiversity Information Science and Standards: Conference Abstract
Wikidata and the biodiversity knowledge graph
Roderic Page
University of Glasgow, Glasgow, United Kingdom
Open Access


This talk explores the role Wikidata (Vrandečić and Krötzsch 2014) might play in assembling biodiversity information into a single, richly annotated and cross-linked structure known as the biodiversity knowledge graph (Page 2016). Initially conceived as a language-independent store of facts derived from Wikipedia, Wikidata has morphed into a global knowledge graph, complete with a user-friendly interface for data entry and a powerful implementation of the SPARQL query language. Wikidata already underpins projects such as Gene Wiki (Burgstaller-Muehlbacher et al. 2016) and Scholia (Nielsen et al. 2017). Much of the content of Wikispecies is being automatically added to Wikidata, so many of the entities relevant to biodiversity (such as taxa, taxonomic publications, and taxonomists) are already well represented in Wikidata, making it an even more attractive platform.
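
As a minimal sketch of how the SPARQL interface might be used to retrieve taxonomic publications, the Python snippet below builds a request URL for the public Wikidata Query Service. It assumes the endpoint https://query.wikidata.org/sparql, its format=json parameter, and that the wd: and wdt: prefixes are predefined there. The query selects scholarly articles (Q13442814) whose main subject (P921) is an instance of (P31) taxon (Q16521), together with the taxon name (P225). Taxa typed with other classes are missed, so the query is illustrative rather than exhaustive.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Scholarly articles (wd:Q13442814) whose main subject (wdt:P921) is a
# taxon (wd:Q16521), with the taxon's name (wdt:P225). The wd:/wdt:
# prefixes are assumed to be predefined by the endpoint.
TAXONOMIC_ARTICLES_QUERY = """
SELECT ?article ?taxon ?taxonName WHERE {
  ?article wdt:P31 wd:Q13442814 ;
           wdt:P921 ?taxon .
  ?taxon wdt:P31 wd:Q16521 ;
         wdt:P225 ?taxonName .
}
LIMIT 10
"""


def build_query_url(query: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL asking the endpoint for JSON-formatted results."""
    params = {"query": query, "format": "json"}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(TAXONOMIC_ARTICLES_QUERY))
```

Building the URL separately from fetching it keeps the query inspectable and testable offline; any HTTP client can then retrieve the results.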

Much of the data relevant to biodiversity is scattered across many locations, and collecting and curating it requires considerable manual effort. Appeals to the taxonomic community to undertake these tasks have not always succeeded. For example, the Global Registry of Biodiversity Repositories (GrBio) was an attempt to create a global list of biodiversity repositories, such as natural history museums and herbaria. An appeal by Schindel et al. (2016) for the taxonomic community to curate this list largely fell on deaf ears, and at the time of writing the GrBio project is moribund. Because many repositories are housed in institutions that are the subject of Wikipedia articles, many of these repositories already have entries in Wikidata. Hence, rather than follow GrBio's route of building a resource and then hoping a community will assemble around it, we could go to Wikidata, where a community already exists, and build the resource there. An impressive example of this potential is WikiCite, which initially had the goal of including in Wikidata every article cited in any of the Wikipedias. Taxonomic articles are highly cited in Wikipedia (Nielsen 2007) and so already fall within the remit of WikiCite. Wikidata is therefore a candidate for the “bibliography of life” (King et al. 2011), a database of all taxonomic literature.

Another important role Wikidata can play is to define the boundaries of a biodiversity knowledge graph. Entities such as journals, articles, people, museums, and herbaria are often already in Wikidata, so we can delegate the management of that content to the Wikidata community (bolstered by our own contributions) and focus instead on domain-specific entities such as DNA sequences and specimens, or on domain-specific attributes of entities that are already in Wikidata. This lets us avoid the inevitable “mission creep” that bedevils any attempt to link together information from multiple disciplines.
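
The division of labour described above can be sketched as follows. The sketch assumes a hypothetical project namespace (https://example.org/biodiversity/) and hypothetical predicates such as heldBy and collectedBy. A specimen record keeps only its domain-specific attributes locally, while the institution and collector are referenced by Wikidata entity IRIs, so their names, locations, and other descriptions remain under the Wikidata community's care.

```python
from dataclasses import dataclass

# Wikidata concept-IRI prefix for items (e.g. .../entity/Q42).
WD_ENTITY = "http://www.wikidata.org/entity/"
# Hypothetical namespace for the project's own, domain-specific data.
LOCAL = "https://example.org/biodiversity/"

Triple = tuple[str, str, str]


@dataclass
class SpecimenRecord:
    catalogue_number: str   # domain-specific: managed in our own graph
    institution_qid: str    # generic entity: described in Wikidata
    collector_qid: str      # generic entity: described in Wikidata
    genbank_accession: str  # domain-specific: managed in our own graph


def to_triples(rec: SpecimenRecord) -> list[Triple]:
    """Emit triples; generic entities appear only as links to Wikidata."""
    subject = f"{LOCAL}specimen/{rec.catalogue_number}"
    return [
        # Links out: Wikidata maintains the descriptions of these entities.
        (subject, f"{LOCAL}heldBy", WD_ENTITY + rec.institution_qid),
        (subject, f"{LOCAL}collectedBy", WD_ENTITY + rec.collector_qid),
        # Domain-specific content that stays inside our boundary.
        (subject, f"{LOCAL}catalogueNumber", rec.catalogue_number),
        (subject, f"{LOCAL}hasSequence", f"{LOCAL}sequence/{rec.genbank_accession}"),
    ]


def delegated_entities(triples: list[Triple]) -> list[str]:
    """Return the objects whose management is delegated to Wikidata."""
    return [o for _, _, o in triples if o.startswith(WD_ENTITY)]


if __name__ == "__main__":
    # Placeholder identifiers, not real Wikidata items or accessions.
    record = SpecimenRecord("EX-0001", "Q_MUSEUM", "Q_COLLECTOR", "XX000000")
    for triple in to_triples(record):
        print(triple)
```

The identifiers in the usage example are placeholders. The point of the design is that the local graph never copies a museum's name or address; it stores only the link.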

These ideas are explored using examples based on content entirely within Wikidata (including entities such as publications, authorship, and natural history collections), as well as approaches that combine Wikidata with external knowledge graphs such as Ozymandias (Page 2018).
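
One way to combine Wikidata with an external knowledge graph is to join the two on an identifier that both record. The sketch below assumes, hypothetically, that publications in both graphs carry DOIs (Wikidata records these with property P356); the function names and sample data are illustrative only and do not describe how Ozymandias was built. DOIs are normalised by stripping common resolver prefixes and upper-casing, since DOI names are case-insensitive.

```python
# Resolver prefixes commonly prepended to DOI names.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi: str) -> str:
    """Strip a resolver prefix and upper-case (DOI names are case-insensitive)."""
    doi = doi.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.upper()


def link_by_doi(external: dict[str, str], wikidata: dict[str, str]) -> dict[str, str]:
    """Map external node IRI -> Wikidata QID for publications sharing a DOI.

    external: node IRI in the external graph -> DOI as recorded there
    wikidata: QID -> DOI as recorded in Wikidata (property P356)
    """
    qid_by_doi = {normalise_doi(doi): qid for qid, doi in wikidata.items()}
    links: dict[str, str] = {}
    for node, doi in external.items():
        qid = qid_by_doi.get(normalise_doi(doi))
        if qid is not None:
            links[node] = qid
    return links


if __name__ == "__main__":
    # Hypothetical sample data; the identifiers are placeholders.
    external = {"https://example.org/pub/1": "https://doi.org/10.1234/Example.5678"}
    wikidata = {"Q_PLACEHOLDER": "10.1234/EXAMPLE.5678"}
    print(link_by_doi(external, wikidata))
    # prints {'https://example.org/pub/1': 'Q_PLACEHOLDER'}
```

If two Wikidata items share a DOI, this sketch silently keeps only the last one; a real pipeline would want to flag such duplicates for curation instead.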


biodiversity knowledge graph, linked data, wikidata, wikicite, bibliography of life

Presenting author

Roderic Page