SPARQLing Archaeology: Knowledge Graphs, Wikidata, and QGIS in Practice
Examples from NFDI4Objects Task Area 2, TRAIL 2.5 and 2.8
A hands-on course introducing archaeologists to querying, analysing, and visualising Linked Open Data with SPARQL, Python, and QGIS, using Wikidata and the NFDI4Objects Knowledge Graph as primary examples.
This Open Educational Resource introduces students to the practical use of archaeological Knowledge Graphs. Through browser-executable Python notebooks, learners query Wikidata and the NFDI4Objects Knowledge Graph with SPARQL, analyse the results, and visualise them on interactive maps. The course also covers fuzzy reasoning over RDF graphs with the Academic Meta Tool (AMT) and the integration of SPARQL queries into QGIS via the SPARQLing Unicorn plugin. No local installation is required — all notebooks run directly in the browser.
Linked Open Data, Wikidata, NFDI4Objects, Knowledge Graph, Jupyter Notebook, QGIS, Vagueness
1 Introduction
How can we find out at which sites in Ireland Ogham stones have been documented? How can the production of Roman Samian ware be mapped across Europe — and how reliable is the attribution of a specific vessel to an individual potter? Which early medieval holy wells cluster around a particular patron saint, and where do the enigmatic Hogback grave-covers of northern Britain actually come from? And where — across the Eurasian continent — have ancient DNA samples been taken, and which laboratory protocols were used on them? Questions of this kind can today be addressed directly to open Knowledge Graphs such as Wikidata or the NFDI4Objects Knowledge Graph. This OER module is a hands-on course introducing archaeologists to querying, analysing, and visualising Linked Open Data with SPARQL, Python, and QGIS, using Wikidata and the NFDI4Objects Knowledge Graph as primary examples. All notebooks run directly in the browser, without any local installation. In addition, fuzzy reasoning with the Academic Meta Tool (AMT) and the integration of SPARQL into QGIS via the SPARQLing Unicorn Plugin are introduced.
Learning Objectives
In this module you will learn (Petersen et al. 2025):
- to understand and articulate the core principles of Linked Data, RDF, and SPARQL;
- to query public Knowledge Graphs (Wikidata, NFDI4Objects KG) with SPARQL and to process the results programmatically in Python;
- to load and analyse local Turtle (TTL) files with rdflib;
- to visualise query results as diagrams and as interactive Leaflet maps;
- to apply fuzzy reasoning over RDF graphs using the Academic Meta Tool (AMT) and to model simple attribution scenarios;
- to integrate SPARQL queries into a GIS workflow via the SPARQLing Unicorn QGIS Plugin.
Prerequisites
For an adequate understanding of this module, the following is assumed:
- basic familiarity with Python (reading and adapting short scripts, working with lists and dictionaries);
- a general understanding of structured data (tables, CSV, JSON);
- a basic grasp of URIs/URLs and of the Web as a data space;
- elementary knowledge of archaeological material culture (Roman Samian ware, early medieval Ogham inscriptions, early medieval religious landscapes and stone monuments, archaeogenetics) — helpful but not essential, as each notebook provides a short contextual introduction.
Learners unfamiliar with the application of Semantic Web technologies to archaeology will find accessible introductions in (Isaksen 2011) and (Thiery 2013), and a comprehensive survey of current practice in (Schmidt et al. 2022).
Relation to the Research Data Lifecycle
This module focuses primarily on the later phases of the research data lifecycle. In the phase enrich and analyse, SPARQL queries and fuzzy reasoning allow existing Knowledge Graphs to be interrogated, aggregated, and combined with local datasets (Tolle et al. 2026). In the phase share, publish, and discover, the module illustrates how openly published Linked Data — both from Wikidata and from domain-specific infrastructures such as the NFDI4Objects KG — becomes discoverable and reusable through standardised interfaces (SPARQL endpoints, Turtle files) (Schmidt et al. 2022, Thiery et al. 2026). Finally, the phase re-use and cite is addressed by demonstrating how persistent identifiers (Wikidata QIDs, Pleiades URIs) support the unambiguous citation of entities across datasets and publications (Thiery, Schubert, et al. 2025).
2 Content
2.1 A Short Primer on Linked Data and Wikidata
Linked Data is a set of principles for publishing structured data on the Web so that it can be interlinked and queried across sources. At its core sits the Resource Description Framework (RDF): every statement is expressed as a triple of the form subject – predicate – object, where each element is identified by an IRI (a Web-scale identifier) or, in the case of objects, a literal value. Collections of such triples form a Knowledge Graph, which can be queried using the SPARQL query language. For a detailed discussion of the uptake of Semantic Web technologies in archaeology, see (Isaksen 2011); an early worked example from the domain of Roman potters’ stamps is given in (Thiery 2013).
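The triple model can be made concrete without any RDF library. The following is a minimal sketch, assuming a handful of invented example IRIs under http://example.org/ (none of them comes from a real dataset): triples are stored as plain tuples, and a small hypothetical helper `match` selects them by pattern, much as a single triple pattern does inside a SPARQL query.

```python
from typing import Iterator, Optional

# A triple is (subject, predicate, object). For brevity, IRIs and literal
# values are both stored as strings here; real RDF keeps them distinct.
Triple = tuple[str, str, str]

# Hypothetical namespace and example resources, invented for illustration.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

graph: set[Triple] = {
    (EX + "stone1", RDF_TYPE, EX + "OghamStone"),
    (EX + "stone1", RDFS_LABEL, "Stone 1"),
    (EX + "stone1", EX + "foundAt", EX + "siteA"),
    (EX + "stone2", RDF_TYPE, EX + "OghamStone"),
    (EX + "stone2", EX + "foundAt", EX + "siteB"),
    (EX + "siteA", RDFS_LABEL, "Site A"),
}


def match(
    triples: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


# Roughly: SELECT ?stone WHERE { ?stone rdf:type ex:OghamStone }
stones = sorted(t[0] for t in match(graph, p=RDF_TYPE, o=EX + "OghamStone"))
print(stones)
```

A real SPARQL engine joins many such patterns and handles datatypes, language tags, and blank nodes; the sketch only shows the basic matching idea.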
Wikidata is the largest openly available general-purpose Knowledge Graph. It is collaboratively edited, multilingual, and exposes a public SPARQL endpoint at https://query.wikidata.org/. For the archaeological and cultural heritage communities, Wikidata serves both as a source of authoritative identifiers — for places, persons, periods, and objects — and as a target for linking domain-specific data. A comprehensive survey of archaeological Linked Open Data practice and its realisation in Wikidata is provided by (Schmidt et al. 2022); concrete interdisciplinary case studies combining Wikidata and a dedicated Wikibase are discussed in (Thiery et al. 2026).
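To give a first impression of programmatic access, the sketch below sends a SPARQL query to the Wikidata Query Service and flattens the standard SPARQL JSON result into a list of dictionaries. It uses only the Python standard library and is not the code used in the course notebooks. The class QID `Q839954` is assumed to stand for "archaeological site" and should be checked on Wikidata before reuse; the `User-Agent` string and the `SAMPLE` response are invented for illustration.

```python
import json
import urllib.parse
import urllib.request

# Machine-readable endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# P31 = instance of, P625 = coordinate location.
# Q839954 is ASSUMED to be "archaeological site"; verify before relying on it.
QUERY = """
SELECT ?item ?itemLabel ?coord WHERE {
  ?item wdt:P31 wd:Q839954 ;
        wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Prepare a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; public endpoints expect a descriptive one.
        "User-Agent": "oer-teaching-sketch/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def flatten(results: dict) -> list[dict[str, str]]:
    """Reduce SPARQL JSON bindings to plain {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict[str, str]]:
    """Send the query over the network and return flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return flatten(json.load(response))


# Invented response in the SPARQL JSON results layout, for offline testing.
SAMPLE = {
    "head": {"vars": ["item", "itemLabel", "coord"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
                "itemLabel": {"type": "literal", "value": "Example site"},
                "coord": {"type": "literal", "value": "Point(-8.5 51.9)"},
            }
        ]
    },
}

print(flatten(SAMPLE))
# rows = run_query(QUERY)  # uncomment to query the live endpoint
```

Keeping `flatten` separate from the network call means that the result handling can be tried out on a saved response before any live query is sent.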
Alongside Wikidata, this module uses the NFDI4Objects Knowledge Graph — a domain-specific Knowledge Graph for material culture and archaeology — as well as small, self-contained TTL files that illustrate how the same techniques apply to local datasets (Thiery, Schenk, et al. 2025).
2.2 How this Module is Organised
The module is divided into three sections, each accessible from the navigation bar above. Every notebook in the first two sections is executable directly in the browser via quarto-live and Pyodide, so no local Python installation is required.
2.3 Jupyter Notebooks: Querying and Analysing Knowledge Graphs
This section contains browser-executable notebooks that demonstrate how to query public SPARQL endpoints, transform the results into Python data structures, and visualise them as diagrams or interactive maps. The notebooks progress from Wikidata, via the NFDI4Objects Knowledge Graph, to a fully self-contained local TTL file — so that you can see how the same patterns apply across different data sources and scales. The Ogham case study is described in detail in (Thiery and Thiery 2023) and (Thiery 2022); the Samian ware case study builds on (Mees and Thiery 2026); the Campanian Ignimbrite example is drawn from (Thiery et al. 2026). Three further case studies extend the scope: two Wikidata WikiProjects on early medieval religious landscapes — HolyWells in Ireland and the UK, and Hogback stones in northern Britain — illustrate how small, community-curated datasets can be analysed along both geographic and semantic axes (dedications, sites, materiality); and the Poseidon archaeogenetic archive, integrated into the NFDI4Objects KG via the ArNO ontology, shows how a natural-science dataset can be queried through the same SPARQL workflow as the cultural-heritage examples.
Each notebook is available in two variants: a browser-executable Quarto-Live document (runs in-page via Pyodide, no installation required) and a classical Jupyter notebook for local execution with the full scientific Python stack. The two variants share the same SPARQL queries, data schema, and visualisations; the only differences are runtime-specific library choices (e.g. pyfetch vs. SPARQLWrapper, custom Leaflet vs. folium). Use the Quarto-Live version for a zero-install first read-through and the Jupyter version when you want to modify queries, reuse the DataFrames in your own analyses, or work offline.
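The split between the two variants can be pictured as a simple runtime check. The following sketch is an assumption about how such a switch could look, not a description of how the notebooks are actually built: Pyodide reports `sys.platform` as `"emscripten"`, so a hypothetical helper `choose_libraries` can pick the library pair named above for each runtime.

```python
import sys


def choose_libraries(platform: str = sys.platform) -> dict[str, str]:
    """Return the HTTP and mapping libraries suited to the given runtime.

    Pyodide (Python compiled to WebAssembly) identifies itself with
    sys.platform == "emscripten"; everything else is treated as a
    local Python installation.
    """
    if platform == "emscripten":
        return {"http": "pyodide.http.pyfetch", "map": "custom Leaflet"}
    return {"http": "SPARQLWrapper", "map": "folium"}


print(choose_libraries())
```

Passing the platform as a parameter keeps the function easy to test outside the browser.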
Available notebooks:
- Wikidata Ogham Site Diagrams (Jupyter) — a first SPARQL query against Wikidata, returning Ogham stone sites and plotting them as a diagram.
- Wikidata Ogham Sites Leaflet Map (Jupyter) — the same data, visualised as an interactive Leaflet map.
- Wikidata Holy Wells Leaflet Map (Jupyter) — mapping holy wells in Ireland and the UK from the Wikidata WikiProject HolyWells, with a complementary hex-binned density grid and a completeness analysis of Commons images and OSM links.
- Wikidata Holy Wells Dedications (Jupyter) — the same holy-wells dataset joined to the patron saints’ gender, split into side-by-side maps and two per-saint maps colour-coded by dedication.
- Wikidata Hogback Stones Leaflet Map (Jupyter) — early medieval hogback grave-covers from northern Britain, recovered from the Wikidata WikiProject Hogback; demonstrates label-driven site extraction and a coarse hex grid that reproduces Williams’ “core area” observation.
- NFDI4Objects KG Ogham Sites Diagrams (Jupyter) — aggregating Ogham sites by county from the NFDI4Objects KG.
- NFDI4Objects KG Ogham Sites Leaflet Map (Jupyter) — mapping NFDI4Objects data interactively.
- NFDI4Objects KG Samian Production Centres (Jupyter) — querying Roman Samian ware production sites.
- NFDI4Objects KG Samian Discovery Sites & Pleiades (Jupyter) — linking discovery sites to the Pleiades gazetteer.
- NFDI4Objects KG Samian Potter Activity (Jupyter) — exploring potter activity in the Samian ware dataset.
- NFDI4Objects KG Poseidon aDNA Sites Map (Jupyter) — archaeogenetic samples from the Poseidon Community Archive integrated via the ArNO ontology; a multi-hop SPARQL query traverses aDNASample → DiscoverySite → Site → Place → Country, rendered as a country-coloured marker map and a log-scaled hex density map.
- NFDI4Objects KG Poseidon Country × Capture Type (Jupyter) — the same dataset cross-tabulated by country and laboratory capture type (1240K, Shotgun, TwistArchaic, HumanOrigins), visualised as a stacked bar chart and a log-scaled heatmap; shows how a single tidy SPARQL result can drive complementary methodological views.
- Local TTL File: Campanian Ignimbrite Sites (Jupyter) — loading a local Turtle file into the browser and querying it with rdflib.
- Local TTL File: SISAL Cave Sites (Jupyter) — mapping 305 SISAL speleothem cave sites from a local Turtle file, with a marker map (archaeological status, UNESCO World Heritage, Wikidata links) and a country-level choropleth built via client-side point-in-polygon against Natural Earth boundaries. Data: Kaushal et al. 2024, SISALv3; RDF conversion from Research-Squirrel-Engineers/GeoScience-FAIRification-LOD.
- Local TTL File: SISAL Corchia Isotopes (Jupyter) — δ¹⁸O and δ¹³C speleothem records from Antro del Corchia (SISAL site 145, Apuan Alps), plotted against calendar age with Marine Isotope Stage bands. Demonstrates SPARQL-over-rdflib with pre-computed Savitzky–Golay smoothed values carried in the RDF graph.
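The idea that a single tidy SPARQL result can feed several views, as in the Country × Capture Type notebook, can be sketched with the standard library alone. The rows below are invented placeholders with one row per sample; they are not Poseidon data, and the variable names `country` and `capture` are assumptions about how such a result might be labelled.

```python
from collections import Counter

# Hypothetical tidy result: one row per sample, as a flattened
# SPARQL SELECT might return it. The values are invented.
rows = [
    {"country": "Germany", "capture": "1240K"},
    {"country": "Germany", "capture": "Shotgun"},
    {"country": "Germany", "capture": "1240K"},
    {"country": "Ireland", "capture": "Shotgun"},
]

# Count every (country, capture type) combination once.
counts = Counter((row["country"], row["capture"]) for row in rows)

# Fixed, sorted axes make the matrix reproducible.
countries = sorted({row["country"] for row in rows})
captures = sorted({row["capture"] for row in rows})

# Rows are countries, columns are capture types; absent pairs count as 0.
matrix = [[counts[(c, k)] for k in captures] for c in countries]

# Plain-text rendering of the cross-tabulation.
print("country".ljust(10) + "".join(k.rjust(9) for k in captures))
for country, line in zip(countries, matrix):
    print(country.ljust(10) + "".join(str(n).rjust(9) for n in line))
```

The same `matrix` could be passed to a stacked bar chart (one bar per row) or to a heatmap (one cell per entry) without running the query again.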
2.4 Jupyter Notebooks: AMT
The Academic Meta Tool (AMT) is a lightweight framework for fuzzy reasoning over RDF graphs. It allows the definition of axioms such as RoleChain, Inverse, and Disjoint, and their evaluation under different fuzzy logics (Łukasiewicz, Product, Gödel). This is particularly useful in archaeology, where classifications, attributions, and identifications are rarely crisp. The modelling rationale behind AMT — and its application to Samian ware within the NFDI4Objects consortium — is discussed in (Tolle et al. 2026).
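The practical difference between the three logics can be seen from their t-norms, the functions that combine two degrees of truth. The sketch below is a simplified illustration and not AMT's own code: it assumes that the degree of a chained relation is the t-norm of the degrees of its links, and the two input degrees (0.8 and 0.7) are invented.

```python
from functools import reduce
from typing import Callable, Sequence

TNorm = Callable[[float, float], float]


def lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def product(a: float, b: float) -> float:
    """Product t-norm: a * b."""
    return a * b


def goedel(a: float, b: float) -> float:
    """Goedel (minimum) t-norm: min(a, b)."""
    return min(a, b)


def chain_degree(degrees: Sequence[float], t_norm: TNorm) -> float:
    """Combine the degrees along a chain of relations with one t-norm.

    ASSUMPTION: a role chain is evaluated by folding the t-norm over
    its links; how AMT does this internally is not shown here.
    """
    return reduce(t_norm, degrees)


# Invented degrees: vessel -> stamp (0.8), stamp -> potter (0.7).
links = [0.8, 0.7]
results = {
    "Lukasiewicz": chain_degree(links, lukasiewicz),
    "Product": chain_degree(links, product),
    "Goedel": chain_degree(links, goedel),
}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the Łukasiewicz logic is the most cautious and the Gödel logic the most permissive, which is why the choice of logic matters when an attribution rests on several uncertain steps.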
The notebook in this section walks through a concrete worked example:
- Academic Meta Tool — Potter Attribution Example — a fuzzy reasoning scenario on attributing Samian ware to a specific potter, using the PotterAttributionExample.ttl dataset. The notebook also demonstrates graph visualisation with pyvis and export to TTL and Cypher.
2.5 SPARQLing Unicorn QGIS Plugin
The SPARQLing Unicorn QGIS Plugin brings SPARQL queries directly into the QGIS desktop environment, so that results from Knowledge Graphs can be geocoded and styled like any other GIS layer. The design rationale and the broader tooling ecosystem of the Research Squirrel Engineers Network are described in (Thiery, Schubert, et al. 2025) and (Thiery, Schenk, et al. 2025). This section collects ready-to-use example queries that complement the browser notebooks above.
- QGIS Plugin: SPARQL query examples — a curated set of queries for use with the plugin.
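What turning query results into a GIS layer involves can be sketched outside QGIS. The example below is not the plugin's implementation; it is a hedged illustration that converts flattened SPARQL rows carrying a WKT point literal (longitude first, as Wikidata returns it) into a GeoJSON FeatureCollection, a format QGIS can load as a vector layer. The row, the key name `coord`, and the helper names are invented.

```python
import json
import re

# Matches simple WKT points such as "Point(-8.5 51.9)" (longitude latitude).
WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_lonlat(wkt: str) -> tuple[float, float]:
    """Extract (longitude, latitude) from a WKT point literal."""
    found = WKT_POINT.match(wkt)
    if found is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    return float(found.group(1)), float(found.group(2))


def rows_to_geojson(rows: list[dict[str, str]], geom_key: str = "coord") -> dict:
    """Build a GeoJSON FeatureCollection; all non-geometry columns
    become feature properties."""
    features = []
    for row in rows:
        lon, lat = wkt_point_to_lonlat(row[geom_key])
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {k: v for k, v in row.items() if k != geom_key},
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Hypothetical flattened query result with one row.
rows = [{"itemLabel": "Example site", "coord": "Point(-8.5 51.9)"}]
collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

Writing the printed text to a file with the extension .geojson gives a layer that can be dragged into a QGIS project and styled like any other point layer.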
2.5.1 Install the SPARQLing Unicorn QGIS Plugin
2.5.2 Query Triplestores
3 Literature
The works cited throughout this module are listed below with a brief note on what each contribution offers. Full bibliographic details are rendered in the automatically generated references section at the end of this document.
- (Isaksen 2011) — A PhD thesis that surveys the early uptake of Semantic Web technologies in archaeology and distinguishes two competing visions: Mixed-Source Knowledge Representation (MSKR) and Linked Open Data (LOD). Recommended as a conceptual entry point for understanding why archaeological Linked Data looks the way it does today.
- (Thiery 2013) — An early German-language case study on applying Linked Data to Roman potters’ stamps; useful as a concrete, small-scale example of modelling archaeological evidence as RDF.
- (Schmidt et al. 2022) — A broad survey of Linked Open Data practice in archaeology, with a particular focus on how Wikidata is used as a hub for archaeological identifiers. The recommended starting point for readers new to the field.
- (Thiery 2022) — A methodological paper on publishing Irish Ogham stones as Linked Open Data, including the SPARQL queries that underpin the Ogham notebooks in this course.
- (Thiery and Thiery 2023) — A follow-up paper to (Thiery 2022) that discusses how heterogeneous Ogham datasets can be interlinked across repositories; directly relevant to the NFDI4Objects Ogham notebooks.
- (Thiery, Schenk, et al. 2025) — Presents the Research Squirrel Engineers Network and its FAIRification tooling for archaeology and the geosciences, including the SPARQLing Unicorn Toolkit; sets the context for the QGIS section of this course.
- (Thiery, Schubert, et al. 2025) — A best-practice case study on applying FAIR4RS principles to computational archaeology, with concrete examples from Ogham stones and Campanian Ignimbrite sites; useful for understanding the reproducibility framework behind this OER.
- (Thiery et al. 2026) — An interdisciplinary paper showing how Wikidata and a dedicated Wikibase can be combined to model archaeological, archaeometric, and volcanological data together; the data source for the Campanian Ignimbrite notebook.
- (Mees and Thiery 2026) — A paper on using Wikidata as a linking hub for Roman Samian ware data across multiple archaeological databases; the conceptual backbone of the Samian ware notebooks.
- Straten, M. thor, Strohm, S., Thiery, F. & Renz, M. (2025). Data-Driven Community Standards for Interdisciplinary Heterogeneous Information Networks. E-Science-Tage 2025. Heidelberg: heiBOOKS. doi:10.11588/heibooks.1652.c23914 — Introduces the hybrid ArNO ontology for archaeo-natural research data developed in NFDI4Objects Task Area 3; the schema-level foundation of the Poseidon notebooks in this course.
- Schmid, C., Ghalichi, A., Lamnidis, T. C., Mudiyanselage, D. B. A., Haak, W. & Schiffels, S. (2024). Poseidon — A framework for archaeogenetic human genotype data management. eLife 13. doi:10.7554/eLife.98317.1 — Describes the Poseidon framework itself; the data source for the Poseidon Community Archive that is loaded into the NFDI4Objects KG as collection/17.
- (Tolle et al. 2026) — Introduces the Academic Meta Tool (AMT) and its fuzzy modelling approach for ambiguous archaeological relations, with Samian ware as a worked example; essential reading for the AMT section of this course.
- (Petersen et al. 2025) — The reference learning-objectives matrix for Research Data Management in Germany (Version 3); used here to anchor the module’s learning objectives within a broader didactic framework.
Further Sources and Information
For readers who wish to explore the topics of this module further:
- Linked Data in archaeology — a foundational discussion is provided by (Isaksen 2011), and a recent survey of community practice by (Schmidt et al. 2022).
- Wikidata and Wikibase for the humanities — interdisciplinary case studies are presented in (Thiery et al. 2026) and, for Samian ware specifically, in (Mees and Thiery 2026).
- Uncertainty and fuzzy modelling — the theoretical grounding for AMT and related approaches is discussed in (Tolle et al. 2026).
- Archaeogenetics and knowledge graphs — the integration of ancient DNA data into a CIDOC-CRM-compatible infrastructure via the hybrid ArNO ontology is documented in Straten, Strohm, Thiery & Renz (2025) doi:10.11588/heibooks.1652.c23914; the underlying Poseidon framework is described in Schmid et al. (2024) doi:10.7554/eLife.98317.1; the RDF generation pipeline is at https://github.com/archaeonatural-cloud/poseidon2lod.
- Research Software Engineering for archaeology — the tooling ecosystem underlying this OER, including the SPARQLing Unicorn Toolkit, is documented in (Thiery, Schubert, et al. 2025) and (Thiery, Schenk, et al. 2025).
- Quarto — the publishing system used for this OER: https://quarto.org/docs/guide/.
- quarto-live — the extension enabling browser-executable Python: https://r-wasm.github.io/quarto-live/.
- NFDI4Objects — the research data infrastructure context of this module: https://www.nfdi4objects.net/.
Re-Use
🔗 Quarto source code of this module
This module uses the OER Template of NFDI4Objects
Disclaimer: Use of LLM
This OER module and its accompanying notebooks were drafted with the assistance of Claude (Anthropic, https://claude.ai/). The AI was used in particular for drafting prose, converting Jupyter notebooks to browser-executable Quarto documents, porting the Academic Meta Tool to Python, and structuring the didactic material. All content, code, and SPARQL queries have been reviewed, executed end-to-end, and validated by the author before publication. Remaining errors are the author’s sole responsibility.
The module is published with a licence but without warranty: it is provided “as is”, without any express or implied guarantee of correctness, completeness, or fitness for a particular purpose. See the licence statement below for details.
Bibliography
License
Authors
Citation
@article{thiery2026,
author = {Thiery, Florian},
publisher = {Research Squirrel Engineers Network},
title = {SPARQLing {Archaeology:} {Knowledge} {Graphs,} {Wikidata,}
and {QGIS} in {Practice}},
journal = {Squirrel Papers},
volume = {8},
number = {6},
pages = {§1},
date = {2026},
url = {https://n4o-rse.github.io/oer-001-sparqling-archaeology/},
doi = {10.5281/zenodo.19650452},
langid = {en},
abstract = {This Open Educational Resource introduces students to the
practical use of archaeological Knowledge Graphs. Through
browser-executable Python notebooks, learners query Wikidata and the
NFDI4Objects Knowledge Graph with SPARQL, analyse the results, and
visualise them on interactive maps. The course also covers fuzzy
reasoning over RDF graphs with the Academic Meta Tool (AMT) and the
integration of SPARQL queries into QGIS via the SPARQLing Unicorn
plugin. No local installation is required — all notebooks run
directly in the browser.}
}



