Wikidata Ogham Sites — Overview

About this notebook

This notebook is a hands-on introduction to working with knowledge graph data in Python, using Wikidata — the largest openly-licensed general-purpose knowledge graph — as a live data source. It retrieves records about Ogham stones (early medieval inscribed monuments found mainly in Ireland and western Britain) via a SPARQL query and visualises their distribution as bar charts.

It is part of an Open Educational Resource (OER) series on knowledge graphs and linked open data, and is designed to stand on its own: you do not need to have read any other notebook in the series to follow along. A local-Python variant of this notebook is available as wikidata-ogham-sites.ipynb (same content, for use with a regular Jupyter/VS Code setup).

Why this dataset?

Ogham stones are a useful teaching dataset because they are:

  • Well-curated on Wikidata, with typed instances, find-spots, and administrative districts linked by dedicated properties.
  • Small enough to fit comfortably in memory and render in a browser, but large enough to yield meaningful aggregations.
  • Rich in structure: each record participates in several relationships (instance-of, find-spot, county), which makes them a good example for both entity-centric and aggregation queries.

The same pipeline — SPARQL query → DataFrame → visualisation — can be reused for any other knowledge-graph dataset, regardless of domain. Companion notebooks in this series apply it to other endpoints and visualisation types.

Tooling notes

Throughout this notebook we use:

  • pyodide.http.pyfetch to query Wikidata. Libraries such as SPARQLWrapper or requests do not work out of the box in the browser, because they rely on blocking network calls that the browser sandbox does not provide. Pyodide offers pyfetch instead, an async/await wrapper around the browser's fetch API. The local .ipynb variant of this notebook uses SPARQLWrapper, which is more convenient when you are not constrained to the browser.
  • pandas to hold the results in a tabular form. Once data is in a DataFrame, standard data-science tooling (grouping, filtering, plotting) applies — regardless of the original source being a graph.
  • matplotlib for static bar charts. For the map variant of this dataset, see the companion notebook wikidata-ogham-sites-map-live.qmd, which uses Leaflet for interactive geographic visualisation.

Note

On first load, your browser downloads the Python runtime (Pyodide, ~10 MB). Please allow a moment for it to initialise.
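
The browser-side request described in the tooling notes might look roughly like the sketch below. It assumes the public endpoint at https://query.wikidata.org/sparql and a GET request that asks for SPARQL JSON. The helper names build_request_url and run_query are illustrative and not taken from the notebook. run_query only works inside Pyodide, because the pyodide.http module does not exist in a regular Python installation.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint (assumed to be the one the notebook uses).
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL for the endpoint."""
    return f"{ENDPOINT}?{urlencode({'query': query})}"


async def run_query(query: str) -> dict:
    """Fetch SPARQL results as parsed JSON. Only usable inside Pyodide."""
    # Imported lazily so this file can still be loaded outside the browser.
    from pyodide.http import pyfetch

    response = await pyfetch(
        build_request_url(query),
        method="GET",
        headers={"Accept": "application/sparql-results+json"},
    )
    if not response.ok:
        raise RuntimeError(f"SPARQL request failed with HTTP {response.status}")
    return await response.json()
```

In a browser notebook cell, top-level await is available, so the call would be written as `payload = await run_query(query_text)`.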

Step 1 — Defining the SPARQL query

The query below asks Wikidata for every item that is an instance of Ogham stone (wd:Q2016147), together with its find-spot (linked via wdt:P189), the county in which the find-spot lies, and — optionally — its coordinate location (wdt:P625).

Two notes on query design that generalise to other knowledge-graph queries:

  • SERVICE wikibase:label is a Wikidata-specific service that fills in a human-readable label for every item variable that has a ?…Label companion in the SELECT clause. It is considerably more convenient than joining rdfs:label and filtering by language manually, although on very large result sets it can add noticeable cost.
  • OPTIONAL is used for coordinates here because not every stone in Wikidata has them; making them mandatory would drop perfectly valid records from the counts. The map variant of this notebook inverts this choice and makes coordinates mandatory, because a stone without coordinates cannot be drawn on a map. That is the right call there, but not here.

Step 2 — Loading the data

SPARQL results come back as JSON, with the rows stored under results.bindings: a list of dictionaries, one per solution, where each variable name maps to a {"type": ..., "value": ...} object. Variables that are unbound in a solution, such as a missing OPTIONAL coordinate, are simply absent from that dictionary. We flatten these into plain records and build a DataFrame. This shape (flat records with consistent keys) is almost always what you want when you plan to plot or aggregate.
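
The flattening step can be sketched as follows. The function name flatten_bindings and the sample payload are made up for illustration, and the values in it are placeholders rather than real Wikidata records. The overall layout (head.vars plus results.bindings) follows the standard SPARQL JSON results format.

```python
def flatten_bindings(payload: dict) -> list[dict]:
    """Turn SPARQL JSON bindings into flat records with consistent keys."""
    variables = payload["head"]["vars"]
    return [
        # Unbound variables are missing from the row, so fill them with None.
        {var: row[var]["value"] if var in row else None for var in variables}
        for row in payload["results"]["bindings"]
    ]


# Placeholder payload: the second solution has no coordinates.
sample_payload = {
    "head": {"vars": ["stoneLabel", "countyLabel", "coord"]},
    "results": {
        "bindings": [
            {
                "stoneLabel": {"type": "literal", "value": "Example stone A"},
                "countyLabel": {"type": "literal", "value": "County X"},
                "coord": {"type": "literal", "value": "Point(-9.9 52.1)"},
            },
            {
                "stoneLabel": {"type": "literal", "value": "Example stone B"},
                "countyLabel": {"type": "literal", "value": "County Y"},
            },
        ]
    },
}

records = flatten_bindings(sample_payload)
```

Because every record has the same keys, passing the list to `pandas.DataFrame(records)` yields one column per variable, with missing coordinates shown as empty values.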

Tip: Common pitfall (duplicates)

A single Ogham stone can appear in multiple rows, because a stone may be linked to more than one find-spot in Wikidata (e.g. both the original location and a current museum). len(df) therefore counts rows, not stones. When computing aggregates, count distinct stones with nunique() or remove repeated rows with drop_duplicates() where appropriate.
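
A small illustration of the difference, using made-up rows. The column names stone, findspotLabel and countyLabel are assumptions about how the DataFrame is laid out, not the notebook's confirmed schema.

```python
import pandas as pd

# Made-up rows: stone Q_A is linked to two find-spots, so it appears twice.
df = pd.DataFrame(
    {
        "stone": ["Q_A", "Q_A", "Q_B"],
        "findspotLabel": ["Original site", "Museum", "Original site"],
        "countyLabel": ["County X", "County Y", "County X"],
    }
)

row_count = len(df)                  # counts rows, including the repeat
stone_count = df["stone"].nunique()  # counts distinct stones

# Keeps the first row seen for each stone and discards the rest.
one_row_per_stone = df.drop_duplicates(subset="stone")
```

Note that drop_duplicates(subset="stone") keeps an arbitrary find-spot for each stone (whichever row comes first), so it suits stone counts better than find-spot analyses.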

Step 3a — Visualisation: top two find-spots per county

For each Irish county, we identify the two find-spots with the highest number of associated Ogham stones. This highlights concentration patterns — which is often the first thing a domain expert wants to see when exploring a new dataset.
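
One way to compute this ranking is sketched below, on made-up data with assumed column names (stone, findspotLabel, countyLabel). It counts distinct stones per find-spot, sorts within each county, and keeps the first two rows of every county group. The notebook's own cell may use a different approach.

```python
import pandas as pd

# Made-up rows: County X has three find-spots, County Y has one.
df = pd.DataFrame(
    {
        "stone": ["S1", "S2", "S3", "S4", "S5", "S6", "S7"],
        "findspotLabel": [
            "Site A", "Site A", "Site A", "Site B", "Site B", "Site C", "Site D",
        ],
        "countyLabel": ["County X"] * 6 + ["County Y"],
    }
)

counts = (
    df.groupby(["countyLabel", "findspotLabel"])["stone"]
    .nunique()                      # distinct stones per find-spot
    .reset_index(name="stones")
    .sort_values(["countyLabel", "stones"], ascending=[True, False])
)

# head(2) on the grouped frame keeps the first two rows of each county.
top_two = counts.groupby("countyLabel").head(2)
```

The resulting table has one row per county and find-spot, which is a convenient shape to hand to a matplotlib horizontal bar chart.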

Step 3b — Visualisation: distribution by county

A simpler aggregation: how many Ogham-stone records does each county have? This kind of plot is the sanity-check step of almost any knowledge-graph query — if one county dominates the counts implausibly, that often signals a data-modelling quirk rather than a real-world pattern.
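
The aggregation and a simple sanity check could look like the sketch below. The data and column names are again made up for illustration. Comparing row counts with distinct-stone counts is one way to spot a county that is inflated by duplicate rows rather than by real stones.

```python
import pandas as pd

# Made-up rows: stone Q1 has two find-spots in County X, so it appears twice.
df = pd.DataFrame(
    {
        "stone": ["Q1", "Q1", "Q2", "Q3", "Q4", "Q5", "Q6"],
        "findspotLabel": ["A", "B", "A", "A", "C", "D", "E"],
        "countyLabel": ["County X"] * 5 + ["County Y", "County Z"],
    }
)

# Distinct stones per county, largest first.
county_counts = (
    df.groupby("countyLabel")["stone"].nunique().sort_values(ascending=False)
)

# Raw rows per county, for comparison with the distinct counts.
rows_per_county = df["countyLabel"].value_counts()

# Fraction of all per-county stone counts that falls in each county.
share = county_counts / county_counts.sum()
```

A stone with find-spots in two different counties is counted once in each, so the per-county counts can add up to more than the number of distinct stones.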

Step 4 — Exploring the data

The cell below is a free playground. Edit the county_filter value, or write your own aggregations — the DataFrame df is available for the rest of the session.
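
As a starting point, a filter on a single county might look like this sketch. The DataFrame here is a made-up stand-in for the session's df, the column names are assumed, and "County X" is a placeholder for a real county label.

```python
import pandas as pd

# Stand-in for the session's df, with made-up rows.
df = pd.DataFrame(
    {
        "stone": ["S1", "S2", "S3", "S4", "S5", "S6", "S7"],
        "findspotLabel": [
            "Site A", "Site A", "Site A", "Site B", "Site B", "Site C", "Site D",
        ],
        "countyLabel": ["County X"] * 6 + ["County Y"],
    }
)

county_filter = "County X"  # placeholder; replace with a real county label

# Keep only the rows of the chosen county.
subset = df[df["countyLabel"] == county_filter]

# Distinct stones per find-spot within that county, largest first.
stones_per_findspot = (
    subset.groupby("findspotLabel")["stone"].nunique().sort_values(ascending=False)
)
```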


Part of an Open Educational Resource series on knowledge graphs and linked open data, produced in the context of NFDI4Objects.