Samian ware: discovery sites and their Pleiades links

This browser-executable notebook queries the NFDI4Objects Knowledge Graph for Samian-ware discovery sites — places where terra sigillata has been found — and checks which of them are linked to Pleiades, the gazetteer of ancient places. The result is visualised as a two-layer Leaflet map that makes the coverage of Pleiades identifiers geographically visible.

Note

On first load, your browser downloads the Python runtime (Pyodide, ~10 MB). Please allow a moment for it to initialise.

Warning

The NFDI4Objects Knowledge Graph is a research prototype. If this notebook fails to load data with a network error, the endpoint may be temporarily unreachable or may not allow cross-origin browser requests from this page’s domain. The local .ipynb companion is not subject to browser cross-origin restrictions, so it is the more reliable fallback, although it still needs the endpoint itself to be reachable.

About this notebook

Samian ware travelled widely. Finds have been recorded from Britain to North Africa and from the Atlantic coast to the Danube limes. The NFDI4Objects Knowledge Graph represents each find location as a lado:DiscoverySite with a GeoSPARQL geometry. Some — but not all — discovery sites additionally carry a lado:pleiadesID link into the Pleiades gazetteer, giving a stable external identifier for the ancient place.

This notebook fetches all discovery sites together with their (optional) Pleiades link, summarises the coverage as a bar chart, and plots the sites on an interactive map that separates Pleiades-annotated from unannotated finds into two toggleable layers. A companion local notebook, n4okg-samian-discovery-sites.ipynb, runs the same pipeline against the full scientific Python stack (SPARQLWrapper, folium) for readers who prefer a Jupyter environment.

Why this dataset?

The Pleiades-coverage question is a good example of a broader theme in linked open data: optional properties are where the heterogeneity lives. A DiscoverySite with a Pleiades link participates in the wider ancient-world LOD ecosystem; one without is effectively an island. Making that distinction geographically visible — rather than hiding it behind a single percentage figure — turns a provenance question into a research question about which regions are well connected.

What you’ll learn

  • how OPTIONAL in SPARQL behaves in the result set (missing bindings, not null columns)
  • how to compute and visualise LOD coverage as a categorical map
  • how to attach external-gazetteer links directly into marker popups

Data-context notes

  • one row per discovery site; the pleiadesID column is empty when the site has no Pleiades link
  • pleiadesID is filtered to IRIs (isIRI) so that accidental string values are excluded
  • coordinates come from geosparql:hasGeometry / geosparql:asWKT as usual, with WKT in POINT(lon lat) form
  • coordinate precision varies: some sites have exact locations, others are geocoded to a modern settlement or province centroid, which means stacked markers at e.g. a city centre are expected and not a bug
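The POINT(lon lat) ordering noted above is easy to get backwards, because Leaflet expects (lat, lon). A minimal sketch of a parser follows, assuming plain 2D point literals that may be preceded by a CRS IRI in angle brackets, as GeoSPARQL permits. The function name parse_wkt_point is hypothetical and not taken from the notebook.

```python
import re

# One signed decimal number, optionally with an exponent.
_NUM = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"

# Matches "POINT(lon lat)" with an optional leading "<crs-iri>" prefix.
_POINT_RE = re.compile(
    r"^\s*(?:<[^>]*>\s*)?POINT\s*\(\s*" + _NUM + r"\s+" + _NUM + r"\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (lat, lon) for a WKT point literal, or None if it does not parse."""
    match = _POINT_RE.match(wkt or "")
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    # WKT stores longitude first; swap to the (lat, lon) order Leaflet uses.
    return lat, lon
```

Returning None instead of raising keeps a single malformed geometry from aborting the whole load; such rows can be counted and dropped afterwards.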

Tooling notes

In the browser, SPARQL access goes through pyodide.http.pyfetch, since SPARQLWrapper is not available in Pyodide. Mapping uses a hand-rolled Leaflet block returned via _repr_html_; with 3 900-plus markers this is much lighter than folium, which would inline the entire marker payload into a generated HTML page.
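To illustrate the _repr_html_ approach, here is a rough sketch of a two-layer Leaflet wrapper. It is not the notebook's actual implementation: the class name TwoLayerMap, the colours, the initial view, and the choice to wrap the page in an iframe are all assumptions made for this example. Leaflet is loaded from the unpkg CDN and tiles from the standard OpenStreetMap tile server.

```python
import html
import json


class TwoLayerMap:
    """Hypothetical helper: renders two toggleable Leaflet marker layers."""

    def __init__(self, linked, unlinked):
        # Each entry is a (lat, lon, popup_html) tuple.
        self.linked = list(linked)
        self.unlinked = list(unlinked)

    def _page(self):
        # The markers travel as one compact JSON literal inside the page.
        data = json.dumps({"linked": self.linked, "unlinked": self.unlinked})
        data = data.replace("</", "<\\/")  # avoid closing the script tag early
        return f"""<!DOCTYPE html>
<html><head>
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
<script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
</head><body style="margin:0">
<div id="map" style="height:480px"></div>
<script>
const data = {data};
const map = L.map("map").setView([47, 8], 4);
L.tileLayer("https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png",
            {{attribution: "&copy; OpenStreetMap contributors"}}).addTo(map);
function layer(points, colour) {{
  return L.layerGroup(points.map(p =>
    L.circleMarker([p[0], p[1]], {{radius: 4, color: colour}}).bindPopup(p[2])));
}}
const withLink = layer(data.linked, "#1b7837").addTo(map);
const without = layer(data.unlinked, "#999999").addTo(map);
L.control.layers(null, {{"With Pleiades link": withLink,
                         "Without Pleiades link": without}}).addTo(map);
</script></body></html>"""

    def _repr_html_(self):
        # An iframe isolates the page so its scripts run in the cell output.
        return (f'<iframe srcdoc="{html.escape(self._page(), quote=True)}" '
                'style="width:100%;height:500px;border:0"></iframe>')
```

Leaving a TwoLayerMap instance as the last expression of a cell is enough for the notebook front end to call _repr_html_ and display the map.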

Step 1 — Define the SPARQL query

OPTIONAL wraps the Pleiades-link pattern, so a site without a link still appears in the result — just without a pleiadesID binding. The FILTER(isIRI(?pleiadesID)) clause defends against non-IRI values that may slip in during cataloguing.
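A sketch of what such a query could look like, held in a Python string as the notebook would send it. The lado: namespace IRI is a placeholder assumption and must be replaced with the one the endpoint actually uses; the variable names are likewise illustrative. The FILTER is placed inside the OPTIONAL block here: a FILTER on an unbound variable outside it would raise an evaluation error and discard exactly the unlinked sites the query is meant to keep.

```python
# Assumption: the lado: IRI below is a placeholder, not the real namespace.
QUERY = """
PREFIX lado: <http://example.org/lado#>
PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>

SELECT ?site ?wkt ?pleiadesID WHERE {
  ?site a lado:DiscoverySite ;
        geosparql:hasGeometry/geosparql:asWKT ?wkt .
  # The filter lives inside OPTIONAL, so it only restricts the optional match.
  OPTIONAL {
    ?site lado:pleiadesID ?pleiadesID .
    FILTER(isIRI(?pleiadesID))
  }
}
"""
```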

Step 2 — Load the data

A missing OPTIONAL variable is simply absent from the bindings dictionary — we use .get() to turn that into None in the DataFrame. A derived boolean column has_pleiades makes subsequent analysis easy.
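The .get() pattern can be sketched on a hand-written response in the standard SPARQL JSON results format. The sample IRIs and the function name bindings_to_rows are made up for illustration; the resulting list of dictionaries could be passed straight to pandas.DataFrame.

```python
def bindings_to_rows(result):
    """Flatten SPARQL JSON bindings into plain dicts, one per discovery site."""
    rows = []
    for binding in result["results"]["bindings"]:
        # An unbound OPTIONAL variable has no key at all, so .get() yields None.
        pleiades = binding.get("pleiadesID", {}).get("value")
        rows.append({
            "site": binding["site"]["value"],
            "wkt": binding["wkt"]["value"],
            "pleiadesID": pleiades,
            "has_pleiades": pleiades is not None,
        })
    return rows


# Illustrative sample only; these are not real records from the graph.
sample = {
    "head": {"vars": ["site", "wkt", "pleiadesID"]},
    "results": {"bindings": [
        {"site": {"type": "uri", "value": "http://example.org/site/1"},
         "wkt": {"type": "literal", "value": "POINT(8.27 50.0)"},
         "pleiadesID": {"type": "uri",
                        "value": "https://pleiades.stoa.org/places/109169"}},
        {"site": {"type": "uri", "value": "http://example.org/site/2"},
         "wkt": {"type": "literal", "value": "POINT(7.0 51.0)"}},
    ]},
}

rows = bindings_to_rows(sample)
```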

Step 3a — Pleiades coverage (sanity check)

A two-bar summary before the map: how many discovery sites participate in the Pleiades cross-link ecosystem, and how many do not. The raw percentage is useful, but the absolute counts matter too — a high percentage on a small dataset is less reassuring than a middling percentage on a large one.
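The numbers behind the two bars can be sketched without any plotting library. The helper name coverage is hypothetical; it takes the has_pleiades values as an iterable of booleans and returns both the absolute counts and the percentage, so neither is read in isolation.

```python
def coverage(flags):
    """Summarise how many sites carry a Pleiades link."""
    flags = list(flags)
    total = len(flags)
    linked = sum(1 for flag in flags if flag)
    # Guard against an empty result set, e.g. after a failed fetch.
    percent = 100.0 * linked / total if total else 0.0
    return {"with": linked, "without": total - linked, "percent": percent}


summary = coverage([True, False, False, True])
```

The "with" and "without" entries map directly onto the two bars of the chart.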

Step 4 — Explore

The df DataFrame stays in scope — modify the cell below to filter, aggregate, or sanity-check the Pleiades coverage in different ways.
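One possible exploration, sketched on a small stand-in frame: count how many sites share exactly the same coordinates, which surfaces the stacked markers that centroid geocoding produces. The column names lat, lon, site and has_pleiades are assumptions about how df is laid out, and the sample rows are invented.

```python
import pandas as pd

# Stand-in for the notebook's df; replace with the real frame when exploring.
df = pd.DataFrame({
    "site": ["s1", "s2", "s3"],
    "lat": [50.0, 50.0, 51.0],
    "lon": [8.27, 8.27, 7.0],
    "has_pleiades": [True, False, False],
})

# Group by exact coordinate pair and count sites and Pleiades links per spot.
stacked = (
    df.groupby(["lat", "lon"])
      .agg(n_sites=("site", "size"), n_linked=("has_pleiades", "sum"))
      .reset_index()
      .sort_values("n_sites", ascending=False)
)
```

Rows with a high n_sites are candidates for centroid-level precision rather than genuinely dense find spots.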


Part of an Open Educational Resource series on knowledge graphs and linked open data, produced in the context of NFDI4Objects.