Mapping SISAL cave sites from a local Turtle file
About this notebook
This notebook reads a local RDF/Turtle snapshot of the SISALv3 cave site catalogue (305 caves with geographic coordinates), parses it with rdflib, and produces two interactive Leaflet maps:
- a marker map with every cave, colour-coded by archaeological status (plain speleothem site vs. confirmed archaeological cave site), and
- a country-level choropleth that shows how the 305 SISAL caves are distributed across the world’s countries.
Everything runs in your browser via Pyodide — no Python installation, no SPARQL endpoint. The TTL file sits next to this page and is loaded into Pyodide’s virtual filesystem at startup; the country boundaries for the second map are fetched on demand from a public CDN.
Data: SISALv3 — Kaushal et al. 2024, Earth Syst. Sci. Data 16, 1933–1963 · database DOI 10.5287/ora-2nanwp4rk. RDF conversion and enrichment (archaeological typing, UNESCO flags, Wikidata cross-links): the GeoScience-FAIRification-LOD repository by the Research Squirrel Engineers. The TTL snapshot used here was produced by plot_sisal_from_csv.py in that repository.
Why this dataset?
SISAL (Speleothem Isotopes Synthesis and AnaLysis) is a global, community-curated compilation of stable isotope records from cave carbonates. Version 3 (Kaushal et al. 2024) ships with a flat CSV of sites that has been converted to Linked Open Data under the geo-lod vocabulary. As a teaching dataset it is compact and globally distributed, and its archaeological enrichment layer (cross-links to Wikidata and UNESCO World Heritage) makes it a useful bridge between palaeoclimate and cultural heritage data.
What you’ll learn
- How to load a static TTL file into Pyodide via Quarto’s resources: frontmatter and parse it with rdflib
- How to write a small SPARQL query against an in-memory rdflib graph and turn the bindings into a pandas DataFrame
- Two complementary cartographic idioms for site-level data: coloured markers (every cave individually, categorical colouring) and a country choropleth (aggregated count per country)
Data-context notes
- Coordinate convention. The TTL stores WKT literals as <http://www.opengis.net/def/crs/EPSG/0/4326> POINT(lon lat); note the CRS prefix and the lon lat axis order this file uses. Leaflet, by contrast, expects [lat, lon], so we swap the order once on parse.
- Archaeological sub-class. A subset of sites is typed as geolod:ArchaeologicalCaveSite in addition to geolod:Cave. Seven sites are flagged as UNESCO World Heritage via geolod:isUNESCOWorldHeritage. We expose both categorical layers on the marker map.
- Counts vs. samples. geolod:countD18OSamples and geolod:countD13CSamples are per-site observation counts aggregated across all entities (speleothems) at that site. A single cave can host several speleothems; Corchia (site 145) has four. The country choropleth below counts sites, not observations, so a country with one very well-sampled cave will look lighter than one with several caves of modest sample size.
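The coordinate swap described in the first note can be sketched as follows. This is a minimal illustration, not the notebook's own cell: the helper name wkt_point_to_latlon and the sample coordinates are invented here, and the regular expression only covers plain decimal POINT literals with an optional CRS prefix.

```python
import re

# Optional CRS IRI in angle brackets, followed by POINT(x y) with decimal numbers.
WKT_POINT = re.compile(
    r"^\s*(?:<(?P<crs>[^>]+)>\s*)?"
    r"POINT\s*\(\s*(?P<x>[-+]?\d+(?:\.\d+)?)\s+(?P<y>[-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_latlon(wkt: str) -> tuple[float, float]:
    """Return (lat, lon) for a WKT point literal written as POINT(lon lat)."""
    match = WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point literal: {wkt!r}")
    lon = float(match.group("x"))  # first number is the longitude in this file
    lat = float(match.group("y"))  # second number is the latitude
    return lat, lon                # swapped into the order Leaflet expects


# Made-up coordinates for illustration only, not taken from the SISAL file.
example = "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(10.5 44.0)"
print(wkt_point_to_latlon(example))
```

Doing the swap in one helper at parse time keeps every later step (markers, point-in-polygon tests) free of axis-order guesswork.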
Tooling notes
rdflib is not pre-bundled in Pyodide, so we install it via micropip. The map is built with raw Leaflet rather than folium because folium writes HTML files to disk and relies on the full Jinja stack, which is heavy for a browser runtime. For the country-level aggregation we fetch a small Natural Earth GeoJSON (1:110 million scale, ~270 KB) directly from a CDN and do the point-in-polygon counting in plain JavaScript, so neither geopandas, shapely, nor pyproj is needed inside Python.
A full local-runtime companion (Jupyter + SPARQLWrapper + folium) exists in the SISAL FAIRification repository, which also produced the TTL snapshot used here.
On first load, your browser downloads the Python runtime (Pyodide, ~10 MB) and fetches the ~160 KB TTL file into Pyodide’s virtual filesystem. The country-choropleth map additionally pulls in a ~270 KB GeoJSON from a CDN — all one-off downloads that the browser caches.
1 Setup, data loading, and SPARQL query
The resources: entry in the frontmatter tells quarto-live to copy sisal_sites.ttl into Pyodide’s VFS, where it is readable as an ordinary file in the notebook’s working directory. One cell handles everything from here to a clean DataFrame: install rdflib, parse the graph, run a single SPARQL query that returns every cave (label, coordinates, sample counts, archaeological flags, and, where available, a Wikidata identifier), and project the bindings into pandas.
2a Marker map — every cave, typed by status
Each cave is plotted as a small circle; colour and layer group encode whether it is a plain speleothem site, an archaeological cave site, or a UNESCO World Heritage property. The layer control doubles as a legend.
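One way to derive a single display category per cave and hand the result to the browser is sketched below. The category names, the colours, and the rule that UNESCO status takes precedence over the archaeological type are all assumptions for illustration; the notebook's own palette and layering may differ.

```python
import json

# Hypothetical palette; the notebook's real colours are not stated in the text.
CATEGORY_COLOURS = {
    "unesco": "#d62728",
    "archaeological": "#ff7f0e",
    "speleothem": "#1f77b4",
}


def classify(is_arch: bool, is_unesco: bool) -> str:
    """Pick one category per cave; UNESCO wins over archaeological (assumed rule)."""
    if is_unesco:
        return "unesco"
    if is_arch:
        return "archaeological"
    return "speleothem"


def to_marker(label, lat, lon, is_arch=False, is_unesco=False):
    """Build a JSON-serialisable record with coordinates in Leaflet's [lat, lon] order."""
    category = classify(is_arch, is_unesco)
    return {
        "label": label,
        "latlng": [lat, lon],
        "category": category,
        "colour": CATEGORY_COLOURS[category],
    }


# Invented sites for illustration only.
markers = [
    to_marker("Example Cave A", 44.0, 10.5),
    to_marker("Example Cave B", 40.0, -3.0, is_arch=True),
    to_marker("Example Cave C", 36.0, 110.0, is_arch=True, is_unesco=True),
]
payload = json.dumps(markers)  # string that can be embedded in the page for Leaflet
print(payload)
```

Grouping the records by category on the JavaScript side then gives one Leaflet layer group per status, which is what lets the layer control act as a legend.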
2b Country choropleth — sampling intensity per country
The aggregate view assigns each cave to a country (via point-in-polygon against Natural Earth country boundaries) and colours each country by the total number of SISAL caves it contains. Countries without SISAL sites are drawn in a neutral grey; countries with sites get a sequential ramp (yellow → red). The point-in-polygon counting is done client-side in JavaScript: Python only provides the site coordinates, and the browser fetches the country GeoJSON and handles the rest.
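The notebook performs this counting in JavaScript; the Python sketch below only illustrates the same logic with a standard ray-casting test. The function names, the "NAME" property key, and the two square "countries" are assumptions made for the example, and points that lie exactly on a border are not treated specially.

```python
from collections import Counter


def point_in_ring(lon, lat, ring):
    """Ray casting: toggle on every edge crossed by a ray going east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        crosses = (yi > lat) != (yj > lat)
        if crosses and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, rings):
    """GeoJSON polygon: first ring is the outer boundary, further rings are holes."""
    if not rings or not point_in_ring(lon, lat, rings[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in rings[1:])


def point_in_geometry(lon, lat, geometry):
    if geometry["type"] == "Polygon":
        return point_in_polygon(lon, lat, geometry["coordinates"])
    if geometry["type"] == "MultiPolygon":
        return any(point_in_polygon(lon, lat, rings) for rings in geometry["coordinates"])
    return False


def count_sites_per_country(sites, features, name_key="NAME"):
    """Assign each (lon, lat) site to the first matching feature and count per name."""
    counts = Counter()
    for lon, lat in sites:
        for feature in features:
            if point_in_geometry(lon, lat, feature["geometry"]):
                counts[feature["properties"][name_key]] += 1
                break
    return counts


def square(x0, y0, size, name):
    """Toy GeoJSON feature: an axis-aligned square, coordinates as [lon, lat]."""
    ring = [[x0, y0], [x0 + size, y0], [x0 + size, y0 + size], [x0, y0 + size], [x0, y0]]
    return {"type": "Feature", "properties": {"NAME": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]}}


features = [square(0, 0, 10, "Squareland"), square(20, 0, 10, "Boxia")]
sites = [(5, 5), (2, 8), (25, 5), (50, 50)]  # the last site lies in no country
counts = count_sites_per_country(sites, features)
print(dict(counts))
```

Countries absent from the counter have zero sites and would receive the neutral grey fill; the remaining counts feed the sequential colour ramp.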
3 Explore
The full DataFrame is in scope — pick any dimension and aggregate or filter. A few starting points:
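The sketch below shows the kind of aggregation and filtering meant here. The column names (label, is_arch, unesco, d18o, d13c) and all rows are invented stand-ins; substitute the DataFrame and column names produced by the setup cell.

```python
import pandas as pd

# Invented rows with hypothetical column names, for illustration only.
df = pd.DataFrame({
    "label": ["Example Cave A", "Example Cave B", "Example Cave C", "Example Cave D"],
    "is_arch": [False, True, True, False],
    "unesco": [False, True, False, False],
    "d18o": [120, 45, 300, 0],
    "d13c": [110, 45, 280, 0],
})

# 1. How many sites are typed as archaeological cave sites?
n_arch = int(df["is_arch"].sum())

# 2. Which sites carry the most d18O observations?
top_sampled = df.nlargest(2, "d18o")[["label", "d18o"]]

# 3. Which archaeological sites are also flagged as UNESCO World Heritage?
arch_unesco = df.loc[df["is_arch"] & df["unesco"], "label"].tolist()

# 4. Total isotope observations per archaeological status.
samples_by_status = df.groupby("is_arch")[["d18o", "d13c"]].sum()

print(n_arch)
print(top_sampled)
print(arch_unesco)
print(samples_by_status)
```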
Part of an Open Educational Resource series on knowledge graphs and linked open data, produced in the context of NFDI4Objects. Data source: Kaushal et al. 2024, SISALv3; RDF conversion: Research-Squirrel-Engineers/GeoScience-FAIRification-LOD.