cairn.
v0.1.0 · beta ★ GitHub
Engineering deep-dive

How Cairn works.

Every other geocoder is a stack: parse PBF, push into Postgres, build an Elasticsearch index, layer libpostal on top, glue it together with a Node service, deploy a daemon graveyard. Cairn is one binary reading memory-mapped tile blobs. This page explains why that's enough — and how each layer earns its place.

00

Vocabulary

Performance and indexing terms used throughout the rest of this page. Skim once, then skip ahead; every term below shows up repeatedly in the build / search / reverse / perf sections.

p50 / p95 / p99 latency
Sort every request's response time low-to-high. p50 is the median: half of requests beat it. p99 is the slowest 1 % cut-off — only 1 in 100 requests is worse. We report p99 because real systems are judged by their bad days, not their averages.
RPS (requests per second)
Throughput. Peak RPS is the highest sustained load the server handled before tail latency degrades past target. Reported alongside latency so a number can't lie about one without admitting the other.
RSS (resident set size)
The amount of physical RAM the process is actively using. Smaller is better for fitting on cheap hosts and packing multiple shards onto one box.
mmap
Memory-mapping a file: ask the kernel to expose disk bytes as if they were already in RAM, with the OS paging them in on demand. Reads avoid a copy and benefit from the page cache; cold queries touch disk, hot ones don't.
BM25
The classic relevance score for text search — like TF-IDF but with diminishing returns on term frequency. Tantivy (the search engine library Cairn uses) ranks results with BM25 by default.
PIP (point-in-polygon)
Given a coordinate and a set of polygons, decide which polygons contain the point. Reverse geocoding's core operation: "I'm at lat 47.14, lon 9.52 — what city is that inside?".
k-NN (k nearest neighbors)
Given a query point, return the k closest items. Used as a fallback for reverse when no polygon contains the query coordinate (e.g. you're a kilometre off the coast of Liechtenstein — return the closest few villages).
R*-tree
A spatial index that organizes bounding boxes hierarchically so PIP and k-NN queries skip irrelevant regions instead of scanning every polygon. Used to index tile envelopes, then to decide which tile blobs to mmap.
tile
A bounded geographic cell. Cairn stores Place data partitioned per Valhalla-style tile (4° at level 0, 1° at level 1, 0.25° at level 2) so a query touches only the tiles its bounding box overlaps — not the whole planet.
rkyv
A Rust serialization format that lays bytes out exactly as the program sees them in memory. Combined with mmap, it lets queries read structured data with zero parsing — the file is the in-memory representation.
blake3
A modern cryptographic hash. The bundle's manifest.toml records a blake3 digest of every tile and index file, so verification is one fast pass over the directory.
Place / AdminFeature / BuildingFootprint
The three on-disk row types. Place = point (city centre, POI, address). AdminFeature = polygon (country, region, district). BuildingFootprint = polygon + height (Microsoft Building Footprints output).
gid / place_id
Two identifiers per Place. place_id is a packed u64 — cheap, but rebuilds change it. gid is a structured string of shape <source>:<type>:<id> (e.g. osm:way:12345) that survives bundle rebuilds. Bookmark the gid; reload via /v1/place?ids=….
01

Build pipeline

From a 4.7 GB Geofabrik PBF to a 1.5 GB on-disk bundle in ~8 minutes on a single laptop. No Postgres, no Elasticsearch, no daemon. Three parallel passes, one tantivy commit, one rkyv serialization, one ed25519 signature.

Stage 1 · Parse PBF

Cairn reads OSM PBF blocks via osmpbf's BlobReader, fanning blobs out to a rayon worker pool through par_bridge. Each PBF blob (~32 KB of compressed protobuf) decodes in parallel; the per-blob workers emit independent (Vec<Place>, Counters) tuples that reduce into the global stream.

Why parallel decode matters: at planet scale PBF decompression dominates wall-clock. Block-level parallelism scales nearly linearly with rayon worker count.
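The reduce pattern can be sketched with std threads standing in for rayon (the real pipeline uses par_bridge; `decode_blob` here is a hypothetical placeholder for the protobuf decode, and the chunking is an illustrative simplification):

```rust
use std::thread;

// Placeholder for the real per-blob protobuf decode: emits an
// independent (places, counter) pair per blob.
fn decode_blob(blob: &[u8]) -> (Vec<String>, usize) {
    (vec![format!("place-{}", blob.len())], 1)
}

// Each worker decodes a disjoint slice of blobs; the main thread
// reduces the per-worker tuples into the global stream.
fn parallel_decode(blobs: &[Vec<u8>], workers: usize) -> (Vec<String>, usize) {
    let chunk = ((blobs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = blobs
            .chunks(chunk)
            .map(|c| {
                s.spawn(move || {
                    let mut places = Vec::new();
                    let mut count = 0;
                    for b in c {
                        let (p, n) = decode_blob(b);
                        places.extend(p);
                        count += n;
                    }
                    (places, count)
                })
            })
            .collect();
        handles.into_iter().fold((Vec::new(), 0), |(mut ps, n), h| {
            let (p, c) = h.join().unwrap();
            ps.extend(p);
            (ps, n + c)
        })
    })
}
```

rayon's work-stealing makes the real version self-balancing; the fixed chunking above is only to keep the sketch self-contained.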

Stage 2 · Node-coord cache (Phase 6f)

Way centroids and admin polygons need every referenced node's (lon, lat). The cache strategy decides build RSS:

| Strategy | Storage | Per-entry | RSS at planet | When to pick |
| --- | --- | --- | --- | --- |
| inline | HashMap<i64, [f64;2]> | ~48 B | ~430 GB | ≤ 5 GB PBF |
| sorted-vec | Vec<(i64, [i32;2])>, sorted + binary search | ~16 B | ~140 GB | 5–30 GB PBF |
| flatnode | mmap'd dense [i32;2] array, sparse file | 8 B / id slot | ~2–4 GB working set | ≥ 30 GB PBF |

Coord quantization to degrees * 1e7 is lossless at OSM's native precision. Flatnode relies on POSIX sparse-file semantics: set_len(N) reserves a 110 GB logical range without allocating disk pages, and only written slots are backed by physical storage. Switzerland's flatnode logical size is 103 GB; real disk usage is 16 GB.
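The flatnode arithmetic reduces to two one-liners — quantization is the lossless degrees-to-i32 mapping described above; slot_offset assumes the dense 8-byte-per-id layout:

```rust
// Quantize degrees to i32 at 1e-7° resolution — OSM's native
// precision, so the roundtrip is lossless for OSM coordinates.
fn quantize(deg: f64) -> i32 {
    (deg * 1e7).round() as i32
}

fn dequantize(q: i32) -> f64 {
    q as f64 / 1e7
}

// Flatnode slot offset: node id indexes a dense [i32; 2] array,
// so each id owns an 8-byte slot at a fixed file offset. Writing
// only the slots you touch leaves the rest of the file sparse.
fn slot_offset(node_id: i64) -> u64 {
    node_id as u64 * 8
}
```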

Stage 3 · WayNodes pre-filter (Phase 6g)

Way centroid + admin polygon assembly need way_id → Vec<node_id> for the small subset of ways that are relation members. A pre-pass collects the relevant HashSet<i64>; the way-place pass only registers way_nodes for that set. Switzerland: 6.2 M ways scanned, 790 K registered (87 % drop). At Europe scale the difference is ~50 GB of saved RSS.
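The pre-filter is a set-membership pass; a stdlib sketch with types simplified — the real pass walks PBF elements, not in-memory vectors:

```rust
use std::collections::{HashMap, HashSet};

// Pass 1 collected the way ids actually referenced by relations;
// pass 2 only registers way_nodes for that set, dropping the ~87 %
// of ways no relation ever touches.
fn collect_way_nodes(
    relation_member_ways: &[i64],
    ways: &[(i64, Vec<i64>)],
) -> HashMap<i64, Vec<i64>> {
    let wanted: HashSet<i64> = relation_member_ways.iter().copied().collect();
    ways.iter()
        .filter(|(id, _)| wanted.contains(id))
        .map(|(id, nodes)| (*id, nodes.clone()))
        .collect()
}
```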

Stage 4 · Multipolygon ring assembly (Phase 7a-K)

Admin relations carry outer + inner ways. Cairn stitches endpoints into closed rings, then validates and pairs them into geo::Polygon values, enforcing four invariants during assembly.

Stage 5 · Parallel admin assembly (Phase 7a-pass-2b2)

Admin polygon assembly used to be a sequential ElementReader::for_each walk pinned to a single core. parallel_admin_relations now runs block-level workers with their own per-tile local PlaceId counters; collisions across blocks resolve via the same deterministic renumber pass that the way- and node-place passes use.

DE: pass 2b2 wall-clock ~120 s → ~17 s (~7×). Switzerland: 6 s → 0.9 s.

Stage 6 · Tantivy index + tile blob write

Places stream into a tantivy::IndexWriter with a 256 MiB buffer (Phase 7a; was 64 MiB). The larger buffer collapses 113 in-build segments to ~17 after merge — fewer per-query segment iterations at serve time. writer.commit() is followed by writer.wait_merging_threads() so the blake3 hash pass that follows sees a stable on-disk segment set.

Place vectors get bucketed into tile coords and serialized to ZSTD-compressed rkyv blobs (one per non-empty (level, tile_id)). Admin layer is rkyv-archived per-tile with mmap'd zero-copy reads at serve time. Point layer stays bincode (rkyv was 68 % larger on String-heavy points; the linear scan doesn't gain from zero-copy).
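Bucketing into tile coords reduces to a grid computation. A sketch using the level sizes quoted in the vocabulary section — the actual tile-numbering scheme (row-major here) is an illustrative assumption:

```rust
// Tile id from lon/lat at a given level. Cell sizes follow the
// Valhalla-style scheme: 4° at level 0, 1° at level 1, 0.25° at
// level 2. Row-major numbering is assumed for illustration.
fn tile_id(lon: f64, lat: f64, level: u8) -> u32 {
    let size = match level {
        0 => 4.0,
        1 => 1.0,
        _ => 0.25,
    };
    let cols = (360.0 / size) as u32;
    let col = ((lon + 180.0) / size) as u32;
    let row = ((lat + 90.0) / size) as u32;
    row * cols + col
}
```

A query's bounding box maps to a small set of (level, tile_id) keys, and only those blobs get mmap'd.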

Stage 7 · Manifest + signature + SBOM

manifest.toml records every artifact's blake3 hash, byte size, and place count. sbom.json emits a CycloneDX 1.5 inventory of every Cargo.lock package (purl + SHA-256) plus every input dataset. Optional cairn-build sign appends an ed25519 detached signature; cairn-build sign-verify fails non-zero on tamper.

03

Reverse pipeline

Default · admin PIP → nearest-K fallback

AdminIndex stores admin polygons in an R*-tree of bounding boxes plus a per-tile rkyv blob. Query path: bbox-prefilter for tiles intersecting the probe, then a geo::Contains point-in-polygon test against each candidate's polygon. Returns finest-first by admin level (city → region → country).

When PIP returns nothing (probe falls outside every admin polygon — ocean, disputed territory, gap in OSM coverage), the fallback walks NearestIndex's R*-tree of place centroids, ranks tile slots by squared distance, and returns the K closest places by haversine.
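The final ranking metric in the fallback is plain haversine; a self-contained sketch:

```rust
// Great-circle distance in km between two (lat, lon) pairs — the
// metric the nearest-K fallback re-sorts by once the R*-tree has
// gathered candidate slots. 6371 km is the mean Earth radius.
fn haversine_km(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lon1) = (a.0.to_radians(), a.1.to_radians());
    let (lat2, lon2) = (b.0.to_radians(), b.1.to_radians());
    let dlat = lat2 - lat1;
    let dlon = lon2 - lon1;
    let h = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * 6371.0 * h.sqrt().asin()
}
```

Squared euclidean distance on tile slots is enough for the coarse ranking; haversine only runs on the survivors.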

Phase 7a-H · Context-aware reverse

?context=full returns a structured payload combining four signals in one round-trip — no incumbent ships this:

{
  "lat": 47.142,
  "lon": 9.521,
  "admin": [
    { "name": "Vaduz",        "level": 8, "distance_km": 0.05 },
    { "name": "Liechtenstein","level": 0, "distance_km": 5.20 }
  ],
  "nearest_road":    { "name": "Städtle",        "distance_km": 0.04 },
  "nearest_poi":     { "name": "Vaduz Castle",   "distance_km": 0.42 },
  "nearest_address": { "name": "Städtle 28",     "distance_km": 0.06 }
}

Implementation hooks NearestIndex::nearest_k_filtered(probe, k, |p| classify_kind(&p.kind) == Some(target)) three times — once per class. Slot collection widens proportionally to the predicate's selectivity, so "nearest road" doesn't degenerate when the closest tile slots happen to contain only POIs.

04

Augmenters

v0.3 introduced a second class of pipeline alongside the ground-up build: augmenters rewrite an existing bundle in place with new sources. Same on-disk layout, same manifest, same signature contract — just additional layers and richer Place tags. Two ship today: building footprints (lane A) and Wikidata enrichment (lane I).

Lane A · Microsoft Building Footprints

~1.4 B polygons under ODbL on Source Cooperative. The importer (cairn-import-buildings) reads GeoParquet via parquet + arrow-array's ParquetRecordBatchReaderBuilder and decodes WKB polygon geometries with a hand-rolled cursor. Why hand-rolled: the Place loader's decode_wkb_point only handles type 1; polygons need type 3 (single ring iteration) and type 6 (MultiPolygon recursion into N type-3 sub-polygons). Both paths reject Z/M-flagged inputs (PostGIS 1000+type / 2000+type / 3000+type) outright — the per-vertex reader assumes 16-byte (x, y) records, so silently accepting PolygonZ would mis-align every coordinate after the first.

Per-row output: BuildingFootprint { id, centroid, bbox, outer_ring, height }. Centroid is the area-weighted shoelace centroid of the largest polygon's outer ring (or the bbox center as fallback for degenerate rings). Holes are dropped at parse time — they're < 0.1 % of rows in the wild, and doubling the on-disk footprint isn't worth the marginal geometry fidelity.
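The shoelace centroid with its degenerate-ring fallback, for a closed ring (first point repeated last):

```rust
// Area-weighted centroid of a closed ring via the shoelace formula.
// `a` accumulates twice the signed area; when it is ~0 the ring is
// degenerate and the bbox center is returned instead.
fn ring_centroid(ring: &[(f64, f64)]) -> (f64, f64) {
    let mut a = 0.0;
    let (mut cx, mut cy) = (0.0, 0.0);
    for w in ring.windows(2) {
        let ((x0, y0), (x1, y1)) = (w[0], w[1]);
        let cross = x0 * y1 - x1 * y0;
        a += cross;
        cx += (x0 + x1) * cross;
        cy += (y0 + y1) * cross;
    }
    if a.abs() < 1e-12 {
        // Degenerate ring: fall back to the bbox center.
        let (min_x, max_x) = ring
            .iter()
            .fold((f64::MAX, f64::MIN), |(lo, hi), p| (lo.min(p.0), hi.max(p.0)));
        let (min_y, max_y) = ring
            .iter()
            .fold((f64::MAX, f64::MIN), |(lo, hi), p| (lo.min(p.1), hi.max(p.1)));
        return ((min_x + max_x) / 2.0, (min_y + max_y) / 2.0);
    }
    (cx / (3.0 * a), cy / (3.0 * a))
}
```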

Per-tile BuildingLayer writes mirror the existing AdminLayer rkyv pattern: 16-byte aligned header + rkyv body, magic CRBL. Replication factor is ~1.0 (most buildings are smaller than a tile by orders of magnitude). Runtime BuildingIndex is an R*-tree of tile envelopes with an LRU cache (default 256 entries — smaller than admin/nearest 1024 because building tiles are denser; pinning every loaded tile would balloon RSS at planet scale). Two query modes: at(coord) returns rooftop-containing buildings finest-first by area; nearest_k(coord, k) ranks tile slots by bbox-distance, gathers k * 4 candidates, and re-sorts globally.

Containment is a three-stage filter. Stage 1: tile R*-tree narrows to per-tile candidates. Stage 2: per-feature bbox check (min_lon ≤ x ≤ max_lon, same for y) rejects the dense majority — building bboxes are tight enough that the bbox alone would already be ~70-90 % accurate at urban density. Stage 3: Sunday-style ray-cast point-in-polygon on the closed outer_ring confirms true rooftop containment, catching the 10-30 % of L- / U-shaped footprints where the bbox covers an inner notch the polygon doesn't. The bbox-only fast path is exposed as at_bbox(coord) / ?strict=false for callers that already plan to refine downstream.
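Stage 3 can be illustrated with an even-odd ray cast — the importer's Sunday-style winding test differs in detail, but the two agree on simple (non-self-intersecting) rings:

```rust
// Even-odd point-in-polygon: count crossings of a rightward ray
// from `p` against each edge of a closed ring (first point repeated
// last, as in the tile blobs). Odd crossing count means inside.
fn point_in_ring(p: (f64, f64), ring: &[(f64, f64)]) -> bool {
    let mut inside = false;
    for w in ring.windows(2) {
        let ((x0, y0), (x1, y1)) = (w[0], w[1]);
        // Edge straddles the horizontal line through p?
        if (y0 > p.1) != (y1 > p.1) {
            // x-coordinate where the edge crosses that line.
            let x_cross = x0 + (p.1 - y0) / (y1 - y0) * (x1 - x0);
            if p.0 < x_cross {
                inside = !inside;
            }
        }
    }
    inside
}
```

This is exactly the case the bbox misses: for an L-shaped footprint, a point in the notch passes stage 2 but flips an even number of crossings here.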

Spatial join during cairn-build augment --buildings walks every Place tile, queries the in-memory BuildingIndex for each Place's centroid, and stamps building_id + building_height tags on hits. Idempotent — the apply step dedups (key, value) before insertion, so re-running on the same data leaves Place tiles byte-identical.

Lane I · Wikidata enrichment

The canonical Wikidata JSONL dump is ~120 GB compressed. Streaming it once is cheap; streaming it twice or holding even 1 % of it in memory is not. The augmenter (cairn-augment-wikidata) does a single memory-bounded pass with two stages:

  1. Pass 1 — collect Q-ids. Walk every place tile, scan Place.tags for wikidata=Q… entries, deposit into a HashSet<String>. OSM imports already carry the tag for ~1-3 M Places per planet bundle. RSS stays at ~30-200 MB depending on coverage.
  2. Pass 2 — stream the dump. BufReader with an 8 MB buffer wraps a GzDecoder (or plain File when .json not .json.gz). For each line, a substring pre-filter looks for "id":"Q…" near the front — entire entity bodies are skipped without invoking serde_json when the Q-id isn't in the wanted set. With ~3 M tracked Q-ids out of ~100 M entities total, only ~3 % of lines hit the full parser. Saves ~40 minutes wall-clock on a planet dump.
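The pass-2 pre-filter amounts to a bounded scan of each line's head; a sketch (the 64-byte cut and the `"id":"Q…"` field shape are assumptions about the dump's layout):

```rust
// Cheap pre-filter: only lines whose leading bytes mention a wanted
// Q-id are handed to the full JSON parser. Q-ids longer than the
// head window would be truncated — acceptable for a sketch.
fn wanted_line(line: &str, wanted: &std::collections::HashSet<&str>) -> bool {
    // Cut at 64 bytes, backing up to a char boundary for non-ASCII.
    let mut cut = line.len().min(64);
    while !line.is_char_boundary(cut) {
        cut -= 1;
    }
    let head = &line[..cut];
    if let Some(i) = head.find("\"id\":\"Q") {
        let rest = &head[i + 6..]; // starts at the 'Q'
        let end = rest.find('"').unwrap_or(rest.len());
        return wanted.contains(&rest[..end]);
    }
    false
}
```

With ~3 % of lines passing, serde_json never sees the other ~97 % — that's where the ~40 minutes go.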

Distill output per matched entity: labels by language, aliases by language, and a curated set of cross-ref claims (P1566 GeoNames ID, P300 ISO 3166-2, P901 FIPS, P131 admin parent, P31 instance-of). The P31 list is conservatively mapped to PlaceKind via a static table — Q515 → City, Q484170 → County, Q34442 → Street, etc. The table only covers ~25 well-known Q-ids on purpose: Wikidata's P31 graph has > 100 K types and a wider table would let typo-Q-ids reclassify Places.

Apply step rewrites only modified Place tiles, preserving each tile's existing compression scheme (zstd or raw). Aliases land under lang = "<canonical>_alt" so they remain searchable but stay separable from the canonical label for future per-language ranking weights. PlaceKind promotion is one-way: Places already classified more specifically by upstream (e.g. an OSM admin polygon → Country) are trusted; we never demote.

Idempotency contract

Both lanes guarantee a re-run on the same inputs produces a byte-identical bundle. The augmenter dedups labels by (lang, value) and tags by (key, value) before insertion, hashes the per-tile content signature pre/post, and skips the encode_tile + fs::write roundtrip when the signature matches. Re-running cairn-build augment on a fully-augmented bundle is a no-op — same blake3, same byte_size, same signature.

05

Bundle format

Everything Cairn needs to serve queries lives in one directory. No catalog database, no service registry, no runtime config-fetch. The format is content-addressed, verifiable, and reproducible.

<bundle>/
├── manifest.toml       schema_v4, blake3 of every artifact, bundle_id
├── manifest.toml.sig   ed25519 detached signature (optional)
├── sbom.json           CycloneDX 1.5 — Cargo.lock + datasets
├── tiles/
│   └── <level>/<id>.bin    rkyv-archived Place vector + ZSTD
├── index/
│   └── text/           tantivy segments + admin_names sidecar
│                       (gid field indexed for /v1/place lookups)
└── spatial/
    ├── admin/<tile>.rk      rkyv AdminFeature, magic CRAD, mmap zero-copy
    ├── points/<tile>.bin    bincode PlacePoint, linear scan
    └── buildings/<tile>.bin  rkyv ArchivedBuilding, magic CRBL  (v0.3 lane A)
      

Tile blobs are 16-byte aligned with a stable header that carries the rkyv archive size + payload offset. Read path mmaps the file, validates via rkyv::check_archived_root, then hands an Arc<AdminTileArchive> to the per-tile slot. PIP iterates the archived form directly via pip_archived — 128 ns inside-bbox, 2 ns outside-bbox (the bbox prefilter rejects in cache).

Building tiles share the rkyv-archive pattern with a distinct magic (CRBL) so the loader can't confuse them with admin tiles, and a per-feature bbox so the runtime BuildingIndex.at(coord) can prefilter without touching ring vertices for the 99 % of buildings whose bboxes miss the query.

Each Place carries a stable global identifier (gid) tag of shape <source>:<type>:<id> (osm:way:12345, wof:locality:101748479, overture:place:08bb…) stamped at import time by the source-specific importer. Sources without an upstream stable id (OpenAddresses, generic CSV) get a deterministic hash of (kind, normalized name, ~100 m centroid) so bookmarks survive rebuilds. The text index registers gid as a STRING (whole-token) field for direct TermQuery lookup via /v1/place?ids=….

Manifest schema is at version 4 as of v0.3 (the bump signals the additive building_tiles field). v3 readers tolerate v4 manifests via #[serde(default)]; any future breaking change bumps the version and ships a reader for both shapes.

06

Differentiators

Federated multi-bundle (Phase 2e)

cairn-serve --bundles a/,b/,c/ loads each bundle independently and wraps the indices in FederatedText / FederatedAdmin / FederatedNearest. Queries fan out, hits merge by score; PIP fans out, sorts finest-first; nearest-K fans out, re-sorts by haversine. Each Hit's admin_names label renders inside its own bundle before merge so Place-ID collisions across bundles don't pollute labels.

Operational pattern: split a planet into continental shards without standing up multiple processes.

Hot reload (Phase 7a-U)

POST /admin/reload swaps to a new bundle in place. AppState's mutable surface is backed by arc_swap::ArcSwapOption so installs are a single atomic store. In-flight requests finish on the previous snapshot they captured at handler entry; new requests pick up the new bundle on next call.
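The snapshot semantics can be sketched with a stdlib RwLock<Arc<_>> standing in for arc_swap::ArcSwapOption — same reader-visible behavior, arc_swap just makes the load wait-free (Bundle is simplified to a String here):

```rust
use std::sync::{Arc, RwLock};

// Snapshot pattern: handlers clone the Arc once at entry and serve
// the whole request from that snapshot; reload swaps the slot, and
// the old bundle drops when the last in-flight request finishes.
struct AppState {
    bundle: RwLock<Arc<String>>,
}

impl AppState {
    fn snapshot(&self) -> Arc<String> {
        self.bundle.read().unwrap().clone()
    }

    fn reload(&self, next: String) {
        *self.bundle.write().unwrap() = Arc::new(next);
    }
}
```

The key property: a snapshot taken before reload keeps answering from the old bundle — no torn reads, no dropped connections.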

No process restart. No dropped connections. K8s rolling update without rolling.

Time-aware names (Phase 7a-Q)

?valid_at=YYYY filters by OSM start_date / end_date tags. Year-only parse handles plain digits, ISO dates, BC suffix ("44 BC"), and qualifier prefixes ("before 1989", "circa 1850"). Sentinels for unbounded validity. RangeQuery filter joined with the rest of the search clause.
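The year-only parse can be sketched for the tag shapes listed above (the exact qualifier list and sentinel handling in the real parser are assumptions here):

```rust
// Extract a signed year from OSM start_date / end_date values:
// plain digits ("1989"), ISO dates ("1989-11-09"), BC suffix
// ("44 BC" → -44), and qualifier prefixes ("circa 1850").
fn parse_year(raw: &str) -> Option<i32> {
    let s = raw.trim().to_ascii_lowercase();
    let s = s
        .strip_prefix("before ")
        .or_else(|| s.strip_prefix("circa "))
        .unwrap_or(&s);
    let bc = s.ends_with(" bc");
    let s = s.strip_suffix(" bc").unwrap_or(s);
    // Leading digit run is the year; trailing month/day is ignored.
    let digits: String = s.chars().take_while(|c| c.is_ascii_digit()).collect();
    let y: i32 = digits.parse().ok()?;
    Some(if bc { -y } else { y })
}
```

The parsed year feeds a tantivy RangeQuery joined with the rest of the search clause.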

?valid_at=1939 resolves Königsberg; ?valid_at=1980 resolves Kaliningrad. No incumbent ships temporal validity at all.

WASM autocomplete (Phase 4b)

cairn-wasm compiles to a ~250 KB cdylib for wasm32-unknown-unknown. Exposes Autocompleter::new(Vec<u8>) + complete(prefix, limit) → Vec<String> driven by FST prefix iteration. Tantivy is excluded from the WASM path so the bundle stays small.

Use cases: country-bundle splash autocomplete, embedded form widgets, PWA / offline-first apps.

Signed bundles + SBOM (Phase 4c, 4d)

cairn-build sign emits an ed25519 detached signature over manifest.toml; the manifest already carries blake3 of every artifact, so signing the manifest signs the whole bundle. Pure-Rust ed25519-dalek keeps static-musl self-contained.

sbom.json ships a CycloneDX 1.5 inventory. /v1/sbom serves the file with application/vnd.cyclonedx+json so dependency-track / grype / cyclonedx-cli pick it up without sniffing.

Stable global identifier (v0.4)

Two identifiers per Place — both intentional, both useful:

  • place_id: u64 — the bundle-local packed id (level: 3 | tile: 22 | local: 39). Cheap to hash, suitable for in-process maps. Not stable across rebuilds because local is a per-tile insertion counter.
  • gid: String — global identifier of shape <source>:<type>:<id> (osm:way:12345, wof:locality:101748479, overture:place:08bb…). Built at import time from upstream stable IDs; sources without one (OpenAddresses, generic CSV) get a deterministic hash of (kind, name, ~100 m centroid). Survives rebuilds + federation hops.
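The packed layout can be sketched directly — the 3 | 22 | 39 split is quoted above; placing level in the top bits is an illustrative assumption:

```rust
// place_id layout: level (3 bits) | tile (22 bits) | local (39 bits).
// `local` is a per-tile insertion counter, which is exactly why the
// whole id is unstable across rebuilds.
fn pack(level: u64, tile: u64, local: u64) -> u64 {
    debug_assert!(level < (1u64 << 3) && tile < (1u64 << 22) && local < (1u64 << 39));
    (level << 61) | (tile << 39) | local
}

fn unpack(id: u64) -> (u64, u64, u64) {
    (
        id >> 61,
        (id >> 39) & ((1u64 << 22) - 1),
        id & ((1u64 << 39) - 1),
    )
}
```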

Stamped on a reserved gid tag at import time, indexed as a tantivy STRING field, surfaced on every Hit and resolvable via /v1/place?ids=…. Format converges with what several open-source geocoders already emit, so existing client code that already deserializes a gid field works unchanged.

Query explain (Phase 4e)

?explain=true populates Hit.explain with bm25, exact_match_boost, population_boost, language_boost, geo_bias, final_score. Each rerank stage records its multiplier in place. Off by default; skip_serializing_if = Option::is_none keeps payloads lean for normal callers.

Useful for debugging ranking surprises and for clients that want to surface "why this hit ranked here". No incumbent ships this.

07

Performance characteristics

Microbench (Apple Silicon, criterion)

| Operation | Mean | Notes |
| --- | --- | --- |
| myers/short identical | ~80 ns | 5-char same-string |
| myers/short typo | ~120 ns | 1 substitution |
| myers/64-char boundary | ~250 ns | fast-path tail |
| wagner_fischer/short typo | ~150 ns | O(m·n) reference |
| trigram_indexed/short | ~350 ns | 5-char Latin |
| trigram_indexed/CJK | ~800 ns | UTF-8 walk + alphanumeric filter |
| pip_archived/inside | ~94 ns | rkyv zero-copy ring scan |
| pip_archived/outside-bbox | ~1.5 ns | bbox prefilter rejects in cache |
| nearest_k=10/4096pts | ~375 µs | R*-tree slot rank + linear |

Live numbers in benchmarks/perf-baseline.json; a CI workflow gates regressions beyond +15 % per benchmark.

End-to-end (Switzerland)

| Metric | Cairn | Pelias | Nominatim | Photon |
| --- | --- | --- | --- | --- |
| Build wall | 24 s | 3 m 46 s | 3 h 13 m | 2 m 1 s |
| Bundle disk | 195 MB | 3.5 GB | 9.2 GB | 1.3 GB |
| Hot RSS | 80 MB | 2.7 GB | 2.4 GB | 2.1 GB |
| p50 | 0.51 ms | 13.76 ms | 9.51 ms | 5.88 ms |
| p99 | 0.74 ms | 57.23 ms | 23.00 ms | 25.18 ms |
| Peak RPS | 57 554 | 362 | 1 109 | 2 406 |

Recall on noisy queries (1 153 typo / fold variants)

| Configuration | Hits | Recall |
| --- | --- | --- |
| baseline | 253 / 1 153 | 21.9 % |
| ?fuzzy=1 | 865 / 1 153 | 75.0 % |
| ?phonetic=true | 1 142 / 1 153 | 99.0 % |
| ?semantic=true | 253 / 1 153 | 21.9 % |
| all flags on | 1 153 / 1 153 | 100.0 % |

DoubleMetaphone single-handedly rescues 99 % of typos. Semantic boost is for morphological variants (Vienna → Viennese) and doesn't fire on character-level perturbations. No incumbent geocoder ships a ?phonetic= toggle today.