Geocoding, without the stack.
Cairn turns OpenStreetMap, WhosOnFirst, OpenAddresses, and Geonames data into a single on-disk bundle, then serves forward search, autocomplete, fuzzy matching, structured queries, and reverse geocoding from one static Rust binary. No cluster. No cloud. No daemon dependencies.
Stack your first cairn in 30 seconds.
Build a country bundle from one OSM PBF, serve it, query it. No Postgres, no Elasticsearch, no Java. Pure Rust + mmap'd rkyv tile blobs.
# 1. build a Liechtenstein bundle
$ cairn-build build \
--osm liechtenstein-latest.osm.pbf \
--wof whosonfirst-data-admin-li.db \
--out bundle
# 2. serve it (single static binary)
$ cairn-serve --bundle bundle
# 3. query it
$ curl 'localhost:8080/v1/search?q=Vaduz'
{
"query": "Vaduz",
"results": [
{
"name": "Vaduz",
"kind": "city",
"lon": 9.522,
"lat": 47.139,
"label": "Vaduz, Liechtenstein",
"langs": ["de", "en", "fr"],
"score": 48.45
}
]
}
Production-ready, on the platform you already use.
Cloud deployment artifacts live in cairn-geocoder/cairn-cloud: a Helm chart published to OCI; a Kustomize base with dev/prod overlays; Terraform modules for AWS Fargate, GCP Cloud Run, and Nomad + Consul; and a Grafana dashboard plus Prometheus alerting rules wired to Cairn's /metrics output.
helm install — OCI
helm install cairn oci://ghcr.io/cairn-geocoder/charts/cairn --version 0.1.0. Three bundle sources (image / http / pvc), optional ed25519 signature verify, ServiceMonitor, HPA, PDB.
kubectl apply -k
Flat manifests with dev (NodePort, single replica) + prod overlays (3 replicas, HPA, PDB, Ingress, NetworkPolicy egress lockdown). For operators who don't want Helm.
AWS · Terraform
ECS Fargate + ALB
Single-region module: cluster, service, task def with bundle-fetch init, ALB on /healthz, App Auto Scaling on CPU 70%, CloudWatch logs, IAM scoped to task.
GCP · Terraform
Cloud Run + GCS
Cloud Run v2 service with min/max scaling, startup + liveness probes, GCS-backed bundle pull on cold start via metadata-server token. Public or internal-LB ingress.
Nomad · Terraform
Nomad + Consul
Nomad service job with prestart bundle-fetch task, Docker driver for cairn-serve, Consul service registration with /healthz + /readyz HTTP checks. Kubernetes-free path.
Observability
Grafana + Prometheus
Drop-in dashboard JSON (8 panels covering uptime, RPS by endpoint, error rate, reverse-outcome split) plus 7 alerting rules wired to cairn_requests_total + cairn_uptime_seconds.
How Cairn thinks.
Six ideas that explain the design. Read these once and the rest of the docs makes sense.
The bundle
One directory of mmap-ready files. manifest.toml with blake3 hashes anchors integrity. Build once, ship anywhere.
rkyv tile blobs
Place data lives in 16-byte aligned rkyv archives. cairn-serve mmap's them; PIP touches archived bytes directly — 92× faster than geo::Contains on miss.
Tantivy + FST + ngram
BM25 search with prefix-ngram tokenizer for autocomplete, FuzzyTermQuery for typos, and a name_translit field for cross-script ("moskva" → "Москва").
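To make the autocomplete and typo handling concrete, here is a minimal Python sketch of the two ideas: prefix (edge) n-grams generated at index time, and an edit-distance check at query time. It illustrates the technique only; the function names are hypothetical, and plain Levenshtein distance stands in for whatever Tantivy's FuzzyTermQuery does internally.

```python
def edge_ngrams(token: str, min_len: int = 1, max_len: int = 8) -> list[str]:
    """Return the prefixes of `token` between min_len and max_len characters."""
    token = token.lower()
    upper = min(max_len, len(token))
    return [token[:n] for n in range(min_len, upper + 1)]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


# Index time: each prefix of "Vaduz" becomes a searchable term.
prefixes = edge_ngrams("Vaduz", min_len=2, max_len=4)

# Query time: "vaaduz" is a small number of edits away from "vaduz",
# so a query with an edit budget of 2 would still match it.
distance = levenshtein("vaaduz", "vaduz")
print(prefixes, distance)
```

Indexing prefixes trades index size for lookup speed: a keystroke such as "vad" becomes an exact term match instead of a scan.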
Phonetic + semantic rerank
DoubleMetaphone catches typos (Smyth → Smith). 32-dim character-trigram vectors catch morphology (Vienna → Viennese). Both opt-in, both shipped.
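One way to picture the character-trigram vectors is the hashing trick: each trigram is hashed into one of 32 buckets and the result is compared by cosine similarity. The sketch below is an assumption about how such a vector could be built, not Cairn's actual encoding; the padding scheme and the use of CRC32 as the hash are illustrative choices.

```python
import math
import zlib

DIM = 32  # matches the 32-dim figure above; the hashing scheme is an assumption


def trigram_vector(text: str, dim: int = DIM) -> list[float]:
    """Hash each character trigram into one of `dim` buckets, then L2-normalise."""
    padded = f"  {text.lower()} "  # padding so word boundaries also form trigrams
    vec = [0.0] * dim
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


vienna = trigram_vector("Vienna")
viennese = trigram_vector("Viennese")
# Shared trigrams ("vie", "ien", "enn", ...) land in the same buckets,
# so morphological variants score above zero.
print(round(cosine(vienna, viennese), 3))
```

With only 32 buckets, unrelated trigrams can collide, which is why a vector like this suits reranking candidates rather than retrieving them.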
Multi-source dedup
OSM + WoF + OA + Geonames merge into one Place stream. --source-priority flags weight the cross-source dedup; haversine bucket collapses near-duplicates.
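The dedup idea can be sketched as follows: visit records in source-priority order and drop any record that shares a name with an already-kept record within a small haversine radius. This is a simplified illustration. The record shape, the 50 m radius, and the quadratic scan are assumptions; a real implementation would bucket by grid cell first.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dedup(places: list[dict], priority: list[str], radius_m: float = 50.0) -> list[dict]:
    """Keep one record per (name, radius) cluster, preferring earlier sources."""
    kept: list[dict] = []
    # Visiting preferred sources first means they win every collision.
    for place in sorted(places, key=lambda p: priority.index(p["source"])):
        duplicate = any(
            k["name"].casefold() == place["name"].casefold()
            and haversine_m(k["lat"], k["lon"], place["lat"], place["lon"]) <= radius_m
            for k in kept
        )
        if not duplicate:
            kept.append(place)
    return kept


# Toy records: two sources describe the same town a few metres apart.
places = [
    {"source": "osm", "name": "Vaduz", "lat": 47.1390, "lon": 9.5220},
    {"source": "wof", "name": "Vaduz", "lat": 47.1391, "lon": 9.5221},
    {"source": "geonames", "name": "Schaan", "lat": 47.1650, "lon": 9.5090},
]
merged = dedup(places, priority=["wof", "osm", "geonames"])
print([p["source"] for p in merged])
```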
Federation
Run cairn-serve --bundles a/,b/,c/ to fan out across continental shards. Search merges by score; reverse merges by finest admin level.
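The two merge rules can be sketched in a few lines of Python. The hit dictionaries are hypothetical, and the sketch assumes a numerically higher level means a finer admin unit, which is consistent with level 0 being the country in the reverse example on this page.

```python
import heapq
from itertools import islice


def merge_search(shard_results: list[list[dict]], limit: int = 10) -> list[dict]:
    """Merge per-shard hit lists (each sorted by descending score) into one top-N."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit["score"])
    return list(islice(merged, limit))


def merge_reverse(shard_results: list[list[dict]]):
    """Return the hit with the finest admin level across all shards, or None."""
    hits = [hit for shard in shard_results for hit in shard]
    return max(hits, key=lambda hit: hit["level"], default=None)


# Toy shard responses (names and scores are illustrative only).
shard_a = [{"name": "A1", "score": 48.45}, {"name": "A2", "score": 10.0}]
shard_b = [{"name": "B1", "score": 30.0}]
top = merge_search([shard_a, shard_b], limit=2)

reverse_a = [{"name": "Country", "level": 0}]
reverse_b = [{"name": "District", "level": 1}]
finest = merge_reverse([reverse_a, reverse_b])
print([h["name"] for h in top], finest["name"])
```

Because each shard already returns hits in score order, a k-way merge avoids re-sorting the combined list.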
Enrichers + stable identifiers.
Two recent additions layer on top of the v0.2 build pipeline: augmenters that enrich an existing bundle in place (no rebuild needed), and a stable global identifier (gid) on every Place so bookmarks survive rebuilds and federation hops.
Building footprints
Microsoft Building Footprints (~1.4 B polygons, ODbL) layer into spatial/buildings/ as per-tile rkyv blobs. A spatial join stamps building_id on Places whose centroid lands inside a building bbox, giving rooftop-precision addressing where OSM is sparse.
Wikidata enrichment
Two-pass over the canonical Wikidata dump: collect Q-ids from Place tags, then stream the dump to extract labels (200+ languages), aliases, cross-refs (P1566 GeoNames ID, P300 ISO 3166-2, P901 FIPS, P131 admin parent), and P31 instance-of for kind refinement. Idempotent on re-run.
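The two-pass shape can be sketched like this, assuming Places carry the usual OSM `wikidata` tag and the dump is the standard Wikidata JSON layout with one entity per line. The Place dictionaries, the toy dump, and the output record are hypothetical; only the dump field names (`labels`, `claims`, `mainsnak`, `datavalue`) follow the published format.

```python
import json


def first_value(claims: dict, prop: str):
    """Return the first concrete value of a property, or None."""
    for statement in claims.get(prop, []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if datavalue is not None:
            return datavalue["value"]
    return None


def collect_qids(places: list[dict]) -> set[str]:
    """Pass 1: gather the Q-ids referenced by Place tags."""
    return {p["tags"]["wikidata"] for p in places if "wikidata" in p.get("tags", {})}


def stream_entities(lines, wanted: set[str]):
    """Pass 2: scan the dump once, yielding enrichment for wanted Q-ids only."""
    for line in lines:
        line = line.strip().rstrip(",")  # dump lines end with a trailing comma
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        if entity.get("id") not in wanted:
            continue
        claims = entity.get("claims", {})
        yield entity["id"], {
            "labels": {lang: l["value"] for lang, l in entity.get("labels", {}).items()},
            "geonames_id": first_value(claims, "P1566"),
            "iso_3166_2": first_value(claims, "P300"),
        }


# Toy data: Q-ids and values below are placeholders, not real Wikidata records.
places = [{"name": "A", "tags": {"wikidata": "Q100"}}, {"name": "B", "tags": {}}]
toy_dump = [
    "[",
    json.dumps({"id": "Q100",
                "labels": {"de": {"language": "de", "value": "Beispiel"}},
                "claims": {"P1566": [{"mainsnak": {"datavalue": {"value": "123"}}}]}}) + ",",
    json.dumps({"id": "Q200", "labels": {}, "claims": {}}),
    "]",
]
enriched = dict(stream_entities(toy_dump, collect_qids(places)))
print(enriched)
```

Collecting the Q-ids first keeps memory bounded by the bundle size, not by the size of the dump.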
cairn-build augment --wikidata
v0.4 · NEW
Stable global identifier
Every Place ships a gid field that survives bundle rebuilds and federation hops (osm:way:12345, wof:locality:101748479, overture:place:08bb…). Bookmark a record once; resolve it years later via /v1/place?ids=…. The format is <source>:<type>:<id>, the same shape several open-source geocoders converge on, so existing client code that already deserializes a gid field works unchanged.
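On the client side, the `<source>:<type>:<id>` shape is easy to handle. The following sketch parses an `ids` list that mixes gids with numeric place_ids; the `Gid` type and function names are hypothetical, and the validation rules are an assumption, not Cairn's own.

```python
from typing import NamedTuple, Union


class Gid(NamedTuple):
    source: str
    type: str
    id: str


def parse_identifier(raw: str) -> Union[Gid, int]:
    """Parse one identifier: a bare integer is a place_id, otherwise a gid."""
    raw = raw.strip()
    if raw.isascii() and raw.isdigit():
        return int(raw)
    # Split on the first two colons only, so the id part may itself contain colons.
    parts = raw.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a gid or place_id: {raw!r}")
    return Gid(*parts)


def parse_ids(param: str) -> list:
    """Parse a comma-separated ids parameter."""
    return [parse_identifier(item) for item in param.split(",") if item.strip()]


ids = parse_ids("osm:way:12345,wof:locality:101748479,1099603968001")
print(ids)
```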
OSM replication
cairn-build replicate-fetch + replicate-apply walk OSM minutely diffs and rewrite affected place tiles in place. Stale bundles catch up over multiple runs without touching the planet PBF; airgap-safe by design.
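For orientation, OSM's public replication server lays out minutely diffs by zero-padded sequence number, split into three directory levels. The sketch below shows that path arithmetic and how a catch-up run might pick a bounded batch of sequences. The batch size and function names are assumptions; how cairn-build tracks its own position is not described here.

```python
BASE_URL = "https://planet.openstreetmap.org/replication/minute"


def sequence_path(sequence: int) -> str:
    """Map a sequence number to its AAA/BBB/CCC directory path."""
    digits = f"{sequence:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def diff_url(sequence: int) -> str:
    """URL of the compressed OsmChange file for one minute."""
    return f"{BASE_URL}/{sequence_path(sequence)}.osc.gz"


def state_url(sequence: int) -> str:
    """URL of the state file holding that diff's timestamp."""
    return f"{BASE_URL}/{sequence_path(sequence)}.state.txt"


def pending_sequences(bundle_seq: int, upstream_seq: int, batch: int = 60) -> range:
    """Sequences one run would apply; later runs continue where this one stopped."""
    last = min(upstream_seq, bundle_seq + batch)
    return range(bundle_seq + 1, last + 1)


print(diff_url(6_123_456))
print(list(pending_sequences(bundle_seq=10, upstream_seq=200, batch=3)))
```

Separating fetch from apply is what lets the download step run on a connected machine while the apply step runs offline.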
Recipes from the trail.
Short, complete curl snippets covering the most common queries against a running cairn-serve.
Forward search
$ curl 'localhost:8080/v1/search?q=Vaduz&limit=2'
{
"results": [
{ "place_id": 1099603968001,
"gid": "wof:locality:101748479",
"name": "Vaduz", "kind": "city",
"lon": 9.522, "lat": 47.139 }
]
}
# place_id is bundle-local (rebuilds change it).
# gid is rebuild-stable: bookmark it, resolve via /v1/place.
Lookup by gid
# Look up by stable gid. Mix gids and u64 place_ids freely;
# missing identifiers return empty (not 404).
$ curl 'localhost:8080/v1/place?ids=osm:way:12345,wof:locality:101748479'
Autocomplete
# Prefix-ngram index. Cheap enough for keystroke-grain UI.
$ curl 'localhost:8080/v1/search?q=Vad&mode=autocomplete'
Fuzzy + phonetic
# Edit distance + DoubleMetaphone
$ curl 'localhost:8080/v1/search?q=vaaduz&fuzzy=2'
$ curl 'localhost:8080/v1/search?q=Smyth&phonetic=true'
# Phonetic rescues 99.5% of typos in Cairn's noisy benchmark.
Layer + focus bias
$ curl 'localhost:8080/v1/search?q=Vaduz&layer=city&focus.lat=47.165&focus.lon=9.51&focus.weight=2.0'
Structured search
$ curl 'localhost:8080/v1/structured?road=Aeulestrasse&city=Vaduz'
Reverse geocoding
$ curl 'localhost:8080/v1/reverse?lat=47.141&lon=9.523&limit=4'
{
"source": "pip",
"results": [
{ "name": "Vaduz", "level": 1 },
{ "name": "Oberland", "level": 1 },
{ "name": "Liechtenstein", "level": 0 }
]
}
The HTTP API at a glance.
Each endpoint shows a one-line plain description plus the technical knobs available. Replace localhost:8080 with whatever address cairn-serve binds to.
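For scripted access, a small helper that takes the base address as a parameter keeps the host in one place. This is a minimal client-side sketch using only the Python standard library; the helper names are hypothetical, and the paths and parameters are the ones shown in the recipes on this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cairn_url(base: str, path: str, params: dict) -> str:
    """Build a request URL; ':' and ',' stay literal so gid lists remain readable."""
    query = urlencode(params, safe=":,")
    return f"{base.rstrip('/')}{path}?{query}"


def fetch(base: str, path: str, params: dict, timeout: float = 5.0) -> dict:
    """GET the URL and decode the JSON body (needs a running cairn-serve)."""
    with urlopen(cairn_url(base, path, params), timeout=timeout) as response:
        return json.load(response)


BASE = "http://localhost:8080"  # change to wherever cairn-serve is bound
search = cairn_url(BASE, "/v1/search", {"q": "Vaduz", "limit": 2})
lookup = cairn_url(BASE, "/v1/place", {"ids": "osm:way:12345,wof:locality:101748479"})
biased = cairn_url(BASE, "/v1/search", {"q": "Vaduz", "layer": "city",
                                        "focus.lat": 47.165, "focus.lon": 9.51})
print(search, lookup, biased, sep="\n")
```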
/v1/place (v0.4 · UPDATED): ?ids= takes a gid (osm:way:12345) or a legacy bundle-local u64, as a comma-separated list.
/v1/reverse (v0.3 · NEW): ?mode=at with strict ray-cast PIP (default) or ?mode=nearest. Pass ?strict=false for the bbox-only fast path.
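The difference between the strict ray-cast test and the bbox-only fast path is easiest to see on a concave polygon. The sketch below is a textbook even-odd ray cast over a single ring of (lon, lat) pairs, with a `strict` switch that mirrors the flag described above. It illustrates the general technique and is not Cairn's implementation.

```python
Ring = list[tuple[float, float]]  # (lon, lat) vertices, implicitly closed


def bbox_contains(ring: Ring, lon: float, lat: float) -> bool:
    """Cheap test: is the point inside the ring's bounding box?"""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs) <= lon <= max(xs) and min(ys) <= lat <= max(ys)


def ray_cast_contains(ring: Ring, lon: float, lat: float) -> bool:
    """Even-odd rule: count edge crossings of a ray going east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):  # edge straddles the ray's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def contains(ring: Ring, lon: float, lat: float, strict: bool = True) -> bool:
    """bbox prefilter first; the exact test only runs when strict is set."""
    if not bbox_contains(ring, lon, lat):
        return False
    return ray_cast_contains(ring, lon, lat) if strict else True


# L-shaped polygon: the point (3, 3) sits in the notch, inside the bbox only.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(contains(l_shape, 3, 3, strict=True), contains(l_shape, 3, 3, strict=False))
```

The bbox path can report false positives near irregular borders, which is the trade-off the strict default avoids.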
What's in a bundle.
One directory. Everything Cairn needs at runtime, anchored by blake3 hashes in manifest.toml. cairn-build verify recomputes them and refuses any drifted artifact.
bundle/
├── manifest.toml                           schema, source hashes, blake3
├── manifest.toml.sig                       ed25519 detached signature (optional)
├── sbom.json                               CycloneDX 1.5 software + data BoM
├── tiles/<level>/<row>/<col>/<id>.bin      rkyv-archived Place blobs
├── index/text/                             tantivy segments (mmap'd at runtime)
└── spatial/
    ├── admin/<level>/<tile>.bin            per-tile rkyv AdminLayer
    └── points/<tile>.bin                   per-tile bincode PointLayer
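The verification step amounts to recomputing each file's digest and comparing it with the recorded one. The sketch below shows that loop in Python under two loud assumptions: the expected digests arrive as a plain path-to-hex mapping (the real manifest.toml layout is not shown here), and BLAKE2b stands in for BLAKE3 because the standard library does not ship BLAKE3.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through the hasher so large tiles never load fully into RAM."""
    hasher = hashlib.blake2b()  # stand-in: swap for a BLAKE3 hasher in practice
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_bundle(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths whose file is missing or whose digest drifted."""
    drifted = []
    for rel_path, digest in expected.items():
        target = root / rel_path
        if not target.is_file() or file_digest(target) != digest:
            drifted.append(rel_path)
    return drifted
```

An empty return value means every listed artifact still matches; anything else names the files to reject.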
Bring your own data.
Mix and match. OSM alone is enough for a working bundle; the rest fill specific holes.
| Source | Format | Flag | Coverage |
|---|---|---|---|
| OpenStreetMap | *.osm.pbf | --osm | Places, POIs, streets, admin polygons. |
| WhosOnFirst | SQLite per-country | --wof | High-quality admin polygons + multilingual names. |
| OpenAddresses | CSV per-region | --oa | Authoritative addresses where OSM is sparse. |
| Geonames | TSV | --geonames · --postcodes | Populated places + postal codes. |
Switzerland on a Mac.
Same 506 MB OSM PBF, same 6 408-query workload, same Apple Silicon host. Cairn vs Pelias, Nominatim, Photon. Reproducible — benchmarks/ ships every script.
| Engine | Build | Disk | Hot RSS | p50 | p99 | Peak RPS |
|---|---|---|---|---|---|---|
| Cairn | 24 s | 195 MB | 80 MB | 0.51 ms | 0.74 ms | 57 554 |
| Pelias | 3 m 46 s | 3.5 GB | 2.7 GB | 13.76 ms | 57.23 ms | 362 |
| Nominatim | 3 h 13 m | 9.2 GB | 2.4 GB | 9.51 ms | 23.00 ms | 1 109 |
| Photon | 2 m 1 s | 1.3 GB | 2.1 GB | 5.88 ms | 25.18 ms | 2 406 |
Cairn is 31–77× faster at p99 and sustains 24–159× higher peak RPS, with 26–34× smaller hot RSS and 6.7–48× smaller disk than each incumbent. Build wall-clock is 5–483× faster. Recall on 1 153 noisy queries: ?phonetic=true alone lifts hit-rate from 21.9% to 99.0%.
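The headline ratios follow directly from the Switzerland table. As a quick check, this snippet recomputes the p99 and peak-RPS ranges from the table values; nothing in it is measured, it only divides the published numbers.

```python
# Values copied from the Switzerland table above.
cairn = {"p99_ms": 0.74, "peak_rps": 57_554}
incumbents = {
    "Pelias":    {"p99_ms": 57.23, "peak_rps": 362},
    "Nominatim": {"p99_ms": 23.00, "peak_rps": 1_109},
    "Photon":    {"p99_ms": 25.18, "peak_rps": 2_406},
}

# Latency: incumbent p99 divided by Cairn p99 (higher means Cairn is faster).
p99_speedup = {name: row["p99_ms"] / cairn["p99_ms"] for name, row in incumbents.items()}
# Throughput: Cairn peak RPS divided by incumbent peak RPS.
rps_gain = {name: cairn["peak_rps"] / row["peak_rps"] for name, row in incumbents.items()}

for name in incumbents:
    print(f"{name:10s} p99 {p99_speedup[name]:5.1f}x   peak RPS {rps_gain[name]:6.1f}x")
```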
Country-scale check — Germany
Same pipeline, 8.6× the input (4.7 GB PBF, ~3 M places vs 506 MB / 520 k for Switzerland). Steady-state numbers on the same Apple Silicon host:
| Metric | Switzerland | Germany | Ratio |
|---|---|---|---|
| Input PBF | 506 MB | 4.7 GB | 8.6× |
| Build wall-clock | 20 s | 487 s | 24× |
| Bundle disk | 212 MB | 1.54 GB | 7.4× |
| Hot serve RSS | 102 MB | 359 MB | 3.5× |
| p50 latency | 0.68 ms | 0.57 ms | 0.84× |
| p99 latency | 1.32 ms | 2.35 ms | 1.8× |
| Peak RPS | 23 477 | 39 664 (c=32) | 1.7× |
Steady-state hot RSS scales sublinearly (3.5× for 8.6× input) — rkyv tiles stay mmap'd; resident set is what's actively touched. p99 stays under 2.4 ms at 3 M places. Peak RPS at c=32 actually exceeds Switzerland's single-bundle peak — concurrent hot paths scale with rayon worker count. Post Phase 6f / 6g + parallel admin assembly + tantivy buffer tuning, DE peak RPS lifted from 8.4 k → 39.7 k and p99 dropped from 3.71 ms → 2.35 ms on the same Apple-Silicon host.
x86_64 reproduction: a manual GitHub Actions bench workflow runs the same harness on ubuntu-latest (Intel) so numbers aren't arm64-specific.
Germany — head-to-head with the incumbents
Same 4.7 GB Geofabrik PBF, same Mac, same 10 000-query DE workload. Cairn vs Pelias vs Photon at country scale:
| Engine | Build | Disk | Hot RSS | p50 | p99 | RPS peak |
|---|---|---|---|---|---|---|
| Cairn | 8 m 7 s | 1.54 GB | 359 MB | 0.57 ms | 2.35 ms | 39 664 |
| Pelias | 37 m | 5.28 GB | 5.15 GB | 10.29 ms | 23.16 ms | 506 |
| Photon | 19 m 22 s | 9.10 GB | 2.13 GB | 9.69 ms | 92.94 ms | 1 919 |
| Nominatim | not run (~28 h projected on arm64 single-thread osm2pgsql) | | | | | |
On 3 M places, Cairn keeps a 10–40× p99 lead, 21–78× peak RPS, and 6–14× smaller hot RSS than each incumbent. Build is 2.4–4.6× faster in wall-clock and produces a 3.4–5.9× smaller on-disk artifact.
Stack with us.
Cairn is built in the open. Issues, PRs, and design discussions all happen on GitHub.
v0.1.0 — "Germany"
First public beta. Pelias drop-in parity (Tier 1) + quality lead (Tier 2) + ops polish (Tier 3) + differentiators (Tier 4) all shipped. Reproducible Switzerland + Germany benchmarks land Cairn 10–77× faster at p99 than every incumbent on the same input. Helm chart + Terraform modules ship in cairn-cloud.
2026 · beta
11-crate Rust workspace. Dual MIT / Apache-2.0. CI on every push.
github.com/cairn-geocoder/cairn
9 preset queries plus a free-form composer dispatched live to a hosted Switzerland bundle.
Open the demo →
Live CVE pipeline.
The production container image is scanned by Trivy on every push, every PR, and once a day at 06:17 UTC. The table below mirrors the most recent scan; the canonical source is the GitHub Code Scanning tab. Reporting policy and threat model live in SECURITY.md.
| CVE | Severity | Package | Installed | Fixed in | Title |
|---|---|---|---|---|---|
Triage policy: CRITICAL ≤ 7 days, HIGH next release, MEDIUM routine maintenance. Anything not listed here is either patched, ignored with a documented reason, or below the MEDIUM reporting threshold.