Every map app, every delivery, every emergency dispatch, every property listing, every voter registration starts with the same question: "Where is this place?" Today most teams answer it by sending the address to Google's, Mapbox's, or HERE's servers, and accepting the cost, the lock-in, and the data exposure that come with it. Cairn is what you reach for when you'd rather not.
Geocoding is the translation layer between human-readable place descriptions ("Place de la Concorde, Paris", "8044 Zürich", "Heidelberg University Library") and machine-friendly coordinates (latitude / longitude, polygon shapes, postal codes). It runs in five flavors:
Forward: address or place name → coordinates + canonical record. "Vaduz Castle" → (47.142, 9.523), country = Liechtenstein, kind = castle.
Reverse: coordinates → containing admin chain + nearest features. (47.142, 9.523) → "Vaduz, Oberland, Liechtenstein" + nearest road, POI, address.
Autocomplete: partial input → ranked suggestions, keystroke-grain. The search box on every shipping form, every map UI, every store-locator page.
Structured: field-by-field ("road = X, city = Y, country = Z") → best match. Used when the upstream system already separates components, as in payment forms, CRM imports, and government records.
Parse: free-text address → structured components (house_number, road, postcode, city). Pre-step for normalization, deduplication, validation.
Sounds boring. Powers everything from Uber's pickup field to a national census's address index. A geocoder that's slow, wrong, or unavailable is a geocoder that quietly costs the rest of the product roadmap.
Geocoding is invisible until it isn't. These are the moments where the choice between "send the address to Google" and "run the geocoder we own" stops being a tooling preference and starts being a strategic decision.
Last-mile cost is dominated by stop-density × per-stop time. A geocoder that returns the wrong rooftop or the closest crossroads instead of the building entrance burns minutes per stop and fuels customer-support tickets. At fleet scale, 10 ms of added latency × 200 K calls/day ≈ 33 minutes of cumulative hot-path latency per day.
911 / 112 dispatch needs a reverse geocode in under a second on a constrained network, against a vetted authoritative dataset (national address registry, not "whatever Google has"). Air-gapped or sovereignty-bounded networks can't call the public cloud. Cairn ships a single static binary that runs on the dispatch box.
Patient addresses, claim locations, provider directories: all are subject to data-residency rules (HIPAA, GDPR, Schrems II, sectoral). Sending an address to a US-hosted API for resolution can be the breach event. Self-hosted geocoding keeps PHI on-premise.
Listings, comparables, parcel boundaries, school catchments, flood zones. Every overlay needs a coordinate. Pricing models depend on consistent geocoding across millions of historical records — outsourced APIs change ranking and coverage without notice, breaking time-series analysis.
Coverage maps, outage notifications, service requests, smart meter assignment. Operators run their own GIS — the geocoder needs to integrate with internal datasets (cell towers, fiber routes, substation polygons) that no public API has ever heard of.
Store locators, BOPIS pickup, curbside delivery. Every "find a store near me" is a forward + reverse geocode pair. High-traffic days scale per-call billing into surprise invoices that the CFO didn't sign off on.
KYC address validation, AML location screening, fraud scoring, branch routing, mortgage underwriting (flood / fire / earthquake risk by parcel). Auditable, reproducible geocoding is a regulatory requirement — auditors want to know which version of which database returned which answer.
Refugee camp logistics, disaster response, public-health outreach. Often offline-first by necessity — bandwidth is scarce, the cloud is down, OSM is the only ground truth. A geocoder that runs on a rugged laptop in a tent is the difference between a delivered shipment and a stuck convoy.
Reproducibility demands a frozen geocoder version. A paper that says "we used Google's API in 2023" can't be replicated in 2026 because Google's index moved on. A versioned, content-hashed bundle pins the exact answers.
Google's geocoding API is $5 / 1 000 calls. Mapbox is $0.75 / 1 000. Sounds small until autocomplete drives 200 calls per filled form. A retail Black Friday spike, a viral campaign, an integration bug, a misconfigured retry loop, and the bill finds you.
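To make the arithmetic concrete, here is a rough cost sketch. The per-1 000 rates and the 200 calls per form come from the paragraph above; the traffic figure of 1 000 filled forms per day and the function name are illustrative assumptions, not measured data.

```python
def monthly_api_cost(forms_per_day: int, calls_per_form: int,
                     usd_per_1000_calls: float, days: int = 30) -> float:
    """Estimated bill for a billing period, in USD."""
    total_calls = forms_per_day * calls_per_form * days
    return total_calls * usd_per_1000_calls / 1000


# Hypothetical traffic: 1 000 filled forms per day, 200 autocomplete calls each.
FORMS_PER_DAY = 1_000
CALLS_PER_FORM = 200

for vendor, rate in [("$5 / 1 000 tier", 5.00), ("$0.75 / 1 000 tier", 0.75)]:
    cost = monthly_api_cost(FORMS_PER_DAY, CALLS_PER_FORM, rate)
    print(f"{vendor}: ${cost:,.0f} per 30 days")
```

Under those assumptions the bill grows linearly with traffic, so a day with ten times the forms costs ten times as much.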
Once a thousand integrations call maps.googleapis.com, switching is a re-architecture. The vendor knows. Pricing increases 30 % year-over-year and you have ninety days to comply.
Every query — addresses of customers, patients, refugees, voters, suspects — gets sent over TLS to a third-party log you don't control. Subpoenas, breaches, government requests, internal misuse. "Just metadata" is a useful fiction until it isn't.
Even on a perfect day, a transatlantic round-trip is ~120 ms before the API does any work. Inside a datacenter with a local geocoder, p99 is sub-millisecond. The difference shows up in checkout abandonment, dispatcher reflex time, and conversion funnels.
Hosted APIs ship continuous updates. The geocode you got today is not necessarily the one you'll get in six months, and the vendor offers no version pin. For audit trails, scientific replication, and regulator submissions, that's disqualifying.
Google's index doesn't know about your internal sites, non-public utility infrastructure, custom address aliases, or yet-to-be-published street names. Adding them is impossible.
One file, one config flag, one bundle directory. Runs on a Mac, a Hetzner box, an AWS Fargate task, a k3s pod, a rugged ARM laptop. No Postgres, no Elasticsearch, no Java, no Node.
The bundle is a directory you pre-build and copy in. No outbound network calls at runtime. Required for sovereign networks, classified environments, ships, embassies, disaster zones.
Switzerland: p99 = 0.74 ms, peak 57 554 RPS, hot RSS 80 MB. Every query is mmap'd rkyv tile reads + tantivy term lookups: no daemon round-trip, no DB query plan.
Every bundle ships with a manifest.toml carrying blake3 of every artifact, an ed25519 detached signature, and a CycloneDX 1.5 SBOM. Pin a bundle ID in your audit trail; reproduce the exact answer years later.
POST /admin/reload swaps to a new bundle without a process restart. Atomic ArcSwap-backed indices, in-flight queries finish on the previous snapshot. K8s rolling updates without rolling.
cairn-serve --bundles a/,b/,c/ shards a planet across continents without standing up multiple processes. Queries fan out and merge by score; point-in-polygon (PIP) lookups fan out and merge finest-first.
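A minimal sketch of the "merge by score" step, assuming each bundle returns its hits already sorted by descending score. The Hit type, the function name, and the de-duplication by identifier are illustrative assumptions, not Cairn's actual internals.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    gid: str      # identifier of the matched place (hypothetical values below)
    score: float  # higher is better


def merge_by_score(shard_results: Iterable[List[Hit]], limit: int = 10) -> List[Hit]:
    """K-way merge of per-bundle result lists, best score first.

    Assumes every input list is already sorted by descending score.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: -hit.score)
    seen = set()
    out: List[Hit] = []
    for hit in merged:
        if hit.gid in seen:  # the same place may appear in more than one bundle
            continue
        seen.add(hit.gid)
        out.append(hit)
        if len(out) == limit:
            break
    return out


# Hypothetical results from two bundles.
bundle_a = [Hit("osm:way:1", 0.9), Hit("osm:way:2", 0.4)]
bundle_b = [Hit("wof:locality:3", 0.7)]
top = merge_by_score([bundle_a, bundle_b], limit=2)
```

Because the merge is lazy, only as many hits as the limit requires are pulled from each bundle's list.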
Bundles ingest OSM + WhosOnFirst + OpenAddresses + Geonames out of the box. Add your own CSV (internal facilities, non-public infrastructure, custom aliases) and re-build; it's one cairn-build build away.
?valid_at=YYYY filters by OSM start_date / end_date. Historical geocoding for archives, journalism, longitudinal research. No incumbent ships this.
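The filter can be pictured as a simple interval check. The sketch below assumes tag values that begin with a four-digit year (such as "1905" or "1989-11-09"), inclusive boundaries, and that a missing tag leaves that side unbounded; how Cairn actually handles boundaries, missing tags, and free-form dates is not stated here, so treat all three as assumptions.

```python
from typing import Optional


def year_of(date_tag: Optional[str]) -> Optional[int]:
    """Leading four-digit year of a tag value, or None if absent or unparseable."""
    if date_tag and len(date_tag) >= 4 and date_tag[:4].isdigit():
        return int(date_tag[:4])
    return None


def is_valid_at(year: int, start_date: Optional[str], end_date: Optional[str]) -> bool:
    """True if a feature existed in `year` according to its start/end tags."""
    start, end = year_of(start_date), year_of(end_date)
    if start is not None and year < start:
        return False
    if end is not None and year > end:
        return False
    return True


# Hypothetical feature that existed from 1905 until late 1989.
existed_in_1950 = is_valid_at(1950, "1905", "1989-11-09")
existed_in_2000 = is_valid_at(2000, "1905", "1989-11-09")
```

A feature with neither tag passes the check for every year under these assumptions.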
Standard REST shape: /v1/search, /v1/reverse, /v1/structured, ?text=, ?focus.point.lon. Each endpoint returns a clean JSON envelope you can hit with curl, Postman, or any HTTP client. Teams already running an open-source geocoder typically migrate with a DNS swap.
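As a rough illustration of that request shape, the sketch below builds a forward-search URL with the standard library. The path and the text and focus.point.lon parameters come from the list above; the base URL, port, and helper name are assumptions for a local instance, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical address of a locally running server.
BASE_URL = "http://localhost:8080"


def search_url(text: str, focus_lon: float) -> str:
    """Build a /v1/search URL with a free-text query and a focus longitude."""
    query = urlencode({"text": text, "focus.point.lon": focus_lon})
    return f"{BASE_URL}/v1/search?{query}"


url = search_url("Vaduz Castle", 9.523)
print(url)
```

The resulting string can be passed to curl or any HTTP client.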
Every Place ships a global identifier (gid) of shape <source>:<type>:<id> (osm:way:12345, wof:locality:101748479) that survives bundle rebuilds and federation hops. Save a record once; reload it years later via /v1/place?ids=… and get the same identifier, same answer. Audit trails, share links, and cross-system joins keep working across releases.
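A client that stores these identifiers may want to split them into their three parts. The sketch below follows the <source>:<type>:<id> shape given above; the Gid type, its field names, and the validation rules are illustrative assumptions.

```python
from typing import NamedTuple


class Gid(NamedTuple):
    source: str    # e.g. "osm" or "wof"
    kind: str      # the <type> part, e.g. "way" or "locality"
    local_id: str  # the <id> part, kept as a string


def parse_gid(gid: str) -> Gid:
    """Split '<source>:<type>:<id>' into its parts, rejecting malformed input."""
    parts = gid.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <source>:<type>:<id> identifier: {gid!r}")
    return Gid(*parts)


way = parse_gid("osm:way:12345")
locality = parse_gid("wof:locality:101748479")
```

Keeping the last part as a string avoids assuming that every source uses numeric ids.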
Microsoft Building Footprints layered alongside addresses give rooftop-level resolution where OSM is sparse (~1.4 B polygons globally). Reverse-geocode a coordinate and return the actual building it sits inside, not just the nearest address point.
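The "which building contains this coordinate" question is a point-in-polygon test. Below is a minimal ray-casting sketch for a single polygon ring. It is a generic textbook technique, not necessarily what Cairn runs, and the footprint coordinates are made up for illustration.

```python
from typing import List, Tuple

Ring = List[Tuple[float, float]]  # (lon, lat) vertices, first vertex not repeated


def point_in_polygon(lon: float, lat: float, ring: Ring) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that line.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical rectangular footprint around the example coordinate.
footprint: Ring = [
    (9.5228, 47.1418),
    (9.5232, 47.1418),
    (9.5232, 47.1422),
    (9.5228, 47.1422),
]
hit = point_in_polygon(9.523, 47.142, footprint)
miss = point_in_polygon(9.530, 47.142, footprint)
```

A production index would first narrow the candidates with a spatial lookup and only then run this test on the few remaining polygons; points exactly on an edge are not handled specially here.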
| If you are… | Cairn is the right fit when… |
|---|---|
| Government / EU institution | Data sovereignty + air-gap + reproducibility + auditable supply chain are non-negotiable. |
| Bank / insurer | KYC, AML, claims, and risk pipelines need on-premise geocoding under regulatory audit (BCBS 239, Solvency II, GDPR Art. 32). |
| Healthcare provider | PHI may not transit a third-party API; provider directories + patient outreach need accurate, local-authority geocodes. |
| Logistics / fleet operator | Per-call API costs are eating margin and you've already maxed out caching. |
| Telecom / utility | Geocoder must merge with internal GIS (cell towers, fiber, substations) that public APIs don't know about. |
| Aid / humanitarian org | Operating offline-first in disaster zones with OSM as the ground truth. |
| Researcher | Need a frozen, cite-able geocoder version that returns the same answer in five years. |
| SaaS engineering team | Autocomplete + checkout-flow latency matter and the API bill is climbing. |
| OSS / hobby / indie | Just want a good geocoder that runs on your laptop. |
One cargo build --release or one docker run. Country bundle ready in 30 s.
How the build pipeline, search pipeline, reverse pipeline, and bundle format actually work — for engineers evaluating Cairn against Pelias / Photon / Nominatim.
→ Deep-dive
Architecture, runtime deps, on-disk format, license, measured query latency. Honest read on trade-offs.
→ Comparison matrix
Real cairn-serve on Hetzner k3s. Liechtenstein bundle. Click through the preset queries or compose your own.
11-crate Rust workspace. Dual MIT / Apache-2.0. Issues + PRs welcome.
→ GitHub
Helm chart, Kustomize overlays, Terraform modules for AWS / GCP / Nomad. Grafana dashboards + Prometheus alerts.
→ cairn-cloud