Indonesian address parser and geocoder

id-address

Indonesian Address Parser & Geocoder — parse messy Indonesian addresses into structured components and geocode them to coordinates.

Python 3.9+ | License: MIT

Why?

Indonesian addresses are chaos. There's no standardized format, and addresses often mix:

  • Street abbreviations (Jl., Jalan, Jln., Gg., Gang, Komp.)
  • RT/RW (Rukun Tetangga/Rukun Warga, Indonesia's neighborhood unit system, e.g. RT 05/RW 08)
  • Administrative levels (Kelurahan → Kecamatan → Kota/Kabupaten → Provinsi)
  • Landmarks (Sebelah Indomaret, Depan Masjid)
  • Inconsistent postal codes
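
To make the first two items concrete, here is a minimal sketch of pulling an RT/RW pair and a street-type prefix out of a raw string with regular expressions. The patterns, the `STREET_PREFIXES` mapping, and the `extract_rt_rw` / `street_type` helpers are hypothetical illustrations, not part of id-address; the library's own rules may well differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern covering "RT 05/RW 08", "RT.02/RW.08", "rt 3 rw 5", ...
RT_RW = re.compile(
    r"\bRT\.?\s*(\d{1,3})\s*[/,\-]?\s*RW\.?\s*(\d{1,3})\b",
    re.IGNORECASE,
)

# Hypothetical mapping from common prefixes to a canonical street type.
STREET_PREFIXES = {
    "jl": "Jalan", "jln": "Jalan", "jalan": "Jalan",
    "gg": "Gang", "gang": "Gang",
    "komp": "Kompleks",
}
# Longer alternatives come first so "Jalan" is not cut short at "Jl".
PREFIX = re.compile(r"^\s*(jalan|jln|jl|gang|gg|komp)\b\.?", re.IGNORECASE)


def extract_rt_rw(address: str) -> Optional[Tuple[int, int]]:
    """Return (rt, rw) as integers, or None when no RT/RW pair is found."""
    match = RT_RW.search(address)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def street_type(address: str) -> Optional[str]:
    """Return the canonical street type for a leading prefix, if any."""
    match = PREFIX.match(address)
    if match is None:
        return None
    return STREET_PREFIXES[match.group(1).lower()]


print(extract_rt_rw("Jl. M.H. Thamrin No.1, RT.02/RW.08, Gelora"))
print(street_type("Gg. Kelinci No.3, Petojo Selatan"))
```

Returning integers sidesteps the question of zero-padding ("05" vs "5"), which varies between sources.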

Google Maps API is expensive, and existing parsers don't handle Indonesian formats. This library fills that gap with a deterministic parser that matches parsed components against the official Kemendagri dataset and uses fuzzy text matching to tolerate typos.

Installation

pip install id-address

Quick Start

from id_address import AddressParser, Geocoder

parser = AddressParser()
result = parser.parse("Jl. M.H. Thamrin No.1, RT.02/RW.08, Gelora, Tanah Abang, Jakarta Pusat")

print(f"Street: {result.components.street}")
print(f"Kelurahan: {result.components.kelurahan}")
print(f"Kemendagri Code: {result.components.administrative_code}")

Features

✅ Currently Supported

  • Dataset Integration: Matches parsed components against official Kemendagri codes.
  • Fuzzy Matching: Tolerates typos in Kelurahan/Kecamatan/City names via Levenshtein distance.
  • Robust Parsing: Handles Street prefixes, RT/RW extraction, House numbers, Postal codes.
  • Unicode Normalization: Automatically cleans messy encodings and HTML entities before parsing.
  • CLI Tool: Process large CSV files directly from the terminal via id-address batch.
  • Geocoding: Abstract BaseGeocoder with a robust NominatimGeocoder implementation (includes exponential backoff).
  • Graceful Failure: Tracks ambiguity inside result.components.parse_warnings.
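
The Unicode normalization and fuzzy matching features can be pictured with the standard-library sketch below. It is an illustration of the general technique only: `normalize`, `levenshtein`, `best_match`, and the `max_distance` threshold are hypothetical names and values, and id-address may implement these steps differently.

```python
import html
import unicodedata
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Decode HTML entities, apply NFKC normalization, collapse whitespace."""
    text = html.unescape(text)
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert/delete/substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def best_match(name: str, candidates: Iterable[str],
               max_distance: int = 2) -> Optional[str]:
    """Return the closest candidate within max_distance edits, else None."""
    query = normalize(name).casefold()
    best, best_distance = None, max_distance + 1
    for candidate in candidates:
        distance = levenshtein(query, candidate.casefold())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


kecamatan = ["Tanah Abang", "Gambir", "Menteng"]
print(best_match("Tanah&nbsp;Abng", kecamatan))
```

A small fixed threshold keeps short names from matching unrelated entries; a production matcher would likely scale it with name length.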

🚧 Roadmap (v0.3 - v1.0)

  • Enterprise Plugins (Pandas id_address accessor, FastAPI Pydantic validators)
  • Multi-provider geocoding (Google Maps, Here)
  • Support for POI/Landmark based addresses via Overpass API

Usage Examples

Command Line Interface (CLI)

Parse or geocode a single address, or batch-process thousands of addresses from a CSV file, directly from your terminal:

# Parse a single address
id-address parse "Jl. Merdeka No 10, RT 03/RW 05, Menteng"

# Geocode a single address
id-address geocode "Jl. Ahmad Yani, Banjarmasin"

# Batch process a CSV file (must contain an 'address' column)
id-address batch input.csv -o cleaned_addresses.csv
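
If your addresses live in Python rather than in a file, a sketch like the following can produce a CSV with the required 'address' column using only the standard library. The `write_address_csv` helper is hypothetical and not part of id-address; the only thing taken from the CLI above is the column name.

```python
import csv
from pathlib import Path
from typing import Iterable


def write_address_csv(path: Path, addresses: Iterable[str]) -> int:
    """Write one address per row under an 'address' header; return the row count."""
    count = 0
    # newline="" lets the csv module control line endings, as its docs recommend.
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["address"])
        writer.writeheader()
        for address in addresses:
            # Commas inside an address are quoted automatically by the writer.
            writer.writerow({"address": address})
            count += 1
    return count


if __name__ == "__main__":
    rows = write_address_csv(Path("input.csv"), [
        "Jl. Merdeka No 10, RT 03/RW 05, Menteng",
        "Jl. Ahmad Yani, Banjarmasin",
    ])
    print(f"wrote {rows} addresses")
```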

Custom Dataset / Kemendagri Data

By default, the library uses a bundled minimal sample dataset. To use the full Kemendagri dataset or your own custom administrative data, provide a JSON file formatted like so:

[
  {
    "code": "31.71.01.1001",
    "province": "DKI Jakarta",
    "city": "Jakarta Pusat",
    "kecamatan": "Tanah Abang",
    "kelurahan": "Gelora",
    "postal_code": "10270"
  }
]

And load it into the parser:

from id_address import AddressParser

parser = AddressParser()
parser.load_dataset("path/to/your/custom_dataset.json")
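
Before loading a custom file, it can help to check that every record carries the keys shown in the sample above. The sketch below is a hypothetical pre-check, not part of id-address: `validate_records` and `split_code` are illustrative names, it is an assumption that all six keys are required, and `split_code` assumes the dotted code nests the four administrative levels from left to right, as the sample record suggests.

```python
import json
from typing import Any, Dict, List

# Keys taken from the sample record; whether the library requires all of them
# is an assumption.
REQUIRED_KEYS = {"code", "province", "city", "kecamatan", "kelurahan", "postal_code"}
LEVELS = ["province", "city", "kecamatan", "kelurahan"]


def split_code(code: str) -> Dict[str, str]:
    """Split '31.71.01.1001' into cumulative codes, one per administrative level."""
    parts = code.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected 4 dot-separated parts, got {code!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def validate_records(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text and check each record for the expected keys."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("dataset must be a JSON array of objects")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
    return records


print(split_code("31.71.01.1001"))
```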

Parse a batch of addresses in Python

from id_address import AddressParser

addresses = [
    "Jl. Sudirman No. 45, Jakarta Pusat 10220",
    "Gg. Kelinci No.3, Petojo Selatan, Gambir, Jakarta Pusat 10160",
    "Komp. Puri Kencana Blok A12, Kembangan, Jakarta Barat 11610",
]

parser = AddressParser()
results = parser.parse_batch(addresses)

for r in results:
    print(f"{r.formatted} (confidence: {r.confidence})")

Reverse geocoding

from id_address import Geocoder

geocoder = Geocoder()
result = geocoder.reverse_geocode(-6.2088, 106.8229)

if result:
    print(result.formatted)
    # Output: Jalan M.H. Thamrin, Gelora, Tanah Abang, Jakarta Pusat, DKI Jakarta

Geocode with rate limiting

# parsed_results: a list of AddressResult objects, e.g. from parser.parse_batch(addresses)
geocoder = Geocoder()
results = geocoder.geocode_batch(parsed_results, delay=1.0)  # wait 1 second between requests
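
For intuition, the fixed delay above and the exponential backoff mentioned under Features can be sketched generically as follows. This is not the library's implementation: `throttled_map`, `call_with_backoff`, and their parameters are hypothetical, and the `sleep` argument is injectable only so the behavior can be checked without waiting. A 1-second delay matches the public Nominatim usage policy of at most one request per second.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_with_backoff(func: Callable[[], R], retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> R:
    """Retry func, doubling the wait after each failure: base, 2*base, 4*base, ..."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("retries must be non-negative")


def throttled_map(func: Callable[[T], R], items: Iterable[T],
                  delay: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Apply func to each item, pausing `delay` seconds between consecutive calls."""
    results: List[R] = []
    for index, item in enumerate(items):
        if index:  # no pause before the first call
            sleep(delay)
        results.append(func(item))
    return results
```

Combining the two (a throttled loop whose per-item call is wrapped in a backoff retry) keeps steady-state traffic polite while still recovering from transient failures.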

API Reference

AddressParser

| Method | Description |
| --- | --- |
| `parse(address: str) -> AddressResult` | Parse single address |
| `parse_batch(addresses: list[str]) -> list[AddressResult]` | Parse multiple addresses |
| `load_dataset(path: str)` | Load administrative dataset (optional) |

Geocoder

| Method | Description |
| --- | --- |
| `geocode(result: AddressResult) -> AddressResult` | Geocode parsed address in-place |
| `geocode_batch(results: list[AddressResult], delay: float) -> list[AddressResult]` | Batch geocode with rate limiting |
| `reverse_geocode(lat: float, lon: float) -> Optional[AddressResult]` | Reverse geocoding |

AddressResult

| Property | Type | Description |
| --- | --- | --- |
| `raw_input` | `str` | Original address string |
| `components` | `AddressComponents` | Parsed components |
| `latitude` | `float \| None` | Latitude coordinate |
| `longitude` | `float \| None` | Longitude coordinate |
| `confidence` | `float` | Confidence score (0.0–1.0) |
| `formatted` | `str` | Formatted address string |
| `to_dict()` | `dict` | Dictionary representation |

Development

Setup

git clone https://github.com/verrysimatupang99/id-address.git
cd id-address
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Run tests

pytest

Lint & format

ruff check id_address/
black id_address/

Data Sources

  • Nominatim/OpenStreetMap: Free geocoding (requires attribution)
  • Kemendagri: Official Indonesian administrative boundaries
  • BPS: Indonesian statistics agency geographic data

License

MIT License — see LICENSE file.

Acknowledgments

Built because every Indonesian developer has suffered through parsing addresses like:

"Jl. K.H. Hasyim Ashari No. 89, RT.07/RW.02, Duri Pulo, Kec. Gambir, Kota Jakarta Pusat, DKI Jakarta 10140 — SEBERANG INDOMARET"

No more. 🇮🇩
