Skip to main content

Indonesian address parser and geocoder

Project description

id-address

Indonesian Address Parser & Geocoder — parse messy Indonesian addresses into structured components and geocode them to coordinates.

PyPI version Python 3.9+ License: MIT

Why?

Indonesian addresses are chaos. There's no standardized format, and addresses often mix:

  • Street abbreviations (Jl., Jalan, Jln., Gg., Gang, Komp.)
  • RT/RW (unique neighborhood system: RT 05/RW 08)
  • Administrative levels (Kelurahan → Kecamatan → Kota/Kabupaten → Provinsi)
  • Landmarks (Sebelah Indomaret, Depan Masjid)
  • Inconsistent postal codes

Google Maps API is expensive. Existing parsers don't handle Indonesian formats. This library fills that gap by providing a deterministic parser backed by official Kemendagri dataset matching and fuzzy text matching.

Installation

pip install id-address

Quick Start

from id_address import AddressParser, Geocoder

parser = AddressParser()
result = parser.parse("Jl. M.H. Thamrin No.1, RT.02/RW.08, Gelora, Tanah Abang, Jakarta Pusat")

print(f"Street: {result.components.street}")
print(f"Kelurahan: {result.components.kelurahan}")
print(f"Kemendagri Code: {result.components.administrative_code}")

Features

✅ Currently Supported

  • Dataset Integration: Matches parsed components against official Kemendagri codes.
  • Fuzzy Matching: Tolerates typos in Kelurahan/Kecamatan/City names via Levenshtein distance.
  • Robust Parsing: Handles Street prefixes, RT/RW extraction, House numbers, Postal codes.
  • Unicode Normalization: Automatically cleans messy encodings and HTML entities before parsing.
  • CLI Tool: Process massive CSVs directly from the terminal via id-address batch.
  • Geocoding: Abstract BaseGeocoder with a robust NominatimGeocoder implementation (includes exponential backoff).
  • Graceful Failure: Tracks ambiguity inside result.components.parse_warnings.

🚧 Roadmap (v0.3 - v1.0)

  • Enterprise Plugins (Pandas id_address accessor, FastAPI Pydantic validators)
  • Multi-provider geocoding (Google Maps, Here)
  • Support for POI/Landmark based addresses via Overpass API

Usage Examples

Command Line Interface (CLI)

Process thousands of addresses from a CSV file directly from your terminal:

# Parse a single address
id-address parse "Jl. Merdeka No 10, RT 03/RW 05, Menteng"

# Geocode a single address
id-address geocode "Jl. Ahmad Yani, Banjarmasin"

# Batch process a CSV file (must contain an 'address' column)
id-address batch input.csv -o cleaned_addresses.csv

Custom Dataset / Kemendagri Data

By default, the library uses a bundled minimal sample dataset. To use the full Kemendagri dataset or your own custom administrative data, provide a JSON file formatted like so:

[
  {
    "code": "31.71.01.1001",
    "province": "DKI Jakarta",
    "city": "Jakarta Pusat",
    "kecamatan": "Tanah Abang",
    "kelurahan": "Gelora",
    "postal_code": "10270"
  }
]

And load it into the parser:

parser = AddressParser()
parser.load_dataset("path/to/your/custom_dataset.json")

Parse a batch of addresses in Python

addresses = [
    "Jl. Sudirman No. 45, Jakarta Pusat 10220",
    "Gg. Kelinci No.3, Petojo Selatan, Gambir, Jakarta Pusat 10160",
    "Komp. Puri Kencana Blok A12, Kembangan, Jakarta Barat 11610",
]

parser = AddressParser()
results = parser.parse_batch(addresses)

for r in results:
    print(f"{r.formatted} (confidence: {r.confidence})")

Reverse geocoding

from id_address import Geocoder

geocoder = Geocoder()
result = geocoder.reverse_geocode(-6.2088, 106.8229)

if result:
    print(result.formatted)
    # Output: Jalan M.H. Thamrin, Gelora, Tanah Abang, Jakarta Pusat, DKI Jakarta

Geocode with rate limiting

geocoder = Geocoder()
results = geocoder.geocode_batch(parsed_results, delay=1.0)  # 1 sec between requests

API Reference

AddressParser

Method Description
parse(address: str) -> AddressResult Parse single address
parse_batch(addresses: list[str]) -> list[AddressResult] Parse multiple addresses
load_dataset(path: str) Load administrative dataset (optional)

Geocoder

Method Description
geocode(result: AddressResult) -> AddressResult Geocode parsed address in-place
geocode_batch(results: list[AddressResult], delay: float) -> list[AddressResult] Batch geocode with rate limiting
reverse_geocode(lat: float, lon: float) -> Optional[AddressResult] Reverse geocoding

AddressResult

Property Type Description
raw_input str Original address string
components AddressComponents Parsed components
latitude float | None Latitude coordinate
longitude float | None Longitude coordinate
confidence float Confidence score (0.0–1.0)
formatted str Formatted address string
to_dict() dict Dictionary representation

Development

Setup

git clone https://github.com/verrysimatupang99/id-address.git
cd id-address
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Run tests

pytest

Lint & format

ruff check id_address/
black id_address/

Data Sources

  • Nominatim/OpenStreetMap: Free geocoding (requires attribution)
  • Kemendagri: Official Indonesian administrative boundaries
  • BPS: Indonesian statistics agency geographic data

License

MIT License — see LICENSE file.

Acknowledgments

Built because every Indonesian developer has suffered through parsing addresses like:

"Jl. K.H. Hasyim Ashari No. 89, RT.07/RW.02, Duri Pulo, Kec. Gambir, Kota Jakarta Pusat, DKI Jakarta 10140 — SEBERANG INDOMARET"

No more. 🇮🇩

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

id_address-0.1.0a1.tar.gz (18.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

id_address-0.1.0a1-py3-none-any.whl (16.0 kB view details)

Uploaded Python 3

File details

Details for the file id_address-0.1.0a1.tar.gz.

File metadata

  • Download URL: id_address-0.1.0a1.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for id_address-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 05bf3c2f591f242f154c7e67687179852c53baeeaf163e9eb08fc2a858dea5b5
MD5 08b089cbf5d3e3fbd2ae38bc4050f052
BLAKE2b-256 ccb65d9bd1a2ce44c40bcb1950ff2c1c6a8fd1f28e249d21574f88aeb104a40b

See more details on using hashes here.

File details

Details for the file id_address-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: id_address-0.1.0a1-py3-none-any.whl
  • Upload date:
  • Size: 16.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for id_address-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 034cb9f8364207ff77a50f641b6c67ce4d04b7e0131ed4906bf923c5958e5098
MD5 aa6a4dc997e011ce51d1de57650f1e64
BLAKE2b-256 3e32d98da0448d207a6d04ca2e6b2d61401b3d5520ea8b77183dd1243e19f36f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page