Indian place name identifier with SymSpell-style fuzzy lookup and OCR word segmentation
Indic Places Library
Indian place-name lookup, fuzzy matching, OCR address spacing, and merged-word segmentation for Python.
This package is designed for public use in Indian address processing, OCR cleanup, place-name identification, and document extraction workflows.
Install from PyPI
Install the latest version:
pip install --upgrade indic-places
Clean reinstall without cache:
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places
Install a specific version:
python -m pip install indic-places==1.1.5
Add to requirements.txt:
indic-places>=1.1.5
Import
The PyPI package name is:
indic-places
The Python import is:
from indic_places import IndicPlaces
Data Stats
| Metric | Count |
|---|---|
| Structured GeoNames + postal records | 815,477 |
| Unique place names | 817,641 |
| Runtime OCR/custom place aliases | 652,331 |
| Coverage | India-wide |
Quick Usage
from indic_places import IndicPlaces
ip = IndicPlaces()
address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
print(ip.normalize_address_spacing(address))
Expected output:
PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586
What This Library Solves
OCR often returns Indian addresses as merged text:
PILASSERYADIVARAMPUTHUPPADIADIVARAM
KUNNUMPURATHHOUSEKALLARAP.O
THAMARASSERYKOZHIKODE
indic-places uses Indian place vocabulary and address terms to safely space merged OCR tokens.
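As a rough illustration of vocabulary-driven spacing, the sketch below splits a merged token only when every piece is a known word, and otherwise returns the token unchanged. It is a toy under stated assumptions, not the library's implementation: the function name `split_merged_token` and the five-word vocabulary are hypothetical and chosen only for this example.

```python
def split_merged_token(token, vocabulary):
    """Split token into vocabulary words; return it unchanged if no full split exists."""
    n = len(token)
    # best[i] holds the fewest-word split of token[:i], or None if token[:i]
    # cannot be covered entirely by vocabulary words.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    # "Safe" behaviour: never invent a split the vocabulary cannot justify.
    return " ".join(best[n]) if best[n] else token


# Hypothetical mini-vocabulary for the example only.
VOCAB = {"PILASSERY", "ADIVARAM", "PUTHUPPADI", "THAMARASSERY", "KOZHIKODE"}

print(split_merged_token("THAMARASSERYKOZHIKODE", VOCAB))
print(split_merged_token("UNKNOWNTOKEN", VOCAB))
```

Returning the original token when no complete split exists keeps unknown words intact instead of breaking them into arbitrary fragments.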
Main Features
- Indian place-name lookup
- OCR merged-address spacing
- SymSpell-style fuzzy lookup
- Word segmentation
- Place extraction from text
- India-wide GeoNames and postal vocabulary
- Runtime OCR/custom place aliases from indic_places/data/custom_places.txt
Use in Any Python Project
1. Install
python -m pip install --upgrade indic-places
2. Create the Engine Once
from indic_places import IndicPlaces
place_engine = IndicPlaces()
3. Normalize an OCR Address
raw_address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
clean_address = place_engine.normalize_address_spacing(raw_address)
print(clean_address)
Output:
PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586
Recommended Integration Pattern for OCR Pipelines
Use the library at the final address-cleanup stage, after your extraction logic has already identified the address candidate.
from indic_places import IndicPlaces
_PLACE_ENGINE = None
def get_place_engine():
    global _PLACE_ENGINE
    if _PLACE_ENGINE is None:
        _PLACE_ENGINE = IndicPlaces()
    return _PLACE_ENGINE
def normalize_address(address: str) -> str:
    address = " ".join(str(address or "").split()).strip(" ,:-|")
    if not address:
        return ""
    engine = get_place_engine()
    return engine.normalize_address_spacing(address)
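The same lazy, build-once pattern can also be written with `functools.lru_cache` instead of a module-level global. The sketch below uses a hypothetical `StubEngine` class as a stand-in so that it runs without the package installed; in a real project the cached factory would return `IndicPlaces()` instead.

```python
from functools import lru_cache


class StubEngine:
    """Stand-in for an expensive-to-build engine. NOT part of indic-places."""

    instances = 0  # counts how many times the engine was constructed

    def __init__(self):
        StubEngine.instances += 1

    def normalize_address_spacing(self, address):
        # The stub returns its input unchanged; the real engine would add spacing.
        return address


@lru_cache(maxsize=1)
def get_engine():
    # Runs once on first call; later calls return the cached instance.
    return StubEngine()


def normalize_address(address):
    # Collapse whitespace runs and trim stray separators, as in the pattern above.
    address = " ".join(str(address or "").split()).strip(" ,:-|")
    if not address:
        return ""
    return get_engine().normalize_address_spacing(address)


print(normalize_address("  KALLARA   P.O , "))
print(StubEngine.instances)
```

Note that `lru_cache` may still build the object more than once if several threads make the very first call at the same moment; if that matters, construct the engine eagerly at import time instead.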
Use with an Existing Extractor Function
If your project has a final address cleanup function, call normalize_address_spacing() inside that final cleanup function.
from indic_places import IndicPlaces
_PLACE_ENGINE = IndicPlaces()
def finalize_address(address: str) -> str:
    address = " ".join(str(address or "").split()).strip(" ,:-|")
    if not address:
        return ""
    address = _PLACE_ENGINE.normalize_address_spacing(address)
    return " ".join(address.split()).strip(" ,:-|")
If your extraction function stores the best address candidate before returning, normalize before storing the final value.
def evaluate_and_store_address(candidate: str):
    candidate = finalize_address(candidate)
    if not candidate:
        return False
    # Store candidate in your output dictionary/model.
    return True
Lookup Places
from indic_places import IndicPlaces
ip = IndicPlaces()
results = ip.lookup("Bangalor", top_n=5)
for r in results:
    print(r.name, r.state, r.district, r.pincode, r.score)
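For readers unfamiliar with the SymSpell idea behind this kind of lookup, here is a minimal, self-contained sketch of delete-based fuzzy matching. It is an illustration of the general technique only and makes no claim about how `lookup()` is implemented; the function names and the three-name vocabulary are hypothetical.

```python
def deletes(word, max_distance):
    """Return word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def edit_distance(a, b):
    """Plain Levenshtein distance, used to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(names, max_distance=1):
    """Map every delete-variant of every name back to the original names."""
    index = {}
    for name in names:
        for variant in deletes(name.lower(), max_distance):
            index.setdefault(variant, set()).add(name)
    return index


def fuzzy_lookup(query, index, max_distance=1):
    """Return (distance, name) pairs within max_distance, closest first."""
    q = query.lower()
    candidates = set()
    for variant in deletes(q, max_distance):
        candidates |= index.get(variant, set())
    scored = ((edit_distance(q, name.lower()), name) for name in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["Bangalore", "Mangalore", "Mumbai"])
print(fuzzy_lookup("Bangalor", index))
```

The index is built once from deletions of the dictionary terms, so a query only needs to generate its own deletions and do hash lookups, rather than compare itself against every name.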
Extract Places from Text
from indic_places import IndicPlaces
ip = IndicPlaces()
text = "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
places = ip.extract_places(text)
for p in places:
    print(p.name, p.state, p.district, p.pincode)
Word Segmentation
from indic_places import IndicPlaces
ip = IndicPlaces()
result = ip.segment("iliveinmumbaiorkerala")
print(result.segmented)
print(result.score)
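To give a feel for what a segmentation with a score can mean, the sketch below picks the split with the highest sum of unigram log-probabilities using dynamic programming. It is a generic illustration, not the algorithm behind `segment()`; the function name `segment_scored` and the word counts are made up for the example.

```python
import math


def segment_scored(text, counts):
    """Return (segmented_text, log_probability) for the best vocabulary split."""
    total = sum(counts.values())
    max_len = max(map(len, counts))
    n = len(text)
    # best[i] = (log-probability, words) for the best split of text[:i]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in counts and best[start][0] > -math.inf:
                score = best[start][0] + math.log(counts[word] / total)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    score, words = best[n]
    if score == -math.inf:
        # No full split exists; hand the text back untouched.
        return text, score
    return " ".join(words), score


# Hypothetical word counts, for illustration only.
COUNTS = {"i": 50, "live": 20, "in": 60, "mumbai": 30,
          "mum": 5, "bai": 2, "or": 40, "kerala": 25}

segmented, score = segment_scored("iliveinmumbaiorkerala", COUNTS)
print(segmented)
print(score)
```

Because each extra word multiplies in another probability below 1, a single frequent word such as "mumbai" outscores the split "mum bai".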
CLI Usage
Show stats:
indic-places stats
Lookup:
indic-places lookup Bangalor
Segment text:
indic-places segment iliveinmumbaiorkerala
Extract places:
indic-places extract "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
Recommended Pattern for Large OCR Pipelines
Bad:
def clean_address(address):
    ip = IndicPlaces()
    return ip.normalize_address_spacing(address)
Good:
from indic_places import IndicPlaces
_PLACE_ENGINE = IndicPlaces()
def clean_address(address):
    return _PLACE_ENGINE.normalize_address_spacing(address)
The Good pattern loads the place index once at import time and reuses it for every call, which is better for large document batches.
Data Files
The package includes runtime data files:
indic_places/data/address_terms.txt
indic_places/data/custom_places.txt
indic_places/data/places_index.json.gz
The repository also contains supporting/reference data:
data/unique_place_names.txt
data/geonames_india_places_full.csv.gz
data/by_state_geonames/
Data Sources and Attribution
This package includes place-name vocabulary derived from open geographical datasets, including GeoNames India gazetteer and postal data.
GeoNames data is licensed under Creative Commons Attribution 4.0. Please credit GeoNames when using data derived from GeoNames.
Suggested attribution:
This product includes data derived from GeoNames (https://www.geonames.org/), licensed under CC BY 4.0.
The data is provided as-is and may contain spelling variants, alternate names, outdated entries, or OCR-specific aliases.
Privacy and Project Neutrality
This package is public and project-neutral.
It does not include private project names, private customer data, private document text, or proprietary extraction logic. Use it as a reusable Indian place-name and OCR address cleanup utility.
Troubleshooting
Old version still installing
python -m pip uninstall indic-places -y
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places
Check installed version
python -c "import importlib.metadata as m; print(m.version('indic-places'))"
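If the version check needs to run inside a script rather than from the shell, a small helper along these lines can be used. It relies only on the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is just a suggestion.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version if indic-places is installed, otherwise None.
    print(installed_version("indic-places"))
```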
Command not found
python -m indic_places.cli stats
Works locally but not after pip install
Make sure the package data files are included in the published wheel. Check the packaging configuration in:
MANIFEST.in
pyproject.toml
and confirm that these data files ship inside the package:
indic_places/data/custom_places.txt
indic_places/data/address_terms.txt
indic_places/data/places_index.json.gz
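One way to confirm that the data files actually arrived with an installed package is to look them up through `importlib.resources` (Python 3.9+). The helper below is a generic sketch; the name `missing_package_files` is hypothetical, and the relative paths in the commented call are taken from the file list above.

```python
from importlib import resources


def missing_package_files(package, relative_paths):
    """Return the relative paths that are not present inside the installed package."""
    root = resources.files(package)
    missing = []
    for rel in relative_paths:
        node = root
        # Walk one path segment at a time so this works for any resource backend.
        for part in rel.split("/"):
            node = node.joinpath(part)
        if not node.is_file():
            missing.append(rel)
    return missing


# Example call, assuming indic-places is installed:
# missing_package_files("indic_places", [
#     "data/address_terms.txt",
#     "data/custom_places.txt",
#     "data/places_index.json.gz",
# ])
```

An empty list means every file was found. Any path that comes back points to a file left out of the wheel.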
Source Code
GitHub repository:
https://github.com/Tinku746286/indic_names_library
For normal users, install from PyPI:
pip install --upgrade indic-places