Indian place name identifier with SymSpell-style fuzzy lookup and OCR word segmentation
Indic Places Library
Indian place-name lookup, fuzzy matching, OCR address spacing, and merged-word segmentation for Python.
This library is mainly built for Indian OCR/address extraction work, especially sanction-letter extraction pipelines where borrower addresses often come from scanned PDFs and OCR output.
Install from PyPI
Install the latest version:
pip install --upgrade indic-places
Clean reinstall without cache:
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places
Install a specific version:
python -m pip install indic-places==1.1.4
Add to requirements.txt:
indic-places>=1.1.4
Import
The PyPI package name is:
indic-places
The Python import is:
from indic_places import IndicPlaces
Data Stats
| Metric | Count |
|---|---|
| Structured GeoNames + postal records | 815,477 |
| Unique place names | 817,641 |
| Runtime OCR/custom place aliases | 652,331 |
| Coverage | India-wide |
Quick Usage
from indic_places import IndicPlaces
ip = IndicPlaces()
address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
print(ip.normalize_address_spacing(address))
Expected output:
PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586
What This Library Solves
OCR often returns Indian addresses like this:
PILASSERYADIVARAMPUTHUPPADIADIVARAM
KUNNUMPURATHHOUSEKALLARAP.O
THAMARASSERYKOZHIKODE
indic-places uses Indian place vocabulary and address terms to safely space merged OCR tokens.
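As an illustration of the general idea only (not the library's actual implementation), a merged token can be split by searching for a sequence of known words that covers it completely. The sketch below assumes a tiny hypothetical vocabulary `VOCAB` and hypothetical helper names; it leaves a token untouched when no full split exists, which is one way to keep the spacing conservative.

```python
from functools import lru_cache

# Hypothetical mini vocabulary; the real library ships a much larger index.
VOCAB = {"PILASSERY", "ADIVARAM", "PUTHUPPADI", "THAMARASSERY", "KOZHIKODE"}
MAX_WORD_LEN = max(len(w) for w in VOCAB)


def split_merged_token(token):
    """Return vocabulary words that cover `token` exactly, or None if no full split exists."""
    token = token.upper()

    @lru_cache(maxsize=None)
    def best_split(start):
        # An empty suffix is covered by zero words.
        if start == len(token):
            return ()
        # Try longer words first so full place names win over shorter prefixes.
        for end in range(min(len(token), start + MAX_WORD_LEN), start, -1):
            if token[start:end] in VOCAB:
                rest = best_split(end)
                if rest is not None:
                    return (token[start:end],) + rest
        return None

    parts = best_split(0)
    return list(parts) if parts is not None else None


def space_token(token):
    """Space a merged token only when it splits cleanly; otherwise return it unchanged."""
    parts = split_merged_token(token)
    return " ".join(parts) if parts else token


print(space_token("PILASSERYADIVARAMPUTHUPPADIADIVARAM"))
print(space_token("THAMARASSERYKOZHIKODE"))
print(space_token("KUNNUMPURATHHOUSE"))  # not fully covered, so returned as-is
```

Returning the input unchanged on a failed split means unknown words are never broken apart, at the cost of leaving some merged tokens unspaced.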
Main Features
- Indian place-name lookup
- OCR merged-address spacing
- SymSpell-style fuzzy lookup
- Word segmentation
- Place extraction from text
- India-wide GeoNames and postal vocabulary
- Runtime OCR/custom place aliases from indic_places/data/custom_places.txt
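For readers unfamiliar with the term, "SymSpell-style" lookup usually means a symmetric-delete index: every name is stored under all the strings obtained by deleting a few characters, and a query is matched through its own deletes. The sketch below is a minimal, generic version of that technique with hypothetical function names and a four-name sample list; it does not claim to mirror how indic-places builds or scores its index.

```python
from collections import defaultdict

MAX_EDIT_DISTANCE = 2


def deletes(word, max_distance=MAX_EDIT_DISTANCE):
    """All strings reachable from `word` by deleting up to `max_distance` characters."""
    seen = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        seen |= frontier
    return seen


def edit_distance(a, b):
    """Plain Levenshtein distance, used to verify candidates from the delete index."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_index(names):
    """Map every delete-variant of each name back to the original name."""
    index = defaultdict(set)
    for name in names:
        for key in deletes(name.lower()):
            index[key].add(name)
    return index


def fuzzy_lookup(index, query, top_n=5):
    """Return up to `top_n` names within MAX_EDIT_DISTANCE of `query`, closest first."""
    query = query.lower()
    candidates = set()
    for key in deletes(query):
        candidates |= index.get(key, set())
    scored = [(edit_distance(query, c.lower()), c) for c in candidates]
    scored = [(d, c) for d, c in scored if d <= MAX_EDIT_DISTANCE]
    return [c for _, c in sorted(scored)][:top_n]


index = build_index(["Bangalore", "Mangalore", "Bengaluru", "Kozhikode"])
print(fuzzy_lookup(index, "Bangalor"))  # ['Bangalore', 'Mangalore']
```

The delete index trades memory for speed: lookups touch only a handful of dictionary keys instead of comparing the query against every name.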
Use in Your Project: SureCreditSanction Engine
Your project is a sanction-letter extraction system. It extracts fields such as borrower address, borrower name, branch name/code, regional office, sanction date/reference, period of limit, and validity of sanction.
For this project, indic-places should be used at the final borrower-address cleanup stage.
Install in SureCreditSanction Engine
cd "C:\Users\KUMAR TINKU\Downloads\SureCreditSanction-Engine\SureCreditSanction-Engine\sure_sanction"
.venv\Scripts\activate
python -m pip install --upgrade indic-places
Check installed version:
python -c "import importlib.metadata as m; print(m.version('indic-places'))"
Add Import in Extractor File
Add this near the top of the file where borrower address extraction is implemented:
try:
    from indic_places import IndicPlaces
except Exception:
    # Fall back gracefully when the package is missing or fails to import.
    IndicPlaces = None
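If you only need to know whether the package is available, without importing it yet, the standard library's `importlib.util.find_spec` is an alternative to the try/except guard. This is a small sketch; `is_installed` and `HAS_INDIC_PLACES` are hypothetical names, and the check is meant for top-level module names only.

```python
import importlib.util


def is_installed(module_name):
    """Return True when a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


HAS_INDIC_PLACES = is_installed("indic_places")
print("indic_places available:", HAS_INDIC_PLACES)
```

Unlike the try/except guard, this check does not catch errors raised while the package itself is being imported, so the guard remains the safer choice inside the extractor.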
Add Lazy Loader
Do not create IndicPlaces() inside every function call. Load it once.
_INDIC_PLACE_ENGINE = None
_INDIC_PLACE_ENGINE_FAILED = False


def _get_indic_place_engine():
    """Create the IndicPlaces engine once and reuse it on later calls."""
    global _INDIC_PLACE_ENGINE, _INDIC_PLACE_ENGINE_FAILED
    if _INDIC_PLACE_ENGINE_FAILED:
        return None
    if _INDIC_PLACE_ENGINE is None:
        if IndicPlaces is None:
            _INDIC_PLACE_ENGINE_FAILED = True
            return None
        try:
            _INDIC_PLACE_ENGINE = IndicPlaces()
        except Exception:
            # Remember the failure so loading is not retried for every address.
            _INDIC_PLACE_ENGINE_FAILED = True
            return None
    return _INDIC_PLACE_ENGINE


def _normalize_address_with_indic_places(address):
    """Return the address with merged tokens spaced, or the input if the engine is unavailable."""
    s = str(address or "").strip()
    if not s:
        return ""
    engine = _get_indic_place_engine()
    if engine is None:
        return s
    try:
        fixed = engine.normalize_address_spacing(s)
        # Collapse any repeated whitespace in the result.
        return " ".join(str(fixed or s).split())
    except Exception:
        return s
Use in _finalize_address
Call the library during final address cleanup:
def _finalize_address(s):
    if not s:
        return ""
    s = str(s).strip()
    s = _normalize_address_with_indic_places(s)
    return " ".join(s.split()).strip(" ,:-|")
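The last line of `_finalize_address` does two things: it collapses runs of whitespace and trims stray separator characters from both ends. The standalone sketch below isolates just that step so the effect is easy to see; `tidy_address` is a hypothetical name and does not call the library.

```python
def tidy_address(text):
    """Collapse whitespace runs, then trim stray separators from both ends."""
    collapsed = " ".join(str(text or "").split())
    return collapsed.strip(" ,:-|")


print(tidy_address("  , KOZHIKODE  -  673586 |"))  # KOZHIKODE - 673586
```

Note that `strip` only removes characters at the ends, so the hyphen between the town and the pincode is preserved.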
Important for get_borrower_address()
If your get_borrower_address() function has early returns, call _finalize_address(cand) inside _evaluate_and_store() before storing best_address.
# Nested inside get_borrower_address(); needs `import re` at the top of the module.
def _evaluate_and_store(cand, current_source):
    nonlocal best_address, source
    if not cand:
        return False
    cand = _clean_addr_value(cand)
    # Important: normalize before best_address is stored.
    cand = _finalize_address(cand)
    if _is_bad_borrower_address_text(cand) or len(cand.strip()) <= 8:
        return False
    # Keep the longest acceptable candidate seen so far.
    if not best_address or len(cand) > len(best_address):
        best_address = cand
        source = current_source
    # True when the candidate contains a standalone 6-digit pincode.
    return bool(re.search(r"\b\d{6}\b", cand))
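The return value above relies on the pattern `\b\d{6}\b`, which matches exactly six digits standing alone as a word. A short sketch of how that pattern behaves on a few inputs (`has_pincode` is a hypothetical helper name):

```python
import re

PINCODE_RE = re.compile(r"\b\d{6}\b")


def has_pincode(text):
    """True when `text` contains a standalone run of exactly six digits."""
    return bool(PINCODE_RE.search(text))


print(has_pincode("THAMARASSERY KOZHIKODE - 673586"))  # True
print(has_pincode("THAMARASSERY KOZHIKODE673586"))     # False: no word boundary between E and 6
print(has_pincode("ACCOUNT 1234567"))                  # False: seven digits, not six
```

Because letters and digits are both word characters, a pincode glued to the preceding word is not detected by this pattern.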
Example: Borrower Address Cleanup
Input:
borrower_address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
Code:
from indic_places import IndicPlaces
ip = IndicPlaces()
borrower_address = ip.normalize_address_spacing(borrower_address)
print(borrower_address)
Output:
PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586
Lookup Places
from indic_places import IndicPlaces
ip = IndicPlaces()
results = ip.lookup("Bangalor", top_n=5)
for r in results:
    print(r.name, r.state, r.district, r.pincode, r.score)
Extract Places from Text
from indic_places import IndicPlaces
ip = IndicPlaces()
text = "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
places = ip.extract_places(text)
for p in places:
    print(p.name, p.state, p.district, p.pincode)
Word Segmentation
from indic_places import IndicPlaces
ip = IndicPlaces()
result = ip.segment("iliveinmumbaiorkerala")
print(result.segmented)
print(result.score)
CLI Usage
Show stats:
indic-places stats
Lookup:
indic-places lookup Bangalor
Segment text:
indic-places segment iliveinmumbaiorkerala
Extract places:
indic-places extract "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
Recommended Pattern for Large OCR Pipelines
Bad:
def clean_address(address):
    # Rebuilds the engine on every call, which is slow.
    ip = IndicPlaces()
    return ip.normalize_address_spacing(address)
Good:
from indic_places import IndicPlaces
_PLACE_ENGINE = IndicPlaces()
def clean_address(address):
    return _PLACE_ENGINE.normalize_address_spacing(address)
This loads the place index once and is better for processing thousands or lakhs of documents.
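A variation on the same idea is to defer construction until the first address is actually processed, so importing the module stays cheap. The sketch below uses `functools.lru_cache` for this and a hypothetical `StandInEngine` class in place of `IndicPlaces`, purely so the "built once" behaviour can be observed without the real index.

```python
from functools import lru_cache


class StandInEngine:
    """Hypothetical stand-in for an expensive engine; counts how often it is built."""

    instances = 0

    def __init__(self):
        StandInEngine.instances += 1

    def normalize_address_spacing(self, address):
        # Placeholder behaviour: only collapses whitespace.
        return " ".join(address.split())


@lru_cache(maxsize=1)
def get_engine():
    # Runs the constructor on the first call only; later calls reuse the cached object.
    return StandInEngine()


def clean_address(address):
    return get_engine().normalize_address_spacing(address)


for raw in ["A   B", "C  D", "E F"]:
    clean_address(raw)

print(StandInEngine.instances)  # 1
```

In a real pipeline, `get_engine` would return `IndicPlaces()` instead of the stand-in.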
Troubleshooting
Old version still installing
python -m pip uninstall indic-places -y
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places
Check version
python -c "import importlib.metadata as m; print(m.version('indic-places'))"
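The same check can be done from inside a script, with a clear result when the package is not installed at all. This is a small sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None when the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("indic-places"))
```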
Command not found
python -m indic_places.cli stats
Works locally but not after pip install
Make sure MANIFEST.in and pyproject.toml include the package data files in the build. Check the following:
MANIFEST.in
pyproject.toml
indic_places/data/custom_places.txt
indic_places/data/address_terms.txt
indic_places/data/places_index.json.gz
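One way to confirm that the data files actually shipped with the installed package is to look them up through `importlib.resources` (Python 3.9+). The sketch below assumes the three data paths listed above, taken relative to the `indic_places` package; `missing_package_files` is a hypothetical helper name.

```python
from importlib import resources


def missing_package_files(package, relative_paths):
    """Return the entries of `relative_paths` that are not files inside `package`."""
    try:
        root = resources.files(package)
    except ModuleNotFoundError:
        # Package is not importable at all, so every file counts as missing.
        return list(relative_paths)
    missing = []
    for rel in relative_paths:
        node = root
        # Walk one path segment at a time.
        for part in rel.split("/"):
            node = node.joinpath(part)
        if not node.is_file():
            missing.append(rel)
    return missing


print(missing_package_files(
    "indic_places",
    ["data/custom_places.txt", "data/address_terms.txt", "data/places_index.json.gz"],
))
```

An empty list means all three files were found in the installed package.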
Source Code
GitHub repository:
https://github.com/Tinku746286/indic_names_library
For normal users, install from PyPI:
pip install --upgrade indic-places
File details
Details for the file indic_places-1.1.4.tar.gz.
File metadata
- Download URL: indic_places-1.1.4.tar.gz
- Upload date:
- Size: 5.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd42c7865eb76ed29fb16bd6f05d31a129ff92aa8b3ef6a76ca1a36cd783199e |
| MD5 | 30a9d0b6c6ca4bcdbe3a986b58127402 |
| BLAKE2b-256 | 034cf12f4ef1b7e0244fbdc9477b04e7b63e2a3b369565f8b4d89d3b6b1df9da |
File details
Details for the file indic_places-1.1.4-py3-none-any.whl.
File metadata
- Download URL: indic_places-1.1.4-py3-none-any.whl
- Upload date:
- Size: 5.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65a0f629893ff7f637490dbad1855f9f1a52d35df3a19989da54526ff01b7af3 |
| MD5 | 26c94dc70e22265543030830470c2a29 |
| BLAKE2b-256 | 3193b098d0532fa0e7598b638fdf96979c67cd7ab8d4f03e7c4724003c5130c8 |