Indian place name identifier with SymSpell-style fuzzy lookup and OCR word segmentation

Indic Places Library

Indian place-name lookup, fuzzy matching, OCR address spacing, and merged-word segmentation for Python.

This library is mainly built for Indian OCR/address extraction work, especially sanction-letter extraction pipelines where borrower addresses often come from scanned PDFs and OCR output.

Install from PyPI

Install the latest version:

pip install --upgrade indic-places

Clean reinstall without cache:

python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places

Install a specific version:

python -m pip install indic-places==1.1.4

Add to requirements.txt:

indic-places>=1.1.4

Import

The PyPI package name is:

indic-places

The Python import is:

from indic_places import IndicPlaces

Data Stats

Metric                                   Count
Structured GeoNames + postal records     815,477
Unique place names                       817,641
Runtime OCR/custom place aliases         652,331
Coverage                                 India-wide

Quick Usage

from indic_places import IndicPlaces

ip = IndicPlaces()

address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
print(ip.normalize_address_spacing(address))

Expected output:

PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586

What This Library Solves

OCR often returns Indian addresses like this:

PILASSERYADIVARAMPUTHUPPADIADIVARAM
KUNNUMPURATHHOUSEKALLARAP.O
THAMARASSERYKOZHIKODE

indic-places uses Indian place vocabulary and address terms to safely space merged OCR tokens.
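
To make the idea concrete, here is a minimal sketch of vocabulary-driven spacing. It is not the library's actual implementation: the function name segment_token and the five-word TOY_VOCABULARY are assumptions made for illustration only. The sketch splits a merged token only when every character is covered by known words, and otherwise returns the token untouched, which is one way to read "safely".

```python
def segment_token(token, vocabulary):
    """Split `token` into known words, or return it unchanged if no full split exists."""
    n = len(token)
    max_len = max((len(word) for word in vocabulary), default=0)
    # best[i] holds the fewest-word split of token[:i], or None if none exists.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = token[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    # Leave the token alone unless every character is covered by the vocabulary.
    return " ".join(best[n]) if best[n] else token


# Hypothetical toy vocabulary; the real library ships a much larger place index.
TOY_VOCABULARY = {"PILASSERY", "ADIVARAM", "PUTHUPPADI", "THAMARASSERY", "KOZHIKODE"}

print(segment_token("THAMARASSERYKOZHIKODE", TOY_VOCABULARY))
print(segment_token("PILASSERYADIVARAMPUTHUPPADIADIVARAM", TOY_VOCABULARY))
print(segment_token("UNKNOWNTOKEN", TOY_VOCABULARY))
```

The last call prints the input unchanged, because no combination of vocabulary words covers it.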

Main Features

  • Indian place-name lookup
  • OCR merged-address spacing
  • SymSpell-style fuzzy lookup
  • Word segmentation
  • Place extraction from text
  • India-wide GeoNames and postal vocabulary
  • Runtime OCR/custom place aliases from indic_places/data/custom_places.txt

Use in Your Project: SureCreditSanction Engine

SureCreditSanction Engine is a sanction-letter extraction system. It extracts fields such as borrower address, borrower name, branch name/code, regional office, sanction date/reference, period of limit, and validity of sanction.

In this project, use indic-places at the final borrower-address cleanup stage.

Install in SureCreditSanction Engine

cd "C:\Users\KUMAR TINKU\Downloads\SureCreditSanction-Engine\SureCreditSanction-Engine\sure_sanction"
.venv\Scripts\activate
python -m pip install --upgrade indic-places

Check installed version:

python -c "import importlib.metadata as m; print(m.version('indic-places'))"
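
The same check can be done from inside a script, which may be handy at pipeline start-up. This is a small sketch using only the standard library; the helper name installed_version is an assumption, not part of indic-places.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="indic-places"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("indic-places is not installed in this environment")
else:
    print(f"indic-places {found} is installed")
```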

Add Import in Extractor File

Add this near the top of the file where borrower address extraction is implemented:

# Optional dependency: the extractor keeps working if indic-places is unavailable.
try:
    from indic_places import IndicPlaces
except Exception:
    IndicPlaces = None

Add Lazy Loader

Do not create IndicPlaces() inside every function call. Load it once.

_INDIC_PLACE_ENGINE = None
_INDIC_PLACE_ENGINE_FAILED = False


def _get_indic_place_engine():
    global _INDIC_PLACE_ENGINE, _INDIC_PLACE_ENGINE_FAILED

    if _INDIC_PLACE_ENGINE_FAILED:
        return None

    if _INDIC_PLACE_ENGINE is None:
        if IndicPlaces is None:
            _INDIC_PLACE_ENGINE_FAILED = True
            return None

        try:
            _INDIC_PLACE_ENGINE = IndicPlaces()
        except Exception:
            _INDIC_PLACE_ENGINE_FAILED = True
            return None

    return _INDIC_PLACE_ENGINE


def _normalize_address_with_indic_places(address):
    s = str(address or "").strip()
    if not s:
        return ""

    engine = _get_indic_place_engine()
    if engine is None:
        return s

    try:
        fixed = engine.normalize_address_spacing(s)
        return " ".join(str(fixed or s).split())
    except Exception:
        return s
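
A more compact variant of the same load-once idea can be sketched with functools.lru_cache, which caches the first result, including a None returned after a failed import or failed construction. The function names get_place_engine and normalize_address are assumptions for this sketch; only IndicPlaces and normalize_address_spacing come from the library usage shown above.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_place_engine():
    """Build the engine on first use; the result is cached, even if it is None."""
    try:
        from indic_places import IndicPlaces
        return IndicPlaces()
    except Exception:
        # Missing package or failed data load: callers fall back to the raw text.
        return None


def normalize_address(address):
    """Return the spaced address, or the stripped input if the engine is unavailable."""
    text = str(address or "").strip()
    engine = get_place_engine()
    if not text or engine is None:
        return text
    try:
        fixed = engine.normalize_address_spacing(text)
        return " ".join(str(fixed or text).split())
    except Exception:
        return text
```

Compared with the explicit globals, this keeps less state in the module, at the cost of hiding the "failed" flag inside the cache.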

Use in _finalize_address

Call the library during final address cleanup:

def _finalize_address(s):
    if not s:
        return ""

    s = str(s).strip()
    s = _normalize_address_with_indic_places(s)

    return " ".join(s.split()).strip(" ,:-|")

Important for get_borrower_address()

If your get_borrower_address() function has early returns, call _finalize_address(cand) inside _evaluate_and_store() before storing best_address.

def _evaluate_and_store(cand, current_source):
    nonlocal best_address, source

    if not cand:
        return False

    cand = _clean_addr_value(cand)

    # Important: normalize before best_address is stored.
    cand = _finalize_address(cand)

    if _is_bad_borrower_address_text(cand) or len(cand.strip()) <= 8:
        return False

    if not best_address or len(cand) > len(best_address):
        best_address = cand
        source = current_source

    # True when the candidate contains a six-digit PIN code (needs `import re` at module level).
    return bool(re.search(r"\b\d{6}\b", cand))
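
The PIN-code check at the end of that function can be tried in isolation. The sketch below uses the same regular expression; the helper name has_pin_code and the sample strings are assumptions for illustration.

```python
import re

# Same pattern as in _evaluate_and_store: exactly six digits between word boundaries.
PIN_CODE = re.compile(r"\b\d{6}\b")


def has_pin_code(text):
    """Return True if `text` contains a standalone six-digit number."""
    return bool(PIN_CODE.search(text))


print(has_pin_code("THAMARASSERY KOZHIKODE - 673586"))  # True
print(has_pin_code("THAMARASSERY KOZHIKODE"))           # False
print(has_pin_code("MOBILE 9876543210"))                # False: ten digits, not six
print(has_pin_code("KOZHIKODE673586"))                  # False: no boundary before the digits
```

The last case shows that a PIN code fused to the preceding word is not detected by this pattern, so the check is only as good as the spacing of the text it receives.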

Example: Borrower Address Cleanup

Input:

borrower_address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"

Code:

from indic_places import IndicPlaces

ip = IndicPlaces()
borrower_address = ip.normalize_address_spacing(borrower_address)

print(borrower_address)

Output:

PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586

Lookup Places

from indic_places import IndicPlaces

ip = IndicPlaces()

results = ip.lookup("Bangalor", top_n=5)

for r in results:
    print(r.name, r.state, r.district, r.pincode, r.score)
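
For readers unfamiliar with the term, "SymSpell-style" lookup generally means indexing every name under the strings obtained by deleting a few characters, then generating the same deletions for the query and intersecting. The sketch below illustrates that general technique on a toy list. It is not the library's code, and deletes, build_index, fuzzy_candidates, and TOY_NAMES are hypothetical names.

```python
from itertools import combinations


def deletes(word, max_distance=1):
    """All strings obtained by deleting up to `max_distance` characters from `word`."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(names, max_distance=1):
    """Map every delete-variant of every name back to the original names."""
    index = {}
    for name in names:
        for variant in deletes(name.lower(), max_distance):
            index.setdefault(variant, set()).add(name)
    return index


def fuzzy_candidates(query, index, max_distance=1):
    """Names sharing at least one delete-variant with the query."""
    found = set()
    for variant in deletes(query.lower(), max_distance):
        found |= index.get(variant, set())
    return sorted(found)


TOY_NAMES = ["Bangalore", "Mangalore", "Kozhikode", "Thrissur"]
index = build_index(TOY_NAMES)
print(fuzzy_candidates("Bangalor", index))  # ['Bangalore']
```

A full implementation would normally verify each candidate with a true edit-distance computation and rank the survivors; this sketch stops at candidate generation.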

Extract Places from Text

from indic_places import IndicPlaces

ip = IndicPlaces()

text = "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
places = ip.extract_places(text)

for p in places:
    print(p.name, p.state, p.district, p.pincode)

Word Segmentation

from indic_places import IndicPlaces

ip = IndicPlaces()

result = ip.segment("iliveinmumbaiorkerala")
print(result.segmented)
print(result.score)

CLI Usage

Show stats:

indic-places stats

Lookup:

indic-places lookup Bangalor

Segment text:

indic-places segment iliveinmumbaiorkerala

Extract places:

indic-places extract "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"

Recommended Pattern for Large OCR Pipelines

Bad (rebuilds the engine on every call):

def clean_address(address):
    ip = IndicPlaces()
    return ip.normalize_address_spacing(address)

Good (builds the engine once at import time and reuses it):

from indic_places import IndicPlaces

_PLACE_ENGINE = IndicPlaces()

def clean_address(address):
    return _PLACE_ENGINE.normalize_address_spacing(address)

This loads the place index once and is better for processing thousands or lakhs of documents.
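
For batch jobs, one possible shape is a helper that receives the shared engine's method as a callable and never lets a single bad record stop the run. This is a sketch under assumptions: clean_addresses and toy_normalize are hypothetical names, and toy_normalize is only a stand-in for a shared engine's normalize_address_spacing so the example runs without the library.

```python
def clean_addresses(addresses, normalize):
    """Apply `normalize` to each address, keeping the original text on any failure."""
    cleaned = []
    for raw in addresses:
        text = str(raw or "").strip()
        try:
            fixed = normalize(text) if text else ""
        except Exception:
            fixed = text
        # Collapse repeated whitespace; fall back to the input if the result is empty.
        cleaned.append(" ".join(str(fixed or text).split()))
    return cleaned


def toy_normalize(text):
    # Stand-in for a shared engine's normalize_address_spacing method.
    if "BROKEN" in text:
        raise ValueError("simulated failure")
    return text.replace("THAMARASSERYKOZHIKODE", "THAMARASSERY KOZHIKODE")


batch = ["THAMARASSERYKOZHIKODE  - 673586", None, "BROKEN INPUT"]
print(clean_addresses(batch, toy_normalize))
```

In a real pipeline the second argument would be _PLACE_ENGINE.normalize_address_spacing, passed once for the whole batch.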

Troubleshooting

Old version still installing

python -m pip uninstall indic-places -y
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places

Check version

python -c "import importlib.metadata as m; print(m.version('indic-places'))"

Command not found

python -m indic_places.cli stats

Works locally but not after pip install

Make sure MANIFEST.in and pyproject.toml include these package data files in the built distribution:

indic_places/data/custom_places.txt
indic_places/data/address_terms.txt
indic_places/data/places_index.json.gz

Source Code

GitHub repository:

https://github.com/Tinku746286/indic_names_library

For normal users, install from PyPI:

pip install --upgrade indic-places
