Indian place name identifier with SymSpell-style fuzzy lookup and OCR word segmentation

Indic Places Library

Indian place-name lookup, fuzzy matching, OCR address spacing, and merged-word segmentation for Python.

This package is designed for public use in Indian address processing, OCR cleanup, place-name identification, and document extraction workflows.

Install from PyPI

Install the latest version:

pip install --upgrade indic-places

Clean reinstall without cache:

python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places

Install a specific version:

python -m pip install indic-places==1.1.5

Add to requirements.txt:

indic-places>=1.1.5

Import

The PyPI package name is:

indic-places

The Python import is:

from indic_places import IndicPlaces

Data Stats

Metric                                  Count
Structured GeoNames + postal records    815,477
Unique place names                      817,641
Runtime OCR/custom place aliases        652,331
Coverage                                India-wide

Quick Usage

from indic_places import IndicPlaces

ip = IndicPlaces()

address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
print(ip.normalize_address_spacing(address))

Expected output:

PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586

What This Library Solves

OCR often returns Indian addresses as merged text:

PILASSERYADIVARAMPUTHUPPADIADIVARAM
KUNNUMPURATHHOUSEKALLARAP.O
THAMARASSERYKOZHIKODE

indic-places uses a vocabulary of Indian place names and common address terms to split merged OCR tokens at word boundaries.
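
To illustrate the general idea of vocabulary-driven spacing, here is a minimal sketch. It is not the package's implementation: the tiny VOCAB set and the split_merged helper are hypothetical, and the real library may rank or score candidates differently. The sketch only splits a token when every character is covered by a known word, and otherwise leaves it untouched.

```python
from functools import lru_cache

# Hypothetical mini-vocabulary for illustration; the real package ships a far larger index.
VOCAB = {"PILASSERY", "ADIVARAM", "PUTHUPPADI", "THAMARASSERY", "KOZHIKODE"}
MAX_LEN = max(len(w) for w in VOCAB)


def split_merged(token: str) -> str:
    """Split a merged token into vocabulary words, or return it unchanged."""
    upper = token.upper()

    @lru_cache(maxsize=None)
    def solve(start: int):
        # Returns a tuple of words covering upper[start:], or None if impossible.
        if start == len(upper):
            return ()
        # Try the longest candidate first, then progressively shorter ones.
        for end in range(min(len(upper), start + MAX_LEN), start, -1):
            word = upper[start:end]
            if word in VOCAB:
                rest = solve(end)
                if rest is not None:
                    return (word,) + rest
        return None

    words = solve(0)
    # Conservative behaviour: only split when the whole token is explained.
    return " ".join(words) if words is not None else token


print(split_merged("PILASSERYADIVARAMPUTHUPPADIADIVARAM"))
print(split_merged("THAMARASSERYKOZHIKODE"))
print(split_merged("UNKNOWNTOKEN"))
```

Leaving unexplained tokens alone is a deliberate choice in this sketch: a wrong split in an address is usually harder to detect downstream than a missing one.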

Main Features

  • Indian place-name lookup
  • OCR merged-address spacing
  • SymSpell-style fuzzy lookup
  • Word segmentation
  • Place extraction from text
  • India-wide GeoNames and postal vocabulary
  • Runtime OCR/custom place aliases from indic_places/data/custom_places.txt

Use in Any Python Project

1. Install

python -m pip install --upgrade indic-places

2. Create the Engine Once

from indic_places import IndicPlaces

place_engine = IndicPlaces()

3. Normalize an OCR Address

raw_address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"

clean_address = place_engine.normalize_address_spacing(raw_address)

print(clean_address)

Output:

PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586

Recommended Integration Pattern for OCR Pipelines

Use the library at the final address-cleanup stage, after your extraction logic has already identified the address candidate.

from indic_places import IndicPlaces

_PLACE_ENGINE = None


def get_place_engine():
    global _PLACE_ENGINE

    if _PLACE_ENGINE is None:
        _PLACE_ENGINE = IndicPlaces()

    return _PLACE_ENGINE


def normalize_address(address: str) -> str:
    address = " ".join(str(address or "").split()).strip(" ,:-|")

    if not address:
        return ""

    engine = get_place_engine()
    return engine.normalize_address_spacing(address)
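
The same build-on-first-use pattern can be written without a global statement by using functools.lru_cache. The sketch below is self-contained, so it uses a hypothetical StubEngine class as a stand-in for IndicPlaces; in a real project the cached function would return IndicPlaces() instead.

```python
from functools import lru_cache


class StubEngine:
    """Hypothetical stand-in for an expensive-to-build engine such as IndicPlaces()."""

    instances = 0

    def __init__(self) -> None:
        StubEngine.instances += 1  # count constructions for the demonstration


@lru_cache(maxsize=1)
def get_engine() -> StubEngine:
    # The body runs on the first call; later calls return the cached object.
    return StubEngine()


first = get_engine()
second = get_engine()
print(first is second, StubEngine.instances)
```

Both calls return the same object and the constructor runs once. If several threads may call get_engine() at the same moment during startup, consider building the engine eagerly at import time instead.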

Use with an Existing Extractor Function

If your project already has a final address cleanup function, call normalize_address_spacing() inside it.

from indic_places import IndicPlaces

_PLACE_ENGINE = IndicPlaces()


def finalize_address(address: str) -> str:
    address = " ".join(str(address or "").split()).strip(" ,:-|")

    if not address:
        return ""

    address = _PLACE_ENGINE.normalize_address_spacing(address)

    return " ".join(address.split()).strip(" ,:-|")

If your extraction function stores the best address candidate before returning, normalize before storing the final value.

def evaluate_and_store_address(candidate: str):
    candidate = finalize_address(candidate)

    if not candidate:
        return False

    # Store candidate in your output dictionary/model.
    return True

Lookup Places

from indic_places import IndicPlaces

ip = IndicPlaces()

results = ip.lookup("Bangalor", top_n=5)

for r in results:
    print(r.name, r.state, r.district, r.pincode, r.score)
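
For readers unfamiliar with the term, "SymSpell-style" lookup generally refers to the symmetric-delete technique: index every name under all variants obtained by deleting a few characters, then apply the same deletions to the query and look for shared variants. The sketch below shows that idea only. The function names and sample names are hypothetical, and it is not the package's internal code; a full implementation would also rank the candidates by edit distance.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_deletes: int = 2) -> set[str]:
    """Return the word plus every variant with up to max_deletes characters removed."""
    variants = {word}
    for k in range(1, min(max_deletes, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(names: list[str], max_deletes: int = 2) -> dict[str, set[str]]:
    """Map every delete-variant to the names that can produce it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name in names:
        for variant in deletes(name.lower(), max_deletes):
            index[variant].add(name)
    return index


def candidates(query: str, index: dict[str, set[str]], max_deletes: int = 2) -> set[str]:
    """Collect names that share at least one delete-variant with the query."""
    found: set[str] = set()
    for variant in deletes(query.lower(), max_deletes):
        found |= index.get(variant, set())
    return found


# Hypothetical sample names, not the package's data.
index = build_index(["Bangalore", "Mangalore", "Bengaluru", "Kozhikode"])
print(sorted(candidates("Bangalor", index)))
```

The appeal of this design is that lookups need only set membership checks on precomputed variants, rather than an edit-distance comparison against every name in the vocabulary.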

Extract Places from Text

from indic_places import IndicPlaces

ip = IndicPlaces()

text = "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
places = ip.extract_places(text)

for p in places:
    print(p.name, p.state, p.district, p.pincode)

Word Segmentation

from indic_places import IndicPlaces

ip = IndicPlaces()

result = ip.segment("iliveinmumbaiorkerala")
print(result.segmented)
print(result.score)

CLI Usage

Show stats:

indic-places stats

Lookup:

indic-places lookup Bangalor

Segment text:

indic-places segment iliveinmumbaiorkerala

Extract places:

indic-places extract "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"

Recommended Pattern for Large OCR Pipelines

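Bad (builds a new engine, and reloads the place index, on every call):

from indic_places import IndicPlaces

def clean_address(address):
    ip = IndicPlaces()
    return ip.normalize_address_spacing(address)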

Good (builds the engine once at import time and reuses it):

from indic_places import IndicPlaces

_PLACE_ENGINE = IndicPlaces()

def clean_address(address):
    return _PLACE_ENGINE.normalize_address_spacing(address)

This loads the place index once and is better for large document batches.

Data Files

The package includes runtime data files:

indic_places/data/address_terms.txt
indic_places/data/custom_places.txt
indic_places/data/places_index.json.gz

The repository also contains supporting/reference data:

data/unique_place_names.txt
data/geonames_india_places_full.csv.gz
data/by_state_geonames/

Data Sources and Attribution

This package includes place-name vocabulary derived from open geographical datasets, including GeoNames India gazetteer and postal data.

GeoNames data is licensed under Creative Commons Attribution 4.0. Please credit GeoNames when using data derived from GeoNames.

Suggested attribution:

This product includes data derived from GeoNames (https://www.geonames.org/), licensed under CC BY 4.0.

The data is provided as-is and may contain spelling variants, alternate names, outdated entries, or OCR-specific aliases.

Privacy and Project Neutrality

This package is public and project-neutral.

It does not include private project names, private customer data, private document text, or proprietary extraction logic. Use it as a reusable Indian place-name and OCR address cleanup utility.

Troubleshooting

Old version still installing

python -m pip uninstall indic-places -y
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places

Check installed version

python -c "import importlib.metadata as m; print(m.version('indic-places'))"
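
The same check can be done from inside a script, which is handy when logging the environment of an OCR job. This is a small sketch using the standard library's importlib.metadata; the installed_version helper name is just an illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed.
print(installed_version("indic-places"))
```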

Command not found

If the indic-places command is not on your PATH, run the CLI as a module instead:

python -m indic_places.cli stats

Works locally but not after pip install

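Make sure the package data files are included in the published wheel. Check the packaging configuration in:

MANIFEST.in
pyproject.toml

and confirm that these files are present in the built wheel:

indic_places/data/custom_places.txt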
indic_places/data/address_terms.txt
indic_places/data/places_index.json.gz
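
One way to confirm this from the installed environment is to ask importlib.resources whether the files exist inside the installed package. The helper below is a hedged sketch: missing_package_files is a hypothetical name, and the relative paths are taken from the list above on the assumption that they sit under the indic_places package directory.

```python
from importlib.resources import files


def missing_package_files(package: str, relative_paths: list[str]) -> list[str]:
    """Return the relative paths that are not present inside an installed package."""
    root = files(package)
    return [p for p in relative_paths if not root.joinpath(p).is_file()]


# Paths relative to the package root, assumed from the data file list above.
EXPECTED = [
    "data/address_terms.txt",
    "data/custom_places.txt",
    "data/places_index.json.gz",
]

try:
    # An empty list means every expected data file was found.
    print(missing_package_files("indic_places", EXPECTED))
except ModuleNotFoundError:
    print("indic_places is not installed in this environment")
```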

Source Code

GitHub repository:

https://github.com/Tinku746286/indic_names_library

For normal users, install from PyPI:

pip install --upgrade indic-places
