Indian place name identifier with SymSpell-style fuzzy lookup and OCR word segmentation

Indic Places Library

Indian place-name lookup, fuzzy matching, OCR address spacing, and merged-word segmentation for Python.

It is a general-purpose package for Indian address processing, OCR cleanup, place-name identification, and document extraction workflows.

Install from PyPI

Install the latest version:

pip install --upgrade indic-places

Clean reinstall without cache:

python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places

Install a specific version:

python -m pip install indic-places==1.1.5

Add to requirements.txt:

indic-places>=1.1.5

Import

The PyPI distribution name uses a hyphen:

indic-places

The Python import name uses an underscore:

from indic_places import IndicPlaces

Data Stats

Metric	Count
Structured GeoNames + postal records	815,477
Unique place names	817,641
Runtime OCR/custom place aliases	652,331
Coverage	India-wide

Quick Usage

from indic_places import IndicPlaces

ip = IndicPlaces()

address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"
print(ip.normalize_address_spacing(address))

Expected output:

PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586

What This Library Solves

OCR often returns Indian addresses as merged text:

PILASSERYADIVARAMPUTHUPPADIADIVARAM
KUNNUMPURATHHOUSEKALLARAP.O
THAMARASSERYKOZHIKODE

indic-places uses an Indian place-name vocabulary and a list of address terms to insert the missing spaces into merged OCR tokens without splitting known names.

Main Features

Indian place-name lookup
OCR merged-address spacing
SymSpell-style fuzzy lookup
Word segmentation
Place extraction from text
India-wide GeoNames and postal vocabulary
Runtime OCR/custom place aliases from indic_places/data/custom_places.txt

Use in Any Python Project

1. Install

python -m pip install --upgrade indic-places

2. Create the Engine Once

from indic_places import IndicPlaces

place_engine = IndicPlaces()

3. Normalize an OCR Address

raw_address = "PILASSERYADIVARAMPUTHUPPADIADIVARAM PUDUPADIKATTIPARAADIVARAM THAMARASSERYKOZHIKODE - 673586"

clean_address = place_engine.normalize_address_spacing(raw_address)

print(clean_address)

Output:

PILASSERY ADIVARAM PUTHUPPADI ADIVARAM PUDUPADI KATTIPARA ADIVARAM THAMARASSERY KOZHIKODE - 673586

Recommended Integration Pattern for OCR Pipelines

Use the library at the final address-cleanup stage, after your extraction logic has already identified the address candidate.

from indic_places import IndicPlaces

_PLACE_ENGINE = None


def get_place_engine():
    global _PLACE_ENGINE

    if _PLACE_ENGINE is None:
        _PLACE_ENGINE = IndicPlaces()

    return _PLACE_ENGINE


def normalize_address(address: str) -> str:
    address = " ".join(str(address or "").split()).strip(" ,:-|")

    if not address:
        return ""

    engine = get_place_engine()
    return engine.normalize_address_spacing(address)
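The pre-cleaning expression in normalize_address() does two things before the engine is called: it collapses every run of whitespace into a single space, and it trims stray separator characters from both ends. The sketch below isolates that one line so its behaviour can be checked without the library installed. The helper name pre_clean is hypothetical and is not part of indic-places.

```python
def pre_clean(address) -> str:
    """Collapse whitespace runs and trim stray separators from both ends."""
    # str(address or "") turns None (or any other falsy value) into "".
    collapsed = " ".join(str(address or "").split())
    # strip() treats its argument as a set of characters, not a substring,
    # so any mix of spaces, commas, colons, hyphens and pipes is trimmed.
    return collapsed.strip(" ,:-|")


print(repr(pre_clean("  KOZHIKODE   -  673586 , ")))  # 'KOZHIKODE - 673586'
print(repr(pre_clean(" | , ")))                       # ''
print(repr(pre_clean(None)))                          # ''
```

Only the ends are trimmed, so the hyphen between the place name and the PIN code survives.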

Use with an Existing Extractor Function

If your project already has a final address-cleanup function, call normalize_address_spacing() there and re-trim the result afterwards.

from indic_places import IndicPlaces

_PLACE_ENGINE = IndicPlaces()


def finalize_address(address: str) -> str:
    address = " ".join(str(address or "").split()).strip(" ,:-|")

    if not address:
        return ""

    address = _PLACE_ENGINE.normalize_address_spacing(address)

    return " ".join(address.split()).strip(" ,:-|")

If your extraction function stores the best address candidate before returning, normalize before storing the final value.

def evaluate_and_store_address(candidate: str):
    candidate = finalize_address(candidate)

    if not candidate:
        return False

    # Store candidate in your output dictionary/model.
    return True

Lookup Places

from indic_places import IndicPlaces

ip = IndicPlaces()

results = ip.lookup("Bangalor", top_n=5)

for r in results:
    print(r.name, r.state, r.district, r.pincode, r.score)
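For readers unfamiliar with the term, "SymSpell-style" lookup generally means a symmetric-delete index: every dictionary word is stored under all of its delete variants, and a query is matched by generating its own delete variants and intersecting. The sketch below illustrates that general technique on a toy word list. It is not the library's implementation, and the names DeleteIndex, deletes, and edit_distance are hypothetical.

```python
from collections import defaultdict


def deletes(word: str, max_distance: int) -> set[str]:
    """Return word plus every string obtained by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class DeleteIndex:
    """Toy symmetric-delete index over a list of names (illustration only)."""

    def __init__(self, names, max_distance: int = 2):
        self.max_distance = max_distance
        self.index = defaultdict(set)
        for name in names:
            key = name.lower()
            # Store the name under each of its delete variants.
            for variant in deletes(key, max_distance):
                self.index[variant].add(key)

    def lookup(self, query: str, top_n: int = 5):
        query = query.lower()
        candidates = set()
        # Only deletes are generated for the query; no inserts or substitutions.
        for variant in deletes(query, self.max_distance):
            candidates |= self.index.get(variant, set())
        # Verify each candidate with a true edit distance, then rank.
        scored = [(edit_distance(query, c), c) for c in candidates]
        return sorted(pair for pair in scored if pair[0] <= self.max_distance)[:top_n]


index = DeleteIndex(["Bangalore", "Mangalore", "Bengaluru", "Kozhikode"])
print(index.lookup("Bangalor"))  # [(1, 'bangalore'), (2, 'mangalore')]
```

The trade-off is a larger index in exchange for lookups that never enumerate inserts or substitutions, which is what keeps this family of methods fast on large vocabularies.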

Extract Places from Text

from indic_places import IndicPlaces

ip = IndicPlaces()

text = "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"
places = ip.extract_places(text)

for p in places:
    print(p.name, p.state, p.district, p.pincode)
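As a rough mental model of place extraction, the sketch below scans a string for tokens found in a known vocabulary and for six-digit PIN codes. It is a simplified stand-in under stated assumptions, not how extract_places() is implemented: the function name extract_candidates and the two-entry vocabulary are hypothetical, and exact token matching is far cruder than what a real index would do.

```python
import re

# Indian PIN codes are six digits and never start with 0.
PINCODE_RE = re.compile(r"\b[1-9][0-9]{5}\b")
WORD_RE = re.compile(r"[A-Za-z]+")


def extract_candidates(text: str, vocabulary: set[str]):
    """Return (known place tokens, PIN codes) found in text, in order of appearance."""
    places = [w for w in WORD_RE.findall(text) if w.upper() in vocabulary]
    pincodes = PINCODE_RE.findall(text)
    return places, pincodes


toy_vocabulary = {"PERAMBRA", "THRISSUR"}  # hypothetical two-entry vocabulary
print(extract_candidates("PONMINISSERY HOUSE PERAMBRA THRISSUR 680689", toy_vocabulary))
# (['PERAMBRA', 'THRISSUR'], ['680689'])
```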

Word Segmentation

from indic_places import IndicPlaces

ip = IndicPlaces()

result = ip.segment("iliveinmumbaiorkerala")
print(result.segmented)
print(result.score)
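Merged-word segmentation is commonly solved with dynamic programming over a vocabulary: for each prefix of the input, keep the cheapest way to split it into known words. The sketch below shows that idea with a toy vocabulary and a fixed penalty for characters that belong to no known word. It is an illustration of the general technique only; the function name segment_with_vocabulary, the cost scheme, and the vocabulary are assumptions, not the library's algorithm.

```python
def segment_with_vocabulary(text: str, vocabulary: set[str], unknown_cost: int = 10) -> list[str]:
    """Split text into vocabulary words, minimising a simple cost."""
    n = len(text)
    max_len = max(map(len, vocabulary), default=1)
    # best[i] = (cost, start of last piece) for the cheapest split of text[:i].
    best = [(0, 0)] + [(float("inf"), 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in vocabulary:
                cost = best[start][0] + 1             # one unit per known word
            elif end - start == 1:
                cost = best[start][0] + unknown_cost  # penalty per unknown character
            else:
                continue
            if cost < best[end][0]:
                best[end] = (cost, start)
    # Walk the stored split points backwards to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(text[start:end])
        end = start
    pieces.reverse()
    # Glue runs of unknown single characters back into one token.
    merged: list[str] = []
    for piece in pieces:
        if merged and piece not in vocabulary and merged[-1] not in vocabulary:
            merged[-1] += piece
        else:
            merged.append(piece)
    return merged


toy_vocabulary = {"PILASSERY", "ADIVARAM", "PUTHUPPADI", "THAMARASSERY", "KOZHIKODE"}
print(" ".join(segment_with_vocabulary("THAMARASSERYKOZHIKODE", toy_vocabulary)))
# THAMARASSERY KOZHIKODE
```

Text that matches nothing in the vocabulary is passed through as a single unknown token rather than being split arbitrarily.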

CLI Usage

Show stats:

indic-places stats

Lookup:

indic-places lookup Bangalor

Segment text:

indic-places segment iliveinmumbaiorkerala

Extract places:

indic-places extract "PONMINISSERY HOUSE PERAMBRA THRISSUR 680689"

Recommended Pattern for Large OCR Pipelines

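Bad (constructs a new engine, and so reloads the place index, on every call):

from indic_places import IndicPlaces

def clean_address(address):
    # A fresh IndicPlaces() is built for each address.
    ip = IndicPlaces()
    return ip.normalize_address_spacing(address)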

Good (constructs the engine once at import time and reuses it):

from indic_places import IndicPlaces

_PLACE_ENGINE = IndicPlaces()

def clean_address(address):
    return _PLACE_ENGINE.normalize_address_spacing(address)

Sharing one engine means the place index is loaded once, so the loading cost is not repeated for every document in a large batch.
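If building the engine at import time is undesirable (for example, it slows down modules that never clean an address), functools.lru_cache gives a lazy variant of the same build-once pattern. The sketch below uses a hypothetical stand-in class, SlowEngine, so that it runs without the library installed. In real code the factory body would return IndicPlaces() instead.

```python
from functools import lru_cache


class SlowEngine:
    """Hypothetical stand-in for an engine that is expensive to construct."""

    instances = 0  # counts how many times the constructor has run

    def __init__(self):
        SlowEngine.instances += 1

    def normalize_address_spacing(self, address: str) -> str:
        return address  # placeholder: a real engine would insert spaces here


@lru_cache(maxsize=1)
def get_engine() -> SlowEngine:
    # Runs on the first call only; later calls return the cached instance.
    return SlowEngine()


def clean_address(address: str) -> str:
    return get_engine().normalize_address_spacing(address)


for raw in ["A", "B", "C"]:
    clean_address(raw)
print(SlowEngine.instances)  # 1
```

One caveat: lru_cache does not serialise the first call, so if several threads race on it the factory may run more than once. Calling get_engine() once before starting worker threads avoids that.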

Data Files

The package includes runtime data files:

indic_places/data/address_terms.txt
indic_places/data/custom_places.txt
indic_places/data/places_index.json.gz

The repository also contains supporting/reference data:

data/unique_place_names.txt
data/geonames_india_places_full.csv.gz
data/by_state_geonames/

Data Sources and Attribution

This package includes place-name vocabulary derived from open geographical datasets, including GeoNames India gazetteer and postal data.

GeoNames data is licensed under Creative Commons Attribution 4.0. Please credit GeoNames when using data derived from GeoNames.

Suggested attribution:

This product includes data derived from GeoNames (https://www.geonames.org/), licensed under CC BY 4.0.

The data is provided as-is and may contain spelling variants, alternate names, outdated entries, or OCR-specific aliases.

Privacy and Project Neutrality

This package is public and project-neutral.

It does not include private project names, private customer data, private document text, or proprietary extraction logic. Use it as a reusable Indian place-name and OCR address cleanup utility.

Troubleshooting

Old version still installing

Uninstall the package first, then reinstall it while bypassing pip's cache:

python -m pip uninstall indic-places -y
python -m pip install --no-cache-dir --upgrade --force-reinstall indic-places

Check installed version

python -c "import importlib.metadata as m; print(m.version('indic-places'))"
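The same check can be done from inside a program, for instance to log the version at pipeline start-up. This is a small sketch using the standard library's importlib.metadata. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Pass the distribution name (hyphenated), not the import name.
print(installed_version("indic-places"))
```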

Command not found

If the indic-places command is not on your PATH, run the CLI module directly:

python -m indic_places.cli stats

Works locally but not after pip install

Make sure the package data files are included in the published wheel. Check the packaging configuration:

MANIFEST.in
pyproject.toml

and confirm that it covers these data files:

indic_places/data/custom_places.txt
indic_places/data/address_terms.txt
indic_places/data/places_index.json.gz
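A wheel is a zip archive, so one way to confirm the data files were packaged is to list the archive's contents before uploading. The sketch below does this with the standard library's zipfile module. The helper name missing_from_wheel is hypothetical, and the demonstration builds a small in-memory archive instead of reading a real wheel so that it runs anywhere.

```python
import io
import zipfile

REQUIRED = [
    "indic_places/data/custom_places.txt",
    "indic_places/data/address_terms.txt",
    "indic_places/data/places_index.json.gz",
]


def missing_from_wheel(wheel, required=REQUIRED):
    """Return the required paths that are absent from the wheel archive.

    wheel may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as archive:
        names = set(archive.namelist())
    return [path for path in required if path not in names]


# Demo: a fake "wheel" that contains only one of the three data files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("indic_places/__init__.py", "")
    demo.writestr("indic_places/data/address_terms.txt", "HOUSE\n")
buffer.seek(0)
print(missing_from_wheel(buffer))
# ['indic_places/data/custom_places.txt', 'indic_places/data/places_index.json.gz']
```

Against a real build, pass the path of the .whl file produced by your build step. An empty list means all three files are present.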

Source Code

GitHub repository:

https://github.com/Tinku746286/indic_names_library

For normal users, install from PyPI:

pip install --upgrade indic-places

Release history

Version	Release date
1.4.13	Apr 25, 2026
1.4.12	Apr 25, 2026
1.4.10	Apr 25, 2026
1.4.7	Apr 25, 2026
1.4.2	Apr 25, 2026
1.4.1	Apr 25, 2026
1.4.0	Apr 24, 2026
1.3.9	Apr 24, 2026
1.3.8	Apr 24, 2026
1.3.7	Apr 24, 2026
1.3.5	Apr 24, 2026
1.3.4	Apr 24, 2026
1.2.7	Apr 24, 2026
1.2.3	Apr 24, 2026
1.2.2	Apr 24, 2026
1.1.7	Apr 24, 2026
1.1.6	Apr 24, 2026
1.1.5 (this version)	Apr 24, 2026
1.1.4	Apr 24, 2026
1.1.3	Apr 24, 2026
1.1.2	Apr 24, 2026
1.1.1	Apr 24, 2026
1.1.0	Apr 24, 2026

Download files

Source Distribution

indic_places-1.1.5.tar.gz (5.7 MB)

Upload date: Apr 24, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

Algorithm	Hash digest
SHA256	`256ab4793e095734627facdfe0cf84e49adaf94aff18feda2b04e747ffc5e814`
MD5	`f5a9378f754609f40421d0bd99b60ccb`
BLAKE2b-256	`cfdf3fef6d4e12e93c4dc43609a5e358d39a9b2d6db54211a06a15b617259715`

Built Distribution

indic_places-1.1.5-py3-none-any.whl (5.6 MB)

Upload date: Apr 24, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

Algorithm	Hash digest
SHA256	`2f9a8071fc889eec9acc37791ade73f7cb3aba5752b62f2348ff88c0879e89ce`
MD5	`ebf07d2c3c81e63149d2d69ff5cb93da`
BLAKE2b-256	`29ef6f7416855d0e7b0fa29826c915de74adc4de32269db6c05a549f19b52795`
