Parse raw Indian address strings into structured fields using a fine-tuned Qwen3 LoRA adapter

indian-address-parser

Parse raw, unstructured Indian address strings into 13 structured fields using a Qwen3-0.6B model fine-tuned with LoRA. Model weights are downloaded automatically from Hugging Face; this package ships only inference code, no weights.

Input:  "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"
Output: {"houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": null,
         "street": "MG ROAD", "subsubLocality": null, "subLocality": null, "locality": null,
         "village": null, "subDistrict": null, "district": "Kamrup", "city": "GUWAHATI",
         "state": "AS", "pincode": "781029"}

Install

pip install indian-address-parser

Usage

Python

from indian_address_parser import AddressParser

parser = AddressParser()  # downloads model weights from HF on first use
result = parser.parse("FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029")
print(result)

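# Batch: pass a list of raw address strings
addresses = ["FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"]
results = parser.parse_batch(addresses)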

CLI

# Single address
indian-address-parser "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"

# Batch from stdin
cat addresses.txt | indian-address-parser --stdin

# Batch from a file, JSONL output
indian-address-parser --file addresses.txt --out results.jsonl
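
The file written by --out can be read back in Python for further processing. The following is a minimal sketch, assuming each non-empty line of results.jsonl is one JSON object holding the parsed fields for one address (the exact output layout is not specified here). load_results is a hypothetical helper name, not part of the package.

```python
import json
from pathlib import Path


def load_results(path):
    """Read a JSONL file into a list of dicts, skipping blank lines.

    Assumption: one JSON object per line, as the "JSONL output" wording suggests.
    """
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling load_results("results.jsonl") after the command above should then give one dict per input address, provided the assumption about the file layout holds.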

Fields

houseNumber, houseName, poi, street, subsubLocality, subLocality,
locality, village, subDistrict, district, city, state, pincode

Any field not present in the address is null. If the model output can't be parsed as JSON, all fields are null and a _parse_error key holds the raw model output.
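
Callers will often want to separate failed parses from usable ones and drop the null fields. The following is a minimal sketch, assuming parse returns a plain dict shaped as described above. FIELDS, is_parse_failure, and present_fields are hypothetical helper names, not part of the package.

```python
# The 13 field names listed in this section.
FIELDS = (
    "houseNumber", "houseName", "poi", "street", "subsubLocality",
    "subLocality", "locality", "village", "subDistrict", "district",
    "city", "state", "pincode",
)


def is_parse_failure(result):
    """True when the result carries the _parse_error key (model output was not valid JSON)."""
    return "_parse_error" in result


def present_fields(result):
    """Return only the known fields that are non-null in the result."""
    return {k: result[k] for k in FIELDS if result.get(k) is not None}
```

A caller could check is_parse_failure(result) first, log result["_parse_error"] for inspection when it is true, and otherwise pass present_fields(result) downstream.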

Model details, evaluation metrics, and known limitations

See the model card for training data, LoRA config, per-field evaluation results (100% JSON parse rate, 82.4% mean field accuracy on held-out test data), and known limitations, such as field-boundary ambiguity among locality, subLocality, subsubLocality, and village.

Apple Silicon (MLX) users

This package uses transformers+peft, which works on CUDA, MPS, and CPU but is not the fastest path on Apple Silicon. For MLX-native inference, see the mlx/ subfolder of the Hugging Face repo instead.

License

Apache 2.0 (matching the base model, Qwen/Qwen3-0.6B).

