Parse raw Indian address strings into structured fields using a fine-tuned Qwen3 LoRA adapter

indian-address-parser

Parse raw, unstructured Indian address strings into 13 structured fields using a Qwen3-0.6B model fine-tuned with LoRA. The package ships only inference code; the model weights are downloaded automatically from Hugging Face on first use.

Input:  "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"
Output: {"houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": null,
         "street": "MG ROAD", "subsubLocality": null, "subLocality": null, "locality": null,
         "village": null, "subDistrict": null, "district": "Kamrup", "city": "GUWAHATI",
         "state": "AS", "pincode": "781029"}

Install

pip install indian-address-parser

Usage

Python

from indian_address_parser import AddressParser

parser = AddressParser()  # downloads model weights from HF on first use
result = parser.parse("FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029")
print(result)

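# Batch: pass a list of raw address strings
addresses = [
    "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029",
]
results = parser.parse_batch(addresses)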

CLI

# Single address
indian-address-parser "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"

# Batch from stdin
cat addresses.txt | indian-address-parser --stdin

# Batch from a file, JSONL output
indian-address-parser --file addresses.txt --out results.jsonl
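
The file-to-JSONL flow can also be approximated from Python. The sketch below is an illustration only: read_addresses and write_jsonl are hypothetical helpers, not part of this package. It assumes the input file holds one address per line and that parse_batch returns one dict per address in input order. The "input" key is added here for traceability and may not match what the CLI writes.

```python
import json
from pathlib import Path


def read_addresses(path: str) -> list[str]:
    """Read one address per line, skipping blank lines (assumed file layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_jsonl(addresses, parse_batch, out_path: str) -> int:
    """Parse addresses and write one JSON object per line; return the row count.

    parse_batch is any callable mapping a list of strings to a list of dicts,
    assumed to preserve input order (e.g. AddressParser().parse_batch).
    """
    results = parse_batch(addresses)
    with open(out_path, "w", encoding="utf-8") as fh:
        for address, fields in zip(addresses, results):
            # Keep the raw input next to the parsed fields (hypothetical layout).
            record = {"input": address, **fields}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(results)
```

With the package installed, a call such as write_jsonl(read_addresses("addresses.txt"), AddressParser().parse_batch, "results.jsonl") would exercise the same path.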

Fields

houseNumber, houseName, poi, street, subsubLocality, subLocality,
locality, village, subDistrict, district, city, state, pincode

Any field not present in the address is null. If the model output can't be parsed as JSON, all fields are null and a _parse_error key holds the raw model output.
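
A minimal sketch of handling both cases in calling code. The helper names parse_failed and present_fields are hypothetical, and the sketch assumes that null fields arrive as None in the Python dict.

```python
# The 13 documented field names.
FIELDS = (
    "houseNumber", "houseName", "poi", "street", "subsubLocality", "subLocality",
    "locality", "village", "subDistrict", "district", "city", "state", "pincode",
)


def parse_failed(result: dict) -> bool:
    """True when the result carries the documented _parse_error key."""
    return "_parse_error" in result


def present_fields(result: dict) -> dict:
    """Return only the documented fields that are not null (assumed to be None)."""
    return {name: result[name] for name in FIELDS if result.get(name) is not None}
```

Checking parse_failed(result) first lets a caller log result["_parse_error"] and skip the record, while present_fields(result) gives a compact view of a successful parse.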

Model details, evaluation metrics, and known limitations

See the model card for training data, LoRA config, per-field evaluation results (100% JSON parse rate, 82.4% mean field accuracy on held-out test data), and known limitations (locality/subLocality/subsubLocality/village field-boundary ambiguity, etc.).
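
Since mean field accuracy is below 100%, downstream code may want a cheap sanity check on individual fields. The sketch below is an assumption-laden example, not part of this package: pincode_looks_valid is a hypothetical helper that checks only the shape of an Indian PIN code (six digits, first digit non-zero), not whether the code actually exists.

```python
import re

# Indian PIN codes are six digits and do not start with 0.
PINCODE_RE = re.compile(r"[1-9][0-9]{5}")


def pincode_looks_valid(result: dict) -> bool:
    """Shape check only: True if result['pincode'] is a six-digit string not starting with 0."""
    pincode = result.get("pincode")
    return isinstance(pincode, str) and PINCODE_RE.fullmatch(pincode) is not None
```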

Apple Silicon (MLX) users

This package uses transformers+peft, which works on CUDA, MPS, and CPU but is not the fastest path on Apple Silicon. For MLX-native inference, see the mlx/ subfolder of the Hugging Face repo instead.

License

Apache 2.0 (matching the base model, Qwen/Qwen3-0.6B).
