Parse raw Indian address strings into structured fields using a fine-tuned Qwen3 LoRA adapter

indian-address-parser

Parse raw, unstructured Indian address strings into 13 structured fields using a Qwen3-0.6B model fine-tuned with LoRA. The package ships only inference code; the model weights are downloaded automatically from Hugging Face on first use.

Input:  "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"
Output: {"houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": null,
         "street": "MG ROAD", "subsubLocality": null, "subLocality": null, "locality": null,
         "village": null, "subDistrict": null, "district": "Kamrup", "city": "GUWAHATI",
         "state": "AS", "pincode": "781029"}

Install

pip install indian-address-parser

Usage

Python

from indian_address_parser import AddressParser

parser = AddressParser()  # downloads model weights from HF on first use
result = parser.parse("FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029")
print(result)

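# Batch: pass a list of raw address strings
addresses = [
    "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029",
    # ...more addresses
]
results = parser.parse_batch(addresses)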

CLI

# Single address
indian-address-parser "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"

# Batch from stdin
cat addresses.txt | indian-address-parser --stdin

# Batch from a file, JSONL output
indian-address-parser --file addresses.txt --out results.jsonl
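
The JSONL file written by --out can be loaded back into Python with the standard library. A minimal sketch, assuming each non-empty line of results.jsonl is one JSON object (the usual JSONL convention); the exact keys on each line are not documented here, and load_jsonl is a hypothetical helper, not part of the package:

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank or trailing empty lines
                records.append(json.loads(line))
    return records
```

Calling load_jsonl("results.jsonl") after the CLI run above would then give one dict per output line.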

Fields

houseNumber, houseName, poi, street, subsubLocality, subLocality,
locality, village, subDistrict, district, city, state, pincode

Any field not present in the address is null. If the model output can't be parsed as JSON, all fields are null and a _parse_error key holds the raw model output.
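
Downstream code usually needs to separate failed parses from good ones and to ignore null fields. A minimal sketch based only on the behaviour described above; FIELDS, is_parse_failure, and present_fields are hypothetical helper names, not part of the package API:

```python
# The 13 documented output fields, in the order listed above.
FIELDS = (
    "houseNumber", "houseName", "poi", "street", "subsubLocality",
    "subLocality", "locality", "village", "subDistrict", "district",
    "city", "state", "pincode",
)


def is_parse_failure(result):
    """True if the result carries the _parse_error key (unparseable model output)."""
    return "_parse_error" in result


def present_fields(result):
    """Return only the documented fields that have a non-null value."""
    return {k: result[k] for k in FIELDS if result.get(k) is not None}


# The example output shown at the top of this README, as a Python dict.
example = {
    "houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": None,
    "street": "MG ROAD", "subsubLocality": None, "subLocality": None,
    "locality": None, "village": None, "subDistrict": None,
    "district": "Kamrup", "city": "GUWAHATI", "state": "AS",
    "pincode": "781029",
}

if is_parse_failure(example):
    print("raw model output:", example["_parse_error"])
else:
    print(present_fields(example))
```

Checking for the _parse_error key first matters because a failed parse also has every field set to null, so it would otherwise look like an address with no recognisable parts.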

Model details, evaluation metrics, and known limitations

See the model card for training data, LoRA config, per-field evaluation results (100% JSON parse rate, 82.4% mean field accuracy on held-out test data), and known limitations (locality/subLocality/subsubLocality/village field-boundary ambiguity, etc.).
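
The model card is the authority on how the reported accuracy was computed; that definition is not reproduced here. As an illustration only, one plausible reading of "mean field accuracy" is the exact-match rate for each field, averaged over the fields. The sketch below implements that assumed definition for checking the parser against your own labelled addresses; mean_field_accuracy is a hypothetical helper:

```python
def mean_field_accuracy(predictions, references, fields):
    """Average, over fields, of the exact-match rate for each field.

    Assumed metric definition; the model card may define it differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not predictions or not fields:
        raise ValueError("need at least one example and one field")
    per_field = []
    for field in fields:
        # A missing key is treated the same as null (None).
        hits = sum(
            p.get(field) == r.get(field)
            for p, r in zip(predictions, references)
        )
        per_field.append(hits / len(predictions))
    return sum(per_field) / len(per_field)
```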

Apple Silicon (MLX) users

This package uses transformers+peft, which works on CUDA, MPS, and CPU but is not the fastest path on Apple Silicon. For MLX-native inference, see the mlx/ subfolder of the Hugging Face repo instead.
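
To see which of these backends PyTorch can use on a given machine, a quick check along the following lines may help. This is a sketch: available_backend is a hypothetical helper, it falls back to "cpu" when torch cannot be imported, and it does not show how this package selects its device (that is not documented here):

```python
def available_backend():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed; nothing else to probe
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(available_backend())
```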

License

Apache 2.0 (matching the base model, Qwen/Qwen3-0.6B).


