Parse raw Indian address strings into structured fields using a fine-tuned Qwen3 LoRA adapter

indian-address-parser

Parse raw, unstructured Indian address strings into 13 structured fields using a Qwen3-0.6B model fine-tuned with LoRA. Model weights are downloaded automatically from Hugging Face; this package ships only inference code, no weights.

Input:  "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"
Output: {"houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": null,
         "street": "MG ROAD", "subsubLocality": null, "subLocality": null, "locality": null,
         "village": null, "subDistrict": null, "district": "Kamrup", "city": "GUWAHATI",
         "state": "AS", "pincode": "781029"}

Install

pip install indian-address-parser

Usage

Python

from indian_address_parser import AddressParser

parser = AddressParser()  # downloads model weights from HF on first use
result = parser.parse("FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029")
print(result)

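# Batch: pass a list of raw address strings
addresses = [
    "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029",
]
results = parser.parse_batch(addresses)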
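For long address lists it may be convenient to feed parse_batch in smaller chunks so results can be consumed as they arrive. The sketch below is one way to do that. The helper name parse_in_chunks and the default chunk size of 32 are hypothetical, and it assumes parse_batch returns one result per input address in input order, which this README does not state. Whether parse_batch already batches internally is also not documented here.

```python
from itertools import islice


def parse_in_chunks(parser, addresses, chunk_size=32):
    """Yield parsed results, calling parser.parse_batch on one chunk at a time.

    Hypothetical helper: `parser` is any object with a parse_batch(list) method.
    Assumes parse_batch returns one result per address, in input order.
    """
    iterator = iter(addresses)
    while True:
        # Take up to chunk_size addresses from the input.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield from parser.parse_batch(chunk)
```

With the package installed, this could be called as `for result in parse_in_chunks(AddressParser(), addresses): ...`.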

CLI

# Single address
indian-address-parser "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"

# Batch from stdin
cat addresses.txt | indian-address-parser --stdin

# Batch from a file, JSONL output
indian-address-parser --file addresses.txt --out results.jsonl
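The JSONL file written by --out can be read back with the standard library. This is a minimal sketch, assuming each non-empty line of results.jsonl is a single JSON object holding the parsed fields. The exact layout of the CLI output is not specified in this README, and read_jsonl is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one decoded JSON value per non-empty line of a JSONL file.

    Assumption: the CLI writes one JSON object per line.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `pincodes = [record.get("pincode") for record in read_jsonl("results.jsonl")]` would collect the pincode of every parsed address.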

Fields

houseNumber, houseName, poi, street, subsubLocality, subLocality,
locality, village, subDistrict, district, city, state, pincode

Any field not present in the address is null. If the model output can't be parsed as JSON, all fields are null and a _parse_error key holds the raw model output.
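Downstream code will usually want to separate failed parses from successful ones and ignore null fields. The sketch below illustrates this using plain dictionaries shaped like the output shown earlier. The names FIELDS, is_parse_failure, and present_fields are hypothetical helpers, not part of the package.

```python
# The 13 documented field names, in the order listed above.
FIELDS = (
    "houseNumber", "houseName", "poi", "street", "subsubLocality",
    "subLocality", "locality", "village", "subDistrict", "district",
    "city", "state", "pincode",
)


def is_parse_failure(result):
    """True when the result carries the _parse_error key described above."""
    return "_parse_error" in result


def present_fields(result):
    """Return only the documented fields whose value is not None."""
    return {name: result[name] for name in FIELDS if result.get(name) is not None}


# Example result copied from the Input/Output sample in this README.
example = {
    "houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": None,
    "street": "MG ROAD", "subsubLocality": None, "subLocality": None,
    "locality": None, "village": None, "subDistrict": None,
    "district": "Kamrup", "city": "GUWAHATI", "state": "AS",
    "pincode": "781029",
}

if not is_parse_failure(example):
    print(present_fields(example))
```

Checking for the _parse_error key first matters because a failed parse also has every field set to null, so it is otherwise indistinguishable from an address where nothing was recognised.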

Model details, evaluation metrics, and known limitations

See the model card for training data, LoRA config, per-field evaluation results (100% JSON parse rate, 82.4% mean field accuracy on held-out test data), and known limitations (locality/subLocality/subsubLocality/village field-boundary ambiguity, etc.).

Apple Silicon (MLX) users

This package uses transformers+peft, which works on CUDA, MPS, and CPU but is not the fastest path on Apple Silicon. For MLX-native inference, see the mlx/ subfolder of the Hugging Face repo instead.

License

Apache 2.0 (matching the base model, Qwen/Qwen3-0.6B).
