A dual-head NER-based parser for location strings

These details have not been verified by PyPI

Project links

Intended Audience
- Developers
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Text Processing :: Linguistic

Project description

decide-location-formatter

A Python package for parsing and structuring location strings into their individual address components. Built around a dual-head NER model (svercoutere/abb-dual-location-component-ner) fine-tuned on top of XLM-RoBERTa base.

How it works

Raw location strings like "Scaldisstraat 23-25, 2000 Antwerpen" or "Cafe den Draak, Lovegemlaan 7, 9000 Gent" are common in municipal decision text but are inconsistently formatted and often contain multiple distinct locations in a single string.

The pipeline has three steps:

1. Text cleaning: normalises whitespace, unicode, and newlines.
2. Dual-head NER inference: the model runs two independent CRF-decoded classification heads over every token simultaneously:
   - Component head: tags each token as one of 12 address component types (street, city, postcode, …).
   - Location head: groups tokens that belong to the same physical location into B-LOCATION / I-LOCATION spans, allowing multi-location strings to be split.
3. Post-processing: component spans are nested inside their parent location spans, housenumber ranges/sequences (e.g. 23-25, 7 en 9) are expanded into individual entries, and bus numbers are split into a separate field.
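The nesting of component spans inside location spans can be illustrated with a small sketch. It assumes both heads emit BIO tags per word (the source only confirms B-LOCATION / I-LOCATION for the location head), and it joins words with spaces rather than slicing the original string. The names `bio_to_spans` and `nest_components` are hypothetical and are not part of the package.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start word index, end word index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse a BIO tag sequence into (label, start, end) spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


def nest_components(words: List[str], component_tags: List[str],
                    location_tags: List[str]) -> List[Dict[str, str]]:
    """Attach each component span to the location span that contains it."""
    components = bio_to_spans(component_tags)
    locations = []
    for _, loc_start, loc_end in bio_to_spans(location_tags):
        entry = {"location": " ".join(words[loc_start:loc_end])}
        for label, start, end in components:
            if loc_start <= start and end <= loc_end:
                entry[label.lower()] = " ".join(words[start:end])
        locations.append(entry)
    return locations


# Hand-written tags for illustration; these are not real model output.
words = ["Lovegemlaan", "7", ",", "9000", "Gent", "en",
         "Dorpstraat", "12", ",", "9240", "Zele"]
component_tags = ["B-STREET", "B-HOUSENUMBER", "O", "B-POSTCODE", "B-CITY", "O",
                  "B-STREET", "B-HOUSENUMBER", "O", "B-POSTCODE", "B-CITY"]
location_tags = ["B-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION", "O",
                 "B-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION"]

nested = nest_components(words, component_tags, location_tags)
```

Because the location head is decoded independently of the component head, the containment test is what ties a street or postcode to one particular location when a string holds several.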

Architecture

Component	Detail
Base encoder	`xlm-roberta-base` (12 layers, 768 hidden)
Component head	Linear(768 → 256) · GELU · Dropout(0.1) · Linear(256 → 25) + CRF
Location head	Linear(768 → 256) · GELU · Dropout(0.1) · Linear(256 → 3) + CRF
Tokenisation	Word-level regex tokeniser; sub-word alignments via fast tokenizer `word_ids()`
Max input length	256 sub-word tokens
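The tokenisation row can be made concrete with a sketch of sub-word-to-word alignment. On a Hugging Face fast tokenizer, `word_ids()` returns, for each sub-word token, the index of the word it came from, or `None` for special tokens. The sketch takes such a list as plain input and keeps the prediction of the first sub-word of each word. That first-sub-word strategy is an assumption, since the source does not say how sub-word predictions are aggregated, and `first_subword_labels` is a hypothetical helper.

```python
from typing import List, Optional


def first_subword_labels(word_ids: List[Optional[int]],
                         subword_labels: List[str]) -> List[str]:
    """Return one label per word, taken from the word's first sub-word token."""
    word_labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, subword_labels):
        # Skip special tokens (None) and continuation pieces of the same word.
        if word_id is not None and word_id != previous:
            word_labels.append(label)
        previous = word_id
    return word_labels


# Illustrative values: one special token, a three-piece word, two single-piece words.
word_ids = [None, 0, 0, 0, 1, 2, None]
subword_labels = ["O", "B-STREET", "I-STREET", "I-STREET", "B-HOUSENUMBER", "B-CITY", "O"]
aligned = first_subword_labels(word_ids, subword_labels)
```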

Entity types (component head)

Label	Description
`STREET`	Street name (no house number)
`ROAD`	Road or route name
`HOUSENUMBER`	House/building number(s), ranges or sequences
`POSTCODE`	Postal or ZIP code
`CITY`	City or municipality name
`PROVINCE`	Province or region name
`BUILDING`	Named building, site or facility
`INTERSECTION`	Crossing or intersection of roads
`PARCEL`	Land parcel, section or lot number
`DISTRICT`	District, neighbourhood or borough
`GRAVE_LOCATION`	Plot/row/number within a cemetery
`DOMAIN_ZONE_AREA`	Domain, zone or area name
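The 12 entity types line up with the 25-way output of the component head if that head uses a BIO scheme: one B- and one I- tag per type plus a single O tag. The BIO assumption is inferred from the B-LOCATION / I-LOCATION labels of the location head and is not stated for the component head in the source; the label order below is purely illustrative.

```python
ENTITY_TYPES = [
    "STREET", "ROAD", "HOUSENUMBER", "POSTCODE", "CITY", "PROVINCE",
    "BUILDING", "INTERSECTION", "PARCEL", "DISTRICT",
    "GRAVE_LOCATION", "DOMAIN_ZONE_AREA",
]


def build_bio_labels(entity_types):
    """Build an O / B-X / I-X label list for the given entity types."""
    labels = ["O"]
    for name in entity_types:
        labels.extend([f"B-{name}", f"I-{name}"])
    return labels


COMPONENT_LABELS = build_bio_labels(ENTITY_TYPES)   # 1 + 2 * 12 labels
LOCATION_LABELS = build_bio_labels(["LOCATION"])    # O, B-LOCATION, I-LOCATION
```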

Evaluation

Evaluated on a held-out 10 % split of ~10 000 Belgian municipal decision location strings.

Metric	Score
Combined F1	0.9435
Component F1	0.9295
Location F1	0.9576
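The combined score is numerically consistent with the unweighted mean of the two per-head scores, as the short check below shows. That the combined score was computed this way is an assumption; the source does not define Combined F1.

```python
component_f1 = 0.9295
location_f1 = 0.9576

# Unweighted mean of the two heads; compare against the reported 0.9435.
mean_f1 = (component_f1 + location_f1) / 2
difference = abs(mean_f1 - 0.9435)
print(f"mean={mean_f1:.5f} difference={difference:.5f}")
```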

Installation

From source (recommended during development)

git clone https://github.com/semantic-ai/decide-location-formatter.git
cd decide-location-formatter
pip install -e .

Dependencies only

pip install "torch>=2.0" "transformers>=4.35" "pytorch-crf>=0.7.2"

The model weights (~1 GB) are downloaded automatically from the Hugging Face Hub on first use.

Usage

Quick start

from locationformatter import LocationFormatter

lf = LocationFormatter()   # loads model once; reuse for many calls

result = lf.parse("Scaldisstraat 23-25, 2000 Antwerpen")
print(result)

{
  "original": "Scaldisstraat 23-25, 2000 Antwerpen",
  "locations": [
    {
      "location": "Scaldisstraat 23-25, 2000 Antwerpen",
      "street": "Scaldisstraat",
      "housenumber": "23",
      "housenumber_type": "single",
      "postcode": "2000",
      "city": "Antwerpen"
    },
    {
      "location": "Scaldisstraat 23-25, 2000 Antwerpen",
      "street": "Scaldisstraat",
      "housenumber": "25",
      "housenumber_type": "single",
      "postcode": "2000",
      "city": "Antwerpen"
    }
  ]
}

Multi-location strings

Strings that contain several distinct locations are automatically split:

result = lf.parse("Lovegemlaan 7, 9000 Gent en Dorpstraat 12, 9240 Zele")
for loc in result["locations"]:
    print(loc)

Raw prediction (no housenumber expansion)

predict() returns spans straight from the model without expanding ranges or splitting bus numbers:

raw = lf.predict("Heikeesstraat 2-4, 9240 Zele")
# raw["locations"][0]["housenumber"] == "2-4"
# raw["locations"][0]["housenumber_type"] == "range"

One-shot helper

For a single call without keeping the model in memory:

from locationformatter import parse_location

result = parse_location("Grote Markt 1, 2000 Antwerpen")

Note: parse_location reloads the model on every call. Use LocationFormatter for repeated parsing.

Custom model or device

lf = LocationFormatter(
    repo="your-org/your-model",   # any compatible HF Hub repo
    device="cuda",                # "cpu" or "cuda"; auto-detected when omitted
)

API reference

`LocationFormatter`

class LocationFormatter:
    def __init__(self, repo: str = "svercoutere/abb-dual-location-component-ner",
                 device: str | None = None): ...

    def parse(self, text: str) -> dict: ...
    # Full pipeline: clean → NER → expand housenumbers.
    # Returns {"original": str, "locations": list[dict]}

    def predict(self, text: str) -> dict: ...
    # NER only, no housenumber expansion.
    # Returns {"original": str, "locations": list[dict]}

Helper functions

from locationformatter import clean_string, clean_house_number, extract_house_and_bus_number

clean_string("  Grote   Markt\n1  ")
# → "Grote Markt 1"

clean_house_number("3 t.e.m. 7")
# → ["3", "4", "5", "6", "7"]

clean_house_number("10-14")
# → ["10", "11", "12", "13", "14"]

extract_house_and_bus_number("5 bus 3")
# → {"housenumber": "5", "bus": "3"}
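For readers who want a feel for what these helpers do, here is a minimal standard-library sketch that reproduces the four examples above. It is not the package's implementation: the function names, the regular expressions, and the accepted range keywords are all assumptions. In particular, the sketch expands every integer in a range, while the Quick start output lists only 23 and 25 for "23-25", so the package may apply rules that are not captured here.

```python
import re
import unicodedata
from typing import Dict, List


def clean_string_sketch(text: str) -> str:
    """Apply NFKC unicode normalisation and collapse whitespace, including newlines."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


# "10-14", "3 t.e.m. 7" or "3 tot en met 7" (keyword list is an assumption).
_RANGE = re.compile(r"^\s*(\d+)\s*(?:-|t\.e\.m\.|tot en met)\s*(\d+)\s*$", re.IGNORECASE)


def expand_house_numbers_sketch(value: str) -> List[str]:
    """Expand a numeric range, or split a sequence such as "7 en 9"."""
    match = _RANGE.match(value)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        if low <= high:
            return [str(n) for n in range(low, high + 1)]
    parts = re.split(r"\s*(?:,|\ben\b)\s*", value.strip())
    return [part for part in parts if part]


_BUS = re.compile(r"^\s*(.+?)\s+bus\s+(\S+)\s*$", re.IGNORECASE)


def split_bus_sketch(value: str) -> Dict[str, str]:
    """Separate a trailing "bus N" suffix from the house number."""
    match = _BUS.match(value)
    if match:
        return {"housenumber": match.group(1), "bus": match.group(2)}
    return {"housenumber": value.strip()}
```

Keeping these steps as pure string functions is what lets them be tested without loading the model.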

Output schema

Each entry in the locations list is a flat dict. Only fields detected by the model are included.

Field	Type	Description
`location`	`str`	The substring corresponding to this location
`street`	`str`	Street name
`road`	`str`	Road/route name
`housenumber`	`str`	Individual house number (after expansion)
`housenumber_type`	`str`	`"single"`, `"range"`, or `"sequence"`
`bus`	`str`	Bus/apartment number (when present)
`postcode`	`str`	Postal code
`city`	`str`	City or municipality
`province`	`str`	Province
`building`	`str`	Named building or facility
`intersection`	`str`	Road intersection
`parcel`	`str`	Land parcel identifier
`district`	`str`	District or neighbourhood
`grave_location`	`str`	Cemetery plot/row/number
`domain_zone_area`	`str`	Zone or area name
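Since only detected fields are present, consumers should treat every key as optional. The sketch below types the schema as a `TypedDict` with `total=False` and reads fields with `.get()`. The `LocationEntry` type and the `format_address` helper are hypothetical and are not exported by the package.

```python
from typing import TypedDict


class LocationEntry(TypedDict, total=False):
    """All keys are optional, mirroring the 'only detected fields' rule."""
    location: str
    street: str
    road: str
    housenumber: str
    housenumber_type: str
    bus: str
    postcode: str
    city: str
    province: str
    building: str
    intersection: str
    parcel: str
    district: str
    grave_location: str
    domain_zone_area: str


def format_address(entry: LocationEntry) -> str:
    """Render a one-line address from whichever fields are present."""
    street_part = " ".join(
        part
        for part in (entry.get("street") or entry.get("road"), entry.get("housenumber"))
        if part
    )
    if street_part and entry.get("bus"):
        street_part += f" bus {entry['bus']}"
    city_part = " ".join(part for part in (entry.get("postcode"), entry.get("city")) if part)
    return ", ".join(part for part in (street_part, city_part) if part)


# Entry copied from the Quick start output.
example: LocationEntry = {
    "location": "Scaldisstraat 23-25, 2000 Antwerpen",
    "street": "Scaldisstraat",
    "housenumber": "23",
    "housenumber_type": "single",
    "postcode": "2000",
    "city": "Antwerpen",
}
formatted = format_address(example)
```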

Development

Running tests

pytest tests/

The unit tests for the helper functions (clean_string, clean_house_number, extract_house_and_bus_number) do not require the model to be loaded and run offline.

Release history

- 0.1.3 (this version): Apr 3, 2026
- 0.1.2: Mar 30, 2026
- 0.1.1: Mar 27, 2026
- 0.1.0: Mar 27, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

locationformatter-0.1.3-py3-none-any.whl (10.5 kB)

Uploaded Apr 3, 2026 Python 3

File details

Details for the file locationformatter-0.1.3-py3-none-any.whl.

File metadata

Download URL: locationformatter-0.1.3-py3-none-any.whl
Upload date: Apr 3, 2026
Size: 10.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for locationformatter-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`974a48d24cb3a148e050aa3e20e8e10160db1f4d06cfd1a0a0ef63494e55f127`
MD5	`afb33106c94a0bc76000afc26cf8b861`
BLAKE2b-256	`cee3b442b0b6440c99dfdb33128b2b4cf1c2e6d96e5ecd3f5900244dd64095b1`
