
UK address utility based on machine learning and optimised search to parse, standardise, and compare addresses.


ukaddresskit



The address NER tagger is trained with CRFsuite on 2 million UK housing addresses.
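A CRFsuite tagger assigns a label to each token based on a set of per-token features. The sketch below shows what such features could look like for an address string. The tokeniser, the feature names and the postcode-part regex are illustrative assumptions, not ukaddresskit's actual feature set, which this README does not document.

```python
import re
from typing import Dict, List

# Hypothetical feature set for illustration only; ukaddresskit's real
# features are not documented here.
# Matches an outward code (SW1A, BL8) or an inward code (2AA, 1JG).
POSTCODE_PART = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?$|^\d[A-Z]{2}$")


def tokenize(address: str) -> List[str]:
    """Upper-case the address and split it on whitespace and commas."""
    return [t for t in re.split(r"[\s,]+", address.upper()) if t]


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build one feature dict for token i, including its neighbours."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "word": tok,
        "is_digit": tok.isdigit(),
        "has_digit": any(c.isdigit() for c in tok),
        "length": len(tok),
        "looks_like_postcode_part": bool(POSTCODE_PART.match(tok)),
    }
    # Context features, with boundary markers at either end of the address
    feats["prev_word"] = tokens[i - 1] if i > 0 else "<START>"
    feats["next_word"] = tokens[i + 1] if i < len(tokens) - 1 else "<END>"
    return feats


def address_to_features(address: str) -> List[Dict[str, object]]:
    """Turn an address into the sequence of feature dicts a CRF would see."""
    tokens = tokenize(address)
    return [token_features(tokens, i) for i in range(len(tokens))]


for feats in address_to_features("10 Downing Street SW1A 2AA"):
    print(feats)
```

During training, each such sequence is paired with one label per token (for example BuildingNumber, StreetName, Postcode), and the CRF learns which features and neighbouring labels predict each tag.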

Install (alpha stage)

pip install ukaddresskit

Quick Start

Tagger

from ukaddresskit.parser import tag

print(tag("10 Downing Street SW1A 2AA"))

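Output (in this example the alpha-stage model tags "DOWNING" as Locality and "STREET" as TownName, rather than tagging "DOWNING STREET" as a street)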

{'BuildingNumber': '10', 'Locality': 'DOWNING', 'TownName': 'STREET', 'Postcode': 'SW1A 2AA'}

Postcode Helpers

from ukaddresskit.postcode import (
    normalize_postcode,
    get_town,
    get_county,
    get_locality,
    get_streets,
    get_property_mix,
)

normalize_postcode("sw1a2aa")   # "SW1A 2AA"
get_town("SW1A 2AA")            # "LONDON"
get_county("SW1A 2AA")          # "Greater London" (if present in the mapping)
get_locality("SW1A 2AA")        # locality for the postcode
get_streets("SW1A 2AA")         # array of street names
get_property_mix("SW1A 2AA")    # Dict[str, float]
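A UK postcode consists of an outward code and an inward code, and the inward code is always one digit followed by two letters. Normalising therefore comes down to removing spaces, upper-casing, and putting a single space before the last three characters. The function below, `normalize_postcode_sketch`, is a hypothetical stand-in that illustrates this rule; ukaddresskit's `normalize_postcode` may handle edge cases and invalid input differently.

```python
import re
from typing import Optional

# Outward code (e.g. SW1A, BL8, M1) followed by inward code (digit + two letters)
_POSTCODE_RE = re.compile(r"^([A-Z]{1,2}\d[A-Z\d]?)(\d[A-Z]{2})$")


def normalize_postcode_sketch(raw: str) -> Optional[str]:
    """Hypothetical illustration of postcode normalisation; not ukaddresskit's code."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = _POSTCODE_RE.match(compact)
    if match is None:
        return None  # not shaped like a UK postcode
    outward, inward = match.groups()
    return f"{outward} {inward}"


print(normalize_postcode_sketch("sw1a2aa"))  # SW1A 2AA
```

The regex only checks the shape of the postcode. It rejects the special non-geographic postcode GIR 0AA and cannot tell whether a postcode actually exists; that requires a lookup against real postcode data.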

Locality Helpers

from ukaddresskit.locality import get_town_by_locality, list_towns_for_locality

get_town_by_locality("Ab Kettleby")                  # "MELTON MOWBRAY"
get_town_by_locality("Abberton", ambiguity="all")    # ["COLCHESTER", "PERSHORE"]
list_towns_for_locality("Abberton")                  # ["COLCHESTER", "PERSHORE"]
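A locality such as Abberton can belong to more than one post town, and the `ambiguity` parameter exists to handle that case. The sketch below shows one way to model it with a locality-to-towns mapping. The function name `town_by_locality_sketch`, its `"first"` and `"strict"` modes, and the in-memory table are all hypothetical; only the two example entries come from this README, and ukaddresskit's default behaviour for ambiguous localities is not documented here.

```python
from typing import Dict, List, Optional, Union

# Hypothetical lookup table; the entries mirror the examples above only.
LOCALITY_TO_TOWNS: Dict[str, List[str]] = {
    "AB KETTLEBY": ["MELTON MOWBRAY"],
    "ABBERTON": ["COLCHESTER", "PERSHORE"],
}


def town_by_locality_sketch(
    locality: str, ambiguity: str = "first"
) -> Optional[Union[str, List[str]]]:
    """Resolve a locality to its post town(s) under a chosen ambiguity policy."""
    towns = LOCALITY_TO_TOWNS.get(locality.strip().upper(), [])
    if not towns:
        return None  # unknown locality
    if ambiguity == "all":
        return list(towns)  # every candidate town
    if ambiguity == "first":
        return towns[0]  # pick the first candidate silently
    if ambiguity == "strict":
        if len(towns) > 1:
            raise ValueError(f"{locality!r} maps to several towns: {towns}")
        return towns[0]
    raise ValueError(f"unknown ambiguity mode: {ambiguity!r}")


print(town_by_locality_sketch("Ab Kettleby"))
print(town_by_locality_sketch("Abberton", ambiguity="all"))
```

Returning every candidate lets the caller disambiguate later, for example by checking the address's postcode against each candidate town.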

Todo

  • Add outcode_to_county.csv into lookups
  • Fix bugs that stop the library from loading on Colab
  • Create postcode fill utility
    • get_town(postcode)
    • get_county(postcode)
    • get_locality(postcode)
    • get_streets(postcode) → array of street names
    • get_property_mix(postcode)
    • add test cases
  • Create address populate utility (fill in missing address components such as town and county)
  • Create address linkage / comparison utility
  • Define test cases, organise code
  • Improve machine learning models
  • Create Parquet / SQLite storage with indexes for fast searches
  • Create online docs
  • Improve Address Parser
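The indexed-storage item above could take a form like the following: a SQLite table keyed on postcode with an index on that column, so that a lookup becomes an index search rather than a full table scan. The schema, table name and index name are assumptions for illustration, since the README does not describe the planned layout. Parquet is left out because reading it requires a third-party library.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def build_lookup(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Load (postcode, town) pairs into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postcode_town (postcode TEXT NOT NULL, town TEXT NOT NULL)")
    conn.executemany("INSERT INTO postcode_town (postcode, town) VALUES (?, ?)", rows)
    # Index on the lookup key so equality queries can use a B-tree search
    conn.execute("CREATE INDEX idx_postcode ON postcode_town (postcode)")
    conn.commit()
    return conn


def town_for_postcode(conn: sqlite3.Connection, postcode: str) -> Optional[str]:
    """Return the town for an exact postcode, or None if it is not stored."""
    row = conn.execute(
        "SELECT town FROM postcode_town WHERE postcode = ?", (postcode,)
    ).fetchone()
    return row[0] if row else None


# Example rows reuse postcodes and towns that appear in this README
conn = build_lookup([("SW1A 2AA", "LONDON"), ("BL8 1JG", "BURY")])
print(town_for_postcode(conn, "BL8 1JG"))  # BURY
```

Running `EXPLAIN QUERY PLAN` on the same `SELECT` shows whether SQLite searches `idx_postcode` instead of scanning the table. Storing postcodes in one normalised form ("BL8 1JG", not "BL81JG") keeps such exact-match lookups reliable.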

AddressParser (pre- and post-processing; needs testing)

import pandas as pd
from ukaddresskit.pipeline import AddressParser

ap = AddressParser()
df = pd.DataFrame({"ADDRESS": [
    "Flat 2, 10 Queen Street, Bury BL8 1JG",
]})
out = ap.parse(df)
fields = [
    "SubBuildingName", "BuildingName", "BuildingNumber",
    "StreetName", "Locality", "TownName", "Postcode", "County",
    "PAOstartNumber", "PAOendNumber", "PAOstartSuffix", "PAOendSuffix",
    "SAOStartNumber", "SAOEndNumber", "SAOStartSuffix", "SAOEndSuffix",
]

for i, row in out.iterrows():
    print(f"\nAddress #{i}")
    for col in fields:
        val = row.get(col)
        if pd.notna(val) and str(val) != "":
            print(f"  {col:16} {val}")

Output

Address #0
  SubBuildingName  FLAT 2
  BuildingNumber   10
  StreetName       QUEEN STREET
  TownName         BURY
  Postcode         BL81JG
  PAOstartNumber   10.0
  SAOStartNumber   2
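In the output above, the postcode comes back without its separating space (BL81JG) and PAOstartNumber is a float (10.0), while SAOStartNumber prints as 2. A small post-processing step can tidy these values. The sketch below works on one parsed row as a plain dict (for example `row.to_dict()` inside the loop above). `tidy_row` is a hypothetical helper, not part of ukaddresskit, and it assumes the Postcode value is a well-formed UK postcode.

```python
from typing import Any, Dict

NUMBER_FIELDS = ("PAOstartNumber", "PAOendNumber", "SAOStartNumber", "SAOEndNumber")


def tidy_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one parsed address with postcode spacing and house numbers normalised."""
    out = dict(row)

    postcode = out.get("Postcode")
    if isinstance(postcode, str):
        compact = "".join(postcode.split()).upper()
        # The inward code is always the last three characters, e.g. BL8 + 1JG
        if len(compact) >= 5:
            out["Postcode"] = f"{compact[:-3]} {compact[-3:]}"

    for key in NUMBER_FIELDS:
        value = out.get(key)
        if isinstance(value, float) and value.is_integer():
            out[key] = int(value)  # 10.0 -> 10; NaN fails is_integer() and is left as-is
        elif isinstance(value, str) and value.strip().isdecimal():
            out[key] = int(value)  # "2" -> 2
    return out


print(tidy_row({"Postcode": "BL81JG", "PAOstartNumber": 10.0, "SAOStartNumber": "2"}))
```

Missing values (NaN) are left untouched, so the existing `pd.notna` check in the printing loop still filters them out.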



Download files


Source Distribution

ukaddresskit-0.0.5.tar.gz (31.6 MB)

Built Distribution


ukaddresskit-0.0.5-py3-none-any.whl (32.0 MB)

File details

Details for the file ukaddresskit-0.0.5.tar.gz.

File metadata

  • Download URL: ukaddresskit-0.0.5.tar.gz
  • Upload date:
  • Size: 31.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for ukaddresskit-0.0.5.tar.gz
Algorithm Hash digest
SHA256 2e58e7891fe819a1c4deb6e28448a46325c0bd0bd7cc7e601941a7549224e2e8
MD5 89786d353bff7b85f99fb808aac5163c
BLAKE2b-256 ffc5e666654e7edd534afbe846c59a31dcd292cd405d2c84b2ffb26be7ae31a7
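The SHA256 digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib`; `sha256_hex`, `read_chunks` and `verify_download` are names chosen for this example.

```python
import hashlib
from typing import Iterable, Iterator

# Digest for ukaddresskit-0.0.5.tar.gz, copied from the table above
SDIST_SHA256 = "2e58e7891fe819a1c4deb6e28448a46325c0bd0bd7cc7e601941a7549224e2e8"


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in 1 MiB pieces so a ~32 MB archive is never fully in memory."""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file at path matches the expected SHA256 hex digest."""
    return sha256_hex(read_chunks(path)) == expected_sha256.strip().lower()
```

For example, `verify_download("ukaddresskit-0.0.5.tar.gz", SDIST_SHA256)` should return True for an intact download. pip can also enforce this in hash-checking mode: pin `ukaddresskit==0.0.5 --hash=sha256:<sdist digest> --hash=sha256:<wheel digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.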


File details

Details for the file ukaddresskit-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: ukaddresskit-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 32.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for ukaddresskit-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 bf9ea292d25b3da72aea144dcd994d0d24561225da252c020bed6454ba501ab7
MD5 648066c7411e4a90b2748c24a5df9b78
BLAKE2b-256 cabc654f93e9fe13dc0cda12596630b01493229532522796e1cc9d91c69f1d26

