ukaddresskit
UK address utility based on machine learning and optimised search to parse, standardise, and compare addresses.
The address NER tagger is trained with CRFsuite on 2 million UK housing addresses.
Install - alpha stage
pip install ukaddresskit
Quick Start
Tagger
from ukaddresskit.parser import tag
print(tag("10 Downing Street SW1A 2AA"))
Output
{'BuildingNumber': '10', 'Locality': 'DOWNING', 'TownName': 'STREET', 'Postcode': 'SW1A 2AA'}
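The tagger returns a plain dict mapping label names to upper-cased values. Note that in the sample above the street tokens come back under Locality and TownName rather than StreetName, so downstream code should not assume every label is filled as expected at this alpha stage. As a minimal sketch of consuming that dict, the helper below joins whichever components are present into a single line. The helper name and the label order are assumptions made for illustration and are not part of ukaddresskit.

```python
# Hypothetical display order, built from label names that appear in this README.
LABEL_ORDER = [
    "SubBuildingName", "BuildingName", "BuildingNumber",
    "StreetName", "Locality", "TownName", "Postcode",
]

def format_tagged(tagged):
    """Join the non-empty tagged components in LABEL_ORDER into one string."""
    parts = [tagged[label] for label in LABEL_ORDER if tagged.get(label)]
    return " ".join(parts)

# Dict copied from the sample output above, so the sketch runs without the library.
tagged = {
    "BuildingNumber": "10",
    "Locality": "DOWNING",
    "TownName": "STREET",
    "Postcode": "SW1A 2AA",
}
print(format_tagged(tagged))  # 10 DOWNING STREET SW1A 2AA
```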
Postcode Helpers
from ukaddresskit.postcode import *
normalize_postcode("sw1a2aa") # "SW1A 2AA"
get_town("SW1A 2AA") # "LONDON"
get_county("SW1A 2AA") # "Greater London" (if in mapping)
get_locality("SW1A 2AA")
get_streets("SW1A 2AA") # list of street names
get_property_mix("SW1A 2AA") # Dict[str, float]
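To make the normalisation step concrete, here is a rough sketch of how a compact postcode such as "sw1a2aa" can be turned into the spaced form shown above. It is not the library's implementation: the function names are hypothetical, and the only rule applied is that a full UK postcode has 5 to 7 alphanumeric characters, the last three of which form the inward code.

```python
import re

def normalize_postcode_sketch(raw):
    """Uppercase, drop non-alphanumerics, and put a space before the 3-character inward code."""
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if not 5 <= len(compact) <= 7:
        return None  # too short or too long to be a full UK postcode
    return f"{compact[:-3]} {compact[-3:]}"

def outcode_sketch(raw):
    """Return the outward code (the part before the space), or None if the input is unusable."""
    normalized = normalize_postcode_sketch(raw)
    return normalized.split(" ")[0] if normalized else None

print(normalize_postcode_sketch("sw1a2aa"))  # SW1A 2AA
print(outcode_sketch("BL81JG"))              # BL8
```

The outward code is the natural key for coarse lookups such as an outcode-to-county table, which is why it is split out here.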
---
from ukaddresskit.locality import *
get_town_by_locality("Ab Kettleby") # "MELTON MOWBRAY"
get_town_by_locality("Abberton", ambiguity="all") # ["COLCHESTER", "PERSHORE"]
list_towns_for_locality("Abberton") # ["COLCHESTER", "PERSHORE"]
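The ambiguity handling can be pictured as a one-to-many lookup. The sketch below is only an illustration built from the two localities shown above: the table, the function names, and the default mode name "strict" (return a town only when the locality maps to exactly one) are assumptions, not the library's data or behaviour.

```python
# Tiny hand-made lookup using the examples from this README.
LOCALITY_TO_TOWNS = {
    "AB KETTLEBY": ["MELTON MOWBRAY"],
    "ABBERTON": ["COLCHESTER", "PERSHORE"],
}

def towns_for_locality_sketch(locality):
    """Return every town recorded for a locality (case-insensitive), or an empty list."""
    return list(LOCALITY_TO_TOWNS.get(locality.strip().upper(), []))

def town_by_locality_sketch(locality, ambiguity="strict"):
    """With ambiguity="all" return all candidates; otherwise return a town only if it is unique."""
    towns = towns_for_locality_sketch(locality)
    if ambiguity == "all":
        return towns
    return towns[0] if len(towns) == 1 else None

print(town_by_locality_sketch("Ab Kettleby"))                # MELTON MOWBRAY
print(town_by_locality_sketch("Abberton", ambiguity="all"))  # ['COLCHESTER', 'PERSHORE']
print(town_by_locality_sketch("Abberton"))                   # None
```

Returning None for an ambiguous locality keeps callers from silently picking the wrong town; a postcode, when available, is one way to break the tie.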
Todo
- Add outcode_to_county.csv into lookups
- Fix bugs in library not loading on Colab
- Create postcode fill utility
  - get_town(postcode)
  - get_county(postcode)
  - get_locality(postcode)
  - get_streets(postcode) → array of street names
  - get_property_mix(postcode)
  - Add test cases
- Create address populate utility (add missing address components - town, county, etc)
- Create address linkage utility / comparing
- Define test cases, organise code
- Improve machine learning models
- Create .parquet / SQLite storage and indexes for optimal searches
- Create online docs
- Improve Address Parser
AddressParser (Pre & Post processing -- needs testing)
import pandas as pd
from ukaddresskit.pipeline import AddressParser

ap = AddressParser()

# One address per row, in a column named ADDRESS
df = pd.DataFrame({"ADDRESS": [
    "Flat 2, 10 Queen Street, Bury BL8 1JG",
]})
out = ap.parse(df)

# Parsed columns to display
fields = [
    "SubBuildingName", "BuildingName", "BuildingNumber",
    "StreetName", "Locality", "TownName", "Postcode", "County",
    "PAOstartNumber", "PAOendNumber", "PAOstartSuffix", "PAOendSuffix",
    "SAOStartNumber", "SAOEndNumber", "SAOStartSuffix", "SAOEndSuffix",
]

# Print only the fields that were populated for each address
for i, row in out.iterrows():
    print(f"\nAddress #{i}")
    for col in fields:
        val = row.get(col)
        if pd.notna(val) and str(val) != "":
            print(f" {col:16} {val}")
Output
Address #0
SubBuildingName FLAT 2
BuildingNumber 10
StreetName QUEEN STREET
TownName BURY
Postcode BL81JG
PAOstartNumber 10.0
SAOStartNumber 2
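In the output above PAOstartNumber prints as 10.0, presumably because the column holds floats (pandas stores numeric columns with missing values as float). If a cleaner display is wanted, a small formatting helper along these lines can be applied before printing. The helper name is hypothetical, and the sample row is hand-copied from the output with assumed value types.

```python
def tidy_value(value):
    """Render whole-number floats such as 10.0 as '10'; convert anything else with str()."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)

# Row copied by hand from the sample output; the Python types are assumptions.
row = {
    "SubBuildingName": "FLAT 2",
    "BuildingNumber": "10",
    "PAOstartNumber": 10.0,
    "SAOStartNumber": "2",
}
tidied = {key: tidy_value(val) for key, val in row.items()}
print(tidied["PAOstartNumber"])  # 10
```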