An open-source tool for linking free-text addresses to UPRN
FLAP
FLAP is an open-source tool for linking free-text addresses to Ordnance Survey Unique Property Reference Numbers (OS UPRNs). To use FLAP, you need an OS UPRN licence and a download of the AddressBase Premium product. FLAP can be used at scale with a few lines of code.
Quick start of FLAP tool
Installation
We recommend creating a virtual environment with venv.
python3 -m venv [YOUR_PATH]/flap_lite
source [YOUR_PATH]/flap_lite/bin/activate
Install with pip:
pip install --upgrade flap-lite
For now, please contact the developer to obtain the trained model. Copy the model to [YOUR_PATH]/flap_lite/lib/python3.9/site-packages/flap/model/ (replace python3.9 with the Python version of your environment):
cp [PATH_TO_MODEL_FILE] [YOUR_PATH]/flap_lite/lib/python3.9/site-packages/flap/model/
Quick Start
Building the database
Use flap.create_database to build the database.
from flap import create_database
create_database(db_path='[PATH_FOR_THE_DB]', raw_db_file='[PATH_TO_DB_ZIP]')
Matching
Use flap.match to match addresses against the database.
from flap import match
input_csv = '[PATH_TO_INPUT_CSV_FILE]'
db_path = '[PATH_TO_THE_DB]'
results = match(
    input_csv=input_csv,
    db_path=db_path,
)
Matching results are saved to [$pwd]/output.csv by default. FLAP also defaults to using all available CPUs and processing the addresses in batches of 10,000.
Some useful options are:
- batch_size: number of addresses in each batch
- max_workers: number of CPU cores used
- in_memory_db: whether an in-memory SQLite database is used
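To show what batch_size and max_workers control, here is a minimal sketch of batched parallel processing. It is not FLAP's implementation: process_batch is a hypothetical stand-in for the matching work, and a thread pool is used instead of separate processes so that the sketch stays short and runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size=10_000):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch):
    # Hypothetical stand-in for the matching work done on one batch.
    return [address.upper() for address in batch]


def run_in_batches(addresses, batch_size=10_000, max_workers=None):
    """Process all batches with a pool of workers and flatten the results."""
    batches = make_batches(addresses, batch_size)
    # max_workers=None lets the executor pick its own default worker count.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process_batch, batches))  # keeps batch order
    return [row for batch in results for row in batch]
```

A smaller batch_size lowers the amount of data held per worker at one time, while a smaller max_workers leaves CPU cores free for other jobs.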
How does it work?
Briefly, FLAP parses the structured parts of an address (e.g. the POSTCODE "AB12 3CD") and all the deterministic parts (e.g. numbers such as "111" and letters such as "A").
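The parsing step could look roughly like the sketch below. The regular expression is a simplified UK postcode pattern and the function name parse_address is hypothetical; FLAP's own parsing rules may differ.

```python
import re

# Simplified UK postcode pattern (an assumption, not FLAP's own rule).
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")


def parse_address(text):
    """Split an address into postcode, numbers, single letters and words."""
    upper = text.upper()
    postcode = None
    found = POSTCODE_RE.search(upper)
    if found:
        postcode = f"{found.group(1)} {found.group(2)}"
        # Remove the postcode so its characters are not parsed twice.
        upper = upper[:found.start()] + " " + upper[found.end():]
    tokens = re.findall(r"[A-Z]+|\d+", upper)
    return {
        "postcode": postcode,
        "numbers": [t for t in tokens if t.isdigit()],
        "letters": [t for t in tokens if t.isalpha() and len(t) == 1],
        "words": [t for t in tokens if t.isalpha() and len(t) > 1],
    }


parsed = parse_address("Flat A, 111 High Street, ab12 3cd")
```

Here the numbers and single letters are the deterministic parts, and the remaining words are the textual parts.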
An SQL query is built from the parsed fields to narrow the search down to a few rows in the database:
select * from indexed where POSTCODE='AB12 3CD'
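The narrowing step can be tried out with Python's built-in sqlite3 module. The table name indexed and the POSTCODE column come from the query above; the UPRN and ADDRESS columns, the index name and all rows are made up for illustration.

```python
import sqlite3

# In-memory database with a toy version of the indexed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indexed (UPRN TEXT, ADDRESS TEXT, POSTCODE TEXT)")
conn.executemany(
    "INSERT INTO indexed VALUES (?, ?, ?)",
    [
        ("1", "FLAT A 111 HIGH STREET", "AB12 3CD"),  # made-up rows
        ("2", "FLAT B 111 HIGH STREET", "AB12 3CD"),
        ("3", "5 LOW ROAD", "ZZ99 9ZZ"),
    ],
)
# An index on POSTCODE keeps the lookup fast on a large table.
conn.execute("CREATE INDEX idx_postcode ON indexed (POSTCODE)")


def candidates_for(postcode):
    """Return the candidate rows that share the parsed postcode."""
    cursor = conn.execute(
        "SELECT UPRN, ADDRESS FROM indexed WHERE POSTCODE = ?", (postcode,)
    )
    return cursor.fetchall()


rows = candidates_for("AB12 3CD")
```

Passing the postcode as a query parameter, rather than formatting it into the SQL string, avoids quoting problems with unusual input.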
Features are generated:
- For the deterministic parts: pairwise comparison to see if equal
- For the textual parts: linear assignment alignment
- For postcode: comparison to see if parts are equal
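The three kinds of features listed above can be sketched as follows. All function names are hypothetical, the string similarity measure (difflib) is an assumption, and the linear assignment is solved by brute force over permutations, which is only practical for a handful of words; a dedicated solver would normally be used instead. Treating the postcode "parts" as its two space-separated halves is also an assumption.

```python
from difflib import SequenceMatcher
from itertools import permutations


def equal_pairs(query_parts, candidate_parts):
    """Pairwise equality flags for deterministic parts (numbers, letters)."""
    return [int(q == c) for q in query_parts for c in candidate_parts]


def similarity(a, b):
    """String similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def best_alignment(query_words, candidate_words):
    """Best one-to-one word pairing, found by trying every assignment."""
    short, long_ = sorted([query_words, candidate_words], key=len)
    if not short:
        return 0.0
    best_total = max(
        sum(similarity(s, w) for s, w in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    # Dividing by the longer length penalises unmatched words.
    return best_total / len(long_)


def postcode_features(query_postcode, candidate_postcode):
    """Compare the two halves of a postcode separately."""
    q_out, q_in = query_postcode.split()
    c_out, c_in = candidate_postcode.split()
    return [int(q_out == c_out), int(q_in == c_in)]
```

Aligning words before comparing them makes the textual feature insensitive to word order, so "HIGH STREET" and "STREET HIGH" score as a perfect match.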
A trained Random Forest classifier predicts a score based on the generated features. The address with the best score is deemed a match.
The above is a simplified description.
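The final selection step can be illustrated with a short sketch. Because the trained model is not distributed with the package, placeholder_score below is a deliberately simple stand-in (the mean of the feature values) and not a Random Forest; the candidate identifiers and feature values are made up.

```python
def placeholder_score(features):
    """Stand-in for the trained classifier: the mean of the feature values."""
    return sum(features) / len(features) if features else 0.0


def best_match(candidates, score_fn=placeholder_score):
    """Return (identifier, score) for the highest-scoring candidate, or None."""
    if not candidates:
        return None
    scored = [(ident, score_fn(feats)) for ident, feats in candidates.items()]
    return max(scored, key=lambda pair: pair[1])


# Made-up feature vectors for two candidate rows.
candidates = {"uprn_a": [1, 1, 0.9], "uprn_b": [1, 0, 0.4]}
winner = best_match(candidates)
```

Any callable that maps a feature list to a number can be passed as score_fn, which is where a trained classifier's predicted probability would plug in.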
Coming soon
- Command Line Interface
- More documentation
- Dummy database for trying it out with an example notebook