Open source geocoding in Python

Whereabouts

A lightweight, fast geocoder for Python using DuckDB. Try it out online at Hugging Face.

Description

Whereabouts is an open-source geocoding library for Python that lets you geocode and standardize address data entirely within your own environment.

Features:

  • Two-line installation
  • No additional database setup required; DuckDB runs all queries
  • No need to send data to an external geocoding API
  • Fast (geocodes thousands of addresses per second, depending on your setup)
  • Robust to typographical errors

Requirements

  • Python 3.8+
  • The packages listed in requirements.txt (found in the repo)

Installation: via pip

whereabouts can be installed using pip, uv or conda. With pip:

pip install whereabouts

Download a geocoder database or create your own

You will need a geocoding database to match addresses against. You can either download a pre-built database or create your own using a dataset of high quality reference addresses for a given country, state or other geographic region.

Option 1: Download a geocoder database

Pre-built geocoding databases are available from Hugging Face. The list of available databases can be found here.

As an example, to install the small geocoder database for all of Australia:

python -m whereabouts download au_all_sm

Option 2: Create a geocoder database

Rather than using a pre-built database, you can create your own geocoder database if you have your own address file. This file should be a single CSV or Parquet file with the following columns:

Column name          Description                               Data type
ADDRESS_DETAIL_PID   Unique identifier for address             int
ADDRESS_LABEL        The full address                          str
ADDRESS_SITE_NAME    Name of the site. This is usually null    str
LOCALITY_NAME        Name of the suburb or locality            str
POSTCODE             Postcode of address                       int
STATE                State                                     str
LATITUDE             Latitude of geocoded address              float
LONGITUDE            Longitude of geocoded address             float

These fields should be specified in a setup.yml file. Once the setup.yml is created and a reference dataset is available, the geocoding database can be created:

python -m whereabouts setup_geocoder setup.yml
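
Before running the setup step, it can help to confirm that a CSV reference file actually has the columns listed in the table above. The following is a minimal sketch using only the Python standard library; it is not part of whereabouts, and the helper name missing_columns and the sample data are hypothetical.

```python
import csv
import io

# Column names taken from the table of required columns.
REQUIRED_COLUMNS = [
    "ADDRESS_DETAIL_PID", "ADDRESS_LABEL", "ADDRESS_SITE_NAME",
    "LOCALITY_NAME", "POSTCODE", "STATE", "LATITUDE", "LONGITUDE",
]

def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]

# Hypothetical two-column file, used only to exercise the check.
sample = io.StringIO("ADDRESS_DETAIL_PID,ADDRESS_LABEL\n1,1 EXAMPLE ST\n")
print(missing_columns(sample))
```

For a file on disk, pass an open file handle instead of the StringIO object, for example open(path, newline='').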

Geocoding examples

Geocode a list of addresses

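from whereabouts.Matcher import Matcher

# List of address strings to geocode (placeholder value; substitute your own)
addresslist = ['1 Example St, Exampleville VIC 3000']

# Load the pre-built database downloaded earlier, then match the addresses
matcher = Matcher(db_name='au_all_sm')
matcher.geocode(addresslist, how='standard')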

For more accurate geocoding, you can use trigram phrases rather than token phrases. Note that you will need one of the large databases to use trigram geocoding.

matcher.geocode(addresslist, how='trigram')
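
To give some intuition for why trigram matching tolerates typos better than token matching, here is a small standalone sketch. It assumes that "token phrases" means pairs of adjacent words and "trigram" means overlapping three-character substrings; the exact phrase construction used inside whereabouts may differ, and the functions and addresses below are hypothetical.

```python
def token_pairs(text):
    """Adjacent word pairs, e.g. 'a b c' -> {'a b', 'b c'}."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}

def char_trigrams(text):
    """Overlapping 3-character substrings of the whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

reference = "12 station street richmond"
typo = "12 staton street richmond"  # hypothetical misspelt input

# One misspelt word breaks two of the three word pairs...
print(jaccard(token_pairs(reference), token_pairs(typo)))
# ...but only a handful of the character trigrams.
print(jaccard(char_trigrams(reference), char_trigrams(typo)))
```

In this example the word-pair similarity drops to 0.2, while the trigram similarity stays considerably higher, which is the behaviour that makes finer-grained phrases more forgiving of spelling errors at the cost of a larger index.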

How it works

The algorithm employs simple record linkage techniques, making it suitable for implementation in around 10 lines of SQL. It is based on the following papers
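
As a rough illustration of how phrase-based record linkage fits in a few lines of SQL, the sketch below indexes adjacent-word phrases for each reference address and ranks candidates by how many phrases they share with the query. It is not the whereabouts implementation: it uses the standard-library sqlite3 module as a stand-in for DuckDB, and the table names, scoring rule and addresses are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE phrases (address_id INTEGER, phrase TEXT);
    CREATE INDEX idx_phrase ON phrases (phrase);
""")

def token_pairs(text):
    """Split on whitespace and commas, then return adjacent word pairs."""
    tokens = text.lower().replace(",", " ").split()
    return [" ".join(p) for p in zip(tokens, tokens[1:])]

# Hypothetical reference addresses, for illustration only.
reference = [
    (1, "10 high street northcote vic 3070"),
    (2, "10 high street preston vic 3072"),
    (3, "25 smith street fitzroy vic 3065"),
]
conn.executemany("INSERT INTO reference VALUES (?, ?)", reference)
conn.executemany(
    "INSERT INTO phrases VALUES (?, ?)",
    [(pid, ph) for pid, addr in reference for ph in token_pairs(addr)],
)

def best_match(query):
    """Return (id, address, shared_phrase_count) for the top candidate, or None."""
    phrases = sorted(set(token_pairs(query)))
    if not phrases:
        return None
    placeholders = ",".join("?" * len(phrases))
    sql = f"""
        SELECT r.id, r.address, COUNT(*) AS n_shared
        FROM phrases p
        JOIN reference r ON r.id = p.address_id
        WHERE p.phrase IN ({placeholders})
        GROUP BY r.id, r.address
        ORDER BY n_shared DESC, r.id
        LIMIT 1
    """
    return conn.execute(sql, phrases).fetchone()

print(best_match("10 High Street, Preston"))
```

The query shares three phrases with the Preston address and only two with the Northcote one, so the Preston record is returned. A production matcher would also need to normalise abbreviations and check house numbers, which this sketch leaves out.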

Documentation

Work in progress: https://whereabouts.readthedocs.io/en/latest/

To do:

  • Additional countries (US, NZ, France, UK)
  • Geocode street corners
  • Geocode individual (suburb, street name) pairs, without house numbers
