Open source geocoding in Python
Whereabouts
Fast, scalable geocoding for Python using DuckDB. The geocoding algorithms are based on the following papers:
Description
Geocode addresses and reverse geocode coordinates directly from Python in your own environment.
- No additional database setup required. Uses DuckDB to run all queries
- No need to send data to an external geocoding API
- Fast (geocodes thousands of addresses per second and reverse geocodes hundreds of thousands of coordinates per second)
- Robust to typographical errors
Requirements
- Python 3.8+
- Poetry (for package management)
Installation
Once Poetry is installed and you are in the project directory:
poetry shell
poetry install
Create a geocoder database
Before you can geocode, you need to create a geocoding database from a reference dataset of addresses and their corresponding latitude and longitude values.
The reference file should be a single CSV file with at least three fields: the complete address, latitude, and longitude. These fields should be specified in a setup.yml file. An example is included.
Once the setup.yml is created and a reference dataset is available, the geocoding database can be created using the setup_geocoder function from whereabouts.utils.
The current process for using Australian data from the GNAF is as follows:
- Download the latest version of GNAF core from https://geoscape.com.au/data/g-naf-core/
- Update the setup.yml file to point to the location of the GNAF core file
- Finally, set up the geocoder. This creates the required reference tables
python -m whereabouts setup_geocoder setup.yml
To use address data from another country, the file should have the following columns:
Column name | Description |
---|---|
ADDRESS_DETAIL_PID | Unique identifier for address |
ADDRESS_LABEL | The full address |
ADDRESS_SITE_NAME | Name of the site. This is usually null |
LOCALITY_NAME | Name of the suburb or locality |
POSTCODE | Postcode of address |
STATE | State |
LATITUDE | Latitude of geocoded address |
LONGITUDE | Longitude of geocoded address |
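Before running the setup, it can help to confirm that a reference file actually has these columns. The following is a minimal sketch using only the Python standard library. It assumes the file is comma-delimited with a header row; the helper name `missing_columns` and the sample data are hypothetical and not part of whereabouts.

```python
import csv
import io

# Column names taken from the table above
REQUIRED_COLUMNS = [
    "ADDRESS_DETAIL_PID", "ADDRESS_LABEL", "ADDRESS_SITE_NAME",
    "LOCALITY_NAME", "POSTCODE", "STATE", "LATITUDE", "LONGITUDE",
]


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header row.

    Hypothetical helper: assumes a comma-delimited file with a header row.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Made-up sample with only two of the required columns
sample = io.StringIO("ADDRESS_LABEL,LATITUDE\n1 EXAMPLE STREET,-37.0\n")
print(missing_columns(sample))
```

For a file on disk, pass an open file handle instead, for example `open(path, newline="")`, where `path` is wherever your reference file lives.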
Examples
Geocode a list of addresses
from whereabouts.Matcher import Matcher
matcher = Matcher(db_name='gnaf_au')
matcher.geocode(addresslist, how='standard')
For more accurate geocoding, you can use trigram phrases rather than token phrases. Note that the trigram option must have been specified in the setup.yml file as part of the setup.
matcher.geocode(addresslist, how='trigram')
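To see why trigrams can tolerate typos better than whole tokens, here is a generic illustration. It is not whereabouts' own phrase construction, which this README does not describe. It compares a correct and a misspelled address using Jaccard similarity over whole-word tokens and over character trigrams; the addresses and helper names are made up for the example.

```python
def tokens(address):
    """Split an address into upper-case whitespace-separated tokens."""
    return set(address.upper().split())


def trigrams(address):
    """Return the set of overlapping 3-character substrings of an address."""
    text = " ".join(address.upper().split())
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


correct = "12 MURRAY STREET"
typo = "12 MURRAY STRET"  # one letter missing

token_sim = jaccard(tokens(correct), tokens(typo))
trigram_sim = jaccard(trigrams(correct), trigrams(typo))
print(token_sim)    # 0.5
print(trigram_sim)  # 0.8
```

A single missing letter makes the whole token `STRET` a mismatch, while most of the character trigrams still overlap, so the trigram comparison degrades more gently.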
Creating a Matcher object also builds the KD-tree used for fast reverse geocoding. A list of latitude, longitude values can then be reverse geocoded as follows:
matcher.reverse_geocode(coordinates)
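As a rough picture of what a KD-tree lookup does, the sketch below builds a small two-dimensional KD-tree in pure Python and finds the nearest reference point for each query coordinate. It is a generic illustration and not the implementation used by whereabouts. The labels and coordinates are made up, and it measures plain Euclidean distance on raw degrees, which is only an approximation of distance on the ground.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


class Node:
    """One KD-tree node: a reference point plus its two subtrees."""

    def __init__(self, point: Point, label: str, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, alternating axes."""
    if not items:
        return None
    axis = depth % 2
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance in degrees (an approximation)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node: Optional[Node], target: Point, best=None):
    """Return (squared distance, label) of the closest stored point."""
    if node is None:
        return best
    d = sq_dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only search the far side if the splitting plane is closer than the best hit
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


# Made-up reference points standing in for geocoded addresses
reference = [
    ((-37.81, 144.96), "ADDRESS A"),
    ((-33.87, 151.21), "ADDRESS B"),
    ((-27.47, 153.03), "ADDRESS C"),
]
tree = build(reference)

coordinates = [(-33.90, 151.20), (-27.50, 153.00)]
labels = [nearest(tree, c)[1] for c in coordinates]
print(labels)  # ['ADDRESS B', 'ADDRESS C']
```

The tree is built once and reused for every query, which is why constructing it up front pays off when reverse geocoding many coordinates.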