Batch parse and geocode addresses from followthemoney entities

ftm-geocoder

Batch parse and geocode addresses from followthemoney entities. Simply geocoding just address strings works as well, of course.

There are also some parsing / normalization helpers.

Features

  • Parse/normalize addresses via libpostal
  • Geocoding via geopy
  • Cache geocoding results in a sql database (using dataset)
  • Optional fallback geocoders when preferred geocoder doesn't match
  • Create, update and merge Address entities for ftm data

Usage

command line

ftmgeo --help

The command line interface is designed for piping input / output streams, but every command also accepts -i <input_file> and -o <output_file>.

Geocode an input stream of ftm entities with nominatim and google maps as fallback (geocoders are tried in the given order):

cat entities.ftm.ijson | ftmgeo geocode -g nominatim -g google > entities_geocoded.ftm.ijson

This looks for the address prop on input entities and creates address entities with reference to the input entities. The output contains all entities from the input stream plus newly created addresses.

If an input entity is itself an Address entity, it will be geocoded as well and its props (country, city, ...) will be merged with the geocoder result.

During the process, addresses are parsed and normalized and looked up in the address cache database before actual geocoding. After geocoding, new addresses are added to the cache database.
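The lookup order described above (normalize, check the cache, try each geocoder in turn, store the result) can be sketched in plain Python. This is an illustration only and not the ftm-geocode implementation: `normalize`, `geocode_cached` and the two fake geocoders are hypothetical names, the cache is a plain dict instead of a SQL database, and the normalization is a simple stand-in for libpostal.

```python
from typing import Callable, Optional

# Hypothetical types for illustration; not the ftm-geocode API.
Result = dict
Geocoder = Callable[[str], Optional[Result]]


def normalize(address: str) -> str:
    """Stand-in for libpostal normalization: lowercase, collapse whitespace."""
    return " ".join(address.lower().split())


def geocode_cached(
    address: str, geocoders: list[Geocoder], cache: dict[str, Result]
) -> Optional[Result]:
    """Return a cached result, or try each geocoder in the given order."""
    key = normalize(address)
    if key in cache:
        return cache[key]
    for geocoder in geocoders:
        result = geocoder(key)
        if result is not None:
            cache[key] = result  # new results are added to the cache
            return result
    return None


# Two fake geocoders: the preferred one never matches, the fallback always does.
def primary(_: str) -> Optional[Result]:
    return None


calls: list[str] = []


def fallback(address: str) -> Optional[Result]:
    calls.append(address)
    return {"address": address, "geocoder": "fallback"}


cache: dict[str, Result] = {}
first = geocode_cached("Cowley  Road, Cambridge", [primary, fallback], cache)
second = geocode_cached("cowley road, cambridge", [primary, fallback], cache)
```

Both inputs normalize to the same key, so the second call is served from the cache and the fallback geocoder is only queried once.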

Address ids will be rewritten based on normalization (addressEntity refs on other entities are updated accordingly). To keep the original ids, add the flag --no-rewrite-ids.
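To make the effect of id rewriting concrete, here is a rough sketch on followthemoney-style entity dicts. The id scheme is an assumption for illustration (a SHA-1 of a lowercased, whitespace-collapsed address line with a made-up `addr-` prefix); the actual scheme used by ftmgeo may differ. `normalized_id` and `rewrite_ids` are hypothetical helpers.

```python
import hashlib


def normalized_id(address_line: str) -> str:
    """Hypothetical id scheme: hash of a crudely normalized address line."""
    normalized = " ".join(address_line.lower().split())
    return "addr-" + hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def rewrite_ids(entities: list[dict]) -> list[dict]:
    """Give Address entities normalization-based ids and update references."""
    # Map each original address id to its new, normalization-based id.
    id_map = {
        e["id"]: normalized_id(e["properties"]["full"][0])
        for e in entities
        if e["schema"] == "Address"
    }
    rewritten = []
    for e in entities:
        props = dict(e["properties"])
        if "addressEntity" in props:
            # Point references at the rewritten address ids.
            props["addressEntity"] = [id_map.get(r, r) for r in props["addressEntity"]]
        rewritten.append({**e, "id": id_map.get(e["id"], e["id"]), "properties": props})
    return rewritten


entities = [
    {"id": "a1", "schema": "Address", "properties": {"full": ["Cowley Road,  Cambridge"]}},
    {"id": "a2", "schema": "Address", "properties": {"full": ["cowley road, cambridge"]}},
    {"id": "c1", "schema": "Company", "properties": {"addressEntity": ["a1"]}},
]
out = rewrite_ids(entities)
```

In this sketch the two differently spelled address lines end up with the same id, and the company's addressEntity reference follows the rewrite.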

Default geocoders can be set via the GEOCODERS environment variable; if it is unset, nominatim is used.

More information:

ftmgeo geocode --help

geocoding just address strings

csv format (for all csv input streams): the first column is the address, the optional second column is the country (name or code), and the optional third column is the language, used as context for postal.
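Because addresses usually contain commas, it is safest to produce the input csv with a proper csv writer so the address stays in one column. A minimal sketch using the Python standard library; the header names are an assumption (the source only specifies the column order), and the rows are placeholders:

```python
import csv
import io

rows = [
    ("Cowley Road, Cambridge", "UK", "en"),  # address, country name/code, language
    ("Cowley Road, Cambridge", "gb", ""),    # country given as code, no language
    ("Cowley Road, Cambridge, UK", "", ""),  # address only
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# Header names are placeholders; only the column order comes from the docs.
writer.writerow(["address", "country", "language"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The writer quotes every address that contains a comma, so each row keeps exactly three columns. Writing `csv_text` to addresses.csv gives a file that can be piped into the commands below.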

To ftm address entities:

cat addresses.csv | ftmgeo geocode --input-format=csv > addresses.ftm.ijson

To csv:

cat addresses.csv | ftmgeo geocode --input-format=csv --output-format=csv > addresses_geocoded.csv

formatting / normalization

Get a cleaned address line from messy input strings.

cat addresses.txt | ftmgeo format-line > clean_addresses.csv

libpostal parsed components

Get a csv with all the parsed components from libpostal.

cat addresses.txt | ftmgeo parse-components > components.csv

mapping

Generate address entities from input stream (without geocoding):

cat entities.ftm.ijson | ftmgeo map > entities_mapped.ftm.ijson
cat addresses.csv | ftmgeo map --input-format=csv > addresses.ftm.ijson

database cache

ftmgeo cache --help

During geocoding, addresses are first looked up in the local cache, and new geocoding results are added.

Ignore cache during geocoding (new results are still written to it):

ftmgeo geocode --no-cache ...

Export cache:

ftmgeo cache iterate > geocoded_addresses.ftm.ijsonl
ftmgeo cache iterate --output-format=csv > geocoded_addresses.csv

Populate cache:

csv input: address_id,canonical_id,original_line,result_line,country,lat,lon,geocoder,geocoder_place_id

optional field: geocoder_raw - json of geocoder response

cat geocoded_addresses.csv | ftmgeo cache populate
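Before populating the cache from a hand-made csv, it can help to check that the header has the columns listed above. The following is a small pre-flight check written for this purpose, not part of ftmgeo; `check_header` is a hypothetical helper, and whether ftmgeo itself tolerates extra columns or a different column order is not stated here.

```python
import csv
import io

REQUIRED = [
    "address_id", "canonical_id", "original_line", "result_line",
    "country", "lat", "lon", "geocoder", "geocoder_place_id",
]
OPTIONAL = ["geocoder_raw"]


def check_header(csv_text: str) -> tuple[list[str], list[str]]:
    """Return (missing required columns, unknown columns) for a csv header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [col for col in REQUIRED if col not in header]
    unknown = [col for col in header if col not in REQUIRED + OPTIONAL]
    return missing, unknown


good = ",".join(REQUIRED + OPTIONAL) + "\n"
bad = "address_id,original_line,lat,lng\n"

good_report = check_header(good)
bad_report = check_header(bad)
```

For the `bad` header, the report lists the required columns that are absent and flags `lng` as unknown, which catches the common `lon` / `lng` mix-up before any data is written.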

apply cache / re-geocode

ftmgeo cache apply-csv --help

To look up addresses from a csv input stream in the cache without geocoding, passing through any additional csv columns from the input:

cat addresses.csv | ftmgeo cache apply-csv --output-format csv > results.csv

Get only the addresses missing from the cache and geocode them (e.g. with a different geocoder):

cat addresses.csv | ftmgeo cache apply-csv --output-format csv --get-missing | ftmgeo geocode

Configuration

geocoders

Default geocoders are set via the env var GEOCODERS. They are used in the given order.

Make sure to configure the geocoders as needed for geopy (endpoints, api keys, ...):

export FTMGEO_<GEOCODERNAME>_<SETTING>=...
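The naming pattern above can be read as "prefix, geocoder name, setting name". The sketch below groups such variables per geocoder to show how the pattern decomposes. It is not how ftmgeo reads its configuration; `geocoder_settings` is a hypothetical helper, the two setting names are examples chosen to mirror geopy's `user_agent` and `api_key` arguments, and the split assumes geocoder names contain no underscore.

```python
def geocoder_settings(
    environ: dict[str, str], prefix: str = "FTMGEO_"
) -> dict[str, dict[str, str]]:
    """Group PREFIX<GEOCODERNAME>_<SETTING> variables by geocoder name."""
    settings: dict[str, dict[str, str]] = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        # Assumption: the geocoder name itself contains no underscore.
        name, sep, setting = key[len(prefix):].partition("_")
        if not sep or not setting:
            continue  # no setting part, ignore
        settings.setdefault(name, {})[setting] = value
    return settings


# Example environment; setting names and values are placeholders.
example_env = {
    "FTMGEO_NOMINATIM_USER_AGENT": "my-app",
    "FTMGEO_GOOGLE_API_KEY": "secret",
    "GEOCODERS": "nominatim",
    "PATH": "/usr/bin",
}
grouped = geocoder_settings(example_env)
```

Passing `os.environ` instead of `example_env` would show which geocoder settings are visible to the current process.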

Persistent cache

The cache database is set via FTM_STORE_URI (so it is the same as the ftm store, if any); otherwise it defaults to sqlite:///cache.db

Installation

The only external requirement is libpostal; see its installation instructions.

Once libpostal is installed on your system, you can install:

pip install ftm-geocode[postal]

Verify that this works without errors:

ftmgeo --help

echo "Cowley Road, Cambridge, UK" | ftmgeo geocode --input-format=csv --no-header

Testing

make install
make test
