
Geo distance scoring plugin for csv-reconcile

Project description

CSV Reconcile Geo distance scoring plugin

A scoring plugin for csv-reconcile using geodesic distance. See csv-reconcile for details.

Reconciliation

This plugin is used to reconcile values representing points on the globe. It expects those values to be in the well-known text (WKT) format for a point, that is: POINT( longitude latitude ). Note that longitude comes before latitude.

The pre-processor automatically strips off literal datatypes, when present, as well as double quotes.
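A minimal sketch of this kind of pre-processing follows. It is not the plugin's actual code. It assumes the datatype appears as an RDF-style `^^<...>` suffix, as in `"Point(13.38 52.52)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>`. The helper name `parse_wkt_point` is hypothetical.

```python
import re

# Matches a WKT point such as "POINT( 4.35 50.85 )". The match is case-insensitive
# so that "Point(...)" (as emitted by e.g. Wikidata) is accepted too.
_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_POINT_RE = re.compile(
    rf"^\s*POINT\s*\(\s*({_NUM})\s+({_NUM})\s*\)\s*$", re.IGNORECASE
)


def parse_wkt_point(value: str) -> tuple[float, float]:
    """Hypothetical helper: return (longitude, latitude) from a WKT point literal."""
    # Drop an RDF-style datatype suffix such as ^^<...#wktLiteral>, if present.
    value = value.split("^^", 1)[0]
    # Strip whitespace and surrounding double quotes.
    value = value.strip().strip('"')
    match = _POINT_RE.match(value)
    if match is None:
        raise ValueError(f"not a WKT point: {value!r}")
    # WKT order is longitude first, then latitude.
    return float(match.group(1)), float(match.group(2))
```

For example, `parse_wkt_point('POINT( 4.35 50.85 )')` and `parse_wkt_point('"Point(4.35 50.85)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>')` both yield the same `(longitude, latitude)` pair.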

The CSV column to be reconciled must be in the same format. In addition, each id in the id column may appear at most once. For instance, if reconciling against the coordinate location of a Wikidata item, there must be at most one location per item.
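Before loading a CSV file, the one-location-per-id requirement can be checked with a small script like the one below. This is a hypothetical sketch and not part of csv-reconcile. The column name passed as `id_column` is an assumption about your file.

```python
import csv
from collections import Counter
from typing import Iterable


def duplicate_ids(lines: Iterable[str], id_column: str) -> list[str]:
    """Return the ids that appear more than once in id_column, sorted.

    `lines` can be an open file object or any iterable of CSV lines
    whose first line is the header row.
    """
    reader = csv.DictReader(lines)
    counts = Counter(row[id_column] for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)
```

With a real file, this would be called as `duplicate_ids(open("places.csv", newline=""), "item")`. Here `places.csv` and `item` are placeholders. An empty result means every id has at most one location.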

Scoring

The scoring used is more or less arbitrary but has the following properties:

  • The highest score is 100 and occurs when the distance to the reconciliation candidate is zero
  • The greater the distance to the reconciliation candidate, the lower the score
  • The score is scaled so that a distance of 10km yields a score of 50
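The following sketch shows one scoring curve that has all three properties. It uses `100 * scale / (scale + distance)`, which gives 100 at 0 km and 50 at `scale` km, and decreases toward 0 as distance grows. This curve is an assumption; the plugin's actual function is not documented here. The sketch also swaps in a spherical haversine distance as a simple stand-in for the geodesic (ellipsoidal) distance the plugin uses.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the asin argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def score(distance_km: float, scale_km: float = 10.0) -> float:
    """Hypothetical curve: 100 at 0 km, 50 at scale_km, decreasing toward 0."""
    return 100.0 * scale_km / (scale_km + distance_km)
```

A hyperbolic curve like this never reaches zero, so distant candidates still get a small positive score. An exponential curve such as `100 * 2 ** (-d / scale)` would satisfy the same three properties and fall off faster.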

Configuration

The plugin can be controlled via SCOREOPTIONS in the csv-reconcile --config file. SCOREOPTIONS is a Python dictionary and thus has the form SCOREOPTIONS={"key1": "value1", "key2": "value2"}.

  • SCALE: the distance in kilometers at which a score of 50 occurs (default 10 km), e.g. "SCALE":2
  • COORDRANGE: if supplied, do a precheck that both the latitude and the longitude of the compared values are within the given range of each other. This is for performance: it avoids the more expensive distance calculation for points that are farther apart, e.g. "COORDRANGE":"1"
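Below is a sketch of how these options might be written in the config file, and how a COORDRANGE-style precheck could filter candidates before any distance is computed. It makes three assumptions not stated above. First, COORDRANGE is taken to be in degrees and compared separately against the absolute longitude and latitude differences. Second, points are `(longitude, latitude)` tuples. Third, wraparound at ±180° longitude is ignored. The functions `within_coord_range` and `prefilter` are hypothetical and are not part of the plugin.

```python
# Example SCOREOPTIONS as it might appear in a csv-reconcile --config file.
SCOREOPTIONS = {"SCALE": 2, "COORDRANGE": "1"}


def within_coord_range(p: tuple, q: tuple, coord_range: float) -> bool:
    """Cheap box test on (lon, lat) pairs; assumes coord_range is in degrees."""
    return abs(p[0] - q[0]) <= coord_range and abs(p[1] - q[1]) <= coord_range


def prefilter(query: tuple, candidates: list, options: dict) -> list:
    """Keep only candidates passing the COORDRANGE precheck, if it is configured."""
    coord_range = options.get("COORDRANGE")
    if coord_range is None:
        return list(candidates)  # no precheck: every candidate is scored
    limit = float(coord_range)  # the option may be given as a string, e.g. "1"
    return [c for c in candidates if within_coord_range(query, c, limit)]
```

Only the candidates returned by `prefilter` would then go through the full distance and scoring step. This is where the performance gain described above would come from.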

Future enhancements

Parts of the current implementation were driven by the design of csv-reconcile. Both may be updated to accommodate the following:

  • Allow for separate latitude and longitude columns in the CSV file
  • Add some scoring options such as the following:
    • Allow for overriding the scaling function
    • etc.

Download files


Source Distribution

csv-reconcile-geo-0.1.7.tar.gz (4.4 kB view details)

Uploaded Source

File details

Details for the file csv-reconcile-geo-0.1.7.tar.gz.

File metadata

  • Download URL: csv-reconcile-geo-0.1.7.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.8.10 Darwin/20.2.0

File hashes

Hashes for csv-reconcile-geo-0.1.7.tar.gz
Algorithm Hash digest
SHA256 c53dc3801c28852830d9bec9bef73b127d21ced92e5e2125ab5cb10ad81dfc0d
MD5 58a88970d38a473e2e26941211c4a0f5
BLAKE2b-256 5f13cd4679b2ed5ecadef3e2ac3d8bd727fb9931b416957bc1548f4e6237682e

