
A scoring plugin for csv-reconcile using fingerprint clustering.

csv-reconcile-fingerprint

A scoring plugin for csv-reconcile using fingerprint clustering. It builds a fingerprint of each input string by normalizing it, removing punctuation, and sorting its unique tokens. The approach is based on the OpenRefine clustering implementation (https://openrefine.org/docs/technical-reference/clustering-in-depth) and on code from a gist by @pietz.
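The fingerprint step can be sketched in Python as follows. This is a minimal illustration that follows the keying steps described in the OpenRefine clustering documentation (trim, lowercase, fold accents to ASCII, strip punctuation and control characters, then de-duplicate, sort, and join the tokens). The `fingerprint` function is a hypothetical name and is not necessarily the plugin's actual code; its ASCII folding is a simplification that drops characters without an ASCII decomposition.

```python
import string
import unicodedata

# Characters to delete: ASCII punctuation plus control characters that are not whitespace.
_CONTROL = "".join(chr(c) for c in range(32) if not chr(c).isspace()) + "\x7f"
_DELETE_TABLE = str.maketrans("", "", string.punctuation + _CONTROL)


def fingerprint(value: str) -> str:
    """Hypothetical OpenRefine-style fingerprint key (illustrative sketch only)."""
    # Trim surrounding whitespace and lowercase.
    s = value.strip().lower()
    # Decompose accented characters and drop the combining marks ("gödel" -> "godel").
    # Characters with no ASCII decomposition are dropped entirely (a simplification).
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation and non-whitespace control characters.
    s = s.translate(_DELETE_TABLE)
    # Split on whitespace, de-duplicate and sort the tokens, then join with single spaces.
    return " ".join(sorted(set(s.split())))


print(fingerprint("  Gödel, Escher, Bach  "))  # bach escher godel
print(fingerprint("Bach; Escher & Gödel!"))     # bach escher godel
```

Because tokens are sorted and de-duplicated, differences in word order, case, accents, and punctuation all collapse to the same key.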

The resulting fingerprints are compared using Jaccard distance, and the comparison yields a score between 0 and 100.
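A minimal sketch of the scoring step, assuming the Jaccard distance is taken over the whitespace-separated tokens of the two fingerprints and that the score is 100 × (1 − distance), so identical token sets score 100. Both the tokenization and this mapping are assumptions; `jaccard_distance` and `fingerprint_score` are hypothetical names, not the plugin's API.

```python
from __future__ import annotations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (distance 0)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def fingerprint_score(fp1: str, fp2: str) -> float:
    """Hypothetical scorer (assumption): map the Jaccard distance between the
    token sets of two fingerprints onto 0-100, where 100 means identical sets."""
    tokens1 = set(fp1.split())
    tokens2 = set(fp2.split())
    return 100.0 * (1.0 - jaccard_distance(tokens1, tokens2))


print(fingerprint_score("bach escher godel", "bach escher godel"))       # 100.0
print(fingerprint_score("bach escher godel hofstadter", "bach godel"))  # 50.0
print(fingerprint_score("bach", "mozart"))                              # 0.0
```

Comparing token sets rather than raw strings means a partial match (here, two of four distinct tokens shared) degrades the score proportionally instead of failing outright.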

Installation and Usage

Install this library using pip:

pip install csv-reconcile csv-reconcile-fingerprint

This is a plugin for the csv-reconcile reconciliation service, so csv-reconcile itself must also be installed. To use this scorer, specify '--scorer fingerprint' when starting the reconciliation service.

Development

To contribute to this library, first check out the code, then create a new virtual environment:

cd csv-reconcile-fingerprint
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

python -m pip install -e '.[test]'

To run the tests:

python -m pytest

Download files

Source Distribution

csv_reconcile_fingerprint-0.1.8.tar.gz (7.2 kB)

Built Distribution

csv_reconcile_fingerprint-0.1.8-py3-none-any.whl (8.1 kB, Python 3)

File details

Hashes for csv_reconcile_fingerprint-0.1.8.tar.gz
Algorithm Hash digest
SHA256 08db829e922344f121cb715b33760e6d39881b80777a34d0c8f4df6c3939618d
MD5 da7755fab991ed01574e4de22f003a9b
BLAKE2b-256 12dac60d12d470c6023356b776d167ac43bb9dc2a84b169fc0253379a839d24e

Provenance

The following attestation bundles were made for csv_reconcile_fingerprint-0.1.8.tar.gz:

Publisher: publish.yml on cutterkom/csv-reconcile-fingerprint

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Hashes for csv_reconcile_fingerprint-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 59b6fad4d7fb4094b38e051e6b2c38964d1456b16837b60f4e3311096fd00d45
MD5 025a3cf84f52cb6cfb5346d1b5259a3a
BLAKE2b-256 e91151d8459b8ccaf6d17fd8d9edbe1e167e445ab4f89a3abbe5dc9b5354128c

Provenance

The following attestation bundles were made for csv_reconcile_fingerprint-0.1.8-py3-none-any.whl:

Publisher: publish.yml on cutterkom/csv-reconcile-fingerprint

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
