comparator
A simple tool for comparing nearly identical names that come from the same source (for example, a list of all owners and officers of a company). It helps cluster together records for the same person whose names differ slightly in spelling or contain typos. It is tuned for Cyrillic names but should work for other scripts as well.
It is heuristic-based and, like any algorithm of this nature, it sometimes makes errors (see the Accuracy section below for details).
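The README does not spell out which heuristics full_compare uses. To illustrate what heuristic name matching of this kind can look like, here is a minimal sketch of one common approach: token-order-insensitive fuzzy matching with Python's difflib. The names naive_compare, tokens and similar are hypothetical, this is not the library's algorithm, and the 0.8 threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher
from typing import List


def tokens(name: str) -> List[str]:
    # Lowercase and split on runs of non-word characters (spaces, commas, dots).
    # \w is Unicode-aware for str patterns, so Cyrillic letters stay inside tokens.
    return [t for t in re.split(r"\W+", name.lower()) if t]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Character-level similarity in [0, 1]; the 0.8 cut-off is an assumption.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def naive_compare(name1: str, name2: str) -> bool:
    """Hypothetical order-insensitive fuzzy match (NOT the library's algorithm)."""
    t1, t2 = tokens(name1), tokens(name2)
    if not t1 or not t2:
        return False
    shorter, longer = sorted((t1, t2), key=len)
    unused = list(longer)
    # Every token of the shorter name must greedily match a distinct token of the
    # longer one, so word order and an extra middle name do not matter.
    for tok in shorter:
        match = next((u for u in unused if similar(tok, u)), None)
        if match is None:
            return False
        unused.remove(match)
    return True
```

A fixed threshold also shows why such heuristics make errors: the surnames Марченко and Панченко score 0.75, just under the cut-off, so lowering it slightly would merge two different people, while raising it too far would split genuine typos.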
Installation
Install from PyPI.
$ pip install names_comparator
Accuracy
You can test your installation against ground-truth data and check the confusion matrix by running:
$ python comparator/__init__.py comparator/data/ground_truth.csv
+--------------------+----------+----------+
|                    | Positive | Negative |
+--------------------+----------+----------+
| Predicted positive | 290      | 9        |
| Predicted negative | 14       | 687      |
+--------------------+----------+----------+
Precision: 0.97
Recall: 0.95
F1 score: 0.96
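The reported figures follow from the table with the standard formulas: the columns are the ground-truth labels and the rows the predictions, giving 290 true positives, 9 false positives and 14 false negatives. A minimal sketch that recomputes them (confusion_metrics is a hypothetical helper, not part of the package):

```python
def confusion_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted matches that are real
    recall = tp / (tp + fn)     # share of real matches that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts taken from the confusion matrix above.
metrics = confusion_metrics(tp=290, fp=9, fn=14)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Equivalently, F1 = 2·290 / (2·290 + 9 + 14) = 580 / 603 ≈ 0.96.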
You can also pass the debug flag (the trailing yes argument) to list every error the algorithm made:
$ python comparator/__init__.py comparator/data/ground_truth.csv yes
Usage
>>> from comparator import full_compare
>>> full_compare("Barack Hussein Obama", "Obama, Barak")
True
>>> full_compare("Петро Мазепа", "Мазепа Петро")
True
>>> full_compare("Марченко Петро Миколайович", "Панченко Петро Миколайович")
False
>>> full_compare("Овдієнко Сергій Костантинович", "Овдієнко Сергій Костянтинович")
True
>>> full_compare("Іванов Михайло Юрійович", "Іванов Юрій Михайлович")
False
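The README presents the tool as a way to cluster records for the same person, but shows only pairwise calls. Here is a minimal sketch of that clustering step, assuming full_compare(a, b) returns a bool as in the examples above. cluster_names is a hypothetical helper, not part of the package. It treats matches as transitive via union-find, which is a design choice of this sketch: two names that fail full_compare can still end up together through a third.

```python
from typing import Callable, Dict, List


def cluster_names(names: List[str],
                  same_person: Callable[[str, str], bool]) -> List[List[str]]:
    """Group names, merging every pair the predicate accepts (hypothetical helper)."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; a match unions the two clusters (transitive).
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if same_person(names[i], names[j]):
                parent[find(i)] = find(j)

    # Collect names by root, keeping clusters in order of first appearance.
    clusters: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())
```

With the package installed, cluster_names(officers, full_compare) would group a hypothetical list of a company's officers. The pairwise loop is quadratic in the number of names, which is acceptable for short lists.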
Source Distribution
File details
Details for the file names_comparator-1.0.0.tar.gz.
File metadata
- Download URL: names_comparator-1.0.0.tar.gz
- Upload date:
- Size: 30.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | b4c036215cae08efd07d110b84e01d75b5815ec0cbadde88fc495904b5b7c9b8
MD5 | ff4c3af4a4c38a588932078663663ffe
BLAKE2b-256 | 7106d29a804841b19961339ca6c2d86ae85208d83f5fe2c93ff612097e4a0b80