
comparator

A simple tool for comparing nearly identical names that come from the same source (for example, a list of all owners and officers of a company). It helps cluster together entries for the same person whose names differ slightly in spelling or contain typos. It is best suited to Cyrillic names but should work with other names as well.

It is heuristic-based and, like any algorithm of this kind, it sometimes makes mistakes (see the Accuracy section below for details).

Installation

Install from PyPI.

$ pip install names_comparator

Accuracy

You can test your installation against the ground-truth data and view the confusion matrix by running:

$ python comparator/__init__.py comparator/data/ground_truth.csv
    +--------------------+----------+----------+
    |                    | Positive | Negative |
    +--------------------+----------+----------+
    | Predicted positive |   290    |    9     |
    | Predicted negative |    14    |   687    |
    +--------------------+----------+----------+
    Precision:  0.97
    Recall:  0.95
    F1 score:  0.96
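The summary figures follow directly from the matrix, reading its columns as the actual classes: 290 true positives, 9 false positives and 14 false negatives (the 687 true negatives do not enter these three scores). A small sketch that recomputes them; the scores helper is illustrative and not part of the package:

```python
def scores(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from binary confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted matches that are real matches
    recall = tp / (tp + fn)     # share of real matches that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Counts taken from the confusion matrix above.
precision, recall, f1 = scores(tp=290, fp=9, fn=14)
print(f"Precision: {precision:.2f}")  # Precision: 0.97
print(f"Recall: {recall:.2f}")        # Recall: 0.95
print(f"F1 score: {f1:.2f}")          # F1 score: 0.96
```

Equivalently, F1 here is 2 * 290 / (2 * 290 + 9 + 14) = 580 / 603.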

You can also pass yes as a second argument (the debug flag) to see every error the algorithm made:

$ python comparator/__init__.py comparator/data/ground_truth.csv yes

Usage

>>> from comparator import full_compare
>>> full_compare("Barack Hussein Obama", "Obama, Barak")
True
>>> full_compare("Петро Мазепа", "Мазепа Петро")
True
>>> full_compare("Марченко Петро Миколайович", "Панченко Петро Миколайович")
False
>>> full_compare("Овдієнко Сергій Костантинович", "Овдієнко Сергій Костянтинович")
True
>>> full_compare("Іванов Михайло Юрійович", "Іванов Юрій Михайлович")
False
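The package compares two names at a time. To group a whole list (for example, all officers of one company), one option is to treat every positive comparison as a link and take connected components. A minimal sketch, assuming a union-find approach; cluster_names is a hypothetical helper, not part of names_comparator, and it takes the comparison function as a parameter so that full_compare can be passed in:

```python
from typing import Callable, Dict, List, Sequence


def cluster_names(
    names: Sequence[str], same_person: Callable[[str, str], bool]
) -> List[List[str]]:
    """Group names into connected components of the pairwise "same person" relation."""
    parent = list(range(len(names)))  # union-find forest: one node per name

    def find(i: int) -> int:
        # Walk up to the root, pointing nodes at their grandparent (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once: n * (n - 1) / 2 calls to same_person.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if same_person(names[i], names[j]):
                parent[find(j)] = find(i)  # merge the two clusters

    # Collect names under their root; clusters and their members keep input order.
    clusters: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


# Toy comparator for illustration only: case-insensitive equality.
print(cluster_names(["Ann", "bob", "ANN", "Bob", "carl"], lambda a, b: a.lower() == b.lower()))
# [['Ann', 'ANN'], ['bob', 'Bob'], ['carl']]
```

With the package installed you would pass its comparator instead, e.g. cluster_names(officers, full_compare) after from comparator import full_compare, where officers is your own list of name strings. Taking connected components means that if A matches B and B matches C, all three end up in one cluster even when A and C would not match directly; whether that is desirable depends on your data. The pairwise loop makes the cost quadratic in the number of names, which is usually acceptable for short lists such as one company's officers.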



Download files


Source Distribution

names_comparator-1.0.0.tar.gz (30.5 kB)

File details

Details for the file names_comparator-1.0.0.tar.gz.

File metadata

  • Download URL: names_comparator-1.0.0.tar.gz
  • Size: 30.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Python-urllib/3.6

File hashes

Hashes for names_comparator-1.0.0.tar.gz
Algorithm      Hash digest
SHA256         b4c036215cae08efd07d110b84e01d75b5815ec0cbadde88fc495904b5b7c9b8
MD5            ff4c3af4a4c38a588932078663663ffe
BLAKE2b-256    7106d29a804841b19961339ca6c2d86ae85208d83f5fe2c93ff612097e4a0b80
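To check a downloaded archive against the published SHA256 digest, here is a sketch using only Python's standard library. The helper names are illustrative, and the commented-out call assumes the archive sits in the current directory:

```python
import hashlib

# Published SHA256 digest for names_comparator-1.0.0.tar.gz (from the table above).
EXPECTED_SHA256 = "b4c036215cae08efd07d110b84e01d75b5815ec0cbadde88fc495904b5b7c9b8"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 digest equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Assumes the archive was downloaded into the current directory:
# print(matches_published("names_comparator-1.0.0.tar.gz"))
```

pip can also enforce this at install time through its hash-checking mode: add a --hash=sha256: option with the digest to the requirement line and install with --require-hashes.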
