names-matcher

Fuzzily biject people's names between two lists.

Let's define an identity as a series of names belonging to the same person. The algorithm is:

  1. Parse, normalize, and split the names in each identity. The result is one set of strings per identity.
  2. Define the similarity between two identities as max(ratio, token_set_ratio), where ratio
    and token_set_ratio are inspired by the string comparison functions in rapidfuzz.
  3. Construct the distance matrix between identities in two specified lists.
  4. Solve the Linear Assignment Problem (LAP) on that matrix.
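
The four steps above can be sketched with the standard library alone. This is a rough illustration, not the package's implementation: `difflib.SequenceMatcher` stands in for the rapidfuzz-inspired `ratio`, `token_set_ratio` is a simplified version, the best-scoring pair of names decides the similarity of two identities (an assumption), and a brute-force search over permutations replaces a real LAP solver. All function names here are hypothetical, and the confidence values will differ from those returned by `NamesMatcher`.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def normalize(identity):
    """Step 1: lowercase every name and keep only alphanumeric tokens."""
    return {" ".join(re.findall(r"[^\W_]+", name.lower())) for name in identity}


def ratio(a, b):
    """Plain character-level similarity in [0, 1] (difflib stand-in)."""
    return SequenceMatcher(None, a, b).ratio()


def token_set_ratio(a, b):
    """Simplified token-set similarity: compare shared and unshared tokens."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    common = " ".join(sorted(ta & tb))
    sa = (common + " " + " ".join(sorted(ta - tb))).strip()
    sb = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(ratio(common, sa), ratio(common, sb), ratio(sa, sb))


def similarity(names_a, names_b):
    """Step 2: best max(ratio, token_set_ratio) over all pairs of names."""
    return max(
        (max(ratio(x, y), token_set_ratio(x, y)) for x in names_a for y in names_b),
        default=0.0,
    )


def match(first, second):
    """Steps 3 and 4: build the distance matrix, then pick the cheapest assignment.

    Brute force is only viable for tiny inputs; a real LAP solver is needed
    for anything larger. Assumes len(first) <= len(second).
    """
    if len(first) > len(second):
        raise ValueError("this sketch needs len(first) <= len(second)")
    a = [normalize(identity) for identity in first]
    b = [normalize(identity) for identity in second]
    dist = [[1.0 - similarity(x, y) for y in b] for x in a]
    best = min(
        permutations(range(len(b)), len(a)),
        key=lambda cols: sum(dist[i][j] for i, j in enumerate(cols)),
    )
    return list(best), [1.0 - dist[i][j] for i, j in enumerate(best)]


first = [["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]]
second = [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]]
mapping, confidence = match(first, second)
print(mapping, confidence)
```

The brute-force step enumerates every possible assignment, so its cost grows factorially with the list size; it is here only to make the objective of the assignment problem explicit.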

Our LAP solver scales to thousands of identities.

Example:

>>> from names_matcher import NamesMatcher
>>> NamesMatcher()([["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]], \
                    [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]])
(array([1, 0], dtype=int32), array([0.75      , 0.57142857]))

The first element of the resulting tuple is the index mapping: it has the same length as the first sequence, and each value is an index into the second sequence. The second element holds the corresponding confidence values, from 0 to 1.
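
As an illustration of how the two arrays line up, the snippet below pairs each identity from the first list with its match in the second. The mapping and confidence values are copied from the example output above as plain lists, so the snippet runs without the package installed. The `describe` helper and the `threshold` of 0.6 are hypothetical; the library does not prescribe a cutoff.

```python
first = [["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]]
second = [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]]

# Values copied from the example output above.
mapping = [1, 0]
confidence = [0.75, 0.57142857]


def describe(first, second, mapping, confidence, threshold=0.6):
    """Return (name, matched name, confidence, accepted) for each first-list identity."""
    rows = []
    for identity, j, score in zip(first, mapping, confidence):
        # mapping[i] is an index into `second`; show the first name of each identity.
        rows.append((identity[0], second[j][0], score, score >= threshold))
    return rows


for row in describe(first, second, mapping, confidence):
    print(row)
```

Note that "Eiso Kant" is never referenced by the mapping: the second list is longer than the first, so one of its identities stays unmatched.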

Installation

pip3 install names-matcher

Command line interface

Given one identity per line in two files, print the matches to standard output:

python3 -m names_matcher path/to/file/1 path/to/file/2

Each identity is several names joined with |, for example:

Vadim Markovtsev|vmarkovtsev|vadim
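
For reference, text in this format maps onto the nested lists used in the Python example. The helper below is a hypothetical sketch of that conversion, not the package's own parser; skipping blank lines and stripping whitespace around names are assumptions.

```python
def parse_identities(text):
    """Turn one-identity-per-line text into a list of lists of names."""
    identities = []
    for line in text.splitlines():
        # Names within an identity are separated by "|".
        names = [name.strip() for name in line.split("|") if name.strip()]
        if names:
            identities.append(names)
    return identities


sample = "Vadim Markovtsev|vmarkovtsev|vadim\nLong, Waren|warenlg\n"
print(parse_identities(sample))
```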

Contributing

Contributions are very welcome and desired! Please follow the code of conduct and read the contribution guidelines.

License

Apache-2.0, see LICENSE.
