names-matcher
Fuzzily biject people's names between two lists.
Let's define an identity as a series of names belonging to the same person. The algorithm is:
- Parse, normalize, and split names in each identity. The result is a set of strings for each identity.
- Define the similarity between identities as max(ratio, token_set_ratio), where ratio and token_set_ratio are inspired by string comparison functions from rapidfuzz.
- Construct the distance matrix between identities in two specified lists.
- Solve the Linear Assignment Problem (LAP) on that matrix.
Our LAP solution scales up to thousands of identities.
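The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: difflib.SequenceMatcher stands in for the rapidfuzz-inspired ratio, token_set_ratio is a simplified version of the idea, and a brute-force search over permutations stands in for a real LAP solver. The helper names normalize, similarity, and match are hypothetical, and the sketch assumes the first list is no longer than the second.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import permutations


def normalize(identity):
    """Turn an identity (a list of names) into a set of lowercase ASCII tokens."""
    tokens = set()
    for name in identity:
        # Strip accents, lowercase, then split on anything that is not a letter or digit.
        plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
        tokens.update(t for t in re.split(r"[^a-z0-9]+", plain.lower()) if t)
    return tokens


def ratio(a, b):
    """Character-level similarity in [0, 1]; a difflib stand-in, not rapidfuzz itself."""
    return SequenceMatcher(None, a, b).ratio()


def token_set_ratio(tokens_a, tokens_b):
    """Simplified token-set similarity: shared tokens are compared against each full set."""
    common = " ".join(sorted(tokens_a & tokens_b))
    combined_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    combined_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    if not common:
        return ratio(combined_a, combined_b)
    return max(ratio(common, combined_a), ratio(common, combined_b),
               ratio(combined_a, combined_b))


def similarity(tokens_a, tokens_b):
    """max(ratio, token_set_ratio) over two token sets."""
    joined_a = " ".join(sorted(tokens_a))
    joined_b = " ".join(sorted(tokens_b))
    return max(ratio(joined_a, joined_b), token_set_ratio(tokens_a, tokens_b))


def match(first, second):
    """Assign each identity in `first` to a distinct identity in `second`."""
    a = [normalize(identity) for identity in first]
    b = [normalize(identity) for identity in second]
    # Distance matrix: rows are identities in `first`, columns are identities in `second`.
    distance = [[1.0 - similarity(x, y) for y in b] for x in a]
    # Brute force instead of a LAP solver: factorial cost, viable only for tiny inputs.
    best = min(permutations(range(len(b)), len(a)),
               key=lambda cols: sum(distance[i][j] for i, j in enumerate(cols)))
    return list(best), [1.0 - distance[i][j] for i, j in enumerate(best)]


mapping, confidences = match(
    [["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]],
    [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]],
)
print(mapping, confidences)
```

Because the similarity functions here are stand-ins, the confidence values will differ from those the package reports. The brute-force assignment also shows why a dedicated LAP solver matters: enumerating permutations becomes infeasible long before a list reaches thousands of identities.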
Example:
>>> from names_matcher import NamesMatcher
>>> NamesMatcher()([["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]],
...                [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]])
(array([1, 0], dtype=int32), array([0.75 , 0.57142857]))
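The first element of the returned tuple is the mapping: an array of the same length as the first sequence, where each value is an index into the second sequence. The second element holds the corresponding confidence values, from 0 to 1. In the example above, the first identity maps to index 1 (VMarkovtsev) and the second to index 0 (Warren).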
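To make the return value concrete, the following sketch pairs each identity from the first list with its match in the second. It does not call the package: the mapping and confidence values are copied from the example output above into plain lists, and pair_up with its threshold parameter is a hypothetical helper, not part of names-matcher.

```python
first = [["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]]
second = [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]]

# Copied from the example output; plain lists stand in for the NumPy arrays.
matches = [1, 0]
confidences = [0.75, 0.57142857]


def pair_up(first, second, matches, confidences, threshold=0.0):
    """Return (name_from_first, name_from_second, confidence) for each kept match."""
    pairs = []
    for i, (j, conf) in enumerate(zip(matches, confidences)):
        # matches[i] is the index in `second` assigned to first[i].
        if conf >= threshold:
            pairs.append((first[i][0], second[j][0], conf))
    return pairs


pairs = pair_up(first, second, matches, confidences)
for left, right, conf in pairs:
    print(f"{left} -> {right} (confidence {conf:.2f})")
# Vadim Markovtsev -> VMarkovtsev (confidence 0.75)
# Long, Waren -> Warren (confidence 0.57)
```

Filtering on a confidence threshold is one possible way to discard weak matches; a suitable cutoff depends on the data.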
Installation
pip3 install names-matcher
Command line interface
Given one identity per line in two files, print the matches to standard output:
python3 -m names_matcher path/to/file/1 path/to/file/2
Each identity is several names joined with |, for example:
Vadim Markovtsev|vmarkovtsev|vadim
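For readers preparing such files, this is a rough sketch of how one line per identity maps onto the nested lists used in the Python example. The function parse_identities is hypothetical; how the actual command line tool handles blank lines or surrounding whitespace is an assumption here.

```python
def parse_identities(lines):
    """Split each non-empty line into its |-separated names."""
    identities = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no identity
        names = [name.strip() for name in line.split("|") if name.strip()]
        identities.append(names)
    return identities


sample = ["Vadim Markovtsev|vmarkovtsev|vadim\n", "Eiso Kant\n"]
parsed = parse_identities(sample)
print(parsed)
# [['Vadim Markovtsev', 'vmarkovtsev', 'vadim'], ['Eiso Kant']]
```

With a real file, the same function could be applied to an open file object, since iterating over it yields lines.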
Contributing
Contributions are very welcome and desired! Please follow the code of conduct and read the contribution guidelines.
License
Apache-2.0, see LICENSE.