A super-fast canonical name lookup service
juditha
A super-fast lookup service for canonical names based on tantivy.
juditha aims to solve the problem of noisy, garbage results that occurs when working with Named Entity Recognition (NER). Given large lists of known names, such as company registries or lists of persons of interest, NER results can be canonized against this service to check whether they are known.
The implementation uses a pre-populated tantivy index. The data is either FollowTheMoney entities or simply a list of names.
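tantivy is a full-text search engine library, and at the core of such an engine is an inverted index that maps terms to the documents containing them. The toy sketch below illustrates that idea for a list of names in pure Python. It is not juditha's code and does not use tantivy; `tokenize` and `NameIndex` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    # Lowercase and keep runs of word characters as tokens.
    return re.findall(r"\w+", text.lower())


class NameIndex:
    """Toy inverted index (hypothetical): token -> set of names containing it."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def add(self, name: str) -> None:
        for token in tokenize(name):
            self.postings[token].add(name)

    def candidates(self, query: str) -> set:
        # Union of all names that share at least one token with the query.
        found = set()
        for token in tokenize(query):
            found |= self.postings.get(token, set())
        return found


index = NameIndex()
for name in ["Jane Doe", "Alice"]:
    index.add(name)

print(index.candidates("doe, jane"))  # {'Jane Doe'}
```

A real engine like tantivy adds persistent on-disk storage, relevance scoring and fuzzy term queries on top of this basic structure.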
quickstart
pip install juditha
populate
printf "Jane Doe\nAlice\n" | juditha load-names
lookup
juditha lookup "jane doe"
"Jane Doe"
For fuzzier matching, lower the threshold (default: 0.97):
juditha lookup "doe, jane" --threshold 0.5
"Jane Doe"
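The threshold acts as a minimum similarity score between the query and a known name. The sketch below only illustrates that idea using Python's `difflib.SequenceMatcher`; it is not juditha's scoring method, and `normalize` and `fuzzy_lookup` are hypothetical helpers. With this particular scorer, the two lookups above happen to behave the same way.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional


def normalize(name: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def fuzzy_lookup(query: str, names: List[str], threshold: float = 0.97) -> Optional[str]:
    """Hypothetical: return the best-scoring known name if its score >= threshold."""
    best_name, best_score = None, 0.0
    q = normalize(query)
    for name in names:
        score = SequenceMatcher(None, q, normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


names = ["Jane Doe", "Alice"]
print(fuzzy_lookup("jane doe", names))                  # Jane Doe
print(fuzzy_lookup("doe, jane", names))                 # None
print(fuzzy_lookup("doe, jane", names, threshold=0.5))  # Jane Doe
```

Lowering the threshold trades precision for recall: more noisy NER output will be matched to some known name.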
data import
from ftm entities
cat entities.ftm.json | juditha load-entities
juditha build
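For reference, the sketch below writes a tiny `entities.ftm.json` file, assuming the usual line-delimited FollowTheMoney JSON format (one entity per line, each with an `id`, a `schema` and a `properties` mapping whose values are lists of strings). Which properties juditha actually indexes is not stated here; the hypothetical `iter_names` helper only reads the `name` property.

```python
import json

# Two minimal FollowTheMoney entities (ids are illustrative).
entities = [
    {"id": "jane-doe", "schema": "Person", "properties": {"name": ["Jane Doe"]}},
    {"id": "acme", "schema": "Company",
     "properties": {"name": ["ACME Inc."], "alias": ["ACME"]}},
]

# Write them as line-delimited JSON, one entity per line.
with open("entities.ftm.json", "w", encoding="utf-8") as fh:
    for entity in entities:
        fh.write(json.dumps(entity) + "\n")


def iter_names(path: str):
    """Hypothetical helper: yield the `name` values of each entity in the file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                entity = json.loads(line)
                yield from entity.get("properties", {}).get("name", [])


print(list(iter_names("entities.ftm.json")))  # ['Jane Doe', 'ACME Inc.']
```

The resulting file can then be piped into `juditha load-entities` as shown above.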
from anywhere
juditha load-names -i s3://my_bucket/names.txt
juditha load-entities -i https://data.ftm.store/eu_authorities/entities.ftm.json
juditha build
a complete dataset or catalog
Following the nomenklatura specification, a dataset JSON config needs a names.txt or entities.ftm.json file among its resources.
juditha load-dataset https://data.ftm.store/eu_authorities/index.json
juditha load-catalog https://data.ftm.store/investigraph/catalog.json
juditha build
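As an illustration only, a minimal dataset config with such resources might look like the one generated below. The dataset name, title and example.org URLs are placeholders, and `usable_resources` is a hypothetical helper that assumes resources are recognized by their `name`; consult the nomenklatura specification for the authoritative schema.

```python
import json

# Hypothetical dataset config in the style of a nomenklatura dataset index.
dataset = {
    "name": "my_dataset",
    "title": "My dataset",
    "resources": [
        {"name": "entities.ftm.json",
         "url": "https://example.org/my_dataset/entities.ftm.json"},
        {"name": "names.txt",
         "url": "https://example.org/my_dataset/names.txt"},
        {"name": "README.md",
         "url": "https://example.org/my_dataset/README.md"},
    ],
}

with open("index.json", "w", encoding="utf-8") as fh:
    json.dump(dataset, fh, indent=2)


def usable_resources(config: dict) -> list:
    # Keep only resources whose file name matches what the README says is needed.
    wanted = {"names.txt", "entities.ftm.json"}
    return [r["url"] for r in config.get("resources", []) if r.get("name") in wanted]


print(usable_resources(dataset))
```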
use in python applications
from juditha import lookup
assert lookup("jane doe") == "Jane Doe"
assert lookup("doe, jane") is None
assert lookup("doe, jane", threshold=0.5) == "Jane Doe"
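A typical use is filtering raw NER output down to known, canonical names. The sketch below takes the lookup function as a parameter so it stays self-contained: in an application you would pass `juditha.lookup` (or `functools.partial(lookup, threshold=0.5)` for fuzzier matching), while the demo uses a hypothetical dictionary-based stand-in. `canonize_mentions` is likewise a hypothetical helper, not part of juditha.

```python
from typing import Callable, Dict, Iterable, Optional


def canonize_mentions(
    mentions: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each NER mention to its canonical name, dropping unknown mentions."""
    result: Dict[str, str] = {}
    for mention in mentions:
        canonical = lookup(mention)
        if canonical is not None:
            result[mention] = canonical
    return result


# Stand-in for juditha.lookup, for demonstration only.
KNOWN = {"jane doe": "Jane Doe"}


def fake_lookup(name: str) -> Optional[str]:
    return KNOWN.get(name.lower().strip())


ner_results = ["jane doe", "Mr.", "the Board"]
print(canonize_mentions(ner_results, fake_lookup))  # {'jane doe': 'Jane Doe'}
```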
the name
Juditha Dommer was the daughter of a coppersmith and raised seven children, while her husband Johann Pachelbel wrote a canon.
Versioning
To signal compatibility with followthemoney, juditha follows the same major version, currently 4.x.x.
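If an application depends on both packages, a runtime check of this policy could look like the sketch below. It uses the standard library's `importlib.metadata`; `major` and `check_compatible` are hypothetical helpers, and only the leading version component is compared.

```python
from importlib.metadata import PackageNotFoundError, version


def major(v: str) -> int:
    # "4.3.0" -> 4
    return int(v.split(".")[0])


def check_compatible() -> bool:
    """Return True if installed juditha and followthemoney share a major version."""
    try:
        return major(version("juditha")) == major(version("followthemoney"))
    except PackageNotFoundError:
        # One of the packages is not installed.
        return False
```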
License and Copyright
juditha, (C) 2024 investigativedata.io
juditha, (C) 2025 Data and Research Center – DARC
juditha is licensed under the AGPLv3 or later license.
Download files
File details
Details for the file juditha-4.3.0.tar.gz.
File metadata
- Download URL: juditha-4.3.0.tar.gz
- Size: 28.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.13.5 Linux/6.12.63+deb13-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f812b69e57edaa60908fdf0bc02711a17a1d4a8300f669b82bd7974f28052ea |
| MD5 | 1bfa90a1c9a35f18a7f2631024b4aeca |
| BLAKE2b-256 | c29655b4e07e6e79e2322112dfe71e441b206268a00c7b83cbaf70c62355904b |
File details
Details for the file juditha-4.3.0-py3-none-any.whl.
File metadata
- Download URL: juditha-4.3.0-py3-none-any.whl
- Size: 31.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.13.5 Linux/6.12.63+deb13-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e86021ac6f96e58bb3f7dda07c039418b7e09e594ae97516bfd1c584fbc6199 |
| MD5 | 216913f66fec32e5d06243d7723c5b68 |
| BLAKE2b-256 | 41d9e78f930441c2a0d242642a277dac268e641d46751101ab961e6370354a06 |