A super-fast canonical name lookup service

juditha

A super-fast lookup service for canonical names based on tantivy.

juditha aims to reduce the noise and garbage that Named Entity Recognition (NER) produces. Given large lists of known names, such as company registries or lists of persons of interest, NER results can be canonicalized against this service to check whether they are known.

The implementation uses a pre-populated tantivy index. The data is either FollowTheMoney entities or simply a list of names.

quickstart

pip install juditha

populate

printf "Jane Doe\nAlice\n" | juditha load-names

lookup

juditha lookup "jane doe"
"Jane Doe"

For fuzzier matching, lower the threshold (default 0.97):

juditha lookup "doe, jane" --threshold 0.5
"Jane Doe"
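
To illustrate what a threshold on a similarity score does, here is a minimal sketch. It is not juditha's implementation: the scoring below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, and `normalize`, `best_match`, and the "score >= threshold" rule are assumptions made for illustration only. juditha's tantivy-based scoring will produce different numbers.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Casefold, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(cleaned.split())


def best_match(
    query: str, names: Iterable[str], threshold: float = 0.97
) -> Optional[str]:
    """Return the best-scoring known name, or None if it scores below the threshold.

    Hypothetical helper: the similarity measure is a stand-in, not juditha's.
    """
    q = normalize(query)
    best_name, best_score = None, 0.0
    for name in names:
        score = SequenceMatcher(None, q, normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With a strict threshold only near-exact spellings survive; lowering it also admits reordered or partially matching names, at the cost of more false positives.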

data import

from ftm entities

cat entities.ftm.json | juditha load-entities
juditha build

from anywhere

juditha load-names -i s3://my_bucket/names.txt
juditha load-entities -i https://data.ftm.store/eu_authorities/entities.ftm.json
juditha build
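
If the names come from your own pipeline, a small script can prepare the input file. The sketch below assumes the names format is plain text with one name per line (as the quickstart input suggests); `write_names_file` is a hypothetical helper, not part of juditha.

```python
from pathlib import Path
from typing import Iterable


def write_names_file(names: Iterable[str], path: str) -> int:
    """Write unique, non-empty names to `path`, one per line; return the count."""
    seen = set()
    unique = []
    for raw in names:
        name = " ".join(raw.split())  # trim and collapse inner whitespace
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    Path(path).write_text("".join(f"{n}\n" for n in unique), encoding="utf-8")
    return len(unique)
```

The resulting file can then be piped into the loader, for example `cat names.txt | juditha load-names`, followed by `juditha build`.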

a complete dataset or catalog

Following the nomenklatura specification, a dataset JSON config must list names.txt or entities.ftm.json among its resources.

juditha load-dataset https://data.ftm.store/eu_authorities/index.json
juditha load-catalog https://data.ftm.store/investigraph/catalog.json
juditha build
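
Before loading a dataset, it can help to check that its config actually exposes one of the expected resources. The sketch below assumes a nomenklatura-style layout in which each entry under `resources` has a `name` and a `url`; `find_loadable_resource` and the example config are hypothetical, and the order in which juditha itself picks resources is not specified here.

```python
from typing import Any, Mapping, Optional

# Resource names a dataset config needs for juditha, per the text above.
WANTED = ("names.txt", "entities.ftm.json")


def find_loadable_resource(dataset: Mapping[str, Any]) -> Optional[str]:
    """Return the URL of the first resource named names.txt or entities.ftm.json."""
    for resource in dataset.get("resources", []):
        if resource.get("name") in WANTED:
            return resource.get("url")
    return None


# Hypothetical dataset config; names and URLs are placeholders.
example = {
    "name": "my_dataset",
    "title": "My dataset",
    "resources": [
        {"name": "readme.md", "url": "https://example.org/my_dataset/readme.md"},
        {
            "name": "entities.ftm.json",
            "url": "https://example.org/my_dataset/entities.ftm.json",
        },
    ],
}
```

A return value of `None` means the config lists neither resource, so there would be nothing for `juditha load-dataset` to pick up.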

use in python applications

from juditha import lookup

assert lookup("jane doe") == "Jane Doe"
assert lookup("doe, jane") is None
assert lookup("doe, jane", threshold=0.5) == "Jane Doe"
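
A typical use is filtering NER output so that only known names survive. The sketch below relies only on the `lookup(name, threshold=...)` call shown above; `canonize` is a hypothetical helper, and the lookup function is passed in as an argument so the code can be tried without a populated index.

```python
from typing import Callable, Dict, Iterable, Optional

# Any callable with the shape of juditha's lookup(name, threshold=...).
LookupFn = Callable[..., Optional[str]]


def canonize(
    names: Iterable[str], lookup_fn: LookupFn, threshold: float = 0.97
) -> Dict[str, str]:
    """Map each extracted name to its canonical form, dropping unknown names."""
    known: Dict[str, str] = {}
    for name in names:
        canonical = lookup_fn(name, threshold=threshold)
        if canonical is not None:
            known[name] = canonical
    return known
```

With juditha installed and an index built, this would be called as `canonize(ner_names, lookup)` after `from juditha import lookup`, where `ner_names` is whatever list of strings your NER step produced.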

the name

Juditha Dommer was the daughter of a coppersmith and raised seven children, while her husband Johann Pachelbel wrote a canon.

Versioning

To signal compatibility with followthemoney, juditha follows the same major version, currently 4.x.x.

License and Copyright

juditha, (C) 2024 investigativedata.io

juditha, (C) 2025 Data and Research Center – DARC

juditha is licensed under the AGPLv3 or later license.

See NOTICE and LICENSE.
