A library to generate entity fingerprints.
fingerprints
UPDATE 2025-05: the next generation of the fingerprints codebase is now included in rigour. See the documentation here: https://opensanctions.github.io/rigour/names/. This library is now UNMAINTAINED.
This library helps with the generation of fingerprints for entity data. A fingerprint in this context is a simplified entity identifier, derived from its name or address and used for cross-referencing entities across different datasets.
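To make the cross-referencing idea concrete, here is a minimal sketch of grouping records from two datasets by a shared key. The names `simple_key` and `cross_reference`, and the sample datasets, are hypothetical and not part of this library; `simple_key` is a crude stand-in that only lowercases and sorts tokens.

```python
from collections import defaultdict


def simple_key(name):
    # Stand-in key function (an assumption for this sketch):
    # lowercase the name and sort its whitespace-separated tokens.
    return " ".join(sorted(name.lower().split()))


def cross_reference(datasets, key_func=simple_key):
    """Group (dataset, name) pairs by key and keep keys seen in 2+ datasets."""
    groups = defaultdict(list)
    for dataset_name, names in datasets.items():
        for name in names:
            key = key_func(name)
            if key:  # skip names that reduce to nothing
                groups[key].append((dataset_name, name))
    # Only keys that occur in more than one dataset are cross-references.
    return {
        key: records
        for key, records in groups.items()
        if len({dataset for dataset, _ in records}) > 1
    }


# Made-up sample data for illustration.
datasets = {
    "registry": ["Sherlock Holmes", "John Watson"],
    "watchlist": ["HOLMES Sherlock", "Irene Adler"],
}
matches = cross_reference(datasets)
```

With the library installed, `fingerprints.generate` could presumably be passed as `key_func`, since it maps a name string to a fingerprint string.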
Usage
import fingerprints
fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'
fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'
fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
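The three results above suggest what kind of normalization is involved: case folding, punctuation removal, dropping honorifics, shortening legal forms, and sorting de-duplicated tokens. The following is a toy sketch of that idea, not the library's actual implementation; `toy_fingerprint`, `TITLES`, and `LEGAL_FORMS` are hypothetical names with deliberately tiny lookup tables.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables (not the library's data).
TITLES = {"mr", "mrs", "ms", "dr"}
LEGAL_FORMS = {"aktiengesellschaft": "ag"}


def toy_fingerprint(name):
    """Return a simplified, order-independent key for a name, or None."""
    # Normalize Unicode and case so visually equal strings compare equal.
    text = unicodedata.normalize("NFKC", name).casefold()
    # Replace runs of non-word characters (punctuation, spaces) with a space.
    text = re.sub(r"\W+", " ", text)
    tokens = []
    for token in text.split():
        if token in TITLES:
            continue  # honorifics carry no identifying information
        # Swap a known legal form for its short form, else keep the token.
        tokens.append(LEGAL_FORMS.get(token, token))
    # De-duplicate and sort so word order and repetition do not matter.
    return " ".join(sorted(set(tokens))) or None
```

This sketch happens to reproduce the three example outputs shown above, but the real library handles many more cases (transliteration, multi-word legal forms, and so on) that are out of scope here.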
Company type names
A significant part of what fingerprints does is to recognize company legal form names. For example, fingerprints will be able to simplify Общество с ограниченной ответственностью to ООО, or Aktiengesellschaft to AG. The required database is based on two different sources:
- A Google Spreadsheet created by OCCRP.
- The ISO 20275: Entity Legal Forms Code List
Wikipedia also maintains an index of types of business entity.
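As an illustration of legal form simplification, the sketch below replaces long legal form names with their abbreviations using a lookup table. It assumes a two-entry sample mapping taken from the examples above; `SAMPLE_FORMS` and `simplify_legal_form` are hypothetical names, and the real database is far larger and not structured this way necessarily.

```python
import re

# Hypothetical sample mapping built from the two examples in the text.
SAMPLE_FORMS = {
    "Общество с ограниченной ответственностью": "ООО",
    "Aktiengesellschaft": "AG",
}

# Try longer phrases first so a multi-word form is matched as a whole.
_ALTERNATIVES = "|".join(
    re.escape(form) for form in sorted(SAMPLE_FORMS, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(" + _ALTERNATIVES + r")\b", re.IGNORECASE)

# Case-folded lookup so matches in any letter case find their short form.
_LOOKUP = {form.casefold(): short for form, short in SAMPLE_FORMS.items()}


def simplify_legal_form(name):
    """Replace known long legal form names in `name` with their short form."""

    def replace(match):
        # Fall back to the matched text if the folded form is not found.
        return _LOOKUP.get(match.group(1).casefold(), match.group(0))

    return _PATTERN.sub(replace, name)
```

For instance, `simplify_legal_form("Siemens Aktiengesellschaft")` yields `"Siemens AG"`, while names without a known legal form pass through unchanged.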
See also
- Clustering in Depth, part of the OpenRefine documentation discussing how to create collisions in data clustering.
- probablepeople, parser for western names made by the brilliant folks at datamade.us.
- The study Developing a Legal Form Classification and Extraction Approach For Company Entity Matching by Kruse et al. (2021) investigates four approaches for identifying and classifying legal forms in company names.
- List of Legal Forms from AnaCredit dataset by ECB (one of the Annexes).
- Transformer-based Entity Legal Form Classification by Arimond et al. (2023).
File details
Details for the file fingerprints-1.3.1-py3-none-any.whl.
File metadata
- Download URL: fingerprints-1.3.1-py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb246a3e2730689a494f1239a8418e8df98419bb4c2bfed25925f2c624b523c0 |
| MD5 | 7ce1cde4d5f70175f766fe55c8d4ba61 |
| BLAKE2b-256 | ff298998d631aae6b9940af9248fbb9d269491785055d6f1763f224af0e6cc02 |
Provenance
The following attestation bundles were made for fingerprints-1.3.1-py3-none-any.whl:
Publisher: build.yml on opensanctions/fingerprints
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fingerprints-1.3.1-py3-none-any.whl
- Subject digest: eb246a3e2730689a494f1239a8418e8df98419bb4c2bfed25925f2c624b523c0
- Sigstore transparency entry: 339427119
- Sigstore integration time:
- Permalink: opensanctions/fingerprints@4e0126d8380a21b59736700b3c2b87e6f7a7ece2
- Branch / Tag: refs/tags/1.3.1
- Owner: https://github.com/opensanctions
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@4e0126d8380a21b59736700b3c2b87e6f7a7ece2
- Trigger Event: push