Make record linkages in followthemoney data.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

opensanctions pudo

These details have not been verified by PyPI

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.11
- Python :: 3.12

Project description

nomenklatura

Nomenklatura de-duplicates and integrates different Follow the Money entities. It serves to clean up messy data and to find links between different datasets.

Usage

You can install nomenklatura via PyPI:

$ pip install nomenklatura

Command-line usage

Much of the functionality of nomenklatura can be used as a command-line tool. In the following example, we'll assume that you have a file containing Follow the Money entities in your local directory, named entities.ijson. If you just want to try it out, you can use the file tests/fixtures/donations.ijson in this repository (it contains German campaign finance data).

With the file in place, you will cross-reference the entities to generate de-duplication candidates, then run the interactive de-duplication UI in your console, and eventually apply the judgements to generate a new file with merged entities:

# generate merge candidates using an in-memory index:
$ nomenklatura xref entities.ijson
# note there is now a sqlite database, `nomenklatura.db` that contains de-duplication info.
$ nomenklatura dedupe entities.ijson
# will pop up a user interface.
$ nomenklatura apply entities.ijson -o merged.ijson
# de-duplicated data goes into `merged.ijson`:
$ cat entities.ijson | wc -l 
474
$ cat merged.ijson | wc -l 
468
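
The before/after line counts above can also be checked from Python. The following is a minimal sketch, assuming the .ijson file holds one JSON-encoded entity per line and that each entity carries a "schema" key; the helper names iter_entities and summarise are hypothetical and not part of nomenklatura:

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_entities(path: Path) -> Iterator[dict]:
    """Yield one entity dict per non-empty line of a line-delimited JSON file."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarise(path: Path) -> tuple[int, Counter]:
    """Return the number of entities and a tally of entities per schema."""
    schemata: Counter = Counter()
    total = 0
    for entity in iter_entities(path):
        total += 1
        # Assumption: entities without a "schema" key are counted as "unknown".
        schemata[entity.get("schema", "unknown")] += 1
    return total, schemata
```

Calling summarise(Path("entities.ijson")) and summarise(Path("merged.ijson")) gives the two totals to compare, along with a per-schema breakdown that wc -l does not provide.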

The resolver graph database location can be customised by setting the environment variable NOMENKLATURA_DB_URL.
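
A script that wraps the command-line tool may want to know which database location is in effect. This is a minimal sketch of reading the variable with a fallback; the fallback URL and the function name resolver_db_url are assumptions made for illustration, so check the package settings for the default it actually uses:

```python
import os
from typing import Mapping, Optional

# Assumption: illustrative fallback only, modelled on the `nomenklatura.db`
# SQLite file mentioned above. It is not taken from the package source.
FALLBACK_DB_URL = "sqlite:///nomenklatura.db"


def resolver_db_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return NOMENKLATURA_DB_URL if it is set and non-empty, else the fallback."""
    env = os.environ if environ is None else environ
    value = env.get("NOMENKLATURA_DB_URL", "").strip()
    return value or FALLBACK_DB_URL
```

Passing a plain dict instead of os.environ keeps the helper easy to test without touching the real environment.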

Programmatic usage

The command-line use of nomenklatura is targeted at small datasets which need to be de-duplicated. For more involved scenarios, the package also offers a Python API which can be used to control the semantics of de-duplication.

- nomenklatura.Dataset - implements a basic dataset for describing a set of entities.
- nomenklatura.Store - a general purpose access mechanism for entities. By default, a store is used to access entity data stored in files as an in-memory cache, but the store can be subclassed to work with entities from a database system.
- nomenklatura.blocker.Index - a cross-reference blocker for correlating entities inside of a dataset, or across different datasets.
- nomenklatura.Resolver - the core of the de-duplication process. The resolver is essentially a graph with edges made out of entity judgements. It can be used to store judgements or get the canonical ID for a given entity.

All of the API classes have extensive type annotations, which should make their integration in any modern Python API simpler.

Design

This package offers an implementation of a data deduplication framework centered around the FtM data model. The idea is the following workflow:

- Accept FtM-shaped entities from a given source (e.g. a JSON file, or a database)
- Build an inverted index of the entities for dedupe blocking
- Generate merge candidates using the blocking index and FtM compare
- Provide a SQL persistence abstraction for merge challenges and decisions
- Provide a text-based user interface to let users make merge decisions
- Export consolidated entities that cluster source entity data
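
The blocking and candidate-generation steps of this workflow can be illustrated with a toy example. This is a minimal sketch of the general inverted-index technique, not the implementation in nomenklatura.blocker.Index: the tokenizer, the shared-token score and all function names are assumptions, and the real package scores candidates with FtM compare rather than a token count.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(name: str) -> set[str]:
    """Lower-case a name and split it into word tokens longer than two characters."""
    return {tok for tok in re.findall(r"\w+", name.lower()) if len(tok) > 2}


def build_index(names: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of entity IDs whose name contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for entity_id, name in names.items():
        for token in tokenize(name):
            index[token].add(entity_id)
    return dict(index)


def candidate_pairs(index: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Count shared tokens for every pair of entities that share at least one."""
    scores: dict[tuple[str, str], int] = defaultdict(int)
    for ids in index.values():
        # Only entities in the same posting list are ever compared, which is
        # what keeps blocking cheaper than comparing all pairs.
        for left, right in combinations(sorted(ids), 2):
            scores[(left, right)] += 1
    return dict(scores)


names = {"a": "Angela Merkel", "b": "Merkel, Angela", "c": "Olaf Scholz"}
pairs = candidate_pairs(build_index(names))
```

Here pairs contains only ("a", "b"), with a score of 2. Entity "c" shares no token with the others, so it is never compared against them.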

The Enrichment framework enables linking entities to records in other data sources, and enriching them with information from those records.

Later on, the following might be added:

A web application to let users make merge decisions on the web

Resolver graph

The key implementation detail of nomenklatura is the Resolver, a graph structure that manages user decisions regarding entity identity. Edges are Judgements of whether two entity IDs are the same, not the same, or undecided. The resolver implements an algorithm for computing connected components, which can then be used to find the best available ID for a cluster of entities. It can also be used to evaluate transitive judgements, e.g. if A <> B, and B = C, then we don't need to ask if A = C.
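
The transitive reasoning described here can be sketched with a small union-find structure. This is an illustration of the idea only and not the library's Resolver: the class names Verdict and ToyResolver are hypothetical, and picking the lexicographically smallest ID as canonical is an assumption standing in for "the best available ID".

```python
from enum import Enum


class Verdict(Enum):
    POSITIVE = "positive"    # the two IDs are the same entity
    NEGATIVE = "negative"    # the two IDs are different entities
    UNDECIDED = "undecided"  # nothing is known yet


class ToyResolver:
    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self._negative: list[tuple[str, str]] = []

    def _find(self, node: str) -> str:
        """Return the root of the connected component that contains node."""
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk directly at the root.
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def decide(self, left: str, right: str, verdict: Verdict) -> None:
        """Record a judgement as an edge between two entity IDs."""
        root_left, root_right = self._find(left), self._find(right)
        if verdict is Verdict.POSITIVE and root_left != root_right:
            self._parent[root_right] = root_left
        elif verdict is Verdict.NEGATIVE:
            self._negative.append((left, right))

    def cluster(self, node: str) -> set[str]:
        """Return all IDs connected to node by positive judgements."""
        root = self._find(node)
        return {n for n in list(self._parent) if self._find(n) == root}

    def canonical_id(self, node: str) -> str:
        # Assumption: smallest ID wins; the real package has its own rule.
        return min(self.cluster(node))

    def judgement(self, left: str, right: str) -> Verdict:
        """Infer the verdict for a pair, including transitive consequences."""
        if self._find(left) == self._find(right):
            return Verdict.POSITIVE
        lefts, rights = self.cluster(left), self.cluster(right)
        for a, b in self._negative:
            if (a in lefts and b in rights) or (a in rights and b in lefts):
                return Verdict.NEGATIVE
        return Verdict.UNDECIDED


resolver = ToyResolver()
resolver.decide("A", "B", Verdict.NEGATIVE)  # A <> B
resolver.decide("B", "C", Verdict.POSITIVE)  # B = C
```

With these two decisions recorded, resolver.judgement("A", "C") comes back as Verdict.NEGATIVE without anyone being asked about that pair, and resolver.canonical_id("C") is "B".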

Reading

Contact, contributions etc.

This codebase is licensed under the terms of an MIT license (see LICENSE).

We welcome contributions, bug fixes and feature suggestions; please use the GitHub issue tracker for this repository.

Nomenklatura is currently developed thanks to a Prototypefund grant for OpenSanctions. Previous iterations of the package were developed with support from Knight-Mozilla OpenNews and the Open Knowledge Foundation Labs.

Release history

This version

4.7.0

Feb 26, 2026

4.6.9

Feb 25, 2026

4.6.8

Feb 23, 2026

4.6.7

Feb 17, 2026

4.6.6

Feb 13, 2026

4.6.5

Feb 12, 2026

4.6.4

Feb 11, 2026

4.6.3

Feb 9, 2026

4.6.2

Feb 4, 2026

4.6.1

Jan 28, 2026

4.6.0

Jan 22, 2026

4.5.3

Jan 9, 2026

4.5.2

Jan 6, 2026

4.5.1

Jan 6, 2026

4.5.0

Jan 5, 2026

4.4.3

Dec 23, 2025

4.4.2

Dec 23, 2025

4.4.1

Dec 19, 2025

4.4.0

Dec 19, 2025

4.3.2

Dec 9, 2025

4.3.1

Nov 20, 2025

4.3.0

Nov 17, 2025

4.2.0

Nov 11, 2025

4.1.10

Oct 31, 2025

4.1.9

Oct 1, 2025

4.1.8

Sep 23, 2025

4.1.7

Sep 15, 2025

4.1.6

Sep 14, 2025

4.1.5

Sep 13, 2025

4.1.4

Sep 8, 2025

4.1.3

Sep 8, 2025

4.1.2

Aug 26, 2025

4.1.1

Aug 5, 2025

4.1.0

Jul 27, 2025

4.0.4

Jul 21, 2025

4.0.3

Jul 21, 2025

4.0.2

Jul 21, 2025

4.0.1

Jul 17, 2025

4.0.0

Jul 16, 2025

3.17.5

Jul 11, 2025

3.17.4

Jun 17, 2025

3.17.3

Apr 24, 2025

3.17.2

Apr 2, 2025

3.17.1

Mar 25, 2025

3.17.0

Mar 24, 2025

3.16.3

Mar 13, 2025

3.16.2

Mar 11, 2025

3.16.1

Mar 9, 2025

3.15.2

Jan 26, 2025

3.15.1

Jan 25, 2025

3.15.0

Jan 25, 2025

3.14.2

Jan 24, 2025

3.14.1

Jan 24, 2025

3.14.0

Nov 26, 2024

3.13.2

Nov 18, 2024

3.13.1

Sep 9, 2024

3.13.0

Aug 2, 2024

3.12.5

Jul 3, 2024

3.12.4

Jul 2, 2024

3.12.3

Jun 21, 2024

3.12.1

Jun 16, 2024

3.12.0

Jun 13, 2024

3.11.5

Jun 7, 2024

3.11.4

Jun 4, 2024

3.11.3

May 28, 2024

3.11.2

May 26, 2024

3.11.1

May 20, 2024

3.10.6

Mar 19, 2024

3.10.5

Feb 27, 2024

3.10.4

Feb 5, 2024

3.10.3

Feb 2, 2024

3.10.2

Jan 30, 2024

3.10.1

Jan 30, 2024

3.9.3

Jan 16, 2024

3.9.2

Jan 14, 2024

3.9.1

Jan 4, 2024

3.9.0

Dec 1, 2023

3.8.5

Nov 16, 2023

3.8.4

Nov 16, 2023

3.8.3

Nov 15, 2023

3.8.2

Nov 15, 2023

3.8.0

Nov 15, 2023

3.7.0

Nov 2, 2023

3.6.9

Oct 7, 2023

3.6.8

Oct 6, 2023

3.6.7

Oct 6, 2023

3.6.6

Oct 6, 2023

3.6.5

Oct 5, 2023

3.6.4

Oct 5, 2023

3.6.3

Oct 4, 2023

3.6.2

Oct 2, 2023

3.6.1

Oct 2, 2023

3.6.0

Sep 29, 2023

3.5.2

Sep 22, 2023

3.5.1

Sep 16, 2023

3.5.0

Sep 16, 2023

3.4.2

Sep 14, 2023

3.4.1

Sep 14, 2023

3.4.0

Aug 23, 2023

3.3.9

Aug 14, 2023

3.3.8

Aug 7, 2023

3.3.7

Aug 3, 2023

3.3.6

Jul 27, 2023

3.3.5

Jul 26, 2023

3.3.4

Jul 21, 2023

3.3.3

Jul 19, 2023

3.3.2

Jul 18, 2023

3.3.1

Jul 15, 2023

3.3.0

Jul 15, 2023

3.2.2

Jul 13, 2023

3.2.1

Jul 11, 2023

3.2.0

Jul 7, 2023

3.1.0

Jul 3, 2023

3.0.3

Jun 29, 2023

3.0.1

Jun 28, 2023

3.0.0

Jun 26, 2023

2.14.1

Jun 18, 2023

2.14.0

Jun 4, 2023

2.13.2

May 30, 2023

2.13.0

May 30, 2023

2.12.0

May 27, 2023

2.11.0

May 6, 2023

2.10.1

May 4, 2023

2.10.0

May 3, 2023

2.9.5

Apr 25, 2023

2.9.4

Apr 19, 2023

2.9.3

Apr 15, 2023

2.9.2

Apr 13, 2023

2.9.1

Mar 28, 2023

2.9.0

Mar 28, 2023

2.8.2

Mar 23, 2023

2.8.1

Mar 4, 2023

2.8.0

Feb 10, 2023

2.7.7

Dec 19, 2022

2.7.6

Dec 19, 2022

2.7.5

Dec 5, 2022

2.7.4

Nov 30, 2022

2.7.3

Nov 28, 2022

2.7.2

Nov 28, 2022

2.7.1

Nov 28, 2022

2.7.0

Nov 28, 2022

2.6.6

Nov 22, 2022

2.6.5

Nov 22, 2022

2.6.4

Nov 8, 2022

2.6.3

Nov 7, 2022

2.6.1

Nov 4, 2022

2.6.0

Nov 1, 2022

2.5.9

Oct 12, 2022

2.5.8

Oct 12, 2022

2.5.7

Sep 6, 2022

2.5.6

Aug 24, 2022

2.5.5

Jul 27, 2022

2.5.4

Jul 27, 2022

2.5.3

Jul 26, 2022

2.5.2

Jul 22, 2022

2.5.1

Jul 6, 2022

2.5.0

Jun 29, 2022

2.4.7

Jun 9, 2022

2.4.6

Jun 8, 2022

2.4.5

Jun 8, 2022

2.4.4

May 31, 2022

2.4.3

May 30, 2022

2.4.2

May 30, 2022

2.4.1

May 29, 2022

2.4.0

May 26, 2022

2.3.1

May 24, 2022

2.3.0

May 17, 2022

2.2.8

May 10, 2022

2.2.7

Apr 20, 2022

2.2.6

Apr 19, 2022

2.2.5

Apr 16, 2022

2.2.4

Apr 13, 2022

2.2.3

Apr 12, 2022

2.2.2

Apr 3, 2022

2.2.1

Mar 29, 2022

2.1.1

Feb 3, 2022

2.1.0

Jan 16, 2022

2.0.1

Jan 13, 2022

2.0.0

Jan 9, 2022

1.4.5

Jan 2, 2022

1.4.4

Dec 27, 2021

1.4.3

Dec 13, 2021

1.4.2

Dec 13, 2021

1.4.1

Nov 8, 2021

1.4.0

Nov 8, 2021

1.3.0

Nov 3, 2021

1.2.5

Oct 29, 2021

1.2.4

Oct 26, 2021

1.2.3

Oct 26, 2021

1.2.2

Oct 21, 2021

1.2.1

Sep 30, 2021

1.2.0

Sep 27, 2021

1.1.2

Sep 25, 2021

1.1.0

Sep 24, 2021

1.0.3

Sep 18, 2021

1.0.1

Sep 16, 2021

1.0.0

Sep 16, 2021

0.1.0

Sep 12, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

nomenklatura-4.7.0-py3-none-any.whl (146.2 kB)

Uploaded Feb 26, 2026 Python 3

File details

Details for the file nomenklatura-4.7.0-py3-none-any.whl.

File metadata

Download URL: nomenklatura-4.7.0-py3-none-any.whl
Upload date: Feb 26, 2026
Size: 146.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nomenklatura-4.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d50d4f05786c25d0519a8d0a0671763dd4d0f0bd647498841ea366038feefe66`
MD5	`2dad26d478c95b6bcf5493b82579b5b0`
BLAKE2b-256	`574e70190701050ee6e4afc31d112345be4f668b2f65612f5135835138a855ce`

Provenance

The following attestation bundles were made for nomenklatura-4.7.0-py3-none-any.whl:

Publisher: build.yml on opensanctions/nomenklatura

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nomenklatura-4.7.0-py3-none-any.whl
- Subject digest: d50d4f05786c25d0519a8d0a0671763dd4d0f0bd647498841ea366038feefe66
- Sigstore transparency entry: 1000170751
- Sigstore integration time: Feb 26, 2026
Source repository:
- Permalink: opensanctions/nomenklatura@f105b92a5981c777ef5ba59954bf4ac05ba13572
- Branch / Tag: refs/tags/4.7.0
- Owner: https://github.com/opensanctions
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@f105b92a5981c777ef5ba59954bf4ac05ba13572
- Trigger Event: push

nomenklatura 4.7.0
