denselinkage
Record linkage with dense blocking using text embeddings and LLM matching.
Status: beta. The dependency-free core is implemented and runs on numpy + pandas alone: link/dedupe/match_pairs, connected-components clustering, and the linkage, blocking, and clustering metrics. The heavy extras (FAISS, sentence-transformers, LangChain) are experimental in this release: their adapters are declared but raise NotImplementedError.
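To illustrate what connected-components clustering means for linkage output, here is a minimal union-find sketch in plain Python. It is not denselinkage's implementation; the function name `cluster_pairs` and the sample ids are hypothetical. Any two records joined by a chain of matched pairs end up in the same cluster.

```python
from collections import defaultdict


def cluster_pairs(pairs):
    """Group record ids so that ids linked by any chain of matches share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving keeps lookups cheap on long chains.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two components of every matched pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their final root.
    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Hypothetical matched pairs: A1-B1 and B1-C1 chain into one cluster.
matched = [("A1", "B1"), ("B1", "C1"), ("A2", "B2")]
print(cluster_pairs(matched))
```

Transitive grouping is the point of this step: A1 and C1 are never compared directly, yet they land in one cluster because both match B1.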
Usage
from denselinkage import DenseLinker, Source, TemplateSerializer
from denselinkage.core.results import LabeledPairs
from denselinkage.metrics import linkage_metrics

# df_a and df_b are assumed to be pandas DataFrames loaded elsewhere.
linker = DenseLinker.with_defaults()  # picks a sensible embedder/index/matcher
left = Source(df_a, id_column="id_a", serializer=TemplateSerializer("Name: {name}, City: {city}"))
right = Source(df_b, id_column="id_b", serializer=TemplateSerializer(
    "Name: {name}, City: {city}", column_mapping={"company_name": "name", "headquarters": "city"}))
result = linker.link(left, right)  # one call, no fit/predict, no mutation
metrics = linkage_metrics(result, gold=LabeledPairs.from_pairs([("A1", "B1")]))
result.to_frame()  # left_id, right_id, match, confidence, reason, similarity
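The README does not spell out what linkage_metrics computes. As a rough guide, linkage quality is conventionally scored as pairwise precision, recall, and F1 against labeled pairs. The sketch below shows that arithmetic under this assumption; `pairwise_metrics` and the sample pairs are hypothetical and are not part of the package.

```python
def pairwise_metrics(predicted, gold):
    """Score predicted (left_id, right_id) matches against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: one correct match, one wrong match, one missed gold pair.
scores = pairwise_metrics(
    predicted=[("A1", "B1"), ("A2", "B9")],
    gold=[("A1", "B1"), ("A3", "B3")],
)
print(scores)
```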
Deduplicate a single dataset with linker.dedupe(src), or build an index once and reuse it with idx = linker.index(left); idx.query(right). See examples/: 00_quickstart.py is the shortest path, and 01_end_to_end_linkage.py shows full component control.
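For readers new to dense blocking, the idea behind indexing one source and querying it with another can be sketched as a brute-force nearest-neighbour search: each query record keeps only its top-k most similar indexed records as candidate pairs. This is an illustration in plain Python, not the package's index. The vectors are made up, `top_k_candidates` is a hypothetical name, and a real pipeline would obtain the vectors from an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(index_vectors, query_vectors, k=2):
    """For each query id, return the k indexed ids with the highest cosine similarity."""
    candidates = {}
    for query_id, query_vec in query_vectors.items():
        scored = [(cosine(query_vec, vec), index_id) for index_id, vec in index_vectors.items()]
        # Highest similarity first; ties broken by id for deterministic output.
        scored.sort(key=lambda item: (-item[0], item[1]))
        candidates[query_id] = [(index_id, score) for score, index_id in scored[:k]]
    return candidates


# Made-up 2-dimensional "embeddings" keyed by record id.
left_vectors = {"A1": [1.0, 0.0], "A2": [0.0, 1.0], "A3": [1.0, 1.0]}
right_vectors = {"B1": [0.9, 0.1], "B2": [0.1, 0.9]}

for query_id, hits in top_k_candidates(left_vectors, right_vectors, k=2).items():
    print(query_id, [index_id for index_id, _ in hits])
```

Blocking only narrows the candidate set; a matcher still has to decide which of the surviving pairs are true matches. The exhaustive scan here costs one comparison per left/right pair, which is the cost an approximate vector index is meant to avoid on large inputs.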
Install
pip install denselinkage # core (numpy, pandas)
pip install "denselinkage[faiss]" # + FAISS vector index
pip install "denselinkage[sentence-transformers]"
pip install "denselinkage[langchain]" # + LLM matcher
pip install "denselinkage[all]"
The [faiss], [sentence-transformers], and [langchain] extras are reserved but experimental this release: their adapters raise NotImplementedError. The dependency-free core runs without them.
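Because the experimental adapters raise NotImplementedError, calling code may want to fall back to a working component instead of crashing. The sketch below shows one generic way to do that. The factory functions are hypothetical stand-ins, not denselinkage APIs, since the adapter constructors are not documented here.

```python
def build_first_available(factories):
    """Return (name, component) from the first factory that does not raise NotImplementedError."""
    skipped = []
    for name, factory in factories:
        try:
            return name, factory()
        except NotImplementedError:
            skipped.append(name)
    raise RuntimeError(f"no usable component; skipped: {', '.join(skipped)}")


# Hypothetical stand-ins: the first mimics an unimplemented adapter,
# the second mimics a component from the dependency-free core.
def experimental_index():
    raise NotImplementedError("adapter not implemented in this release")


def core_index():
    return {"kind": "brute-force"}


name, index = build_first_available([
    ("experimental", experimental_index),
    ("core", core_index),
])
print(name, index)
```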
Development
Requires uv.
uv sync --dev
uv run ruff check . && uv run ruff format --check . && uv run mypy && uv run pytest
See CONTRIBUTING.md for details. CI runs lint, format, strict mypy, and tests on Python 3.10–3.13.
Changelog
See CHANGELOG.md.
License
MIT © 2026 Alvaro
File details
Details for the file denselinkage-1.0.0b1.tar.gz.
File metadata
- Download URL: denselinkage-1.0.0b1.tar.gz
- Upload date:
- Size: 26.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8564b523a180e68cd0d50a199f06374044dada7338cef4c14dcfb9fb3809f16 |
| MD5 | ea5d0a9fdaa0b7d5870f8a721f4a7547 |
| BLAKE2b-256 | 5fdd92d03f5157d06b064216eca86083edc46dcce64fa30d7e0089bd865c1e12 |
Provenance
The following attestation bundles were made for denselinkage-1.0.0b1.tar.gz:
Publisher: release.yml on caalvaro/denselinkage
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: denselinkage-1.0.0b1.tar.gz
  - Subject digest: e8564b523a180e68cd0d50a199f06374044dada7338cef4c14dcfb9fb3809f16
- Sigstore transparency entry: 1735808025
- Sigstore integration time:
- Permalink: caalvaro/denselinkage@659cecd949fb3a61848e14f2fbc26f33ef2c0220
- Branch / Tag: refs/tags/v1.0.0b1
- Owner: https://github.com/caalvaro
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@659cecd949fb3a61848e14f2fbc26f33ef2c0220
- Trigger Event: push
File details
Details for the file denselinkage-1.0.0b1-py3-none-any.whl.
File metadata
- Download URL: denselinkage-1.0.0b1-py3-none-any.whl
- Upload date:
- Size: 46.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8114a4afc3d2fdf45547c23d0f63bc7c803c7f16fe9d0bbe548be3635200ef0 |
| MD5 | 0f1110dc8afde6bea27eee41e6f56ab1 |
| BLAKE2b-256 | 457d9670d486c30724a39536611f46ad297d5755c3f5e5542a58eff179f7bfa0 |
Provenance
The following attestation bundles were made for denselinkage-1.0.0b1-py3-none-any.whl:
Publisher: release.yml on caalvaro/denselinkage
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: denselinkage-1.0.0b1-py3-none-any.whl
  - Subject digest: a8114a4afc3d2fdf45547c23d0f63bc7c803c7f16fe9d0bbe548be3635200ef0
- Sigstore transparency entry: 1735808154
- Sigstore integration time:
- Permalink: caalvaro/denselinkage@659cecd949fb3a61848e14f2fbc26f33ef2c0220
- Branch / Tag: refs/tags/v1.0.0b1
- Owner: https://github.com/caalvaro
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@659cecd949fb3a61848e14f2fbc26f33ef2c0220
- Trigger Event: push