
Record linkage with dense blocking using text embeddings and LLM matching


denselinkage



Status: beta. The lightweight core is implemented and runs on numpy + pandas alone: link / dedupe / match_pairs, connected-components clustering, and the linkage / blocking / clustering metrics. The heavy extras (FAISS, sentence-transformers, LangChain) are experimental in this release: their adapters are declared but raise NotImplementedError.
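
To illustrate what connected-components clustering does with pairwise matches, here is a minimal union-find sketch in plain Python. It is not denselinkage's implementation; the function name `cluster_pairs` and the pair format are assumptions made for illustration only.

```python
from collections import defaultdict


def cluster_pairs(pairs):
    """Group record ids so that ids joined by any chain of matches share a cluster.

    Hypothetical helper for illustration; not part of denselinkage.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


matched = [("A1", "B1"), ("B1", "C7"), ("A2", "B9")]
print(cluster_pairs(matched))  # [['A1', 'B1', 'C7'], ['A2', 'B9']]
```

Because A1 matches B1 and B1 matches C7, all three end up in one cluster even though A1 and C7 were never compared directly.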

Usage

from denselinkage import DenseLinker, Source, TemplateSerializer
from denselinkage.core.results import LabeledPairs
from denselinkage.metrics import linkage_metrics

# df_a and df_b are assumed to be pandas DataFrames loaded elsewhere.
linker = DenseLinker.with_defaults()  # picks a sensible embedder/index/matcher
left  = Source(df_a, id_column="id_a", serializer=TemplateSerializer("Name: {name}, City: {city}"))
right = Source(df_b, id_column="id_b", serializer=TemplateSerializer(
    "Name: {name}, City: {city}", column_mapping={"company_name": "name", "headquarters": "city"}))

result  = linker.link(left, right)               # one call, no fit/predict, no mutation
metrics = linkage_metrics(result, gold=LabeledPairs.from_pairs([("A1", "B1")]))
result.to_frame()  # left_id, right_id, match, confidence, reason, similarity
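
For readers unfamiliar with pairwise linkage metrics, the sketch below computes precision, recall, and F1 by comparing predicted match pairs against gold pairs. It is independent of denselinkage: `pairwise_scores` is a hypothetical helper, and it does not claim to reproduce the output of `linkage_metrics`.

```python
def pairwise_scores(predicted, gold):
    """Precision, recall and F1 over (left_id, right_id) pairs.

    Hypothetical helper for illustration; not part of denselinkage.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = pairwise_scores(
    predicted=[("A1", "B1"), ("A2", "B5")],
    gold=[("A1", "B1"), ("A3", "B3")],
)
print(scores)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```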

Deduplicate one dataset with linker.dedupe(src); reuse an index with idx = linker.index(left); idx.query(right). examples/00_quickstart.py is the shortest path; 01_end_to_end_linkage.py shows full component control.
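
The idea behind dense blocking and index reuse can be sketched without the library: serialize each record with a template, embed the text, and keep the embedded side around so several query batches can be scored against it. Everything below is an assumption for illustration: `embed` is a toy hashed-trigram stand-in for a real text embedder, and `ToyIndex` is a hypothetical class, not denselinkage's index.

```python
import math
import zlib

DIM = 1024  # toy embedding size, chosen arbitrarily


def embed(text):
    """Toy stand-in for a text embedder: hashed character trigrams, L2-normalised."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyIndex:
    """Embeds one side once so several query batches can reuse the vectors."""

    def __init__(self, records, template):
        self.ids = list(records)
        self.template = template
        self.vectors = [embed(template.format(**records[i])) for i in self.ids]

    def query(self, records, k=1):
        """Return {query_id: [(indexed_id, cosine_similarity), ...]}, top k each."""
        out = {}
        for qid, fields in records.items():
            q = embed(self.template.format(**fields))
            # Vectors are unit length, so the dot product is the cosine similarity.
            scored = [
                (iid, sum(a * b for a, b in zip(q, v)))
                for iid, v in zip(self.ids, self.vectors)
            ]
            scored.sort(key=lambda pair: pair[1], reverse=True)
            out[qid] = scored[:k]
        return out


template = "Name: {name}, City: {city}"
left = {
    "A1": {"name": "Acme Corporation", "city": "Berlin"},
    "A2": {"name": "Globex Ltd", "city": "Madrid"},
}
right = {"B1": {"name": "ACME Corp.", "city": "Berlin"}}

index = ToyIndex(left, template)       # embed the left side once
candidates = index.query(right, k=1)   # reuse it for any number of query batches
print(candidates["B1"][0][0])
```

Only the top-k candidates per query record would then be passed on to a more expensive matcher, which is the point of blocking.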

Install

pip install denselinkage                       # core (numpy, pandas)
pip install "denselinkage[faiss]"              # + FAISS vector index
pip install "denselinkage[sentence-transformers]"
pip install "denselinkage[langchain]"          # + LLM matcher
pip install "denselinkage[all]"

The [faiss], [sentence-transformers], and [langchain] extras are reserved but experimental in this release: their adapters raise NotImplementedError. The core needs only numpy and pandas and runs without them.
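
If it helps to know which optional backends are present in an environment, a standard-library check like the following works without importing them. This is a generic sketch, not a denselinkage API; `extra_available` is a hypothetical helper, and the names passed to it are the usual import names of the packages rather than the pip extra names.

```python
import importlib.util


def extra_available(module_name):
    """Return True if the given top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Import names (not pip extra names) of the optional backends.
for module in ("faiss", "sentence_transformers", "langchain"):
    state = "installed" if extra_available(module) else "not installed"
    print(f"{module}: {state}")
```

Note that, per the status note, having a backend installed does not make its adapter usable in this release.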

Development

Requires uv.

uv sync --dev
uv run ruff check . && uv run ruff format --check . && uv run mypy && uv run pytest

See CONTRIBUTING.md for details. CI runs lint, format, strict mypy, and tests on Python 3.10–3.13.

Changelog

See CHANGELOG.md.

License

MIT © 2026 Alvaro

Download files


Source Distribution

denselinkage-1.0.0b1.tar.gz (26.8 kB)

Uploaded Source

Built Distribution


denselinkage-1.0.0b1-py3-none-any.whl (46.9 kB)

Uploaded Python 3

File details

Details for the file denselinkage-1.0.0b1.tar.gz.

File metadata

  • Download URL: denselinkage-1.0.0b1.tar.gz
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for denselinkage-1.0.0b1.tar.gz
Algorithm Hash digest
SHA256 e8564b523a180e68cd0d50a199f06374044dada7338cef4c14dcfb9fb3809f16
MD5 ea5d0a9fdaa0b7d5870f8a721f4a7547
BLAKE2b-256 5fdd92d03f5157d06b064216eca86083edc46dcce64fa30d7e0089bd865c1e12


Provenance

The following attestation bundles were made for denselinkage-1.0.0b1.tar.gz:

Publisher: release.yml on caalvaro/denselinkage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file denselinkage-1.0.0b1-py3-none-any.whl.

File metadata

  • Download URL: denselinkage-1.0.0b1-py3-none-any.whl
  • Size: 46.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for denselinkage-1.0.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 a8114a4afc3d2fdf45547c23d0f63bc7c803c7f16fe9d0bbe548be3635200ef0
MD5 0f1110dc8afde6bea27eee41e6f56ab1
BLAKE2b-256 457d9670d486c30724a39536611f46ad297d5755c3f5e5542a58eff179f7bfa0


Provenance

The following attestation bundles were made for denselinkage-1.0.0b1-py3-none-any.whl:

Publisher: release.yml on caalvaro/denselinkage

