Record linkage with dense blocking using text embeddings and LLM matching

Project description

denselinkage

Status: beta. The dependency-free core is implemented and working: link, dedupe, and match_pairs; connected-components clustering; and the linkage, blocking, and clustering metrics, all built on numpy and pandas. The heavy extras (FAISS, sentence-transformers, LangChain) are experimental in this release: their adapters are declared but raise NotImplementedError.
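Connected-components clustering turns a set of matched pairs into entity clusters: two records end up together if any chain of matches links them. The following is a minimal sketch of that idea in plain Python (union-find), plus pair-level precision, recall, and F1 computed over the pairs a clustering implies. `connected_components` and `pairwise_metrics` are hypothetical helpers written for illustration, not denselinkage's implementation or API.

```python
from itertools import combinations


def connected_components(ids, pairs):
    """Group ids so that any chain of matched pairs lands in one cluster.

    Union-find with path halving; returns clusters as sorted lists.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


def pairwise_metrics(clusters, gold_pairs):
    """Precision/recall/F1 over the unordered pairs implied by the clusters."""
    predicted = {frozenset(p) for c in clusters for p in combinations(c, 2)}
    gold = {frozenset(p) for p in gold_pairs}
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


ids = ["r1", "r2", "r3", "r4", "r5"]
matches = [("r1", "r2"), ("r2", "r3")]  # r1-r3 is implied transitively
clusters = connected_components(ids, matches)
print(clusters)  # [['r1', 'r2', 'r3'], ['r4'], ['r5']]
print(pairwise_metrics(clusters, gold_pairs=[("r1", "r2"), ("r4", "r5")]))
```

Transitivity is the design trade-off here: it repairs matches the pairwise matcher missed (r1-r3 above), but a single false match can also merge two unrelated clusters.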

Usage

```python
import pandas as pd

from denselinkage import DenseLinker, Source, TemplateSerializer
from denselinkage.core.results import LabeledPairs
from denselinkage.metrics import linkage_metrics

# Toy inputs (hypothetical data): the two sources use different column names.
df_a = pd.DataFrame({"id_a": ["A1"], "name": ["Acme Corp"], "city": ["Berlin"]})
df_b = pd.DataFrame({"id_b": ["B1"], "company_name": ["ACME Corporation"], "headquarters": ["Berlin"]})

linker = DenseLinker.with_defaults()  # picks a sensible embedder/index/matcher
left  = Source(df_a, id_column="id_a", serializer=TemplateSerializer("Name: {name}, City: {city}"))
right = Source(df_b, id_column="id_b", serializer=TemplateSerializer(
    "Name: {name}, City: {city}", column_mapping={"company_name": "name", "headquarters": "city"}))

result  = linker.link(left, right)               # one call, no fit/predict, no mutation
metrics = linkage_metrics(result, gold=LabeledPairs.from_pairs([("A1", "B1")]))
result.to_frame()  # left_id, right_id, match, confidence, reason, similarity
```
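The column_mapping argument is what lets two sources with different schemas render through the same template. Below is a minimal sketch of that idea using pandas and str.format, assuming (as the usage above suggests) that the mapping goes from source column to template field. `serialize` is a hypothetical helper, not TemplateSerializer's actual implementation.

```python
import pandas as pd


def serialize(df, template, column_mapping=None):
    """Render each row of df as text by filling a str.format template.

    column_mapping renames source columns to the template's field names,
    so sources with different schemas yield comparable strings.
    Columns the template does not use (e.g. the id) are ignored by format().
    """
    renamed = df.rename(columns=column_mapping or {})
    return [template.format(**row) for row in renamed.to_dict(orient="records")]


template = "Name: {name}, City: {city}"
df_a = pd.DataFrame({"id_a": ["A1"], "name": ["Acme Corp"], "city": ["Berlin"]})
df_b = pd.DataFrame({"id_b": ["B1"], "company_name": ["Acme GmbH"], "headquarters": ["Berlin"]})

print(serialize(df_a, template))
# ['Name: Acme Corp, City: Berlin']
print(serialize(df_b, template, {"company_name": "name", "headquarters": "city"}))
# ['Name: Acme GmbH, City: Berlin']
```

This sketch does no missing-value handling: a NaN cell renders as the text "nan", which a real serializer would likely want to drop or replace before embedding.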

Deduplicate a single dataset with linker.dedupe(src); reuse an index with idx = linker.index(left) followed by idx.query(right). See examples/00_quickstart.py for the shortest path and 01_end_to_end_linkage.py for full control over each component.
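Building an index once over one side and querying it repeatedly is the core of dense blocking: each query record retrieves only its k most similar candidates instead of being compared with every record. The sketch below shows this with an exact (brute-force) cosine index in NumPy, plus two standard blocking metrics, pair completeness (share of true matches kept as candidates) and reduction ratio (share of the full cross product pruned away). `BruteForceIndex` and `blocking_metrics` are hypothetical names; denselinkage's own index and blocking metrics may be defined differently, and the toy vectors stand in for real text embeddings.

```python
import numpy as np


class BruteForceIndex:
    """Exact cosine-similarity index over L2-normalized embeddings.

    Normalization happens once at build time; each query batch is a
    single matrix product against the stored vectors.
    """

    def __init__(self, ids, embeddings):
        self.ids = list(ids)
        self.vectors = self._normalize(np.asarray(embeddings, dtype=np.float64))

    @staticmethod
    def _normalize(x):
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.clip(norms, 1e-12, None)  # guard against zero vectors

    def query(self, query_ids, query_embeddings, k=2):
        """Return (indexed_id, query_id, cosine) candidates, top-k per query."""
        q = self._normalize(np.asarray(query_embeddings, dtype=np.float64))
        sims = q @ self.vectors.T  # shape (n_queries, n_indexed)
        k = min(k, sims.shape[1])
        top = np.argsort(-sims, axis=1)[:, :k]  # best matches first
        return [
            (self.ids[j], qid, float(sims[row, j]))
            for row, qid in enumerate(query_ids)
            for j in top[row]
        ]


def blocking_metrics(candidates, gold_pairs, n_left, n_right):
    """Pair completeness and reduction ratio for (left_id, right_id) pairs."""
    cand = {(a, b) for a, b, _ in candidates}
    gold = set(gold_pairs)
    pair_completeness = len(cand & gold) / len(gold) if gold else 0.0
    reduction_ratio = 1.0 - len(cand) / (n_left * n_right)
    return pair_completeness, reduction_ratio


# Build once over the left side (toy 3-d "embeddings") ...
idx = BruteForceIndex(["A1", "A2", "A3"], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
# ... then query with the right side, keeping one candidate per record.
cands = idx.query(["B1", "B2"], [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]], k=1)
pc, rr = blocking_metrics(cands, [("A1", "B1"), ("A3", "B2")], n_left=3, n_right=2)
print(cands)
print(pc, rr)
```

The brute-force version is exact but costs O(n_left * n_right) similarity computations; at scale, np.argpartition avoids the full sort, and an approximate nearest-neighbour index (such as FAISS) replaces the dense matrix product at some cost in recall.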

Install

```shell
pip install denselinkage                           # core (numpy, pandas)
pip install "denselinkage[faiss]"                  # + FAISS vector index
pip install "denselinkage[sentence-transformers]"  # + sentence-transformers embedder
pip install "denselinkage[langchain]"              # + LLM matcher
pip install "denselinkage[all]"                    # all extras
```

The [faiss], [sentence-transformers], and [langchain] extras are declared but experimental in this release: their adapters raise NotImplementedError. The dependency-free core runs without them.

Development

Requires uv.

```shell
uv sync --dev
uv run ruff check . && uv run ruff format --check . && uv run mypy && uv run pytest
```

See CONTRIBUTING.md for details. CI runs lint, format, strict mypy, and tests on Python 3.10–3.13.

Changelog

See CHANGELOG.md.

License

MIT © 2026 Alvaro


Download files

Source Distribution

denselinkage-1.0.0b2.tar.gz (32.9 kB)

Uploaded Source

Built Distribution

denselinkage-1.0.0b2-py3-none-any.whl (56.1 kB)

Uploaded Python 3

File details

Details for the file denselinkage-1.0.0b2.tar.gz.

File metadata

  • Download URL: denselinkage-1.0.0b2.tar.gz
  • Upload date:
  • Size: 32.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for denselinkage-1.0.0b2.tar.gz
  • SHA256: 16b0d718bd9dabac027ab36c8867172f70f1277f2bd0c9c1ea0ff3b8dbda9c45
  • MD5: 555afe84756efc53dd1f2b1ecb0f097f
  • BLAKE2b-256: 4cc29ffc249954100559017a575f4ea241e5539003cb073382e4b164f0af8e50

Provenance

The following attestation bundles were made for denselinkage-1.0.0b2.tar.gz:

Publisher: release.yml on caalvaro/denselinkage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file denselinkage-1.0.0b2-py3-none-any.whl.

File metadata

  • Download URL: denselinkage-1.0.0b2-py3-none-any.whl
  • Upload date:
  • Size: 56.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for denselinkage-1.0.0b2-py3-none-any.whl
  • SHA256: f584cc1edd54b2ae903569106ecb92305151ef53a75f1278e0e93d6a8caf8e6b
  • MD5: b4597eb9e806fb4b702a6947800b9710
  • BLAKE2b-256: 9e776f7fbff7dfdca856ecc4f9d0b3c8d32b1817d18c6602b795294051e0c97c

Provenance

The following attestation bundles were made for denselinkage-1.0.0b2-py3-none-any.whl:

Publisher: release.yml on caalvaro/denselinkage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
