kanoniv

Identity resolution as code. Define matching rules in YAML, reconcile locally in Python.

Installation

pip install kanoniv

Quick Start

import kanoniv

# 1. Load your spec
spec = kanoniv.Spec.from_file("kanoniv.yml")

# 2. Validate it
validation = kanoniv.validate(spec)
validation.raise_on_error()

# 3. Load sources
sources = [
    kanoniv.Source.from_csv("crm", "data/crm_contacts.csv", primary_key="id"),
    kanoniv.Source.from_csv("billing", "data/billing_accounts.csv", primary_key="id"),
]

# 4. Reconcile
result = kanoniv.reconcile(sources, spec)

# 5. Golden records as a DataFrame
df = result.to_pandas()
print(f"{result.cluster_count} entities, {result.merge_rate:.0%} merge rate")

Every record in the output DataFrame gets a kanoniv_id — a stable identifier that groups duplicate records across sources into a single entity.
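
To make the role of kanoniv_id concrete, here is a minimal sketch of grouping output rows by that column. It does not call kanoniv: the rows are hand-written stand-ins for what `df.to_dict("records")` might return, and every column name and value other than `kanoniv_id` itself (including the `ent_1` style identifiers) is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the output of df.to_dict("records").
# Only the kanoniv_id column name comes from the text above; the other
# columns and all values are illustrative assumptions.
rows = [
    {"kanoniv_id": "ent_1", "source": "crm", "id": "c-17", "email": "john@acme.com"},
    {"kanoniv_id": "ent_1", "source": "billing", "id": "b-03", "email": "JOHN@ACME.COM"},
    {"kanoniv_id": "ent_2", "source": "crm", "id": "c-42", "email": "mia@example.org"},
]


def group_by_entity(rows):
    """Map each kanoniv_id to the (source, id) pairs grouped under it."""
    entities = defaultdict(list)
    for row in rows:
        entities[row["kanoniv_id"]].append((row["source"], row["id"]))
    return dict(entities)


entities = group_by_entity(rows)
for kanoniv_id, members in entities.items():
    print(kanoniv_id, members)
```

Rows that share a kanoniv_id are treated as the same real-world entity, so a CRM contact and a billing account can be joined on that column without comparing their original keys.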

What the Spec Covers

The YAML spec is the single source of truth for your identity resolution pipeline:

  • Sources — canonical field mappings from each system
  • Blocking — composite keys to reduce O(n²) comparisons
  • Scoring — Fellegi-Sunter probabilistic matching with EM training
  • Normalizers — email, phone, name, nickname, domain (built-in)
  • Survivorship — golden record assembly rules (source priority, most complete)
  • Governance — freshness checks, schema validation, shadow-mode deploys

See the spec reference for the full schema.
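
As a rough illustration of the blocking and scoring ideas listed above, the following standalone sketch pairs up only records that share a composite key, then scores each pair with Fellegi-Sunter match weights. It is not kanoniv's implementation or its spec format: the field names, the blocking key, and the m/u probabilities are all assumptions, and the probabilities are hand-picked rather than learned with EM.

```python
import math
from collections import defaultdict
from itertools import combinations

# Illustrative records; field names are assumptions, not kanoniv's schema.
records = [
    {"id": "crm:1", "email": "john@acme.com", "last_name": "smith", "zip": "94107"},
    {"id": "bill:9", "email": "john@acme.com", "last_name": "smith", "zip": "94107"},
    {"id": "crm:2", "email": "j.smith@other.io", "last_name": "smith", "zip": "94107"},
    {"id": "crm:3", "email": "mia@example.org", "last_name": "lopez", "zip": "10001"},
]


def block_key(rec):
    """Composite blocking key: last-name prefix plus ZIP code (hypothetical choice)."""
    return (rec["last_name"][:3], rec["zip"])


def candidate_pairs(records):
    """Yield only pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for members in blocks.values():
        yield from combinations(members, 2)


# m = P(field agrees | same entity), u = P(field agrees | different entities).
# Hand-picked values for illustration; EM training would estimate them from data.
PARAMS = {"email": (0.95, 0.01), "last_name": (0.90, 0.10)}


def match_weight(a, b):
    """Sum the log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


pairs = list(candidate_pairs(records))
for a, b in pairs:
    print(a["id"], b["id"], round(match_weight(a, b), 2))
```

With these four records, blocking yields 3 candidate pairs instead of the 6 an all-pairs comparison would need. A pair that agrees on both email and last name gets a higher weight than one that agrees on last name only, and a threshold on that weight would decide which pairs merge.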

Source Adapters

# Pandas DataFrame
source = kanoniv.Source.from_pandas("crm", df, primary_key="contact_id")

# CSV file
source = kanoniv.Source.from_csv("billing", "data/billing.csv", primary_key="account_id")

# Warehouse table (requires sqlalchemy)
source = kanoniv.Source.from_warehouse(
    "erp", table="raw.erp_customers", connection_string="postgresql://..."
)

# dbt model (requires sqlalchemy)
source = kanoniv.Source.from_dbt("staging", model="stg_customers")

Validation & Planning

# Validate spec for errors
result = kanoniv.validate(spec)
if not result:
    print(result.errors)

# Preview the execution plan
plan = kanoniv.plan(spec)
print(plan.summary())

Diffing Specs

# Compare two spec versions
diff = kanoniv.diff(spec_v1, spec_v2)
print(diff.summary)

Cloud API (Optional)

For managed reconciliation, monitoring, and collaboration, install with the cloud extra:

pip install "kanoniv[cloud]"

import kanoniv

client = kanoniv.Client(api_key="kn_...")

result = client.resolve(system="crm", external_id="003xxx")
entities = client.entities.search(q="john@acme.com")

See the cloud API docs for the full reference.

License

Apache-2.0
