Identity resolution as code

kanoniv

Identity resolution as code. Define matching rules in YAML, reconcile locally in Python.

Installation

pip install kanoniv

Quick Start

import kanoniv

# 1. Load your spec
spec = kanoniv.Spec.from_file("kanoniv.yml")

# 2. Validate it
result = kanoniv.validate(spec)
result.raise_on_error()

# 3. Load sources
sources = [
    kanoniv.Source.from_csv("crm", "data/crm_contacts.csv", primary_key="id"),
    kanoniv.Source.from_csv("billing", "data/billing_accounts.csv", primary_key="id"),
]

# 4. Reconcile
result = kanoniv.reconcile(sources, spec)

# 5. Golden records as a DataFrame
df = result.to_pandas()
print(f"{result.cluster_count} entities, {result.merge_rate:.0%} merge rate")

Every record in the output DataFrame gets a kanoniv_id — a stable identifier that groups duplicate records across sources into a single entity.
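
To make the grouping concrete, here is a minimal sketch of how rows that share a kanoniv_id can be collected into entities. The rows, the ID values, and the source and email column names are made up for illustration; only the kanoniv_id column name comes from the text above. With a real result, pandas' df.to_dict("records") would give rows in roughly this shape, assuming the output has similar columns.

```python
from collections import defaultdict

# Hypothetical output rows; only the "kanoniv_id" column name is from the docs.
rows = [
    {"kanoniv_id": "k-001", "source": "crm", "email": "john@acme.com"},
    {"kanoniv_id": "k-001", "source": "billing", "email": "john@acme.com"},
    {"kanoniv_id": "k-002", "source": "crm", "email": "mary@example.org"},
]


def group_by_entity(rows):
    """Collect source rows under their shared kanoniv_id."""
    entities = defaultdict(list)
    for row in rows:
        entities[row["kanoniv_id"]].append(row)
    return dict(entities)


entities = group_by_entity(rows)
for kanoniv_id, members in entities.items():
    # One line per entity, listing the systems that contributed a record.
    print(kanoniv_id, sorted(r["source"] for r in members))
```

Rows from different systems end up under one key, which is the property that lets downstream joins treat them as a single entity.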

What the Spec Covers

The YAML spec is the single source of truth for your identity resolution pipeline:

  • Sources — canonical field mappings from each system
  • Blocking — composite keys to reduce O(n²) comparisons
  • Scoring — Fellegi-Sunter probabilistic matching with EM training
  • Normalizers — email, phone, name, nickname, domain (built-in)
  • Survivorship — golden record assembly rules (source priority, most complete)
  • Governance — freshness checks, schema validation, shadow-mode deploys
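
The blocking idea can be sketched in a few lines of plain Python. This is an illustration of the general technique, not kanoniv's implementation: the records, the field names, and the choice of (last name, ZIP) as the composite key are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names are illustrative only.
records = [
    {"id": 1, "last_name": "Smith", "zip": "94107"},
    {"id": 2, "last_name": "smith", "zip": "94107"},
    {"id": 3, "last_name": "Jones", "zip": "10001"},
    {"id": 4, "last_name": "Smith", "zip": "10001"},
]


def blocking_key(record):
    """Composite key: lowercased last name plus ZIP code."""
    return (record["last_name"].lower(), record["zip"])


def candidate_pairs(records):
    """Return ID pairs that share a blocking key; other pairs are never compared."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


all_pairs = len(records) * (len(records) - 1) // 2  # brute-force comparison count
pairs = candidate_pairs(records)
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}")
```

Only records inside the same block are compared, so the expensive scoring step runs on far fewer pairs. The trade-off is that a true match whose key fields disagree lands in different blocks and is missed.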

See the spec reference for the full schema.
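
For readers new to Fellegi-Sunter scoring, the sketch below shows the core arithmetic: each field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, and the contributions are summed into a match weight. The m and u values here are invented for illustration; in practice they would be estimated (for example by EM), and nothing here reflects kanoniv's actual parameters or internals.

```python
import math

# Hypothetical per-field probabilities (assumed, not from kanoniv):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
PARAMS = {
    "email": {"m": 0.95, "u": 0.001},
    "last_name": {"m": 0.90, "u": 0.05},
    "zip": {"m": 0.80, "u": 0.10},
}


def match_weight(agreements, params=PARAMS):
    """Sum the log2 likelihood ratios for each compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = params[field]["m"], params[field]["u"]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


strong = match_weight({"email": True, "last_name": True, "zip": True})
weak = match_weight({"email": False, "last_name": True, "zip": False})
print(f"all fields agree: {strong:.2f}, only last name agrees: {weak:.2f}")
```

A pair is then classified by comparing its weight against thresholds: high weights are treated as matches, low weights as non-matches, and the band in between is typically sent for review.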

Source Adapters

# Pandas DataFrame (df is a DataFrame you have already loaded)
source = kanoniv.Source.from_pandas("crm", df, primary_key="contact_id")

# CSV file
source = kanoniv.Source.from_csv("billing", "data/billing.csv", primary_key="account_id")

# Warehouse table (requires sqlalchemy)
source = kanoniv.Source.from_warehouse(
    "erp", table="raw.erp_customers", connection_string="postgresql://..."
)

# dbt model (requires sqlalchemy)
source = kanoniv.Source.from_dbt("staging", model="stg_customers")

Validation & Planning

# Validate spec for errors
result = kanoniv.validate(spec)
if not result:
    print(result.errors)

# Preview the execution plan
plan = kanoniv.plan(spec)
print(plan.summary())

Diffing Specs

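# Compare two spec versions (the file names are illustrative)
spec_v1 = kanoniv.Spec.from_file("kanoniv_v1.yml")
spec_v2 = kanoniv.Spec.from_file("kanoniv_v2.yml")
diff = kanoniv.diff(spec_v1, spec_v2)
print(diff.summary)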

Cloud API (Optional)

For managed reconciliation, monitoring, and collaboration, install with the cloud extra:

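pip install "kanoniv[cloud]"

import kanoniv

client = kanoniv.Client(api_key="kn_...")

# Look up a record by source system and external ID
result = client.resolve(system="crm", external_id="003xxx")

# Search entities by query string
entities = client.entities.search(q="john@acme.com")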

See the cloud API docs for the full reference.

License

Apache-2.0

Download files

Source Distribution

  • kanoniv-0.2.13.tar.gz (234.0 kB)

Built Distributions

  • kanoniv-0.2.13-cp311-cp311-macosx_11_0_arm64.whl (1.1 MB): CPython 3.11, macOS 11.0+ ARM64
  • kanoniv-0.2.13-cp311-cp311-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.11, macOS 10.12+ x86-64
  • kanoniv-0.2.13-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
