
Identity resolution as code


kanoniv

Identity resolution as code. Define matching rules in YAML, reconcile locally in Python.


Installation

pip install kanoniv

Quick Start

import kanoniv

# 1. Load your spec
spec = kanoniv.Spec.from_file("kanoniv.yml")

# 2. Validate it
report = kanoniv.validate(spec)
report.raise_on_error()

# 3. Load sources
sources = [
    kanoniv.Source.from_csv("crm", "data/crm_contacts.csv", primary_key="id"),
    kanoniv.Source.from_csv("billing", "data/billing_accounts.csv", primary_key="id"),
]

# 4. Reconcile
result = kanoniv.reconcile(sources, spec)

# 5. Golden records as a DataFrame
df = result.to_pandas()
print(f"{result.cluster_count} entities, {result.merge_rate:.0%} merge rate")

Every record in the output DataFrame gets a kanoniv_id: a stable identifier that groups duplicate records across sources into a single entity.
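The README does not say how kanoniv_id values are derived. As an illustration only, here is one common way to turn pairwise match decisions into entities: union-find clustering over (source, primary key) references, with each cluster's ID hashed from its smallest member. The `assign_entity_ids` helper, the `ent_` prefix, and the hashing scheme are hypothetical and are not kanoniv's implementation.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# A record reference: (source name, primary key), e.g. ("crm", "17").
RecordRef = Tuple[str, str]


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[RecordRef, RecordRef] = {}

    def find(self, x: RecordRef) -> RecordRef:
        self.parent.setdefault(x, x)  # unseen records start as their own root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: RecordRef, b: RecordRef) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_entity_ids(
    records: Iterable[RecordRef],
    matches: Iterable[Tuple[RecordRef, RecordRef]],
) -> Dict[RecordRef, str]:
    """Hypothetical helper: map every record to an ID shared by its cluster."""
    uf = UnionFind()
    for ref in records:
        uf.find(ref)  # register unmatched records as singleton entities
    for a, b in matches:
        uf.union(a, b)

    clusters: Dict[RecordRef, List[RecordRef]] = {}
    for ref in list(uf.parent):
        clusters.setdefault(uf.find(ref), []).append(ref)

    ids: Dict[RecordRef, str] = {}
    for members in clusters.values():
        # Hash the smallest member so the ID does not depend on union order.
        source, key = min(members)
        digest = hashlib.sha256(f"{source}:{key}".encode()).hexdigest()[:16]
        for ref in members:
            ids[ref] = f"ent_{digest}"
    return ids


records = [("crm", "1"), ("billing", "a"), ("crm", "2")]
ids = assign_entity_ids(records, [(("crm", "1"), ("billing", "a"))])
print(ids)
```

Because the ID comes from the cluster's smallest member, it stays the same across runs as long as that member remains in the cluster; a production scheme would also have to decide what happens when clusters split or merge.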

What the Spec Covers

The YAML spec is the single source of truth for your identity resolution pipeline:

  • Sources: mappings from each system's fields to canonical fields
  • Blocking: composite keys that avoid all-pairs O(n²) comparisons
  • Scoring: Fellegi-Sunter probabilistic matching, with parameters trained by EM
  • Normalizers: built-in normalizers for email, phone, name, nickname, and domain
  • Survivorship: rules for assembling golden records (e.g. source priority, most complete value)
  • Governance: freshness checks, schema validation, and shadow-mode deploys
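Normalization and blocking work together: values are normalized first so that trivially different spellings land in the same block, and then only records sharing a blocking key are compared. The plain-Python sketch below illustrates that idea with a hypothetical composite key (email domain plus the last four phone digits). It is not kanoniv's normalizer or blocking code, and the field names `email` and `phone` are assumptions.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def normalize_email(value: str) -> str:
    """Trim whitespace and lowercase the address."""
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    """Keep digits only; assumes 10-digit (NANP-style) numbers."""
    digits = re.sub(r"\D", "", value)
    return digits[-10:]


def blocking_key(record: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Hypothetical composite key: (email domain, last 4 phone digits)."""
    email = normalize_email(record.get("email", ""))
    phone = normalize_phone(record.get("phone", ""))
    domain = email.rpartition("@")[2] if "@" in email else ""
    if not domain or len(phone) < 4:
        return None  # incomplete key: leave the record out of this pass
    return (domain, phone[-4:])


def candidate_pairs(records: List[Dict[str, str]]) -> List[Tuple[int, int]]:
    """Index pairs that share a blocking key, instead of all n*(n-1)/2 pairs."""
    blocks: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for i, record in enumerate(records):
        key = blocking_key(record)
        if key is not None:
            blocks[key].append(i)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return sorted(pairs)


records = [
    {"email": " J.Smith@Acme.com ", "phone": "(555) 010-1234"},
    {"email": "john@acme.com", "phone": "555.010.1234"},
    {"email": "mj@example.org", "phone": "555-010-9999"},
]
print(candidate_pairs(records))  # [(0, 1)]
```

With three records, all-pairs comparison makes three comparisons; blocking leaves one. Pipelines typically run several blocking passes with different keys so that an error in one field does not hide a true match.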

See the spec reference for the full schema.
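In the Fellegi-Sunter model, each compared field contributes a log-likelihood weight built from two probabilities: m, the chance the field agrees when two records are a true match, and u, the chance it agrees when they are not. Agreement adds log2(m/u) and disagreement adds log2((1-m)/(1-u)). The sketch below sums those weights for one record pair. The m/u values, field names, and threshold are made-up assumptions; in practice they would be estimated (for example with EM) rather than hand-set, and this is not kanoniv's scoring code.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical (m, u) per field. m = P(agree | match), u = P(agree | non-match).
PARAMS: Dict[str, Tuple[float, float]] = {
    "email": (0.95, 0.001),
    "phone": (0.90, 0.01),
    "last_name": (0.92, 0.05),
}
THRESHOLD = 5.0  # hypothetical cut-off on the total weight


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio for one field's agreement outcome."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> float:
    """Sum of per-field weights; higher means more likely the same entity."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        left, right = a.get(field), b.get(field)
        if left is None or right is None:
            continue  # missing value: treated as neutral (weight 0)
        total += field_weight(left == right, m, u)
    return total


a = {"email": "john@acme.com", "phone": "5550101234", "last_name": "smith"}
b = {"email": "john@acme.com", "phone": "5550101234", "last_name": "smyth"}
score = match_score(a, b)
print(f"score={score:.2f}, match={score >= THRESHOLD}")
```

Agreement on a field with a small u (rarely agrees by chance) adds a large positive weight, while disagreement on a field with a large m subtracts weight, so a single name typo does not outweigh matching email and phone.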

Source Adapters

# Pandas DataFrame
source = kanoniv.Source.from_pandas("crm", df, primary_key="contact_id")

# CSV file
source = kanoniv.Source.from_csv("billing", "data/billing.csv", primary_key="account_id")

# Warehouse table (requires sqlalchemy)
source = kanoniv.Source.from_warehouse(
    "erp", table="raw.erp_customers", connection_string="postgresql://..."
)

# dbt model (requires sqlalchemy)
source = kanoniv.Source.from_dbt("staging", model="stg_customers")

Validation & Planning

# Validate spec for errors
result = kanoniv.validate(spec)
if not result:
    print(result.errors)

# Preview the execution plan
plan = kanoniv.plan(spec)
print(plan.summary())

Diffing Specs

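# Compare two spec versions (file names are examples)
spec_v1 = kanoniv.Spec.from_file("kanoniv_v1.yml")
spec_v2 = kanoniv.Spec.from_file("kanoniv_v2.yml")
diff = kanoniv.diff(spec_v1, spec_v2)
print(diff.summary)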
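To illustrate what a structural spec diff involves, the sketch below compares two specs that have already been parsed into plain dictionaries (for example with PyYAML's yaml.safe_load) and reports each changed leaf by its dotted path. The `diff_specs` helper and the example keys are hypothetical; they do not reflect kanoniv's spec schema or how kanoniv.diff works.

```python
from typing import Any, List, Tuple


class _Absent:
    """Marker for a key that exists in only one of the two specs."""

    def __repr__(self) -> str:
        return "<absent>"


ABSENT = _Absent()


def diff_specs(old: Any, new: Any, path: str = "") -> List[Tuple[str, Any, Any]]:
    """Return (dotted path, old value, new value) for every leaf that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes: List[Tuple[str, Any, Any]] = []
        for key in sorted(set(old) | set(new), key=str):
            child = f"{path}.{key}" if path else str(key)
            changes.extend(diff_specs(old.get(key, ABSENT), new.get(key, ABSENT), child))
        return changes
    # Lists and scalars are compared as whole values.
    return [(path, old, new)] if old != new else []


# Hypothetical spec fragments, not kanoniv's real schema.
v1 = {"blocking": {"keys": [["email"]]}, "scoring": {"threshold": 0.8}}
v2 = {
    "blocking": {"keys": [["email"], ["phone"]]},
    "scoring": {"threshold": 0.85},
    "governance": {"shadow": True},
}
for path, before, after in diff_specs(v1, v2):
    print(f"{path}: {before!r} -> {after!r}")
```

Comparing lists as whole values keeps the sketch short; a diff aimed at reviewing blocking-key changes might instead report individual list items that were added or removed.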

Cloud API (Optional)

For managed reconciliation, monitoring, and collaboration, install with the cloud extra:

pip install "kanoniv[cloud]"

Then create a client with your API key:

import kanoniv

client = kanoniv.Client(api_key="kn_...")

result = client.resolve(system="crm", external_id="003xxx")
entities = client.entities.search(q="john@acme.com")

See the cloud API docs for the full reference.


License

Apache-2.0

Download files

Download the file for your platform.

Source Distribution

kanoniv-0.2.12.tar.gz (246.0 kB)

Tags: Source

Built Distributions


kanoniv-0.2.12-cp311-cp311-macosx_11_0_arm64.whl (1.1 MB)

Tags: CPython 3.11, macOS 11.0+ ARM64

kanoniv-0.2.12-cp311-cp311-macosx_10_12_x86_64.whl (1.1 MB)

Tags: CPython 3.11, macOS 10.12+ x86-64

kanoniv-0.2.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file kanoniv-0.2.12.tar.gz.

File metadata

  • Download URL: kanoniv-0.2.12.tar.gz
  • Upload date:
  • Size: 246.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for kanoniv-0.2.12.tar.gz
  • SHA256: 904a2ab7ffd25f467e3b1c08040a05b0f806cb63e02e1c2ea2498bda1c8093a7
  • MD5: 37cce06592fa97bedd49e00e7cd6e0c9
  • BLAKE2b-256: 2d8dfa54347bc9c1fda26f27c4c8e2e45571786337f9c057cf5d9c8a36bc2d14


File details

Details for the file kanoniv-0.2.12-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for kanoniv-0.2.12-cp311-cp311-macosx_11_0_arm64.whl
  • SHA256: 7dc640c91900009cceb120c95bd69de743f517278ea50fbaf7212ef4164fab16
  • MD5: d672428dc6443fb281a5f4a441b20408
  • BLAKE2b-256: af3688cd52e77212b5f956835a03eba2913d6371e92d706d6aa19403ba484133


File details

Details for the file kanoniv-0.2.12-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for kanoniv-0.2.12-cp311-cp311-macosx_10_12_x86_64.whl
  • SHA256: 87ba31059cf9f483ac5028c016146456c02ecc0e02bfde0b3d11a08b08a587bf
  • MD5: 999f018957a73c0427ed72f642c6b2e1
  • BLAKE2b-256: 115b92840d2bc80ad11732bb2f94831028ad083431c1165647c23e14c0d382f0


File details

Details for the file kanoniv-0.2.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for kanoniv-0.2.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • SHA256: 7c8f6620f5bd4088bf113f9b85878550ba16ebf4248d1e6f799ff3cebbe0846d
  • MD5: e074351b6fe68355021573a4eb8e53c9
  • BLAKE2b-256: 065203db200409dfcc3bfae0e8f0468673bf9bacd05299f61ef7eee8eb52fad3

