
Identity resolution as code


kanoniv

Identity resolution as code. Define matching rules in YAML, reconcile locally in Python.


Installation

pip install kanoniv

Quick Start

import kanoniv

# 1. Load your spec
spec = kanoniv.Spec.from_file("kanoniv.yml")

# 2. Validate it
result = kanoniv.validate(spec)
result.raise_on_error()

# 3. Load sources
sources = [
    kanoniv.Source.from_csv("crm", "data/crm_contacts.csv", primary_key="id"),
    kanoniv.Source.from_csv("billing", "data/billing_accounts.csv", primary_key="id"),
]

# 4. Reconcile
result = kanoniv.reconcile(sources, spec)

# 5. Golden records as a DataFrame
df = result.to_pandas()
print(f"{result.cluster_count} entities, {result.merge_rate:.0%} merge rate")

Every record in the output DataFrame gets a kanoniv_id: a stable identifier shared by all records, from any source, that resolve to the same entity.
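As a rough illustration of how pairwise match decisions can become one ID per entity, the sketch below treats each matched pair as an edge and takes connected components with a union-find. This is not kanoniv's implementation: the `cluster_ids` helper, the `source:key` record labels, and the rule "the entity ID is the smallest member key" are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def cluster_ids(records: Iterable[str], matches: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each record key to an entity ID via union-find (hypothetical helper)."""
    parent: Dict[str, str] = {r: r for r in records}

    def find(x: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Hang the larger root under the smaller one, so every root is
            # the lexicographically smallest key in its component.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    return {r: find(r) for r in parent}


records = ["crm:1", "crm:2", "billing:7", "billing:9"]
matches = [("crm:1", "billing:7")]
print(cluster_ids(records, matches))
# {'crm:1': 'billing:7', 'crm:2': 'crm:2', 'billing:7': 'billing:7', 'billing:9': 'billing:9'}
```

Picking the smallest member key makes the ID independent of the order in which matches are processed; keeping IDs stable across runs when clusters split or merge is a harder problem this sketch does not address.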

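Golden records are assembled from each cluster by survivorship rules. The sketch below shows one such rule, per-field source priority with fallback to lower-priority sources, purely as an illustration; the `golden_record` helper, the record layout, and the field names are hypothetical and do not reflect kanoniv's API or spec syntax.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def golden_record(cluster: List[Record], priority: List[str], fields: List[str]) -> Record:
    """Assemble one golden record by per-field source priority (hypothetical rule)."""
    rank = {source: i for i, source in enumerate(priority)}
    # Records from sources missing in the priority list sort last.
    ordered = sorted(cluster, key=lambda r: rank.get(r["source"], len(priority)))
    golden: Record = {}
    for field in fields:
        # The first non-empty value, in priority order, survives.
        golden[field] = next((r[field] for r in ordered if r.get(field)), None)
    return golden


cluster = [
    {"source": "billing", "email": "j.doe@acme.com", "phone": "+1 555 0100"},
    {"source": "crm", "email": "john@acme.com", "phone": None},
]
print(golden_record(cluster, priority=["crm", "billing"], fields=["email", "phone"]))
# {'email': 'john@acme.com', 'phone': '+1 555 0100'}
```

Here the email comes from the preferred CRM record, while the phone falls back to billing because the CRM value is empty.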
What the Spec Covers

The YAML spec is the single source of truth for your identity resolution pipeline:

  • Sources: canonical field mappings for each source system
  • Blocking: composite keys that restrict comparisons to candidate pairs, avoiding the O(n²) cost of comparing every pair of records
  • Scoring: Fellegi-Sunter probabilistic matching with EM (expectation-maximization) training
  • Normalizers: built-in normalizers for email, phone, name, nickname, and domain
  • Survivorship: golden record assembly rules (e.g. source priority, most complete)
  • Governance: freshness checks, schema validation, shadow-mode deploys
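
To show why normalizers and blocking matter together, here is a small sketch: fields are normalized first, then records are grouped by a composite key, and only records sharing a key are compared. The email normalizer and the key (email domain plus first letter of the last name) are hypothetical choices for illustration, not kanoniv's built-in normalizers or blocking syntax.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def normalize_email(email: str) -> str:
    # Illustrative normalizer: trim whitespace and lowercase.
    return email.strip().lower()


def block_key(rec: Dict[str, str]) -> Tuple[str, str]:
    # Hypothetical composite key: email domain + first letter of last name.
    domain = normalize_email(rec["email"]).split("@")[-1]
    return (domain, rec["last_name"].strip().lower()[:1])


def candidate_pairs(records: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    blocks: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec["id"])
    # Only records that share a block are ever compared.
    return [pair for ids in blocks.values() for pair in combinations(ids, 2)]


records = [
    {"id": "1", "email": "John@Acme.com ", "last_name": "Doe"},
    {"id": "2", "email": "jdoe@acme.com", "last_name": "doe"},
    {"id": "3", "email": "ann@globex.com", "last_name": "Lee"},
    {"id": "4", "email": "a.lee@globex.com", "last_name": "Li"},
    {"id": "5", "email": "bob@initech.com", "last_name": "Smith"},
]
pairs = candidate_pairs(records)
n = len(records)
print(f"{len(pairs)} candidate pairs instead of {n * (n - 1) // 2}")
# 2 candidate pairs instead of 10
```

Without normalization, "John@Acme.com " and "jdoe@acme.com" would land in different blocks and the pair would never be scored.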

See the spec reference for the full schema.
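
For the Scoring entry, the core of the Fellegi-Sunter model is a match weight: each compared field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees, where m = P(agree | match) and u = P(agree | non-match). The sketch below uses fixed, made-up m/u values; in practice these are what EM training would estimate. The `PARAMS` table and `match_weight` helper are hypothetical, not kanoniv's API.

```python
import math
from typing import Dict, Tuple

# Hypothetical (m, u) per field: m = P(agree | match), u = P(agree | non-match).
PARAMS: Dict[str, Tuple[float, float]] = {
    "email": (0.95, 0.001),
    "last_name": (0.90, 0.05),
    "phone": (0.80, 0.01),
}


def match_weight(agreements: Dict[str, bool]) -> float:
    """Sum the log2 likelihood ratios of the compared fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = PARAMS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


w = match_weight({"email": True, "last_name": True, "phone": False})
print(round(w, 2))
# 11.75
```

A pair is then classified by comparing its weight against thresholds; rare agreements (a shared email, small u) add far more evidence than common ones.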

Source Adapters

# Pandas DataFrame (df is an existing pandas.DataFrame with a contact_id column)
source = kanoniv.Source.from_pandas("crm", df, primary_key="contact_id")

# CSV file
source = kanoniv.Source.from_csv("billing", "data/billing.csv", primary_key="account_id")

# Warehouse table (requires sqlalchemy)
source = kanoniv.Source.from_warehouse(
    "erp", table="raw.erp_customers", connection_string="postgresql://..."
)

# dbt model (requires sqlalchemy)
source = kanoniv.Source.from_dbt("staging", model="stg_customers")
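
Whatever the adapter, each source's columns still have to be mapped onto shared canonical fields before matching. The sketch below illustrates that idea with plain dictionaries; the column names, the `MAPPINGS` table, and the `to_canonical` helper are hypothetical and are not kanoniv's YAML mapping syntax.

```python
from typing import Dict, List

# Hypothetical per-source mappings: system-specific column -> canonical field.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "crm": {"contact_id": "id", "Email": "email", "LastName": "last_name"},
    "billing": {"account_id": "id", "billing_email": "email", "surname": "last_name"},
}


def to_canonical(source: str, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    mapping = MAPPINGS[source]
    # Keep only mapped columns, rename them, and tag each row with its source.
    return [
        {"source": source, **{canon: row[col] for col, canon in mapping.items() if col in row}}
        for row in rows
    ]


crm_rows = [{"contact_id": "c1", "Email": "john@acme.com", "LastName": "Doe", "Owner": "sales"}]
print(to_canonical("crm", crm_rows))
# [{'source': 'crm', 'id': 'c1', 'email': 'john@acme.com', 'last_name': 'Doe'}]
```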

Validation & Planning

# Validate spec for errors
result = kanoniv.validate(spec)
if not result:
    print(result.errors)

# Preview the execution plan
plan = kanoniv.plan(spec)
print(plan.summary())
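
As an illustration of the kind of cross-reference a spec validator can catch before anything runs, the sketch below checks that every blocking field is mapped by every source. The dictionary layout and the `check_spec` helper are hypothetical and far simpler than kanoniv's actual spec schema and `kanoniv.validate`.

```python
from typing import Any, Dict, List


def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return error messages for a hypothetical spec dict."""
    errors: List[str] = []
    for section in ("sources", "blocking"):
        if section not in spec:
            errors.append(f"missing section: {section}")
    if errors:
        return errors
    # Every blocking field must exist in every source's canonical fields.
    for name, fields in spec["sources"].items():
        for key in spec["blocking"]:
            if key not in fields:
                errors.append(f"source {name!r} does not map blocking field {key!r}")
    return errors


spec = {
    "sources": {"crm": ["id", "email", "last_name"], "billing": ["id", "email"]},
    "blocking": ["email", "last_name"],
}
print(check_spec(spec))
# ["source 'billing' does not map blocking field 'last_name'"]
```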

Diffing Specs

# Compare two spec versions (each a kanoniv.Spec, e.g. loaded with Spec.from_file)
diff = kanoniv.diff(spec_v1, spec_v2)
print(diff.summary)
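
Conceptually, a spec diff is a structural comparison of two nested documents. The sketch below does this for plain dictionaries, reporting removed (-), added (+), and changed (~) keys by dotted path; the `diff_specs` helper and the example keys are hypothetical and do not reproduce the output format of `kanoniv.diff`.

```python
from typing import Any, Dict, List


def diff_specs(old: Dict[str, Any], new: Dict[str, Any], path: str = "") -> List[str]:
    """List removed, added, and changed keys between two nested dicts."""
    changes: List[str] = []
    for key in sorted(old.keys() | new.keys()):
        where = f"{path}.{key}" if path else key
        if key not in new:
            changes.append(f"- {where}")
        elif key not in old:
            changes.append(f"+ {where}")
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # Recurse into nested sections to report precise paths.
            changes.extend(diff_specs(old[key], new[key], where))
        elif old[key] != new[key]:
            changes.append(f"~ {where}: {old[key]!r} -> {new[key]!r}")
    return changes


v1 = {"scoring": {"threshold": 0.8, "fields": ["email", "name"]}, "blocking": ["email"]}
v2 = {"scoring": {"threshold": 0.85, "fields": ["email", "name"]}, "governance": {"shadow": True}}
print("\n".join(diff_specs(v1, v2)))
# - blocking
# + governance
# ~ scoring.threshold: 0.8 -> 0.85
```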

Cloud API (Optional)

For managed reconciliation, monitoring, and collaboration, install with the cloud extra:

pip install "kanoniv[cloud]"

import kanoniv

client = kanoniv.Client(api_key="kn_...")

result = client.resolve(system="crm", external_id="003xxx")
entities = client.entities.search(q="john@acme.com")

See the cloud API docs for the full reference.


License

Apache-2.0



Download files


Source Distribution

kanoniv-0.2.16.tar.gz (273.2 kB)

Uploaded Source

Built Distributions


kanoniv-0.2.16-cp311-cp311-macosx_11_0_arm64.whl (1.3 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

kanoniv-0.2.16-cp311-cp311-macosx_10_12_x86_64.whl (1.3 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

kanoniv-0.2.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)

Uploaded CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file kanoniv-0.2.16.tar.gz.

File metadata

  • Download URL: kanoniv-0.2.16.tar.gz
  • Upload date:
  • Size: 273.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.2

File hashes

Hashes for kanoniv-0.2.16.tar.gz
Algorithm Hash digest
SHA256 405a2e475a9031fb8e6ee4711e5ab1107fd266044770f433064c153db3ec5df2
MD5 9e27a84b016e409791e64180033f16a1
BLAKE2b-256 914d9e87ddf8be16c174047945877dc72d4aff7bf6c767559b6ae2f635e08b03


File details

Details for the file kanoniv-0.2.16-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for kanoniv-0.2.16-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7d2670e88a8352013aa15400c88f4a8cda068947281ff67202561f5b3401e42d
MD5 b225528698c652e6f1ff744bf50a7bb2
BLAKE2b-256 5328eaa6b00394355b52c3b9785eedc6d3bb71e6f523526b5e4433c14752695e


File details

Details for the file kanoniv-0.2.16-cp311-cp311-macosx_10_12_x86_64.whl.


File hashes

Hashes for kanoniv-0.2.16-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 acabdf26607f5847f92f8d0d058517da23e59656223dbdf53709a181d16e1928
MD5 22607ab4ea5303d9fd6d58f6f7b9ec15
BLAKE2b-256 5fd0e0034cc88dbc093968532866f06f25f2b0bd320cc9bac5b1eaeb86cafc6b


File details

Details for the file kanoniv-0.2.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for kanoniv-0.2.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6dde1412b467452eb2f9cd5f34c7c9887832c83fdcff1264e89df3c0961dc6b6
MD5 640950c6cf94a11cb551e611e56ae67c
BLAKE2b-256 bf1d578d50c1ae7df3761cb80ba233a855f98684eded4f25b0cf818585dfed00

