Identity resolution for AI applications - resolve duplicates in 10 lines of Python

Maintainers

c1v-id-mvp

Project description

c1v-id

Identity resolution for AI applications

AI agents that interact with customers, CRMs, or any system of record face a critical decision point: is this person already in our system, or should we create a new record? Because both input and existing data are often messy, agents can confuse customer records, pollute data with duplicates, or deliver poor customer experiences.

c1v-id is an open-source identity resolution library that sits between the agent and the system of record, answering identity queries in milliseconds. It uses probabilistic record linkage with blocking strategies (roughly O(n) comparisons instead of the naive O(n²)), weighted multi-field scoring, and transitive clustering. It is designed as a drop-in component for LangChain agents, n8n workflows, and RAG pipelines, has zero ML dependencies, and supports configurable survivorship rules.

Installation

pip install c1v-id

Quick Start

Resolve duplicates in 10 lines of Python:

from c1v_id import IdentityResolver

resolver = IdentityResolver()

records = [
    {"email": "john@gmail.com", "name": "John Doe", "phone": "555-1234"},
    {"email": "john@gmail.com", "name": "J. Doe", "phone": "555-1234"},
    {"email": "jane@gmail.com", "name": "Jane Smith"},
]

golden = resolver.resolve(records)
print(f"Input: {len(records)} records → Output: {len(golden)} golden records")
# Input: 3 records → Output: 2 golden records

Match Two Records

result = resolver.match(
    {"email": "john@gmail.com", "name": "John"},
    {"email": "john@gmail.com", "name": "Johnny"}
)

print(result.score)       # 0.97
print(result.decision)    # 'auto_merge'
print(result.matched_on)  # ['email', 'name']

Find Matches in Existing Data

incoming = {"email": "john@gmail.com", "name": "John"}
existing = [
    {"id": "1", "email": "john@gmail.com", "name": "John Doe"},
    {"id": "2", "email": "jane@gmail.com", "name": "Jane Doe"},
]

matches = resolver.find_matches(incoming, existing)
# Returns best matches sorted by score

Custom Configuration

from c1v_id import IdentityResolver, ResolverConfig, Thresholds, Weights

config = ResolverConfig(
    thresholds=Thresholds(auto_merge=0.95, needs_review=0.8),
    weights=Weights(email=0.6, phone=0.3, name=0.1, address=0.0),
)

resolver = IdentityResolver(config=config)
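
The weights and thresholds suggest a weighted average of per-field similarities compared against two cut-offs. The following is a minimal sketch of that idea, not the library's actual formula: `difflib` stands in for a fuzzy matcher such as rapidfuzz, and `field_similarity`, `score`, `decide`, and the `no_match` label are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical weights and cut-offs mirroring the configuration example.
WEIGHTS = {"email": 0.6, "phone": 0.3, "name": 0.1}
AUTO_MERGE, NEEDS_REVIEW = 0.95, 0.8


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib used as a stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of similarities over fields present in both records."""
    shared = [f for f in WEIGHTS if rec_a.get(f) and rec_b.get(f)]
    total = sum(WEIGHTS[f] for f in shared)
    if total == 0:
        return 0.0
    weighted = sum(WEIGHTS[f] * field_similarity(rec_a[f], rec_b[f]) for f in shared)
    return weighted / total


def decide(s: float) -> str:
    """Map a score onto one of three outcomes using the two thresholds."""
    if s >= AUTO_MERGE:
        return "auto_merge"
    if s >= NEEDS_REVIEW:
        return "needs_review"
    return "no_match"  # assumed label for scores below both thresholds


s = score(
    {"email": "john@gmail.com", "name": "John"},
    {"email": "john@gmail.com", "name": "Johnny"},
)
print(decide(s))  # auto_merge
```

Renormalizing by the weights of the shared fields keeps a missing phone number from dragging the score down; whether c1v-id treats missing fields this way is an assumption.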

Why c1v-id?

vs. Splink

| | c1v-id | Splink |
|---|---|---|
| Hello World | 10 lines | 50+ lines |
| Target | AI builders | Data analysts |
| Setup | `pip install` | Spark/DuckDB config |
| ML Required | No | Optional |
| Use Case | Real-time matching | Batch analytics |

Splink is powerful for large-scale data linkage projects with dedicated analysts. c1v-id is for developers who need identity resolution as a feature, not a project.

vs. dedupe

| | c1v-id | dedupe |
|---|---|---|
| Maintenance | Active | Stale (2+ years) |
| Dependencies | 3 (pandas, rapidfuzz, pyyaml) | 10+ |
| Learning Curve | Minimal | Requires training data |
| API Style | `resolve(records)` | Iterative labeling |

dedupe requires interactive labeling to train a model. c1v-id works out of the box with sensible defaults.

vs. Enterprise CDPs (Segment, mParticle)

| | c1v-id | Enterprise CDP |
|---|---|---|
| Cost | Free | $100K+/year |
| Data Location | Your infrastructure | Their cloud |
| Customization | Full control | Limited |
| Integration | Any Python app | Vendor lock-in |

Enterprise CDPs solve identity as part of a larger platform. c1v-id gives you just the identity resolution piece to embed anywhere.

Core Concepts

| Concept | What It Does | Why It Matters |
|---|---|---|
| Normalization | Cleans emails, phones, names | `John.Doe+tag@Gmail.com` → `johndoe@gmail.com` |
| Blocking | Groups likely matches | Reduces O(n²) to ~O(n) |
| Scoring | Calculates similarity | Weighted fuzzy matching across fields |
| Clustering | Groups transitive matches | If A≈B and B≈C, then A, B, and C share one cluster |
| Golden Records | Merges duplicates | Best value wins per survivorship rules |

Low-Level API

For custom pipelines, use the building blocks directly:

Normalization

from c1v_id import norm_email, norm_phone, norm_name

norm_email("John.Doe+tag@Gmail.com")  # 'johndoe@gmail.com'
norm_phone("(555) 123-4567")          # '5551234567'
norm_name("  JOHN   DOE  ")           # 'john doe'
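
To make the normalization rules concrete, here is a stdlib-only sketch that reproduces the three outputs shown above. The `sketch_*` functions are hypothetical stand-ins, not the library's implementation, which may handle more cases (country codes, Unicode names, provider-specific email rules).

```python
import re


def sketch_norm_email(raw: str) -> str:
    """Lowercase, drop a '+tag' suffix, and remove dots from the local part."""
    local, _, domain = raw.strip().lower().partition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def sketch_norm_phone(raw: str) -> str:
    """Keep digits only."""
    return re.sub(r"\D", "", raw)


def sketch_norm_name(raw: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(raw.lower().split())


print(sketch_norm_email("John.Doe+tag@Gmail.com"))  # johndoe@gmail.com
print(sketch_norm_phone("(555) 123-4567"))          # 5551234567
print(sketch_norm_name("  JOHN   DOE  "))           # john doe
```

Removing dots from the local part is safe for Gmail addresses, where dots are ignored, but other providers may treat `john.doe` and `johndoe` as different mailboxes, so a production rule would likely be domain-aware.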

Blocking

from c1v_id import email_domain_last4, phone_last7, make_blocks

email_domain_last4("john@gmail.com")  # 'gmail.com|john'
phone_last7("555-123-4567")           # '1234567'

# df is assumed to be a pandas DataFrame of normalized records (not defined here)
blocks = make_blocks(df, ["email_domain_last4", "phone_last7"])
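
The effect of blocking on the number of comparisons can be sketched without the library. The code below is an illustration under assumptions: `sketch_email_key` mimics the documented output of `email_domain_last4`, and `sketch_candidate_pairs` is a hypothetical helper that pairs up only records sharing a key.

```python
from collections import defaultdict
from itertools import combinations


def sketch_email_key(email: str) -> str:
    """Blocking key: domain plus the last four characters of the local part."""
    local, _, domain = email.lower().partition("@")
    return f"{domain}|{local[-4:]}"


def sketch_candidate_pairs(records: list[dict]) -> set[tuple[int, int]]:
    """Return index pairs of records that fall into the same block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        if rec.get("email"):
            blocks[sketch_email_key(rec["email"])].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


records = [
    {"email": "john@gmail.com"},
    {"email": "john@gmail.com"},
    {"email": "jane@gmail.com"},
    {"email": "sam@example.org"},
]
pairs = sketch_candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}")
# 1 candidate pair(s) instead of 6
```

The near-linear behavior holds only while blocks stay small; a key shared by most records (for example, a key built from the domain alone) would push the pair count back toward n(n-1)/2.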

Clustering

from c1v_id import UnionFind

uf = UnionFind([1, 2, 3, 4, 5])
uf.union(1, 2)
uf.union(2, 3)
uf.find(1) == uf.find(3)  # True (transitive)
uf.get_clusters()         # {1: [1, 2, 3], 4: [4], 5: [5]}
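
For readers unfamiliar with the data structure, a disjoint-set (union-find) can be sketched in a few lines. `SketchUnionFind` is a hypothetical illustration, not the library's `UnionFind`; in particular, which member ends up as the cluster key depends on the merge order used here.

```python
class SketchUnionFind:
    """Disjoint-set with path compression."""

    def __init__(self, items):
        # Every item starts as the root of its own single-member cluster.
        self.parent = {item: item for item in items}

    def find(self, item):
        # Walk up to the root.
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        groups = {}
        for item in self.parent:
            groups.setdefault(self.find(item), []).append(item)
        return groups


uf_sketch = SketchUnionFind([1, 2, 3, 4, 5])
for a, b in [(1, 2), (2, 3)]:  # pairs that scored above the merge threshold
    uf_sketch.union(a, b)
print(uf_sketch.clusters())  # {1: [1, 2, 3], 4: [4], 5: [5]}
```

Feeding matched pairs into `union` is what makes clustering transitive: records 1 and 3 were never compared directly, yet they land in the same group.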

Golden Records

from c1v_id import build_golden_records, SurvivorshipRule

rules = {
    "email": SurvivorshipRule.MOST_RECENT,
    "address": SurvivorshipRule.LONGEST,
    "first": SurvivorshipRule.FIRST_NON_NULL,
}

# df and clusters are assumed to come from the earlier normalization and clustering steps
golden = build_golden_records(df, clusters, rules, source_priority=["crm", "web"])
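
Survivorship rules pick one value per field from the records in a cluster. The sketch below illustrates the three rule names used above with plain dicts. The `updated_at` and `source` fields, the helper functions, and the way `source_priority` orders records are all assumptions for illustration; the library's behavior may differ.

```python
def most_recent(records, field):
    """Value from the newest record that has the field (ISO dates sort as text)."""
    have = [r for r in records if r.get(field)]
    return max(have, key=lambda r: r["updated_at"])[field] if have else None


def longest(records, field):
    """Longest non-empty value."""
    values = [r[field] for r in records if r.get(field)]
    return max(values, key=len) if values else None


def first_non_null(records, field):
    """First non-empty value in the given record order."""
    return next((r[field] for r in records if r.get(field)), None)


def sketch_golden(records, rules, source_priority):
    """Merge one cluster into a single record, visiting preferred sources first."""
    rank = {source: i for i, source in enumerate(source_priority)}
    ordered = sorted(records, key=lambda r: rank.get(r.get("source"), len(rank)))
    return {field: rule(ordered, field) for field, rule in rules.items()}


cluster = [
    {"source": "web", "updated_at": "2025-06-01", "email": "jd@new.example",
     "address": "1 Main St", "first": None},
    {"source": "crm", "updated_at": "2024-01-15", "email": "john@old.example",
     "address": "1 Main Street, Apt 4", "first": "John"},
]
sketch_rules = {"email": most_recent, "address": longest, "first": first_non_null}
merged = sketch_golden(cluster, sketch_rules, source_priority=["crm", "web"])
print(merged)
# {'email': 'jd@new.example', 'address': '1 Main Street, Apt 4', 'first': 'John'}
```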

Use Cases

- AI Agents: Check if a customer exists before creating a new record
- CRM Deduplication: Merge duplicate contacts from multiple sources
- Lead Routing: Match incoming leads to existing opportunities
- Customer Support: Find customer context across fragmented records
- Data Migration: Deduplicate when merging systems

License

MIT

Project details

Release history

0.1.0 (this version), Jan 24, 2026

Download files

Source Distribution

c1v_id-0.1.0.tar.gz (18.2 kB), uploaded Jan 24, 2026, Source

Built Distribution

c1v_id-0.1.0-py3-none-any.whl (22.1 kB), uploaded Jan 24, 2026, Python 3

File details

Details for the file c1v_id-0.1.0.tar.gz.

File metadata

Download URL: c1v_id-0.1.0.tar.gz
Upload date: Jan 24, 2026
Size: 18.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for c1v_id-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `080282e8f1c325cbecb0ac8647c25cbd7f59f6b2f9188a43520769b8b018accc` |
| MD5 | `352179645a1bd6a8511b04610e9b7efc` |
| BLAKE2b-256 | `ad704a644e6cae5fb66c5605c71986eddbc655a0364dfd3ed30ef0418282a713` |

Provenance

The following attestation bundles were made for c1v_id-0.1.0.tar.gz:

Publisher: publish.yml on davidancor/c1v-id

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: c1v_id-0.1.0.tar.gz
- Subject digest: 080282e8f1c325cbecb0ac8647c25cbd7f59f6b2f9188a43520769b8b018accc
- Sigstore transparency entry: 850154385
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: davidancor/c1v-id@b3a00f8c49c89a3aca2d636f411a333aa0cb9f1b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/davidancor
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b3a00f8c49c89a3aca2d636f411a333aa0cb9f1b
- Trigger Event: release

File details

Details for the file c1v_id-0.1.0-py3-none-any.whl.

File metadata

Download URL: c1v_id-0.1.0-py3-none-any.whl
Upload date: Jan 24, 2026
Size: 22.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for c1v_id-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d0ca779c292e550238482a39b8f2c3306cd8582f58c23999ce46a90fb8074fbf` |
| MD5 | `1f3f696a2b9f1d2810567299e35c2d2e` |
| BLAKE2b-256 | `118dd14e900ff1a8d39708c28f5dd0438c85fd61356a0021b166abb84ff3a038` |

Provenance

The following attestation bundles were made for c1v_id-0.1.0-py3-none-any.whl:

Publisher: publish.yml on davidancor/c1v-id

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: c1v_id-0.1.0-py3-none-any.whl
- Subject digest: d0ca779c292e550238482a39b8f2c3306cd8582f58c23999ce46a90fb8074fbf
- Sigstore transparency entry: 850154386
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: davidancor/c1v-id@b3a00f8c49c89a3aca2d636f411a333aa0cb9f1b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/davidancor
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b3a00f8c49c89a3aca2d636f411a333aa0cb9f1b
- Trigger Event: release
