rolodexter

The universal contact field mapper — route messy, inconsistent contact data to a clean canonical schema.


Route messy, inconsistent contact data from any source to a clean, canonical schema.

The Problem

Every CRM, email platform, and CSV export uses different field names for the same data:

| Service | First Name | Phone | Company |
| --- | --- | --- | --- |
| HubSpot | `firstname` | `mobilephone` | `company` |
| Salesforce | `FirstName` | `MobilePhone` | `Company` |
| Mailchimp | `FNAME` | `PHONE` | `COMPANY` |
| Google CSV | `Given Name` | `Phone 1 - Value` | `Organization 1 - Name` |
| Random CSV | `Column A` | `Column B` | `Column C` |

The Solution

from rolodexter import ContactMapper

mapper = ContactMapper()

result = mapper.map_payload({
    "fname": "jane",
    "surname": "doe",
    "mobile": "+1-650-253-0000",
    "employer": "Tech Corp",
    "Column 1": "jane.doe@example.com",  # auto-detected by value shape
})

print(result.normalized)
# {
#     "first_name": "Jane",
#     "last_name": "Doe",
#     "phone": "+16502530000",
#     "company": "Tech Corp",
#     "email": "jane.doe@example.com"
# }

Installation

# Core (phonenumbers + nameparser)
pip install rolodexter

# With fuzzy matching for typo recovery
pip install "rolodexter[fuzzy]"

# With on-demand i18n translation (40 languages)
pip install "rolodexter[i18n]"

# Everything
pip install "rolodexter[all]"

# Development
pip install "rolodexter[dev]"

Features

🎯 Four-Layer Matching Pipeline

Every field runs through the strategy chain in priority order:

1. Exact match: O(1) lookup against 615+ known aliases across 62 canonical fields.
2. Normalized match: handles CamelCase, dot.path, space → underscore, and similar variations.
3. Fuzzy match: rapidfuzz catches typos such as "phne_nmbr" → phone.
4. Heuristic match: regexes detect emails, phones, URLs, and postal codes by the shape of the value.
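The four layers can be sketched as a chain of fall-through checks. This is a minimal stdlib-only illustration, not rolodexter's implementation: the tiny alias table, the 0.95 confidence for normalized hits, the 0.8 fuzzy cutoff, and the use of `difflib` in place of rapidfuzz are all assumptions.

```python
import difflib
import re

# Hypothetical mini alias table; rolodexter's real table lives in patterns.json.
ALIASES = {
    "first_name": ["first_name", "fname", "firstname", "given_name"],
    "last_name": ["last_name", "lname", "lastname", "surname"],
    "email": ["email", "email_address", "e_mail"],
    "phone": ["phone", "mobile", "mobilephone", "phone_number"],
    "company": ["company", "employer", "organization"],
}
# Invert once so the exact layer is a single O(1) dict lookup.
ALIAS_TO_FIELD = {alias: field for field, aliases in ALIASES.items() for alias in aliases}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def normalize_header(header: str) -> str:
    """Fold 'FirstName', 'Given Name', 'e-mail' into snake_case."""
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", header.strip())  # split CamelCase
    s = re.sub(r"[\s.\-]+", "_", s)  # spaces, dots, dashes -> underscore
    return s.lower()


def identify(header: str, value: object = None, cutoff: float = 0.8):
    """Return (canonical, confidence, strategy), trying each layer in order."""
    # 1. Exact: the raw header is a known alias.
    if header in ALIAS_TO_FIELD:
        return ALIAS_TO_FIELD[header], 1.0, "exact"
    # 2. Normalized: retry after case and separator folding.
    key = normalize_header(header)
    if key in ALIAS_TO_FIELD:
        return ALIAS_TO_FIELD[key], 0.95, "normalized"
    # 3. Fuzzy: closest alias by difflib similarity (stand-in for rapidfuzz).
    close = difflib.get_close_matches(key, list(ALIAS_TO_FIELD), n=1, cutoff=cutoff)
    if close:
        ratio = difflib.SequenceMatcher(None, key, close[0]).ratio()
        return ALIAS_TO_FIELD[close[0]], round(ratio, 2), "fuzzy"
    # 4. Heuristic: the header is uninformative, so inspect the value's shape.
    if isinstance(value, str):
        if EMAIL_RE.match(value.strip()):
            return "email", 0.6, "heuristic"
        if PHONE_RE.match(value.strip()):
            return "phone", 0.6, "heuristic"
    return None, 0.0, "none"


print(identify("fname"))                            # ('first_name', 1.0, 'exact')
print(identify("FirstName"))                        # ('first_name', 0.95, 'normalized')
print(identify("phne"))                             # ('phone', 0.89, 'fuzzy')
print(identify("Column X", value="jane@test.com"))  # ('email', 0.6, 'heuristic')
```

Ordering the layers from cheapest and most certain to most speculative means a typo or a value-shape guess is only consulted when every stronger signal has failed.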

📊 Confidence Scoring

Every match comes with a confidence score (0.0–1.0):

match = mapper.identify("fname")
# FieldMatch(original='fname', canonical='first_name', confidence=1.0, strategy='exact')

match = mapper.identify("phne")
# FieldMatch(original='phne', canonical='phone', confidence=0.85, strategy='fuzzy')

match = mapper.identify("Column X", value="jane@test.com")
# FieldMatch(original='Column X', canonical='email', confidence=0.6, strategy='heuristic')
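One way to use these scores is to accept confident matches automatically and queue the rest for human review. The sketch below uses a local stand-in dataclass shaped like the `FieldMatch` repr above; the `ConfidenceMatch` name, the `triage` helper, and the 0.8 threshold are illustrative assumptions, not part of rolodexter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceMatch:
    """Hypothetical stand-in mirroring the fields shown in the FieldMatch repr."""
    original: str
    canonical: str
    confidence: float
    strategy: str


def triage(matches, threshold=0.8):
    """Split matches into (auto-accepted, needs-review) by confidence."""
    accepted = [m for m in matches if m.confidence >= threshold]
    review = [m for m in matches if m.confidence < threshold]
    return accepted, review


matches = [
    ConfidenceMatch("fname", "first_name", 1.0, "exact"),
    ConfidenceMatch("phne", "phone", 0.85, "fuzzy"),
    ConfidenceMatch("Column X", "email", 0.6, "heuristic"),
]
accepted, review = triage(matches)
print([m.canonical for m in accepted])  # ['first_name', 'phone']
print([m.original for m in review])     # ['Column X']
```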

Per-Caller Field Overrides

For vendor-specific or account-level field names that the standard alias table won't contain, pass explicit mappings through `overrides`:

mapper = ContactMapper(
    overrides={
        "MMERGE6": "company",   # Mailchimp custom merge field
        "cf_lead_score": "tags",
    }
)
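Conceptually, overrides act as a higher-priority layer over the built-in aliases. A minimal sketch with `collections.ChainMap`, assuming (the docs do not say) that an override wins when it collides with a built-in alias; `BUILT_IN` and `build_lookup` are hypothetical:

```python
from collections import ChainMap

# Hypothetical excerpt of a built-in alias table (alias -> canonical field).
BUILT_IN = {"fname": "first_name", "company": "company", "tags": "tags"}


def build_lookup(overrides=None):
    """Consult caller overrides first, then fall back to the built-in aliases."""
    # Copy the overrides so later writes to the ChainMap never touch the caller's dict.
    return ChainMap(dict(overrides or {}), BUILT_IN)


lookup = build_lookup({"MMERGE6": "company", "cf_lead_score": "tags"})
print(lookup.get("MMERGE6"))  # company
print(lookup.get("fname"))    # first_name
print(lookup.get("MMERGE7"))  # None
```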

📱 Phone Extraction

# Extract phones embedded in arbitrary string values
result = mapper.map_payload(
    {"notes": "call me at +1-650-253-0000 or +44 20 7946 0958"},
    extract_embedded_phones=True,
)
print(result.get_all_phones())
# ['+16502530000', '+442079460958']
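A rough stdlib approximation of embedded-phone extraction is a regex scan followed by digit stripping. This sketch only recognizes numbers written with an explicit `+` country code and performs no validity checks; rolodexter itself wraps libphonenumber, which also handles national formats. `extract_phones` is a hypothetical helper name.

```python
import re

# A '+', a digit, then digits/separators, ending on a digit.
EMBEDDED_PHONE = re.compile(r"\+\d[\d\s().-]{5,}\d")


def extract_phones(text: str) -> list[str]:
    """Return '+'-prefixed numbers found in free text as '+' followed by digits only."""
    phones = []
    for match in EMBEDDED_PHONE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # E.164 allows at most 15 digits; skip anything longer.
        if len(digits) <= 15:
            phones.append("+" + digits)
    return phones


print(extract_phones("call me at +1-650-253-0000 or +44 20 7946 0958"))
# ['+16502530000', '+442079460958']
```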

🗂️ Tags / List Fields

Fields such as `tags` are automatically list-normalized. Comma-separated strings, JSON arrays, and Python lists all collapse to a clean list:

result = mapper.map_payload({"tags": "vip, newsletter, beta"})
print(result.normalized["tags"])
# ['vip', 'newsletter', 'beta']
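The list collapsing described above can be approximated as follows. This is an illustrative sketch; treating a string that starts with `[` as a JSON array and dropping empty items are assumptions, and `to_tag_list` is a hypothetical name.

```python
import json


def to_tag_list(value) -> list[str]:
    """Collapse a comma-separated string, a JSON array string, or a list into clean tags."""
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            try:
                value = json.loads(text)  # e.g. '["vip", "beta"]'
            except json.JSONDecodeError:
                value = text.split(",")  # not valid JSON; treat as comma-separated
        else:
            value = text.split(",")
    # Strip whitespace around each item and drop empties.
    return [str(item).strip() for item in value if str(item).strip()]


print(to_tag_list("vip, newsletter, beta"))  # ['vip', 'newsletter', 'beta']
print(to_tag_list('["vip", "beta"]'))        # ['vip', 'beta']
print(to_tag_list([" vip ", "", "beta"]))    # ['vip', 'beta']
```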

🌍 On-Demand i18n (40 Languages)

English aliases ship by default. Request any of the 40 supported languages (this uses the `i18n` extra) and its aliases are generated on the fly via Google Translate, then cached so each language is translated only once:

from rolodexter import ContactMapper

# Load Spanish aliases on demand
mapper = ContactMapper(languages=["es"])
result = mapper.map_payload({"correo_electronico": "juan@example.com"})
print(result.normalized["email"])  # juan@example.com

# CLI: generate and cache all 40 languages
python -m rolodexter.i18n

# Or specific languages
python -m rolodexter.i18n --languages es,fr,de

# List supported languages
python -m rolodexter.i18n --list

Supported: Spanish, French, German, Portuguese, Italian, Dutch, Polish, Romanian, Turkish, Russian, Japanese, Chinese (Simplified), Korean, Arabic, Hindi, Swedish, Danish, Norwegian, Finnish, Czech, Ukrainian, Greek, Hungarian, Thai, Vietnamese, Indonesian, Malay, Hebrew, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Lithuanian, Latvian, Estonian, Catalan, Filipino, Swahili, Afrikaans.
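The translate-once behavior is essentially a file cache keyed by language. The sketch below shows that pattern with a stand-in translator so it runs offline; the cache layout (one JSON file per language code), `load_aliases`, and `fake_translate` are assumptions, not rolodexter's actual i18n module.

```python
import json
import tempfile
from pathlib import Path

calls = []  # every translation request, so the cache's effect is visible


def fake_translate(text: str, lang: str) -> str:
    """Offline stand-in for a real translation service (hypothetical)."""
    calls.append((text, lang))
    table = {("email", "es"): "correo electronico", ("first name", "es"): "nombre"}
    return table.get((text, lang), text)


def load_aliases(lang, english_aliases, translate, cache_dir):
    """Return {canonical: [aliases]} for `lang`, translating only on a cache miss."""
    cache_file = Path(cache_dir) / f"{lang}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    translated = {}
    for field, aliases in english_aliases.items():
        # Translate "first_name" as "first name", then fold back to snake_case.
        phrases = {translate(a.replace("_", " "), lang) for a in aliases}
        translated[field] = sorted(p.lower().replace(" ", "_") for p in phrases)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(translated, ensure_ascii=False), encoding="utf-8")
    return translated


english = {"email": ["email"], "first_name": ["first_name"]}
with tempfile.TemporaryDirectory() as cache_dir:
    first = load_aliases("es", english, fake_translate, cache_dir)
    second = load_aliases("es", english, fake_translate, cache_dir)  # read from disk

print(first)       # {'email': ['correo_electronico'], 'first_name': ['nombre']}
print(len(calls))  # 2 (one per alias on the first load, none on the second)
```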

🧹 Value Normalization

Matched fields are cleaned up automatically when `normalize=True` (the default):

- Phone → E.164 format via libphonenumber (`+16502530000`)
- Email → lowercased and trimmed
- Names → title case with particle awareness ("jane van der berg" → "Jane van der Berg")
- Addresses → excess whitespace collapsed and title-cased
- Tags → normalized to `list[str]`
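The email, name, and address rules are easy to approximate with the standard library. The sketch below is illustrative only: the particle set is a small assumed sample, and rolodexter's own name handling (it lists nameparser as a core dependency) is likely more thorough.

```python
# Assumed sample of lowercase name particles; not rolodexter's list.
NAME_PARTICLES = {"van", "der", "den", "von", "de", "la", "du", "da", "di", "del"}


def normalize_email(value: str) -> str:
    """Trim surrounding whitespace and lowercase."""
    return value.strip().lower()


def normalize_name(value: str) -> str:
    """Title-case each word, keeping particles lowercase after the first word."""
    out = []
    for i, word in enumerate(value.split()):
        lower = word.lower()
        if i > 0 and lower in NAME_PARTICLES:
            out.append(lower)
        else:
            out.append(lower[:1].upper() + lower[1:])
    return " ".join(out)


def normalize_address(value: str) -> str:
    """Collapse runs of whitespace, then title-case."""
    return " ".join(value.split()).title()


print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
print(normalize_name("jane van der berg"))         # Jane van der Berg
print(normalize_address("123  main   st"))         # 123 Main St
```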

📦 Batch Processing

results = mapper.map_batch([contact1, contact2, contact3, ...])

📈 Rich Diagnostics

result = mapper.map_payload(data)

print(result.match_rate)        # 0.857
print(result.matched_count)     # 6
print(result.unmatched_count)   # 1
print(result.get_all_phones())  # ['+16502530000']
print(result.to_dict())         # Full JSON-serializable report
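The summary numbers are related by simple arithmetic: `match_rate` is the fraction of fields matched, so 6 matched and 1 unmatched gives 6 / 7 ≈ 0.857. A hypothetical summary class computing the same figures from per-field strategies, treating strategy `'none'` as unmatched (an assumption based on the `FieldMatch` reference) and rounding to three places for display:

```python
from dataclasses import dataclass


@dataclass
class MatchSummary:
    """Hypothetical stand-in: one strategy name per input field."""
    strategies: list[str]

    @property
    def matched_count(self) -> int:
        return sum(1 for s in self.strategies if s != "none")

    @property
    def unmatched_count(self) -> int:
        return len(self.strategies) - self.matched_count

    @property
    def match_rate(self) -> float:
        # Guard against empty payloads to avoid division by zero.
        total = len(self.strategies)
        return round(self.matched_count / total, 3) if total else 0.0


summary = MatchSummary(["exact"] * 4 + ["normalized", "fuzzy", "none"])
print(summary.matched_count, summary.unmatched_count, summary.match_rate)  # 6 1 0.857
```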

🔢 Nested Payload Support

# Flatten one level of nesting with depth=2
result = mapper.map_payload(
    {"contact": {"fname": "Jane", "lname": "Doe"}},
    depth=2,
)
# Accesses "contact.fname" and "contact.lname"
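Depth-limited flattening into dot paths can be sketched recursively. Here `depth=1` leaves nested dicts untouched and each extra level unwraps one more layer of nesting, which is consistent with the `depth=2` example above; rolodexter's exact semantics and the `flatten` name are assumptions.

```python
def flatten(payload: dict, depth: int = 1, prefix: str = "") -> dict:
    """Join nested keys with '.', descending at most depth - 1 levels."""
    flat = {}
    for key, value in payload.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and depth > 1:
            # Recurse with one less level of depth and extend the key path.
            flat.update(flatten(value, depth - 1, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


nested = {"contact": {"fname": "Jane", "lname": "Doe"}}
print(flatten(nested, depth=2))  # {'contact.fname': 'Jane', 'contact.lname': 'Doe'}
print(flatten(nested, depth=1))  # {'contact': {'fname': 'Jane', 'lname': 'Doe'}}
```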

API Reference

`ContactMapper`

ContactMapper(
    *,
    patterns=None,        # Custom pattern dict (overrides built-in)
    patterns_path=None,   # Path to a custom patterns.json file
    normalize=True,       # Apply value normalization after mapping
    strategies=None,      # Override the default strategy pipeline
    languages=None,       # None=English only | "es" | ["es","fr"] | "all"
    overrides=None,       # Extra alias→canonical mappings {"MMERGE6": "company"}
)

Methods:

| Method | Description |
| --- | --- |
| `identify(header, *, value)` | Resolve a single header to a `FieldMatch` |
| `map_payload(payload, *, depth, extract_embedded_phones)` | Normalize an entire dict |
| `map_batch(payloads, *, depth)` | Process a list of payloads |
| `registry` | Access the underlying `PatternRegistry` |

`FieldMatch`

FieldMatch(
    original='fname',
    canonical='first_name',
    confidence=1.0,
    strategy='exact',      # 'exact' | 'normalized' | 'fuzzy' | 'heuristic' | 'none'
    is_matched=True,
)

`MappingResult`

| Attribute / Method | Type | Description |
| --- | --- | --- |
| `normalized` | `dict` | Canonical key → cleaned value |
| `unmapped` | `dict` | Fields that couldn't be resolved |
| `field_matches` | `tuple[FieldMatch, ...]` | Full match detail for every input field |
| `match_rate` | `float` | Fraction of fields successfully matched |
| `matched_count` | `int` | Count of matched fields |
| `unmatched_count` | `int` | Count of unmatched fields |
| `get_match(header)` | `FieldMatch \| None` | Look up the match for a specific input header |
| `get_all_phones()` | `list[str]` | All phone values across all phone-adjacent fields |
| `to_dict()` | `dict` | Full JSON-serializable report |

`CanonicalField`

Enum of all 62 canonical fields. It subclasses `str`, so members compare equal to their string values and serialize to JSON as plain strings:

from rolodexter import CanonicalField

assert CanonicalField.EMAIL == "email"
assert CanonicalField.PHONE.value == "phone"
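The `str` mixin is what makes members JSON-friendly: `json.dumps` treats them as ordinary strings, both as keys and as values. A self-contained illustration with a hypothetical two-member enum (not rolodexter's class):

```python
import json
from enum import Enum


class FieldSketch(str, Enum):
    """Hypothetical two-member stand-in for CanonicalField."""
    EMAIL = "email"
    PHONE = "phone"


record = {FieldSketch.EMAIL: "jane@example.com", "kind": FieldSketch.PHONE}
print(json.dumps(record))  # {"email": "jane@example.com", "kind": "phone"}
```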

All 62 canonical fields

first_name · last_name · full_name · email · phone · home_phone · work_phone · fax · whatsapp · address · address_line_2 · city · state · postal_code · country · company · job_title · department · website · linkedin · twitter · instagram · facebook · birthday · gender · language · timezone · currency · tags · notes · source · source_id · source_service · subscribed · verified · created_at · updated_at · ip_address · user_agent · referrer · utm_source · utm_medium · utm_campaign · utm_term · utm_content · revenue · lifetime_value · company_size · industry · message · subject · salutation · suffix · nickname · middle_name · maiden_name · preferred_name · pronouns · age · annual_income · score · unknown

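Custom Patterns

Supply your own alias table through `patterns` (it overrides the built-in table), or point `patterns_path` at a custom patterns.json file: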

custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}

mapper = ContactMapper(patterns=custom)
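A custom table in this shape maps each canonical field to its aliases. Before passing one in, it can help to invert it and confirm that no alias is claimed by two fields. The helper below is a hypothetical pre-flight check, not something rolodexter is documented to provide, and its case-insensitive comparison is an assumption.

```python
def invert_patterns(patterns: dict) -> dict:
    """Build alias -> canonical, raising if one alias maps to two fields."""
    alias_to_field = {}
    for field, aliases in patterns["fields"].items():
        for alias in aliases:
            key = alias.casefold()  # compare aliases case-insensitively
            owner = alias_to_field.get(key)
            if owner is not None and owner != field:
                raise ValueError(f"alias {alias!r} claimed by both {owner!r} and {field!r}")
            alias_to_field[key] = field
    return alias_to_field


custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}
lookup = invert_patterns(custom)
print(lookup["vip_level"])  # loyalty_tier
```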

Architecture

rolodexter/
├── __init__.py      # Public API
├── core.py          # ContactMapper, PatternRegistry, strategies, normalizers
├── _phone.py        # E.164 phone parser (wraps libphonenumber)
├── i18n.py          # On-demand i18n generator (40 languages, cached)
├── patterns.json    # Master alias table (615+ aliases, 62 canonical fields)
└── i18n/            # Cached language files (generated on demand)

Contributing

git clone https://github.com/LunarWerx/rolodexter.git
cd rolodexter
pip install -e ".[dev]"
pytest

License

MIT — see LICENSE.


Release history

- 2.6.5 (Mar 2, 2026)
- 2.6.4 (Mar 2, 2026), this version
- 2.6.3 (Mar 2, 2026)
- 2.6.2 (Mar 2, 2026)
- 2.6.1 (Mar 2, 2026)
- 2.6.0 (Mar 2, 2026)

Download files

Source Distribution

rolodexter-2.6.4.tar.gz (66.0 kB), uploaded Mar 2, 2026

Built Distribution

rolodexter-2.6.4-py3-none-any.whl (33.8 kB, Python 3), uploaded Mar 2, 2026

File details

Details for the file rolodexter-2.6.4.tar.gz.

File metadata

Download URL: rolodexter-2.6.4.tar.gz
Upload date: Mar 2, 2026
Size: 66.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for rolodexter-2.6.4.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `1d8a32b95f3d635cb451fe15d5fa2ce37272d67ac9d5d9055d0f07e15d978f80` |
| MD5 | `42c7f7ba5033963ac5c80ab9c8a789c3` |
| BLAKE2b-256 | `e0e5b0506a884f245eb0f7d5fd749a8b02bf4d05dad56110c265819573b539cc` |

File details

Details for the file rolodexter-2.6.4-py3-none-any.whl.

File metadata

Download URL: rolodexter-2.6.4-py3-none-any.whl
Upload date: Mar 2, 2026
Size: 33.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for rolodexter-2.6.4-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b4e622e0736f7a846d8c67f8aaea9a74e4f4a8095da0d12c6c4db791e0095b1d` |
| MD5 | `f92ab637909592eb1bd97b2578ff2105` |
| BLAKE2b-256 | `1d0129f0c58075a45d1ad0a7268a1b94063ace37377c4daf515371a73dc699e2` |
