RoloDexter

The universal contact field mapper.

Route messy, inconsistent contact data from any source to a clean, canonical schema.


The Problem

Every CRM, email platform, and CSV export uses different field names for the same data:

Service      First Name   Phone             Company
HubSpot      firstname    mobilephone       company
Salesforce   FirstName    MobilePhone       Company
Mailchimp    FNAME        PHONE             COMPANY
Google CSV   Given Name   Phone 1 - Value   Organization 1 - Name
Random CSV   Column A     Column B          Column C

The Solution

from rolodexter import ContactMapper

mapper = ContactMapper()

result = mapper.map_payload({
    "fname": "jane",
    "surname": "doe",
    "mobile": "+1-650-253-0000",
    "employer": "Tech Corp",
    "Column 1": "jane.doe@example.com",  # auto-detected by value shape
})

print(result.normalized)
# {
#     "first_name": "Jane",
#     "last_name": "Doe",
#     "phone": "+16502530000",
#     "company": "Tech Corp",
#     "email": "jane.doe@example.com"
# }

Installation

# Core (phonenumbers + nameparser)
pip install rolodexter

# Extras are quoted so shells such as zsh do not expand the brackets

# With fuzzy matching for typo recovery
pip install "rolodexter[fuzzy]"

# With on-demand i18n translation (40 languages)
pip install "rolodexter[i18n]"

# Everything
pip install "rolodexter[all]"

# Development
pip install "rolodexter[dev]"

Features

🎯 Four-Layer Matching Pipeline

Every field runs through the strategy chain in priority order:

  1. Exact Match: O(1) lookup against 615+ known aliases across 62 canonical fields
  2. Normalized Match: handles CamelCase, dot.path, space → underscore, and similar variations
  3. Fuzzy Match: rapidfuzz catches typos like "phne_nmbr" → phone
  4. Heuristic Match: regex detects emails, phones, URLs, and postal codes by data shape
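
The priority-ordered chain above can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not RoloDexter's implementation: the tiny `ALIASES` table, the `normalize_key` helper, and this standalone `identify` function are all hypothetical, `difflib` stands in for rapidfuzz, and the 0.75 cutoff is an arbitrary choice.

```python
import difflib
import re

# Hypothetical miniature alias table; the real one ships in patterns.json.
ALIASES = {
    "fname": "first_name",
    "first_name": "first_name",
    "phone": "phone",
    "phone_number": "phone",
    "email": "email",
}

# Deliberately loose email shape check, used only by the heuristic layer.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_key(header):
    """Lower-case a header and turn CamelCase, dots, dashes and spaces into underscores."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", header)
    return re.sub(r"[\s.\-]+", "_", snake).strip("_").lower()


def identify(header, value=None):
    """Return (canonical, strategy), trying each layer in priority order."""
    if header in ALIASES:  # 1. exact: plain dict lookup
        return ALIASES[header], "exact"
    key = normalize_key(header)
    if key in ALIASES:  # 2. normalized: lookup after key cleanup
        return ALIASES[key], "normalized"
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=0.75)
    if close:  # 3. fuzzy: nearest alias above the cutoff
        return ALIASES[close[0]], "fuzzy"
    if isinstance(value, str) and EMAIL_RE.match(value.strip()):
        return "email", "heuristic"  # 4. heuristic: decide by value shape
    return None, "none"
```

Because each layer returns as soon as it succeeds, cheaper and more reliable strategies always win over the fuzzy and heuristic fallbacks.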

📊 Confidence Scoring

Every match comes with a confidence score (0.0–1.0):

match = mapper.identify("fname")
# FieldMatch(original='fname', canonical='first_name', confidence=1.0, strategy='exact')

match = mapper.identify("phne")
# FieldMatch(original='phne', canonical='phone', confidence=0.85, strategy='fuzzy')

match = mapper.identify("Column X", value="jane@test.com")
# FieldMatch(original='Column X', canonical='email', confidence=0.6, strategy='heuristic')
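
One way a caller might use these scores is to accept high-confidence matches automatically and send the rest to manual review. The sketch below is an assumption about usage, not part of the library: `Match` is a hypothetical stand-in carrying the same fields as the FieldMatch reprs above, and the 0.8 threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in with the fields shown in the FieldMatch reprs above."""
    original: str
    canonical: str
    confidence: float
    strategy: str


def split_by_confidence(matches, threshold=0.8):
    """Partition matches into (accepted, needs_review) around a confidence threshold."""
    accepted = [m for m in matches if m.confidence >= threshold]
    needs_review = [m for m in matches if m.confidence < threshold]
    return accepted, needs_review


matches = [
    Match("fname", "first_name", 1.0, "exact"),
    Match("phne", "phone", 0.85, "fuzzy"),
    Match("Column X", "email", 0.6, "heuristic"),
]
accepted, needs_review = split_by_confidence(matches)
```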

Per-Caller Field Overrides

For vendor-specific or account-level field names that won't be in the standard alias table:

mapper = ContactMapper(
    overrides={
        "MMERGE6": "company",   # Mailchimp custom merge field
        "cf_lead_score": "tags",
    }
)
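
Conceptually, overrides behave like an extra alias layer consulted alongside the built-in table. A minimal sketch of that layering, assuming overrides take precedence over built-in aliases (the source does not state the precedence) and using a hypothetical `BUILT_IN` subset and `build_lookup` helper:

```python
from collections import ChainMap

# Hypothetical subset of a built-in alias table.
BUILT_IN = {"company": "company", "employer": "company", "tags": "tags"}


def build_lookup(overrides=None):
    """Layer caller overrides in front of the built-in aliases without copying them."""
    return ChainMap(dict(overrides or {}), BUILT_IN)


lookup = build_lookup({"MMERGE6": "company", "cf_lead_score": "tags"})
```

`ChainMap` searches the override layer first and falls back to the shared table, so per-caller mappings never mutate the built-in aliases.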

📱 Phone Extraction

# Extract phones embedded in arbitrary string values
result = mapper.map_payload(
    {"notes": "call me at +1-650-253-0000 or +44 20 7946 0958"},
    extract_embedded_phones=True,
)
print(result.get_all_phones())
# ['+16502530000', '+442079460958']
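
To make the idea of embedded extraction concrete, here is a deliberately simplified, regex-only sketch. It is not how RoloDexter does it (the library relies on libphonenumber for parsing and validation); the pattern and the `extract_phone_candidates` helper are hypothetical and only handle numbers written with a leading `+` and country code.

```python
import re

# A "+", a digit, then at least six digits/separators, ending on a digit.
PHONE_CANDIDATE = re.compile(r"\+\d[\d\s().-]{6,}\d")


def extract_phone_candidates(text):
    """Find +-prefixed phone-like runs and strip everything except the digits."""
    return ["+" + re.sub(r"\D", "", m.group()) for m in PHONE_CANDIDATE.finditer(text)]


found = extract_phone_candidates("call me at +1-650-253-0000 or +44 20 7946 0958")
# found == ['+16502530000', '+442079460958']
```

Unlike a real phone parser, this sketch performs no validation, so it would happily accept digit runs that are not assigned numbers.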

🗂️ Tags / List Fields

Fields like tags are automatically normalized to lists: comma-separated strings, JSON arrays, and Python lists all collapse to a clean list.

result = mapper.map_payload({"tags": "vip, newsletter, beta"})
print(result.normalized["tags"])
# ['vip', 'newsletter', 'beta']
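
The three input shapes mentioned above can be collapsed with a small helper. This is a sketch of the technique under stated assumptions, not the library's code: `to_tag_list` is a hypothetical name, and dropping empty entries is an assumed behavior.

```python
import json


def to_tag_list(value):
    """Collapse a comma-separated string, a JSON array string, or a list into list[str]."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        items = list(value)
    else:
        text = str(value).strip()
        items = None
        if text.startswith("["):
            # Looks like a JSON array; fall back to comma splitting if it is not valid.
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, list):
                items = parsed
        if items is None:
            items = text.split(",")
    cleaned = [str(item).strip() for item in items]
    return [tag for tag in cleaned if tag]  # drop empty entries
```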

🌍 On-Demand i18n (40 Languages)

English ships by default. Request any of 40 supported languages and aliases are generated on the fly via Google Translate, then cached so translation only happens once:

from rolodexter import ContactMapper

# Load Spanish aliases on demand
mapper = ContactMapper(languages=["es"])
result = mapper.map_payload({"correo_electronico": "juan@example.com"})
print(result.normalized["email"])  # juan@example.com

# CLI: generate and cache all 40 languages
python -m rolodexter.i18n

# Or specific languages
python -m rolodexter.i18n --languages es,fr,de

# List supported languages
python -m rolodexter.i18n --list

Supported: Spanish, French, German, Portuguese, Italian, Dutch, Polish, Romanian, Turkish, Russian, Japanese, Chinese (Simplified), Korean, Arabic, Hindi, Swedish, Danish, Norwegian, Finnish, Czech, Ukrainian, Greek, Hungarian, Thai, Vietnamese, Indonesian, Malay, Hebrew, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Lithuanian, Latvian, Estonian, Catalan, Filipino, Swahili, Afrikaans.
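
The generate-once-then-cache behavior described in this section follows a common pattern, sketched below. Everything here is hypothetical: the `load_aliases` helper, the one-JSON-file-per-language layout, and the `fake_translate` stand-in for a translation backend are assumptions made for illustration, not RoloDexter internals.

```python
import json
import tempfile
from pathlib import Path


def load_aliases(lang, translate, cache_dir):
    """Return aliases for `lang`, calling `translate` only on a cache miss."""
    cache_file = Path(cache_dir) / f"{lang}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    aliases = translate(lang)
    cache_file.write_text(json.dumps(aliases, ensure_ascii=False), encoding="utf-8")
    return aliases


calls = []


def fake_translate(lang):
    """Stand-in for a real translation backend; records each invocation."""
    calls.append(lang)
    return {"correo_electronico": "email"}


with tempfile.TemporaryDirectory() as tmp:
    first = load_aliases("es", fake_translate, tmp)   # cache miss: translates
    second = load_aliases("es", fake_translate, tmp)  # cache hit: reads the file
```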

🧹 Value Normalization

Automatic cleanup on matched fields:

  • Phone → E.164 format via libphonenumber (+16502530000)
  • Email → lowercase, trimmed
  • Names → title case with particle awareness ("jane van der berg" → "Jane van der Berg")
  • Addresses → excess whitespace collapsed, title-cased
  • Tags → normalized to list[str]
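
Two of the rules above, sketched in plain Python. These are simplified stand-ins rather than the library's normalizers (which build on nameparser): the `PARTICLES` set is illustrative and far from exhaustive, and both function names are hypothetical.

```python
# Illustrative particle list; real name handling needs a much larger set.
PARTICLES = {"van", "der", "de", "von", "la", "di"}


def title_case_name(name):
    """Title-case each word, keeping known particles lower-case unless they come first."""
    words = name.split()
    result = []
    for position, word in enumerate(words):
        lowered = word.lower()
        if position > 0 and lowered in PARTICLES:
            result.append(lowered)
        else:
            result.append(lowered.capitalize())
    return " ".join(result)


def normalize_email(email):
    """Trim surrounding whitespace and lower-case the address."""
    return email.strip().lower()
```

Splitting on whitespace also collapses repeated spaces, which is the same cleanup the address rule describes.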

📦 Batch Processing

results = mapper.map_batch([contact1, contact2, contact3, ...])

📈 Rich Diagnostics

result = mapper.map_payload(data)

print(result.match_rate)        # 0.857
print(result.matched_count)     # 6
print(result.unmatched_count)   # 1
print(result.get_all_phones())  # ['+16502530000']
print(result.to_dict())         # Full JSON-serializable report
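
The sample numbers above are consistent with match_rate being the matched count divided by the total field count (6 of 7 fields is roughly 0.857). A small check of that arithmetic, where the standalone `match_rate` function and its zero-field behavior are assumptions:

```python
def match_rate(matched_count, unmatched_count):
    """Fraction of fields matched; returns 0.0 for an empty payload (assumed behavior)."""
    total = matched_count + unmatched_count
    return matched_count / total if total else 0.0


rate = round(match_rate(6, 1), 3)
# rate == 0.857
```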

🔢 Nested Payload Support

# Flatten one level of nesting with depth=2
result = mapper.map_payload(
    {"contact": {"fname": "Jane", "lname": "Doe"}},
    depth=2,
)
# Accesses "contact.fname" and "contact.lname"
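
Depth-limited flattening into dotted keys can be sketched as a short recursive function. This is an illustration of the technique only: the `flatten` helper is hypothetical, and it assumes that depth=1 means "top level only", in line with depth=2 flattening one level of nesting.

```python
def flatten(payload, depth=1, prefix=""):
    """Flatten nested dicts into dotted keys, descending at most depth - 1 levels."""
    flat = {}
    for key, value in payload.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and depth > 1:
            flat.update(flatten(value, depth - 1, prefix=f"{path}."))
        else:
            flat[path] = value  # leaf, or nesting deeper than the allowed depth
    return flat


nested = {"contact": {"fname": "Jane", "lname": "Doe"}}
flat = flatten(nested, depth=2)
# flat == {"contact.fname": "Jane", "contact.lname": "Doe"}
```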

API Reference

ContactMapper

ContactMapper(
    *,
    patterns=None,        # Custom pattern dict (overrides built-in)
    patterns_path=None,   # Path to a custom patterns.json file
    normalize=True,       # Apply value normalization after mapping
    strategies=None,      # Override the default strategy pipeline
    languages=None,       # None=English only | "es" | ["es","fr"] | "all"
    overrides=None,       # Extra alias→canonical mappings {"MMERGE6": "company"}
)

Methods:

Method                                                    Description
identify(header, *, value)                                Resolve a single header to a FieldMatch
map_payload(payload, *, depth, extract_embedded_phones)   Normalize an entire dict
map_batch(payloads, *, depth)                             Process a list of payloads
registry                                                  Access the underlying PatternRegistry

FieldMatch

FieldMatch(
    original='fname',
    canonical='first_name',
    confidence=1.0,
    strategy='exact',      # 'exact' | 'normalized' | 'fuzzy' | 'heuristic' | 'none'
    is_matched=True,
)

MappingResult

Attribute / Method   Type                     Description
normalized           dict                     Canonical key → cleaned value
unmapped             dict                     Fields that couldn't be resolved
field_matches        tuple[FieldMatch, ...]   Full match detail for every input field
match_rate           float                    Fraction of fields successfully matched
matched_count        int                      Count of matched fields
unmatched_count      int                      Count of unmatched fields
get_match(header)    FieldMatch | None        Look up the match for a specific input header
get_all_phones()     list[str]                All phone values across all phone-adjacent fields
to_dict()            dict                     Full JSON-serializable report

CanonicalField

Enum of all 62 canonical fields. Inherits from str for JSON compatibility:

from rolodexter import CanonicalField

assert CanonicalField.EMAIL == "email"
assert CanonicalField.PHONE.value == "phone"
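
Why inheriting from str helps with JSON can be seen with a two-member stand-in enum. `Field` below is hypothetical and only mirrors the pattern; it is not the real CanonicalField.

```python
import json
from enum import Enum


class Field(str, Enum):
    """Hypothetical two-member enum using the same str-mixin pattern."""
    EMAIL = "email"
    PHONE = "phone"


# Members are real strings, so json.dumps needs no custom encoder.
encoded = json.dumps({"field": Field.EMAIL})

# Plain strings read back from JSON convert to members by value.
decoded = Field(json.loads(encoded)["field"])
```

A plain `Enum` without the str mixin would raise `TypeError` in `json.dumps`, which is the practical benefit of this design.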

All 62 canonical fields:

first_name · last_name · full_name · email · phone · home_phone · work_phone · fax · whatsapp · address · address_line_2 · city · state · postal_code · country · company · job_title · department · website · linkedin · twitter · instagram · facebook · birthday · gender · language · timezone · currency · tags · notes · source · source_id · source_service · subscribed · verified · created_at · updated_at · ip_address · user_agent · referrer · utm_source · utm_medium · utm_campaign · utm_term · utm_content · revenue · lifetime_value · company_size · industry · message · subject · salutation · suffix · nickname · middle_name · maiden_name · preferred_name · pronouns · age · annual_income · score · unknown

Custom Patterns

custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}

mapper = ContactMapper(patterns=custom)
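
A patterns dict of this shape (canonical name → list of aliases) is convenient to write, but lookups go the other way. A sketch of inverting it into an alias index, where the `build_alias_index` helper, the lower-casing, and treating each canonical name as its own alias are all assumptions for illustration:

```python
def build_alias_index(patterns):
    """Invert {"fields": {canonical: [aliases]}} into a flat alias -> canonical dict."""
    index = {}
    for canonical, aliases in patterns["fields"].items():
        index[canonical.lower()] = canonical  # the canonical name maps to itself
        for alias in aliases:
            index[alias.lower()] = canonical
    return index


custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}
index = build_alias_index(custom)
```

With the index built once, each header resolves with a single dictionary lookup, which is what makes an exact-match layer O(1).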

Architecture

rolodexter/
├── __init__.py      # Public API
├── core.py          # ContactMapper, PatternRegistry, strategies, normalizers
├── _phone.py        # E.164 phone parser (wraps libphonenumber)
├── i18n.py          # On-demand i18n generator (40 languages, cached)
├── patterns.json    # Master alias table (615+ aliases, 62 canonical fields)
└── i18n/            # Cached language files (generated on demand)

Contributing

git clone https://github.com/LunarWerx/rolodexter.git
cd rolodexter
pip install -e ".[dev]"
pytest

License

MIT — see LICENSE.
