The universal contact field mapper — route messy, inconsistent contact data to a clean canonical schema.
The Problem
Every CRM, email platform, and CSV export uses different field names for the same data:
| Service | First Name | Phone | Company |
|---|---|---|---|
| HubSpot | firstname | mobilephone | company |
| Salesforce | FirstName | MobilePhone | Company |
| Mailchimp | FNAME | PHONE | COMPANY |
| Google CSV | Given Name | Phone 1 - Value | Organization 1 - Name |
| Random CSV | Column A | Column B | Column C |
The Solution
from rolodexter import ContactMapper
mapper = ContactMapper()
result = mapper.map_payload({
    "fname": "jane",
    "surname": "doe",
    "mobile": "+1-650-253-0000",
    "employer": "Tech Corp",
    "Column 1": "jane.doe@example.com",  # auto-detected by value shape
})
print(result.normalized)
# {
#     "first_name": "Jane",
#     "last_name": "Doe",
#     "phone": "+16502530000",
#     "company": "Tech Corp",
#     "email": "jane.doe@example.com"
# }
Installation
# Core (phonenumbers + nameparser)
pip install rolodexter
# With fuzzy matching for typo recovery
pip install rolodexter[fuzzy]
# With on-demand i18n translation (40 languages)
pip install rolodexter[i18n]
# Everything
pip install rolodexter[all]
# Development
pip install rolodexter[dev]
Features
🎯 Four-Layer Matching Pipeline
Every field runs through the strategy chain in priority order:
- Exact Match: O(1) lookup against 615+ known aliases across 62 canonical fields
- Normalized Match: handles CamelCase, dot.path, space → underscore, and similar variations
- Fuzzy Match: rapidfuzz catches typos like "phne_nmbr" → phone
- Heuristic Match: regex detects emails, phones, URLs, and postal codes by data shape
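The four layers above can be sketched as a simple fall-through chain. This is a minimal illustration, not rolodexter's implementation: the alias table is a tiny hypothetical subset, the standard library's `difflib` stands in for rapidfuzz, the 0.7 cutoff is an assumption, and only the email heuristic is shown.

```python
import re
from difflib import get_close_matches

# Hypothetical miniature alias table; the real one is far larger.
ALIASES = {
    "fname": "first_name",
    "first_name": "first_name",
    "phone_number": "phone",
    "phone": "phone",
    "email": "email",
}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_header(header: str) -> str:
    """Turn CamelCase, dot.path and spaced headers into snake_case."""
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", header)  # split CamelCase
    s = re.sub(r"[.\s\-]+", "_", s)  # dots, spaces, hyphens -> underscore
    return s.strip("_").lower()


def identify(header, value=None):
    """Return (canonical, strategy), trying each layer in priority order."""
    if header in ALIASES:  # 1. exact dictionary lookup
        return ALIASES[header], "exact"
    norm = normalize_header(header)
    if norm in ALIASES:  # 2. lookup after normalizing the header
        return ALIASES[norm], "normalized"
    close = get_close_matches(norm, list(ALIASES), n=1, cutoff=0.7)
    if close:  # 3. nearest alias by string similarity
        return ALIASES[close[0]], "fuzzy"
    if isinstance(value, str) and EMAIL_RE.match(value.strip()):
        return "email", "heuristic"  # 4. guess from the value's shape
    return None, "none"


print(identify("fname"))                            # ('first_name', 'exact')
print(identify("PhoneNumber"))                      # ('phone', 'normalized')
print(identify("phne_nmbr"))                        # ('phone', 'fuzzy')
print(identify("Column X", value="jane@test.com"))  # ('email', 'heuristic')
```

Ordering the layers from cheapest and most certain to most speculative means a header only reaches the fuzzy or heuristic step when nothing more reliable matched.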
📊 Confidence Scoring
Every match comes with a confidence score (0.0–1.0):
match = mapper.identify("fname")
# FieldMatch(original='fname', canonical='first_name', confidence=1.0, strategy='exact')
match = mapper.identify("phne")
# FieldMatch(original='phne', canonical='phone', confidence=0.85, strategy='fuzzy')
match = mapper.identify("Column X", value="jane@test.com")
# FieldMatch(original='Column X', canonical='email', confidence=0.6, strategy='heuristic')
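One way to use these scores is to accept high-confidence matches automatically and queue the rest for review. The sketch below uses a stand-in `Match` dataclass that mirrors the fields shown above rather than the library's own class, and the 0.8 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Stand-in for the FieldMatch shape shown above (not the library class)."""
    original: str
    canonical: str
    confidence: float
    strategy: str


def split_by_confidence(matches, threshold=0.8):
    """Partition matches into (accepted, needs_review) at the threshold."""
    accepted = [m for m in matches if m.confidence >= threshold]
    needs_review = [m for m in matches if m.confidence < threshold]
    return accepted, needs_review


matches = [
    Match("fname", "first_name", 1.0, "exact"),
    Match("phne", "phone", 0.85, "fuzzy"),
    Match("Column X", "email", 0.6, "heuristic"),
]
accepted, needs_review = split_by_confidence(matches)
print([m.original for m in accepted])      # ['fname', 'phne']
print([m.original for m in needs_review])  # ['Column X']
```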
Per-Caller Field Overrides
For vendor-specific or account-level field names that won't be in the standard alias table:
mapper = ContactMapper(
    overrides={
        "MMERGE6": "company",  # Mailchimp custom merge field
        "cf_lead_score": "tags",
    }
)
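Conceptually, overrides act as an extra alias layer on top of the standard table. The sketch below assumes caller overrides win when a key appears in both; the source does not state the precedence, and `BASE_ALIASES` and `build_lookup` are hypothetical names used only for illustration.

```python
# Hypothetical slice of a standard alias table.
BASE_ALIASES = {"company": "company", "employer": "company", "tags": "tags"}


def build_lookup(base, overrides=None):
    """Merge caller overrides over the base table; overrides win on conflict."""
    lookup = dict(base)  # copy so the shared base table is never mutated
    lookup.update(overrides or {})
    return lookup


lookup = build_lookup(
    BASE_ALIASES,
    {"MMERGE6": "company", "cf_lead_score": "tags"},
)
print(lookup["MMERGE6"])        # company
print(lookup["cf_lead_score"])  # tags
print(lookup["employer"])       # company
```

Copying the base table per mapper keeps one caller's account-specific names from leaking into another caller's lookups.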
📱 Phone Extraction
# Extract phones embedded in arbitrary string values
result = mapper.map_payload(
    {"notes": "call me at +1-650-253-0000 or +44 20 7946 0958"},
    extract_embedded_phones=True,
)
print(result.get_all_phones())
# ['+16502530000', '+442079460958']
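To illustrate the idea of pulling numbers out of free text, here is a deliberately simplified, standard-library-only sketch. It only recognizes numbers written with a leading `+` and does none of the validation that libphonenumber performs, so treat it as a picture of the technique rather than the library's behaviour.

```python
import re

# Simplified candidate pattern: a leading "+", then digits mixed with
# common separators, ending on a digit.
PHONE_RE = re.compile(r"\+\d[\d\s().-]{6,}\d")


def extract_phones(text: str) -> list[str]:
    """Return '+digits' strings for each international-looking number."""
    found = []
    for candidate in PHONE_RE.findall(text):
        digits = re.sub(r"\D", "", candidate)  # drop separators
        if len(digits) <= 15:  # E.164 allows at most 15 digits
            found.append("+" + digits)
    return found


print(extract_phones("call me at +1-650-253-0000 or +44 20 7946 0958"))
# ['+16502530000', '+442079460958']
```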
🗂️ Tags / List Fields
Fields like tags are automatically normalized to lists: comma-separated strings, JSON arrays, and Python lists all collapse to a clean list:
result = mapper.map_payload({"tags": "vip, newsletter, beta"})
print(result.normalized["tags"])
# ['vip', 'newsletter', 'beta']
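A minimal sketch of this kind of list normalization, assuming the three input shapes named above. `to_tag_list` is a hypothetical helper, not part of rolodexter's API.

```python
import json


def to_tag_list(value) -> list[str]:
    """Collapse a list, JSON array string, or comma-separated string."""
    if isinstance(value, (list, tuple)):
        items = list(value)
    elif isinstance(value, str):
        text = value.strip()
        items = None
        if text.startswith("["):
            try:
                parsed = json.loads(text)
                if isinstance(parsed, list):
                    items = parsed
            except json.JSONDecodeError:
                pass  # not valid JSON; fall back to comma splitting
        if items is None:
            items = text.split(",")
    else:
        items = [value]
    # Trim whitespace and drop empty entries.
    return [str(item).strip() for item in items if str(item).strip()]


print(to_tag_list("vip, newsletter, beta"))  # ['vip', 'newsletter', 'beta']
print(to_tag_list('["vip", "beta"]'))        # ['vip', 'beta']
print(to_tag_list(["vip ", "beta"]))         # ['vip', 'beta']
```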
🌍 On-Demand i18n (40 Languages)
English ships by default. Request any of 40 supported languages and aliases are generated on the fly via Google Translate, then cached so translation only happens once:
from rolodexter import ContactMapper
# Load Spanish aliases on demand
mapper = ContactMapper(languages=["es"])
result = mapper.map_payload({"correo_electronico": "juan@example.com"})
print(result.normalized["email"]) # juan@example.com
# CLI: generate and cache all 40 languages
python -m rolodexter.i18n
# Or specific languages
python -m rolodexter.i18n --languages es,fr,de
# List supported languages
python -m rolodexter.i18n --list
Supported: Spanish, French, German, Portuguese, Italian, Dutch, Polish, Romanian, Turkish, Russian, Japanese, Chinese (Simplified), Korean, Arabic, Hindi, Swedish, Danish, Norwegian, Finnish, Czech, Ukrainian, Greek, Hungarian, Thai, Vietnamese, Indonesian, Malay, Hebrew, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Lithuanian, Latvian, Estonian, Catalan, Filipino, Swahili, Afrikaans.
🧹 Value Normalization
Automatic cleanup on matched fields:
- Phone → E.164 format via libphonenumber (+16502530000)
- Email → lowercase, trimmed
- Names → title case with particle awareness ("jane van der berg" → "Jane van der Berg")
- Addresses → excess whitespace collapsed, title-cased
- Tags → normalized to list[str]
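The email, name, and whitespace rules above can be sketched with plain string handling. This is an approximation: the package installs nameparser for name handling, whereas the particle set below is a small illustrative subset and the function names are hypothetical.

```python
import re

# Lowercase surname particles; an illustrative subset, not an exhaustive list.
PARTICLES = {"van", "der", "den", "de", "von", "la", "di", "da"}


def normalize_email(value: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return value.strip().lower()


def normalize_name(value: str) -> str:
    """Title-case each word, keeping particles lowercase after the first word."""
    out = []
    for i, word in enumerate(value.split()):
        lower = word.lower()
        if i > 0 and lower in PARTICLES:
            out.append(lower)
        else:
            out.append(lower.capitalize())
    return " ".join(out)


def collapse_whitespace(value: str) -> str:
    """Replace runs of whitespace with a single space."""
    return re.sub(r"\s+", " ", value).strip()


print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
print(normalize_name("jane van der berg"))         # Jane van der Berg
print(collapse_whitespace("12   Main \t St "))     # 12 Main St
```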
📦 Batch Processing
results = mapper.map_batch([contact1, contact2, contact3, ...])
📈 Rich Diagnostics
result = mapper.map_payload(data)
print(result.match_rate) # 0.857
print(result.matched_count) # 6
print(result.unmatched_count) # 1
print(result.get_all_phones()) # ['+16502530000']
print(result.to_dict()) # Full JSON-serializable report
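The figures in this example are consistent with each other if the match rate is the matched count divided by the total number of input fields. The helper below is a hypothetical restatement of that arithmetic, and the three-decimal rounding is assumed from the value shown.

```python
def match_rate(matched: int, unmatched: int) -> float:
    """Fraction of fields matched, rounded to three decimals."""
    total = matched + unmatched
    return round(matched / total, 3) if total else 0.0


print(match_rate(6, 1))  # 0.857
```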
🔢 Nested Payload Support
# Flatten one level of nesting with depth=2
result = mapper.map_payload(
    {"contact": {"fname": "Jane", "lname": "Doe"}},
    depth=2,
)
# Accesses "contact.fname" and "contact.lname"
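Depth-limited flattening of this kind can be sketched as a short recursive function. `flatten` is a hypothetical helper; it assumes `depth=1` means top-level keys only and that nested keys are joined with a dot, as in the paths shown above.

```python
def flatten(payload: dict, depth: int = 1, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, descending at most `depth` levels."""
    flat = {}
    for key, value in payload.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and depth > 1:
            # Spend one level of depth to descend into the nested dict.
            flat.update(flatten(value, depth - 1, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


nested = {"contact": {"fname": "Jane", "lname": "Doe"}}
print(flatten(nested, depth=2))
# {'contact.fname': 'Jane', 'contact.lname': 'Doe'}
print(flatten(nested, depth=1))
# {'contact': {'fname': 'Jane', 'lname': 'Doe'}}
```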
API Reference
ContactMapper
ContactMapper(
    *,
    patterns=None,       # Custom pattern dict (overrides built-in)
    patterns_path=None,  # Path to a custom patterns.json file
    normalize=True,      # Apply value normalization after mapping
    strategies=None,     # Override the default strategy pipeline
    languages=None,      # None=English only | "es" | ["es","fr"] | "all"
    overrides=None,      # Extra alias→canonical mappings {"MMERGE6": "company"}
)
Methods:
| Method | Description |
|---|---|
| `identify(header, *, value)` | Resolve a single header to a FieldMatch |
| `map_payload(payload, *, depth, extract_embedded_phones)` | Normalize an entire dict |
| `map_batch(payloads, *, depth)` | Process a list of payloads |
| `registry` | Access the underlying PatternRegistry |
FieldMatch
FieldMatch(
    original='fname',
    canonical='first_name',
    confidence=1.0,
    strategy='exact',  # 'exact' | 'normalized' | 'fuzzy' | 'heuristic' | 'none'
    is_matched=True,
)
MappingResult
| Attribute / Method | Type | Description |
|---|---|---|
| `normalized` | `dict` | Canonical key → cleaned value |
| `unmapped` | `dict` | Fields that couldn't be resolved |
| `field_matches` | `tuple[FieldMatch, ...]` | Full match detail for every input field |
| `match_rate` | `float` | Fraction of fields successfully matched |
| `matched_count` | `int` | Count of matched fields |
| `unmatched_count` | `int` | Count of unmatched fields |
| `get_match(header)` | `FieldMatch \| None` | Look up the match for a specific input header |
| `get_all_phones()` | `list[str]` | All phone values across all phone-adjacent fields |
| `to_dict()` | `dict` | Full JSON-serializable report |
CanonicalField
Enum of all 62 canonical fields. Inherits from str for JSON compatibility:
from rolodexter import CanonicalField
assert CanonicalField.EMAIL == "email"
assert CanonicalField.PHONE.value == "phone"
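The benefit of inheriting from `str` can be seen with a small stand-in enum. `Field` below is a hypothetical two-member example, not the library's `CanonicalField`: members compare equal to plain strings and pass through `json.dumps` without a custom encoder.

```python
import json
from enum import Enum


class Field(str, Enum):
    """Two-member stand-in for a str-based enum."""
    EMAIL = "email"
    PHONE = "phone"


print(Field.EMAIL == "email")                # True
print(json.dumps({"field": Field.EMAIL}))    # {"field": "email"}
print(Field("phone") is Field.PHONE)         # True
```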
All 62 canonical fields
first_name · last_name · full_name · email · phone · home_phone · work_phone · fax · whatsapp · address · address_line_2 · city · state · postal_code · country · company · job_title · department · website · linkedin · twitter · instagram · facebook · birthday · gender · language · timezone · currency · tags · notes · source · source_id · source_service · subscribed · verified · created_at · updated_at · ip_address · user_agent · referrer · utm_source · utm_medium · utm_campaign · utm_term · utm_content · revenue · lifetime_value · company_size · industry · message · subject · salutation · suffix · nickname · middle_name · maiden_name · preferred_name · pronouns · age · annual_income · score · unknown
Custom Patterns
custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}
mapper = ContactMapper(patterns=custom)
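A pattern dict in this shape lists aliases per canonical field, while constant-time exact matching needs the reverse direction. The sketch below shows one way such a dict could be inverted into an alias index. `build_alias_index` is a hypothetical helper, and the lowercasing and self-mapping of canonical names are assumptions rather than documented behaviour.

```python
def build_alias_index(patterns: dict) -> dict:
    """Invert {"fields": {canonical: [aliases]}} into {alias: canonical}."""
    index = {}
    for canonical, aliases in patterns["fields"].items():
        index[canonical] = canonical  # assume a canonical name matches itself
        for alias in aliases:
            index[alias.lower()] = canonical
    return index


custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}
index = build_alias_index(custom)
print(index["vip_level"])  # loyalty_tier
print(index["nombre"])     # first_name
```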
Architecture
rolodexter/
├── __init__.py # Public API
├── core.py # ContactMapper, PatternRegistry, strategies, normalizers
├── _phone.py # E.164 phone parser (wraps libphonenumber)
├── i18n.py # On-demand i18n generator (40 languages, cached)
├── patterns.json # Master alias table (615+ aliases, 62 canonical fields)
└── i18n/ # Cached language files (generated on demand)
Contributing
git clone https://github.com/LunarWerx/rolodexter.git
cd rolodexter
pip install -e ".[dev]"
pytest
License
MIT — see LICENSE.