The universal contact field mapper: route messy, inconsistent contact data to a clean canonical schema.

📇 Rolodexter

The universal contact field mapper.

Route messy, inconsistent contact data from any source to a clean, canonical schema.


The Problem

Every CRM, email platform, and CSV export uses different field names for the same data:

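| Service    | First Name | Phone           | Company               |
|------------|------------|-----------------|-----------------------|
| HubSpot    | firstname  | mobilephone     | company               |
| Salesforce | FirstName  | MobilePhone     | Company               |
| Mailchimp  | FNAME      | PHONE           | COMPANY               |
| Google CSV | Given Name | Phone 1 - Value | Organization 1 - Name |
| Random CSV | Column A   | Column B        | Column C              |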

The Solution

from rolodexter import ContactMapper

mapper = ContactMapper()

result = mapper.map_payload({
    "fname": "jane",
    "surname": "doe",
    "mobile": "+1-555-019-9876",
    "employer": "Tech Corp",
    "Column 1": "jane.doe@example.com",  # auto-detected by shape
})

print(result.normalized)
# {
#     "first_name": "Jane",
#     "last_name": "Doe",
#     "phone": "+15550199876",
#     "company": "Tech Corp",
#     "email": "jane.doe@example.com"
# }

Installation

# Core (zero dependencies)
pip install rolodexter

# With fuzzy matching for typo recovery
pip install "rolodexter[fuzzy]"

# With on-demand i18n translation (40 languages)
pip install "rolodexter[i18n]"

# Everything
pip install "rolodexter[all]"

# Development
pip install "rolodexter[dev]"

Features

🎯 Four-Layer Matching Pipeline

Every field runs through the strategy chain in priority order:

  1. Service Match: instant lookup against 20+ platform-specific dictionaries
  2. Exact Match: O(1) hit against 300+ known aliases
  3. Fuzzy Match: rapidfuzz catches typos like "phne_nmbr" → phone
  4. Heuristic Match: regex detects emails, phones, URLs, postal codes by data shape
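
The priority-ordered chain above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Rolodexter's implementation: the alias table is a hypothetical three-entry stand-in, `difflib` replaces rapidfuzz, the service-profile layer is omitted, and the confidence values are fixed constants borrowed from the examples in this README.

```python
import re
from difflib import get_close_matches
from typing import Callable, Optional, Tuple

# Hypothetical alias table; the real library ships a much larger one.
ALIASES = {"fname": "first_name", "phone": "phone", "email": "email"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

Match = Tuple[str, float, str]  # (canonical, confidence, strategy)
Strategy = Callable[[str, Optional[str]], Optional[Match]]


def exact(header: str, value: Optional[str]) -> Optional[Match]:
    # Dictionary lookup on the normalized header: O(1) on average.
    key = header.strip().lower()
    return (ALIASES[key], 1.0, "exact") if key in ALIASES else None


def fuzzy(header: str, value: Optional[str]) -> Optional[Match]:
    # difflib stands in for rapidfuzz so the sketch has no dependencies.
    key = header.strip().lower()
    close = get_close_matches(key, list(ALIASES), n=1, cutoff=0.7)
    return (ALIASES[close[0]], 0.85, "fuzzy") if close else None


def heuristic(header: str, value: Optional[str]) -> Optional[Match]:
    # Ignore the header entirely and look at the shape of the value.
    if value and EMAIL_RE.match(value.strip()):
        return ("email", 0.6, "heuristic")
    return None


def identify(header, value=None, chain=(exact, fuzzy, heuristic)):
    # Try each strategy in priority order; the first hit wins.
    for strategy in chain:
        match = strategy(header, value)
        if match is not None:
            return match
    return None
```

Because the first hit wins, cheap and precise strategies shield the slower, less certain ones: a header is only fuzzy-matched when no exact alias exists, and the value is only inspected when the header gives no usable signal.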

📊 Confidence Scoring

Every match comes with a confidence score between 0.0 and 1.0:

match = mapper.identify("fname")
# FieldMatch(original='fname', canonical='first_name', confidence=1.0, strategy='exact')

match = mapper.identify("phne")
# FieldMatch(original='phne', canonical='phone', confidence=0.85, strategy='fuzzy')

match = mapper.identify("Column X", value="jane@test.com")
# FieldMatch(original='Column X', canonical='email', confidence=0.6, strategy='heuristic')
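
One way to act on these scores is to accept high-confidence matches automatically and queue the rest for review. The sketch below assumes a local `FieldMatch` dataclass that only mirrors the four fields shown in the reprs above; the `split_by_confidence` helper and the 0.8 threshold are hypothetical and not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMatch:
    # Local stand-in mirroring the fields shown in the reprs above.
    original: str
    canonical: str
    confidence: float
    strategy: str


def split_by_confidence(matches, threshold=0.8):
    """Return (accepted, needs_review), partitioned at the threshold."""
    accepted = [m for m in matches if m.confidence >= threshold]
    needs_review = [m for m in matches if m.confidence < threshold]
    return accepted, needs_review


matches = [
    FieldMatch("fname", "first_name", 1.0, "exact"),
    FieldMatch("phne", "phone", 0.85, "fuzzy"),
    FieldMatch("Column X", "email", 0.6, "heuristic"),
]
accepted, needs_review = split_by_confidence(matches)
```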

🔌 20+ Service Profiles

Built-in mappings for:

| CRM / Sales | Email / Marketing  | Productivity    | Other    |
|-------------|--------------------|-----------------|----------|
| HubSpot     | Mailchimp          | Google Contacts | Stripe   |
| Salesforce  | SendGrid           | Apple Contacts  | Notion   |
| Pipedrive   | Brevo (Sendinblue) | Outlook         | Airtable |
| Zoho        | ConvertKit (Kit)   | LinkedIn Export | -        |
| Close CRM   | ActiveCampaign     | -               | -        |
| Freshsales  | Omnisend           | -               | -        |
| -           | Beehiiv            | -               | -        |
| -           | Resend             | -               | -        |
| -           | Intercom           | -               | -        |

๐ŸŒ On-Demand i18n (40 Languages)

English ships by default. Request any of 40 supported languages and aliases are generated on the fly via Google Translate, then cached so translation only happens once:

from rolodexter import ContactMapper

# Load Spanish aliases on demand
mapper = ContactMapper(languages=["es"])
result = mapper.map_payload({"correo_electronico": "juan@example.com"})
print(result.normalized["email"])  # juan@example.com

# CLI: generate and cache all 40 languages
python -m rolodexter.i18n

# Or specific languages
python -m rolodexter.i18n --languages es,fr,de

# List supported languages
python -m rolodexter.i18n --list
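
The generate-once-then-cache behavior described in this section can be sketched as a small file-backed cache. Everything here is an assumption for illustration: the `load_aliases` helper, the one-JSON-file-per-language layout, and the `fake_translate` stub that stands in for a real translation service.

```python
import json
import tempfile
from pathlib import Path


def load_aliases(lang, cache_dir, translate):
    """Return aliases for lang, calling translate only on a cache miss."""
    path = Path(cache_dir) / f"{lang}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    aliases = translate(lang)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(aliases, ensure_ascii=False), encoding="utf-8")
    return aliases


calls = []


def fake_translate(lang):
    # Stand-in for a real translation service; records each invocation.
    calls.append(lang)
    return {"correo_electronico": "email"}


with tempfile.TemporaryDirectory() as tmp:
    first = load_aliases("es", tmp, fake_translate)   # cache miss: translates
    second = load_aliases("es", tmp, fake_translate)  # cache hit: reads JSON
```

Passing the translator in as a callable keeps the cache logic testable without network access.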

🔄 Cross-Service Translation

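# Translate HubSpot data directly to the Salesforce schema
hubspot_payload = {
    "firstname": "Jane",
    "mobilephone": "+15550199876",
    "company": "Tech Corp",
}

salesforce_data = mapper.translate(
    hubspot_payload,
    from_service="hubspot",
    to_service="salesforce",
)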
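
Conceptually, cross-service translation is a two-hop lookup: source field to canonical field, then canonical field to target field. The sketch below assumes that approach; it is not the library's code. The field names come from the comparison table in "The Problem", while the `SERVICES` layout and the standalone `translate` function are hypothetical.

```python
# Field names taken from the comparison table in "The Problem";
# the dict layout itself is an assumption for illustration.
SERVICES = {
    "hubspot": {
        "firstname": "first_name",
        "mobilephone": "phone",
        "company": "company",
    },
    "salesforce": {
        "FirstName": "first_name",
        "MobilePhone": "phone",
        "Company": "company",
    },
}


def translate(payload, from_service, to_service):
    """Rename keys from one service's schema to another's via canonical names."""
    to_canonical = SERVICES[from_service]
    # Invert the target profile: canonical name -> target field name.
    from_canonical = {canon: name for name, canon in SERVICES[to_service].items()}
    out = {}
    for key, value in payload.items():
        canon = to_canonical.get(key)
        if canon in from_canonical:  # unknown source fields are dropped
            out[from_canonical[canon]] = value
    return out


converted = translate(
    {"firstname": "Jane", "mobilephone": "+15550199876", "unknown": 1},
    from_service="hubspot",
    to_service="salesforce",
)
```

Routing through a canonical schema means each service needs only one profile, rather than one mapping per pair of services.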

🧹 Value Normalization

Automatic cleanup on matched fields:

  • Phone → strips formatting, adds + for international
  • Email → lowercase, trimmed
  • Names → title case with particle awareness ("jane van der berg" → "Jane van der Berg")
  • Addresses → excess whitespace collapsed, title-cased
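
The rules above can be approximated in a few lines each. This is a simplified sketch, not the library's normalizers: the particle list is an assumed, non-exhaustive set, and the phone helper only preserves an existing leading "+" rather than inferring a country code.

```python
import re

# Lowercase particles kept as-is inside a name; an assumed, partial list.
PARTICLES = {"van", "der", "de", "von", "la", "di"}


def normalize_email(value: str) -> str:
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    # Keep a leading "+" and drop every other non-digit character.
    value = value.strip()
    digits = re.sub(r"\D", "", value)
    return f"+{digits}" if value.startswith("+") else digits


def normalize_name(value: str) -> str:
    out = []
    for i, word in enumerate(value.split()):
        lower = word.lower()
        # The first word is always capitalized, even if it is a particle.
        keep_lower = i > 0 and lower in PARTICLES
        out.append(lower if keep_lower else lower.capitalize())
    return " ".join(out)
```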

📦 Batch Processing

results = mapper.map_batch([contact1, contact2, contact3, ...])

📈 Rich Diagnostics

result = mapper.map_payload(data)

print(result.match_rate)      # 0.857
print(result.matched_count)   # 6
print(result.unmatched_count)  # 1
print(result.to_dict())       # Full JSON-serializable report
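
The sample numbers are consistent with the match rate being the matched count divided by the total field count (6 of 7 fields gives 0.857). A minimal sketch of that arithmetic, assuming this definition; the standalone `match_rate` function is hypothetical and not the library's attribute:

```python
def match_rate(matched: int, unmatched: int) -> float:
    """Fraction of fields that were matched, rounded to three decimals."""
    total = matched + unmatched
    return round(matched / total, 3) if total else 0.0


rate = match_rate(6, 1)
```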

API Reference

ContactMapper

ContactMapper(
    *,
    patterns=None,           # Custom pattern dict
    patterns_path=None,      # Path to custom patterns.json
    default_service=None,    # Default service profile
    normalize=True,          # Apply value normalization
    strategies=None,         # Override strategy pipeline
    languages=None,          # i18n: None=English only, "es", ["es","fr"], "all"
)

Methods:

| Method                                          | Description                   |
|-------------------------------------------------|-------------------------------|
| identify(header, *, value, service)             | Resolve a single field header |
| map_payload(payload, *, service)                | Normalize an entire dict      |
| map_batch(payloads, *, service)                 | Process multiple payloads     |
| translate(payload, *, from_service, to_service) | Cross-service translation     |

CanonicalField

Enum of all 50+ canonical fields. Inherits from str for JSON compatibility:

from rolodexter import CanonicalField

assert CanonicalField.EMAIL == "email"
assert CanonicalField.PHONE.value == "phone"
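
To see why inheriting from str helps with JSON, here is a sketch using a hypothetical two-member enum (`Field` is a stand-in, not the library's `CanonicalField`). Members of a str-based enum are real strings, so `json.dumps` serializes them without a custom encoder.

```python
import json
from enum import Enum


class Field(str, Enum):
    # Hypothetical two-member enum; the library's CanonicalField has 50+.
    EMAIL = "email"
    PHONE = "phone"


# Members serialize as their plain string values.
encoded = json.dumps({"field": Field.EMAIL, "value": "jane@example.com"})

# A plain string read back from JSON converts to the member by value.
decoded = Field(json.loads(encoded)["field"])
```

A plain `Enum` without the str mixin would raise `TypeError` in `json.dumps`, which is the practical reason for the mixin.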

Custom Patterns

custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    },
    "services": {
        "my_crm": {
            "contact_first": "first_name",
            "loyalty": "loyalty_tier",
        }
    }
}

mapper = ContactMapper(patterns=custom)
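
A "fields" section like the one above lists aliases per canonical name, but lookups go the other way. One plausible way to get constant-time exact matching is to invert it into a flat alias index; the `build_alias_index` helper below is a hypothetical sketch, not how the library is known to load patterns.

```python
def build_alias_index(patterns):
    """Invert {"canonical": [aliases]} into {alias: canonical}."""
    index = {}
    for canonical, aliases in patterns.get("fields", {}).items():
        index[canonical] = canonical  # a canonical name maps to itself
        for alias in aliases:
            index[alias.strip().lower()] = canonical
    return index


custom = {
    "fields": {
        "first_name": ["fname", "given", "nombre"],
        "loyalty_tier": ["tier", "vip_level", "membership"],
    }
}
index = build_alias_index(custom)
```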

Architecture

rolodexter/
├── __init__.py          # Public API
├── core.py              # ContactMapper, PatternRegistry, strategies, normalizers
├── _phone.py            # Built-in E.164 phone parser (zero deps)
├── i18n.py              # On-demand i18n generator (40 languages, cached)
└── _data/
    ├── patterns.json    # Master truth table (550+ aliases, 20+ services)
    └── i18n/            # Cached language files (generated on demand)

Contributing

git clone https://github.com/rolodexter/rolodexter.git
cd rolodexter
pip install -e ".[dev]"
pytest

License

MIT. See LICENSE.
