Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.
HumanMint
Clean, normalized contact data in one line of code.
Standardize names, emails, phones, addresses, departments, job titles, and organizations with intelligent parsing and fuzzy matching.
from humanmint import mint

result = mint(
    name="Dr. John Q. Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173 ext 456",
    department="001 - Public Works Dept",
    title="Chief of Police",
)

print(result.name_str)        # "John Q Smith"
print(result.email_str)       # "john.smith@city.gov"
print(result.phone_str)       # "+1 202-555-0173"
print(result.department_str)  # "Public Works"
print(result.title_str)       # "police chief"
Why HumanMint?
Real-world contact data is messy:
- Names with titles: "Dr. Jane Smith, PhD"
- Inconsistent formatting: "JOHN@EXAMPLE.COM" vs "john.smith@example.com"
- Phone number variations: "(202) 555-0101 x101" vs "202.555.0101"
- Departments with noise: "000171 - Public Works 202-555-0150 ext 200"
- Abbreviated titles: "Sr. Water Engr."
HumanMint handles all of this with zero configuration.
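To make the "departments with noise" case concrete, here is a minimal standard-library sketch of stripping a leading numeric code and an embedded phone number. It is not HumanMint's implementation: the helper name `strip_department_noise` and both regular expressions are assumptions chosen for illustration, and they only cover US-style numbers.

```python
import re

# Hypothetical helper for illustration only; not part of the HumanMint API.
_LEADING_CODE = re.compile(r"^\s*\d+\s*-\s*")
_PHONE = re.compile(
    r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?:\s*(?:ext\.?|x)\s*\d+)?",
    re.IGNORECASE,
)

def strip_department_noise(raw: str) -> str:
    """Remove a leading numeric code and any US-style phone number."""
    text = _LEADING_CODE.sub("", raw)
    text = _PHONE.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())

print(strip_department_noise("000171 - Public Works 202-555-0150 ext 200"))
```

A real cleaner would also need to handle suffixes such as "Dept", which this sketch leaves in place.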
Installation
pip install humanmint
Key Features
- Names: Parse, normalize, infer gender, detect nicknames, strip titles
- Emails: Validate, normalize, detect free providers (Gmail, Yahoo, etc.)
- Phones: Format (E.164), extract extensions, validate, detect type (mobile/landline)
- Departments: Canonicalize, categorize, fuzzy match (23K+ dept names → 64 categories)
- Titles: Three-tier matching system (73K+ job titles + 133 canonicals + 4.8K BLS), confidence scores
- Addresses: Parse US postal addresses (street, city, state, ZIP)
- Organizations: Normalize agency/org names
- Comparison: compare(result_a, result_b) for deduplication with 0-100 similarity scores
- Batch: Parallel processing with bulk(records, workers=4) for high throughput
- Export: JSON, CSV, Parquet, SQL with flatten option for direct database import
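The export feature mentions a flatten option for database import. The sketch below shows one way nested per-field dicts could be flattened into single-level columns before writing CSV. The function `flatten_record` and the underscore naming scheme are assumptions for illustration; HumanMint's actual export API and column names may differ.

```python
import csv
import io

def flatten_record(record: dict, sep: str = "_") -> dict:
    """Flatten one level of nested dicts into a single dict of columns."""
    flat = {}
    for field, value in record.items():
        if isinstance(value, dict):
            for key, inner in value.items():
                flat[f"{field}{sep}{key}"] = inner
        else:
            flat[field] = value
    return flat

# Illustrative nested record, shaped like the per-field dicts described in this README.
nested = {
    "email": {"raw": "JOHN.SMITH@CITY.GOV", "normalized": "john.smith@city.gov"},
    "title": {"raw": "Chief of Police", "canonical": "police chief"},
}
row = flatten_record(nested)

# Write the flattened row as CSV to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```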
Quick Examples
Field Accessor Reference
All fields provide three access patterns:
| Pattern | Example | Description |
|---|---|---|
| Dict access | result.title["canonical"] | Access specific processing stages |
| Property | result.title_str | Shorthand for canonical/standardized form |
| Full dict | result.title | All stages: raw, normalized, canonical, is_valid |
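As a rough mental model of these three patterns, the sketch below builds a small stand-in object whose `title` attribute is a plain dict and whose `title_str` property returns the canonical stage. The class `FieldResult` is hypothetical and exists only for illustration; it is not how HumanMint defines its result type.

```python
class FieldResult:
    """Hypothetical container exposing one field three ways."""

    def __init__(self, title: dict):
        self.title = title  # full dict: all processing stages

    @property
    def title_str(self):
        # Shorthand property: the canonical form, or None if absent.
        return self.title.get("canonical")

result = FieldResult({
    "raw": "Chief of Police",
    "normalized": "Chief of Police",
    "canonical": "police chief",
    "is_valid": True,
})

print(result.title["canonical"])  # dict access to one stage
print(result.title_str)           # shorthand property
print(result.title)               # full dict with every stage
```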
Available Properties by Field
Names:
- name_str - Full name
- name_first - First name
- name_last - Last name
- name_middle - Middle name
- name_suffix - Suffix (Jr., Sr., etc.)
- name_gender - Inferred gender
Emails:
- email_str - Normalized email
- email_domain - Domain part
- email_valid - Is valid email
- email_generic - Is generic inbox (info@, admin@)
- email_free - Is free provider (Gmail, Yahoo)
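To illustrate what the domain, generic, and free flags mean, here is a small standard-library sketch. The function `describe_email` and the two lookup sets are assumptions with deliberately tiny contents; the sketch does no real validation and is not HumanMint's implementation.

```python
# Illustrative subsets only; a real list would be much longer.
FREE_PROVIDERS = {"gmail.com", "yahoo.com"}
GENERIC_LOCAL_PARTS = {"info", "admin"}

def describe_email(raw: str) -> dict:
    """Lowercase and trim an address, then derive simple flags from it."""
    normalized = raw.strip().lower()
    # partition leaves domain empty when there is no "@" in the input.
    local, _, domain = normalized.partition("@")
    return {
        "normalized": normalized,
        "domain": domain,
        "generic": local in GENERIC_LOCAL_PARTS,
        "free": domain in FREE_PROVIDERS,
    }

print(describe_email(" JOHN.SMITH@CITY.GOV "))
print(describe_email("info@gmail.com"))
```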
Phones:
- phone_str - Formatted phone (pretty or E.164)
- phone_e164 - E.164 format (+12025550123)
- phone_pretty - Pretty format (+1 202-555-0123)
- phone_extension - Extension number
- phone_valid - Is valid phone
- phone_type - Type (MOBILE, FIXED_LINE, etc.)
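The difference between the E.164 and pretty forms, and how an extension is split off, can be sketched with the standard library. The function `parse_us_phone` is hypothetical and assumes 10-digit US numbers; its "valid" flag only checks the digit count, which is far weaker than real phone validation.

```python
import re

# Trailing extension such as "ext 456", "ext. 456", or "x101".
_EXTENSION = re.compile(r"\s*(?:ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)

def parse_us_phone(raw: str) -> dict:
    """Split off an extension, then format a 10-digit US number."""
    extension = None
    match = _EXTENSION.search(raw)
    if match:
        extension = match.group(1)
        raw = raw[:match.start()]
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the US country code
    if len(digits) != 10:
        return {"valid": False, "e164": None, "pretty": None, "extension": extension}
    return {
        "valid": True,  # digit-count check only, not a real validity test
        "e164": f"+1{digits}",
        "pretty": f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}",
        "extension": extension,
    }

print(parse_us_phone("(202) 555-0173 ext 456"))
```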
Departments:
- department_str - Canonical department name
- department_category - Department category
- department_normalized - Normalized (pre-canonical)
- department_override - Was override applied
Titles:
- title_str - Canonical title
- title_raw - Original input
- title_normalized - Normalized (intermediate)
- title_canonical - Standardized form
- title_valid - Is valid title
- title_confidence - Confidence score (0.0-1.0)
Addresses:
- address_str / address_canonical - Full formatted address
- address_raw - Original input
- address_street - Street address
- address_unit - Unit/apartment number
- address_city - City
- address_state - State
- address_zip - ZIP code
- address_country - Country
Organizations:
- organization_raw - Original input
- organization_normalized - Normalized form
- organization_canonical - Canonical form
- organization_confidence - Confidence score (0.0-1.0)
Accessing title fields
from humanmint import mint

result = mint(title="Chief of Police")

# Dict access - different processing stages
result.title["raw"]         # "Chief of Police" (original input)
result.title["normalized"]  # "Chief of Police" (cleaned)
result.title["canonical"]   # "police chief" (standardized form)
result.title["is_valid"]    # True
result.title["confidence"]  # 0.98 (confidence score)

# Shorthand properties
result.title_str            # "police chief" (same as canonical)
result.title_normalized     # "Chief of Police"
result.title_confidence     # 0.98
Title Matching Strategy: Three-Tier System
HumanMint uses an intelligent three-tier matching system to handle 73K+ real-world job titles:
Tier 1: Job Titles Database (73,380 titles)
- Exact matching against real government job titles
- Fuzzy matching for spelling variations and abbreviations
- Examples: "Driver" → "driver", "Dvr" → "driver" (0.92 confidence)
- High confidence: 0.98 for exact matches, 0.75+ for fuzzy matches
Tier 2: Canonical Titles (133 curated standardized forms)
- Fallback when Tier 1 doesn't match
- Includes BLS official titles (4,800+)
- Examples: "Chief of Police" → "police chief" (standardized form)
- Confidence: 0.75-0.95 based on match quality
Tier 3: Enrichment
- Department context for disambiguation
- BLS official categorization
- Confidence scoring based on match strength
Results:
- 100% success rate on complex government titles
- Example: "Deputy Chief Financial Officer" → "deputy chief financial officer" (0.93)
- Example: "Environmental Health Specialist" → "environmental health specialist" (0.98)
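The tiered fallback described above can be sketched with `difflib` from the standard library. This is an illustration under stated assumptions, not HumanMint's matcher: the two lookup tables are tiny stand-ins, `match_title` is a hypothetical name, the fixed confidences 0.98 and 0.95 are simply taken from the ranges quoted above, and Tier 3 enrichment is omitted. Plain edit similarity also will not resolve heavy abbreviations such as "Dvr", which need dedicated handling.

```python
from difflib import SequenceMatcher

# Tiny stand-ins for the real datasets; the entries are illustrative only.
JOB_TITLES = {"driver": "driver", "police chief": "police chief"}
CANONICAL_TITLES = {"chief of police": "police chief"}

def match_title(raw: str, fuzzy_cutoff: float = 0.75):
    """Return (canonical, confidence, tier), or (None, 0.0, None) if nothing matches."""
    key = " ".join(raw.lower().split())
    # Tier 1a: exact lookup in the job-title table.
    if key in JOB_TITLES:
        return JOB_TITLES[key], 0.98, "tier1-exact"
    # Tier 1b: best fuzzy match at or above the cutoff.
    best, best_score = None, 0.0
    for candidate in JOB_TITLES:
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= fuzzy_cutoff:
        return JOB_TITLES[best], round(best_score, 2), "tier1-fuzzy"
    # Tier 2: fall back to the curated canonical mapping.
    if key in CANONICAL_TITLES:
        return CANONICAL_TITLES[key], 0.95, "tier2"
    return None, 0.0, None

for title in ["Driver", "Drivr", "Chief of Police", "Astronaut"]:
    print(title, "->", match_title(title))
```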
Comparing records
from humanmint import compare, mint

r1 = mint(name="John Smith", email="john@example.com")
r2 = mint(name="Jon Smith", email="john.smith@example.com")

score = compare(r1, r2)  # Returns 0-100 similarity score
# Typically: >85 = likely duplicate, >70 = similar, <50 = different
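To show how a 0-100 score and the thresholds above might fit together, here is a standard-library sketch. The function names, the equal field weights, and the use of `difflib` are all assumptions; `compare` may weigh fields quite differently. The source does not say how to label scores between 50 and 70, so the sketch calls that band "uncertain".

```python
from difflib import SequenceMatcher

def similarity_score(a: dict, b: dict, weights=None) -> float:
    """Weighted average of per-field string similarity, scaled to 0-100."""
    weights = weights or {"name": 0.5, "email": 0.5}  # assumed weights
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        left, right = a.get(field), b.get(field)
        if not left or not right:
            continue  # skip fields missing on either side
        total += weight * SequenceMatcher(None, left.lower(), right.lower()).ratio()
        used += weight
    return round(100 * total / used, 1) if used else 0.0

def classify(score: float) -> str:
    """Apply the rule-of-thumb thresholds quoted above."""
    if score > 85:
        return "likely duplicate"
    if score > 70:
        return "similar"
    if score < 50:
        return "different"
    return "uncertain"  # the 50-70 band is not labeled in the source

a = {"name": "John Smith", "email": "john@example.com"}
b = {"name": "Jon Smith", "email": "john.smith@example.com"}
score = similarity_score(a, b)
print(score, classify(score))
```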
Batch processing
from humanmint import bulk

records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]

results = bulk(records, workers=4, progress=True)
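For intuition about what a `workers` argument typically controls, the sketch below fans records out to a thread pool with `concurrent.futures`. This is an assumption about the general pattern, not a description of how `bulk` is implemented; `clean_record` and `process_in_parallel` are hypothetical names, and a CPU-bound cleaner might be better served by a process pool.

```python
from concurrent.futures import ThreadPoolExecutor

def clean_record(record: dict) -> dict:
    """Stand-in for per-record cleaning; only trims and lowercases the email."""
    cleaned = dict(record)
    if "email" in cleaned:
        cleaned["email"] = cleaned["email"].strip().lower()
    return cleaned

def process_in_parallel(records, workers: int = 4):
    # executor.map yields results in input order, regardless of completion order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(clean_record, records))

sample = [
    {"name": "Alice", "email": " ALICE@example.com"},
    {"name": "Bob", "email": "Bob@Example.com "},
]
print(process_in_parallel(sample))
```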
Performance
| Records | Total time | Time per record | Throughput |
|---|---|---|---|
| 1,000 | 561 ms | 0.56 ms | 1,783 rec/sec |
| 10,000 | 3.1 s | 0.31 ms | 3,178 rec/sec |
| 50,000 | 14.0 s | 0.28 ms | 3,576 rec/sec |
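The per-record and throughput columns follow directly from the record count and total time. The short sketch below recomputes them from the rounded times shown in the table; the helper names are illustrative. For the two larger runs the recomputed throughput (about 3,226 and 3,571 rec/sec) differs slightly from the table, presumably because the published times are rounded.

```python
def throughput(records: int, seconds: float) -> float:
    """Records processed per second."""
    return records / seconds

def per_record_ms(records: int, seconds: float) -> float:
    """Average milliseconds spent on each record."""
    return 1000 * seconds / records

# (record count, total seconds) as listed in the table above.
runs = [(1_000, 0.561), (10_000, 3.1), (50_000, 14.0)]
for n, secs in runs:
    print(n, round(per_record_ms(n, secs), 2), round(throughput(n, secs)))
```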
Documentation
- API Reference — Full function documentation
- Use Cases — Real-world examples (Government contacts, HR, Salesforce, etc.)
- Fields Guide — Access all returned fields
- Advanced — Custom weights, overrides, batch export
CLI
humanmint clean input.csv output.csv --name-col name --email-col email
Testing
pytest -q unittests
License
MIT