Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.
HumanMint
Clean, normalized contact data in one line of code.
Standardize names, emails, phones, addresses, departments, job titles, and organizations with intelligent parsing and fuzzy matching.
from humanmint import mint
result = mint(
    name="Dr. John Q. Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173 ext 456",
    department="001 - Public Works Dept",
    title="Chief of Police",
)
print(result.name_str) # "John Q Smith"
print(result.email_str) # "john.smith@city.gov"
print(result.phone_str) # "+1 202-555-0173"
print(result.department_str) # "Public Works"
print(result.title_str) # "police chief"
Why HumanMint?
Real-world contact data is messy:
- Names with titles: "Dr. Jane Smith, PhD"
- Inconsistent formatting: "JOHN@EXAMPLE.COM" vs "john.smith@example.com"
- Phone number variations: "(202) 555-0101 x101" vs "202.555.0101"
- Departments with noise: "000171 - Public Works 202-555-0150 ext 200"
- Abbreviated titles: "Sr. Water Engr."
HumanMint handles all of this with zero configuration.
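To make the "departments with noise" case concrete, here is a minimal standard-library sketch of the kind of cleanup involved. The helper `strip_department_noise` and its regular expressions are hypothetical and written only for illustration; they are not HumanMint's implementation, which this README does not show.

```python
import re

# Hypothetical patterns for illustration only; not taken from HumanMint.
# Leading numeric code such as "000171 - ".
_LEADING_CODE = re.compile(r"^\s*\d+\s*-\s*")
# Trailing US-style phone number with an optional extension.
_PHONE_TAIL = re.compile(
    r"\s*\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(\s*(ext|x)\.?\s*\d+)?\s*$",
    re.IGNORECASE,
)


def strip_department_noise(raw: str) -> str:
    """Drop a leading numeric code and a trailing phone number, if present."""
    text = _LEADING_CODE.sub("", raw)
    text = _PHONE_TAIL.sub("", text)
    return text.strip()


print(strip_department_noise("000171 - Public Works 202-555-0150 ext 200"))
# Public Works
```

A real canonicalizer also has to handle abbreviations such as "Dept", which this sketch leaves untouched.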
Installation
pip install humanmint
Key Features
- Names: Parse, normalize, infer gender, detect nicknames, strip titles
- Emails: Validate, normalize, detect free providers (Gmail, Yahoo, etc.)
- Phones: Format (E.164), extract extensions, validate, detect type (mobile/landline)
- Departments: Canonicalize, categorize, fuzzy match (23K+ dept names → 64 categories)
- Titles: Standardize, match against curated list (100K+ job titles), confidence scores
- Addresses: Parse US postal addresses (street, city, state, ZIP)
- Organizations: Normalize agency/org names
- Comparison: compare(result_a, result_b) for deduplication with 0-100 similarity scores
- Batch: Parallel processing with bulk(records, workers=4) for high throughput
- Export: JSON, CSV, Parquet, SQL with flatten option for direct database import
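The export feature mentions a flatten option but this README does not show what the flattened output looks like. As a rough sketch of the general idea, assuming nested per-field dicts are turned into single-level column names joined by an underscore, a stand-alone helper could look like this. The `flatten` function below is hypothetical and is not HumanMint's exporter.

```python
from typing import Any


def flatten(record: dict[str, Any], sep: str = "_") -> dict[str, Any]:
    """Turn nested dicts into one level of keys, e.g. title.raw -> title_raw."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse, then prefix each nested key with its parent key.
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


nested = {
    "title": {"raw": "Chief of Police", "canonical": "police chief", "is_valid": True}
}
flat = flatten(nested)
print(flat)
```

A flat mapping like this has one key per column, which is the shape CSV writers and SQL inserts expect.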
Quick Examples
Field Accessor Reference
All fields provide three access patterns:
| Pattern | Example | Description |
|---|---|---|
| Dict access | result.title["canonical"] | Access specific processing stages |
| Property | result.title_str | Shorthand for canonical/standardized form |
| Full dict | result.title | All stages: raw, normalized, canonical, is_valid |
Available Properties by Field
Names:
- name_str - Full name
- name_first - First name
- name_last - Last name
- name_middle - Middle name
- name_suffix - Suffix (Jr., Sr., etc.)
- name_gender - Inferred gender
Emails:
- email_str - Normalized email
- email_domain - Domain part
- email_valid - Is valid email
- email_generic - Is generic inbox (info@, admin@)
- email_free - Is free provider (Gmail, Yahoo)
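To show how the email properties above relate to one another, here is a small standard-library sketch. Everything in it is an assumption made for illustration: the function `describe_email`, the simplified validation pattern, and the two short lookup sets. HumanMint's actual provider and generic-inbox lists are not given in this README.

```python
import re

# Illustrative lists only; the real lists used by HumanMint are not shown here.
FREE_PROVIDERS = {"gmail.com", "yahoo.com", "outlook.com"}
GENERIC_LOCAL_PARTS = {"info", "admin"}

# Deliberately simple shape check: something@something.tld, no spaces.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def describe_email(raw: str) -> dict:
    """Lowercase the address and derive domain, validity, generic and free flags."""
    normalized = raw.strip().lower()
    valid = bool(_EMAIL_RE.match(normalized))
    local, _, domain = normalized.partition("@")
    return {
        "normalized": normalized,
        "domain": domain if valid else None,
        "valid": valid,
        "generic": valid and local in GENERIC_LOCAL_PARTS,
        "free": valid and domain in FREE_PROVIDERS,
    }


print(describe_email("JOHN.SMITH@CITY.GOV"))
```

The regular expression checks only the overall shape of the address, so it accepts some strings a stricter validator would reject.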
Phones:
- phone_str - Formatted phone (pretty or E.164)
- phone_e164 - E.164 format (+12025550123)
- phone_pretty - Pretty format (+1 202-555-0123)
- phone_extension - Extension number
- phone_valid - Is valid phone
- phone_type - Type (MOBILE, FIXED_LINE, etc.)
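The difference between the E.164 and pretty formats, and how an extension is split off, can be sketched with the standard library alone. The helper `split_us_phone` below is hypothetical and assumes 10-digit US numbers only. It performs no real validation or line-type detection, and it is not how HumanMint parses phones.

```python
import re
from typing import Optional

# Trailing extension written as "ext 456", "ext. 456" or "x101".
_EXT_RE = re.compile(r"(?:ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def split_us_phone(raw: str) -> Optional[dict]:
    """Return E.164, pretty form and extension for a US number, or None."""
    ext_match = _EXT_RE.search(raw)
    extension = ext_match.group(1) if ext_match else None
    main = raw[: ext_match.start()] if ext_match else raw

    digits = re.sub(r"\D", "", main)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10:
        return None

    return {
        "e164": f"+1{digits}",
        "pretty": f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}",
        "extension": extension,
    }


print(split_us_phone("(202) 555-0173 ext 456"))
# {'e164': '+12025550173', 'pretty': '+1 202-555-0173', 'extension': '456'}
```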
Departments:
- department_str - Canonical department name
- department_category - Department category
- department_normalized - Normalized (pre-canonical)
- department_override - Was override applied
Titles:
- title_str - Canonical title
- title_raw - Original input
- title_normalized - Normalized (intermediate)
- title_canonical - Standardized form
- title_valid - Is valid title
- title_confidence - Confidence score (0.0-1.0)
Addresses:
- address_str / address_canonical - Full formatted address
- address_raw - Original input
- address_street - Street address
- address_unit - Unit/apartment number
- address_city - City
- address_state - State
- address_zip - ZIP code
- address_country - Country
Organizations:
- organization_raw - Original input
- organization_normalized - Normalized form
- organization_canonical - Canonical form
- organization_confidence - Confidence score (0.0-1.0)
Accessing title fields
from humanmint import mint
result = mint(title="Chief of Police")
# Dict access - different processing stages
result.title["raw"] # "Chief of Police" (original input)
result.title["normalized"] # "Chief of Police" (cleaned)
result.title["canonical"] # "police chief" (standardized form)
result.title["is_valid"] # True
# Shorthand properties
result.title_str # "police chief" (same as canonical)
result.title_normalized # "Chief of Police"
Comparing records
from humanmint import compare, mint
r1 = mint(name="John Smith", email="john@example.com")
r2 = mint(name="Jon Smith", email="john.smith@example.com")
score = compare(r1, r2) # Returns 0-100 similarity score
# Typically: >85 = likely duplicate, >70 = similar, <50 = different
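For intuition about 0-100 similarity scores and the thresholds in the comment above, here is a sketch built on `difflib.SequenceMatcher` from the standard library. This is a swapped-in technique chosen for illustration, not the algorithm behind `compare`, which this README does not describe. The functions `similarity` and `classify` are hypothetical, and the "uncertain" label for the 50-70 range is an assumption, since the README leaves that range unnamed.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(score: float) -> str:
    """Map a score to the rough bands quoted in the README comment."""
    if score > 85:
        return "likely duplicate"
    if score > 70:
        return "similar"
    if score < 50:
        return "different"
    return "uncertain"  # assumption: the README does not name the 50-70 band


score = similarity("John Smith", "Jon Smith")
print(round(score, 1), classify(score))
```

A record-level comparison would combine several such per-field scores, for example with per-field weights.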
Batch processing
from humanmint import bulk
records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
results = bulk(records, workers=4, progress=True)
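The README does not say whether `bulk` uses threads or processes. As a general sketch of the worker-pool pattern that a `workers=4` argument suggests, the standard library's `concurrent.futures` can fan records out to a fixed number of workers. The names `process_in_parallel` and `lowercase_email` are hypothetical stand-ins, not HumanMint functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_in_parallel(
    records: list[dict],
    clean_one: Callable[[dict], dict],
    workers: int = 4,
) -> list[dict]:
    """Apply clean_one to every record using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the input records.
        return list(pool.map(clean_one, records))


def lowercase_email(record: dict) -> dict:
    """Toy per-record cleaner used only to exercise the pool."""
    return {**record, "email": record["email"].strip().lower()}


records = [
    {"name": "Alice", "email": "ALICE@example.com"},
    {"name": "Bob", "email": " bob@Example.com"},
]
results = process_in_parallel(records, lowercase_email, workers=4)
print(results)
```

For CPU-bound pure-Python work, `ProcessPoolExecutor` offers the same `map` interface and avoids contention on the interpreter lock, at the cost of pickling each record.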
Performance
| Records | Time | Per Record | Throughput |
|---|---|---|---|
| 1,000 | 561 ms | 0.56 ms | 1,783 rec/sec |
| 10,000 | 3.1 s | 0.31 ms | 3,178 rec/sec |
| 50,000 | 14.0 s | 0.28 ms | 3,576 rec/sec |
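The last two columns follow from the first two: per-record time is total time divided by record count, and throughput is record count divided by total time. A short check against the first row, using hypothetical helper names:

```python
def per_record_ms(records: int, seconds: float) -> float:
    """Average time spent on one record, in milliseconds."""
    return seconds * 1000.0 / records


def throughput(records: int, seconds: float) -> float:
    """Records processed per second."""
    return records / seconds


print(round(per_record_ms(1_000, 0.561), 2))  # 0.56
print(round(throughput(1_000, 0.561)))        # 1783
```

Recomputing the other rows from the rounded times shown (3.1 s, 14.0 s) gives slightly different throughput figures, so the published numbers were presumably derived from unrounded timings.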
Documentation
- API Reference — Full function documentation
- Use Cases — Real-world examples (Government contacts, HR, Salesforce, etc.)
- Fields Guide — Access all returned fields
- Advanced — Custom weights, overrides, batch export
CLI
humanmint clean input.csv output.csv --name-col name --email-col email
Testing
pytest -q unittests
License
MIT