Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.
HumanMint v2
HumanMint cleans and normalizes messy contact data with one call. It standardizes names, emails, phones, addresses, departments, titles, and organizations. It's built for both public-sector data and B2B (CEOs, VPs, directors, managers) and ships with curated public-sector mappings.
from humanmint import mint

result = mint(
    name="Dr. John Q. Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173 ext 456",
    department="001 - Public Works Dept",
    title="Chief of Police",
    address="123 N. Main St Apt 4B, Madison, WI 53703",
    organization="City of Madison Police Department",
)

result.name_standardized      # "John Q Smith"
result.email_standardized     # "john.smith@city.gov"
result.phone_pretty           # "+1 202-555-0173"
result.department_canonical   # "Public Works"
result.title_canonical        # "police chief"
result.address_canonical      # "123 N. Main St Apt 4B Madison WI 53703 US"
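To make the phone fields above more concrete, here is a minimal sketch of how a US number with an extension could be split into an E.164 value, a pretty form, and an extension. It is not HumanMint's implementation: `split_us_phone` and `EXT_RE` are hypothetical names, and the sketch assumes 10-digit US numbers only.

```python
import re

# Hypothetical pattern: a trailing "ext 456", "x456", or "extension 456".
EXT_RE = re.compile(r"(?:ext\.?|x|extension)\s*(\d+)\s*$", re.IGNORECASE)


def split_us_phone(raw: str) -> dict:
    """Split a US phone string into number and extension (illustrative only)."""
    ext_match = EXT_RE.search(raw)
    extension = ext_match.group(1) if ext_match else None
    main = raw[: ext_match.start()] if ext_match else raw

    # Keep digits only, then drop a leading US country code if present.
    digits = re.sub(r"\D", "", main)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]

    if len(digits) != 10:
        return {"e164": None, "pretty": None, "extension": extension, "is_valid": False}

    return {
        "e164": f"+1{digits}",
        "pretty": f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}",
        "extension": extension,
        "is_valid": True,
    }


parsed = split_us_phone("(202) 555-0173 ext 456")
print(parsed["pretty"], parsed["extension"])
```

A production parser also has to handle international formats and number validity rules, which this sketch deliberately ignores.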
Multi-person splitting:
mint(name="John and Jane Smith", split_multi=True)
# -> [MintResult(John Smith), MintResult(Jane Smith)]
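The idea behind multi-person splitting can be sketched with a small heuristic: split on "and" or "&", and let the first person inherit the surname when only a first name is given. This is an illustration under simple assumptions, not the logic HumanMint uses; `split_shared_surname` is a hypothetical helper.

```python
import re


def split_shared_surname(name: str) -> list[str]:
    """Split a two-person name such as 'John and Jane Smith' (illustrative only)."""
    cleaned = name.strip()
    parts = re.split(r"\s+(?:and|&)\s+", cleaned, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        # No connector found: treat the input as a single person.
        return [cleaned]

    first, second = parts
    second_tokens = second.split()
    # "John and Jane Smith": John has no surname, so borrow Jane's.
    if len(first.split()) == 1 and len(second_tokens) >= 2:
        first = f"{first} {second_tokens[-1]}"
    return [first, second]


print(split_shared_surname("John and Jane Smith"))
```

Names that legitimately contain a connector (for example an organization entered in a name field) would need extra guards that this sketch omits.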
Why HumanMint
- General-purpose: works for government and B2B without swapping libraries.
- Real-world chaos: handles titles inside names, departments with codes/phones, smashed addresses, anti-scraper emails, casing quirks.
- Unique data: 23K+ department variants mapped to 64 categories; 73K+ titles with curated canonicals plus BLS (U.S. Bureau of Labor Statistics) data; context-aware title mapping.
- Safe defaults: length guards, optional aggressive cleaning, semantic conflict checks, bulk dedupe, multi-person name splitting.
- Fast: lazy imports for quick startup, process-based bulk for CPU-bound speed, built-in dedupe to avoid redundant work.
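The "dedupe to avoid redundant work" point in the list above can be illustrated with a small cache keyed on the input: each distinct record is processed once and the result is reused for its duplicates. This is a sketch of the general technique, not HumanMint's bulk code; `process_unique`, `key`, and `fn` are hypothetical names.

```python
from typing import Callable, Hashable


def process_unique(
    records: list[dict],
    key: Callable[[dict], Hashable],
    fn: Callable[[dict], dict],
) -> list[dict]:
    """Apply fn once per distinct key and reuse the result for duplicates."""
    cache: dict[Hashable, dict] = {}
    results = []
    for record in records:
        k = key(record)
        if k not in cache:
            cache[k] = fn(record)  # expensive work happens only here
        results.append(cache[k])
    return results


calls = []


def normalize(record: dict) -> dict:
    """Stand-in for an expensive normalization step."""
    calls.append(record["email"])
    return {"email": record["email"].strip().lower()}


rows = [{"email": "A@X.COM"}, {"email": "A@X.COM"}, {"email": "b@x.com"}]
out = process_unique(rows, key=lambda r: r["email"], fn=normalize)
print(len(out), len(calls))
```

The output keeps the input order and length, so callers do not need to know that duplicates were collapsed internally.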
AI extraction (optional)
Install the ML extra (pip install humanmint[ml]), then pass text= with use_gliner=True to extract fields from unstructured text before normalizing them. Any structured fields you pass take precedence over extracted ones. GLiNER extraction is experimental; prefer structured inputs when available.
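from humanmint import mint
from humanmint.gliner import GlinerConfig
# signature_block: any unstructured text you supply, e.g. an email signature
result = mint(text=signature_block, use_gliner=True, gliner_cfg=GlinerConfig(threshold=0.85))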
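The precedence rule for structured fields can be pictured as a simple merge in which explicitly passed values overwrite extracted ones. The sketch below only illustrates that rule; `merge_fields` is a hypothetical helper and does not reflect how HumanMint combines the two sources internally.

```python
def merge_fields(extracted: dict, structured: dict) -> dict:
    """Combine extracted and structured fields; structured values take precedence."""
    merged = dict(extracted)
    for field, value in structured.items():
        if value is not None:  # only fields the caller actually supplied
            merged[field] = value
    return merged


extracted = {"name": "Jon Smith", "title": "Chief of Police"}
structured = {"name": "John Q. Smith", "email": "john.smith@city.gov", "title": None}
print(merge_fields(extracted, structured))
```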
Installation
pip install humanmint
# Optional extras:
# pip install humanmint[address] # usaddress parsing
# pip install humanmint[pandas] # DataFrame helpers
# pip install humanmint[ml] # GLiNER2 extraction
Quickstart
from humanmint import mint, compare, bulk

# Normalize two records that likely describe the same person
r1 = mint(name="Jane Doe", email="jane.doe@city.gov", department="Public Works", title="Engineer")
r2 = mint(name="J. Doe", email="JANE.DOE@CITY.GOV", department="PW Dept", title="Public Works Engineer")

# Score their similarity and get the reasons behind the score
score, why = compare(r1, r2, explain=True)

# Normalize many records in parallel
records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
results = bulk(records, workers=4)
Access Patterns
- Dicts: result.title["canonical"], result.department["canonical"], result.department["category"]
- Properties: name_standardized, title_canonical, department_canonical, email_standardized, phone_standardized, address_canonical, organization_canonical
- Full dicts: result.title, result.department, result.email, etc.
Recommended Properties
- Names: name_standardized, name_first, name_last, name_middle, name_suffix, name_gender, name_nickname
- Name extras: name_salutation (Mr./Ms./Mx.)
- Emails: email_standardized, email_domain, email_is_valid, email_is_generic_inbox, email_is_free_provider
- Phones: phone_standardized, phone_e164, phone_pretty, phone_extension, phone_is_valid, phone_type, phone_location, phone_time_zones
- Departments: department_canonical, department_category, department_normalized, department_override
- Titles: title_canonical, title_raw, title_normalized, title_is_valid, title_confidence, title_seniority
- Addresses: address_canonical, address_raw, address_street, address_unit, address_city, address_state, address_zip, address_country
- Organizations: organization_raw, organization_normalized, organization_canonical, organization_confidence
Use result.get("email.is_valid") to fetch nested dict values via dot paths.
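Dot-path access of this kind is usually a short walk over nested dicts. The sketch below shows the general technique on a plain dictionary; `get_path` is a hypothetical stand-in, not the body of `result.get`, and the sample data is made up for illustration.

```python
from typing import Any


def get_path(data: dict, path: str, default: Any = None) -> Any:
    """Fetch a nested value such as 'email.is_valid' from a dict of dicts."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # missing key or non-dict level
        current = current[key]
    return current


sample = {"email": {"standardized": "jane.doe@city.gov", "is_valid": True}}
print(get_path(sample, "email.is_valid"))
print(get_path(sample, "phone.e164", default="n/a"))
```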
Comparing Records
from humanmint import compare
# r1 and r2 are MintResult objects, e.g. the ones created in the Quickstart
score, reasons = compare(r1, r2, explain=True)  # score ranges from 0 to 100
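For intuition about what a 0 to 100 score with explanations might look like, here is a rough sketch that combines per-field string similarity with weights. The weights, field names, and the use of `difflib` are all assumptions made for illustration; HumanMint's actual scoring is not documented here and is likely more involved.

```python
from difflib import SequenceMatcher

# Hypothetical weights; not taken from HumanMint.
WEIGHTS = {"email": 50, "name": 30, "title": 20}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_sketch(rec_a: dict, rec_b: dict) -> tuple[float, list[str]]:
    """Return a 0-100 score and human-readable reasons (illustrative only)."""
    weighted = 0.0
    total_weight = 0
    reasons = []
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # skip fields missing on either side
        sim = similarity(a, b)
        weighted += weight * sim
        total_weight += weight
        reasons.append(f"{field}: similarity {sim:.2f} (weight {weight})")
    if total_weight == 0:
        return 0.0, reasons
    return round(100 * weighted / total_weight, 1), reasons


a = {"name": "Jane Doe", "email": "jane.doe@city.gov", "title": "Engineer"}
b = {"name": "J. Doe", "email": "JANE.DOE@CITY.GOV", "title": "Public Works Engineer"}
score_ab, reasons_ab = compare_sketch(a, b)
print(score_ab)
for reason in reasons_ab:
    print(" ", reason)
```

Normalizing by the weights of the fields actually compared keeps the score on a 0 to 100 scale even when some fields are missing.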
Batch & Export
from humanmint import bulk, export_json, export_csv, export_parquet, export_sql
# Process records in parallel
results = bulk(records, workers=4, progress=True)
# Export results to various formats
export_json(results, "out.json")
export_csv(results, "out.csv", flatten=True)
# Note: For per-record overrides (dept_overrides, title_overrides), include them in each record dict
records_with_overrides = [
    {**rec, "dept_overrides": {"IT": "Information Technology"}}
    for rec in records
]
results = bulk(records_with_overrides, workers=4)
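A flattened CSV export generally means turning nested dicts into one column per leaf value. The sketch below shows that technique with the standard library; `flatten` and `to_csv` are hypothetical helpers and the underscore-joined column names are an assumption, not a description of what `export_csv(..., flatten=True)` produces.

```python
import csv
import io


def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Turn {'email': {'is_valid': True}} into {'email_is_valid': True}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def to_csv(records: list[dict]) -> str:
    """Write flattened records to CSV text with a stable, sorted header."""
    rows = [flatten(r) for r in records]
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


nested = [
    {"name": "Alice", "email": {"standardized": "alice@example.com", "is_valid": True}},
    {"name": "Bob", "email": {"standardized": "bob@example.com", "is_valid": True}},
]
print(to_csv(nested))
```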
CLI
humanmint clean input.csv output.csv --name-col name --email-col email --phone-col phone --dept-col department --title-col title
Performance (current)
- Cold import: ~0.5 s (with pandas installed).
- First call warm-up: ~0.5 s (loads caches).
- Bulk: process-based parallelism; throughput scales with cores and workload size.
Notes
- US-focused address parsing; usaddress used when available, otherwise heuristics.
- Optional deps (pandas, pyarrow, sqlalchemy, rich, tqdm) enhance exports and progress bars.
- Department and title datasets are curated and updated regularly for best accuracy.
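The note about usaddress describes a common optional-dependency pattern: try the richer parser if it is installed, and fall back to a heuristic otherwise. The sketch below illustrates that pattern only. `heuristic_parse`, `parse_address`, and the regex are hypothetical, and the fallback extracts just the state and ZIP code; HumanMint's own heuristics are not shown in this document.

```python
import re

try:
    import usaddress  # optional third-party parser
except ImportError:
    usaddress = None

# Hypothetical fallback: a two-letter state followed by a ZIP at the end.
STATE_ZIP_RE = re.compile(r"\b([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$")


def heuristic_parse(address: str) -> dict:
    """Extract state and ZIP with a regex (illustrative fallback only)."""
    match = STATE_ZIP_RE.search(address.strip())
    if not match:
        return {}
    return {"state": match.group(1), "zip": match.group(2)}


def parse_address(address: str) -> dict:
    """Prefer usaddress when installed; otherwise use the heuristic."""
    if usaddress is not None:
        try:
            tagged, _address_type = usaddress.tag(address)
            return {"state": tagged.get("StateName"), "zip": tagged.get("ZipCode")}
        except usaddress.RepeatedLabelError:
            pass  # ambiguous input: fall through to the heuristic
    return heuristic_parse(address)


print(parse_address("123 N. Main St Apt 4B, Madison, WI 53703"))
```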