Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.

These details have not been verified by PyPI

Project links

Project description

HumanMint

Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, addresses, departments, job titles, and organizations with a single unified API.

Installation

pip install humanmint

Optional Dependencies

Install with optional features for advanced use cases:

# Development tools (testing, linting, type checking)
pip install humanmint[dev]

# US address parsing support
pip install humanmint[address]

# Pandas integration for DataFrames
pip install humanmint[pandas]

# All optional features
pip install humanmint[dev,address,pandas]

Feature Details:

dev: Includes pytest, black, flake8, mypy, and faker for development and testing
address: Enables advanced US postal address parsing with usaddress (optional—falls back to basic regex parsing)
pandas: Adds DataFrame integration via df.humanmint.clean(...) accessor (optional—gracefully skipped if not installed)

Note on Optional Dependencies:

Address without usaddress: The normalize_address() function still works! It falls back to basic regex-based parsing (street, city, state, ZIP extraction). For advanced address parsing, install the address extra.
Pandas not installed: The base library works fine without pandas. The DataFrame accessor (df.humanmint.clean()) is silently skipped if pandas isn't available, so your code won't break.

Quick Start

from humanmint import mint

result = mint(
    name="Dr. John Smith",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173",  # Use real area codes; 555 is reserved for fictional use
    address="123 Main St, Springfield, IL 62701",
    department="Public Works 850-123-1234 ext 200",
    title="Chief of Police"
)

print(result.name)              # {'full': 'John Smith', 'first': 'John', 'last': 'Smith', 'gender': 'Male'}
print(result.email)             # 'john.smith@city.gov'
print(result.phone)             # '+1 202-555-0173'
print(result.address)           # {'street': '123 Main St', 'city': 'Springfield', 'state': 'IL', 'zip': '62701'}
print(result.department)        # 'Public Works'
print(result.department_category) # 'Infrastructure'
print(result.title)             # {'canonical': 'police chief', 'is_valid': True, ...}

Why HumanMint?

Problem

Real-world contact data is messy:

Names with titles: "Dr. Jane Smith, PhD"
Phone numbers in various formats: "(202) 555-0123" vs "202.555.0123" (use real area codes; 555 is reserved for fiction)
Departments with codes and noise: "000171 - Public Works 202-555-0150 ext 200"
Job titles that need standardization: "Chief of Police" -> canonical form
Emails with inconsistent casing: "JOHN.SMITH@EXAMPLE.COM"
Single names that need gender inference: "Jo", "Al", "Ty"

Solution

HumanMint cleans and standardizes everything in one call.

Features

Unified API: One mint() call to normalize names, emails, phones, addresses, departments, titles, and organizations.
Rich names: Normalize, enrich, infer gender, detect nicknames; handles single, hyphenated, and titled names.
Emails: Lowercasing, validation, generic inbox detection, domain extraction, free provider detection.
Phones: E164/pretty formats, extension extraction, validation, type detection (mobile/landline/fax), VoIP and impossible number detection.
Addresses: US postal address parsing (street, city, state, ZIP).
Departments: 23,452 mappings to 64 canonicals with fuzzy matching, categorization, and custom overrides.
Titles: Canonicalization and fuzzy matching against curated heuristics with confidence scores.
Organizations: Normalize agency/organization names by removing civic prefixes and suffixes.
Pandas: df.humanmint.clean(...) accessor with heuristic column guessing or explicit name_col/email_col/... mapping.
CLI: humanmint clean input.csv output.csv with auto-guessing or explicit column flags.
Batch processing: Parallel processing with bulk() for handling large datasets.
Gzip-backed data: Reference data ships as .json.gz caches for fast loads; raw sources live under src/humanmint/data/original/.
Ethics: Gender inference is probabilistic from historical name data and not a determination of identity; downstream use should respect that.

API Reference

Core Functions

`mint(name, email, phone, address, department, title, organization, ...)`

One function. All your data cleaned.

from humanmint import mint

result = mint(
    name="Jane Doe",                    # optional
    email="jane@example.com",           # optional
    phone="(202) 555-0172",             # optional (use real area codes)
    address="123 Main St, City, ST ZIP", # optional
    department="Water Utilities",       # optional
    title="Chief of Water",             # optional
    organization="City of Springfield"  # optional
)

# Access cleaned data
result.name                 # dict: {full, first, last, middle, suffix, gender}
result.email                # str
result.phone                # str (E.164 format)
result.address              # dict: {street, city, state, zip}
result.department           # str (canonical)
result.department_category  # str
result.title                # dict: {raw, cleaned, canonical, is_valid, confidence}
result.organization         # dict: {raw, normalized, canonical, confidence}

# Convert to dict for JSON
result.model_dump()

`bulk(records, workers=4, progress=False)`

Process multiple records in parallel.

from humanmint import bulk

records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]

results = bulk(records, workers=4, progress=True)
# Returns: list[MintResult]

`compare(result_a, result_b)`

Compare two normalized records and return similarity score (0-100).

from humanmint import mint, compare

r1 = mint(name="John Smith", email="john@example.com")
r2 = mint(name="Jon Smith", email="john.smith@example.com")

similarity = compare(r1, r2)  # Returns float 0-100

Names Module

Import: from humanmint.names import ...

Function	Purpose	Returns
`normalize_name(raw)`	Parse and normalize a full name	Dict: {first, last, middle, suffix, full, canonical, is_valid}
`infer_gender(first_name, confidence=False)`	Infer gender from first name	Dict: {gender, confidence} or str
`enrich_name(normalized_dict, include_gender=True)`	Add gender/enrichment data to normalized name	Dict with enriched fields
`detect_nickname(first_name)`	Detect if a name is a nickname, return canonical form	Optional[str]
`get_nickname_variants(canonical_name)`	Get all known nicknames for a name	set[str]
`get_name_equivalents(name)`	Get all equivalent forms (nicknames + canonicals)	set[str]
`compare_first_names(name1, name2, use_nicknames=True)`	Compare two first names with fuzzy matching	float (0-1)
`compare_last_names(last1, last2)`	Compare two last names	float (0-1)
`match_names(raw1, raw2, strict=False)`	Full name matching with detailed scoring	Dict: {score, is_match, reasons, ...}

Examples:

from humanmint.names import normalize_name, infer_gender, detect_nickname

# Normalize
result = normalize_name("Dr. Jane Smith, PhD")
# {'first': 'Jane', 'last': 'Smith', 'full': 'Jane Smith', 'gender': None, ...}

# Infer gender
gender = infer_gender("Jane")  # {'gender': 'Female', 'confidence': 0.98}

# Detect nickname
canonical = detect_nickname("Bobby")  # 'Robert'
variants = get_name_equivalents("Robert")  # {'Robert', 'Bob', 'Bobby', 'Rob', ...}

Emails Module

Import: from humanmint.emails import ...

Function	Purpose	Returns
`normalize_email(raw, generic_inboxes=None)`	Normalize email and extract metadata	Dict: {email, local, domain, is_generic, is_free_provider, is_valid}
`is_free_provider(domain)`	Check if domain is a free email provider	bool
`guess_email(name, domain, known=[])`	Guess likely email pattern from known examples	str (email or empty)
`get_pattern_scores(known)`	Analyze known emails and return detected patterns	list[(pattern_id, confidence)]
`describe_pattern(pattern_id)`	Get documentation for an email pattern	Optional[Dict]

Examples:

from humanmint.emails import normalize_email, guess_email, is_free_provider

# Normalize
result = normalize_email("JOHN.SMITH@GMAIL.COM")
# {'email': 'john.smith@gmail.com', 'domain': 'gmail.com', 'is_free_provider': True, ...}

# Check if free
is_free = is_free_provider("gmail.com")  # True

# Guess email pattern
known = [("John Smith", "jsmith@company.com"), ("Jane Doe", "jdoe@company.com")]
guess = guess_email("Bob Jones", "company.com", known)  # 'bjones@company.com'

Phones Module

Import: from humanmint.phones import ...

Function	Purpose	Returns
`normalize_phone(raw, country="US")`	Normalize phone to E.164 format	Dict: {e164, pretty, extension, country, type, is_valid}
`detect_impossible(phone_dict)`	Detect if phone appears impossible/test/fake	bool
`detect_fax_pattern(phone_dict)`	Detect if phone matches known fax patterns	bool
`detect_voip_pattern(phone_dict)`	Detect if phone matches VoIP provider patterns	bool

Examples:

from humanmint.phones import normalize_phone, detect_fax_pattern

# Normalize
result = normalize_phone("(202) 555-0123 ext 201")
# {'e164': '+12025550123', 'pretty': '+1 202-555-0123', 'extension': '201', 'type': 'mobile', ...}

# Detect fax
fax = detect_fax_pattern(result)  # False

Departments Module

Import: from humanmint.departments import ...

Function	Purpose	Returns
`normalize_department(raw_dept)`	Normalize dept name (remove noise, standardize format)	str
`find_best_match(dept_name, threshold=0.6, normalize=True)`	Find best canonical match	Optional[str]
`find_all_matches(dept_name, threshold=0.6, top_n=3, normalize=True)`	Find all matches ranked by similarity	list[str]
`match_departments(dept_names, threshold=0.6, normalize=True)`	Match multiple depts at once	dict[str, Optional[str]]
`get_similarity_score(dept1, dept2)`	Calculate similarity between two depts	float (0-1)
`get_department_category(dept)`	Get category for canonical dept	Optional[str]
`get_all_categories()`	Get all available categories	set[str]
`get_departments_by_category(category)`	Get depts in a specific category	list[str]
`categorize_departments(depts)`	Categorize multiple depts	dict[str, Optional[str]]
`get_canonical_departments()`	Get all canonical dept names	list[str]
`is_canonical(dept)`	Check if dept is canonical	bool
`load_mappings()`	Load all dept mappings	dict[str, list[str]]
`get_mapping_for_original(original)`	Get canonical for original name	Optional[str]
`get_originals_for_canonical(canonical)`	Get all original names for canonical	list[str]

Examples:

from humanmint.departments import normalize_department, find_best_match, get_department_category

# Normalize
clean = normalize_department("000171 - Police Department")  # 'Police'

# Find match
match = find_best_match("Police Dept")  # 'Police'
category = get_department_category("Police")  # 'Public Safety'

# Find all matches
all_matches = find_all_matches("Finance", threshold=0.7)  # ['Finance', 'Budget']

Titles Module

Import: from humanmint.titles import ...

Function	Purpose	Returns
`normalize_title_full(raw_title, threshold=0.6, dept_canonical=None, overrides=None)`	Full normalization with confidence	TitleResult Dict: {raw, cleaned, canonical, is_valid, confidence}
`normalize_title(raw_title)`	Core title cleaning	str
`find_best_match(title, threshold=0.6, normalize=True)`	Find best canonical match	tuple[Optional[str], float]
`find_all_matches(title, threshold=0.6, top_n=3)`	Find all matches	list[str]
`get_similarity_score(title1, title2)`	Calculate similarity	float (0-1)
`get_canonical_titles()`	Get all canonical titles	list[str]
`is_canonical(title)`	Check if title is canonical	bool
`get_mapping_for_variant(variant)`	Get canonical for variant	Optional[str]
`get_all_mappings()`	Get all title mappings	dict[str, str]

Examples:

from humanmint.titles import normalize_title_full, find_best_match, get_canonical_titles

# Full normalization
result = normalize_title_full("0001 - Chief of Police (Downtown)")
# {'raw': '0001 - Chief of Police (Downtown)', 'cleaned': 'Chief of Police', 'canonical': 'police chief', ...}

# Find match with confidence
title, confidence = find_best_match("Police Chief", threshold=0.7)
# ('police chief', 0.95)

Addresses Module

Import: from humanmint.addresses import ...

Function	Purpose	Returns
`normalize_address(raw)`	Parse US postal address	Optional[Dict]: {street, city, state, zip, canonical}

Examples:

from humanmint.addresses import normalize_address

result = normalize_address("123 Main St, Springfield, IL 62701")
# {'street': '123 Main St', 'city': 'Springfield', 'state': 'IL', 'zip': '62701', ...}

Organizations Module

Import: from humanmint.organizations import ...

Function	Purpose	Returns
`normalize_organization(raw)`	Normalize agency/org name (remove civic suffixes)	Optional[Dict]: {raw, normalized, canonical, confidence}

Examples:

from humanmint.organizations import normalize_organization

result = normalize_organization("City of Springfield")
# {'raw': 'City of Springfield', 'normalized': 'Springfield', 'canonical': 'Springfield', ...}

Customization

You can steer canonicals without forking data files:

Department overrides: Map normalized departments to your preferred canonical. Example: mint(department="RevOps", dept_overrides={"revenue operations": "Sales"}) or pass the same dict into bulk()/pandas/CLI.
Title overrides / ignores: Map cleaned titles to a canonical string with title_overrides. To ignore a title, set it to None and drop records where result.title is None or result.title["is_valid"] is False.

Examples

Government Contact

result = mint(
    name="Chief Robert Patterson",
    email="robert.patterson@police.gov",
    phone="(202) 555-0178",  # Real area code; 555 is reserved
    department="000171 - Police",
    title="Chief of Police"
)

# Cleaned:
# name: {'full': 'Robert Patterson', 'first': 'Robert', 'last': 'Patterson', 'gender': 'Male'}
# email: 'robert.patterson@police.gov'
# phone: '+1 202-555-0178'
# department: 'Police' (category: 'Public Safety')
# title: {'canonical': 'police chief', 'is_valid': True}

Messy Data

result = mint(
    name="Dr. Jane Smith, PhD",
    email="JANE@EXAMPLE.COM",
    phone="415 555 0123",  # Real area code; 555 is reserved
    department="Planning and Development 415-555-0123 ext 200"
)

# Cleaned:
# name: {'full': 'Jane Smith', 'first': 'Jane', 'last': 'Smith', 'gender': 'Female'}
# email: 'jane@example.com'
# phone: '+1 415-555-0123'
# department: 'Planning' (category: 'Planning & Development')

Single Names

result = mint(name="Madonna")
# name: {'full': 'Madonna', 'first': 'Madonna', 'last': '', 'gender': 'Female'}

result = mint(name="Jo")
# name: {'full': 'Jo', 'first': 'Jo', 'last': '', 'gender': 'Female'}

Advanced Usage

Use individual modules for specific needs. See API Reference above for the complete list of functions across all modules:

# Names: normalize, enrich, infer gender, detect nicknames
from humanmint.names import normalize_name, infer_gender, detect_nickname

# Emails: validate, detect free providers, guess patterns
from humanmint.emails import normalize_email, is_free_provider, guess_email

# Phones: normalize, detect types, identify fax/VoIP/test numbers
from humanmint.phones import normalize_phone, detect_fax_pattern, detect_voip_pattern

# Departments: normalize, match, categorize
from humanmint.departments import normalize_department, find_best_match, get_department_category

# Titles: normalize, match, get mappings
from humanmint.titles import normalize_title_full, find_best_match as match_title

# Addresses: parse and normalize
from humanmint.addresses import normalize_address

# Organizations: normalize and extract canonical names
from humanmint.organizations import normalize_organization

# Batch processing and comparison
from humanmint import bulk, compare

Testing

pytest -q unittests

CLI

Clean a CSV (auto-detecting columns, or override with flags):

humanmint clean input.csv output.csv --name-col name --email-col email

Regenerating data caches

If you edit the raw datasets in src/humanmint/data/original/, rebuild the packaged .json.gz caches with:

python scripts/build_caches.py

Performance

Batch processing: ~0.1-0.2ms per contact
10 contacts: ~1-2ms
100 contacts: ~10-20ms

What Gets Cleaned

Input	Output
`name="Dr. Jane Smith, PhD"`	`first='Jane', last='Smith'`
`email="JOHN@EXAMPLE.COM"`	`john@example.com`
`phone="(202) 555-0101 x101"`	`+1 202-555-0101`, extension: `101`
`address="123 Main St, Springfield, IL 62701"`	`street='123 Main St', city='Springfield', state='IL', zip='62701'`
`department="000171 - Public Works 202-555-0150 ext 200"`	`Public Works` (category: Infrastructure)
`title="0001 - Chief of Police (Downtown)"`	`police chief`
`organization="City of Springfield Police"`	`normalized='Springfield', canonical='Springfield'`
`name="Jo"`	`first='Jo'`, gender: Female

Version

0.1.0

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.0.1

Dec 3, 2025

2.0.1b0 pre-release

Dec 3, 2025

2.0.0

Dec 1, 2025

2.0.0b3 pre-release

Dec 1, 2025

2.0.0b2 pre-release

Dec 1, 2025

2.0.0b1 pre-release

Dec 1, 2025

0.1.17

Dec 1, 2025

0.1.14

Nov 28, 2025

0.1.13

Nov 28, 2025

0.1.12

Nov 28, 2025

0.1.11

Nov 28, 2025

0.1.10

Nov 28, 2025

This version

0.1.8

Nov 28, 2025

0.1.7

Nov 28, 2025

0.1.6

Nov 28, 2025

0.1.5

Nov 28, 2025

0.1.4

Nov 28, 2025

0.1.3

Nov 28, 2025

0.1.2

Nov 28, 2025

0.1.1

Nov 28, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

humanmint-0.1.8.tar.gz (1.1 MB view details)

Uploaded Nov 28, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

humanmint-0.1.8-py3-none-any.whl (1.1 MB view details)

Uploaded Nov 28, 2025 Python 3

File details

Details for the file humanmint-0.1.8.tar.gz.

File metadata

Download URL: humanmint-0.1.8.tar.gz
Upload date: Nov 28, 2025
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`9f115087d7e4e7f6c11f921e326930d3684b2aa38ae09bb14c3378c4b15cd737`
MD5	`e901e0bdba29d7b8ea0942d16e1f9ed9`
BLAKE2b-256	`3b4a001b6001ccdb79446a16a1352f99eb27872cfc79d0918e98dbad6eec0b0a`

See more details on using hashes here.

File details

Details for the file humanmint-0.1.8-py3-none-any.whl.

File metadata

Download URL: humanmint-0.1.8-py3-none-any.whl
Upload date: Nov 28, 2025
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`37f6bc224295330867071b83c23f0d5c2a231de3cabb9d5a0adbe5769afeb030`
MD5	`d0198579751d121b2d6d6d421daac9f7`
BLAKE2b-256	`0a8fae907bcb055f3db90f6514c571672d51aecf1e3a9153aceedf3581c504ac`

See more details on using hashes here.

humanmint 0.1.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

HumanMint

Installation

Optional Dependencies

Quick Start

Why HumanMint?

Problem

Solution

Features

API Reference

Core Functions

mint(name, email, phone, address, department, title, organization, ...)

bulk(records, workers=4, progress=False)

compare(result_a, result_b)

Names Module

Emails Module

Phones Module

Departments Module

Titles Module

Addresses Module

Organizations Module

Customization

Examples

Government Contact

Messy Data

Single Names

Advanced Usage

Testing

CLI

Regenerating data caches

Performance

What Gets Cleaned

Version

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`mint(name, email, phone, address, department, title, organization, ...)`

`bulk(records, workers=4, progress=False)`

`compare(result_a, result_b)`