Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.

HumanMint

Clean, normalized contact data in one line of code.

Standardize names, emails, phones, addresses, departments, job titles, and organizations with intelligent parsing and fuzzy matching.

from humanmint import mint

result = mint(
    name="Dr. John Q. Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173 ext 456",
    department="001 - Public Works Dept",
    title="Chief of Police"
)

print(result.name_str)          # "John Q Smith"
print(result.email_str)         # "john.smith@city.gov"
print(result.phone_str)         # "+1 202-555-0173"
print(result.department_str)    # "Public Works"
print(result.title_str)         # "police chief"

Why HumanMint?

Real-world contact data is messy:

Names with titles: "Dr. Jane Smith, PhD"
Inconsistent formatting: "JOHN@EXAMPLE.COM" vs "john.smith@example.com"
Phone number variations: "(202) 555-0101 x101" vs "202.555.0101"
Departments with noise: "000171 - Public Works 202-555-0150 ext 200"
Abbreviated titles: "Sr. Water Engr."

HumanMint handles all of this with zero configuration.
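Here is a minimal sketch of two of the cleanups listed above. The first strips honorifics and credentials from a name. The second removes a leading numeric code, an embedded phone number and an extension from a department string. This is not HumanMint's implementation. The `HONORIFICS` and `CREDENTIALS` sets and both function names are hypothetical, and the regular expressions only cover the US-style patterns shown in the examples.

```python
import re

# Hypothetical word lists for illustration; a real tool would need much larger ones.
HONORIFICS = {"dr", "mr", "mrs", "ms", "prof"}
CREDENTIALS = {"phd", "md", "esq", "jd"}


def strip_name_titles(raw: str) -> str:
    """Drop honorifics and credentials, and strip trailing periods ("Q." -> "Q")."""
    tokens = re.split(r"[\s,]+", raw.strip())
    kept = []
    for token in tokens:
        bare = token.rstrip(".")
        if not bare or bare.lower() in HONORIFICS | CREDENTIALS:
            continue
        kept.append(bare)
    return " ".join(kept)


def strip_department_noise(raw: str) -> str:
    """Remove a leading numeric code, an embedded US phone number, and an extension."""
    text = re.sub(r"^\s*\d+\s*-\s*", "", raw)  # e.g. "000171 - "
    text = re.sub(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}", "", text)  # e.g. "202-555-0150"
    text = re.sub(r"\b(?:ext|x)\.?\s*\d+\b", "", text, flags=re.IGNORECASE)  # e.g. "ext 200"
    return " ".join(text.split())  # collapse leftover whitespace


print(strip_name_titles("Dr. John Q. Smith, PhD"))  # John Q Smith
print(strip_department_noise("000171 - Public Works 202-555-0150 ext 200"))  # Public Works
```

Mapping the cleaned department text onto a canonical name such as "Public Works" would be a separate lookup step. This sketch leaves that step out.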

Installation

pip install humanmint

Key Features

Names: Parse, normalize, infer gender, detect nicknames, strip titles
Emails: Validate, normalize, detect free providers (Gmail, Yahoo, etc.)
Phones: Format (E.164), extract extensions, validate, detect type (mobile/landline)
Departments: Canonicalize, categorize, fuzzy match (23K+ dept names → 64 categories)
Titles: Three-tier matching system (73K+ job titles + 133 canonicals + 4.8K BLS), confidence scores
Addresses: Parse US postal addresses (street, city, state, ZIP)
Organizations: Normalize agency/org names
Comparison: compare(result_a, result_b) for deduplication with 0-100 similarity scores
Batch: Parallel processing with bulk(records, workers=4) for high throughput
Export: JSON, CSV, Parquet, SQL with flatten option for direct database import
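The phone feature (E.164 formatting plus extension extraction) can be illustrated with a standard-library sketch. It handles only North American (NANP) numbers. The function names and the returned dict keys are hypothetical. The sketch does no real validation and no mobile/landline detection; a production implementation would more likely rely on a dedicated library such as `phonenumbers`.

```python
import re
from typing import Optional, Tuple

# A trailing extension such as "ext 456", "ext. 456" or "x101".
_EXT_RE = re.compile(r"\s*(?:ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def split_extension(raw: str) -> Tuple[str, Optional[str]]:
    """Split '(202) 555-0173 ext 456' into ('(202) 555-0173', '456')."""
    match = _EXT_RE.search(raw)
    if match is None:
        return raw, None
    return raw[: match.start()], match.group(1)


def format_nanp(raw: str) -> dict:
    """Format a US/Canada number as E.164 and a 'pretty' string (no validation)."""
    number, extension = split_extension(raw)
    digits = re.sub(r"\D", "", number)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop an explicit country code
    if len(digits) != 10:
        return {"e164": None, "pretty": None, "extension": extension}
    return {
        "e164": "+1" + digits,
        "pretty": f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}",
        "extension": extension,
    }


print(format_nanp("(202) 555-0173 ext 456"))
# {'e164': '+12025550173', 'pretty': '+1 202-555-0173', 'extension': '456'}
```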

Quick Examples

Field Accessor Reference

All fields provide three access patterns:

Pattern	Example	Description
Dict access	`result.title["canonical"]`	Access specific processing stages
Property	`result.title_str`	Shorthand for canonical/standardized form
Full dict	`result.title`	All stages: raw, normalized, canonical, is_valid (titles also carry confidence)
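A small dataclass can sketch how the three access patterns relate. `ResultSketch` is a hypothetical stand-in, not HumanMint's result type. It only encodes what the table states: `title_str` is shorthand for the `"canonical"` entry of the full `title` dict.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for a mint() result (title field only)."""

    # Full dict: every processing stage for the title.
    title: Dict[str, Any] = field(default_factory=dict)

    @property
    def title_str(self) -> Optional[str]:
        # Property shorthand: the canonical stage of the full dict.
        return self.title.get("canonical")

    @property
    def title_confidence(self) -> Optional[float]:
        return self.title.get("confidence")


result = ResultSketch(title={
    "raw": "Chief of Police",
    "normalized": "Chief of Police",
    "canonical": "police chief",
    "is_valid": True,
    "confidence": 0.98,
})
assert result.title["canonical"] == result.title_str == "police chief"
```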

Available Properties by Field

Names:

name_str - Full name
name_first - First name
name_last - Last name
name_middle - Middle name
name_suffix - Suffix (Jr., Sr., etc.)
name_gender - Inferred gender

Emails:

email_str - Normalized email
email_domain - Domain part
email_valid - Is valid email
email_generic - Is generic inbox (info@, admin@)
email_free - Is free provider (Gmail, Yahoo)

Phones:

phone_str - Formatted phone (pretty or E.164)
phone_e164 - E.164 format (+12025550123)
phone_pretty - Pretty format (+1 202-555-0123)
phone_extension - Extension number
phone_valid - Is valid phone
phone_type - Type (MOBILE, FIXED_LINE, etc.)

Departments:

department_str - Canonical department name
department_category - Department category
department_normalized - Normalized (pre-canonical)
department_override - Whether an override was applied

Titles:

title_str - Canonical title
title_raw - Original input
title_normalized - Normalized (intermediate)
title_canonical - Standardized form
title_valid - Is valid title
title_confidence - Confidence score (0.0-1.0)

Addresses:

address_str / address_canonical - Full formatted address
address_raw - Original input
address_street - Street address
address_unit - Unit/apartment number
address_city - City
address_state - State
address_zip - ZIP code
address_country - Country

Organizations:

organization_raw - Original input
organization_normalized - Normalized form
organization_canonical - Canonical form
organization_confidence - Confidence score (0.0-1.0)
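The `<field>_<stage>` naming of these properties suggests one way a flattened export could look, with one column per field and stage. The sketch below assumes a nested-dict record shape and writes CSV with the standard `csv` module. The `flatten` and `to_csv` helpers are hypothetical and are not HumanMint's own flatten export option.

```python
import csv
import io
from typing import Any, Dict, List

Record = Dict[str, Dict[str, Any]]  # field name -> {stage name -> value}


def flatten(record: Record) -> Dict[str, Any]:
    """Turn {"title": {"canonical": ...}} into {"title_canonical": ...}."""
    return {
        f"{field_name}_{stage}": value
        for field_name, stages in record.items()
        for stage, value in stages.items()
    }


def to_csv(records: List[Record]) -> str:
    rows = [flatten(record) for record in records]
    # Union of column names in first-seen order; missing cells are written as "".
    columns = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = [
    {"title": {"raw": "Chief of Police", "canonical": "police chief"}},
    {"title": {"raw": "Driver"}, "email": {"domain": "city.gov"}},
]
print(to_csv(records))
```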

Accessing title fields

result = mint(title="Chief of Police")

# Dict access - different processing stages
result.title["raw"]         # "Chief of Police" (original input)
result.title["normalized"]  # "Chief of Police" (cleaned)
result.title["canonical"]   # "police chief" (standardized form)
result.title["is_valid"]    # True
result.title["confidence"]  # 0.98 (confidence score)

# Shorthand properties
result.title_str            # "police chief" (same as canonical)
result.title_normalized     # "Chief of Police"
result.title_confidence     # 0.98

Title Matching Strategy: Three-Tier System

HumanMint uses an intelligent three-tier matching system to handle 73K+ real-world job titles:

Tier 1: Job Titles Database (73,380 titles)

Exact matching against real government job titles
Fuzzy matching for spelling variations and abbreviations
Examples: "Driver" → "driver", "Dvr" → "driver" (0.92 confidence)
High confidence: 0.98 for exact matches, 0.75+ for fuzzy matches

Tier 2: Canonical Titles (133 curated standardized forms)

Fallback when Tier 1 doesn't match
Also draws on 4,800+ official BLS (Bureau of Labor Statistics) titles
Examples: "Chief of Police" → "police chief" (standardized form)
Confidence: 0.75-0.95 based on match quality

Tier 3: Enrichment

Department context for disambiguation
BLS official categorization
Confidence scoring based on match strength

Results:

100% success rate on complex government titles
Example: "Deputy Chief Financial Officer" → "deputy chief financial officer" (0.93)
Example: "Environmental Health Specialist" → "environmental health specialist" (0.98)
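A tier cascade like the one described above can be sketched with `difflib` from the standard library. The steps are: normalize the input, look for an exact hit in the job-title table, then a fuzzy hit above a cutoff, and finally fall back to curated canonical forms. The tables, the abbreviation map and the fixed confidence values are illustrative placeholders. The 0.75 cutoff echoes the fuzzy threshold quoted above. HumanMint's actual data, matching algorithm and scoring may differ.

```python
import difflib
from typing import Optional, Tuple

# Tiny hypothetical tables; the README describes ~73K job titles and 133 canonicals.
JOB_TITLES = ("driver", "environmental health specialist",
              "police chief", "senior water engineer")
CANONICAL_ALIASES = {"chief of police": "police chief"}
ABBREVIATIONS = {"sr": "senior", "engr": "engineer"}


def normalize_title(raw: str) -> str:
    """Lowercase, drop periods, and expand known abbreviations word by word."""
    words = raw.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def match_title(raw: str, cutoff: float = 0.75) -> Tuple[Optional[str], float]:
    """Return (matched title, placeholder confidence), or (None, 0.0)."""
    text = normalize_title(raw)
    # Tier 1, exact: the normalized text is already a known job title.
    if text in JOB_TITLES:
        return text, 0.98
    # Tier 1, fuzzy: closest known job title above the similarity cutoff.
    close = difflib.get_close_matches(text, JOB_TITLES, n=1, cutoff=cutoff)
    if close:
        ratio = difflib.SequenceMatcher(None, text, close[0]).ratio()
        return close[0], round(ratio, 2)
    # Tier 2: fall back to curated canonical forms.
    if text in CANONICAL_ALIASES:
        return CANONICAL_ALIASES[text], 0.9
    return None, 0.0


print(match_title("Sr. Water Engr."))  # ('senior water engineer', 0.98)
print(match_title("Drivr"))            # ('driver', 0.91)
print(match_title("Chief of Police"))  # ('police chief', 0.9)
```

"Sr. Water Engr." is expanded by the abbreviation map and then hits the table exactly. "Drivr" is caught by the fuzzy step. "Chief of Police" is too dissimilar to any stored job title and only resolves in the canonical fallback.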

Comparing records

from humanmint import compare, mint

r1 = mint(name="John Smith", email="john@example.com")
r2 = mint(name="Jon Smith", email="john.smith@example.com")

score = compare(r1, r2)  # Returns 0-100 similarity score
# Typically: >85 = likely duplicate, >70 = similar, <50 = different
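One common way to build a 0-100 score like this is a weighted average of per-field string similarities. The sketch below uses `difflib.SequenceMatcher` and works on plain dicts rather than `mint()` results. `compare_sketch`, its default weights and the scoring formula are assumptions, not the algorithm behind `compare()`.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_sketch(r1: Dict[str, str], r2: Dict[str, str],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted 0-100 similarity over the fields present in both records."""
    weights = weights or {"name": 0.5, "email": 0.5}
    total = used = 0.0
    for field_name, weight in weights.items():
        if r1.get(field_name) and r2.get(field_name):
            total += weight * similarity(r1[field_name], r2[field_name])
            used += weight
    # Renormalize so fields missing from either record don't drag the score down.
    return round(100 * total / used, 1) if used else 0.0


r1 = {"name": "John Smith", "email": "john@example.com"}
r2 = {"name": "Jon Smith", "email": "john.smith@example.com"}
print(compare_sketch(r1, r2))  # 89.5
```

Fields missing from either record are skipped and the remaining weights are renormalized. A record without an email is therefore scored on its name alone.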

Batch processing

from humanmint import bulk

records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]

results = bulk(records, workers=4, progress=True)
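Parallel processing of independent records maps naturally onto `concurrent.futures`. This sketch uses a thread pool and a hypothetical `clean_record` function in place of `mint()`. It says nothing about how `bulk()` itself is implemented, for example whether it uses threads or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def clean_record(record: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical per-record cleaner standing in for mint(**record)."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
    }


def bulk_sketch(records: List[Dict[str, str]], workers: int = 4,
                fn: Callable[[Dict[str, str]], Dict[str, str]] = clean_record
                ) -> List[Dict[str, str]]:
    # Executor.map yields results in input order, even if workers finish out of order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, records))


print(bulk_sketch([{"name": "  alice  smith ", "email": "ALICE@Example.com "}]))
```

For CPU-bound pure-Python work, `ProcessPoolExecutor` usually scales better than threads because of the GIL. The trade-off is the cost of pickling each record between processes.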

Performance

Records	Total time	Per record	Throughput
1,000	561 ms	0.56 ms	1,783 rec/sec
10,000	3.1 s	0.31 ms	3,178 rec/sec
50,000	14.0 s	0.28 ms	3,576 rec/sec
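The table's columns are related by simple arithmetic. Per-record time is total time divided by the record count, and throughput is records divided by total time. A quick check in Python:

```python
def throughput(records: int, seconds: float) -> tuple:
    """Return (milliseconds per record, records per second), rounded as in the table."""
    per_record_ms = seconds * 1000 / records
    return round(per_record_ms, 2), round(records / seconds)


print(throughput(1000, 0.561))  # (0.56, 1783)
```

The 10,000- and 50,000-record rows list times rounded to 0.1 s. Recomputing from those rounded times therefore gives slightly different throughputs: for example, 50,000 / 14.0 s is about 3,571 rec/sec, versus the listed 3,576.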

Documentation

API Reference — Full function documentation
Use Cases — Real-world examples (Government contacts, HR, Salesforce, etc.)
Fields Guide — Access all returned fields
Advanced — Custom weights, overrides, batch export

CLI

humanmint clean input.csv output.csv --name-col name --email-col email

Testing

pytest -q unittests

License

MIT

Release history

2.0.1 (Dec 3, 2025)
2.0.1b0 pre-release (Dec 3, 2025)
2.0.0 (Dec 1, 2025)
2.0.0b3 pre-release (Dec 1, 2025), this version
2.0.0b2 pre-release (Dec 1, 2025)
2.0.0b1 pre-release (Dec 1, 2025)
0.1.17 (Dec 1, 2025)
0.1.14 (Nov 28, 2025)
0.1.13 (Nov 28, 2025)
0.1.12 (Nov 28, 2025)
0.1.11 (Nov 28, 2025)
0.1.10 (Nov 28, 2025)
0.1.8 (Nov 28, 2025)
0.1.7 (Nov 28, 2025)
0.1.6 (Nov 28, 2025)
0.1.5 (Nov 28, 2025)
0.1.4 (Nov 28, 2025)
0.1.3 (Nov 28, 2025)
0.1.2 (Nov 28, 2025)
0.1.1 (Nov 28, 2025)

Download files

Source Distribution

humanmint-2.0.0b3.tar.gz (1.9 MB)

Uploaded Dec 1, 2025 Source

Built Distribution

humanmint-2.0.0b3-py3-none-any.whl (1.9 MB)

Uploaded Dec 1, 2025 Python 3

File details

Details for the file humanmint-2.0.0b3.tar.gz.

File metadata

Download URL: humanmint-2.0.0b3.tar.gz
Upload date: Dec 1, 2025
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-2.0.0b3.tar.gz
Algorithm	Hash digest
SHA256	`edac1c8518b01cc722a6fbaecb5fabf530e8a7af717e719bc2e45983a669abef`
MD5	`c612d42c6aa401e6e81ebd8d2c73fe3a`
BLAKE2b-256	`7f84e5a3fbdcc4af0992f8e6452f9b28902a1f93711a391d41e9c1cab57ee720`

File details

Details for the file humanmint-2.0.0b3-py3-none-any.whl.

File metadata

Download URL: humanmint-2.0.0b3-py3-none-any.whl
Upload date: Dec 1, 2025
Size: 1.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-2.0.0b3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`76d6961f23bbc2f559c4189f0bf4eee21085fb5c01896dd4716251782c33fece`
MD5	`38970836a4f502643a088c19429aa5bb`
BLAKE2b-256	`0d8b1fda95dca920b5a674caab323c409fbe24b3970952616d9c23eafea40b4e`
