Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.

HumanMint v2

HumanMint cleans and normalizes messy contact data with one line of code. It standardizes names, emails, phones, addresses, departments, titles, and organizations. It is a general-purpose cleaner for B2B and public-sector data, and ships with curated public-sector mappings you won’t find anywhere else.

from humanmint import mint

result = mint(
    name="Dr. John Q. Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173 ext 456",
    department="001 - Public Works Dept",
    title="Chief of Police",
    address="123 N. Main St Apt 4B, Madison, WI 53703",
    organization="City of Madison Police Department",
)

result.name_standardized          # "John Q Smith"
result.email_standardized         # "john.smith@city.gov"
result.phone_pretty               # "+1 202-555-0173"
result.department_canonical       # "Public Works"
result.title_canonical            # "police chief"
result.address_canonical          # "123 N. Main St Apt 4B Madison WI 53703 US"

# Split multi-person names when needed
results = mint(name="John and Jane Smith", split_multi=True)
# returns [MintResult(John Smith), MintResult(Jane Smith)]
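
To make the idea behind split_multi concrete, here is a minimal standard-library sketch of one way a shared-surname name could be split. It is not HumanMint's implementation: the function name split_shared_surname and its rule (split on "and" or "&", then lend the last person's surname to anyone written as a single token) are assumptions for illustration only.

```python
import re

def split_shared_surname(raw: str) -> list[str]:
    """Split 'John and Jane Smith' into one name per person (illustrative only)."""
    # Split on the word "and" or an ampersand, surrounded by whitespace.
    parts = [p.strip() for p in re.split(r"\s+(?:and|&)\s+", raw.strip(), flags=re.IGNORECASE)]
    parts = [p for p in parts if p]
    if len(parts) < 2:
        return parts
    # Assumption: the final person carries the shared surname.
    surname = parts[-1].split()[-1]
    people = []
    for part in parts:
        # A single token ("John") borrows the shared surname.
        people.append(part if len(part.split()) > 1 else f"{part} {surname}")
    return people

print(split_shared_surname("John and Jane Smith"))  # ['John Smith', 'Jane Smith']
```

A rule this simple will mis-split organization names that contain "and", which is one reason splitting is opt-in rather than a default.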

Why HumanMint

General-purpose: works for government data and B2B (execs, VPs, directors, managers) without switching libraries.
Real-world chaos: titles inside names, departments with numbers/phone extensions, strange-casing emails, smashed-together addresses.
Unique data: 23K+ department variants → 64 categories; 73K+ titles with curated canonical forms plus BLS (U.S. Bureau of Labor Statistics) data; context-aware title mapping that uses the department as a hint, which is not available off-the-shelf.
Safe defaults: length guards, optional aggressive cleaning, semantic conflict checks, bulk dedupe, and optional multi-person name splitting.
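
As an illustration of the "real-world chaos" point, the sketch below separates an extension from a US phone number using only the standard library. It is a simplified stand-in, not the library's parser; the helper names split_extension and pretty and the regular expression are assumptions made for this example.

```python
import re

def split_extension(raw: str):
    """Return (ten_digit_number, extension) from a US-style phone string."""
    # Look for a trailing "ext 456", "ext. 456", or "x456".
    match = re.search(r"\b(?:ext\.?|x)\s*(\d+)\s*$", raw, flags=re.IGNORECASE)
    extension = match.group(1) if match else None
    main = raw[: match.start()] if match else raw
    # Keep digits only, then drop a leading US country code if present.
    digits = re.sub(r"\D", "", main)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return (digits if len(digits) == 10 else None), extension

def pretty(digits: str) -> str:
    """Format ten digits in the same style as phone_pretty in the example above."""
    return f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}"

number, ext = split_extension("(202) 555-0173 ext 456")
print(pretty(number), ext)  # +1 202-555-0173 456
```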

Department & Title mapping you can’t get elsewhere

Curated public-sector mappings that solve the “impossible to Google” parts of contact normalization. Works for governments and B2B roles (CEOs, VPs, Directors, Managers) alike.

"City Administration"    -> "Administration"       [administration]
"Finance Department"     -> "Finance"              [finance]
"Public Works"           -> "Public Works"         [infrastructure]
"Police Department"      -> "Police"               [public safety]

Titles get similar treatment across 73K standardized forms with optional department context to boost accuracy.
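
The variant → canonical → category idea can be pictured with a toy lookup built from only the four rows shown above. This is a sketch under stated assumptions, not how HumanMint stores or matches its curated data: the names CATEGORY_BY_CANONICAL, VARIANTS, and canonicalize_department are hypothetical, and the cleanup rules are deliberately simplistic.

```python
import re

# The four example rows from above: canonical name -> category.
CATEGORY_BY_CANONICAL = {
    "Administration": "administration",
    "Finance": "finance",
    "Public Works": "infrastructure",
    "Police": "public safety",
}

# Raw variants that do not reduce to a canonical name by simple cleanup.
VARIANTS = {"city administration": "Administration"}

def canonicalize_department(raw: str):
    """Return (canonical, category), or (None, None) when the input is unknown."""
    # Drop a leading numeric code such as "001 - ".
    cleaned = re.sub(r"^\s*\d+\s*[-:]\s*", "", raw)
    # Drop a trailing "Department" or "Dept".
    cleaned = re.sub(r"\s+(?:department|dept\.?)\s*$", "", cleaned, flags=re.IGNORECASE).strip()
    canonical = VARIANTS.get(cleaned.lower())
    if canonical is None:
        # Fall back to a case-insensitive match on the canonical names.
        by_lower = {name.lower(): name for name in CATEGORY_BY_CANONICAL}
        canonical = by_lower.get(cleaned.lower())
    if canonical is None:
        return None, None
    return canonical, CATEGORY_BY_CANONICAL[canonical]

print(canonicalize_department("001 - Public Works Dept"))  # ('Public Works', 'infrastructure')
```

A hand-written table like this stops scaling after a few dozen rows, which is the gap the curated 23K+ variant dataset is meant to fill.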

All fields in one library

Names, emails, phones, addresses, departments, titles, and organizations go through one pipeline. Most libraries clean only one field (just names or just phones); HumanMint normalizes the entire record, adding canonical forms, categories, and confidence values.

Fast

Typical workloads run in under a millisecond per record (0.28 to 0.56 ms in the benchmark below), with multithreading and built-in dedupe.

AI extraction (optional)

Install the ML extra (pip install humanmint[ml]) and pass text= with use_gliner=True to extract from unstructured text, then normalize. Structured fields you pass always win. You can also pass a GlinerConfig (gliner_cfg) to control schema, threshold, and GPU usage. GLiNER extraction is experimental and may be inaccurate; prefer structured inputs when available.

Example (signature block → canonicalized):

text = """
John A. Miller
Deputy Director of Public Works
City of Springfield, Missouri
305 E McDaniel St, Springfield, MO 65806
Phone: (417) 864-1234
Email: jmiller@springfieldmo.gov
"""

result = mint(text=text, use_gliner=True)

# Result:
# MintResult(
#   name: John A Miller
#   email: jmiller@springfieldmo.gov
#   phone: +1 417-864-1234
#   department: Public Works
#   title:
#     raw: Deputy Director
#     normalized: Deputy Director
#     canonical: deputy director
#   address: None
#   organization: Springfield Missouri
# )

You can also batch texts: mint(texts=[...], use_gliner=True) returns a list of MintResult objects.
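
The rule that structured fields always win over extracted ones can be pictured as a simple merge. The sketch below is an illustration only: merge_fields is a hypothetical helper, the sample values are made up, and treating None or an empty string as "not provided" is an assumption rather than documented behavior.

```python
def merge_fields(extracted: dict, structured: dict) -> dict:
    """Combine extracted and structured fields; structured values take priority."""
    merged = dict(extracted)
    for key, value in structured.items():
        # Assumption: only non-empty structured values override what was extracted.
        if value not in (None, ""):
            merged[key] = value
    return merged

# Hypothetical inputs: what an extractor found vs. what the caller passed explicitly.
extracted = {"name": "John A. Miller", "email": "jmiller@springfieldmo.gov", "title": "Deputy Director"}
structured = {"email": "john.miller@springfieldmo.gov", "title": None}

merged = merge_fields(extracted, structured)
print(merged["email"])  # john.miller@springfieldmo.gov
print(merged["title"])  # Deputy Director
```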

Advanced GLiNER configuration:

from humanmint.gliner import GlinerConfig

cfg = GlinerConfig(
    threshold=0.85,    # optional confidence threshold
    use_gpu=True,      # move model to GPU if available
    schema=None,       # custom schema dict if desired
    extractor=None,    # reuse a preloaded GLiNER2 instance
)

result = mint(text=text, use_gliner=True, gliner_cfg=cfg)

What’s new in v2 (vs v1)

Clear, canonical property names: name_standardized, email_standardized, phone_standardized, title_canonical, department_canonical (legacy aliases removed).
Explainable comparisons: compare(..., explain=True) shows component scores/penalties.
Multi-person name splitting: split_multi=True handles “John and Jane Smith”.
Name enrichment: detects nicknames and generational suffixes without polluting the main name fields.
Optional GLiNER extraction for unstructured text via use_gliner=True and GlinerConfig; multi-person GLiNER input raises a clear error.
Structured-field pipeline remains deterministic and fast; GLiNER is opt-in and experimental.

Installation

pip install humanmint
# Optional extras (quote the brackets so shells such as zsh do not expand them):
#   pip install "humanmint[address]"  # usaddress parsing
#   pip install "humanmint[pandas]"   # DataFrame helpers
#   pip install "humanmint[ml]"       # GLiNER2 extraction

Quickstart

from humanmint import mint, compare, bulk

r1 = mint(name="Jane Doe", email="jane.doe@city.gov", department="Public Works", title="Engineer")
r2 = mint(name="J. Doe",  email="JANE.DOE@CITY.GOV", department="PW Dept",       title="Public Works Engineer")

score = compare(r1, r2)  # similarity 0–100
# Or with explanation:
score, why = compare(r1, r2, explain=True)
print("\n".join(why))

records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob",   "email": "bob@example.com"},
]
results = bulk(records, workers=4)

Access Patterns

Quick reference (full field guide in docs/FIELDS.md):

Dict access: result.title["canonical"], result.department["canonical"], result.department["category"]
Properties (preferred): name_standardized, title_canonical, department_canonical, email_standardized, phone_standardized, address_canonical, organization_canonical
Full dicts: result.title, result.department, result.email, etc.

Recommended Properties (quick reference)

Names — name_standardized, name_first, name_last, name_middle, name_suffix, name_suffix_type, name_gender, name_nickname

Emails — email_standardized, email_domain, email_is_valid, email_is_generic_inbox, email_is_free_provider

Phones — phone_standardized, phone_e164, phone_pretty, phone_extension, phone_is_valid, phone_type

Departments — department_canonical, department_category, department_normalized, department_override

Titles — title_canonical, title_raw, title_normalized, title_is_valid, title_confidence, title_seniority

Addresses — address_canonical, address_raw, address_street, address_unit, address_city, address_state, address_zip, address_country

Organizations — organization_raw, organization_normalized, organization_canonical, organization_confidence

Use result.get("email.is_valid") or other dot paths to fetch nested dict values.
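
For readers unfamiliar with dot paths, the following sketch shows how such a lookup typically walks nested dicts. It is a generic illustration, not the library's get method: get_path and the sample record are hypothetical, and the record uses only keys that appear elsewhere in this README.

```python
from typing import Any

def get_path(data: dict, path: str, default: Any = None) -> Any:
    """Walk a nested dict following a dot-separated path such as 'email.is_valid'."""
    current: Any = data
    for key in path.split("."):
        # Stop early if the path runs past a leaf or names a missing key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical record shaped like the nested dicts described above.
record = {
    "email": {"is_valid": True},
    "department": {"canonical": "Public Works", "category": "infrastructure"},
}

print(get_path(record, "email.is_valid"))       # True
print(get_path(record, "department.category"))  # infrastructure
print(get_path(record, "phone.e164"))           # None
```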

Comparing Records

from humanmint import compare
score = compare(r1, r2)  # 0–100
# >85 likely duplicate, >70 similar, <50 different
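
The thresholds in the comment above can be wrapped in a small helper when triaging pairs. This is a sketch, not part of the library: classify_score is a hypothetical name, and because the guidance does not say how to treat scores from 50 to 70, the "uncertain" label for that band is an assumption.

```python
def classify_score(score: float) -> str:
    """Map a 0-100 similarity score to a coarse label."""
    if score > 85:
        return "likely duplicate"
    if score > 70:
        return "similar"
    if score < 50:
        return "different"
    # The guidance does not cover 50-70; "uncertain" is an assumption.
    return "uncertain"

for value in (92, 78, 60, 31):
    print(value, classify_score(value))
```

Pairs that land in the uncovered band are good candidates for compare(..., explain=True) or manual review.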

Batch & Export

from humanmint import bulk, export_json, export_csv, export_parquet, export_sql

results = bulk(records, workers=4, progress=True)
export_json(results, "out.json")
export_csv(results, "out.csv", flatten=True)
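
What flatten=True does is not spelled out here. One plausible reading is that nested dicts become one CSV column per leaf value. The sketch below shows that general technique with the standard library; the flatten helper, the underscore separator, and the sample record keys are all assumptions, not the library's documented behavior.

```python
import csv
import io

def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Turn nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that {"email": {"domain": ...}} becomes "email_domain".
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# Hypothetical nested record; the inner keys are illustrative.
rows = [flatten({"name": "Alice", "email": {"standardized": "alice@example.com", "domain": "example.com"}})]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```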

CLI

humanmint clean input.csv output.csv --name-col name --email-col email --phone-col phone --dept-col department --title-col title
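
When the CLI is driven from a script, the argument list can be assembled from a column mapping. The sketch below uses only the flags shown in the command above; build_clean_command is a hypothetical helper, and the actual subprocess call is left commented out because it requires humanmint to be installed and on PATH.

```python
import subprocess

def build_clean_command(src: str, dst: str, columns: dict) -> list:
    """Build the argv for `humanmint clean` from a {flag_name: column_name} mapping."""
    argv = ["humanmint", "clean", src, dst]
    for flag, column in columns.items():
        # "dept" -> "--dept-col", matching the flags in the command above.
        argv += [f"--{flag}-col", column]
    return argv

argv = build_clean_command(
    "input.csv",
    "output.csv",
    {"name": "name", "email": "email", "phone": "phone", "dept": "department", "title": "title"},
)
print(" ".join(argv))
# subprocess.run(argv, check=True)  # uncomment to run; requires humanmint on PATH
```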

Performance (benchmark)

Records	Total time	Time per record	Throughput
1,000	561 ms	0.56 ms	1,783 rec/sec
10,000	3.1 s	0.31 ms	3,178 rec/sec
50,000	14.0 s	0.28 ms	3,576 rec/sec
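
The per-record and throughput columns follow directly from the record count and total time. The short calculation below recomputes them from the table's own figures; the function names are illustrative.

```python
# (records, total seconds) taken from the benchmark table above.
runs = [(1_000, 0.561), (10_000, 3.1), (50_000, 14.0)]

def per_record_ms(records: int, seconds: float) -> float:
    """Average time per record, in milliseconds."""
    return seconds * 1000 / records

def throughput(records: int, seconds: float) -> float:
    """Records processed per second."""
    return records / seconds

for records, seconds in runs:
    print(f"{records:>6,} records: {per_record_ms(records, seconds):.2f} ms/record, "
          f"{throughput(records, seconds):,.0f} rec/sec")
```

Because the listed totals are rounded (3.1 s, 14.0 s), the recomputed throughput for the two larger runs differs slightly from the published figures, which were presumably derived from unrounded timings.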

Notes

US-focused address parsing; usaddress is used when available, otherwise heuristics.
Optional deps (pandas, pyarrow, sqlalchemy, rich, tqdm) enhance exports and progress bars.
Department and title datasets are curated and updated regularly for best accuracy.
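
The "use usaddress when available, otherwise heuristics" pattern is a common way to handle optional dependencies. The sketch below shows the general shape, assuming usaddress's tag function and its StateName and ZipCode labels; the helper names and the regular expression are illustrative and much simpler than whatever HumanMint actually does.

```python
import importlib.util
import re

# True when the optional usaddress package is importable in this environment.
HAS_USADDRESS = importlib.util.find_spec("usaddress") is not None

def heuristic_state_zip(address: str):
    """Fallback heuristic: pull a trailing 'ST 12345' or 'ST 12345-6789' pair."""
    match = re.search(r"\b([A-Z]{2})\s+(\d{5}(?:-\d{4})?)\s*$", address)
    return (match.group(1), match.group(2)) if match else (None, None)

def state_zip(address: str):
    """Prefer usaddress when installed, otherwise use the heuristic."""
    if HAS_USADDRESS:
        import usaddress
        try:
            tagged, _ = usaddress.tag(address)
            state, zip_code = tagged.get("StateName"), tagged.get("ZipCode")
            if state and zip_code:
                return state, zip_code
        except usaddress.RepeatedLabelError:
            pass  # ambiguous parse; fall through to the heuristic
    return heuristic_state_zip(address)

print(heuristic_state_zip("123 N. Main St Apt 4B, Madison, WI 53703"))  # ('WI', '53703')
```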

Release history

Version	Released	Notes
2.0.1	Dec 3, 2025	
2.0.1b0	Dec 3, 2025	pre-release (this version)
2.0.0	Dec 1, 2025	
2.0.0b3	Dec 1, 2025	pre-release
2.0.0b2	Dec 1, 2025	pre-release
2.0.0b1	Dec 1, 2025	pre-release
0.1.17	Dec 1, 2025	
0.1.14	Nov 28, 2025	
0.1.13	Nov 28, 2025	
0.1.12	Nov 28, 2025	
0.1.11	Nov 28, 2025	
0.1.10	Nov 28, 2025	
0.1.8	Nov 28, 2025	
0.1.7	Nov 28, 2025	
0.1.6	Nov 28, 2025	
0.1.5	Nov 28, 2025	
0.1.4	Nov 28, 2025	
0.1.3	Nov 28, 2025	
0.1.2	Nov 28, 2025	
0.1.1	Nov 28, 2025	

Download files

Source Distribution

humanmint-2.0.1b0.tar.gz (1.9 MB)

Uploaded Dec 3, 2025 Source

Built Distribution

humanmint-2.0.1b0-py3-none-any.whl (2.0 MB)

Uploaded Dec 3, 2025 Python 3

File details

Details for the file humanmint-2.0.1b0.tar.gz.

File metadata

Download URL: humanmint-2.0.1b0.tar.gz
Upload date: Dec 3, 2025
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-2.0.1b0.tar.gz
Algorithm	Hash digest
SHA256	`3da7fae2994e0159d297bac44f6abaa7d16123147f95d9cf4c27c8b627764326`
MD5	`9c83d6a6666a102f140189e77a3027aa`
BLAKE2b-256	`06716dbb44e97062f82daa375d14cf171efd8fd6cab44001abb607b8d51c1e9f`

File details

Details for the file humanmint-2.0.1b0-py3-none-any.whl.

File metadata

Download URL: humanmint-2.0.1b0-py3-none-any.whl
Upload date: Dec 3, 2025
Size: 2.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-2.0.1b0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4a807944e0b8367dce98d355ede161e173d81936cb562714d6163236ee1d9e09`
MD5	`d7f4dc61258b3bfd56195d509aa661e6`
BLAKE2b-256	`c445044dd568b24cec421f474e7138fbb35c4c7399532f8c937d674fdf3d39ca`
