Clean, functional data processing for human-centric applications. Normalize and standardize names, emails, phones, departments, and job titles with a single unified API.

HumanMint v2

HumanMint cleans and normalizes messy contact data with one call. It standardizes names, emails, phones, addresses, departments, titles, and organizations. It targets both public-sector and B2B contact data (CEOs, VPs, directors, managers) and ships with curated public-sector mappings.

from humanmint import mint

result = mint(
    name="Dr. John Q. Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173 ext 456",
    department="001 - Public Works Dept",
    title="Chief of Police",
    address="123 N. Main St Apt 4B, Madison, WI 53703",
    organization="City of Madison Police Department",
)

result.name_standardized          # "John Q Smith"
result.email_standardized         # "john.smith@city.gov"
result.phone_pretty               # "+1 202-555-0173"
result.department_canonical       # "Public Works"
result.title_canonical            # "police chief"
result.address_canonical          # "123 N. Main St Apt 4B Madison WI 53703 US"

Multi-person splitting:

mint(name="John and Jane Smith", split_multi=True)
# -> [MintResult(John Smith), MintResult(Jane Smith)]
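
To make the splitting idea concrete, here is a minimal standalone sketch of how a shared surname might be distributed across two first names. It is an illustration only: `split_shared_surname` is a hypothetical helper, not HumanMint's implementation, and it covers only the simple "First and First Last" shape.

```python
import re


def split_shared_surname(raw: str) -> list[str]:
    """Hypothetical helper: split 'John and Jane Smith' into two full names."""
    text = raw.strip()
    # Split once on a standalone "and" or "&" surrounded by whitespace.
    parts = re.split(r"\s+(?:and|&)\s+", text, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        return [text]  # nothing to split
    left, right = parts
    right_tokens = right.split()
    if len(right_tokens) < 2:
        return [text]  # no surname to share, leave the input untouched
    surname = right_tokens[-1]
    # A bare first name on the left borrows the surname from the right.
    if len(left.split()) == 1:
        left = f"{left} {surname}"
    return [left, right]


print(split_shared_surname("John and Jane Smith"))
```

Inputs that do not match the pattern are returned unchanged as a one-element list, which keeps the helper safe to call on every record.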

Why HumanMint

  • General-purpose: works for government and B2B without swapping libraries.
  • Real-world chaos: handles titles inside names, departments with codes/phones, smashed addresses, anti-scraper emails, casing quirks.
  • Unique data: 23K+ department variants mapped to 64 categories; 73K+ titles with curated canonical forms plus BLS (Bureau of Labor Statistics) titles; context-aware title mapping.
  • Safe defaults: length guards, optional aggressive cleaning, semantic conflict checks, bulk dedupe, multi-person name splitting.
  • Fast: lazy imports for quick startup, process-based bulk for CPU-bound speed, built-in dedupe to avoid redundant work.
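
The "dedupe to avoid redundant work" idea can be sketched in a few lines: process each distinct record once and fan the result back out in input order. This is a generic illustration, not HumanMint's internal code; `process_unique` and its `process` callback are hypothetical names.

```python
import json
from typing import Callable


def process_unique(records: list[dict], process: Callable[[dict], object]) -> list[object]:
    """Run `process` once per distinct record, preserving input order in the output."""
    cache: dict[str, object] = {}
    results = []
    for rec in records:
        # Serialize with sorted keys so equal dicts always produce the same cache key.
        key = json.dumps(rec, sort_keys=True, default=str)
        if key not in cache:
            cache[key] = process(rec)
        results.append(cache[key])
    return results


seen = []
out = process_unique(
    [{"name": "alice"}, {"name": "bob"}, {"name": "alice"}],
    lambda rec: (seen.append(rec), rec["name"].title())[1],
)
print(out, len(seen))
```

The duplicate record is served from the cache, so the callback runs twice for three inputs.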

AI extraction (optional)

Install the ML extra (pip install humanmint[ml]) and pass text= with use_gliner=True to extract from unstructured text, then normalize. Structured fields you pass always win. GLiNER extraction is experimental; prefer structured inputs when available.

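from humanmint import mint
from humanmint.gliner import GlinerConfig
signature_block = "Jane Doe | Public Works Engineer | jane.doe@city.gov"  # sample unstructured input
result = mint(text=signature_block, use_gliner=True, gliner_cfg=GlinerConfig(threshold=0.85))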
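
The rule that structured fields always win can be pictured as a simple merge. The sketch below is not taken from HumanMint; `merge_fields` is a hypothetical helper, and treating `None` or empty strings as "not provided" is an assumption made for the example.

```python
def merge_fields(extracted: dict, structured: dict) -> dict:
    """Combine fields so that caller-supplied structured values take precedence."""
    merged = dict(extracted)  # start from what extraction found
    for key, value in structured.items():
        # Assumption: None and "" mean the caller did not supply this field.
        if value not in (None, ""):
            merged[key] = value
    return merged


extracted = {"name": "J Smith", "email": "j.smith@city.gov"}
structured = {"name": "John Smith", "email": None}
print(merge_fields(extracted, structured))
```

Here the explicit name replaces the extracted one, while the extracted email survives because no structured email was given.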

Installation

pip install humanmint
# Optional extras:
#   pip install humanmint[address]  # usaddress parsing
#   pip install humanmint[pandas]   # DataFrame helpers
#   pip install humanmint[ml]       # GLiNER2 extraction

Quickstart

from humanmint import mint, compare, bulk

r1 = mint(name="Jane Doe", email="jane.doe@city.gov", department="Public Works", title="Engineer")
r2 = mint(name="J. Doe",  email="JANE.DOE@CITY.GOV", department="PW Dept",       title="Public Works Engineer")

score, why = compare(r1, r2, explain=True)

records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob",   "email": "bob@example.com"},
]
results = bulk(records, workers=4)

Access Patterns

  • Dicts: result.title["canonical"], result.department["canonical"], result.department["category"]
  • Properties: name_standardized, title_canonical, department_canonical, email_standardized, phone_standardized, address_canonical, organization_canonical
  • Full dicts: result.title, result.department, result.email, etc.

Recommended Properties

  • Names: name_standardized, name_first, name_last, name_middle, name_suffix, name_gender, name_nickname
  • Name extras: name_salutation (Mr./Ms./Mx.)
  • Emails: email_standardized, email_domain, email_is_valid, email_is_generic_inbox, email_is_free_provider
  • Phones: phone_standardized, phone_e164, phone_pretty, phone_extension, phone_is_valid, phone_type, phone_location, phone_time_zones
  • Departments: department_canonical, department_category, department_normalized, department_override
  • Titles: title_canonical, title_raw, title_normalized, title_is_valid, title_confidence, title_seniority
  • Addresses: address_canonical, address_raw, address_street, address_unit, address_city, address_state, address_zip, address_country
  • Organizations: organization_raw, organization_normalized, organization_canonical, organization_confidence

Use result.get("email.is_valid") to fetch nested dict values via dot paths.
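
For readers unfamiliar with dot-path lookups, the following standalone sketch shows the general technique on a plain nested dict. `get_path` is a hypothetical function written for illustration; it makes no claim about how `result.get` is implemented, and the `default` parameter is an assumption.

```python
from typing import Any


def get_path(data: dict, path: str, default: Any = None) -> Any:
    """Walk a nested dict using a dot-separated path such as 'email.is_valid'."""
    current: Any = data
    for key in path.split("."):
        # Stop as soon as the path leaves the dict structure or a key is missing.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


record = {"email": {"is_valid": True, "domain": "city.gov"}}
print(get_path(record, "email.domain"))
print(get_path(record, "phone.e164", default="n/a"))
```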

Comparing Records

from humanmint import compare
score, reasons = compare(r1, r2, explain=True)  # score ranges from 0 to 100
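
A 0 to 100 score usually needs a decision rule on top of it. The sketch below shows one possible way to bucket scores; `classify_match` and both thresholds (85 and 60) are assumptions chosen for illustration, not values recommended by HumanMint. Tune them against your own data.

```python
def classify_match(score: float, match_at: float = 85.0, review_at: float = 60.0) -> str:
    """Bucket a 0-100 similarity score into 'match', 'review', or 'distinct'."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= match_at:
        return "match"      # confident enough to merge automatically
    if score >= review_at:
        return "review"     # ambiguous, route to a human or a stricter check
    return "distinct"


for s in (92, 70, 15):
    print(s, classify_match(s))
```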

Batch & Export

from humanmint import bulk, export_json, export_csv, export_parquet, export_sql

# Process records in parallel
results = bulk(records, workers=4, progress=True)

# Export results to various formats
export_json(results, "out.json")
export_csv(results, "out.csv", flatten=True)

# Note: For per-record overrides (dept_overrides, title_overrides), include them in each record dict
records_with_overrides = [
    {**rec, "dept_overrides": {"IT": "Information Technology"}}
    for rec in records
]
results = bulk(records_with_overrides, workers=4)

CLI

humanmint clean input.csv output.csv --name-col name --email-col email --phone-col phone --dept-col department --title-col title
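
When the CLI is driven from a script, building the argument list programmatically avoids quoting mistakes. The sketch below uses only the flags shown above; `build_clean_command` is a hypothetical helper, and it assumes each column flag may be omitted independently, which the source does not confirm.

```python
def build_clean_command(input_path: str, output_path: str, columns: dict[str, str]) -> list[str]:
    """Build the argv list for `humanmint clean` from a field -> column-name mapping."""
    # Flags as shown in the CLI example above.
    flag_for = {
        "name": "--name-col",
        "email": "--email-col",
        "phone": "--phone-col",
        "department": "--dept-col",
        "title": "--title-col",
    }
    cmd = ["humanmint", "clean", input_path, output_path]
    for field, flag in flag_for.items():
        if field in columns:
            cmd += [flag, columns[field]]
    return cmd


cmd = build_clean_command("input.csv", "output.csv", {"name": "full_name", "email": "mail"})
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed.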

Performance (current)

  • Cold import: ~0.5 s (with pandas installed).
  • First call warm-up: ~0.5 s (loads caches).
  • Bulk: process-based parallelism; throughput scales with cores and workload size.
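
To check the import figure on your own machine, a small timing helper is enough. This is a generic sketch using the standard library; `time_import` is a hypothetical name, and the measurement is only a cold import if the module has not already been loaded in the current process.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the seconds taken to import `module_name` in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# "json" is used here so the example runs anywhere; substitute "humanmint" after installing it.
elapsed = time_import("json")
print(f"import took {elapsed:.4f} s")
```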

Notes

  • US-focused address parsing; usaddress used when available, otherwise heuristics.
  • Optional deps (pandas, pyarrow, sqlalchemy, rich, tqdm) enhance exports and progress bars.
  • Department and title datasets are curated and updated regularly for best accuracy.
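
Because several features depend on optional packages, it can help to check what is installed before relying on them. The sketch below uses only the standard library; `available_optional_deps` is a hypothetical helper, and the package list simply mirrors the names mentioned in the notes above.

```python
from importlib.util import find_spec

# Optional packages named in the notes above.
OPTIONAL = ("pandas", "pyarrow", "sqlalchemy", "rich", "tqdm", "usaddress")


def available_optional_deps(names: tuple[str, ...] = OPTIONAL) -> dict[str, bool]:
    """Map each top-level package name to whether it can be imported here."""
    return {name: find_spec(name) is not None for name in names}


for name, present in available_optional_deps().items():
    print(f"{name}: {'installed' if present else 'missing'}")
```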

Download files

Source Distribution

humanmint-2.0.1.tar.gz (1.9 MB)

Uploaded Source

Built Distribution

humanmint-2.0.1-py3-none-any.whl (2.0 MB)

Uploaded Python 3

File details

Details for the file humanmint-2.0.1.tar.gz.

File metadata

  • Download URL: humanmint-2.0.1.tar.gz
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-2.0.1.tar.gz
Algorithm Hash digest
SHA256 51931e970d722ed8c278e346f4b07827397fdfac5d0b31cc5bcf18d1533a21ef
MD5 c9fa26c8b26398ea8db01a862d3c3c65
BLAKE2b-256 dcac3236f0622a55f0c53067befd30e69b6bee940b196a9e015547706a2c0fa9

File details

Details for the file humanmint-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: humanmint-2.0.1-py3-none-any.whl
  • Size: 2.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for humanmint-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 05504f2567671fb3624eb4df8d4c0d16adf791829949cb60ec4ab9890d8d8aa8
MD5 b0492afc1ffdce7ba9d4ce8cdd00209b
BLAKE2b-256 876e9ea2129224cfa497c5616c308f46b7c9bf327084eed431937524baa7d194
