Skip to main content

Demographic prediction from name

Project description

NameTrace

NameTrace is a python package to identify "real" human names and predict gender and geographical origin of the name. The idea is to allow users to e.g. take users' names on social media platforms such as X and identify real names and predict gender and possible origin of the users. This package was build to help researchers.

See here for a comprehensive blogpost about the package.

Features

  • Human Name Detection: Distinguish between human names and other text (usernames, company names, etc.)
  • Gender Prediction: Predict gender from names using deep learning models
  • Geographic Origin: Predict geographic subregion from names
  • High Performance: Uses BiLSTM neural networks with rule-based fallbacks
  • Easy to Use: Simple API with batch processing support

Installation

pip install nametrace

[!NOTE] nametrace requires pytorch. On some platforms, the latest versions of torch might not be supported, and you may get an error message during the installation of nametrace. Actually nametrace does not require the latest version of torch. You can solve this issue by simply installing a version of torch that is compatible with your system. For example, on a Mac OS 12.7 system with an Intel chip, you can only run torch<=2.2.2. So you just install pytorch first and then follow up with nametrace: pip install "torch==2.2.2" pip install namtrace

Quick Start

from nametrace import NameTracer

# Initialize the predictor
nt = NameTracer()

# Predict for a single name
result = nt.predict("John Smith")
print(result)
# {
#   'is_human': True,
#   'gender': 'male',
#   'subregion': 'Northern Europe',
#   'confidence': {
#     'human': 1.0,
#     'gender': 0.9563450217247009,
#     'subregion': 0.40897873044013977
#     }
# }

# Allows batch prediction
names = ["Maria Garcia", "user123", "Ahmed Hassan"]
results = nt.predict(names,batch_size=12)
for name, result in zip(names, results):
    print(f"{name}: {result['is_human']}")

# Maria Garcia: True
# user123: False
# Ahmed Hassan: True


# Allows top k prediction 
result = nt.predict("John Smith",topk=3)
# {
#   'is_human': True,
#   'gender': [
#     ('male', 0.9563450217247009),
#     ('female', 0.04365495219826698)],
#    'subregion': [
#     ('Northern Europe', 0.40897873044013977),
#     ('North America', 0.32769879698753357),
#     ('Australia and New Zealand', 0.16957755386829376)
#     ], 
#   'confidence': {
#     'human': 1.0,
#     'gender': 0.9563450217247009,
#     'subregion': 0.40897873044013977
#     }
# }

API Reference

NameTracer

The main class for name prediction.

__init__(device=None)

Initialize the tracer.

Parameters:

  • device (str, optional): Device for model inference ('cpu', 'cuda', or None for auto-detection)

predict(names, batch_size=None, topk=1)

Predict if a name(s) is(are) human and get demographics.

Parameters:

  • names (str or list): Input name string or list of name strings
  • topk (int, optional): Provide topk predcitions, optional, defaults to 1
  • batch_size (int, optional): batch size for batch inference, defaults to None (i.e. single batch)

Returns: If names is a single name:

  • dict: Prediction results with keys:
    • is_human (bool): Whether the name is human
    • gender (str): Predicted gender ('male'/'female') or None
    • subregion (str): Predicted geographic subregion or None
    • confidence (dict): Confidence scores for each prediction

If names is a list of names list of above dict.

Training Details

NameTrace uses a two-stage approach:

  1. Human Detection: Rule-based lookup against known name databases, with BiLSTM fallback for unknown names
  2. Demographics Prediction: Character-level BiLSTM model for joint gender and geographic origin prediction

Performance on test data

  • Human Detection: Acc: 74.46; F1: 76.49
  • Gender Prediction: Acc: 95.57; Macro F1: 93.01
  • Geographic Origin: Acc: 66.55; Macro F1: 44.83

Requirements

  • Python 3.9+
  • PyTorch 2.0+
  • nameparser

Data and credits

This package was built to allow gender and geographic origin precition in a simple modern api. As such it heavily benefitted from previous work of other authors and packages:

  • name2nat (Kyubyong Park, 2020). I built on the data for nationalities collected by Kyubyong Park and convert these to geographic regions. I also extend his dataset by collecting the gender of the names in the dataset from Wikipedia.
  • gender_guesser (ByRatings, 2016). I take a list of first names from this package, which is originally taken from a c package by Jörg Michael (2008).
  • Gender By Name (Rupinder Singh Rana, 2024), which also contains a list of first namnes that I use for training of the human name detection.

Citation

If you use this package, please remember to cite it:

@misc{
  bose-nametrace-2025,
  url={https://github.com/parobo/nametrace},
  journal={GitHub},
  author={Bose, Paul},
  year={2025},
  month={Jun}
}

License

GNUv3 License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nametrace-0.1.1.tar.gz (4.9 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nametrace-0.1.1-py3-none-any.whl (4.9 MB view details)

Uploaded Python 3

File details

Details for the file nametrace-0.1.1.tar.gz.

File metadata

  • Download URL: nametrace-0.1.1.tar.gz
  • Upload date:
  • Size: 4.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for nametrace-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c563bdc6d05a054bad98ecce9b25c5eb71b028ba61a534e6b315beadc9adb50b
MD5 8c5028af9dc0505ccce22f4c5e8094d6
BLAKE2b-256 5fc217c45daa3b4b421506b52b2664336d56365f01dc852755e7777b256f344c

See more details on using hashes here.

File details

Details for the file nametrace-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: nametrace-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for nametrace-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 670f9d47ba3f02f2a6022ed770f0600c8442b10f469f61424e2b206d92e44ccf
MD5 0ab2eafa13e804ce4f78a5cbb5499621
BLAKE2b-256 dd58e28f43d3551da7c2324ebc8ef790814922ac37b046f7045bdf92ba5c24e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page