Nationality Prediction from Firstname using Python 3.13 and scikit-learn

Firstname to Nationality - Python 3.13 Implementation

A name-to-nationality prediction library for Python 3.13+, built on scikit-learn.

🚀 Features

This library provides the following capabilities:

  • Python 3.13+ Compatible: Uses modern Python features and type hints
  • ML Stack: Built with scikit-learn for performance and compatibility
  • City-Based Prediction: Uses geopy geocoding to predict nationality from a city
  • Type Safety: Full type hints and dataclasses throughout
  • Error Handling: Robust error handling and fallbacks
  • Dev Container Ready: Includes VS Code dev container configuration
  • Flexible Training: Easy model training with your own data
  • Batch Processing: Efficient batch prediction support

📦 Installation

Using the Dev Container (Recommended)

  1. Open in VS Code
  2. When prompted, click "Reopen in Container"
  3. The dev container will build automatically with Python 3.13

Manual Installation

# Ensure you have Python 3.13+
python --version

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

🔧 Quick Start

Basic Name-Based Prediction

from firstname_to_nationality import FirstnameToNationality

# Initialize the predictor
predictor = FirstnameToNationality()

# Predict nationality for a single name
result = predictor.predict_single("Giuseppe Rossi", top_n=3)
print(result)  # [('Italian', 0.85), ('Spanish', 0.12), ...]

# Batch prediction
names = ["John Smith", "Maria Rodriguez", "Zhang Wei"]
results = predictor(names, top_n=2)

for name, predictions in results:
    nationality, confidence = predictions[0]
    print(f"{name} → {nationality} ({confidence:.2f})")

City-Based Prediction (New!)

from firstname_to_nationality import CityToNationality

# Initialize the city-based predictor
predictor = CityToNationality()

# Predict with city information (more accurate)
result = predictor("Maria Garcia", cities="Barcelona")
print(result)  # Spanish (from Barcelona, Spain)

# Fallback to name-based prediction if no city
result = predictor("Maria Garcia")
print(result)  # Uses ML model on name

# Batch prediction with cities
names = ["John Smith", "Luigi Ferrari", "Zhang Wei"]
cities = ["London", "Milan", "Beijing"]
results = predictor(names, cities=cities)

for item in results:
    name = item["name"]
    pred = item["predictions"][0]
    print(f"{name} from {item['city']} → {pred['nationality']} ({pred['country_code']})")
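
The city-first, name-second behaviour described above can be sketched without the library or a network call. This is only an illustration: `CITY_TO_COUNTRY`, `name_based_guess`, and `predict` are hypothetical names, and the small lookup table stands in for a real geocoder such as geopy.

```python
from typing import Callable, Optional

# Hypothetical lookup table standing in for a real geocoding service.
CITY_TO_COUNTRY = {
    "barcelona": ("Spanish", "ES"),
    "milan": ("Italian", "IT"),
    "beijing": ("Chinese", "CN"),
}


def name_based_guess(name: str) -> tuple[str, str]:
    """Placeholder for the name-based model; always returns an unknown label."""
    return ("Unknown", "??")


def predict(
    name: str,
    city: Optional[str] = None,
    fallback: Callable[[str], tuple[str, str]] = name_based_guess,
) -> dict:
    """Use the city when it resolves, otherwise fall back to the name."""
    key = city.strip().lower() if city else None
    if key and key in CITY_TO_COUNTRY:
        nationality, code = CITY_TO_COUNTRY[key]
        source = "city"
    else:
        nationality, code = fallback(name)
        source = "name"
    return {
        "name": name,
        "city": city,
        "nationality": nationality,
        "country_code": code,
        "source": source,
    }


print(predict("Maria Garcia", "Barcelona"))
print(predict("Maria Garcia"))
```

Recording which path produced the answer (the `source` key here) makes it easier to audit how often the fallback is used.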

🧪 Examples

Run the example scripts:

# Basic name-based prediction
python example.py

# Country code mapping
python example_country.py

# City-based prediction with geocoding
python example_city.py

🔥 Training Your Own Model

Using Sample Data

python nationality_trainer.py

Using Your Own Data

Create a CSV file with name and nationality columns:

name,nationality
John Smith,American
Giuseppe Rossi,Italian
Hiroshi Tanaka,Japanese

Then train:

python nationality_trainer.py your_data.csv
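
Before training, it can help to check that the CSV has the expected columns. The sketch below uses only the standard library and is not part of the package; `load_training_rows` is a hypothetical helper that returns the two parallel lists a training call would need.

```python
import csv
import io


def load_training_rows(handle):
    """Read name/nationality pairs from a CSV file object, skipping blank rows."""
    reader = csv.DictReader(handle)
    missing = {"name", "nationality"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    names, nationalities = [], []
    for row in reader:
        name = (row["name"] or "").strip()
        nationality = (row["nationality"] or "").strip()
        if name and nationality:
            names.append(name)
            nationalities.append(nationality)
    return names, nationalities


# In-memory stand-in for your_data.csv
sample = io.StringIO(
    "name,nationality\n"
    "John Smith,American\n"
    "Giuseppe Rossi,Italian\n"
    "Hiroshi Tanaka,Japanese\n"
)
names, nationalities = load_training_rows(sample)
print(names)
print(nationalities)
```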

Creating a Dictionary

python nationality_trainer.py --dict

🏗️ Architecture

The implementation consists of:

  • FirstnameToNationality: Main predictor class with scikit-learn backend
  • FirstnameToCountry: Maps nationalities to country codes
  • CityToNationality: City-based prediction via geocoding, falling back to name-based prediction
  • NamePreprocessor: Advanced name preprocessing and normalization
  • PredictionResult: Type-safe prediction results using dataclasses
  • Model Pipeline: TF-IDF vectorization + Logistic Regression
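
As a rough illustration of the "TF-IDF vectorization + Logistic Regression" pipeline named above, here is a minimal scikit-learn sketch. The character n-gram settings, the toy training data, and the `top_n` helper are assumptions made for this example, not the package's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; a real model needs far more examples.
train_names = [
    "Giuseppe Rossi", "Luigi Ferrari", "Marco Bianchi",
    "John Smith", "Emily Johnson", "William Brown",
    "Hiroshi Tanaka", "Yuki Sato", "Kenji Yamamoto",
]
train_labels = [
    "Italian", "Italian", "Italian",
    "American", "American", "American",
    "Japanese", "Japanese", "Japanese",
]

# Character n-grams within word boundaries are an assumed feature choice.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_names, train_labels)


def top_n(name, n=2):
    """Return the n most probable labels for a name, highest probability first."""
    probs = model.predict_proba([name])[0]
    ranked = sorted(zip(model.classes_, probs), key=lambda p: p[1], reverse=True)
    return [(str(label), float(prob)) for label, prob in ranked[:n]]


print(top_n("Giovanni Russo", n=2))
```

Character-level features are a common choice for names because they let the model generalise to spellings it has not seen in training.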

📁 File Structure

The implementation uses these file paths:

  • firstname_to_nationality/best-model.pt: Model checkpoint file
  • firstname_to_nationality/firstname_nationalities.pkl: Name-to-nationality dictionary

Usage Examples

Basic Usage

from firstname_to_nationality import FirstnameToNationality
predictor = FirstnameToNationality()
results = predictor(["John Smith"])

Advanced Features

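# Continues from Basic Usage: `predictor` is a FirstnameToNationality instance

# Type-safe single predictions
result = predictor.predict_single("John Smith", top_n=3)

# Training interface: parallel lists of names and nationality labels
names = ["John Smith", "Giuseppe Rossi", "Hiroshi Tanaka"]
nationalities = ["American", "Italian", "Japanese"]
predictor.train(names, nationalities, save_model=True)

# Dictionary management: name_dict is a name-to-nationality dictionary you have built
predictor.save_dictionary(name_dict)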
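
To show what a dataclass-based, type-safe result can look like, here is a small standalone sketch. The `Prediction` class, its fields, and `to_predictions` are hypothetical and do not reflect the library's real `PredictionResult` definition, which is not shown in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical immutable record for one (nationality, confidence) pair."""

    nationality: str
    confidence: float

    def __post_init__(self) -> None:
        # Reject values that cannot be probabilities.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def to_predictions(pairs):
    """Convert raw (label, score) tuples into records sorted by confidence."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    return [Prediction(label, score) for label, score in ranked]


top = to_predictions([("Spanish", 0.12), ("Italian", 0.85)])
print(top[0])
```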

🐳 Development with Docker

Dev Container

The repository includes a complete dev container setup for VS Code:

# Open in VS Code
code .
# Click "Reopen in Container" when prompted

Manual Docker

# Build
docker build -f .devcontainer/Dockerfile -t firstname-to-nationality .

# Run
docker run -it --rm -v $(pwd):/workspace firstname-to-nationality

⚡ Performance

The implementation offers:

  • Fast training with scikit-learn
  • Memory efficiency
  • Batch processing support
  • Python optimizations

🧬 Dependencies

Core Requirements:

  • Python 3.13+
  • scikit-learn >= 1.3.0
  • numpy >= 1.25.0
  • pandas >= 2.0.0
  • joblib >= 1.3.0
  • geopy >= 2.3.0 (for city-based predictions)

Development:

  • pytest, black, isort, pylint, mypy

🤝 Contributing

  1. Use the dev container for a consistent environment
  2. Add type hints throughout
  3. Run tests: pytest
  4. Format code: black . && isort .
  5. Check types: mypy firstname_to_nationality/

Automated Release Process

This repository uses a fully automated release workflow:

  1. Push your code to the main branch
  2. Version is automatically bumped based on conventional commit messages
  3. GitHub release is created automatically with AI-generated release notes
  4. Package is published to PyPI automatically

For more details, see .github/WORKFLOW_SETUP.md.

Commit Message Format

Use conventional commits for automatic version bumping:

  • fix: description → Patch version bump (1.0.0 → 1.0.1)
  • feat: description → Minor version bump (1.0.0 → 1.1.0)
  • feat!: description → Major version bump (1.0.0 → 2.0.0)
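
The mapping above can be expressed as a small function. This is a sketch of the rule as listed here, not the repository's actual workflow logic: `bump` is a hypothetical helper, it treats any type followed by `!` as breaking, and it ignores `BREAKING CHANGE` footers.

```python
import re

# Matches headers such as "fix: ...", "feat(scope): ...", or "feat!: ...".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump(version: str, message: str) -> str:
    """Return the next version implied by a conventional commit header."""
    major, minor, patch = (int(part) for part in version.split("."))
    match = HEADER.match(message)
    if match is None:
        return version  # not a conventional commit: no bump
    if match["breaking"]:
        return f"{major + 1}.0.0"
    if match["type"] == "feat":
        return f"{major}.{minor + 1}.0"
    if match["type"] == "fix":
        return f"{major}.{minor}.{patch + 1}"
    return version  # other types (docs, chore, ...) leave the version unchanged


for msg in ("fix: description", "feat: description", "feat!: description"):
    print(msg, "->", bump("1.0.0", msg))
```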

Setting Up GitHub Actions Workflows

If you're a maintainer and need to set up the auto-version-bump workflow, see .github/WORKFLOW_SETUP.md for detailed instructions on configuring the required GitHub App for authentication.

📄 License

MIT License

Implementation Details

This is a complete implementation with:

  • ✅ Consistent method signatures
  • ✅ Reliable file handling
  • ✅ Robust prediction results
  • ✅ Efficient model format
  • ✅ Minimal dependencies

🎯 Roadmap

  • Transformer-based models support
  • REST API server
  • Web interface
  • Multi-language support
  • Advanced evaluation metrics

Download files

Source Distribution

firstname_to_nationality-1.1.23.tar.gz (27.0 MB)

Uploaded Source

Built Distribution

firstname_to_nationality-1.1.23-py3-none-any.whl (27.2 MB)

Uploaded Python 3

File details

Details for the file firstname_to_nationality-1.1.23.tar.gz.

File hashes

Hashes for firstname_to_nationality-1.1.23.tar.gz
Algorithm Hash digest
SHA256 d27cb2d7b4616db07132eabbebc725c9289ad7b8fd01ae1a54594640f355a1ae
MD5 4dbefe9d94d4d7648fa1ceff23859b1b
BLAKE2b-256 1e8b29f3df032ce35a9be0378d8797920a674ad31d2503ea00d517c4b0f420ed

Provenance

The following attestation bundles were made for firstname_to_nationality-1.1.23.tar.gz:

Publisher: publish.yml on callidio/firstname_to_nationality

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file firstname_to_nationality-1.1.23-py3-none-any.whl.

File hashes

Hashes for firstname_to_nationality-1.1.23-py3-none-any.whl
Algorithm Hash digest
SHA256 912c375a5e0e2844c837d7df9ef48134349396bf4c3ba45aeeb2354c9ae7dcc0
MD5 53fbf9c1cd7028714502c1557db38709
BLAKE2b-256 8207af22a8d5ccfd505e525cc43b76cc3e9751bcb8b6c57c484d3ed222c584b3

Provenance

The following attestation bundles were made for firstname_to_nationality-1.1.23-py3-none-any.whl:

Publisher: publish.yml on callidio/firstname_to_nationality

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
