Nationality Prediction from Firstname using Python 3.13 and scikit-learn

Firstname to Nationality - Python 3.13 Implementation

A library for predicting nationality from a person's name, for Python 3.13+, built on scikit-learn.

🚀 Features

This library provides the following capabilities:

  • Python 3.13+: Requires Python 3.13 or later and uses current language features and type hints
  • ML Stack: Built with scikit-learn for performance and compatibility
  • City-Based Prediction: Uses geopy geocoding to infer nationality from a city
  • Type Safety: Full type hints and dataclasses throughout
  • Error Handling: Robust error handling and fallbacks
  • Dev Container Ready: Includes VS Code dev container configuration
  • Flexible Training: Easy model training with your own data
  • Batch Processing: Efficient batch prediction support

📦 Installation

Using the Dev Container (Recommended)

  1. Open in VS Code
  2. When prompted, click "Reopen in Container"
  3. The dev container will build automatically with Python 3.13

Manual Installation

# Ensure you have Python 3.13+
python --version

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

🔧 Quick Start

Basic Name-Based Prediction

from firstname_to_nationality import FirstnameToNationality

# Initialize the predictor
predictor = FirstnameToNationality()

# Predict nationality for a single name
result = predictor.predict_single("Giuseppe Rossi", top_n=3)
print(result)  # [('Italian', 0.85), ('Spanish', 0.12), ...]

# Batch prediction: calling the predictor returns (name, predictions) pairs
names = ["John Smith", "Maria Rodriguez", "Zhang Wei"]
results = predictor(names, top_n=2)

for name, predictions in results:
    nationality, confidence = predictions[0]  # highest-ranked prediction
    print(f"{name} → {nationality} ({confidence:.2f})")
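Conceptually, a `top_n` query ranks the classifier's per-class probabilities and keeps the highest ones. The sketch below assumes a probability vector and a parallel list of class labels (the shape scikit-learn classifiers expose through `predict_proba` and `classes_`); the helper name `top_n_predictions` is hypothetical and not part of the library's API.

```python
from typing import Sequence


def top_n_predictions(
    labels: Sequence[str], probabilities: Sequence[float], top_n: int = 3
) -> list[tuple[str, float]]:
    """Return the top_n (label, probability) pairs, highest probability first."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Pair each class label with its probability, then sort by probability, descending.
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(label, float(prob)) for label, prob in ranked[:top_n]]


# Hypothetical probabilities for three classes.
print(top_n_predictions(["Spanish", "Italian", "Japanese"], [0.12, 0.85, 0.03], top_n=2))
# [('Italian', 0.85), ('Spanish', 0.12)]
```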

City-Based Prediction (New!)

from firstname_to_nationality import CityToNationality

# Initialize the city-based predictor
predictor = CityToNationality()

# Predict with city information (typically more accurate than the name alone)
result = predictor("Maria Garcia", cities="Barcelona")
print(result)  # Spanish (from Barcelona, Spain)

# Without a city, the predictor falls back to name-based prediction
result = predictor("Maria Garcia")
print(result)  # Prediction from the ML model on the name alone

# Batch prediction with cities
names = ["John Smith", "Luigi Ferrari", "Zhang Wei"]
cities = ["London", "Milan", "Beijing"]
results = predictor(names, cities=cities)

for item in results:
    name = item["name"]
    pred = item["predictions"][0]  # highest-ranked prediction
    print(f"{name} from {item['city']} → {pred['nationality']} ({pred['country_code']})")
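The city-first, name-second behaviour described above can be sketched as a small function. This is an illustration under assumptions, not the library's code: `predict_with_city`, the `geocode_country` and `predict_from_name` callables, and the `COUNTRY_TO_NATIONALITY` table are all hypothetical stand-ins, injected as parameters so the example runs offline.

```python
from typing import Callable, Optional

# Hypothetical, deliberately tiny table: ISO 3166-1 alpha-2 country code -> nationality.
COUNTRY_TO_NATIONALITY: dict[str, str] = {
    "ES": "Spanish",
    "IT": "Italian",
    "GB": "British",
    "CN": "Chinese",
}


def predict_with_city(
    name: str,
    city: Optional[str],
    geocode_country: Callable[[str], Optional[str]],
    predict_from_name: Callable[[str], str],
) -> dict[str, Optional[str]]:
    """Prefer the nationality implied by the city; otherwise fall back to the name model."""
    if city:
        try:
            code = geocode_country(city)
        except Exception:  # geocoding/network failure: fall back instead of raising
            code = None
        if code and code.upper() in COUNTRY_TO_NATIONALITY:
            return {
                "name": name,
                "city": city,
                "nationality": COUNTRY_TO_NATIONALITY[code.upper()],
                "country_code": code.upper(),
                "source": "city",
            }
    # No city, unknown city, or unmapped country: use the name-based model.
    return {
        "name": name,
        "city": city,
        "nationality": predict_from_name(name),
        "country_code": None,
        "source": "name",
    }


# Offline stand-ins for a real geocoder and a trained name model.
fake_geocoder = {"Barcelona": "es", "Milan": "it"}.get


def name_model(name: str) -> str:
    return "Spanish"  # placeholder prediction


print(predict_with_city("Maria Garcia", "Barcelona", fake_geocoder, name_model)["source"])  # city
print(predict_with_city("Maria Garcia", None, fake_geocoder, name_model)["source"])  # name
```

With geopy installed, `geocode_country` could wrap `Nominatim(user_agent="your-app").geocode(city, addressdetails=True)` and read `location.raw["address"]["country_code"]` (the call returns `None` when nothing matches). Nominatim's usage policy requires a descriptive user agent and limits request rates.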

🧪 Examples

Run the example scripts:

# Basic name-based prediction
python example.py

# Country code mapping
python example_country.py

# City-based prediction with geocoding
python example_city.py

🔥 Training Your Own Model

Using Sample Data

python nationality_trainer.py

Using Your Own Data

Create a CSV file with name and nationality columns:

name,nationality
John Smith,American
Giuseppe Rossi,Italian
Hiroshi Tanaka,Japanese

Then train:

python nationality_trainer.py your_data.csv
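Before training, it helps to confirm that the CSV has the expected `name` and `nationality` columns. The helper below is a hypothetical, standard-library-only sketch; the trainer script's own loading logic may differ. It returns two parallel lists, the same shape as the `train(names, nationalities, ...)` call shown under Usage Examples.

```python
import csv
from typing import TextIO


def load_training_data(handle: TextIO) -> tuple[list[str], list[str]]:
    """Read a name,nationality CSV into two parallel lists, skipping incomplete rows."""
    reader = csv.DictReader(handle)
    required = {"name", "nationality"}
    # Accessing fieldnames reads the header row.
    if reader.fieldnames is None or not required.issubset(reader.fieldnames):
        raise ValueError(f"CSV must contain the columns: {sorted(required)}")
    names: list[str] = []
    nationalities: list[str] = []
    for row in reader:
        name = (row["name"] or "").strip()
        nationality = (row["nationality"] or "").strip()
        if name and nationality:  # ignore rows with a missing value
            names.append(name)
            nationalities.append(nationality)
    return names, nationalities


# Usage with a real file:
# with open("your_data.csv", newline="", encoding="utf-8") as f:
#     names, nationalities = load_training_data(f)
```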

Creating a Name-to-Nationality Dictionary

python nationality_trainer.py --dict

🏗️ Architecture

The implementation consists of:

  • FirstnameToNationality: Main predictor class with scikit-learn backend
  • FirstnameToCountry: Maps nationalities to country codes
  • CityToNationality: City-based prediction with geocoding fallback to name-based
  • NamePreprocessor: Advanced name preprocessing and normalization
  • PredictionResult: Type-safe prediction results using dataclasses
  • Model Pipeline: TF-IDF vectorization + Logistic Regression
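The listed model pipeline (TF-IDF vectorization followed by logistic regression) can be reproduced in a few lines of scikit-learn. This is a sketch, not the library's configuration: the character n-gram settings, the `normalize_name` function (a hypothetical stand-in for `NamePreprocessor`), and `max_iter` are assumptions. Character n-grams are a common choice for short strings like names because they capture spelling patterns such as "-ini" or "-ez".

```python
import re
import unicodedata

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline


def normalize_name(name: str) -> str:
    """Strip accents, collapse whitespace, and lowercase (assumed preprocessing)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", without_accents).strip().lower()


def build_pipeline() -> Pipeline:
    """TF-IDF over character n-grams, followed by logistic regression."""
    return make_pipeline(
        TfidfVectorizer(
            preprocessor=normalize_name,  # replaces the vectorizer's default preprocessing
            analyzer="char_wb",  # character n-grams padded at word boundaries
            ngram_range=(2, 4),
        ),
        LogisticRegression(max_iter=1000),
    )


# Toy training data for illustration only.
names = ["Giuseppe Rossi", "Luigi Ferrari", "José García", "María López", "Hiroshi Tanaka", "Yuki Sato"]
labels = ["Italian", "Italian", "Spanish", "Spanish", "Japanese", "Japanese"]

model = build_pipeline().fit(names, labels)
probabilities = model.predict_proba(["Marco Bianchi"])[0]
ranked = sorted(zip(model.classes_, probabilities), key=lambda pair: pair[1], reverse=True)
print([(str(label), round(float(prob), 3)) for label, prob in ranked])
```

A toy dataset this small yields unreliable probabilities; the point is the structure, in which one `fit` call learns the vocabulary and the classifier together.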

📁 File Structure

The implementation uses these file paths:

  • firstname_to_nationality/best-model.pt: Model checkpoint file
  • firstname_to_nationality/firstname_nationalities.pkl: Name-to-nationality dictionary

Usage Examples

Basic Usage

from firstname_to_nationality import FirstnameToNationality
predictor = FirstnameToNationality()
results = predictor(["John Smith"])

Advanced Features

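# Type-safe single prediction: the top 3 (nationality, confidence) pairs
result = predictor.predict_single("John Smith", top_n=3)

# Training interface: names and nationalities are parallel lists of strings
names = ["John Smith", "Giuseppe Rossi", "Hiroshi Tanaka"]
nationalities = ["American", "Italian", "Japanese"]
predictor.train(names, nationalities, save_model=True)

# Dictionary management: name_dict is the name-to-nationality mapping
# (its exact structure is not documented here)
predictor.save_dictionary(name_dict)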

🐳 Development with Docker

Dev Container

The repository includes a complete dev container setup for VS Code:

# Open in VS Code
code .
# Click "Reopen in Container" when prompted

Manual Docker

# Build
docker build -f .devcontainer/Dockerfile -t firstname-to-nationality .

# Run
docker run -it --rm -v "$(pwd)":/workspace firstname-to-nationality

⚡ Performance

The implementation offers:

  • Fast training with scikit-learn
  • Memory efficiency
  • Batch processing support
  • Python optimizations

🧬 Dependencies

Core Requirements:

  • Python 3.13+
  • scikit-learn >= 1.3.0
  • numpy >= 1.25.0
  • pandas >= 2.0.0
  • joblib >= 1.3.0
  • geopy >= 2.3.0 (for city-based predictions)

Development:

  • pytest, black, isort, pylint, mypy

🤝 Contributing

  1. Use the dev container for a consistent environment
  2. Use type hints throughout
  3. Run tests: pytest
  4. Format code: black . && isort .
  5. Check types: mypy firstname_to_nationality/

Automated Release Process

This repository uses a fully automated release workflow:

  1. Push your code to the main branch
  2. The version is bumped automatically based on conventional commit messages
  3. A GitHub release is created automatically with AI-generated release notes
  4. The package is published to PyPI automatically

For more details, see .github/WORKFLOW_SETUP.md.

Commit Message Format

Use conventional commits for automatic version bumping:

  • fix: description → Patch version bump (1.0.0 → 1.0.1)
  • feat: description → Minor version bump (1.0.0 → 1.1.0)
  • feat!: description → Major version bump (1.0.0 → 2.0.0)
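The mapping from commit prefix to version bump can be expressed as a small function. This is an illustrative sketch of the rules listed above, not the repository's workflow code (described in .github/WORKFLOW_SETUP.md). `bump_version` is hypothetical; following the Conventional Commits specification, it treats `!` after any type as a breaking change, and it returns `None` for types that trigger no release.

```python
import re
from typing import Optional

# Matches "type(scope)!: description"; the scope and the "!" are optional.
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: .+")


def bump_version(version: str, commit_message: str) -> Optional[str]:
    """Return the bumped MAJOR.MINOR.PATCH version, or None if no release is needed."""
    major, minor, patch = (int(part) for part in version.split("."))
    header = commit_message.splitlines()[0] if commit_message else ""
    match = _HEADER.match(header)
    if match is None:
        return None
    if match["breaking"]:
        return f"{major + 1}.0.0"
    if match["type"] == "feat":
        return f"{major}.{minor + 1}.0"
    if match["type"] == "fix":
        return f"{major}.{minor}.{patch + 1}"
    return None  # e.g. docs:, chore:, test: do not trigger a release in this sketch


print(bump_version("1.0.0", "fix: handle empty names"))  # 1.0.1
print(bump_version("1.0.0", "feat: add city lookup"))  # 1.1.0
print(bump_version("1.0.0", "feat!: drop Python 3.12"))  # 2.0.0
```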

Setting Up GitHub Actions Workflows

If you're a maintainer and need to set up the auto-version-bump workflow, see .github/WORKFLOW_SETUP.md for detailed instructions on configuring the required GitHub App for authentication.

📄 License

MIT License

Implementation Details

This is a complete implementation with:

  • ✅ Consistent method signatures
  • ✅ Reliable file handling
  • ✅ Robust prediction results
  • ✅ Efficient model format
  • ✅ Minimal dependencies

🎯 Roadmap

  • Transformer-based models support
  • REST API server
  • Web interface
  • Multi-language support
  • Advanced evaluation metrics

Download files

Source Distribution

firstname_to_nationality-1.1.15.tar.gz (27.0 MB)

Built Distribution

firstname_to_nationality-1.1.15-py3-none-any.whl (27.2 MB)

File details

Details for the file firstname_to_nationality-1.1.15.tar.gz.

File hashes

Hashes for firstname_to_nationality-1.1.15.tar.gz
Algorithm Hash digest
SHA256 cf3c4eebf1123b9fb7bcc230cb5b4cf30201a1ebb73231a0f0bca5b4c131d079
MD5 61e6b5c0273282ba0572c94d0a922d86
BLAKE2b-256 df924358b7d53e9840219fb7ec2067354524b6d48d90c010ae6ab447010d510f
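To check a downloaded file against a published SHA256 digest, hash it locally and compare. This is a minimal standard-library sketch; `sha256_of` and `verify_sha256` are hypothetical helper names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


expected = "cf3c4eebf1123b9fb7bcc230cb5b4cf30201a1ebb73231a0f0bca5b4c131d079"
archive = Path("firstname_to_nationality-1.1.15.tar.gz")
if archive.exists():
    print("OK" if verify_sha256(archive, expected) else "MISMATCH")
```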

Provenance

The following attestation bundles were made for firstname_to_nationality-1.1.15.tar.gz:

Publisher: publish.yml on callidio/firstname_to_nationality

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file firstname_to_nationality-1.1.15-py3-none-any.whl.

File hashes

Hashes for firstname_to_nationality-1.1.15-py3-none-any.whl
Algorithm Hash digest
SHA256 5eaae231d464d2d1713e36b56911c9a730d14f7f47d22824c24d10360a0ca901
MD5 073fed4e9ea141c0f5710bfeea842f9a
BLAKE2b-256 8dba5241806a7b521dabffb7ed42b3370da225a84d679c53d016022827b34749

Provenance

The following attestation bundles were made for firstname_to_nationality-1.1.15-py3-none-any.whl:

Publisher: publish.yml on callidio/firstname_to_nationality

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
