
Nationality Prediction from Firstname using Python 3.13 and scikit-learn

Firstname to Nationality - Python 3.13 Implementation

A name-to-nationality prediction library for Python 3.13+, built on scikit-learn.

🚀 Features

This library provides the following capabilities:

  • Python 3.13+ Compatible: Uses modern Python features and type hints
  • ML Stack: Built with scikit-learn for performance and compatibility
  • City-Based Prediction: Uses geopy geocoding to predict nationality from a city
  • Type Safety: Full type hints and dataclasses throughout
  • Error Handling: Robust error handling and fallbacks
  • Dev Container Ready: Includes VS Code dev container configuration
  • Flexible Training: Easy model training with your own data
  • Batch Processing: Efficient batch prediction support

📦 Installation

Using the Dev Container (Recommended)

  1. Open in VS Code
  2. When prompted, click "Reopen in Container"
  3. The dev container will build automatically with Python 3.13

Manual Installation

# Ensure you have Python 3.13+
python --version

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

🔧 Quick Start

Basic Name-Based Prediction

from firstname_to_nationality import FirstnameToNationality

# Initialize the predictor
predictor = FirstnameToNationality()

# Predict nationality for a single name
result = predictor.predict_single("Giuseppe Rossi", top_n=3)
print(result)  # [('Italian', 0.85), ('Spanish', 0.12), ...]

# Batch prediction
names = ["John Smith", "Maria Rodriguez", "Zhang Wei"]
results = predictor(names, top_n=2)

for name, predictions in results:
    nationality, confidence = predictions[0]
    print(f"{name} → {nationality} ({confidence:.2f})")
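
The batch call above appears to yield (name, predictions) pairs, where each prediction is a (nationality, confidence) tuple. Assuming that shape, the following sketch keeps only the top prediction per name and discards low-confidence guesses. The helper `top_confident` and its `min_confidence` threshold are hypothetical and not part of the library; the sample data is hand-written so the snippet runs without a trained model.

```python
from typing import Iterable, Optional

# Shape assumed from the Quick Start snippet:
# (name, [(nationality, confidence), ...])
Prediction = tuple[str, float]
BatchResult = Iterable[tuple[str, list[Prediction]]]


def top_confident(
    results: BatchResult, min_confidence: float = 0.5
) -> dict[str, Optional[str]]:
    """Map each name to its best nationality, or None if below the threshold."""
    best: dict[str, Optional[str]] = {}
    for name, predictions in results:
        if not predictions:
            best[name] = None
            continue
        # Do not rely on ordering: pick the highest-confidence entry explicitly.
        nationality, confidence = max(predictions, key=lambda p: p[1])
        best[name] = nationality if confidence >= min_confidence else None
    return best


# Hand-written sample standing in for predictor(names, top_n=2)
sample = [
    ("John Smith", [("American", 0.62), ("British", 0.30)]),
    ("Zhang Wei", [("Chinese", 0.41), ("Taiwanese", 0.39)]),
]
print(top_confident(sample))
```

A threshold like this is one way to avoid acting on near-ties; the right value depends on the model and the data.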

City-Based Prediction (New!)

from firstname_to_nationality import CityToNationality

# Initialize the city-based predictor
predictor = CityToNationality()

# Predict with city information (more accurate)
result = predictor("Maria Garcia", cities="Barcelona")
print(result)  # Spanish (from Barcelona, Spain)

# Fallback to name-based prediction if no city
result = predictor("Maria Garcia")
print(result)  # Uses ML model on name

# Batch prediction with cities
names = ["John Smith", "Luigi Ferrari", "Zhang Wei"]
cities = ["London", "Milan", "Beijing"]
results = predictor(names, cities=cities)

for item in results:
    name = item["name"]
    pred = item["predictions"][0]
    print(f"{name} from {item['city']} → {pred['nationality']} ({pred['country_code']})")
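
The city-first, name-fallback behaviour described above can be illustrated with a small sketch. This is not the library's implementation: `CITY_TABLE` is a hypothetical stand-in for a geocoding call (the library is described as using geopy), and `predict_with_city` and `fake_name_model` are made-up names used only for illustration.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a geocoder: city -> (nationality, ISO country code).
# A real implementation would query a geocoding service instead.
CITY_TABLE: dict[str, tuple[str, str]] = {
    "barcelona": ("Spanish", "ES"),
    "milan": ("Italian", "IT"),
}


def predict_with_city(
    name: str,
    city: Optional[str],
    name_model: Callable[[str], str],
) -> dict[str, Optional[str]]:
    """Prefer the city lookup; fall back to the name-based model."""
    if city:
        hit = CITY_TABLE.get(city.strip().lower())
        if hit is not None:
            nationality, country_code = hit
            return {"name": name, "nationality": nationality,
                    "country_code": country_code, "source": "city"}
    # No city given, or the city could not be resolved.
    return {"name": name, "nationality": name_model(name),
            "country_code": None, "source": "name"}


def fake_name_model(name: str) -> str:
    """Placeholder for the ML model; always answers 'Unknown'."""
    return "Unknown"


print(predict_with_city("Maria Garcia", "Barcelona", fake_name_model)["source"])
print(predict_with_city("Maria Garcia", None, fake_name_model)["source"])
```

Recording the `source` of each answer makes it easy to tell geocoded results from model guesses downstream.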

🧪 Examples

Run the example scripts:

# Basic name-based prediction
python example.py

# Country code mapping
python example_country.py

# City-based prediction with geocoding
python example_city.py

🔥 Training Your Own Model

Using Sample Data

python nationality_trainer.py

Using Your Own Data

Create a CSV file with name and nationality columns:

name,nationality
John Smith,American
Giuseppe Rossi,Italian
Hiroshi Tanaka,Japanese

Then train:

python nationality_trainer.py your_data.csv
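
Before training, it can help to check that a CSV file matches the two-column layout shown above. The sketch below uses only the standard library; `load_training_rows` is a hypothetical helper, not part of the package, and it simply skips rows with a blank name or nationality.

```python
import csv
import io
from typing import TextIO


def load_training_rows(handle: TextIO) -> tuple[list[str], list[str]]:
    """Read name/nationality pairs, skipping rows with a blank field."""
    reader = csv.DictReader(handle)
    missing = {"name", "nationality"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    names: list[str] = []
    nationalities: list[str] = []
    for row in reader:
        # Short rows yield None for absent fields, so normalise to "".
        name = (row["name"] or "").strip()
        nationality = (row["nationality"] or "").strip()
        if name and nationality:
            names.append(name)
            nationalities.append(nationality)
    return names, nationalities


# In-memory demo; for a real file use open("your_data.csv", newline="", encoding="utf-8")
demo = io.StringIO("name,nationality\nJohn Smith,American\nGiuseppe Rossi,Italian\n,Japanese\n")
names, nationalities = load_training_rows(demo)
print(names, nationalities)
```

The two returned lists line up index by index, which matches the `predictor.train(names, nationalities, ...)` call shown elsewhere in this README.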

Creating a Dictionary

python nationality_trainer.py --dict

🏗️ Architecture

The implementation consists of:

  • FirstnameToNationality: Main predictor class with scikit-learn backend
  • FirstnameToCountry: Maps nationalities to country codes
  • CityToNationality: City-based prediction via geocoding, with fallback to name-based prediction
  • NamePreprocessor: Advanced name preprocessing and normalization
  • PredictionResult: Type-safe prediction results using dataclasses
  • Model Pipeline: TF-IDF vectorization + Logistic Regression
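
To make the "TF-IDF vectorization + Logistic Regression" pipeline concrete, here is a minimal scikit-learn sketch. It is not the library's actual model: the character n-gram settings, the `max_iter` value, the `predict_top_n` helper, and the six-name toy training set are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; far too small for a real model.
names = ["Giuseppe Rossi", "Luigi Ferrari", "Marco Bianchi",
         "John Smith", "James Brown", "Mary Johnson"]
labels = ["Italian", "Italian", "Italian",
          "American", "American", "American"]

# Character n-grams (an assumption) capture name fragments such as "ssi" or "son".
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)


def predict_top_n(name: str, top_n: int = 2) -> list[tuple[str, float]]:
    """Return the top_n (nationality, probability) pairs for one name."""
    probabilities = model.predict_proba([name])[0]
    ranked = sorted(zip(model.classes_, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return [(str(label), float(p)) for label, p in ranked[:top_n]]


print(predict_top_n("Giovanni Rossi"))
```

Character-level features are a common choice for short strings like names, since whole-word TF-IDF would treat almost every name as an unseen token.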

📁 File Structure

The implementation uses these file paths:

  • firstname_to_nationality/best-model.pt: Model checkpoint file
  • firstname_to_nationality/firstname_nationalities.pkl: Name-to-nationality dictionary
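
The internal layout of the `.pkl` dictionary is not documented here. As a rough illustration of how a name-to-nationality mapping can be stored and reloaded with the standard library, the sketch below assumes a plain `dict[str, str]`; the helper names and the temporary file path are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_name_dict(mapping: dict[str, str], path: Path) -> None:
    """Serialize the mapping to disk with pickle."""
    with path.open("wb") as handle:
        pickle.dump(mapping, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_name_dict(path: Path) -> dict[str, str]:
    """Load a mapping written by save_name_dict."""
    with path.open("rb") as handle:
        return pickle.load(handle)


# Round-trip demo in a temporary directory (assumed dict shape: name -> nationality).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "names_demo.pkl"
    save_name_dict({"giuseppe": "Italian", "hiroshi": "Japanese"}, target)
    restored = load_name_dict(target)

print(restored["giuseppe"])
```

Pickle files can execute arbitrary code when loaded, so only load dictionary files from sources you trust.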

Usage Examples

Basic Usage

from firstname_to_nationality import FirstnameToNationality
predictor = FirstnameToNationality()
results = predictor(["John Smith"])

Advanced Features

# Type-safe single predictions
result = predictor.predict_single("John Smith", top_n=3)

# Training interface
predictor.train(names, nationalities, save_model=True)

# Dictionary management
predictor.save_dictionary(name_dict)

🐳 Development with Docker

Dev Container

The repository includes a complete dev container setup for VS Code:

# Open in VS Code
code .
# Click "Reopen in Container" when prompted

Manual Docker

# Build
docker build -f .devcontainer/Dockerfile -t firstname-to-nationality .

# Run
docker run -it --rm -v $(pwd):/workspace firstname-to-nationality

⚡ Performance

The implementation offers:

  • Fast training with scikit-learn
  • Memory efficiency
  • Batch processing support
  • Python optimizations

🧬 Dependencies

Core Requirements:

  • Python 3.13+
  • scikit-learn >= 1.3.0
  • numpy >= 1.25.0
  • pandas >= 2.0.0
  • joblib >= 1.3.0
  • geopy >= 2.3.0 (for city-based predictions)

Development:

  • pytest, black, isort, pylint, mypy

🤝 Contributing

  1. Use the dev container for a consistent environment
  2. Use type hints throughout
  3. Run tests: pytest
  4. Format code: black . && isort .
  5. Check types: mypy firstname_to_nationality/

Automated Release Process

This repository uses a fully automated release workflow:

  1. Push your code to the main branch
  2. Version is automatically bumped based on conventional commit messages
  3. GitHub release is created automatically with AI-generated release notes
  4. Package is published to PyPI automatically

For more details, see .github/WORKFLOW_SETUP.md.

Commit Message Format

Use conventional commits for automatic version bumping:

  • fix: description → Patch version bump (1.0.0 → 1.0.1)
  • feat: description → Minor version bump (1.0.0 → 1.1.0)
  • feat!: description → Major version bump (1.0.0 → 2.0.0)
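
The mapping from commit prefix to version bump can be sketched in a few lines. This is an illustration of the rules listed above, not the workflow's actual code: `bump_kind` and `apply_bump` are hypothetical names, and the handling of an optional scope such as `feat(api):` and of `!` on any type follows the general Conventional Commits convention rather than anything stated in this README.

```python
import re
from typing import Optional

# type, optional (scope), optional "!", then ": description"
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: \S")


def bump_kind(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None for a commit message."""
    header = message.splitlines()[0] if message else ""
    match = _HEADER.match(header)
    if match is None:
        return None
    if match["breaking"]:
        return "major"
    return {"fix": "patch", "feat": "minor"}.get(match["type"])


def apply_bump(version: str, kind: Optional[str]) -> str:
    """Apply a bump kind to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # e.g. docs: or chore: commits leave the version unchanged


for msg in ("fix: handle empty names", "feat: add city lookup", "feat!: new result type"):
    print(msg, "->", apply_bump("1.0.0", bump_kind(msg)))
```

Commits whose prefix is not recognized return `None` here and leave the version untouched; whether the real workflow behaves the same way is not stated in this README.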

Setting Up GitHub Actions Workflows

If you're a maintainer and need to set up the auto-version-bump workflow, see .github/WORKFLOW_SETUP.md for detailed instructions on configuring the required GitHub App for authentication.

📄 License

MIT License

Implementation Details

This is a complete implementation with:

  • ✅ Consistent method signatures
  • ✅ Reliable file handling
  • ✅ Robust prediction results
  • ✅ Efficient model format
  • ✅ Minimal dependencies

🎯 Roadmap

  • Transformer-based models support
  • REST API server
  • Web interface
  • Multi-language support
  • Advanced evaluation metrics
