Nationality Prediction from Firstname using Python 3.13 and scikit-learn
Firstname to Nationality - Python 3.13 Implementation
A name-to-nationality prediction library for Python 3.13+, built on scikit-learn.
🚀 Features
This library provides the following capabilities:
- ✅ Python 3.13+ Compatible: Targets Python 3.13 and newer, with type hints
- ✅ ML Stack: Built with scikit-learn for performance and compatibility
- ✅ City-Based Prediction: Uses geopy geocoding for city-based nationality prediction
- ✅ Type Safety: Full type hints and dataclasses throughout
- ✅ Error Handling: Robust error handling and fallbacks
- ✅ Dev Container Ready: Includes VS Code dev container configuration
- ✅ Flexible Training: Easy model training with your own data
- ✅ Batch Processing: Efficient batch prediction support
📦 Installation
Using the Dev Container (Recommended)
- Open in VS Code
- When prompted, click "Reopen in Container"
- The dev container will build automatically with Python 3.13
Manual Installation
# Ensure you have Python 3.13+
python --version
# Install dependencies
pip install -r requirements.txt
# Install the package
pip install -e .
🔧 Quick Start
Basic Name-Based Prediction
from firstname_to_nationality import FirstnameToNationality
# Initialize the predictor
predictor = FirstnameToNationality()
# Predict nationality for a single name
result = predictor.predict_single("Giuseppe Rossi", top_n=3)
print(result) # [('Italian', 0.85), ('Spanish', 0.12), ...]
# Batch prediction
names = ["John Smith", "Maria Rodriguez", "Zhang Wei"]
results = predictor(names, top_n=2)
for name, predictions in results:
    nationality, confidence = predictions[0]
    print(f"{name} → {nationality} ({confidence:.2f})")
City-Based Prediction (New!)
from firstname_to_nationality import CityToNationality
# Initialize the city-based predictor
predictor = CityToNationality()
# Predict with city information (more accurate)
result = predictor("Maria Garcia", cities="Barcelona")
print(result) # Spanish (from Barcelona, Spain)
# Fallback to name-based prediction if no city
result = predictor("Maria Garcia")
print(result) # Uses ML model on name
# Batch prediction with cities
names = ["John Smith", "Luigi Ferrari", "Zhang Wei"]
cities = ["London", "Milan", "Beijing"]
results = predictor(names, cities=cities)
for item in results:
    name = item["name"]
    pred = item["predictions"][0]
    print(f"{name} from {item['city']} → {pred['nationality']} ({pred['country_code']})")
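The city-first, name-fallback behaviour described above can be sketched without network access. This is a minimal illustration, not the library's internals: `predict_with_city`, `COUNTRY_TO_NATIONALITY`, and the two stand-in callables are hypothetical names introduced here. In a real setup, `geocode_country` could presumably wrap geopy's `Nominatim(user_agent=...).geocode(city, addressdetails=True)` and read `raw["address"]["country_code"]`.

```python
from typing import Callable, Optional

# Hypothetical lookup table from ISO country codes to nationality labels.
COUNTRY_TO_NATIONALITY = {
    "es": "Spanish",
    "it": "Italian",
    "gb": "British",
    "cn": "Chinese",
}


def predict_with_city(
    name: str,
    city: Optional[str],
    geocode_country: Callable[[str], Optional[str]],
    predict_from_name: Callable[[str], str],
) -> dict:
    """Prefer the city's country; fall back to the name-based model."""
    if city:
        try:
            code = geocode_country(city)
        except Exception:
            code = None  # treat geocoder or network failures as "no result"
        nationality = COUNTRY_TO_NATIONALITY.get(code.lower()) if code else None
        if nationality is not None:
            return {
                "name": name,
                "nationality": nationality,
                "country_code": code.upper(),
                "source": "city",
            }
    # No city, unknown city, or unmapped country: use the name alone.
    return {
        "name": name,
        "nationality": predict_from_name(name),
        "country_code": None,
        "source": "name",
    }


# Offline stand-ins so the sketch runs without a geocoding service or model.
def fake_geocode(city: str) -> Optional[str]:
    return {"Barcelona": "es", "Milan": "it"}.get(city)


def fake_name_model(name: str) -> str:
    return "Unknown"


print(predict_with_city("Maria Garcia", "Barcelona", fake_geocode, fake_name_model))
print(predict_with_city("Maria Garcia", None, fake_geocode, fake_name_model))
```

Passing the geocoder and the name model in as callables keeps the fallback logic testable offline and makes it easy to add caching or rate limiting around the geocoding call.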
🧪 Examples
Run the example scripts:
# Basic name-based prediction
python example.py
# Country code mapping
python example_country.py
# City-based prediction with geocoding
python example_city.py
🔥 Training Your Own Model
Using Sample Data
python nationality_trainer.py
Using Your Own Data
Create a CSV file with name and nationality columns:
name,nationality
John Smith,American
Giuseppe Rossi,Italian
Hiroshi Tanaka,Japanese
Then train:
python nationality_trainer.py your_data.csv
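Before training, it can help to check that a CSV matches the expected `name,nationality` layout. The sketch below uses only the standard library; `load_training_rows` is a hypothetical helper, and whether `nationality_trainer.py` validates its input in this way is not confirmed by this README.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_training_rows(handle: TextIO) -> Tuple[List[str], List[str]]:
    """Read name/nationality pairs, skipping rows with an empty field."""
    reader = csv.DictReader(handle)
    missing = {"name", "nationality"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")

    names: List[str] = []
    nationalities: List[str] = []
    for row in reader:
        name = (row["name"] or "").strip()
        nationality = (row["nationality"] or "").strip()
        if name and nationality:  # skip incomplete rows
            names.append(name)
            nationalities.append(nationality)
    return names, nationalities


# In-memory sample with the same layout as the CSV shown above.
sample = io.StringIO(
    "name,nationality\n"
    "John Smith,American\n"
    "Giuseppe Rossi,Italian\n"
    "Hiroshi Tanaka,Japanese\n"
)
names, nationalities = load_training_rows(sample)
print(list(zip(names, nationalities)))
```

The two parallel lists have the same shape as the arguments to `predictor.train(names, nationalities, save_model=True)`.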
Creating a Dictionary
python nationality_trainer.py --dict
🏗️ Architecture
The implementation consists of:
- FirstnameToNationality: Main predictor class with scikit-learn backend
- FirstnameToCountry: Maps nationalities to country codes
- CityToNationality: City-based prediction with geocoding, falling back to name-based prediction
- NamePreprocessor: Advanced name preprocessing and normalization
- PredictionResult: Type-safe prediction results using dataclasses
- Model Pipeline: TF-IDF vectorization + Logistic Regression
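To make the "TF-IDF vectorization + Logistic Regression" pipeline concrete, here is a small scikit-learn sketch. The character n-gram settings, the toy training data, and the `top_n` helper are assumptions chosen for illustration; the library's actual hyperparameters and preprocessing may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data only; a real model needs far more names per class.
names = [
    "Giuseppe Rossi", "Luigi Ferrari", "Marco Bianchi",
    "John Smith", "Emily Johnson", "William Brown",
    "Hiroshi Tanaka", "Yuki Sato", "Kenji Suzuki",
]
labels = ["Italian"] * 3 + ["American"] * 3 + ["Japanese"] * 3

# Character n-grams capture name fragments such as "-ini" or "-son".
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)


def top_n(name: str, n: int = 3) -> list[tuple[str, float]]:
    """Return the n most probable labels with their probabilities."""
    probs = model.predict_proba([name])[0]
    ranked = sorted(zip(model.classes_, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, float(prob)) for label, prob in ranked[:n]]


print(top_n("Giuseppe Rossi", n=2))
```

Ranking `predict_proba` output is one straightforward way to produce `(nationality, confidence)` pairs like those returned by `predict_single`.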
📁 File Structure
The implementation uses these file paths:
- firstname_to_nationality/best-model.pt: Model checkpoint file
- firstname_to_nationality/firstname_nationalities.pkl: Name-to-nationality dictionary
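A `.pkl` dictionary file of this kind can be written and read back with the standard `pickle` module. The sketch below is an assumption about the general approach, not the library's code: the helper names and the mapping shape (first name to a list of nationalities) are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List


def save_name_dictionary(mapping: Dict[str, List[str]], path: Path) -> None:
    """Serialize the mapping to disk with pickle."""
    with path.open("wb") as fh:
        pickle.dump(mapping, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_name_dictionary(path: Path) -> Dict[str, List[str]]:
    """Load a mapping written by save_name_dictionary."""
    with path.open("rb") as fh:
        return pickle.load(fh)


# Hypothetical mapping shape: first name -> candidate nationalities.
name_dict = {"giuseppe": ["Italian"], "maria": ["Spanish", "Italian"]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "firstname_nationalities.pkl"
    save_name_dictionary(name_dict, target)
    restored = load_name_dictionary(target)

print(restored == name_dict)
```

Pickle files can execute arbitrary code when loaded, so only load dictionary files from sources you trust.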
Usage Examples
Basic Usage
from firstname_to_nationality import FirstnameToNationality
predictor = FirstnameToNationality()
results = predictor(["John Smith"])
Advanced Features
# Type-safe single predictions
result = predictor.predict_single("John Smith", top_n=3)
# Training interface
predictor.train(names, nationalities, save_model=True)
# Dictionary management
predictor.save_dictionary(name_dict)
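The `(nationality, confidence)` tuples returned by `predict_single` can be wrapped in a dataclass for attribute access and static type checking. The `Prediction` class and `to_predictions` helper below are hypothetical illustrations; the fields of the library's own `PredictionResult` are not documented here and may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical typed view of one (nationality, confidence) pair."""

    nationality: str
    confidence: float


def to_predictions(pairs: Iterable[Tuple[str, float]]) -> List[Prediction]:
    """Convert raw tuples into immutable, typed records."""
    return [Prediction(nationality=n, confidence=float(c)) for n, c in pairs]


# Sample values in the shape shown in the Quick Start output comment.
raw = [("Italian", 0.85), ("Spanish", 0.12)]
best = to_predictions(raw)[0]
print(f"{best.nationality} ({best.confidence:.2f})")  # Italian (0.85)
```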
🐳 Development with Docker
Dev Container
The repository includes a complete dev container setup for VS Code:
# Open in VS Code
code .
# Click "Reopen in Container" when prompted
Manual Docker
# Build
docker build -f .devcontainer/Dockerfile -t firstname-to-nationality .
# Run
docker run -it --rm -v "$(pwd)":/workspace firstname-to-nationality
⚡ Performance
The implementation offers:
- Fast training with scikit-learn
- Memory efficiency
- Batch processing support
- Python optimizations
🧬 Dependencies
Core Requirements:
- Python 3.13+
- scikit-learn >= 1.3.0
- numpy >= 1.25.0
- pandas >= 2.0.0
- joblib >= 1.3.0
- geopy >= 2.3.0 (for city-based predictions)
Development:
- pytest, black, isort, pylint, mypy
🤝 Contributing
- Use the dev container for a consistent environment
- Use type hints throughout
- Run tests: pytest
- Format code: black . && isort .
- Check types: mypy firstname_to_nationality/
Automated Release Process
This repository uses a fully automated release workflow:
- Push your code to the main branch
- Version is automatically bumped based on conventional commit messages
- GitHub release is created automatically with AI-generated release notes
- Package is published to PyPI automatically
For more details, see .github/WORKFLOW_SETUP.md.
Commit Message Format
Use conventional commits for automatic version bumping:
- fix: description → Patch version bump (1.0.0 → 1.0.1)
- feat: description → Minor version bump (1.0.0 → 1.1.0)
- feat!: description → Major version bump (1.0.0 → 2.0.0)
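The mapping from commit prefix to version bump can be expressed as a small function. This is a sketch of the rule only; `bump_version` is a hypothetical helper, and the repository's workflow may parse commits differently (for example, this sketch treats a `!` after any type as a breaking change, following the Conventional Commits convention).

```python
import re

# type, optional "(scope)", optional "!" for breaking changes, then ": text"
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump_version(version: str, commit_subject: str) -> str:
    """Return the next version implied by one commit subject line."""
    major, minor, patch = (int(part) for part in version.split("."))
    match = _HEADER.match(commit_subject)
    if match is None:
        return version  # not a conventional commit: no bump
    if match["breaking"]:
        return f"{major + 1}.0.0"
    if match["type"] == "feat":
        return f"{major}.{minor + 1}.0"
    if match["type"] == "fix":
        return f"{major}.{minor}.{patch + 1}"
    return version  # other types (docs, chore, ...) leave the version as-is


print(bump_version("1.0.0", "fix: handle empty names"))      # 1.0.1
print(bump_version("1.0.0", "feat: add city lookup"))        # 1.1.0
print(bump_version("1.0.0", "feat!: rename predict method")) # 2.0.0
```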
Setting Up GitHub Actions Workflows
If you're a maintainer and need to set up the auto-version-bump workflow, see .github/WORKFLOW_SETUP.md for detailed instructions on configuring the required GitHub App for authentication.
📄 License
MIT License
Implementation Details
This is a complete implementation with:
- ✅ Consistent method signatures
- ✅ Reliable file handling
- ✅ Robust prediction results
- ✅ Efficient model format
- ✅ Minimal dependencies
🎯 Roadmap
- Transformer-based models support
- REST API server
- Web interface
- Multi-language support
- Advanced evaluation metrics