⚡ Ultra-fast name-to-gender prediction engine. Uses mmap and binary search for <1ms lookups. Features a pre-compiled 4.6MB binary database covering 700k+ global entries. Built for scale.
gender-detect
A high-performance gender detection library and CLI tool built on binary search. It uses a pre-compiled binary database, accessed through memory mapping (mmap), to predict gender and country of origin from a first name with sub-millisecond latency.
Features
- Extreme Speed: Uses binary search (O(log n)) on a packed binary database with mmap for zero-copy lookups.
- Zero-Dependency: Built entirely using Python standard libraries.
- CLI Ready: Includes a built-in table-formatted command line interface.
- Privacy Focused: 100% local; no external API calls or data tracking.
Installation
pip install gender-detect
CLI Usage
After installation, you can use the gender-detect command directly from your terminal:
gender-detect John
For automation, you can output the result in raw JSON:
gender-detect John --json
Library Usage
Simple Prediction
Input a name to get a statistical analysis of the likely gender and primary origin.
from gender_detect import GenderDetector
gd = GenderDetector()
result = gd.predict("John")
print(result)
Response Format
The gender_probability is the share of samples, summed across all countries, that match likely_gender. In the example below, 5 of the 6 samples are male, which corresponds to the reported 0.83.
{
"name": "john",
"likely_gender": "male",
"gender_probability": 0.83,
"top_reported_country": "US",
"data_breakdown": [
{
"country": "US",
"male_samples": 4,
"female_samples": 1
},
{
"country": "GB",
"male_samples": 1,
"female_samples": 0
}
]
}
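The relationship between data_breakdown and gender_probability in the response above can be reproduced with a few lines of Python. This is a minimal sketch, not the library's code: it assumes the probability is the majority share of the summed samples (which matches the example), and the function name summarize, the tie-breaking rule, and the rounding to two decimals are all assumptions made for illustration.

```python
def summarize(breakdown):
    """Aggregate per-country counts into (likely_gender, probability).

    Hypothetical helper: tie-breaking toward "male" and rounding to two
    decimals are assumptions, not documented library behaviour.
    """
    male = sum(row["male_samples"] for row in breakdown)
    female = sum(row["female_samples"] for row in breakdown)
    total = male + female
    if total == 0:
        return None, 0.0
    likely = "male" if male >= female else "female"
    return likely, round(max(male, female) / total, 2)


# Counts copied from the example response above.
breakdown = [
    {"country": "US", "male_samples": 4, "female_samples": 1},
    {"country": "GB", "male_samples": 1, "female_samples": 0},
]

likely_gender, probability = summarize(breakdown)
print(likely_gender, probability)  # male 0.83
```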
How it Works
The library uses a custom packed binary format (Python struct layout 4sHBB), 8 bytes per entry:
- 4 bytes: BLAKE2b hash prefix of the name.
- 2 bytes: ISO-3166-1 numeric country code.
- 1 byte: Male sample count.
- 1 byte: Female sample count.
Because these 8-byte entries are sorted by hash, the library can binary-search the memory-mapped file directly instead of loading it into memory, so the memory footprint stays small regardless of database size.
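The record layout and lookup described above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the library's actual implementation: the byte order (little-endian), the name normalization (trimmed, lowercased UTF-8), taking the first 4 bytes of a default-size BLAKE2b digest, and the helper names name_key, build and lookup are all hypothetical. Only the 4sHBB field layout and the sort-by-hash idea come from the description.

```python
import hashlib
import mmap
import struct
import tempfile

# Field layout from the README; "<" (little-endian) is an assumption.
RECORD = struct.Struct("<4sHBB")  # hash prefix, country code, male, female


def name_key(name):
    # Assumption: lowercase UTF-8 name, first 4 bytes of a 64-byte digest.
    return hashlib.blake2b(name.strip().lower().encode("utf-8")).digest()[:4]


def build(rows):
    """Pack (name, country_code, male, female) rows, sorted by hash prefix."""
    packed = [RECORD.pack(name_key(n), c, m, f) for n, c, m, f in rows]
    return b"".join(sorted(packed, key=lambda rec: rec[:4]))


def lookup(buf, name):
    """Return all (country_code, male, female) records matching the name."""
    key = name_key(name)
    count = len(buf) // RECORD.size
    lo, hi = 0, count
    while lo < hi:  # binary search for the leftmost record with this key
        mid = (lo + hi) // 2
        start = mid * RECORD.size
        if buf[start:start + 4] < key:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < count:  # a name may have one record per country
        prefix, country, male, female = RECORD.unpack_from(buf, lo * RECORD.size)
        if prefix != key:
            break
        matches.append((country, male, female))
        lo += 1
    return matches


# Tiny demo database; 840, 826 and 724 are the ISO-3166-1 numeric codes
# for US, GB and ES.
rows = [("John", 840, 4, 1), ("John", 826, 1, 0), ("Maria", 724, 0, 3)]
with tempfile.TemporaryFile() as f:
    f.write(build(rows))
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        john = lookup(mm, "john")
        maria = lookup(mm, "Maria")

print(sorted(john))
print(maria)
```

Only the 4-byte slices touched by the search are read from the mapping, which is what keeps the footprint independent of the file size. Because the key is a 4-byte prefix, two different names can share a key, so a real implementation would have to accept or otherwise handle such collisions.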
Contribution
Data contributions are managed through contribute.json in the main repository.
- Add your name data to the JSON list.
- Ensure country_code is the numeric ISO-3166-1 value.
- Submit a Pull Request.
The CI/CD pipeline automatically validates the JSON and recompiles the names.bin database upon merging.
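Before opening a Pull Request, the two requirements stated above (a JSON list, and a numeric country_code) can be checked locally. The sketch below is not the project's CI validator: the function name check_entries is hypothetical, the 1 to 999 range is only a loose sanity check for three-digit numeric codes, and the "name" field in the sample is an assumed key used purely for illustration.

```python
import json


def check_entries(text):
    """Return a list of problems found in a contribute.json-style document.

    Hypothetical helper: it checks only what the README states, namely that
    the file is a JSON list and that country_code is numeric.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        return ["top level must be a JSON list"]
    problems = []
    for i, entry in enumerate(data):
        code = entry.get("country_code") if isinstance(entry, dict) else None
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(code, int) and not isinstance(code, bool)
        if not is_int or not 1 <= code <= 999:
            problems.append(
                f"entry {i}: country_code must be an integer from 1 to 999, got {code!r}"
            )
    return problems


# The "name" key is an assumption; only country_code is documented.
sample = '[{"name": "john", "country_code": 840}, {"name": "maria", "country_code": "ES"}]'
for problem in check_entries(sample):
    print(problem)
```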
License
MIT - See LICENSE file for details.