Human Names for Synthetic Data Generation
Global Names
Overview
This repository contains a collection of name datasets gathered from multiple sources, including government records, historical documents, and public databases. The data has been cleaned, standardized, and organized to facilitate easy use in data science projects, demographic studies, and name-related research.
Features
- Extensive Name Collection: A wide range of first and last names from multiple countries and cultures.
- Python Tools: Utilities and scripts to efficiently work with the name datasets.
- Easy Integration: Simple and intuitive integration with your Python projects.
- Open Source: Released under the MIT License, allowing for free use and distribution.
Installation
To use global_names in your project, install it from PyPI with pip:
pip install global-names
Install from source
git clone https://github.com/DecisionNerd/global_names
cd global_names
pip install -r requirements.txt
Usage
Loading the Datasets
The datasets are stored as CSV files, so you can load them directly with pandas. The example below reads the USA last-names file and prints the first few rows.
import pandas as pd
# Load last names from a specific country
last_names_df = pd.read_csv('data/last_names/usa.csv')
# Display the first few entries
print(last_names_df.head())
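If you load files for several countries, a small helper keeps the path handling in one place. The following is a minimal sketch that uses only the standard library, so it runs without pandas. It assumes the `data/last_names/<country>.csv` layout implied by the example above; the function name `load_last_names` and the `data_dir` parameter are hypothetical and not part of the package.

```python
import csv
from pathlib import Path


def load_last_names(country, data_dir="data"):
    """Return the rows of <data_dir>/last_names/<country>.csv as dicts.

    Hypothetical helper: the directory layout is inferred from the example
    above, and the keys of each row are whatever the CSV header provides.
    """
    csv_path = Path(data_dir) / "last_names" / f"{country.lower()}.csv"
    if not csv_path.is_file():
        # Fail early with a readable message instead of a bare open() error.
        raise FileNotFoundError(f"No last-name dataset found at {csv_path}")
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_last_names("usa")` from the repository root would then read the same file as the pandas example, returning one dictionary per row.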
Searching for a Name
The provided tools can also search the datasets for a specific name.
from tools import name_search
# Search for a specific last name
results = name_search.search_last_name('Smith', 'usa')
print(results)
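To illustrate what a name lookup involves, here is a minimal sketch of a case-insensitive exact-match search over a list of names. It is not the implementation of `name_search`; the function `search_names` and the sample list are hypothetical and exist only for illustration.

```python
def search_names(names, query):
    """Return entries equal to `query`, ignoring case and surrounding spaces."""
    needle = query.strip().casefold()
    return [name for name in names if name.strip().casefold() == needle]


# Illustrative sample data, not taken from the real datasets.
sample_last_names = ["Smith", "SMITH", "Smithson", "Jones"]
print(search_names(sample_last_names, "smith"))  # prints ['Smith', 'SMITH']
```

Using `casefold()` rather than `lower()` makes the comparison more robust for names that contain non-ASCII letters.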
Generating Random Names
You can also generate random names using the datasets for purposes such as testing or anonymizing data.
from tools import name_generator
# Generate a random full name
random_name = name_generator.generate_random_name('usa')
print(random_name)
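For test fixtures it often helps to make the random output reproducible. The sketch below shows one way to do that with the standard library by passing a seeded `random.Random` instance. It does not reflect how `name_generator` works internally; `random_full_name` and the sample lists are hypothetical.

```python
import random


def random_full_name(first_names, last_names, rng=None):
    """Pick one first name and one last name and join them with a space."""
    if rng is None:
        rng = random.Random()  # unseeded: a different name on each run
    return f"{rng.choice(first_names)} {rng.choice(last_names)}"


# Illustrative sample data, not taken from the real datasets.
first_names = ["Ada", "Grace", "Alan"]
last_names = ["Smith", "Jones", "Garcia"]

# Two generators created with the same seed produce the same name.
print(random_full_name(first_names, last_names, random.Random(42)))
print(random_full_name(first_names, last_names, random.Random(42)))
```

Passing the generator in as a parameter, rather than seeding the global `random` module, keeps the seeding local to the code that needs it.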
Contributing
We welcome contributions from the community! If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Make your changes.
- Submit a pull request with a detailed description of your changes.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Acknowledgements
A big thank you to Matthew Hager (smashew) for his original work on the text databases of last names. This project builds upon the foundation he created.
A thank you to Philippe Rémy (philipperemy) for posting the username dataset. While Philippe's work is not integrated into this project, his data preparation has greatly accelerated our work on NamesData.
Contact
For questions or suggestions, open an issue, start a discussion thread, or contact us at hello@frontieranalytica.com.