Human Names for Synthetic Data Generation

Global Names

Overview

This repository contains a collection of name datasets gathered from multiple sources, including government records, historical documents, and public databases. The data has been cleaned, standardized, and organized to facilitate easy use in data science projects, demographic studies, and name-related research.

Features

  • Extensive Name Collection: A wide range of first and last names from multiple countries and cultures.
  • Python Tools: Utilities and scripts to efficiently work with the name datasets.
  • Easy Integration: Simple and intuitive integration with your Python projects.
  • Open Source: Released under the MIT License, allowing for free use and distribution.

Installation

To use global_names in your project, install it from PyPI with pip:

pip install global-names
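To confirm that the install succeeded, one option is to ask the standard library for the installed distribution's version. This is a minimal sketch: the helper name `installed_version` is hypothetical and is not part of global_names; only the distribution name `global-names` comes from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints a version string if the package is installed, otherwise None.
    print(installed_version("global-names"))
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to run the install step.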

Install from source

git clone https://github.com/DecisionNerd/global_names
cd global_names
pip install -r requirements.txt

Usage

Loading the Datasets

The name datasets are stored as CSV files and can be loaded with pandas. The example below loads the last names for one country and prints the first few rows.

import pandas as pd

# Load last names from a specific country
last_names_df = pd.read_csv('data/last_names/usa.csv')

# Display the first few entries
print(last_names_df.head())
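The path in the example above suggests one CSV file per country. Under that assumption, it can help to check which country files are present before loading one. The following is a minimal sketch using only the standard library; `available_countries` is a hypothetical helper, not part of the provided tools, and the default directory is taken from the example path.

```python
from pathlib import Path
from typing import List


def available_countries(data_dir: str = "data/last_names") -> List[str]:
    """List country codes that have a CSV file in data_dir (assumed layout)."""
    directory = Path(data_dir)
    if not directory.is_dir():
        # No dataset directory at this location, so nothing is available.
        return []
    # Each file is assumed to be named <country>.csv, e.g. usa.csv -> "usa".
    return sorted(p.stem for p in directory.glob("*.csv"))


if __name__ == "__main__":
    print(available_countries())
```

Any code returned by this helper can then be substituted into the `pd.read_csv` path shown above.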

Searching for a Name

The provided tools can also search the datasets for a specific name.

from tools import name_search

# Search for a specific last name
results = name_search.search_last_name('Smith', 'usa')
print(results)
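The internals and return type of `name_search.search_last_name` are not documented here, so the following is only a sketch of what a case-insensitive exact-match search over CSV rows could look like. The function `search_names`, the column name `name`, and the sample rows are all assumptions made for illustration; they do not describe the package's actual implementation or data.

```python
import csv
import io
from typing import Dict, Iterable, List


def search_names(rows: Iterable[Dict[str, str]], query: str,
                 column: str = "name") -> List[Dict[str, str]]:
    """Return rows whose `column` equals `query`, ignoring case and padding."""
    needle = query.strip().casefold()
    return [
        row for row in rows
        if (row.get(column) or "").strip().casefold() == needle
    ]


if __name__ == "__main__":
    # Made-up sample rows standing in for a real dataset file.
    sample = io.StringIO("name\nSmith\nJohnson\nSMITH\n")
    print(search_names(csv.DictReader(sample), "smith"))
```

`str.casefold` is used instead of `str.lower` because it handles more non-ASCII case pairs, which matters for names from many languages.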

Generating Random Names

The datasets can also be used to generate random names, for example to build test fixtures or to anonymize data.

from tools import name_generator

# Generate a random full name
random_name = name_generator.generate_random_name('usa')
print(random_name)
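For tests, it is often useful for generated names to be reproducible. Whether `name_generator` accepts a seed is not stated here, so the sketch below shows the general idea independently of the package: `random_full_name` is a hypothetical helper, and the name lists are placeholders rather than values from the datasets.

```python
import random
from typing import Optional, Sequence


def random_full_name(first_names: Sequence[str], last_names: Sequence[str],
                     rng: Optional[random.Random] = None) -> str:
    """Pick one first name and one last name and join them with a space."""
    if rng is None:
        # No generator supplied: use a fresh, unseeded one.
        rng = random.Random()
    return f"{rng.choice(first_names)} {rng.choice(last_names)}"


if __name__ == "__main__":
    firsts = ["Ada", "Grace"]      # placeholder values
    lasts = ["Smith", "Jones"]     # placeholder values
    # Two generators with the same seed produce the same name.
    print(random_full_name(firsts, lasts, random.Random(42)))
    print(random_full_name(firsts, lasts, random.Random(42)))
```

Passing a dedicated `random.Random` instance keeps the seeding local to the call, so other code that relies on the global `random` state is not affected.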

Contributing

We welcome contributions from the community! If you would like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bugfix.
  3. Make your changes.
  4. Submit a pull request with a detailed description of your changes.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

A big thank you to Matthew Hager (smashew) for his original work on the text databases of last names. This project builds upon the foundation he created.

Thank you also to Philippe Rémy (philipperemy) for posting the username dataset. While Philippe's work is not integrated into this project, his data preparation has greatly accelerated our work on NamesData.

Contact

For questions or suggestions, open an issue, start a discussion thread, or contact us at hello@frontieranalytica.com.
