

data_genix

A robust and simple library for generating synthetic datasets for machine learning and deep learning projects. It spares you the hassle of downloading and managing data files when testing and prototyping.

Installation

Clone the repository and install using pip:

git clone https://github.com/yourusername/data_genix.git
cd data_genix
pip install .

Quick Start

Generate a pandas DataFrame containing a variety of data types in a single function call.

from data_genix import DataGenerator

# Initialize the generator
generator = DataGenerator()

# Generate a dataset with 1000 rows
df = generator.generate(
    num_rows=1000,
    numerical_whole=3,
    decimal=2,
    categorical=2,
    ordinal=1,
    boolean=1,
    datetime=1,
    text=1,
    uuid=1,
    object_types=['name', 'country', 'email', 'job']
)

print(df.head())
df.info()  # info() prints its summary itself and returns None, so no print() is needed

Features

  • Numerical Data: Generate columns of whole numbers (integers) or decimals (floats).
  • Categorical Data: Generate columns with a predefined set of unordered categories.
  • Ordinal Data: Generate columns with a predefined set of ordered categories.
  • Boolean Data: Generate columns of True/False values.
  • Datetime Data: Generate columns with datetime objects.
  • Text Data: Generate columns with random sentences.
  • ID Data: Generate columns with unique identifiers (UUIDs).
  • Coordinates: Generate paired latitude and longitude columns.
  • Web Data: Generate columns for IP addresses, URLs, and phone numbers.
  • Nested Data: Generate columns containing JSON-formatted strings.
  • Object/Text Data: Use the Faker library to generate realistic text values such as names, addresses, and emails.
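To make the column kinds above concrete, here is a minimal sketch that builds rows of such values using only the Python standard library. It is not data_genix's implementation: the function `make_rows`, the column names, the category lists, and the value ranges are all hypothetical choices made for illustration.

```python
import json
import random
import uuid
from datetime import datetime, timedelta
from ipaddress import IPv4Address

# Hypothetical value pools; data_genix's real categories and vocabularies may differ.
COLORS = ["red", "green", "blue"]        # unordered categories (categorical)
SIZES = ["small", "medium", "large"]     # ordered categories (ordinal), lowest first
WORDS = ["data", "model", "train", "test", "feature", "label"]


def make_rows(num_rows: int, seed: int = 0) -> list[dict]:
    """Return num_rows dicts, one key per illustrative column type (hypothetical helper)."""
    rng = random.Random(seed)  # private seeded RNG, so the output is reproducible
    start = datetime(2020, 1, 1)
    rows = []
    for _ in range(num_rows):
        size = rng.choice(SIZES)
        rows.append({
            "whole": rng.randint(0, 100),                   # integer
            "decimal": round(rng.uniform(0.0, 1.0), 4),      # float
            "color": rng.choice(COLORS),                     # categorical
            "size": size,                                    # ordinal label
            "size_rank": SIZES.index(size),                  # its position in the order
            "flag": rng.random() < 0.5,                      # boolean
            "timestamp": start + timedelta(seconds=rng.randrange(365 * 24 * 3600)),  # datetime
            "text": " ".join(rng.choices(WORDS, k=5)).capitalize() + ".",  # random sentence
            # Version-4 UUID built from 128 random bits; unique with overwhelming probability.
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "latitude": round(rng.uniform(-90.0, 90.0), 6),  # coordinate pair
            "longitude": round(rng.uniform(-180.0, 180.0), 6),
            "ip": str(IPv4Address(rng.getrandbits(32))),     # IP address
            "url": f"https://example.com/items/{rng.randint(1, 9999)}",  # URL
            # Nested data stored as a JSON-formatted string.
            "nested": json.dumps({"score": rng.randint(1, 5), "tags": rng.sample(COLORS, 2)}),
        })
    return rows
```

With pandas installed, `pandas.DataFrame(make_rows(1000))` turns these rows into a DataFrame. Keeping a separate `size_rank` column is one simple way to preserve the order of an ordinal feature alongside its labels.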

Supported object_types

You can use any standard Faker provider method name as a string in the object_types list. Common examples include:

  • name
  • email
  • address
  • country
  • city
  • job
  • text
  • datetime
  • phone_number
  • company
  • url
  • credit_card_number
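Since each entry in object_types is a Faker method name, one plausible way to turn a name into a column is to look the method up with `getattr` and call it once per row. This is a hedged sketch, not data_genix's code; `faker_column` is a hypothetical helper, and it accepts any Faker-like object so that it runs even without Faker installed.

```python
from typing import Any


def faker_column(fake: Any, method_name: str, num_rows: int) -> list:
    """Call the provider method named `method_name` num_rows times (hypothetical helper)."""
    method = getattr(fake, method_name, None)
    if not callable(method):
        raise ValueError(f"{method_name!r} is not a method on the given Faker instance")
    return [method() for _ in range(num_rows)]


# Usage with a real Faker instance (requires `pip install Faker`):
#   from faker import Faker
#   Faker.seed(0)  # seed Faker's shared random generator for reproducible values
#   fake = Faker()
#   emails = faker_column(fake, "email", 5)
```

Failing early on an unknown name gives a clearer error than a failure partway through building a large table.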

Download files


Source Distribution

datagenix-0.1.1.tar.gz (5.1 kB)

Uploaded Source

Built Distribution


datagenix-0.1.1-py3-none-any.whl (5.4 kB)

Uploaded Python 3

File details

Details for the file datagenix-0.1.1.tar.gz.

File metadata

  • Download URL: datagenix-0.1.1.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for datagenix-0.1.1.tar.gz
  • SHA256: c2639b9e27aa176da2aecf37f0cb1c9be1927ce7c62e85ae43dbe80128974287
  • MD5: 8b43d178c1b4d550aad5f45897739fd2
  • BLAKE2b-256: 1629626c914664356793e25b0b37fc3f63a167e349660403dcf271e97fd193b7


File details

Details for the file datagenix-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: datagenix-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for datagenix-0.1.1-py3-none-any.whl
  • SHA256: f5c6a60d9ce17ca6d0b3a3ef67228ddeb70650e7a49c2094898dd3783fab67ce
  • MD5: 3b68b64ee8c4e640cb6d646385f8f6b1
  • BLAKE2b-256: b2a836c1260c51ad1417cf59fc4d248221ed176e457245b39b44aad6951abfa7
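To check a downloaded file against the SHA256 digests listed above, a short standard-library script is enough. This is a sketch: `sha256_of`, `verify_download`, and the `EXPECTED` mapping are names chosen here for illustration, and the digests are copied from this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests published on this page for datagenix 0.1.1.
EXPECTED = {
    "datagenix-0.1.1.tar.gz": "c2639b9e27aa176da2aecf37f0cb1c9be1927ce7c62e85ae43dbe80128974287",
    "datagenix-0.1.1-py3-none-any.whl": "f5c6a60d9ce17ca6d0b3a3ef67228ddeb70650e7a49c2094898dd3783fab67ce",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example:
#   name = "datagenix-0.1.1-py3-none-any.whl"
#   print(verify_download(Path(name), EXPECTED[name]))
```

pip can also enforce these digests itself in hash-checking mode (`pip install --require-hashes -r requirements.txt`, with the requirement pinned as `datagenix==0.1.1 --hash=sha256:<digest>`).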

