A robust and simple library for generating synthetic datasets for ML/DL projects.

Project description

data_genix

A robust and simple library for generating synthetic datasets for machine learning and deep learning projects. Avoid the hassle of downloading and managing data files for testing and prototyping.

Installation

Clone the repository and install using pip:

```bash
git clone https://github.com/yourusername/data_genix.git
cd data_genix
pip install .
```

Quick Start

Generate a DataFrame containing many different data types in a single function call.

```python
from data_genix import DataGenerator

# Initialize the generator
generator = DataGenerator()

# Generate a dataset with 1000 rows
df = generator.generate(
    num_rows=1000,
    numerical_whole=3,
    decimal=2,
    categorical=2,
    ordinal=1,
    boolean=1,
    datetime=1,
    text=1,
    uuid=1,
    object_types=['name', 'country', 'email', 'job']
)

print(df.head())
df.info()  # info() prints its summary itself and returns None, so it needs no print()
```

Features

  • Numerical Data: Generate columns of whole numbers (integers) or decimals (floats).
  • Categorical Data: Generate columns with a predefined set of unordered categories.
  • Ordinal Data: Generate columns with a predefined set of ordered categories.
  • Boolean Data: Generate columns of True/False values.
  • Datetime Data: Generate columns with datetime objects.
  • Text Data: Generate columns with random sentences.
  • ID Data: Generate columns with unique identifiers (UUIDs).
  • Coordinates: Generate paired latitude and longitude columns.
  • Web Data: Generate columns for IP addresses, URLs, and phone numbers.
  • Nested Data: Generate columns containing JSON-formatted strings.
  • Object/Text Data: Use the Faker library to generate realistic text data such as names, addresses, email addresses, and more.
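The column types above can be illustrated with nothing but the Python standard library. The sketch below is not data_genix's implementation: the value ranges, the category lists, and the helper name `make_row` are assumptions chosen for illustration. Each call builds one row holding one value of each kind of column described above, and a seeded `random.Random` instance makes the output reproducible.

```python
import ipaddress
import json
import random
import uuid
from datetime import datetime, timedelta

# Hypothetical category lists; data_genix may use entirely different values.
CATEGORIES = ["red", "green", "blue"]   # unordered (categorical)
LEVELS = ["low", "medium", "high"]      # ordered (ordinal): list position is the rank
WORDS = ["data", "model", "test", "sample", "value"]
START = datetime(2020, 1, 1)


def make_row(rng: random.Random) -> dict:
    """Build one synthetic row holding one value of each column type."""
    return {
        "whole": rng.randint(0, 1000),                     # integer column
        "decimal": round(rng.uniform(0.0, 100.0), 2),      # float column
        "category": rng.choice(CATEGORIES),
        "level": rng.choice(LEVELS),
        "flag": rng.random() < 0.5,                        # boolean column
        "timestamp": START + timedelta(seconds=rng.randint(0, 365 * 24 * 3600)),
        "sentence": " ".join(rng.choices(WORDS, k=5)).capitalize() + ".",
        "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),  # seeded UUID4
        "latitude": round(rng.uniform(-90.0, 90.0), 6),    # paired coordinates
        "longitude": round(rng.uniform(-180.0, 180.0), 6),
        "ip": str(ipaddress.IPv4Address(rng.getrandbits(32))),
        "url": f"https://example.com/{rng.choice(WORDS)}",
        "nested": json.dumps({"score": rng.randint(0, 10), "tags": rng.sample(WORDS, 2)}),
    }


rng = random.Random(42)  # fixed seed: the same rows on every run
rows = [make_row(rng) for _ in range(1000)]

# Ordinal values can be ranked by their position in LEVELS.
ranked = sorted(rows, key=lambda row: LEVELS.index(row["level"]))
```

Ordinal columns differ from categorical ones only in that their categories carry an order; here that order lives in the `LEVELS` list, so sorting by `LEVELS.index(...)` ranks rows from "low" to "high". Using `uuid.UUID(int=..., version=4)` with the seeded generator, rather than `uuid.uuid4()`, keeps the ID column reproducible too.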

Supported object_types

You can use any standard Faker provider method name as a string in the object_types list. Common examples include:

  • name
  • email
  • address
  • country
  • city
  • job
  • text
  • datetime
  • phone_number
  • company
  • url
  • credit_card_number
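Because object_types takes Faker method names as strings, each name has to be resolved to a method at run time. The sketch below shows one common way to do that, `getattr` dispatch; whether data_genix resolves names this way is an assumption. To stay runnable without Faker installed, it uses a hypothetical `StandInProvider` class with a few fixed-value methods in place of a real `Faker()` instance, and `build_object_columns` is likewise a hypothetical helper.

```python
class StandInProvider:
    """Hypothetical stand-in for a Faker instance: each public method returns a fake value."""

    def name(self) -> str:
        return "Ada Lovelace"

    def country(self) -> str:
        return "United Kingdom"

    def email(self) -> str:
        return "ada@example.com"

    def job(self) -> str:
        return "Mathematician"


def build_object_columns(provider, object_types, num_rows):
    """Resolve each type name to a provider method and call it once per row."""
    columns = {}
    for type_name in object_types:
        method = getattr(provider, type_name, None)
        # Reject private/dunder names and anything that is not a callable method.
        if type_name.startswith("_") or not callable(method):
            raise ValueError(f"Unknown object type: {type_name!r}")
        columns[type_name] = [method() for _ in range(num_rows)]
    return columns


cols = build_object_columns(StandInProvider(), ["name", "country", "email", "job"], 3)
```

With Faker installed, passing `Faker()` (from `from faker import Faker`) instead of `StandInProvider()` should work the same way, since Faker exposes its providers as methods such as `fake.name()` and `fake.email()`. In this sketch a misspelled name raises a `ValueError` up front instead of failing partway through row generation.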

Download files

Download the file for your platform.

Source Distribution

datagenix-0.1.0.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

datagenix-0.1.0-py3-none-any.whl (5.4 kB)

Uploaded Python 3

File details

Details for the file datagenix-0.1.0.tar.gz.

File metadata

  • Download URL: datagenix-0.1.0.tar.gz
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for datagenix-0.1.0.tar.gz
  • SHA256: cda42e1eb722ad2636f9a95417f96fa91f95ea75301973ba4da3ed665278f619
  • MD5: d8d735b5636eb527754cdd2da8f43f8d
  • BLAKE2b-256: 91ca87187cb4aebe436e0f428c29c29c23079c54a6c6be07a9eef22c374939cd
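The SHA256 digest published above can be used to check that a downloaded file is intact. A minimal sketch using only `hashlib` follows; `sha256_of` and `verify` are hypothetical helper names, and the example path in the final comment assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for datagenix-0.1.0.tar.gz.
EXPECTED_SHA256 = "cda42e1eb722ad2636f9a95417f96fa91f95ea75301973ba4da3ed665278f619"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 hex digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Example: verify(Path("datagenix-0.1.0.tar.gz"), EXPECTED_SHA256)
```

pip can enforce the same check by itself through its hash-checking mode: list the package with `--hash=sha256:...` in a requirements file and install with `--require-hashes`.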

File details

Details for the file datagenix-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: datagenix-0.1.0-py3-none-any.whl
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for datagenix-0.1.0-py3-none-any.whl
  • SHA256: 7563f48fae827c411bfad5be6a3ef8f1df47edf148f2b292cbd4248ae6bca357
  • MD5: f3aa815fa2770d4869c00ce7aa823e10
  • BLAKE2b-256: b97211a5c9da1f966e9170acd03246cf8d0f40c6e8a7e53a2ba86b4eacb5eefe
