DataGen is a library for generating test data.

DataGen-kuma

DataGen-kuma is a library for generating test data.
Given a pandas DataFrame, it generates new data that has the same schema and resembles the original values.

How It Works

DataGen-kuma takes a DataFrame as input and generates random test data.
Internally, it computes statistical metrics suited to each data type.
It then uses these metrics to produce similar data appropriate for that type.
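The per-type step can be pictured as assigning each column a coarse type label before any metrics are computed. The sketch below is only an illustration under assumptions: classify_column is a hypothetical helper, not part of DataGen-kuma, and the library's actual classification rules are not documented here and may differ (for example, it may detect ISO-8601 strings rather than relying on dtypes).

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_column(series: pd.Series) -> str:
    """Return a coarse type label for one column (hypothetical helper)."""
    # Check bool before numeric, because pandas counts bool as numeric.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "category"
    # Anything else falls into the catch-all group.
    return "etc"


# Small illustrative frame with one column per label.
df = pd.DataFrame(
    {
        "price": [1.5, 2.0, 3.25],
        "flag": [True, False, True],
        "when": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "color": pd.Categorical(["r", "g", "r"]),
        "note": ["a", "b", "c"],
    }
)
labels = {name: classify_column(df[name]) for name in df.columns}
print(labels)
```

Checking the boolean case first matters in this sketch because a bool column would otherwise be labeled numeric.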

Data Classification and Generation

  • Numeric: Numeric data. Random values are generated with Kernel Density Estimation (KDE), using gaussian_kde from scipy.stats as the density estimator.
  • Category: Categorical data. The frequency of each value is measured, and values are generated according to these frequencies.
  • Datetime: Date data in ISO-8601 format. Values are converted to pandas Timestamps, and random values are generated within the date range of the input data.
  • Boolean: Boolean data. The frequency of each value is measured, and values are generated according to these frequencies.
  • ETC: All other data types. Data is generated by randomly sampling from the given values with replacement.
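The strategies listed above can be sketched with NumPy, pandas, and SciPy. This is a minimal illustration, not DataGen-kuma's own code: the function names, the fixed seed, and the choice of a uniform draw for datetimes are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Fixed seed only so the sketch is repeatable (an assumption, not library behavior).
rng = np.random.default_rng(0)


def sample_numeric(values: pd.Series, count: int) -> np.ndarray:
    """Fit a Gaussian KDE to the observed values and draw new ones."""
    kde = gaussian_kde(values.to_numpy(dtype=float))
    # resample returns shape (1, count) for one-dimensional data.
    return kde.resample(count, seed=rng)[0]


def sample_by_frequency(values: pd.Series, count: int) -> np.ndarray:
    """Draw values with probabilities equal to their observed frequencies."""
    freqs = values.value_counts(normalize=True)
    return rng.choice(freqs.index.to_numpy(), size=count, p=freqs.to_numpy())


def sample_datetime(values: pd.Series, count: int) -> pd.DatetimeIndex:
    """Draw timestamps uniformly between the earliest and latest value."""
    stamps = pd.to_datetime(values)
    # Timestamp.value is the number of nanoseconds since the Unix epoch.
    start, end = stamps.min().value, stamps.max().value
    return pd.to_datetime(rng.integers(start, end, size=count, endpoint=True))


def sample_with_replacement(values: pd.Series, count: int) -> np.ndarray:
    """Pick from the observed values uniformly, allowing repeats."""
    return rng.choice(values.to_numpy(), size=count, replace=True)


numeric = sample_numeric(pd.Series([1.0, 2.0, 2.5, 4.0, 5.5]), 50)
categories = sample_by_frequency(pd.Series(["a", "a", "b"]), 20)
booleans = sample_by_frequency(pd.Series([True, True, False]), 20)
dates = sample_datetime(pd.Series(["2024-01-01", "2024-01-31"]), 10)
others = sample_with_replacement(pd.Series(["x", "y", "z"]), 5)
```

Note that KDE samples can fall slightly outside the observed minimum and maximum, since the estimated density has tails beyond the data. Category and Boolean columns share one function here because both are described as frequency-based.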

Usage

The following example assumes you have a pandas DataFrame named df.
It generates 100,000 rows of data.
Iterating over the generated object gives access to each row.

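from datagen_kuma.datagen import DataGen

# df is an existing pandas DataFrame used as the template.
datagen = DataGen(df=df, count=100_000)

# Each iteration yields an index and the corresponding generated row.
for idx, row in datagen:
    print(idx, row)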

To retrieve the generated DataFrame, use the following:

generated_df = datagen.dataframe


