
DataGen is a library for generating test data.

DataGen-kuma

DataGen-kuma is a library for generating test data.
Given a pandas DataFrame, it generates new data that has the same schema and resembles the original values.

How It Works

DataGen-kuma takes a DataFrame as input and generates random test data from it.
Internally, it first computes statistical metrics for each column according to the column's data type.
It then uses these metrics to produce similar values in a way that suits each data type.
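The first step, deciding which data type a column belongs to, can be sketched with pandas' own dtype checks. This is only an illustration: classify_column and its labels are hypothetical names, and the source does not state how DataGen-kuma actually classifies columns.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_column(series: pd.Series) -> str:
    """Return a coarse type label for one column (hypothetical helper)."""
    # Check bool before numeric: pandas counts bool dtype as numeric.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "category"
    # Everything else (for example free-form strings) falls through.
    return "etc"


df = pd.DataFrame({
    "price": [9.5, 12.0, 7.25],
    "in_stock": [True, False, True],
    "added": pd.to_datetime(["2024-01-01", "2024-02-15", "2024-03-30"]),
    "size": pd.Categorical(["S", "M", "S"]),
    "note": ["a", "b", "c"],
})
labels = {name: classify_column(df[name]) for name in df.columns}
```

With a label per column, a generator can then choose one sampling strategy for each column.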

Data Classification and Generation

  • Numeric: Numeric data. Fits a Kernel Density Estimation (KDE) model to the observed values using gaussian_kde from scipy.stats and draws random values from it.
  • Category: Categorical data. Measures the frequency of each value and generates values in proportion to these frequencies.
  • Datetime: Date data in ISO 8601 format. Converts the values to pandas Timestamps and generates random values within the date range of the given data.
  • Boolean: Boolean data. Measures the frequency of each value and generates values in proportion to these frequencies.
  • ETC: Any data type not listed above. Generates data by randomly sampling from the given values with replacement.
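The strategies in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function names are hypothetical, and the uniform draw between the earliest and latest timestamp is one plausible reading of "within the date range".

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(seed=0)


def sample_numeric(values: pd.Series, n: int) -> np.ndarray:
    """Fit a Gaussian KDE to the observed values and draw n new ones."""
    kde = gaussian_kde(values.to_numpy(dtype=float))
    # resample returns shape (1, n) for one-dimensional data.
    return kde.resample(n, seed=rng)[0]


def sample_by_frequency(values: pd.Series, n: int) -> np.ndarray:
    """Draw n values with the observed relative frequencies (category, boolean)."""
    freqs = values.value_counts(normalize=True)
    return rng.choice(freqs.index.to_numpy(), size=n, p=freqs.to_numpy())


def sample_datetime(values: pd.Series, n: int) -> pd.DatetimeIndex:
    """Draw n timestamps uniformly between the earliest and latest input."""
    stamps = pd.to_datetime(values)
    lo, hi = stamps.min().value, stamps.max().value  # nanoseconds since epoch
    return pd.to_datetime(rng.integers(lo, hi, size=n, endpoint=True))


def sample_etc(values: pd.Series, n: int) -> np.ndarray:
    """Draw n of the given values at random, with replacement."""
    return rng.choice(values.to_numpy(), size=n, replace=True)


numbers = sample_numeric(pd.Series([1.0, 2.0, 2.5, 3.0, 10.0]), 50)
flags = sample_by_frequency(pd.Series([True, True, False]), 20)
dates = sample_datetime(pd.Series(["2024-01-01", "2024-12-31"]), 30)
others = sample_etc(pd.Series(["x", "y", "z"]), 10)
```

Because a Gaussian KDE smooths the observed distribution, sample_numeric can return values slightly outside the input's minimum and maximum, whereas the frequency-based and resampling strategies only ever return values that occur in the input.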

Usage

The example below assumes you already have a pandas DataFrame named df.
It generates 100,000 rows of data.
Iterating over the resulting object gives access to each generated row.

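from datagen_kuma.datagen import DataGen

# df is an existing pandas DataFrame; count is the number of rows to generate.
datagen = DataGen(df=df, count=100_000)

# Each iteration yields an index and the corresponding generated row.
for idx, row in datagen:
    print(idx, row)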

To retrieve the generated DataFrame, use the following:

generated_df = datagen.dataframe
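Since the generated data is meant to share the input's schema, a quick sanity check is to compare column names and dtypes of the two frames. The helper below is a hypothetical sketch and not part of DataGen-kuma; the frames used here are stand-ins for df and generated_df. The check is strict, and the source does not state whether dtypes are preserved exactly.

```python
import pandas as pd


def same_schema(original: pd.DataFrame, generated: pd.DataFrame) -> bool:
    """True when both frames have the same column names, order, and dtypes."""
    return (
        list(original.columns) == list(generated.columns)
        and list(original.dtypes) == list(generated.dtypes)
    )


# Stand-in frames; in practice these would be df and datagen.dataframe.
original = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
matching = pd.DataFrame({"x": [5, 6, 7], "y": ["c", "d", "e"]})
different = pd.DataFrame({"x": [1.5], "y": ["a"]})

ok = same_schema(original, matching)
mismatch = same_schema(original, different)
```

Row counts are deliberately ignored, because the generated frame usually has a different number of rows than the input.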


