DataGen-kuma
DataGen-kuma is a library for generating test data.
It creates similar data with the same schema based on a Pandas DataFrame.
How It Works
DataGen-kuma takes a DataFrame as input and generates random test data.
Internally, it computes statistical metrics for each data type and uses them to drive data generation.
Using these metrics, it produces similar data appropriate for each data type.
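The metric-gathering step described above can be sketched as follows. This is an illustration only, not DataGen-kuma's actual code: the helper names `classify_column` and `column_metrics`, the kind labels, and the rule that only the pandas `category` dtype counts as categorical are all assumptions.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_column(series: pd.Series) -> str:
    """Return a coarse kind label for a column (hypothetical helper)."""
    # Check bool before numeric because pandas counts bool as a numeric dtype.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "category"
    return "etc"


def column_metrics(series: pd.Series) -> dict:
    """Collect the metrics a generator for this kind would need (assumed shape)."""
    kind = classify_column(series)
    if kind in ("category", "boolean"):
        # Relative frequency of each distinct value.
        freqs = series.value_counts(normalize=True).to_dict()
        return {"kind": kind, "frequencies": freqs}
    if kind == "datetime":
        # Only the observed range is needed for range-based sampling.
        return {"kind": kind, "min": series.min(), "max": series.max()}
    # Numeric and other columns keep their observed (non-null) values.
    return {"kind": kind, "values": series.dropna().to_numpy()}
```

Calling `column_metrics(df[name])` for each column name would give one metrics dictionary per column, keyed by whatever kind the column was assigned.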
Data Classification and Generation
- Numeric: Numeric data. Generates random values using kernel density estimation (KDE), with the density estimated by gaussian_kde from scipy.stats.
- Category: Categorical data. Measures the frequency of each value and generates values according to these frequencies.
- Datetime: Date data following the ISO-8601 standard. Converts to Pandas Timestamps and generates random values within the given date range.
- Boolean: Boolean data. Measures the frequency of each value and generates values according to these frequencies.
- ETC: All other data types not mentioned above. Generates data by randomly sampling from the given values with replacement.
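The per-type strategies in the list above can be sketched with NumPy, pandas, and SciPy. The function names and the shared `rng` are hypothetical, and the library's real implementation may differ in details such as null handling or how the datetime range is sampled.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)  # shared random generator (assumption)


def generate_numeric(values: pd.Series, count: int) -> np.ndarray:
    """Fit a Gaussian KDE to the observed values and draw new samples from it."""
    # gaussian_kde needs at least two distinct values to estimate a bandwidth.
    kde = gaussian_kde(values.dropna().to_numpy(dtype=float))
    # For 1-D input, resample returns an array of shape (1, count).
    return kde.resample(count, rng)[0]


def generate_by_frequency(values: pd.Series, count: int) -> np.ndarray:
    """Category/Boolean: draw values in proportion to their observed frequency."""
    freq = values.value_counts(normalize=True)
    return rng.choice(freq.index.to_numpy(), size=count, p=freq.to_numpy())


def generate_datetime(values: pd.Series, count: int) -> pd.DatetimeIndex:
    """Draw timestamps uniformly between the earliest and latest observed date."""
    ts = pd.to_datetime(values)
    # Timestamp.value is the number of nanoseconds since the Unix epoch.
    lo, hi = ts.min().value, ts.max().value
    return pd.to_datetime(rng.integers(lo, hi, size=count, endpoint=True))


def generate_etc(values: pd.Series, count: int) -> np.ndarray:
    """Everything else: sample the observed values with replacement."""
    return values.sample(n=count, replace=True, random_state=rng).to_numpy()
```

Because KDE smooths the observed distribution, generated numeric values can fall slightly outside the original minimum and maximum. The other three strategies only return values that are observed or lie within the observed range.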
Usage
The following assumes you have a Pandas DataFrame named df.
This example generates 100,000 rows of data.
The generated object allows access to each row through iteration.
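    from datagen_kuma.datagen import DataGen

    # Generate 100,000 rows modeled on df.
    datagen = DataGen(df=df, count=100_000)

    # Iterating yields an (index, row) pair for each generated row.
    for idx, row in datagen:
        print(idx, row)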
To retrieve the generated DataFrame, use the following:
    generated_df = datagen.dataframe
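To try the usage above without an existing dataset, a small input DataFrame could be assembled like this. The column names and values are invented for illustration, and the dtype chosen for each column is an assumption about what the library would classify as Numeric, Category, Datetime, Boolean, and ETC.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200  # number of source rows (arbitrary)

# One column per kind described above; all names are hypothetical.
df = pd.DataFrame({
    "price": rng.normal(100.0, 15.0, size=n),                  # numeric
    "grade": pd.Categorical(
        rng.choice(["A", "B", "C"], size=n, p=[0.5, 0.3, 0.2])
    ),                                                         # category
    "created_at": pd.date_range("2024-01-01", periods=n, freq="D"),  # datetime
    "is_active": rng.random(n) < 0.7,                          # boolean
    "note": rng.choice(["ok", "retry", "n/a"], size=n),        # other (strings)
})
```

The resulting `df` can then be passed as the `df` argument shown in the usage above.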
Download files
File details
Details for the file datagen_kuma-0.0.1.tar.gz.
File metadata
- Download URL: datagen_kuma-0.0.1.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | f0ec54bac8a175b7ddf847a3196048645e63841b591c3f163048b666503ecfcc
MD5 | a0827cb883a3cec5c93a6ceee244b333
BLAKE2b-256 | ec49f80e76cbd677e5b9be6267681c2fa072f31e0907dc73dbe7abc19238f68a
File details
Details for the file datagen_kuma-0.0.1-py3-none-any.whl.
File metadata
- Download URL: datagen_kuma-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5594b6a98af0170462e3952ce0538613bff5fef9b4e686b90d66eb76b2c09385
MD5 | 8b1c75868caa06b7a685bf4531fa810e
BLAKE2b-256 | 88e515e7a66ffde9c5850ff9451483cc9304c09d287eccbdea8c95cd4952ca95