Analytic generation of datasets with specified statistical characteristics.

Project description

Overview

AutoGen (analyticsdf) is a Python library for generating synthetic data with the statistical characteristics you specify.

Features

The library lets you specify and generate a wide range of datasets with chosen statistical characteristics. A specification covers both the predictor matrix and the response vector.

Some common configurations:

  • High correlation and multi-collinearity among predictor variables
  • Interaction effects between variables
  • Skewed distributions of predictor and response variables
  • Nonlinear relationships between predictor and response variables
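
To make these configurations concrete, here is a minimal standard-library sketch of what such data can look like. It does not use analyticsdf and is not the library's implementation; the helper names (`pearson`, `make_correlated_pair`) and the chosen coefficients are assumptions made for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def make_correlated_pair(n, rho, rng):
    """Draw n pairs of standard normals whose expected correlation is rho."""
    x1, x2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1.append(z1)
        # Mixing z1 into the second variable induces the target correlation.
        x2.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x1, x2


rng = random.Random(0)
n = 5000

# Highly correlated predictors (multi-collinearity).
x1, x2 = make_correlated_pair(n, 0.9, rng)

# Skewed predictor: exponentiating a normal draw gives a right-skewed log-normal.
x3 = [math.exp(rng.gauss(0, 1)) for _ in range(n)]

# Response with an interaction term (x1 * x2), a nonlinear term (x3 ** 2), and noise.
y = [a * b + 0.5 * c ** 2 + rng.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]

sample_rho = pearson(x1, x2)
print(f"sample correlation between x1 and x2: {sample_rho:.3f}")
```

The sample correlation should land close to the 0.9 target, and the mean of `x3` should exceed its median, which is the usual signature of right skew.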

Check the Analyticsdf documentation for more details.

Install

The beta package of this library is publicly available on both PyPI and Anaconda. Install analyticsdf using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.

pip install analyticsdf
conda install -c faye-yufan analyticsdf
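
After installing, one way to confirm that the package is visible to the current environment is to query its metadata from Python. This sketch uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is an assumption for illustration and is not part of analyticsdf.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("analyticsdf")
if found is None:
    print("analyticsdf is not installed in this environment")
else:
    print(f"analyticsdf {found} is installed")
```

If the check reports that the package is missing even though the install succeeded, the interpreter is probably running outside the virtual environment where the package was installed.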

Getting Started

Import the dataset generation class from the package and try out its methods.

from analyticsdf.analyticsdataframe import AnalyticsDataframe
# 1000 observations and 6 predictors (X1 through X6)
ad = AnalyticsDataframe(1000, 6)
ad.predictor_matrix.head()

[Figure: Initialized Predictor Matrix]

The predictor matrix is initialized with all null values. Now let's update the predictors with some distributions:

for var in ['X1', 'X2', 'X3', 'X4', 'X5']:
    ad.update_predictor_uniform(var, 0, 100)
ad.update_predictor_categorical('X6', ["Red", "Yellow", "Blue"], [0.3, 0.4, 0.3])

[Figure: Updated Predictor Matrix]
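
For intuition about what the calls above request, the following standard-library sketch fills five columns with uniform draws on [0, 100] and one column with weighted categorical draws. It is a stand-in under stated assumptions, not the library's implementation: the `columns` dictionary is a hypothetical substitute for the predictor matrix, and the seed is arbitrary.

```python
import random
from collections import Counter

rng = random.Random(42)
n = 1000

# Hypothetical stand-in for the predictor matrix: one list per column.
columns = {}

# Five numeric predictors, each drawn uniformly from [0, 100].
for name in ["X1", "X2", "X3", "X4", "X5"]:
    columns[name] = [rng.uniform(0, 100) for _ in range(n)]

# One categorical predictor with levels drawn at the given probabilities.
levels = ["Red", "Yellow", "Blue"]
probabilities = [0.3, 0.4, 0.3]
columns["X6"] = rng.choices(levels, weights=probabilities, k=n)

# Observed share of each level; these should sit near the requested probabilities.
shares = {level: count / n for level, count in Counter(columns["X6"]).items()}
print(shares)
```

With 1000 rows the observed shares will differ slightly from 0.3, 0.4, and 0.3 because of sampling noise; larger samples track the requested probabilities more closely.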

Once the dataframe looks the way we want, we can visualize it:

df_visualization_bi(ad)

[Figure: Bivariate Visualization Chart]

Next Steps

We plan to add a user interface to the library so that users can configure, manipulate, and view datasets more easily.

License

AutoGen is released under the MIT License.

Download files

Source Distribution

analyticsdf-0.0.8.3.tar.gz (10.5 kB)

Uploaded Source

File details

Details for the file analyticsdf-0.0.8.3.tar.gz.

File metadata

  • Download URL: analyticsdf-0.0.8.3.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.5

File hashes

Hashes for analyticsdf-0.0.8.3.tar.gz
Algorithm     Hash digest
SHA256        e4a58df7a2657b151c71c65d8d9d0869823f540b2c4cc405924803b5dfdf6429
MD5           6c69983eec9578b17651f143f7ae2472
BLAKE2b-256   56cd248c9cf566818c81678122477394daee3d95893bd3afd5175ce67bdbdd7e
