Synthetic Data Generation and Evaluation

About

  • SDGnE (Synthetic Data Generation and Evaluation) is a Python package designed to generate synthetic data and evaluate its quality using neural network models.

  • This tool is intended for developers and researchers who require synthetic datasets for testing and development.

  • The current version (v1.0.0) uses autoencoders and SMOTE to generate synthetic data.

Getting Started

pip install sdgne
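
To confirm that the install succeeded, one option is to query the installed distribution metadata from Python. This is a minimal sketch that relies only on the standard library; `installed_version` is a hypothetical helper name and is not part of SDGnE.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("sdgne")
    print(f"sdgne version: {found}" if found else "sdgne is not installed")
```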

Notebooks

To get started, we have created notebooks for the Autoencoder and SMOTE algorithms.

Autoencoder

Autoencoders are a class of neural networks used for unsupervised learning of compact, lower-dimensional feature representations. An autoencoder consists of an encoder, which compresses the input into a latent encoding, and a decoder, which reconstructs the input from that encoding. We leverage this architecture to generate synthetic data.
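
The encoder/decoder idea can be illustrated with a small NumPy sketch. It is not SDGnE's implementation: the single tanh layer, the function names, the training settings, and the choice to sample by adding Gaussian noise to latent codes are all assumptions made for this example.

```python
import numpy as np


def train_autoencoder(X, latent_dim=2, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = {
        "W_enc": rng.normal(0.0, 0.1, size=(d, latent_dim)),
        "b_enc": np.zeros(latent_dim),
        "W_dec": rng.normal(0.0, 0.1, size=(latent_dim, d)),
        "b_dec": np.zeros(d),
    }
    for _ in range(epochs):
        Z = np.tanh(X @ p["W_enc"] + p["b_enc"])   # encoder: compress
        X_hat = Z @ p["W_dec"] + p["b_dec"]        # decoder: reconstruct
        err = (X_hat - X) / n                      # gradient of the loss w.r.t. X_hat
        dZ = (err @ p["W_dec"].T) * (1.0 - Z**2)   # backpropagate through tanh
        grads = {
            "W_dec": Z.T @ err,
            "b_dec": err.sum(axis=0),
            "W_enc": X.T @ dZ,
            "b_enc": dZ.sum(axis=0),
        }
        for name in p:
            p[name] -= lr * grads[name]
    return p


def reconstruction_error(p, X):
    """Half the mean squared reconstruction error per row."""
    Z = np.tanh(X @ p["W_enc"] + p["b_enc"])
    X_hat = Z @ p["W_dec"] + p["b_dec"]
    return 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))


def generate_synthetic(p, X, n_samples, noise_scale=0.1, seed=0):
    """Encode random real rows, perturb their latent codes, and decode."""
    rng = np.random.default_rng(seed)
    rows = X[rng.integers(0, len(X), size=n_samples)]
    Z = np.tanh(rows @ p["W_enc"] + p["b_enc"])
    Z_noisy = Z + rng.normal(0.0, noise_scale, size=Z.shape)
    return Z_noisy @ p["W_dec"] + p["b_dec"]
```

The sketch expects `X` to be a standardized 2-D float array. A larger `noise_scale` moves synthetic rows further from the real rows they were derived from.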

SMOTE

SMOTE (Synthetic Minority Oversampling Technique) generates synthetic samples from the original dataset, typically to oversample an under-represented class. Over the years, several variants of SMOTE have been developed, each tailored to specific scenarios and requirements. These variants differ in how they create new samples, but all aim to improve model performance by producing a more balanced class distribution. We provide a few SMOTE variants for synthetic data generation.
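
For orientation, the classic SMOTE interpolation step can be sketched in pure Python. This is an illustration of the basic technique only, not the code or the variants shipped in SDGnE; the function name `smote` and its parameters are assumptions for this example.

```python
import math
import random


def smote(minority, n_samples, k=5, seed=0):
    """Create `n_samples` points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_samples):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the chosen point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    for point in smote(minority_class, n_samples=3, k=2):
        print(point)
```

Because every synthetic point lies on a segment between two real minority points, the output stays inside the region the minority class already occupies.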

Comparison

In this notebook, we compare the Single Encoder Autoencoder and the SMOTE algorithm for synthetic data generation. We generate synthetic data with both algorithms and evaluate the results statistically.
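
The text does not say which statistics the notebook uses, so the following is only a sketch of one plausible column-wise comparison: differences in mean and standard deviation, plus a two-sample Kolmogorov-Smirnov statistic. The function names are hypothetical and are not SDGnE's evaluation API.

```python
from bisect import bisect_right
from statistics import mean, stdev


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )


def compare_columns(real, synthetic):
    """Summarise, per column, how far the synthetic rows drift from the real rows."""
    report = []
    for real_col, synth_col in zip(zip(*real), zip(*synthetic)):
        report.append({
            "mean_diff": abs(mean(real_col) - mean(synth_col)),
            "std_diff": abs(stdev(real_col) - stdev(synth_col)),
            "ks": ks_statistic(real_col, synth_col),
        })
    return report


if __name__ == "__main__":
    real_rows = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 13.0]]
    synth_rows = [[1.2, 10.5], [2.1, 11.8], [2.9, 11.2], [3.8, 12.6]]
    for column_index, stats in enumerate(compare_columns(real_rows, synth_rows)):
        print(column_index, stats)
```

Running the same report on the output of each generator gives a like-for-like comparison; smaller values indicate that the synthetic column tracks the real one more closely.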

Features

  • Data Generation: Create synthetic datasets that mimic the statistical properties of real-world data.

  • Neural Autoencoders: Utilize various autoencoder architectures to learn data representations.

  • Evaluation Metrics: Assess the quality of synthetic data using built-in evaluation metrics.

  • Extensibility: Easily extend the package with custom data generators and evaluators.
