Synthetic data generation.

Project description

Welcome to MELINDA

MELINDA is a Python library for generating synthetic tabular data. It fits generative models to your real data to learn its statistical properties, then samples new synthetic rows from the fitted model.
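
To make "learn statistical properties, then sample" concrete, here is a deliberately naive sketch that uses only the standard library. It is not how MELINDA works internally. It summarises each column on its own (mean and standard deviation for numeric columns, observed frequencies for categorical ones) and ignores every dependency between columns, which is exactly what a generative model is meant to capture. The helper names `fit_marginals` and `sample_marginals` are hypothetical and exist only for this illustration.

```python
import random
import statistics


def fit_marginals(rows, num_cols, cat_cols):
    """Summarise each column independently (hypothetical helper, not MELINDA's API)."""
    stats = {}
    for col in num_cols:
        values = [row[col] for row in rows]
        stats[col] = ("num", statistics.mean(values), statistics.pstdev(values))
    for col in cat_cols:
        counts = {}
        for row in rows:
            counts[row[col]] = counts.get(row[col], 0) + 1
        stats[col] = ("cat", list(counts), list(counts.values()))
    return stats


def sample_marginals(stats, n_samples, rng):
    """Draw synthetic rows column by column from the fitted summaries."""
    samples = []
    for _ in range(n_samples):
        row = {}
        for col, spec in stats.items():
            if spec[0] == "num":
                _, mean, std = spec
                row[col] = rng.gauss(mean, std)
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights, k=1)[0]
        samples.append(row)
    return samples


# Toy "real" data: one numeric and one categorical column.
rng = random.Random(0)
real_rows = [
    {"col_1": rng.random(), "col_3": str(rng.randrange(10))}
    for _ in range(100)
]

stats = fit_marginals(real_rows, num_cols=["col_1"], cat_cols=["col_3"])
synthetic_rows = sample_marginals(stats, n_samples=10, rng=rng)
print(synthetic_rows[0])
```

A per-column baseline like this reproduces marginal distributions only. Correlations between columns are lost, which is the gap that learned generative models aim to close.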

Installation

git clone https://github.com/hse-cs/LINDA.git
cd LINDA
pip install -e .

or, if you use Poetry, from the same directory:

poetry install

Basic usage

The following snippet builds a small random dataset to stand in for real data, wraps a CVAE model from the probaforms package in a ProbaformsSynthesizer, fits it to the data, and samples synthetic rows.

import numpy as np
import pandas as pd
from melinda.models import ProbaformsSynthesizer
from probaforms.models import CVAE

# generate an example of real data
n = 100
data_real = pd.DataFrame()
data_real['col_1'] = np.random.rand(n)
data_real['col_2'] = np.random.rand(n)
data_real['col_3'] = [str(i) for i in np.random.randint(0, 10, n)]
data_real['col_4'] = [str(i) for i in np.random.randint(0, 5, n)]

num_cols = ['col_1', 'col_2']
cat_cols = ['col_3', 'col_4']
lab_cols = None

# fit a generative model
model = CVAE(latent_dim=10, hidden=(10,), lr=0.001, n_epochs=10)
gen = ProbaformsSynthesizer(model, num_cols, cat_cols, lab_cols, cat_transform='OneHotEncoder')
gen.fit(data_real)

# sample synthetic data
data_synthetic = gen.sample(n_samples=10)
print(data_synthetic.head())  # show the first rows (also works outside a notebook)
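
After sampling, it is worth checking that the synthetic data resembles the real data. The sketch below compares simple per-column statistics using pandas only. It assumes that the sampled object is a DataFrame with the same column names as the input, which the snippet above suggests but does not state. `compare_columns` is a hypothetical helper, not part of MELINDA, and the two stand-in frames exist only so the example runs on its own.

```python
import numpy as np
import pandas as pd


def compare_columns(real, synthetic, num_cols, cat_cols):
    """Compare simple per-column statistics of two DataFrames (hypothetical helper)."""
    report = {}
    for col in num_cols:
        report[col] = {
            "real_mean": float(real[col].mean()),
            "synthetic_mean": float(synthetic[col].mean()),
        }
    for col in cat_cols:
        real_freq = real[col].astype(str).value_counts(normalize=True)
        synth_freq = synthetic[col].astype(str).value_counts(normalize=True)
        # Largest absolute gap in category frequency;
        # a category missing on one side counts as frequency 0.
        gap = real_freq.sub(synth_freq, fill_value=0.0).abs().max()
        report[col] = {"max_frequency_gap": float(gap)}
    return report


# Stand-in frames; in practice pass data_real and data_synthetic instead.
rng = np.random.default_rng(0)
real = pd.DataFrame({
    "col_1": rng.random(100),
    "col_3": rng.integers(0, 10, 100).astype(str),
})
fake = pd.DataFrame({
    "col_1": rng.random(100),
    "col_3": rng.integers(0, 10, 100).astype(str),
})

report = compare_columns(real, fake, num_cols=["col_1"], cat_cols=["col_3"])
print(report)
```

Matching means and category frequencies are a necessary check, not a sufficient one: they say nothing about relationships between columns, so treat this as a first sanity check rather than a quality metric.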
