Synthetic data generation.

Project description

Welcome to MELINDA

MELINDA is a Python library for creating synthetic tabular data. It fits generative models to your real data to learn its statistical properties, then samples synthetic data that follows those properties.

Installation

git clone https://github.com/hse-cs/LINDA.git
cd LINDA
pip install -e .

or, from the same directory, with Poetry:

poetry install

Basic usage

The following code snippet builds a small example dataset that stands in for real data, fits a generative model (a CVAE from probaforms) to it, and samples synthetic rows.

import numpy as np
import pandas as pd
from melinda.models import ProbaformsSynthesizer
from probaforms.models import CVAE

# generate an example of real data
n = 100
data_real = pd.DataFrame()
data_real['col_1'] = np.random.rand(n)
data_real['col_2'] = np.random.rand(n)
data_real['col_3'] = [str(i) for i in np.random.randint(0, 10, n)]
data_real['col_4'] = [str(i) for i in np.random.randint(0, 5, n)]

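# column roles passed to the synthesizer
num_cols = ['col_1', 'col_2']  # numerical columns
cat_cols = ['col_3', 'col_4']  # categorical columns (stored as strings)
lab_cols = None                # no label columns in this example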

# fit a generative model
model = CVAE(latent_dim=10, hidden=(10,), lr=0.001, n_epochs=10)
gen = ProbaformsSynthesizer(model, num_cols, cat_cols, lab_cols, cat_transform='OneHotEncoder')
gen.fit(data_real)

# sample synthetic data
data_synthetic = gen.sample(n_samples=10)
print(data_synthetic.head())
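
Because the goal is for the synthetic data to follow the statistical properties of the real data, a quick sanity check after sampling is to compare simple per-column statistics. The sketch below is one possible way to do that with pandas alone. The helper `compare_columns` is hypothetical and not part of MELINDA, and the two frames are random stand-ins so that the example runs without a fitted model. In practice you would pass `data_real` and `data_synthetic` from the snippet above.

```python
import numpy as np
import pandas as pd


def compare_columns(real, synthetic, num_cols, cat_cols):
    """Return simple per-column gaps between a real and a synthetic table.

    Hypothetical helper for illustration; not part of the MELINDA API.
    """
    rows = []
    for col in num_cols:
        # numerical columns: absolute difference of the column means
        rows.append({
            'column': col,
            'metric': 'abs. difference of means',
            'value': abs(real[col].mean() - synthetic[col].mean()),
        })
    for col in cat_cols:
        # categorical columns: total variation distance between the
        # category frequencies, aligned on the union of observed categories
        p = real[col].value_counts(normalize=True)
        q = synthetic[col].value_counts(normalize=True)
        p, q = p.align(q, fill_value=0.0)
        rows.append({
            'column': col,
            'metric': 'total variation distance',
            'value': 0.5 * (p - q).abs().sum(),
        })
    return pd.DataFrame(rows)


# stand-in frames so the sketch runs on its own
rng = np.random.default_rng(0)
real = pd.DataFrame({
    'col_1': rng.random(100),
    'col_3': rng.integers(0, 10, 100).astype(str),
})
fake = pd.DataFrame({
    'col_1': rng.random(100),
    'col_3': rng.integers(0, 10, 100).astype(str),
})

report = compare_columns(real, fake, num_cols=['col_1'], cat_cols=['col_3'])
print(report)
```

Values close to zero suggest the two tables agree on these marginal statistics. This check says nothing about relationships between columns, so it is a first look rather than a full evaluation.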
