
A tool for relational data generation


RelGen v0.1

RelGen


RelGen is short for Relation Generation. It is a tool for generating relational data for databases. The name is also a play on words: "Rel" is pronounced like "Real", reflecting the goal that the relational data RelGen generates should be realistic.

Overview

RelGen is a Python library for generating realistic relational data. It uses a variety of deep generative models and algorithms to learn the data distribution from real data and then generate high-quality synthetic data.

Figure: RelGen Overall Architecture

Features

Generate relational data using deep generative models. RelGen provides a variety of deep generative models, including generative adversarial networks (GANs), autoregressive (AR) models, and diffusion models.

Generate data for multiple relational tables. RelGen can generate data for multiple related tables in a database so that the data distribution of each generated table is close to that of the original table, and the join of the generated tables is similar to the join of the original tables.

Evaluate the quality of generated relational data. RelGen evaluates the generated relational data in terms of fidelity, privacy, and diversity, and visualizes its quality using histograms and t-SNE plots.

Installation

RelGen requires Python version 3.7 or later.

RelGen requires torch version 1.7.0 or later. To use RelGen with a GPU, make sure that the CUDA or cudatoolkit version is 9.2 or later. This in turn requires NVIDIA driver version >= 396.26 on Linux or >= 397.44 on Windows 10.
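The version requirements above can be checked with a short script before installing. This is an illustrative sketch and not part of RelGen: the helper names `parse_version` and `check_environment` are hypothetical, and the only library call it relies on beyond the standard library is `torch.cuda.is_available()`.

```python
import re
import sys

MIN_PYTHON = (3, 7)
MIN_TORCH = (1, 7, 0)


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into a tuple such as (1, 13, 1)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.7 or later is required")
    try:
        import torch
    except ImportError:
        problems.append("torch is not installed")
    else:
        if parse_version(torch.__version__) < MIN_TORCH:
            problems.append("torch 1.7.0 or later is required")
        if not torch.cuda.is_available():
            # Not reported as a problem: the GPU is optional.
            print("CUDA is not available; torch will run on the CPU")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The script does not inspect the NVIDIA driver version directly; if `torch.cuda.is_available()` reports no GPU on a machine that has one, the driver requirement above is a likely place to look.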

Install from conda

Install from source

git clone https://github.com/ruc-datalab/RelGen.git && cd RelGen
pip install -r requirements.txt

Quick-Start

Load Dataset

Load a demo dataset to get started. The dataset is a single table of census records.

Load metadata for the census dataset.

from relgen.data.metadata import Metadata

metadata = Metadata()
metadata.load_from_json("datasets/census/metadata.json")

Load data for the census dataset.

import pandas as pd

data = {
    "census": pd.read_csv("datasets/census/census.csv")
}
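For a database with several tables, the same dictionary shape (table name mapped to a DataFrame) can be built from a directory of CSV files. The helper below is a hypothetical convenience and not a RelGen function. It assumes one CSV file per table, with the file name (minus `.csv`) matching the table name used in the metadata.

```python
from pathlib import Path

import pandas as pd


def load_tables(directory):
    """Read every *.csv file in `directory` into a {table_name: DataFrame} dict."""
    tables = {}
    for csv_path in sorted(Path(directory).glob("*.csv")):
        # Path.stem drops the extension, so "census.csv" becomes "census".
        tables[csv_path.stem] = pd.read_csv(csv_path)
    return tables


# Illustrative usage; yields {"census": DataFrame} if the directory
# contains only census.csv:
# data = load_tables("datasets/census")
```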


Wrap the census data in a Dataset object and preprocess it.

from relgen.data.dataset import Dataset

dataset = Dataset(metadata)
dataset.fit(data)

Generating Data

Train the synthesizer.

from relgen.synthesizer.arsynthesizer import MADESynthesizer

synthesizer = MADESynthesizer(dataset)
synthesizer.fit(data)

Generate relational data.

sampled_data = synthesizer.sample()


Evaluating Data

Compare the real data with the generated data to evaluate the quality of the generated data.

from relgen.evaluator import Evaluator

evaluator = Evaluator(data["census"], sampled_data["census"])

Plot histograms that compare the distributions of the real and generated data for selected columns.

evaluator.eval_histogram(columns=["age", "sex", "relationship"])
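A histogram comparison can also be summarized as a single number per column. The sketch below is not RelGen's implementation. It computes the total variation distance between the value frequencies of a real column and a generated column, where 0 means identical distributions and 1 means no overlap. The function name `total_variation_distance` and the sample values are illustrative.

```python
from collections import Counter


def total_variation_distance(real_values, generated_values):
    """Half the L1 distance between the two empirical value distributions."""
    n_real = len(real_values)
    n_generated = len(generated_values)
    if n_real == 0 or n_generated == 0:
        raise ValueError("both columns must be non-empty")
    real_counts = Counter(real_values)
    generated_counts = Counter(generated_values)
    # Counter returns 0 for values that appear in only one of the columns.
    categories = set(real_counts) | set(generated_counts)
    return 0.5 * sum(
        abs(real_counts[c] / n_real - generated_counts[c] / n_generated)
        for c in categories
    )


real_sex = ["Male", "Male", "Female", "Female"]
generated_sex = ["Male", "Male", "Male", "Female"]
print(total_variation_distance(real_sex, generated_sex))  # 0.25
```

This works directly for categorical columns such as sex or relationship. A numeric column such as age would need to be binned first, since otherwise nearly every distinct value counts as its own category.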


Plot a t-SNE projection that compares the distributions of the real and generated data.

evaluator.eval_tsne()
