A tool for relational data generation

Project description

RelGen v0.1

RelGen

RelGen is short for Relation Generation: the tool generates relational data for databases. The name is also a play on words, since "Rel" sounds like "Real", reflecting the goal of generating data that closely resembles real data.

Overview

RelGen is a Python library for generating realistic relational data. It uses a variety of deep generative models and algorithms to learn the data distribution from real data and generate high-quality synthetic data.

Figure: RelGen Overall Architecture

Features

Generate relational data using deep generative models. RelGen provides a variety of deep generative models, including Generative Adversarial Networks (GANs), autoregressive (AR) models, and diffusion models.

Generate data for multiple relational tables. RelGen can generate data for multiple related tables in a database so that the data distribution of each generated table is close to that of the original table, and the join of the generated tables is similar to the join of the original tables.

Evaluate the quality of generated relational data. RelGen evaluates the generated relational data in terms of fidelity, privacy, and diversity, and visualizes its quality using histograms and t-SNE plots.
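To make the multi-table goal above concrete, the following sketch uses plain pandas (not RelGen's API) to compute two things one might compare between real and generated tables: the joined table itself, and the distribution of child rows per parent row. The table and column names (customers, orders, customer_id) and both helper functions are hypothetical and chosen only for illustration.

```python
import pandas as pd


def join_tables(parent, child, key):
    """Inner-join a child table to its parent on a shared key column."""
    return parent.merge(child, on=key, how="inner")


def fanout_distribution(parent, child, key):
    """Share of parent rows that have 0, 1, 2, ... matching child rows."""
    per_parent = child[key].value_counts()
    # Parents without child rows are absent from value_counts, so add them as 0.
    per_parent = per_parent.reindex(parent[key].to_numpy(), fill_value=0)
    return per_parent.value_counts(normalize=True).sort_index()


# Hypothetical toy tables: three customers, three orders.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["N", "S", "N"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]})

joined = join_tables(customers, orders, "customer_id")
fanout = fanout_distribution(customers, orders, "customer_id")
print(joined)
print(fanout)
```

Running the same two helpers on the real tables and on the generated tables gives a rough, model-independent way to see whether the join structure was preserved. It assumes the key column is unique in the parent table.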

Installation

RelGen requires Python version 3.7 or later.

Install from pip

pip install relgen

Install from source

git clone https://github.com/ruc-datalab/RelGen.git && cd RelGen
pip install -r requirements.txt

Quick-Start

Load Dataset

Load a demo dataset to get started. This dataset is a single table describing the census.

Load metadata for the census dataset.

from relgen.data.metadata import Metadata

metadata = Metadata()
metadata.load_from_json("datasets/census/metadata.json")
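The schema of the metadata file is not described in this section, so it can help to inspect the file before loading it. The following is a minimal sketch using only the Python standard library; the helper name top_level_keys is hypothetical, and the code makes no assumption about which keys RelGen expects.

```python
import json
from pathlib import Path


def top_level_keys(path):
    """Return the sorted top-level keys of a JSON file, or [] if it is not an object."""
    with Path(path).open(encoding="utf-8") as f:
        content = json.load(f)
    return sorted(content) if isinstance(content, dict) else []


# Path taken from the example above; skipped if the file is not present.
metadata_path = Path("datasets/census/metadata.json")
if metadata_path.exists():
    print(top_level_keys(metadata_path))
```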

Load data for the census dataset.

import pandas as pd

data = {
    "census": pd.read_csv("datasets/census/census.csv")
}
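Since the data is passed around as a dictionary that maps table names to pandas DataFrames, a quick sanity check before training can catch empty tables or unexpected missing values. The sketch below is plain pandas and not part of RelGen; summarize_tables and the toy table are hypothetical.

```python
import pandas as pd


def summarize_tables(tables):
    """Build a one-row-per-table summary of a {name: DataFrame} dictionary."""
    rows = []
    for name, frame in tables.items():
        rows.append({
            "table": name,
            "rows": len(frame),
            "columns": frame.shape[1],
            # Total number of NaN/None cells across all columns.
            "missing_cells": int(frame.isna().sum().sum()),
        })
    return pd.DataFrame(rows, columns=["table", "rows", "columns", "missing_cells"])


# Hypothetical toy table standing in for the census data.
toy = {"census": pd.DataFrame({"age": [39, 50, None], "sex": ["Male", "Female", "Male"]})}
summary = summarize_tables(toy)
print(summary)
```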

Wrap the census data in a Dataset object and process it.

from relgen.data.dataset import Dataset

dataset = Dataset(metadata)
dataset.fit(data)

Generating Data

Train the synthesizer.

from relgen.synthesizer.arsynthesizer import MADESynthesizer

synthesizer = MADESynthesizer(dataset)
synthesizer.fit(data)

Generate relational data.

sampled_data = synthesizer.sample()
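The evaluation step indexes the sampled data by table name (sampled_data["census"]), which suggests it is a dictionary of tables. Assuming each value is a pandas DataFrame, which this section does not state explicitly, the generated tables could be written to disk as follows. The helper save_tables and the output directory name are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_tables(tables, out_dir):
    """Write each DataFrame in a {name: DataFrame} dictionary to <out_dir>/<name>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, frame in tables.items():
        path = out / f"{name}.csv"
        frame.to_csv(path, index=False)  # index=False keeps only the data columns
        paths[name] = path
    return paths
```

For example, save_tables(sampled_data, "generated") would create generated/census.csv under that assumption.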

Evaluating Data

Compare real data and generated data to evaluate the quality of generated data.

from relgen.evaluator import Evaluator

evaluator = Evaluator(data["census"], sampled_data["census"])

Show histograms comparing the distributions of real and generated data for selected columns.

evaluator.eval_histogram(columns=["age", "sex", "relationship"])
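A histogram comparison is visual. As a numeric complement, the total variation distance between the value frequencies of a column in the real and generated data can be computed with plain pandas. This is a sketch of one common fidelity measure, not necessarily the metric RelGen's Evaluator uses; the function name and the toy values are hypothetical.

```python
import pandas as pd


def total_variation_distance(real, synthetic):
    """Total variation distance between the value frequencies of two columns.

    Returns 0.0 for identical frequency distributions and 1.0 for disjoint ones.
    """
    p = real.value_counts(normalize=True)
    q = synthetic.value_counts(normalize=True)
    # Align on the union of observed values; values missing on one side get frequency 0.
    p, q = p.align(q, fill_value=0.0)
    return 0.5 * float((p - q).abs().sum())


# Hypothetical toy columns.
real_sex = pd.Series(["Male", "Male", "Female", "Female"])
fake_sex = pd.Series(["Male", "Male", "Male", "Female"])
print(total_variation_distance(real_sex, fake_sex))  # 0.25
```

This works directly for categorical columns such as sex or relationship. A numeric column with many distinct values, such as age, would first need to be binned (for example with pandas.cut) using the same bin edges for both series.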

Show a t-SNE plot comparing the distributions of real and generated data.

evaluator.eval_tsne()

Download files

Source Distribution

relgen-0.1.1.tar.gz (58.9 kB)

Uploaded Source

Built Distribution

relgen-0.1.1-py3-none-any.whl (71.3 kB)

Uploaded Python 3

File details

Details for the file relgen-0.1.1.tar.gz.

File metadata

  • Download URL: relgen-0.1.1.tar.gz
  • Size: 58.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.11

File hashes

Hashes for relgen-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a01acf9bb4227fbce0ce6dd8e11eac0959a879766923df7cb78abc1266088919
MD5 a03892ddc45e13b390a11d8085f16b19
BLAKE2b-256 7120f5cebb85e46801627eac607da224277fe2887fe791a35b4502d63a95c8da

File details

Details for the file relgen-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: relgen-0.1.1-py3-none-any.whl
  • Size: 71.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.11

File hashes

Hashes for relgen-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ad8edf4845e86a4b3a230f985dd469dd18bbc05152f949c0ffaee6f4a53decda
MD5 16ef508d0f01835eb28fc59405c7f88f
BLAKE2b-256 32a6fd8e3c8ea3857f86b1636adecefe75e440bc164be6e31fd910204b29d58a
