GENESIS Core Lib

🧬 Advanced Synthetic Data Generation Library for Python 3.12+

GENESIS Core Lib is an extensible library for generating synthetic data with machine learning models (VAEs and CTGAN). Typical uses are data augmentation, privacy preservation, and ML model testing.

✨ Key Features

  • 🎯 Multiple Model Types: VAEs (TabularVAE, TimeSeriesVAE) and CTGAN
  • 📊 Data Type Support: Tabular data, time series with group_index, and custom datasets
  • 🔧 Function-Based Generation: Mathematical functions for controlled data generation
  • 📈 Quality Evaluation: Built-in metrics for data quality assessment
  • 🚀 High Performance: Optimized for both CPU and GPU processing
  • 🔒 Privacy Focused: Designed with privacy preservation in mind

🛠️ Installation

Quick Install

pip install sdg-core-lib

Development Install

git clone https://github.com/emiliocimino/generator_core_lib.git
cd generator_core_lib
pip install -e ".[dev]"

🚀 Quick Start

from sdg_core_lib import Job

# JSON-style configuration held in a Python dict (no file needed)
config = {
    "n_rows": 1000,
    "model": {
        "algorithm_name": "sdg_core_lib.data_generator.models.VAEs.implementation.TabularVAE.TabularVAE",
        "model_name": "customer_synthetic_model"
    },
    "dataset": {
        "dataset_type": "table",
        "data": [
            {
                "column_data": [13.71, 13.4, 13.27, 13.17, 14.13, 13.88, 13.24, 13.73],
                "column_name": "alcohol",
                "column_type": "continuous",
                "column_datatype": "float64"
            },
            {
                "column_data": [5.65, 3.91, 4.28, 2.59, 4.1, 3.9, 3.8, 4.2],
                "column_name": "malic_acid",
                "column_type": "continuous",
                "column_datatype": "float64"
            },
            {
                "column_data": [1.28, 1.05, 1.02, 1.03, 1.71, 1.23, 1.07, 1.5],
                "column_name": "ash",
                "column_type": "continuous",
                "column_datatype": "float64"
            }
        ]
    },
    "save_filepath": "./models"
}

# Create and run a synthetic data generation job
job = Job(
    n_rows=config["n_rows"],
    model_info=config["model"],
    dataset=config["dataset"],
    save_filepath=config.get("save_filepath", "./models")
)

# Train the model and generate synthetic data
results, metrics, model, schema = job.train()
print(f"Generated {len(results)} synthetic rows")
print(f"Quality metrics: {metrics}")

📖 See Quick Start Guide for detailed examples
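
The "dataset" entry in the Quick Start configuration repeats the same four keys for every column, so it can be convenient to build it from a simpler name-to-values mapping. The sketch below shows one way to do that in plain Python. `columns_to_dataset` is a hypothetical helper, not part of sdg_core_lib, and it assumes every column is continuous float64, as in the example above.

```python
from typing import Any


def columns_to_dataset(columns: dict[str, list[float]]) -> dict[str, Any]:
    """Build a "dataset" config entry from a name -> values mapping.

    Hypothetical helper (not part of sdg_core_lib). Every column is tagged
    as continuous float64, mirroring the Quick Start example.
    """
    # All columns must describe the same number of rows.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")

    return {
        "dataset_type": "table",
        "data": [
            {
                "column_data": [float(v) for v in values],
                "column_name": name,
                "column_type": "continuous",
                "column_datatype": "float64",
            }
            for name, values in columns.items()
        ],
    }


dataset = columns_to_dataset(
    {
        "alcohol": [13.71, 13.4, 13.27],
        "malic_acid": [5.65, 3.91, 4.28],
    }
)
```

The resulting dict has the same shape as config["dataset"] in the Quick Start, so it could be passed as the dataset argument of Job in the same way.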

🔧 Function-Based Generation

# Generate data using mathematical functions
from sdg_core_lib import Job

functions = [
    {
        "feature": "linear_data",
        "function_name": "LinearFunction",
        "parameters": {
            "m": 2.0,
            "q": 1.0,
            "min_value": 0.0,
            "max_value": 100.0
        }
    }
]

job = Job(n_rows=100, functions=functions)
synthetic_data = job.generate_from_functions()
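
This README does not spell out how LinearFunction interprets its parameters. One plausible reading, and it is only an assumption, is that the feature follows y = m * x + q with x taken at n_rows evenly spaced points between min_value and max_value. The plain-Python sketch below illustrates that reading; `linear_feature` is a hypothetical name and is not the library's implementation.

```python
def linear_feature(
    n_rows: int, m: float, q: float, min_value: float, max_value: float
) -> list[float]:
    """Evaluate y = m * x + q on n_rows evenly spaced x values.

    Assumption: min_value and max_value bound the input x, not the output y.
    """
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    if n_rows == 1:
        xs = [min_value]
    else:
        step = (max_value - min_value) / (n_rows - 1)
        xs = [min_value + i * step for i in range(n_rows)]
    return [m * x + q for x in xs]


values = linear_feature(n_rows=5, m=2.0, q=1.0, min_value=0.0, max_value=100.0)
print(values)  # [1.0, 51.0, 101.0, 151.0, 201.0]
```

If the library instead treats min_value and max_value as bounds on the output, or samples x at random, the numbers will differ; check the User Documentation for the actual semantics.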

📚 Documentation

📖 User Documentation

Complete guide for users including:

  • Core concepts and architecture
  • Data types (tabular, time series, custom)
  • Model configurations (VAEs, CTGAN)
  • API reference and examples
  • Best practices and troubleshooting

🔧 Developer Documentation

Technical documentation for developers:

  • Architecture overview and design patterns
  • Extension points and customization
  • Development setup and testing
  • Code organization and standards

Quick Start Guide

Get started immediately with:

  • Installation instructions
  • Basic examples and tutorials
  • Common use cases
  • Troubleshooting tips

📋 Step-by-Step Tutorial

Hands-on tutorial covering:

  • Complete project workflow
  • Real-world examples
  • Advanced techniques
  • Performance optimization

🏗️ Architecture

GENESIS Core Lib follows a modular architecture:

  • Data Generator: ML models (TabularVAE, TimeSeriesVAE, CTGAN)
  • Dataset: Data abstraction (Table, TimeSeries) with proper column structure
  • Preprocess: Data transformation and normalization strategies
  • Postprocess: Function application and data modification
  • Evaluate: Quality assessment and statistical metrics
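
To make the flow between these modules concrete, the toy pipeline below chains the four stages on a single numeric column using only the standard library. It is a sketch under stated assumptions: the function names are hypothetical, and a fitted Gaussian stands in for the real generators (TabularVAE, TimeSeriesVAE, CTGAN), so it shows the shape of the pipeline rather than the library's code.

```python
import random
import statistics


def preprocess(column: list[float]) -> tuple[list[float], tuple[float, float]]:
    """Min-max scale a column to [0, 1]; also return (low, span) for inversion."""
    low, high = min(column), max(column)
    span = (high - low) or 1.0  # avoid division by zero on constant columns
    return [(v - low) / span for v in column], (low, span)


def generate(scaled: list[float], n_rows: int, rng: random.Random) -> list[float]:
    """Stand-in generator: sample from a Gaussian fitted to the scaled data."""
    mu = statistics.fmean(scaled)
    sigma = statistics.pstdev(scaled)
    return [rng.gauss(mu, sigma) for _ in range(n_rows)]


def postprocess(samples: list[float], bounds: tuple[float, float]) -> list[float]:
    """Clip samples to [0, 1] and map them back to the original scale."""
    low, span = bounds
    return [low + min(max(s, 0.0), 1.0) * span for s in samples]


def evaluate(real: list[float], synthetic: list[float]) -> dict[str, float]:
    """Very small quality check: absolute difference between the two means."""
    diff = abs(statistics.fmean(real) - statistics.fmean(synthetic))
    return {"mean_abs_diff": diff}


# The "alcohol" column from the Quick Start example.
real = [13.71, 13.4, 13.27, 13.17, 14.13, 13.88, 13.24, 13.73]
rng = random.Random(0)  # fixed seed for reproducible output

scaled, bounds = preprocess(real)
synthetic = postprocess(generate(scaled, 1000, rng), bounds)
metrics = evaluate(real, synthetic)
print(len(synthetic), metrics)
```

Keeping each stage as a separate function with plain inputs and outputs mirrors the modular split described above: any stage can be swapped out without touching the others.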

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Clone repository
git clone https://github.com/emiliocimino/generator_core_lib.git
cd generator_core_lib

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=sdg_core_lib

# Run specific test file
pytest tests/test_job.py

📄 License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with TensorFlow and Keras for deep learning models
  • Statistical evaluation using scipy and numpy
  • Inspired by state-of-the-art synthetic data generation research


GENESIS Core Lib - Generating Tomorrow's Data, Today 🚀
