
Synthetic Data Generation


SDG Hub

Composable blocks and flows for synthetic data generation




SDG Hub is a Python framework for building synthetic data generation pipelines. Chain LLM, parsing, transform, filtering, and agent blocks into YAML-defined flows -- then generate training data at scale.
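To make the idea of chaining blocks into a flow concrete, here is a minimal sketch in plain Python. It does not use or mirror SDG Hub's actual API: the `Block` class, `run_flow`, and the three step functions are hypothetical names invented for illustration, and the "LLM" step is a deterministic stand-in so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, str]


@dataclass
class Block:
    """One pipeline step: takes a list of rows and returns a new list of rows."""
    name: str
    fn: Callable[[list[Row]], list[Row]]

    def __call__(self, rows: list[Row]) -> list[Row]:
        return self.fn(rows)


def run_flow(blocks: list[Block], rows: list[Row]) -> list[Row]:
    """Apply each block in order, feeding its output to the next block."""
    for block in blocks:
        rows = block(rows)
    return rows


def fake_llm(rows: list[Row]) -> list[Row]:
    # Stand-in for an LLM call (assumption: a real block would query a model).
    return [{**r, "raw": f"Q: What is {r['topic']}? A: {r['topic'].upper()}"} for r in rows]


def parse(rows: list[Row]) -> list[Row]:
    # Split the raw text into separate question and answer columns.
    out = []
    for r in rows:
        question, _, answer = r["raw"].partition(" A: ")
        out.append({**r, "question": question.removeprefix("Q: "), "answer": answer})
    return out


def keep_long_answers(rows: list[Row]) -> list[Row]:
    # Filter step: drop rows whose answer is shorter than four characters.
    return [r for r in rows if len(r["answer"]) >= 4]


flow = [
    Block("generate", fake_llm),
    Block("parse", parse),
    Block("filter", keep_long_answers),
]
result = run_flow(flow, [{"topic": "yaml"}, {"topic": "go"}])
print(result)
```

In this toy version the flow is a Python list. In SDG Hub the sequence of blocks is declared in YAML instead, as described above.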

Get Started

pip install sdg-hub

from sdg_hub import FlowRegistry, Flow

# Discover and load a built-in flow
FlowRegistry.discover_flows()
flow = Flow.from_yaml(FlowRegistry.get_flow_path("MCP Server Distillation"))

# Configure and run
# Note: `dataset` is your input data and is not defined in this snippet;
# see the Quick Start for how to prepare it.
flow.set_model_config(model="openai/gpt-4o")
result = flow.generate(dataset)

See the Quick Start for a full walkthrough, or browse all built-in flows.

Documentation

Full documentation at ai-innovation.team/sdg_hub

  • Installation -- setup, optional dependencies, development install
  • Quick Start -- end-to-end walkthrough from loading a flow to generating data
  • Core Concepts -- blocks, flows, registries, and dataset handling
  • Block Reference -- LLM, parsing, transform, filtering, agent, and custom blocks
  • Flow Reference -- YAML schema, built-in flows, custom flows
  • API Reference -- auto-generated from source
  • Contributing -- development setup and contribution guidelines

Coding Agent Plugin

SDG Hub is available as a plugin for two coding agents, bringing synthetic data generation directly into your coding workflow.

Claude Code

Via org marketplace (recommended; includes all Red Hat AI plugins):

/plugin marketplace add Red-Hat-AI-Innovation-Team/plugins
/plugin install sdg-hub@Red-Hat-AI-Innovation-Team/plugins

Via this repo directly:

/plugin marketplace add Red-Hat-AI-Innovation-Team/sdg_hub
/plugin install sdg-hub@Red-Hat-AI-Innovation-Team/sdg_hub

From a local clone:

git clone https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub.git
/plugin marketplace add /path/to/sdg_hub

Codex CLI

codex plugin marketplace add Red-Hat-AI-Innovation-Team/plugins

Then install the plugin from the marketplace. See .codex-plugin/INSTALL.md for manual installation.

After Installing

Invoke the setup-guide skill to configure your LLM provider and model.

Skill            Description
setup-guide      Guided first-time configuration
data-generation  Run synthetic data generation using a flow
flow-browser     Browse and inspect available flows

License

Apache License 2.0 -- see LICENSE.


Built by the Red Hat AI Innovation Team


