Enterprise-Grade Synthetic Data Generation


OmniGen 🚀

Generate synthetic data at scale with an enterprise-ready framework that offers fully customizable configuration, security, and ease of use.

Built by Ultrasafe AI for production environments.

What is OmniGen?

OmniGen is an enterprise-grade framework for generating synthetic datasets at scale, either from scratch or from existing base data. It is designed to produce trillions of tokens and billions of samples across multiple modalities:

🎯 Data Types Supported

💬 Conversational Data - Single-turn to multi-turn dialogues
🤖 Agentic Datasets - Tool use, function calling, multi-step reasoning
🎨 Multimodal Datasets - Text, images, audio, video combinations
🖼️ Images - Synthetic image generation and editing
🎵 Audio - Speech, music, sound effects
🎬 Video - Synthetic video sequences

🎓 Use Cases

Fine-Tuning - Instruction following, task-specific models
Supervised Fine-Tuning (SFT) - High-quality labeled datasets
Offline Reinforcement Learning - Preference datasets with rewards
Online Reinforcement Learning - Ground truth with reward checking scripts
Pre-Training - Large-scale corpus generation
Machine Learning - Training data for any ML task

🏗️ Why OmniGen?

✅ Enterprise-Ready - Built for production at scale
✅ Fully Customizable - Configure every aspect of generation
✅ Secure - Complete isolation, no data mixing
✅ Easy - Simple API, clear examples
✅ Modular - Independent pipelines for different data types

🚀 Currently Available Pipeline

conversation_extension - Extend Single-Turn to Multi-Turn Conversations

Turn your base questions into rich multi-turn dialogues. This is the first pipeline; more are coming soon.

Why conversation_extension?

✅ Simple - A single pipeline.run() call generates thousands of conversations
✅ Scalable - Parallel processing for fast generation
✅ Flexible - Mix different AI providers (OpenAI, Anthropic, Ultrasafe AI, OpenRouter)
✅ Production Ready - Built for SaaS platforms with multi-tenant support

Quick Start

1. Install

The package is published on PyPI as omnigen-usf and imported in Python as omnigen:

pip install omnigen-usf

2. Prepare Base Data

Create a file base_data.jsonl with your starting questions, one JSON object per line:

{"conversations": [{"role": "user", "content": "How do I learn Python?"}]}
{"conversations": [{"role": "user", "content": "What is machine learning?"}]}
{"conversations": [{"role": "user", "content": "Explain neural networks"}]}
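If your starting questions already live in a Python list, a short script can produce this file. A minimal sketch using only the standard library; `write_base_data` is a hypothetical helper, not part of OmniGen:

```python
import json

def write_base_data(questions, path="base_data.jsonl"):
    """Write one single-turn conversation per line in the JSONL layout shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for question in questions:
            record = {"conversations": [{"role": "user", "content": question}]}
            # ensure_ascii=False keeps non-English questions readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Usage: `write_base_data(["How do I learn Python?", "What is machine learning?", "Explain neural networks"])` produces the three lines above.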

3. Generate Conversations

from omnigen.pipelines.conversation_extension import (
    ConversationExtensionConfigBuilder,
    ConversationExtensionPipeline
)

# Configure the pipeline
config = (ConversationExtensionConfigBuilder()
    # User followup generator
    .add_provider(
        role='user_followup',
        name='ultrasafe',
        api_key='your-api-key',
        model='usf-mini'
    )
    # Assistant response generator
    .add_provider(
        role='assistant_response',
        name='ultrasafe',
        api_key='your-api-key',
        model='usf-mini'
    )
    # Generation settings
    .set_generation(
        num_conversations=100,
        turn_range=(3, 8)  # 3-8 turns per conversation
    )
    # Input data
    .set_data_source(
        source_type='file',
        file_path='base_data.jsonl'
    )
    # Output
    .set_storage(
        type='jsonl',
        output_file='output.jsonl'
    )
    .build()
)

# Run the pipeline
pipeline = ConversationExtensionPipeline(config)
pipeline.run()

4. Get Results

Your generated conversations will be in output.jsonl. Each record occupies a single line in the file; it is pretty-printed here for readability:

{
  "id": 0,
  "conversations": [
    {"role": "user", "content": "How do I learn Python?"},
    {"role": "assistant", "content": "Great choice! Start with the basics..."},
    {"role": "user", "content": "What resources do you recommend?"},
    {"role": "assistant", "content": "I recommend these resources..."},
    {"role": "user", "content": "How long will it take?"},
    {"role": "assistant", "content": "With consistent practice..."}
  ],
  "num_turns": 3,
  "success": true
}
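To load the results for training or inspection, read the file line by line. A minimal sketch, assuming each line is one record with the `conversations`, `num_turns`, and `success` fields shown above; both helper functions are illustrative, not OmniGen APIs:

```python
import json

def load_successful(path="output.jsonl"):
    """Return records whose "success" flag is true, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record.get("success"):
                records.append(record)
    return records

def count_turns(record):
    """Count user messages as turns (the sample has 3 user messages and num_turns 3)."""
    return sum(1 for m in record["conversations"] if m["role"] == "user")
```

Counting user messages rather than all messages keeps the count stable even if a system message is prepended.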

Supported AI Providers

Provider	Model Examples
Ultrasafe AI	`usf-mini`, `usf-max`
OpenAI	`gpt-4-turbo`, `gpt-3.5-turbo`
Anthropic	`claude-3-5-sonnet-20241022`, `claude-3-opus-20240229`
OpenRouter	Any OpenRouter-supported model

Mix Different Providers

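import os

openai_key = os.environ["OPENAI_API_KEY"]
anthropic_key = os.environ["ANTHROPIC_API_KEY"]

config = (ConversationExtensionConfigBuilder()
    # Positional arguments: role, provider name, API key, model
    .add_provider('user_followup', 'openai', openai_key, 'gpt-4-turbo')
    .add_provider('assistant_response', 'anthropic', anthropic_key, 'claude-3-5-sonnet-20241022')
    # ... rest of config
    .build()
)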

Advanced Features

Multi-Tenant SaaS Support

Perfect for platforms serving multiple users concurrently:

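import os

# In a real service these IDs come from the request; example values shown
user_id, session_id = "123", "abc"
# Each user gets an isolated workspace
workspace_id = f"user_{user_id}_session_{session_id}"
shared_api_key = os.environ["ULTRASAFE_API_KEY"]

config = (ConversationExtensionConfigBuilder(workspace_id=workspace_id)
    .add_provider('user_followup', 'ultrasafe', shared_api_key, 'usf-mini')
    .add_provider('assistant_response', 'ultrasafe', shared_api_key, 'usf-mini')
    # ... generation and data source settings as in the Quick Start
    .set_storage('jsonl', output_file='output.jsonl')  # Auto-isolated
    .build()
)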

# Storage automatically goes to: workspaces/{workspace_id}/output.jsonl
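The isolation rule in the comment above maps each output file into a per-workspace directory. A hypothetical helper showing that mapping (`workspace_output_path` is not an OmniGen function); the ID check and the stripping of directory components are extra safeguards assumed here, worth having whenever workspace IDs are derived from user input:

```python
import re
from pathlib import Path

# Allowed workspace ID characters: letters, digits, underscore, hyphen (an assumption)
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")

def workspace_output_path(workspace_id, output_file, root="workspaces"):
    """Map a workspace ID and file name to workspaces/{workspace_id}/{output_file}."""
    if not _SAFE_ID.fullmatch(workspace_id):
        raise ValueError(f"unsafe workspace_id: {workspace_id!r}")
    # Keep only the final path component so "../x.jsonl" cannot escape the workspace
    return Path(root) / workspace_id / Path(output_file).name
```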

Parallel Dataset Generation

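import os
from concurrent.futures import ProcessPoolExecutor

from omnigen.pipelines.conversation_extension import (
    ConversationExtensionConfigBuilder,
    ConversationExtensionPipeline
)

api_key = os.environ["ULTRASAFE_API_KEY"]

def process_dataset(input_file, output_file):
    config = (ConversationExtensionConfigBuilder()
        .add_provider('user_followup', 'ultrasafe', api_key, 'usf-mini')
        .add_provider('assistant_response', 'ultrasafe', api_key, 'usf-mini')
        .set_data_source('file', file_path=input_file)
        .set_storage('jsonl', output_file=output_file)
        .build()
    )
    ConversationExtensionPipeline(config).run()

# The __main__ guard is required when worker processes are started
# with "spawn" (the default on Windows and macOS)
if __name__ == "__main__":
    jobs = [('data1.jsonl', 'out1.jsonl'),
            ('data2.jsonl', 'out2.jsonl'),
            ('data3.jsonl', 'out3.jsonl')]
    # Process 3 datasets in parallel
    with ProcessPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(process_dataset, src, dst) for src, dst in jobs]
        for future in futures:
            future.result()  # Re-raises any exception from the worker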

📖 Complete Configuration Reference

All Configuration Options Explained

Below is a comprehensive YAML configuration showing ALL available options with detailed explanations:

# ==============================================================================
# WORKSPACE ISOLATION (Optional)
# ==============================================================================
# Unique ID for multi-tenant environments - auto-isolates all output files
workspace_id: "user_123_session_abc"

# ==============================================================================
# PROVIDERS - AI Model Configuration
# ==============================================================================
# Configure different AI providers for each role
# Each role can use a different provider/model combination

providers:
  # Provider for generating user follow-up questions
  user_followup:
    name: ultrasafe              # Options: ultrasafe, openai, anthropic, openrouter
    api_key: ${API_KEY}          # Use env var: ${VAR_NAME} or direct key
    model: usf-mini              # Model identifier
    temperature: 0.7             # Randomness: higher = more creative (valid range varies by provider, e.g. 0.0-1.0)
    max_tokens: 2048             # Max tokens in response
    timeout: 300                 # Request timeout in seconds
    max_retries: 5               # Number of retry attempts on failure
    retry_delay: 2               # Delay between retries in seconds
  
  # Provider for generating assistant responses
  assistant_response:
    name: ultrasafe              # Can use different provider than user_followup
    api_key: ${API_KEY}
    model: usf-mini
    temperature: 0.7
    max_tokens: 8192             # Larger for detailed responses
    timeout: 300
    max_retries: 5
    retry_delay: 2

# PROVIDER OPTIONS:
# ----------------
# ultrasafe:
#   models: usf-mini, usf-max
#
# openai:
#   models: gpt-4-turbo, gpt-4, gpt-3.5-turbo, gpt-4o, gpt-4o-mini
#
# anthropic:
#   models: claude-3-5-sonnet-20241022, claude-3-opus-20240229,
#           claude-3-sonnet-20240229, claude-3-haiku-20240307
#
# openrouter:
#   models: Any OpenRouter supported model
#   base_url: https://openrouter.ai/api/v1 (optional)

# ==============================================================================
# GENERATION SETTINGS
# ==============================================================================
generation:
  num_conversations: 100           # Total conversations to generate
  
  turn_range:                      # Number of turns per conversation
    min: 3                         # Minimum turns
    max: 8                         # Maximum turns
  
  parallel_workers: 10             # Concurrent workers (balance speed vs rate limits)
  
  # Extension behavior for multi-turn input
  extension_mode: "smart"          # Options: "smart" | "legacy"
  # - smart: Intelligently handle multi-turn conversations
  # - legacy: Always extract first user message only
  
  skip_invalid: true               # Skip invalid patterns (recommended: true)
  
  # Turn calculation method
  turn_calculation: "additional"   # Options: "additional" | "total"
  # - additional: Add NEW turns on top of existing (default)
  # - total: Keep total turns within range (never removes existing)

# ==============================================================================
# DATA SOURCE CONFIGURATION
# ==============================================================================
base_data:
  enabled: true                    # Enable base data loading
  
  # OPTION 1: Local File
  source_type: file                # Use local JSONL/JSON file
  file_path: data/input.jsonl      # Path to file
  format: conversations            # JSON key containing conversation array
  shuffle: false                   # Shuffle data before processing
  
  # OPTION 2: HuggingFace Dataset
  # source_type: huggingface       # Use HuggingFace dataset
  # hf_dataset: username/dataset   # HuggingFace dataset path
  # hf_split: train                # Dataset split: train, test, validation
  # hf_token: ${HF_TOKEN}          # HuggingFace API token (if private)
  # hf_streaming: false            # Stream dataset (for large datasets)
  # format: conversations          # Field name in dataset
  # shuffle: true                  # Shuffle after loading

# ==============================================================================
# STORAGE CONFIGURATION
# ==============================================================================
storage:
  type: jsonl                      # Options: jsonl | mongodb
  
  # JSONL Storage (Default)
  output_file: output.jsonl        # Successful conversations
  partial_file: partial.jsonl      # Partial/incomplete conversations
  failed_file: failed.jsonl        # Failed conversations
  
  # MongoDB Storage (Alternative)
  # type: mongodb
  # mongodb:
  #   connection_string: mongodb://localhost:27017
  #   database: omnigen
  #   collection: conversations
  #   output_collection: output          # Successful
  #   partial_collection: partial        # Partial
  #   failed_collection: failed          # Failed

# ==============================================================================
# DATETIME CONFIGURATION (Optional)
# ==============================================================================
datetime_config:
  enabled: true                    # Enable datetime generation
  mode: random_from_range          # Options: random_from_range | current | fixed
  timezone: UTC                    # Timezone (UTC, America/New_York, Asia/Dubai, etc.)
  format: "%Y-%m-%d %H:%M:%S"      # Python strftime format
  
  # For random_from_range mode
  range:
    start: "2024-01-01 00:00:00"   # Start datetime
    end: "2024-12-31 23:59:59"     # End datetime
  
  # For fixed mode
  # fixed_datetime: "2024-06-15 12:00:00"

# ==============================================================================
# SYSTEM MESSAGES (Optional)
# ==============================================================================
system_messages:
  # Prepend system message to every conversation
  prepend_always:
    enabled: true
    content: "You are a helpful AI assistant. Current time: {current_datetime} ({timezone})."
  
  # Append system message to every conversation
  append_always:
    enabled: false
    content: "Remember to be concise and helpful."
  
  # Add system message only if none exists
  add_if_missing:
    enabled: false
    content: "You are an AI assistant."

# Available variables in system messages:
# - {current_datetime}: Generated datetime
# - {timezone}: Configured timezone
# - {workspace_id}: Current workspace ID

# ==============================================================================
# CUSTOM PROMPTS (Optional)
# ==============================================================================
prompts:
  # Custom prompt for user follow-up generation
  followup_question: |
    ## Your Task
    Generate an intelligent follow-up user question based on conversation history.
    
    ### CONVERSATION HISTORY:
    {history}
    
    ### INSTRUCTIONS:
    - Generate a meaningful follow-up question
    - Be conversational and natural
    - Vary your phrasing and tone
    - Build on the assistant's last response
    
    Return your follow-up question wrapped in XML tags:
    <user>Your follow-up question here</user>
  
  # Custom prompt for assistant response generation
  # assistant_response: |
  #   Your custom assistant response prompt here...

# ==============================================================================
# DEBUG OPTIONS (Optional)
# ==============================================================================
debug:
  log_api_timing: true             # Log API call timings
  log_parallel_status: true        # Log parallel worker status
  verbose: false                   # Verbose logging
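The `${VAR_NAME}` syntax used for `api_key` and `hf_token` keeps secrets out of the file. If you load this YAML yourself (for example with PyYAML's `yaml.safe_load`), the substitution can be sketched as below. This illustrates the convention and is not OmniGen's loader; raising on a missing variable is a design choice here, so a typo fails loudly instead of sending a literal `${...}` string as the key:

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value):
    """Recursively replace ${VAR_NAME} in strings nested inside dicts and lists."""
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in os.environ:
                raise KeyError(f"environment variable {name} is not set")
            return os.environ[name]
        return _VAR.sub(lookup, value)
    return value  # numbers, booleans, None pass through unchanged
```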

Quick Configuration Examples

Example 1: Local File Input

providers:
  user_followup:
    name: ultrasafe
    api_key: ${ULTRASAFE_API_KEY}
    model: usf-mini
  assistant_response:
    name: ultrasafe
    api_key: ${ULTRASAFE_API_KEY}
    model: usf-mini

generation:
  num_conversations: 100
  turn_range: {min: 3, max: 8}

base_data:
  source_type: file
  file_path: input.jsonl

storage:
  type: jsonl
  output_file: output.jsonl

Example 2: HuggingFace Dataset Input

providers:
  user_followup:
    name: openai
    api_key: ${OPENAI_API_KEY}
    model: gpt-4-turbo
  assistant_response:
    name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-3-5-sonnet-20241022

generation:
  num_conversations: 1000
  turn_range: {min: 5, max: 10}
  parallel_workers: 20

base_data:
  source_type: huggingface
  hf_dataset: username/my-dataset
  hf_split: train
  hf_token: ${HF_TOKEN}
  format: conversations
  shuffle: true

storage:
  type: jsonl
  output_file: output.jsonl

Example 3: Mixed Providers with MongoDB

providers:
  user_followup:
    name: openai
    api_key: ${OPENAI_API_KEY}
    model: gpt-3.5-turbo
    temperature: 0.8
  assistant_response:
    name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-3-5-sonnet-20241022
    temperature: 0.7

generation:
  num_conversations: 500
  turn_range: {min: 3, max: 8}

base_data:
  source_type: file
  file_path: questions.jsonl

storage:
  type: mongodb
  mongodb:
    connection_string: mongodb://localhost:27017
    database: omnigen
    collection: conversations

Example 4: Programmatic Configuration (Python)

from omnigen.pipelines.conversation_extension import (
    ConversationExtensionConfigBuilder,
    ConversationExtensionPipeline
)

# Build configuration programmatically
config = (ConversationExtensionConfigBuilder()
    # Workspace isolation
    .set_workspace_id("user_123_session_abc")
    
    # Providers
    .add_provider(
        role='user_followup',
        name='ultrasafe',
        api_key='your-api-key',
        model='usf-mini',
        temperature=0.7,
        max_tokens=2048
    )
    .add_provider(
        role='assistant_response',
        name='ultrasafe',
        api_key='your-api-key',
        model='usf-mini',
        temperature=0.7,
        max_tokens=8192
    )
    
    # Generation settings
    .set_generation(
        num_conversations=100,
        turn_range=(3, 8),
        parallel_workers=10,
        extension_mode='smart',
        skip_invalid=True,
        turn_calculation='additional'
    )
    
    # Data source - Local file
    .set_data_source(
        source_type='file',
        file_path='input.jsonl',
        format='conversations',
        shuffle=False
    )
    
    # Data source - HuggingFace (alternative)
    # .set_data_source(
    #     source_type='huggingface',
    #     hf_dataset='username/dataset',
    #     hf_split='train',
    #     hf_token='your-token',
    #     format='conversations',
    #     shuffle=True
    # )
    
    # Storage
    .set_storage(
        type='jsonl',
        output_file='output.jsonl',
        partial_file='partial.jsonl',
        failed_file='failed.jsonl'
    )
    
    # Custom prompts (optional)
    .set_prompts(
        followup_question="Your custom prompt here with {history}"
    )
    
    .build()
)

# Run pipeline
pipeline = ConversationExtensionPipeline(config)
pipeline.run()
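Custom prompts are plain templates: `{history}` is filled with the conversation so far, and the reference prompt asks the model to wrap its answer in `<user>` tags. A sketch of both steps; the `role: content` history layout and the regex extraction are assumptions, not OmniGen's implementation. If the template is filled with Python's `str.format`, any literal braces in a custom prompt would need doubling (`{{` and `}}`):

```python
import re

FOLLOWUP_TEMPLATE = """Generate an intelligent follow-up user question based on conversation history.

### CONVERSATION HISTORY:
{history}

Return your follow-up question wrapped in XML tags:
<user>Your follow-up question here</user>"""

def render_history(conversations):
    """Render messages as 'role: content' lines (an assumed layout)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in conversations)

def extract_user_turn(model_output):
    """Return the text inside the first <user>...</user> pair, or None if absent."""
    match = re.search(r"<user>(.*?)</user>", model_output, re.DOTALL)
    return match.group(1).strip() if match else None
```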

📖 Conversation Extension Pipeline - Complete Guide

Overview

The Conversation Extension Pipeline transforms base conversations into rich multi-turn dialogues. It can expand single-turn questions into full conversations and also continue existing multi-turn conversations.

Key Features

✅ Smart Extension - Continues existing conversations based on the role of the last message
✅ Flexible Input - Handles single-turn or multi-turn base data
✅ Provider Mix - Use different AI providers for user and assistant
✅ Multi-Tenant - Complete workspace isolation
✅ Configurable - Full control over generation behavior

Configuration Options

Extension Modes

Smart Mode (Default)

generation:
  extension_mode: "smart"

Single-turn input → Generate new conversation from scratch
Multi-turn (user last) → Add 1 assistant response, then continue
Multi-turn (assistant last) → Add user + assistant, then continue
Invalid patterns → Skip row entirely
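The four rules above can be read as a small decision function. A sketch of that logic using the same `conversations` message format as the input data; the function name and return labels are illustrative, not OmniGen internals:

```python
def smart_extension_plan(conversations):
    """Classify a base conversation according to the smart-mode rules.

    Returns one of: "skip", "new", "add_assistant", "add_user_and_assistant".
    """
    # Invalid patterns: empty list, or first message not from the user
    if not conversations or conversations[0].get("role") != "user":
        return "skip"
    # Single-turn input: only the opening user question
    if len(conversations) == 1:
        return "new"
    last_role = conversations[-1].get("role")
    if last_role == "user":
        return "add_assistant"           # answer the pending question, then continue
    if last_role == "assistant":
        return "add_user_and_assistant"  # new follow-up plus answer, then continue
    return "skip"  # any other last role is treated as invalid (an assumption)
```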

Legacy Mode

generation:
  extension_mode: "legacy"

Always extracts only the first user message (the original behavior)

Turn Calculation

Additional Mode (Default) - Add NEW turns on top of existing

generation:
  turn_calculation: "additional"  # Add 3-8 NEW turns

Total Mode - Keep total within range (never removes existing)

generation:
  turn_calculation: "total"  # Total should be 3-8 turns

Complete Configuration

# Workspace isolation (optional)
workspace_id: "user_123"

# AI Providers
providers:
  user_followup:
    name: "ultrasafe"
    api_key: "${ULTRASAFE_API_KEY}"
    model: "usf-mini"
    temperature: 0.7
    max_tokens: 2048
  
  assistant_response:
    name: "ultrasafe"
    api_key: "${ULTRASAFE_API_KEY}"
    model: "usf-mini"
    temperature: 0.7
    max_tokens: 8192

# Generation Settings
generation:
  num_conversations: 100
  turn_range:
    min: 3
    max: 8
  parallel_workers: 10
  
  # Extension behavior
  extension_mode: "smart"        # "smart" | "legacy"
  skip_invalid: true             # Skip invalid patterns
  turn_calculation: "additional" # "additional" | "total"

# Input Data
base_data:
  enabled: true
  source_type: "file"
  file_path: "base_data.jsonl"
  format: "conversations"
  shuffle: false

# Output Storage
storage:
  type: "jsonl"
  output_file: "output.jsonl"
  partial_file: "partial.jsonl"
  failed_file: "failed.jsonl"

# System Messages (optional)
system_messages:
  add_if_missing:
    enabled: true
    content: "You are a helpful assistant. Current datetime: {current_datetime}"

# DateTime (optional)
datetime_config:
  enabled: true
  timezone: "UTC"
  format: "%Y-%m-%d %H:%M:%S"
  range:
    start: "2024-01-01 00:00:00"
    end: "2024-12-31 23:59:59"
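With `mode: random_from_range`, each conversation can get its own timestamp drawn from the range, which then fills the system-message variables. A sketch assuming a uniform draw at one-second resolution with `start`/`end` written in the configured `format`; timezone conversion is omitted and the timezone name is only substituted as text. Both functions are hypothetical helpers:

```python
import random
from datetime import datetime, timedelta

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"

def random_datetime(start, end, fmt=DEFAULT_FORMAT, rng=random):
    """Draw a uniformly random second between start and end (inclusive), formatted with fmt."""
    lo = datetime.strptime(start, fmt)
    hi = datetime.strptime(end, fmt)
    if hi < lo:
        raise ValueError("range end is before range start")
    offset = rng.randint(0, int((hi - lo).total_seconds()))
    return (lo + timedelta(seconds=offset)).strftime(fmt)

def render_system_message(template, current_datetime, timezone_name="UTC", workspace_id=""):
    """Fill {current_datetime}, {timezone}, and {workspace_id}; unused values are ignored."""
    return template.format(current_datetime=current_datetime,
                           timezone=timezone_name, workspace_id=workspace_id)
```

Usage: `render_system_message("You are a helpful assistant. Current datetime: {current_datetime}", random_datetime("2024-01-01 00:00:00", "2024-12-31 23:59:59"))`.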

Input Data Formats

Valid Patterns

Single-turn ✅

{"conversations": [{"role": "user", "content": "How do I learn Python?"}]}

Multi-turn (user last) ✅

{
  "conversations": [
    {"role": "user", "content": "How do I learn Python?"},
    {"role": "assistant", "content": "Start with basics..."},
    {"role": "user", "content": "What resources?"}
  ]
}

Multi-turn (assistant last) ✅

{
  "conversations": [
    {"role": "user", "content": "How do I learn Python?"},
    {"role": "assistant", "content": "Start with basics..."}
  ]
}

Invalid Patterns (Skipped)

❌ First message not user

{"conversations": [{"role": "assistant", "content": "Hello"}]}

❌ Empty conversations

{"conversations": []}

Programmatic Usage

from omnigen.pipelines.conversation_extension import (
    ConversationExtensionConfigBuilder,
    ConversationExtensionPipeline
)

config = (ConversationExtensionConfigBuilder()
    .add_provider('user_followup', 'ultrasafe', 'api-key', 'usf-mini')
    .add_provider('assistant_response', 'ultrasafe', 'api-key', 'usf-mini')
    .set_generation(
        num_conversations=100,
        turn_range=(3, 8),
        parallel_workers=10,
        extension_mode='smart',      # Handle multi-turn intelligently
        skip_invalid=True,            # Skip invalid patterns
        turn_calculation='additional' # Add new turns (default)
    )
    .set_data_source('file', file_path='base_data.jsonl')
    .set_storage('jsonl', output_file='output.jsonl')
    .build()
)

pipeline = ConversationExtensionPipeline(config)
pipeline.run()

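Turn Calculation Examples

A turn is one user message plus one assistant response; the output example in the Quick Start has 6 messages and "num_turns": 3.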

Additional Mode (Default)

Existing: 2 turns
Config: turn_range = (3, 8)
Result: Add 3-8 NEW turns → Total: 5-10 turns

Total Mode

Existing: 2 turns
Config: turn_range = (3, 8)
Result: Add 1-6 turns → Total: 3-8 turns

Existing: 10 turns (already > max)
Config: turn_range = (3, 8)
Result: Add 0 turns → Keep 10 turns (never remove)
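The arithmetic in these examples fits in a small function. A sketch assuming the target is drawn uniformly from `turn_range` (inclusive) for each conversation; OmniGen's actual sampling may differ, and `turns_to_add` is a hypothetical name:

```python
import random

def turns_to_add(existing_turns, turn_range, mode="additional", rng=random):
    """Number of new user/assistant turns to generate for one conversation."""
    lo, hi = turn_range
    target = rng.randint(lo, hi)  # inclusive on both ends
    if mode == "additional":
        return target  # add `target` NEW turns on top of the existing ones
    if mode == "total":
        return max(0, target - existing_turns)  # never remove existing turns
    raise ValueError(f"unknown turn_calculation: {mode!r}")
```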

Best Practices

Provider Selection

Use stronger models for assistant responses (claude-3-5-sonnet, gpt-4-turbo)
Use cheaper models for user follow-ups (usf-mini, gpt-3.5-turbo)

Turn Range

Quick exchanges: (2, 4)
In-depth: (5, 10)
Balanced: (3, 8) ✅

Parallel Workers

Conservative: 5 (avoid rate limits)
Balanced: 10 ✅
Aggressive: 20 (watch for rate limits)

Troubleshooting

Issue: Empty output

Check the input data format: each row needs a non-empty conversations array whose first message has role user
Set skip_invalid: false to see errors
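To see which rows are being dropped, you can scan the input before running the pipeline. A sketch that checks only the two documented invalid patterns plus unparsable lines and a missing `conversations` key; the pipeline may apply further checks, and `find_invalid_rows` is a hypothetical helper:

```python
import json

def find_invalid_rows(path, key="conversations"):
    """Yield (line_number, reason) for rows that break the documented input rules."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                yield lineno, f"invalid JSON: {exc.msg}"
                continue
            convs = row.get(key) if isinstance(row, dict) else None
            if not isinstance(convs, list):
                yield lineno, f"missing or non-list '{key}' field"
            elif not convs:
                yield lineno, "empty conversation"
            elif not isinstance(convs[0], dict) or convs[0].get("role") != "user":
                yield lineno, "first message is not from the user"
```

Usage: `for lineno, reason in find_invalid_rows("base_data.jsonl"): print(lineno, reason)`.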

Issue: Rate limits

Reduce parallel_workers
Check provider API limits
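Spacing out retries also helps with rate limits. The `max_retries` and `retry_delay` provider options suggest a retry loop like the sketch below. This version doubles the delay after each failure (exponential backoff), a common choice for rate limits but not necessarily what OmniGen does, since the reference only describes `retry_delay` as the delay between retries:

```python
import time

def call_with_retries(request, max_retries=5, retry_delay=2.0, retry_on=(Exception,)):
    """Call request(); on failure wait and retry, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: retry_delay, 2*retry_delay, 4*retry_delay, ...
            time.sleep(retry_delay * (2 ** attempt))
```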

Issue: Low quality

Increase temperature (0.8-0.9)
Use better models
Add custom prompts and system messages

License

About Ultrasafe AI

Enterprise-grade AI tools with focus on safety and performance.

🌐 Website: us.inc
📧 Email: support@us.inc

Made with ❤️ by Ultrasafe AI

Project details


Release history

Version	Release date
0.1.18	Nov 16, 2025
0.1.17	Nov 16, 2025
0.1.16	Nov 10, 2025
0.1.15	Nov 10, 2025
0.1.14	Nov 10, 2025
0.1.13	Nov 10, 2025
0.1.12	Nov 10, 2025
0.1.11	Nov 9, 2025
0.1.10	Nov 9, 2025
0.1.9	Nov 9, 2025
0.1.8	Nov 9, 2025
0.1.7	Nov 9, 2025
0.1.6	Nov 9, 2025
0.1.5	Oct 21, 2025
0.1.4	Oct 21, 2025
0.1.3	Oct 21, 2025
0.1.2	Oct 21, 2025
0.1.1	Oct 21, 2025
0.1.0	Oct 21, 2025
0.0.1.post11	Oct 19, 2025
0.0.1.post10	Oct 18, 2025
0.0.1.post9	Oct 6, 2025
0.0.1.post8	Oct 5, 2025
0.0.1.post7	Oct 5, 2025
0.0.1.post6	Oct 5, 2025
0.0.1.post5	Oct 4, 2025
0.0.1.post4	Oct 4, 2025
0.0.1.post3	Oct 4, 2025
0.0.1.post2 (this version)	Oct 4, 2025
0.0.1.post1	Oct 4, 2025

Download files


Source Distribution

omnigen_usf-0.0.1.post2.tar.gz (47.4 kB)

Uploaded Oct 4, 2025 Source

Built Distribution


omnigen_usf-0.0.1.post2-py3-none-any.whl (43.6 kB)

Uploaded Oct 4, 2025 Python 3

File details

Details for the file omnigen_usf-0.0.1.post2.tar.gz.

File metadata

Download URL: omnigen_usf-0.0.1.post2.tar.gz
Upload date: Oct 4, 2025
Size: 47.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for omnigen_usf-0.0.1.post2.tar.gz
Algorithm	Hash digest
SHA256	`01edf045e2d06c062f15cb266b14a95903040681a44f9e61df625d178e6c3dc0`
MD5	`c5756cea68f95e2b178971753dd0b908`
BLAKE2b-256	`ec6ddec667743760a5787d0b425ec532cf7f28a6e74d63af2ea40d05d0c4815e`


File details

Details for the file omnigen_usf-0.0.1.post2-py3-none-any.whl.

File metadata

Download URL: omnigen_usf-0.0.1.post2-py3-none-any.whl
Upload date: Oct 4, 2025
Size: 43.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for omnigen_usf-0.0.1.post2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e26f706cef0aac5bb38bb366b9f497aa3d5db2b8bd141c16a638e4f99f8edb95`
MD5	`20cca31ba3bda5446c6354ff5e671ce3`
BLAKE2b-256	`c78d29d6d15aed2d1fe0887fa3ef530e80578a9f3c4691cc4785714de4160e3b`


omnigen-usf 0.0.1.post2
