
LLM wrapper, for telemetry and internal model routing


Maniac

LLM-agnostic AI program orchestration with continuous prompt optimization and LoRA fine-tuning across all models.

Overview

Maniac provides a unified interface for deploying AI programs across any LLM provider or model. Each inference call in your code spawns an AI Program Container that continuously optimizes both the prompts and the LoRA fine-tuning parameters for every supported model, so the program is ready to run on whichever model the Control Plane allocates.

Quick Start

Installation

pip install maniac

Basic Usage

from maniac import Maniac

# Initialize with your preferred provider (choose one)
client = Maniac(provider="openai", api_key="your-key")
# or, for Vertex AI:
# client = Maniac(provider="vertex", project_id="your-project", region="us-east5")

# Customer support ticket analysis
response = client.responses.create(
    model="claude-opus-4",
    input="Customer reports: 'Payment failed but was charged anyway. Order #12345'",
    instructions="You are a customer support analyst. Categorize the issue, determine urgency, and suggest resolution steps.",
    temperature=0.0,
    max_tokens=1024,
    task_label="support-ticket-analysis",
    judge_prompt="You are comparing two customer support analyses for the same ticket. Is response A's categorization and resolution plan at least as accurate and actionable as response B's? Focus on issue identification, urgency assessment, and solution quality."
)

# Document summarization for compliance
response = client.chat.completions.create(
    model="claude-opus-4",
    messages=[
        {"role": "system", "content": "You are a compliance officer specializing in financial regulations."},
        {"role": "user", "content": "Summarize the key compliance risks in this 50-page contract..."}
    ],
    temperature=0.0,
    task_label="compliance-review"
)

# Submit a pre-generated output without running inference (e.g. when batch-processing existing results)
client.chat.completions.stream_create(
    task_label="document-processing",
    system_prompt="You are a legal document analyst.",
    user_prompt="Extract key terms from this vendor agreement...",
    output="Key terms: Payment net 30, liability cap $1M, termination 90 days notice...",
    judge_prompt="You are comparing two contract analyses for the same document. Is response A's extraction of key terms at least as complete and accurate as response B's? Focus on identifying all critical terms, payment conditions, and legal obligations."
)

Core Concepts

AI Program Containers

Each inference call in your code creates an AI Program Container that:

  • Continuously optimizes prompts and LoRA adaptations across all models simultaneously, ensuring each container can deploy optimally on any model (closed-source or open-source)
  • Maintains unified optimization state combining prompt engineering and fine-tuning metrics across the entire model ecosystem
  • Handles seamless model switching with pre-optimized prompts and LoRA weights ready for any target model
  • Automatically balances prompt vs LoRA optimization based on model capabilities (e.g., more LoRA for open-source, more prompt engineering for closed-source)
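
To make the idea of per-model optimization state concrete, here is a minimal sketch of what a container might track. Every name in it (ModelState, ProgramContainer, judge_win_rate, the model names, and the adapter path) is hypothetical and chosen for illustration; none of it is Maniac's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelState:
    """Hypothetical per-model optimization state held by a container."""
    prompt: str                          # current best prompt for this model
    lora_adapter: Optional[str] = None   # None where weights are not accessible
    judge_win_rate: float = 0.0          # share of judge comparisons won, in [0, 1]


@dataclass
class ProgramContainer:
    """Illustrative container keyed by task label (not Maniac's real class)."""
    task_label: str
    states: Dict[str, ModelState] = field(default_factory=dict)

    def ready_models(self, min_win_rate: float = 0.5) -> List[str]:
        """Return the models whose optimized state meets the quality bar."""
        return sorted(
            name for name, state in self.states.items()
            if state.judge_win_rate >= min_win_rate
        )


container = ProgramContainer("support-ticket-analysis")
# Closed-source model: prompt optimization only (placeholder values).
container.states["gpt-4o"] = ModelState(
    prompt="You are a customer support analyst.", judge_win_rate=0.62
)
# Open-source model: prompt plus a LoRA adapter (placeholder values).
container.states["llama-3-8b"] = ModelState(
    prompt="You are a customer support analyst.",
    lora_adapter="adapters/support-v3",
    judge_win_rate=0.41,
)
print(container.ready_models())
```

Keeping one state entry per model is what would let a container switch models without re-optimizing from scratch.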

Control Plane

The Control Plane allocates containers to LLMs based on:

  • Quality preferences specified in judge prompts
  • Cost constraints configured in the dashboard
  • Latency requirements for real-time vs batch processing
  • Optimization readiness: how well each container's prompts and LoRA weights are optimized for each model
  • Model capabilities and task compatibility
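
The allocation criteria above can be read as hard constraints (cost, latency) plus a ranking (quality, readiness). The sketch below shows one simple way to combine them. The Candidate fields, the scoring rule, and all numbers are assumptions made for illustration, not the Control Plane's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    model: str
    quality: float       # judge-based quality estimate in [0, 1]
    cost_per_1k: float   # USD per 1k tokens (made-up figures below)
    latency_ms: float    # typical response latency
    readiness: float     # how well the container is optimized for this model, in [0, 1]


def allocate(candidates: List[Candidate],
             max_cost_per_1k: float,
             max_latency_ms: float) -> Optional[str]:
    """Drop models that break the cost or latency limits, then pick the
    remaining model with the highest quality * readiness score."""
    eligible = [
        c for c in candidates
        if c.cost_per_1k <= max_cost_per_1k and c.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.quality * c.readiness).model


candidates = [
    Candidate("model-a", quality=0.90, cost_per_1k=0.030, latency_ms=1200, readiness=0.90),
    Candidate("model-b", quality=0.80, cost_per_1k=0.005, latency_ms=400, readiness=0.95),
    Candidate("model-c", quality=0.70, cost_per_1k=0.001, latency_ms=300, readiness=0.50),
]

# Real-time workload with a tight budget: model-a is filtered out on both limits.
print(allocate(candidates, max_cost_per_1k=0.01, max_latency_ms=1000))
# Batch workload with a loose budget: the highest-scoring model wins.
print(allocate(candidates, max_cost_per_1k=0.05, max_latency_ms=2000))
```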

Supported Providers

  • OpenAI: GPT-4o, GPT-4, GPT-3.5, o3-mini (prompt optimization + API-level adaptation)
  • Anthropic (Vertex AI): Claude Opus 4, Claude Sonnet 4 (prompt optimization + structured fine-tuning)
  • Open-source models: Llama, Mistral, CodeLlama (unified prompt + LoRA optimization)

Configuration

Provider Setup

OpenAI:

client = Maniac(
    provider="openai",
    api_key="sk-...",
    base_url="https://api.openai.com/v1"  # optional
)

Vertex AI:

client = Maniac(
    provider="vertex",
    project_id="your-gcp-project",
    region="us-east5"
)

Quality Control

Use judge prompts to specify quality criteria:

response = client.responses.create(
    model="claude-opus-4",
    input="Vendor contract shows $2M annual spend but accounting shows $2.1M. Investigate discrepancy.",
    instructions="You are a financial auditor. Identify potential causes for the discrepancy and recommend investigation steps.",
    temperature=0.0,
    max_tokens=2000,
    task_label="financial-audit",
    judge_prompt="You are comparing two financial audit analyses for the same discrepancy. Is response A's identification of root causes and investigation plan at least as thorough and actionable as response B's? Focus on completeness of potential causes and clarity of next steps."
)
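
The judge prompts in these examples all ask a pairwise question: is response A at least as good as response B? As a rough sketch of how such a comparison could be assembled and scored, the helpers below build the judge input, parse a yes/no verdict, and aggregate verdicts into a win rate. The helper names, the input layout, and the trailing "Answer YES or NO." instruction are assumptions; the package's real judging pipeline is not documented here.

```python
from typing import List


def build_judge_input(judge_prompt: str, task_input: str,
                      response_a: str, response_b: str) -> str:
    """Assemble a single text block for a judge model (hypothetical layout)."""
    return (
        f"{judge_prompt}\n\n"
        f"Input:\n{task_input}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Answer YES or NO."
    )


def parse_verdict(judge_output: str) -> bool:
    """True if the judge says A is at least as good as B."""
    return judge_output.strip().upper().startswith("YES")


def win_rate(verdicts: List[bool]) -> float:
    """Fraction of comparisons in which response A held up against B."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


judge_text = build_judge_input(
    "Is response A's investigation plan at least as thorough as response B's?",
    "Contract shows $2M annual spend but accounting shows $2.1M.",
    "Check for unrecorded change orders and duplicate invoices.",
    "Ask the vendor.",
)
verdicts = [parse_verdict(v) for v in ["YES", "yes, A covers more causes", "No", "YES"]]
print(win_rate(verdicts))
```

A win rate like this is one plausible signal for deciding whether a candidate model is good enough to take over a task.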

Dashboard Configuration

Access the Maniac dashboard to configure:

  • Cost preferences: Set budget limits and cost-per-token thresholds
  • Latency targets: Specify response time requirements
  • Model preferences: Define fallback hierarchies and quality trade-offs
  • Container policies: Configure joint prompt + LoRA optimization schedules and resource limits

Advanced Features

Task Labeling

Group related inferences for coordinated prompt and LoRA optimization across all models:

import concurrent.futures

task_id = "customer-support-analysis"

def process_ticket(ticket_data):
    return client.responses.create(
        model="claude-opus-4",
        input=ticket_data["customer_message"],
        instructions="You are a customer support analyst. Categorize the issue, assess urgency (Low/Medium/High), and provide resolution steps.",
        temperature=0.0,
        max_tokens=1024,
        task_label=task_id,
        judge_prompt="You are comparing two customer support analyses for the same ticket. Is response A's categorization, urgency assessment, and resolution plan at least as accurate and helpful as response B's? Focus on accuracy of issue identification and practicality of solutions."
    )

# Process support tickets concurrently with a shared task_label.
# support_tickets is assumed to be a list of dicts, each with a "customer_message" key.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(process_ticket, support_tickets))

Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this 10-K filing for competitive risks and revenue projections..."}],
    stream=True,
    task_label="financial-analysis"
)

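for chunk in stream:
    # In OpenAI-style streams delta.content can be None on some chunks
    # (assumed to apply here as well), so skip empty deltas.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")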
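
If you need the full text rather than incremental printing, the chunks can be accumulated into one string. The sketch below assumes the stream yields OpenAI-style chunk objects, as the loop above suggests. It runs against fake chunks built with SimpleNamespace so it can be tried without network access; collect_stream and fake_chunk are illustrative helpers, not part of the package.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Join the text deltas of an OpenAI-style chunk stream into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some streams emit chunks that carry no choices
        content = chunk.choices[0].delta.content
        if content:  # skip None or empty deltas
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the chunk.choices[0].delta.content shape."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [fake_chunk("Revenue "), fake_chunk(None), fake_chunk("grew.")]
print(collect_stream(fake_stream))
```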

Parameter Reference

responses.create() parameters:

  • model: Model name (e.g., "claude-opus-4", "gpt-4o")
  • input: Business document, customer inquiry, or data to analyze
  • instructions: Domain-specific role and task definition (e.g., "financial auditor", "compliance officer")
  • temperature: Randomness (0.0 for consistent analysis, higher for creative tasks)
  • max_tokens: Response length limit (1024 for summaries, 4096 for detailed analysis)
  • task_label: Groups related business processes for unified optimization
  • judge_prompt: Quality standards for business-critical decisions

stream_create() parameters:

  • task_label: Task identifier for grouping
  • system_prompt: System instructions
  • user_prompt: User input
  • output: Pre-generated response content
  • judge_prompt: Evaluation criteria
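
When back-filling many pre-generated outputs, it can help to validate each record before passing it to stream_create(). The helper below is a small sketch of that step. The record layout and the helper name are hypothetical, and treating judge_prompt as optional is an assumption rather than documented behavior.

```python
REQUIRED_FIELDS = ("task_label", "system_prompt", "user_prompt", "output")


def to_stream_create_kwargs(record: dict, judge_prompt: str = None) -> dict:
    """Turn a logged record into keyword arguments for stream_create().

    Raises ValueError if any of the fields listed above is missing or empty.
    """
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    kwargs = {name: record[name] for name in REQUIRED_FIELDS}
    if judge_prompt is not None:
        kwargs["judge_prompt"] = judge_prompt
    return kwargs


record = {
    "task_label": "document-processing",
    "system_prompt": "You are a legal document analyst.",
    "user_prompt": "Extract key terms from this vendor agreement...",
    "output": "Key terms: Payment net 30, liability cap $1M...",
}
kwargs = to_stream_create_kwargs(record, judge_prompt="Is response A at least as complete as B?")
print(sorted(kwargs))
# With a configured client, the call would then be:
# client.chat.completions.stream_create(**kwargs)
```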

Enterprise Benefits

Cost Management & Performance Reliability

  • Automatic cost optimization: Containers switch between models based on budget constraints while maintaining quality standards
  • Performance guarantees: Pre-optimized prompts and LoRA weights ensure consistent output quality regardless of model availability
  • Vendor risk mitigation: Single API maintains operations even when specific model providers experience outages or policy changes

Rapid Model Adoption

  • Zero-downtime model transitions: New models automatically receive optimized prompts and fine-tuning from existing container data
  • Quality-assured deployment: Judge prompts ensure new models meet established performance benchmarks before production use
  • Seamless scaling: Containers handle traffic spikes by intelligently distributing across available models based on latency and cost requirements

Operational Excellence

  • Centralized monitoring: Dashboard provides unified visibility across all models, tasks, and performance metrics
  • Compliance-ready logging: Complete audit trail of all inferences, optimizations, and model selections
  • Enterprise-grade reliability: Built-in fallback mechanisms and automatic retry logic ensure business continuity
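
The fallback and retry behavior described above is built into the service, but the general pattern is easy to illustrate. The sketch below tries each model in a preference list, retries with exponential backoff, and moves on to the next model when retries are exhausted. It is a generic client-side illustration with hypothetical names, not Maniac's implementation.

```python
import time


def call_with_fallback(call, models, retries_per_model=2, base_delay=0.0):
    """Try call(model) for each model in order, retrying before falling back.

    Returns (model, result) from the first successful attempt and raises
    RuntimeError if every model fails.
    """
    last_error = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return model, call(model)
            except Exception as exc:  # broad catch keeps the sketch short
                last_error = exc
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all models failed") from last_error


attempts = []


def flaky_call(model):
    """Stand-in for an inference call: the primary model is 'down'."""
    attempts.append(model)
    if model == "primary-model":
        raise ConnectionError("provider outage")
    return f"ok from {model}"


used_model, result = call_with_fallback(flaky_call, ["primary-model", "backup-model"])
print(used_model, result, attempts)
```

In real use, call would wrap an actual inference request, and base_delay would be set above zero so that retries do not hammer a struggling provider.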

Best Practices

  • Use task labels to group related inferences for coordinated prompt + LoRA optimization across the entire model ecosystem
  • Specify judge prompts to guide quality-aware model selection and optimization direction
  • Set appropriate temperature values (0.0 for deterministic tasks, higher for creative tasks)
  • Configure fallback models in the dashboard; containers automatically maintain optimized prompts and LoRA weights for each fallback
  • Monitor container metrics to track both prompt engineering and fine-tuning performance across models

Support

For issues and feature requests, visit the Maniac documentation portal or contact support@maniac.ai.
