Skip to main content

Unified platform for self-hosted LLM inference + enterprise safety governance

Project description

TurboPrivate AI — Private & Safe Enterprise AI Platform

PyPI version Python versions CI status Downloads License Stars

Run powerful LLMs on your own hardware — 40–60% cheaper than public clouds, with built-in enterprise safety & governance.


Why TurboPrivate AI?

  • Full data sovereignty — nothing leaves your infrastructure
  • Dramatic cost reduction — INT4/AWQ quantization + smart routing
  • Enterprise Safety — powered by Mythos Safe (defensive evaluation, jailbreak protection, audit)
  • OpenAI compatible — drop-in replacement for your existing applications
  • One-command deploy — from bare metal to production in minutes

Key Features

  • TurboQuant Engine — State-of-the-art INT4/AWQ quantization with minimal quality loss
  • Mythos Safe — Multi-layer defensive safety (pre & post-flight gates)
  • Private RAG — Secure document ingestion and retrieval
  • Full-stack observability — Prometheus, Grafana, OpenTelemetry
  • Enterprise ready — RBAC, audit trail, multi-tenancy, compliance support
  • Hardware flexibility — RTX 4090, A100/H100, or even CPU-only

Performance (RTX 4090)

Model Quant Tokens/sec VRAM Usage Cost vs Groq/AWS
Llama 3.1 8B INT4 110+ ~5.8 GB ~8x cheaper
Qwen2.5 32B INT4 45+ ~22 GB ~6x cheaper
Llama 3.1 70B INT4 18+ ~48 GB ~5x cheaper

Quick Start

# 1. Deploy full stack (K8s)
turbo deploy --provider bare-metal --gpu auto

# 2. Serve model
turbo model serve meta-llama/Llama-3.1-8B --quant int4

# 3. Chat
turbo chat

Or use Docker Compose for quick testing:

docker compose up -d                    # dev
# docker compose -f docker-compose.prod.yml up -d  # production (GPU)

Pricing

Tier Price Best For Includes
PoC / Pilot €15,000 – €35,000 4–8 weeks trial Deployment, 2 models, training, support
Enterprise License €65,000 / year Single cluster, up to 10 users Full features, unlimited models, SLA 99.5%
Enterprise Plus €120,000 – €180,000 / year Multiple clusters, 50+ users Priority support, custom verifiers, SOC2
Managed Service €8,000 – €25,000 / month No ops team Fully managed by us

Volume discounts available for 3+ clusters.
All prices exclude hardware.

Interested in a private demo?
📅 Book a 30-min PoC Call | ✉️ Contact Sales

Architecture

CLI / SDK / Dashboard
        ↓
   API Gateway (FastAPI · Auth · Rate Limiting)
        ↓
┌─────────────────┐  ┌───────────────────┐
│  Mythos Safe    │  │  TurboQuant INT4  │
│  Verifiers ·    │  │  vLLM/llama.cpp   │
│  Audit Trail    │  │  Inference Engine │
└─────────────────┘  └───────────────────┘
        ↓
   Memory & RAG (TurboMemory · pdf2struct)
        ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│  K3s     │ │Monitoring│ │ Storage  │
│  Cluster │ │Prom/Graf │ │ PG/Redis │
└──────────┘ └──────────┘ └──────────┘

Demo

TurboPrivate AI deployment demo

Documentation

Integrations

Changelog

0.1.6 (2026-05-16)

  • SAP HANA integration guide: cost calculator, security checklist, BYOM in AI Core, Med/Fintech compliance
  • Enterprise hardening best practices for self-hosted LLM + vector database deployments

0.1.5 (2026-05-16)

  • SAP HANA vector store integration example (LangChain + HanaDB + TurboPrivate AI RAG)
  • FastAPI RAG endpoint with similarity search + LLM generation
  • Document ingestion script with PDF/text support + HNSW index creation

0.1.4 (2026-05-13)

  • Production-hardened Helm charts (configmap, ingress, services templates)
  • Enhanced rate limiter with token bucket algorithm + per-route limits
  • Improved safety gate middleware with pre/post-flight hook chain
  • Realtime metrics visualization in dashboard endpoint
  • TurboQuant v3 quantization pipeline: AWQ + INT4 mixed-precision
  • Backup/restore CLI with age-encrypted snapshots
  • K3s provisioner with multi-node discovery + node labels
  • vLLM backend: speculative decoding toggle + prefix caching
  • llama.cpp backend: flash attention + GPU offloading
  • Worker refinements: quantize retry, eval timeout, ingestion dedup
  • CLI enhancements: model status, deploy progress, backup summary
  • PII detector regex expansion (passport, SSN, phone variants)
  • Vulnerability verifier: CVE-2025 scoring + dependency jail status
  • PDF/image ingestion with OCR fallback in RAG pipeline

0.1.3 (2026-05-13)

  • Extended demo GIF to 61s with 5-scene animation (intro, deploy, serve+chat, safety block, dashboard)
  • Switched README GIF to absolute GitHub raw URL for PyPI rendering

0.1.2 (2026-05-11)

  • Enterprise-ready README with pricing table and benchmarks
  • Added docs/ARCHITECTURE.md with system design diagrams
  • Added docs/DEPLOYMENT.md with production deployment guide
  • Added examples/ with HTTP, safety, RAG, and quantization samples
  • Added .env.example with all configuration options
  • Added benchmarks/ with RTX 4090 performance results
  • Switched license from MIT to Apache 2.0
  • Added turbo doctor CLI command for system health checks
  • Added GitHub Actions Docker build workflow
  • Updated pyproject.toml with full install extra

0.1.1 (2026-05-11)

  • Migrated to hatchling build system
  • Fixed missing InferenceEngine import in turbo.inference
  • Fixed TracerProvider bug in OpenTelemetry instrumentation
  • Added structured logging to all exception handlers
  • Consolidated Celery workers into shared worker.celery_app
  • Added CI workflow with ruff linting + pytest
  • Improved graceful shutdown (audit trail flush)
  • Updated dependencies (replaced unstructured with actual used libs)

License

Apache 2.0 — see LICENSE.


Built by Kubenew — ex-HPE engineer, 12+ years enterprise infrastructure

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

turboprivate_ai-0.1.6.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

turboprivate_ai-0.1.6-py3-none-any.whl (55.6 kB view details)

Uploaded Python 3

File details

Details for the file turboprivate_ai-0.1.6.tar.gz.

File metadata

  • Download URL: turboprivate_ai-0.1.6.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for turboprivate_ai-0.1.6.tar.gz
Algorithm Hash digest
SHA256 e1d598380088acd50eb26a465b759abec0c8c45e8c11401df9afa31a1a72da10
MD5 f5efb0087e0f179d7ab1aaa6b7c7db05
BLAKE2b-256 38c6a9b83d469b74e288cf6855b210861c96a242b4a34cf3bf6f3e00e1e2aa0e

See more details on using hashes here.

File details

Details for the file turboprivate_ai-0.1.6-py3-none-any.whl.

File metadata

File hashes

Hashes for turboprivate_ai-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 8769765b81c15d80cf08270cd3ef1653e0aa2c4dce084df3de119c0387cb1ef5
MD5 dc31170ab8dd876cab3db6d7e1e4bc34
BLAKE2b-256 48d4b88169f1f088b1eb4e4a8d21454891e2c65c849e8953c31c449945091d94

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page