Unified platform for self-hosted LLM inference + enterprise safety governance
Project description
TurboPrivate AI — Private & Safe Enterprise AI Platform
Run powerful LLMs on your own hardware — 40–60% cheaper than public clouds, with built-in enterprise safety & governance.
Why TurboPrivate AI?
- Full data sovereignty — nothing leaves your infrastructure
- Dramatic cost reduction — INT4/AWQ quantization + smart routing
- Enterprise Safety — powered by Mythos Safe (defensive evaluation, jailbreak protection, audit)
- OpenAI compatible — drop-in replacement for your existing applications
- One-command deploy — from bare metal to production in minutes
Key Features
- TurboQuant Engine — State-of-the-art INT4/AWQ quantization with minimal quality loss
- Mythos Safe — Multi-layer defensive safety (pre & post-flight gates)
- Private RAG — Secure document ingestion and retrieval
- Full-stack observability — Prometheus, Grafana, OpenTelemetry
- Enterprise ready — RBAC, audit trail, multi-tenancy, compliance support
- Hardware flexibility — RTX 4090, A100/H100, or even CPU-only
Performance (RTX 4090)
| Model | Quant | Tokens/sec | VRAM Usage | Cost vs Groq/AWS |
|---|---|---|---|---|
| Llama 3.1 8B | INT4 | 110+ | ~5.8 GB | ~8x cheaper |
| Qwen2.5 32B | INT4 | 45+ | ~22 GB | ~6x cheaper |
| Llama 3.1 70B | INT4 | 18+ | ~48 GB | ~5x cheaper |
Quick Start
# 1. Deploy full stack (K8s)
turbo deploy --provider bare-metal --gpu auto
# 2. Serve model
turbo model serve meta-llama/Llama-3.1-8B --quant int4
# 3. Chat
turbo chat
Or use Docker Compose for quick testing:
docker compose up -d # dev
# docker compose -f docker-compose.prod.yml up -d # production (GPU)
Pricing
| Tier | Price | Best For | Includes |
|---|---|---|---|
| PoC / Pilot | €15,000 – €35,000 | 4–8 weeks trial | Deployment, 2 models, training, support |
| Enterprise License | €65,000 / year | Single cluster, up to 10 users | Full features, unlimited models, SLA 99.5% |
| Enterprise Plus | €120,000 – €180,000 / year | Multiple clusters, 50+ users | Priority support, custom verifiers, SOC2 |
| Managed Service | €8,000 – €25,000 / month | No ops team | Fully managed by us |
Volume discounts available for 3+ clusters.
All prices exclude hardware.
Interested in a private demo?
📅 Book a 30-min PoC Call | ✉️ Contact Sales
Architecture
CLI / SDK / Dashboard
↓
API Gateway (FastAPI · Auth · Rate Limiting)
↓
┌─────────────────┐ ┌───────────────────┐
│ Mythos Safe │ │ TurboQuant INT4 │
│ Verifiers · │ │ vLLM/llama.cpp │
│ Audit Trail │ │ Inference Engine │
└─────────────────┘ └───────────────────┘
↓
Memory & RAG (TurboMemory · pdf2struct)
↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ K3s │ │Monitoring│ │ Storage │
│ Cluster │ │Prom/Graf │ │ PG/Redis │
└──────────┘ └──────────┘ └──────────┘
Demo
Documentation
- Architecture — Full system design
- Deployment — Production deployment guide
- CLI Reference — All CLI commands
- API Reference — FastAPI routes
- Safety Gate — Verifier configuration
- Demo Assets — GIF recording tape + deploy script
Changelog
0.1.4 (2026-05-13)
- Production-hardened Helm charts (configmap, ingress, services templates)
- Enhanced rate limiter with token bucket algorithm + per-route limits
- Improved safety gate middleware with pre/post-flight hook chain
- Realtime metrics visualization in dashboard endpoint
- TurboQuant v3 quantization pipeline: AWQ + INT4 mixed-precision
- Backup/restore CLI with age-encrypted snapshots
- K3s provisioner with multi-node discovery + node labels
- vLLM backend: speculative decoding toggle + prefix caching
- llama.cpp backend: flash attention + GPU offloading
- Worker refinements: quantize retry, eval timeout, ingestion dedup
- CLI enhancements: model status, deploy progress, backup summary
- PII detector regex expansion (passport, SSN, phone variants)
- Vulnerability verifier: CVE-2025 scoring + dependency jail status
- PDF/image ingestion with OCR fallback in RAG pipeline
0.1.3 (2026-05-13)
- Extended demo GIF to 61s with 5-scene animation (intro, deploy, serve+chat, safety block, dashboard)
- Switched README GIF to absolute GitHub raw URL for PyPI rendering
0.1.2 (2026-05-11)
- Enterprise-ready README with pricing table and benchmarks
- Added docs/ARCHITECTURE.md with system design diagrams
- Added docs/DEPLOYMENT.md with production deployment guide
- Added examples/ with HTTP, safety, RAG, and quantization samples
- Added .env.example with all configuration options
- Added benchmarks/ with RTX 4090 performance results
- Switched license from MIT to Apache 2.0
- Added
turbo doctorCLI command for system health checks - Added GitHub Actions Docker build workflow
- Updated pyproject.toml with
fullinstall extra
0.1.1 (2026-05-11)
- Migrated to hatchling build system
- Fixed missing
InferenceEngineimport inturbo.inference - Fixed
TracerProviderbug in OpenTelemetry instrumentation - Added structured logging to all exception handlers
- Consolidated Celery workers into shared
worker.celery_app - Added CI workflow with ruff linting + pytest
- Improved graceful shutdown (audit trail flush)
- Updated dependencies (replaced
unstructuredwith actual used libs)
License
Apache 2.0 — see LICENSE.
Built by Kubenew — ex-HPE engineer, 12+ years enterprise infrastructure
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file turboprivate_ai-0.1.4.tar.gz.
File metadata
- Download URL: turboprivate_ai-0.1.4.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8dfae79cd9780dd94dee526c84b7ad8dad0eaa8356b6b698e0cb51664cc28e94
|
|
| MD5 |
11e6123e07d0d7f51b7df17de33b7bf8
|
|
| BLAKE2b-256 |
a36f644ac977f1ecb617a78925d4a7190ccd59a945d81db85d56dbc55cfbb341
|
File details
Details for the file turboprivate_ai-0.1.4-py3-none-any.whl.
File metadata
- Download URL: turboprivate_ai-0.1.4-py3-none-any.whl
- Upload date:
- Size: 55.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2e73ed3181822c09626496910677c3123833318fd45dcafc1bb738e31bafb2d8
|
|
| MD5 |
34f413820452040b325d0d6dfd921154
|
|
| BLAKE2b-256 |
fd463a3c6220c0304bec3251149026c49b9cdfa9e78c5d8be8d7706b12df9370
|