Skip to main content

Unified platform for self-hosted LLM inference + enterprise safety governance

Project description

TurboPrivate AI — Self-Hosted Enterprise AI Platform

Switch from OpenAI in 30 seconds. Drop-in compatible API with built-in safety, governance, and 40–60% cost reduction.

PyPI Python CI Downloads License Security

Run powerful LLMs on your own hardware — with enterprise safety, governance, and full data sovereignty.


Quick Start

One-Click Install

curl -fsSL https://get.turboprivate.ai | bash

Or via pip

pip install turboprivate-ai
turbo deploy --provider bare-metal --gpu auto
turbo model serve meta-llama/Llama-3.1-8B --quant int4
turbo chat

Docker Compose (Hardware-Aware)

git clone https://github.com/Kubenew/turboprivate-ai.git
cd turboprivate-ai

# Auto-detects GPU / Apple Silicon / CPU
curl -fsSL https://get.turboprivate.ai | bash

# Or manually:
docker compose -f docker-compose.gpu.yml up -d    # NVIDIA GPU
docker compose -f docker-compose.mac.yml up -d     # Apple Silicon
docker compose -f docker-compose.cpu.yml up -d     # CPU fallback

Why TurboPrivate AI?

Feature TurboPrivate AI Ollama vLLM OpenAI API
Data Sovereignty ✅ Full ✅ Full ✅ Full ❌ Cloud
Enterprise Safety ✅ Mythos Safe (7 verifiers) ❌ None ❌ None ⚠️ Basic
OpenAI Compatible ✅ 100% ✅ Partial ✅ Partial ✅ Native
INT4/AWQ Quantization ✅ TurboQuant v3 ✅ GGUF ✅ AWQ N/A
RAG Pipeline ✅ Built-in ❌ External External ❌ External
Audit Trail ✅ Immutable JSONL ❌ None ❌ None ⚠️ Limited
RBAC / Multi-tenant ✅ Enterprise ❌ None ❌ None ✅ Enterprise
Kubernetes Native ✅ Helm + K3s ❌ Manual ⚠️ Manual N/A
Cost (RTX 4090) ~8x cheaper Free Free $5-10/M tokens

🏢 For Enterprises

TurboPrivate AI is the Enterprise On-Premise AI Gateway — a secure, compliant orchestration layer between your corporate data and open-source models.

What We Are

  • Secure Wrapper: Mythos Safe gate with 7 verifiers (injection, PII, toxicity, etc.)
  • OpenAI Parity: 100% compatible API — swap base_url and you're done
  • SAP HANA Native: Direct, secure RAG connector with SQL injection guard + RLS
  • Audit & Compliance: Immutable JSONL logs, GDPR/HIPAA/SOC 2 ready
  • Hardware Agnostic: GPU, Apple Silicon, or CPU — auto-optimized

What We Are Not

  • Model Training: We don't train models from scratch
  • Custom UI: We integrate Open WebUI / LibreChat instead of building our own
  • Vector DB: We connect to Qdrant, Milvus, pgvector — we don't replace them

See docs/ENTERPRISE.md for architecture details.

Security & Compliance

  • Full data sovereignty: Nothing leaves your infrastructure
  • Mythos Safe: 7-layer defense (injection, PII, toxicity, hallucination, etc.)
  • Audit trail: Immutable JSONL logs with SIEM integration
  • RBAC: Fine-grained access control with OIDC/SAML support
  • Compliance ready: GDPR, HIPAA, SOC 2, PCI-DSS, ISO 27001

See SECURITY.md and docs/COMPLIANCE.md for details.

Enterprise Integrations

  • SAP HANA: Vector store + RAG pipeline (Guide)
  • SAP AI Core: BYOM deployment support
  • Kubernetes: Helm charts, HPA, multi-cluster
  • Observability: Prometheus, Grafana, OpenTelemetry
  • Secrets: HashiCorp Vault, AWS Secrets Manager, K8s Secrets

Support & SLAs

Tier Response Includes
Community GitHub Issues OSS core, docs, community support
PoC / Pilot 48h 4-8 week trial, 2 models, training
Enterprise 4h SLA 99.5%, unlimited models, TAM
Enterprise Plus 1h Multi-cluster, custom verifiers, SOC2

📅 Book a 30-min PoC Call | ✉️ Contact Sales


📊 Performance (RTX 4090)

Model Quant Tokens/sec VRAM Cost vs Cloud
Llama 3.1 8B INT4 110+ ~5.8 GB ~8x cheaper
Qwen2.5 32B INT4 45+ ~22 GB ~6x cheaper
Llama 3.1 70B INT4 18+ ~48 GB ~5x cheaper

Independent benchmarks: benchmarks/


🛡️ Architecture

CLI / SDK / Dashboard
        ↓
   API Gateway (FastAPI · Auth · Rate Limiting)
        ↓
┌─────────────────┐  ┌───────────────────┐
│  Mythos Safe    │  │  TurboQuant INT4  │
│  Verifiers ·    │  │  vLLM/llama.cpp   │
│  Audit Trail    │  │  Inference Engine │
└─────────────────┘  └───────────────────┘
        ↓
   Memory & RAG (TurboMemory · pdf2struct)
        ↓
──────────┐ ┌──────────┐ ┌──────────┐
│  K3s     │ │Monitoring│ │ Storage  │
│  Cluster │ │Prom/Graf │ │ PG/Redis │
└────────── └──────────┘ └──────────┘

🎬 Demo

TurboPrivate AI deployment demo


Documentation


🔄 Changelog

0.1.9 (2026-05-17)

  • Optimized SAP HANA Secure Connector: pre-compiled regex for high-throughput PII masking
  • Air-gapped installer support: --offline flag + local compose file detection
  • CI/CD security pipeline: automated SQL injection blocking tests
  • Performance tuning: vectorized masking hints, reduced regex overhead

0.1.8 (2026-05-17)

  • SAP HANA Secure RAG Connector: SQL injection guard, RLS mapping, PII masking
  • Hardware-aware installer: auto-detects NVIDIA / Apple Silicon / CPU
  • Docker Compose profiles: gpu.yml, mac.yml, cpu.yml for optimal deployment
  • README overhaul: "What We Are / Are Not" transparency, Enterprise Gateway positioning
  • Modular architecture: decoupled inference backends, plug-and-play vector DBs

0.1.7 (2026-05-17)

  • SECURITY.md with threat model, hardening guide, SBOM, responsible disclosure
  • CONTRIBUTING.md with dev setup, testing, PR guidelines
  • Enterprise Deployment Guide: air-gapped, HA, secrets, proxy, hardware sizing
  • Compliance readiness: GDPR, HIPAA, SOC 2, PCI-DSS, ISO 27001, EU AI Act
  • One-click installer (install.sh) + docker-compose.full.yml with GPU passthrough
  • GitHub issue templates: bug report, feature request, security report
  • README overhaul: feature comparison table, "For Enterprises" section, badges

0.1.6 (2026-05-16)

  • SAP HANA integration guide: cost calculator, security checklist, BYOM, compliance
  • Enterprise hardening best practices
  • SECURITY.md and CONTRIBUTING.md added

0.1.5 (2026-05-16)

  • SAP HANA vector store integration (LangChain + HanaDB)
  • FastAPI RAG endpoint with similarity search
  • Document ingestion with PDF/text + HNSW index

0.1.4 (2026-05-13)

  • Production Helm charts (configmap, ingress, services)
  • TurboQuant v3: AWQ + INT4 mixed-precision
  • K3s provisioner with multi-node discovery
  • vLLM backend: speculative decoding + prefix caching

Full changelog →


📄 License

Apache 2.0 — see LICENSE.


Built by Kubenew — ex-HPE engineer, 12+ years enterprise infrastructure

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

turboprivate_ai-0.1.9.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

turboprivate_ai-0.1.9-py3-none-any.whl (58.1 kB view details)

Uploaded Python 3

File details

Details for the file turboprivate_ai-0.1.9.tar.gz.

File metadata

  • Download URL: turboprivate_ai-0.1.9.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for turboprivate_ai-0.1.9.tar.gz
Algorithm Hash digest
SHA256 63a26d96e3580dfbe54bca80579fef29f4a670cd37614e7dc19c0d128b74ab13
MD5 6f5e016705fd01a1d1b0b32e04388d6c
BLAKE2b-256 7d1ecc4f714c5cdfcdb709015399d1b5815d44f4e6c4006091e82b9377407a28

See more details on using hashes here.

File details

Details for the file turboprivate_ai-0.1.9-py3-none-any.whl.

File metadata

File hashes

Hashes for turboprivate_ai-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 5d9a8b5345539ef30a4bdbc700dcb7734121207e3d6c2a22bd7157ef744414d9
MD5 5002b8dc439ad559863f5d14159e445f
BLAKE2b-256 87cf59e9e048e44131972a533819fead50b298037d8060d2b72eba5a40704b66

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page