# Ghost Compute

**Intelligent Serverless Orchestration for Data Platforms**
Ghost eliminates the estimated $44.5B wasted annually on idle clusters and cold-start latency in enterprise data infrastructure. It is a drop-in optimization layer for Databricks, EMR, Synapse, and Spark workloads.
## The Problem

Enterprises face an impossible tradeoff:
| Option | Problem |
|---|---|
| Keep clusters warm | Pay for 24/7 idle compute (~30% waste) |
| Cold start on-demand | 5-35 minute startup delays, missed SLAs |
| Vendor serverless | Premium pricing, vendor lock-in, limited control |
Ghost solves this by providing intelligent cluster orchestration that delivers sub-second start times while eliminating idle waste.
## Key Features

### Ghost Predict
ML-powered predictive provisioning that pre-warms resources before you need them.

### Ghost Hibernate
State preservation that snapshots clusters to object storage for instant resume.

### Ghost Pool
Cross-workload resource sharing that maximizes utilization across teams.

### Ghost Spot
Autonomous spot/preemptible instance management with graceful failover.

### Ghost Insight
Real-time cost attribution and optimization recommendations.
## Quick Start

### Installation

One command installs Ghost Compute with support for all platforms:

```bash
pip install ghost-compute
```

This single install includes support for:

- Databricks (Azure, AWS, GCP)
- Amazon EMR
- Azure Synapse Analytics
- Google Cloud Dataproc

Or install from source:

```bash
git clone https://github.com/ghost-ai-dev/ghost-compute.git
cd ghost-compute
pip install -e .
```
### Basic Usage

```python
from ghost import GhostClient

# Initialize Ghost
ghost = GhostClient(
    platform="databricks",
    credentials_path="~/.ghost/credentials.json",
)

# Enable intelligent orchestration
ghost.optimize(
    workspace_id="your-workspace",
    strategies=["predict", "hibernate", "spot"],
    target_savings=0.40,  # 40% cost reduction target
)

# Monitor savings
stats = ghost.get_stats()
print(f"Monthly savings: ${stats.savings_usd:,.2f}")
print(f"Cold starts eliminated: {stats.cold_starts_prevented}")
```
### CLI Usage

```bash
# View supported platforms
ghost platforms

# Connect to your platform
# Databricks
ghost connect databricks --workspace-url https://xxx.cloud.databricks.com --token YOUR_TOKEN

# AWS EMR
ghost connect emr --profile default --region us-east-1

# Azure Synapse
ghost connect synapse --subscription-id YOUR_SUB_ID --resource-group YOUR_RG

# Google Dataproc
ghost connect dataproc --project-id YOUR_PROJECT --region us-central1

# Analyze current waste
ghost analyze --output report.json

# Enable optimization
ghost optimize --strategies predict,hibernate,spot

# List clusters
ghost clusters

# View optimization insights
ghost insights
```
## Supported Platforms

All platforms are included in the single `pip install ghost-compute` package.

| Platform | Status | Features |
|---|---|---|
| Databricks | GA | Predict, Hibernate, Pool, Spot, Insight |
| Amazon EMR | GA | Predict, Hibernate\*, Spot, Pool, Insight |
| Azure Synapse | GA | Predict, Hibernate (auto-pause), Pool, Insight |
| Google Dataproc | GA | Predict, Hibernate\*, Preemptible VMs, Pool, Insight |
| Cloudera CDP | Alpha | Insight only (coming soon) |
| Self-managed Spark | Alpha | Pool, Spot (coming soon) |

\*EMR and Dataproc hibernation works via cluster termination with state preservation for fast recreation.
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      YOUR APPLICATION                       │
│          (Databricks / EMR / Synapse / Dataproc)            │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                         GHOST LAYER                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Predict   │  │  Hibernate  │  │        Spot         │  │
│  │  Scheduler  │  │   Manager   │  │    Orchestrator     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Pool     │  │   Insight   │  │     Multi-Cloud     │  │
│  │   Manager   │  │   Engine    │  │     Abstraction     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    CLOUD INFRASTRUCTURE                     │
│                (AWS / Azure / GCP / On-Prem)                │
└─────────────────────────────────────────────────────────────┘
```
## How It Works

### 1. Predictive Provisioning

Ghost analyzes your workload patterns to predict when clusters will be needed:

```
# Ghost learns from historical patterns:
# - Scheduled jobs (cron patterns)
# - User activity (login times, query patterns)
# - Data arrival (streaming triggers)
# - Seasonal trends (end of month, quarterly)
#
# Pre-warms clusters 30-60 seconds before they are needed.
# Result: sub-second perceived start time.
```
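The prediction step can be sketched in plain Python. This is an illustrative model only, not Ghost's actual scheduler: it assumes a job recurs at roughly its average historical interval and schedules warm-up one lead time ahead.

```python
from datetime import datetime, timedelta

PREWARM_LEAD = timedelta(seconds=60)  # warm up 60s ahead (assumed default)

def predict_next_start(history: list[datetime]) -> datetime:
    """Naive predictor: assume the job repeats at its average interval."""
    if len(history) < 2:
        raise ValueError("need at least two past runs")
    intervals = [b - a for a, b in zip(history, history[1:])]
    avg = sum(intervals, timedelta()) / len(intervals)
    return history[-1] + avg

def prewarm_time(history: list[datetime]) -> datetime:
    """When to start warming the cluster so it is ready on time."""
    return predict_next_start(history) - PREWARM_LEAD

runs = [datetime(2024, 1, d, 9, 0) for d in (1, 2, 3)]  # daily 9:00 job
print(prewarm_time(runs))  # 2024-01-04 08:59:00
```

A production predictor would weigh multiple signals (cron schedules, user activity, data arrival) rather than a single average interval.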
### 2. State Hibernation

Instead of terminating clusters, Ghost preserves their state:

```
# Traditional approach:
#   Terminate → Cold start (5-35 min) → Re-initialize
#
# Ghost approach:
#   Hibernate → Snapshot to S3 → Resume in <5 sec
```
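A minimal sketch of the hibernate/resume idea, with a local temporary directory standing in for the S3 snapshot store (the `hibernate` and `resume` helpers here are hypothetical, not Ghost's API):

```python
import json
import tempfile
from pathlib import Path

def hibernate(cluster_state: dict, store: Path) -> Path:
    """Snapshot cluster metadata so an equivalent cluster can be recreated."""
    path = store / f"{cluster_state['cluster_id']}.json"
    path.write_text(json.dumps(cluster_state))
    return path

def resume(snapshot: Path) -> dict:
    """Restore the saved state; provisioning from it skips re-initialization."""
    return json.loads(snapshot.read_text())

store = Path(tempfile.mkdtemp())  # in production: an object-storage bucket
state = {"cluster_id": "etl-42", "node_type": "m5.xlarge", "num_workers": 8}
snap = hibernate(state, store)
assert resume(snap) == state
```

The real system would also capture runtime state (caches, library installs, Spark context), which is what makes resume so much faster than a cold start.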
### 3. Intelligent Pooling

Share warm resources across workloads:

```
# Team A finishes a job at 2:00 PM
# Team B starts a job at 2:05 PM
# Ghost transfers the warm instances → zero cold start for Team B
```
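The pooling behavior can be modeled as a warm-instance queue; this toy `WarmPool` class is illustrative only, not Ghost's implementation:

```python
from collections import deque

class WarmPool:
    """Toy model of cross-team instance pooling."""

    def __init__(self) -> None:
        self._idle: deque[str] = deque()  # instances released but still warm

    def release(self, instance_id: str) -> None:
        """A team's job finished; keep its instance warm for the next user."""
        self._idle.append(instance_id)

    def acquire(self) -> str:
        """Hand out a warm instance if one exists, else cold-start a new one."""
        if self._idle:
            return self._idle.popleft()   # reuse: no cold start
        return "cold-start-new-instance"  # pool empty: provision from scratch

pool = WarmPool()
pool.release("i-teamA-01")  # Team A's job finishes at 2:00 PM
print(pool.acquire())       # Team B at 2:05 PM, prints: i-teamA-01
```

In practice the pool would also enforce an idle-instance cap (see `max_idle_instances` in the configuration below) and expire instances that stay idle too long.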
### 4. Spot Orchestration

Maximize savings with automatic spot management:

```
# Ghost automatically:
# - Uses spot instances for interruptible workloads
# - Monitors interruption signals
# - Checkpoints state before termination
# - Fails over to on-demand gracefully
```
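The failover path can be sketched as checkpoint-then-reprovision. `checkpoint_fn` and `provision_fn` below stand in for platform-specific operations; both names are hypothetical:

```python
def handle_interruption(job_state: dict, checkpoint_fn, provision_fn) -> dict:
    """On a spot interruption notice: checkpoint, then fail over to on-demand."""
    checkpoint = checkpoint_fn(job_state)         # save progress before eviction
    new_job = provision_fn(capacity="on_demand")  # graceful fallback
    new_job["resume_from"] = checkpoint
    return new_job

saved: dict = {}
job = handle_interruption(
    {"job_id": "nightly-etl", "progress": 0.7},
    checkpoint_fn=lambda s: saved.setdefault("ckpt", dict(s)),
    provision_fn=lambda capacity: {"capacity": capacity},
)
print(job["capacity"])  # on_demand
```

Cloud providers give only a short warning before reclaiming spot capacity (two minutes on AWS), which is why the config below reserves an `interruption_buffer_seconds` window for checkpointing.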
## Configuration

### Environment Variables

```bash
GHOST_API_KEY=your-api-key
GHOST_PLATFORM=databricks
GHOST_WORKSPACE_URL=https://xxx.cloud.databricks.com
GHOST_LOG_LEVEL=INFO
```
### Configuration File

```yaml
# ghost.yaml
platform: databricks
workspace_url: https://xxx.cloud.databricks.com

strategies:
  predict:
    enabled: true
    lookahead_minutes: 60
    confidence_threshold: 0.8
  hibernate:
    enabled: true
    idle_timeout_minutes: 10
    storage_backend: s3
    storage_bucket: ghost-hibernate-states
  spot:
    enabled: true
    max_spot_percentage: 80
    fallback_to_ondemand: true
    interruption_buffer_seconds: 120
  pool:
    enabled: true
    cross_team_sharing: true
    max_idle_instances: 10
  insight:
    enabled: true
    cost_alerts: true
    alert_threshold_usd: 1000

exclusions:
  - cluster_name: "production-critical-*"
  - tag: "ghost:exclude"
```
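If you want to sanity-check such a config before applying it, a small checker might look like this. This is a sketch: `validate_config` is not part of Ghost's API, and the rules shown (a required platform, percentage and threshold ranges) are illustrative assumptions.

```python
def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a Ghost-style config dict."""
    errors = []
    if "platform" not in cfg:
        errors.append("missing required key: platform")
    spot = cfg.get("strategies", {}).get("spot", {})
    pct = spot.get("max_spot_percentage", 0)
    if not 0 <= pct <= 100:
        errors.append(f"max_spot_percentage out of range: {pct}")
    predict = cfg.get("strategies", {}).get("predict", {})
    thr = predict.get("confidence_threshold", 0.8)
    if not 0.0 < thr <= 1.0:
        errors.append(f"confidence_threshold out of range: {thr}")
    return errors

cfg = {
    "platform": "databricks",
    "strategies": {
        "spot": {"max_spot_percentage": 80},
        "predict": {"confidence_threshold": 0.8},
    },
}
print(validate_config(cfg))  # []
```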
## Pricing

Ghost operates on a savings-share model:
| Tier | Monthly Compute Spend | Ghost Fee |
|---|---|---|
| Starter | < $50K | 25% of savings |
| Growth | $50K - $250K | 20% of savings |
| Enterprise | > $250K | Custom |
No savings = No payment. We only charge when we deliver results.
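As a worked example of the savings-share model (assuming the fee rate is picked by the monthly-spend tier, with tier boundaries as in the table above):

```python
def ghost_fee(monthly_spend: float, savings: float) -> float:
    """Fee under the savings-share tiers; Enterprise pricing is custom."""
    if monthly_spend > 250_000:
        raise ValueError("Enterprise tier: contact sales for custom pricing")
    rate = 0.25 if monthly_spend < 50_000 else 0.20  # Starter vs Growth
    return rate * savings

# Growth tier: $100K monthly spend, $42K saved -> fee is 20% of savings
print(ghost_fee(100_000, 42_000))  # 8400.0
```

With zero savings the fee is zero, matching the "no savings = no payment" guarantee.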
## Benchmarks
| Metric | Before Ghost | After Ghost | Improvement |
|---|---|---|---|
| Average cold start | 8.5 min | 0.8 sec | 99.8% faster |
| Idle compute waste | 32% | 4% | 87% reduction |
| Monthly spend ($100K baseline) | $100,000 | $58,000 | 42% savings |
| SLA misses (5-min threshold) | 23/month | 0/month | 100% eliminated |
## Documentation
- Getting Started Guide
- Platform Integration
- API Reference
- Configuration Options
- Best Practices
- Troubleshooting
## Examples
- Databricks Notebook Integration
- EMR Step Function Integration
- Airflow DAG with Ghost
- Terraform Module
- Kubernetes Operator
## Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

```bash
# Development setup
git clone https://github.com/ghost-ai-dev/ghost-compute.git
cd ghost-compute
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
pytest
```
## Security
- SOC 2 Type II certified
- No data leaves your cloud environment
- All state stored in your own S3/Blob/GCS buckets
- Role-based access control
- Audit logging
Report security issues to security@ghost-compute.io
## License
Apache License 2.0 - see LICENSE
## Support

- Email: support@ghost-compute.io
- Slack: ghost-compute.slack.com
- Docs: docs.ghost-compute.io
- Issues: GitHub Issues
Built by Ghost AI | Eliminating waste in enterprise data infrastructure