
Ghost Compute

Intelligent Serverless Orchestration for Data Platforms

Ghost eliminates the idle-cluster waste and cold start latency that cost enterprise data infrastructure an estimated $44.5B per year. Drop-in optimization for Databricks, EMR, Synapse, and Spark workloads.

License: Apache 2.0 | Python 3.9+ | Code style: black

The Problem

Enterprises face an impossible tradeoff:

| Option | Problem |
|---|---|
| Keep clusters warm | Pay for 24/7 idle compute (~30% waste) |
| Cold start on-demand | 5-35 minute startup delays, missed SLAs |
| Vendor serverless | Premium pricing, vendor lock-in, limited control |

Ghost solves this by providing intelligent cluster orchestration that delivers sub-second start times while eliminating idle waste.

Key Features

🔮 Ghost Predict

ML-powered predictive provisioning that pre-warms resources before you need them.

💤 Ghost Hibernate

State preservation that snapshots clusters to object storage for instant resume.

๐ŸŠ Ghost Pool

Cross-workload resource sharing that maximizes utilization across teams.

⚡ Ghost Spot

Autonomous spot/preemptible instance management with graceful failover.

📊 Ghost Insight

Real-time cost attribution and optimization recommendations.

Quick Start

Installation

One command to install Ghost Compute with all platforms:

pip install ghost-compute

This single install includes support for:

  • Databricks (Azure, AWS, GCP)
  • Amazon EMR
  • Azure Synapse Analytics
  • Google Cloud Dataproc

Install from source:

git clone https://github.com/ghost-ai-dev/ghost-compute.git
cd ghost-compute
pip install -e .

Basic Usage

from ghost import GhostClient

# Initialize Ghost
ghost = GhostClient(
    platform="databricks",
    credentials_path="~/.ghost/credentials.json"
)

# Enable intelligent orchestration
ghost.optimize(
    workspace_id="your-workspace",
    strategies=["predict", "hibernate", "spot"],
    target_savings=0.40  # 40% cost reduction target
)

# Monitor savings
stats = ghost.get_stats()
print(f"Monthly savings: ${stats.savings_usd:,.2f}")
print(f"Cold starts eliminated: {stats.cold_starts_prevented}")

CLI Usage

# View supported platforms
ghost platforms

# Connect to your platform
# Databricks
ghost connect databricks --workspace-url https://xxx.cloud.databricks.com --token YOUR_TOKEN

# AWS EMR
ghost connect emr --profile default --region us-east-1

# Azure Synapse
ghost connect synapse --subscription-id YOUR_SUB_ID --resource-group YOUR_RG

# Google Dataproc
ghost connect dataproc --project-id YOUR_PROJECT --region us-central1

# Analyze current waste
ghost analyze --output report.json

# Enable optimization
ghost optimize --strategies predict,hibernate,spot

# List clusters
ghost clusters

# View optimization insights
ghost insights

Supported Platforms

All platforms are included in the single pip install ghost-compute package.

| Platform | Status | Features |
|---|---|---|
| Databricks | ✅ GA | Predict, Hibernate, Pool, Spot, Insight |
| Amazon EMR | ✅ GA | Predict, Hibernate*, Spot, Pool, Insight |
| Azure Synapse | ✅ GA | Predict, Hibernate (auto-pause), Pool, Insight |
| Google Dataproc | ✅ GA | Predict, Hibernate*, Preemptible VMs, Pool, Insight |
| Cloudera CDP | 🚧 Alpha | Insight only (coming soon) |
| Self-managed Spark | 🚧 Alpha | Pool, Spot (coming soon) |

*EMR and Dataproc hibernation works via cluster termination with state preservation for fast recreation.

Architecture

┌──────────────────────────────────────────────────────┐
│                   YOUR APPLICATION                   │
│       (Databricks / EMR / Synapse / Dataproc)        │
└──────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────┐
│                     GHOST LAYER                      │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐  │
│  │   Predict    │ │  Hibernate   │ │     Spot     │  │
│  │  Scheduler   │ │   Manager    │ │ Orchestrator │  │
│  └──────────────┘ └──────────────┘ └──────────────┘  │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐  │
│  │     Pool     │ │   Insight    │ │ Multi-Cloud  │  │
│  │   Manager    │ │    Engine    │ │ Abstraction  │  │
│  └──────────────┘ └──────────────┘ └──────────────┘  │
└──────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────┐
│                 CLOUD INFRASTRUCTURE                 │
│            (AWS / Azure / GCP / On-Prem)             │
└──────────────────────────────────────────────────────┘

How It Works

1. Predictive Provisioning

Ghost analyzes your workload patterns to predict when clusters will be needed:

# Ghost learns from historical patterns
# - Scheduled jobs (cron patterns)
# - User activity (login times, query patterns)
# - Data arrival (streaming triggers)
# - Seasonal trends (end of month, quarterly)

# Pre-warms clusters 30-60 seconds before needed
# Result: Sub-second perceived start time

2. State Hibernation

Instead of terminating clusters, Ghost preserves state:

# Traditional approach:
# Terminate → Cold start (5-35 min) → Re-initialize

# Ghost approach:
# Hibernate → Snapshot to S3 → Resume in <5 sec
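As a rough illustration of the hibernate/resume round trip, here is a sketch in which a plain dict stands in for the S3/Blob/GCS bucket; `hibernate`, `resume`, and the state layout are invented for the example:

```python
import json

OBJECT_STORE = {}   # stand-in for your S3/Blob/GCS bucket

def hibernate(cluster_id, state):
    """Snapshot cluster state (cached tables, executor layout, and so
    on) to object storage so the compute can be released."""
    OBJECT_STORE[f"ghost/{cluster_id}.json"] = json.dumps(state)

def resume(cluster_id):
    """Rehydrate from the snapshot instead of re-initializing from
    scratch; restoring state is what avoids the 5-35 minute cold start."""
    return json.loads(OBJECT_STORE[f"ghost/{cluster_id}.json"])
```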

3. Intelligent Pooling

Share warm resources across workloads:

# Team A finishes job at 2:00 PM
# Team B starts job at 2:05 PM
# Ghost transfers warm instances → Zero cold start for Team B
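A minimal sketch of that handoff, assuming a pool that caps idle capacity the way the `max_idle_instances` setting in the configuration file does; the `WarmPool` class itself is illustrative:

```python
class WarmPool:
    """Toy cross-team pool: finished workloads return warm instances,
    and new workloads try to take one before provisioning cold."""

    def __init__(self, max_idle=10):
        self.idle = []
        self.max_idle = max_idle

    def release(self, instance):
        """Team A is done: keep the instance warm if the pool has room."""
        if len(self.idle) < self.max_idle:
            self.idle.append(instance)
            return True
        return False                     # pool full: let it terminate

    def acquire(self):
        """Team B starts: a warm instance if one exists, else None
        (meaning a cold provision is unavoidable)."""
        return self.idle.pop() if self.idle else None
```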

4. Spot Orchestration

Maximize savings with automatic spot management:

# Ghost automatically:
# - Uses spot instances for interruptible workloads
# - Monitors interruption signals
# - Checkpoints state before termination
# - Fails over to on-demand gracefully
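The decision at interruption time can be sketched as below, assuming a notice buffer like the `interruption_buffer_seconds: 120` default from the configuration section; `handle_interruption` and its callbacks are illustrative, not the real orchestrator:

```python
def handle_interruption(seconds_to_termination, checkpoint, fallback,
                        buffer_seconds=120):
    """On a spot interruption signal: checkpoint if the notice window
    allows it, then fail the workload over to on-demand capacity."""
    checkpointed = False
    if seconds_to_termination >= buffer_seconds:
        checkpoint()        # persist progress before the VM disappears
        checkpointed = True
    fallback()              # always move the work to on-demand
    return checkpointed
```

With the typical two-minute cloud notice, the checkpoint fits inside the window; with a shorter one, the sketch skips it and relies on the last checkpoint.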

Configuration

Environment Variables

GHOST_API_KEY=your-api-key
GHOST_PLATFORM=databricks
GHOST_WORKSPACE_URL=https://xxx.cloud.databricks.com
GHOST_LOG_LEVEL=INFO

Configuration File

# ghost.yaml
platform: databricks
workspace_url: https://xxx.cloud.databricks.com

strategies:
  predict:
    enabled: true
    lookahead_minutes: 60
    confidence_threshold: 0.8

  hibernate:
    enabled: true
    idle_timeout_minutes: 10
    storage_backend: s3
    storage_bucket: ghost-hibernate-states

  spot:
    enabled: true
    max_spot_percentage: 80
    fallback_to_ondemand: true
    interruption_buffer_seconds: 120

  pool:
    enabled: true
    cross_team_sharing: true
    max_idle_instances: 10

  insight:
    enabled: true
    cost_alerts: true
    alert_threshold_usd: 1000

exclusions:
  - cluster_name: "production-critical-*"
  - tag: "ghost:exclude"
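The exclusion rules read naturally as glob-plus-tag matching. The sketch below assumes that semantics; `is_excluded` and the cluster dict shape are inventions for the example, though the patterns come straight from the sample ghost.yaml:

```python
from fnmatch import fnmatch

def is_excluded(cluster, exclusions):
    """Apply ghost.yaml exclusion rules: shell-style glob patterns on
    the cluster name, or an opt-out tag on the cluster itself."""
    for rule in exclusions:
        name_pattern = rule.get("cluster_name")
        if name_pattern and fnmatch(cluster["name"], name_pattern):
            return True
        tag = rule.get("tag")
        if tag and tag in cluster.get("tags", []):
            return True
    return False
```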

Pricing

Ghost operates on a savings-share model:

| Tier | Monthly Compute Spend | Ghost Fee |
|---|---|---|
| Starter | < $50K | 25% of savings |
| Growth | $50K - $250K | 20% of savings |
| Enterprise | > $250K | Custom |

No savings = No payment. We only charge when we deliver results.
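A worked example of the fee math, assuming the rate applies to realized monthly savings and the tier is chosen by gross monthly spend (the table does not pin down either detail); `ghost_fee` is illustrative, not billing code:

```python
def ghost_fee(monthly_spend, monthly_savings):
    """Savings-share fee: a tier-dependent cut of realized savings.
    No savings means no fee."""
    if monthly_savings <= 0:
        return 0.0                  # no savings = no payment
    if monthly_spend < 50_000:
        rate = 0.25                 # Starter
    elif monthly_spend <= 250_000:
        rate = 0.20                 # Growth
    else:
        return None                 # Enterprise: custom pricing
    return round(monthly_savings * rate, 2)
```

Under these assumptions, the $100K-baseline benchmark with $42K of monthly savings would pay a Growth-tier fee of $8,400 and keep $33,600.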

Benchmarks

| Metric | Before Ghost | After Ghost | Improvement |
|---|---|---|---|
| Average cold start | 8.5 min | 0.8 sec | 99.8% faster |
| Idle compute waste | 32% | 4% | 87% reduction |
| Monthly spend ($100K baseline) | $100,000 | $58,000 | 42% savings |
| SLA misses (5-min threshold) | 23/month | 0/month | 100% eliminated |

Documentation

Examples

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/ghost-ai-dev/ghost-compute.git
cd ghost-compute
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
pytest

Security

  • SOC 2 Type II certified
  • No data leaves your cloud environment
  • All state stored in your own S3/Blob/GCS buckets
  • Role-based access control
  • Audit logging

Report security issues to security@ghost-compute.io

License

Apache License 2.0 - see LICENSE

Support


Built by Ghost AI | Eliminating waste in enterprise data infrastructure
