
[Screenshot: Apex dashboard streaming live CIFAR-10 training output]

Apex

Self-hosted ML platform for small AI teams.
Your GPU. Your team. No cloud tax.



Apex is a self-hosted ML platform for teams that own their GPU. One pip install, one command, your browser opens to a full job queue, live GPU monitoring, and browser-native VS Code — all running on a single workstation.

What you get

  • Job queue with Docker execution — submit training jobs from the UI or API, scheduler picks them up, runs them inside Docker containers with full GPU access, streams logs back over WebSocket
  • Real-time GPU telemetry — GPU util, VRAM, temperature, power draw, CPU, RAM, pushed to the dashboard every 2 seconds via server-sent events
  • Browser-native VS Code — launch a code-server dev session in a container with your workspace pre-mounted; one click and you're coding inside the GPU
  • Job history + logs — sortable table of every job, filter by status, tail logs from anywhere, cancel running jobs
  • Pre-built images — apex/code-server:python and apex/code-server:pytorch (CUDA 12.4 + PyTorch 2.4), bundled as working examples
  • No Kubernetes. No cloud bill. No DevOps engineer.
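The metrics stream is plain server-sent events, so any HTTP client can consume it. A minimal parsing sketch — the payload field names below are illustrative assumptions, not Apex's documented schema:

```python
import json

def parse_sse_event(line: str):
    """Parse one SSE 'data:' line into a dict (payload assumed to be JSON)."""
    if line.startswith("data:"):
        return json.loads(line[len("data:"):].strip())
    return None  # comments, blank keep-alives, etc.

# Hypothetical payload shaped like the telemetry fields listed above
sample = 'data: {"gpu_util": 87, "vram_used_mb": 10240, "temp_c": 71, "power_w": 215}'
event = parse_sse_event(sample)
print(event["gpu_util"])  # 87
```

Point the same loop at `/api/metrics/stream` (see the API table below) to watch live readings.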

Quickstart

pip install apex-ml
apex start

That's it. Your browser opens to http://localhost:7000.

On first run, Apex prints a one-time login to the terminal:

=== First-run owner account created ===
email:    owner@apex.local
password: <random>
Save this — it will not be shown again.

Log in with those credentials.

Requirements

  • Python 3.10+
  • Docker daemon running (used to execute jobs and dev sessions)
  • NVIDIA GPU (optional — Apex degrades gracefully to CPU-only if no GPU is present)
  • Linux or macOS (tested on Ubuntu 22.04 + macOS 14)
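The CPU fallback amounts to a probe like the following — a sketch using pynvml (the function name is ours, not Apex's API):

```python
def gpu_available() -> bool:
    """Return True if pynvml can see at least one NVIDIA device, else False."""
    try:
        import pynvml
        pynvml.nvmlInit()
        count = pynvml.nvmlDeviceGetCount()
        pynvml.nvmlShutdown()
        return count > 0
    except Exception:
        # pynvml missing, no driver, or no device: run CPU-only
        return False

print("GPU detected" if gpu_available() else "CPU-only mode")
```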

Build the base images (one-time)

apex build-images                # Python image (~2 min)
apex build-images --pytorch      # + PyTorch/CUDA image (~15 min, ~8 GB)

Submit your first job

curl -X POST http://localhost:7000/api/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "hello-world",
    "image": "apex/code-server:python",
    "script": "python -c \"print(\\\"hello from apex\\\")\"",
    "gpu_count": 0,
    "priority": "normal"
  }'

Or use the UI — click + Submit job on the dashboard.
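From Python, the same submission needs nothing beyond the standard library. A sketch mirroring the curl payload above (endpoint and field names are taken from that example; the helper names are ours):

```python
import json
import urllib.request

def build_job_payload(name, script, image="apex/code-server:python",
                      gpu_count=0, priority="normal"):
    """Assemble the JSON body expected by POST /api/jobs (fields from the curl example)."""
    return {"name": name, "image": image, "script": script,
            "gpu_count": gpu_count, "priority": priority}

def submit_job(payload, base="http://localhost:7000"):
    """POST the payload to the jobs endpoint and return the parsed response."""
    req = urllib.request.Request(base + "/api/jobs",
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a running Apex server):
# submit_job(build_job_payload("hello-world", 'python -c "print(42)"'))
```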

Custom images

You can use any Docker image for training jobs:

docker build -t my-team/training-image .
# then specify my-team/training-image when submitting a job

For dev sessions (browser VS Code), the image must have code-server installed. Build on top of the official base image:

FROM apex/code-server:python
RUN pip install torch transformers

Or use codercom/code-server:latest directly.

Train CIFAR-10 in 60 seconds

# Build the PyTorch image (one-time, ~8 GB)
git clone https://github.com/apexhq-dev/apex && cd apex
docker build -t apex/code-server:pytorch -f docker/pytorch.Dockerfile docker/

# Drop the training script into your workspace
cp runway-workspace/cifar10_train.py ~/apex-workspace/

# Submit the job via the UI or curl
# Image:  apex/code-server:pytorch
# Script: python /workspace/cifar10_train.py --epochs 2
# GPU:    1 GPU

The training job hits ~66% test accuracy on an NVIDIA L4 in about 30 seconds.

Screenshots

  • Overview dashboard — live GPU metrics, stat cards, submit form, job list, active sessions
  • Live training logs — WebSocket stream of real CIFAR-10 training output
  • Job history — sortable, filterable, per-row logs + cancel/remove actions
  • Metrics — large GPU + CPU chart with 8-cell live readings grid
  • Dev sessions — browser VS Code in one click, workspace pre-mounted
  • Docker images — reads directly from the host daemon, no registry push

The math

Compare a full month of training compute, 24/7:

Option                       Price          Math
RunPod RTX 4090              $316 / mo      $0.44/hr × 720 hrs
Lambda H100                  $2,150 / mo    $2.99/hr × 720 hrs
AWS p4d.24xlarge (8× A100)   $23,594 / mo   $32.77/hr × 720 hrs
Your workstation + Apex      $29 / mo       Team tier, 8 seats, unlimited jobs
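The hourly math checks out (rates from the table; the table rounds the products to friendly whole-dollar figures):

```python
# Monthly cost at 24/7 usage: hourly rate × 720 hrs (30 days × 24 hrs)
rates = {
    "RunPod RTX 4090": 0.44,
    "Lambda H100": 2.99,
    "AWS p4d.24xlarge (8x A100)": 32.77,
}
for option, hourly in rates.items():
    print(f"{option}: ${hourly * 720:,.2f}/mo")
```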

Pricing

Tier     Price          Gets you
Free     $0 forever     Full feature set, 1 seat, unlimited jobs, community support
Team     $29/mo flat    Everything in Free + 8 seats, multi-user auth, audit log, SSO, priority support
Hosted   $99/mo + GPU   We run Apex for you on a rented GPU, SLA, backups

All tiers run the same codebase. Free and Team are self-hosted on your hardware.

Architecture

Intentionally boring. One Python process, one pip install, one GPU machine.

apex/
├── cli.py                # click CLI: start, stop, status
├── config.py             # loads ~/.apex/config.json
├── docker_mgr.py         # docker SDK wrapper (create, start, clean up)
├── monitor/              # GPU (pynvml) + CPU (psutil) collector thread
├── scheduler/            # SQLite-backed queue + worker thread
├── server/               # FastAPI app + routes
│   ├── app.py            # app factory
│   ├── auth.py           # JWT + bcrypt
│   ├── db.py             # SQLite schema + connection helper
│   └── routes/           # metrics, jobs, sessions, images, users
└── static/               # Vanilla HTML/CSS/JS — no build step

Stack: FastAPI · Uvicorn · SQLite · pynvml · docker-py · SSE-Starlette · code-server · vanilla JS
Not in the stack: React · Redis · Postgres · RabbitMQ · Kubernetes · Helm · Node · webpack

See SPEC.md for the full technical specification.

CLI

apex start [--host 0.0.0.0] [--port 7000] [--no-browser]
apex status
apex stop              # Ctrl+C works fine, this is a stub
apex logs              # pipe `apex start` output instead
apex build-images      # build base container images (one-time)
apex config show       # print current config
apex config set KEY VALUE   # update a config value (workspace, port, host)
apex --version

Configuration

Apex reads ~/.apex/config.json on startup. Defaults:

{
  "workspace_path": "~/apex-workspace",
  "port": 7000,
  "host": "0.0.0.0",
  "session_port_range": [8080, 8200],
  "jwt_secret": "<auto-generated on first run>"
}

View or change settings with the apex config command:

apex config show                          # print current config
apex config set workspace /mnt/nas/apex   # change workspace path
apex config set port 8000                 # change UI port
apex config set host 127.0.0.1           # bind to localhost only

Or use the APEX_WORKSPACE environment variable (useful for system-level config):

export APEX_WORKSPACE=/mnt/nas/apex
apex start
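A sketch of how these settings plausibly resolve (defaults copied from above; the env-over-file precedence is our assumption, not documented behavior):

```python
import json
import os

# Defaults as documented in the README
DEFAULTS = {
    "workspace_path": "~/apex-workspace",
    "port": 7000,
    "host": "0.0.0.0",
    "session_port_range": [8080, 8200],
}

def load_config(path="~/.apex/config.json"):
    """Merge the on-disk config over the defaults, then apply env overrides."""
    cfg = dict(DEFAULTS)
    expanded = os.path.expanduser(path)
    if os.path.exists(expanded):
        with open(expanded) as f:
            cfg.update(json.load(f))
    env_workspace = os.environ.get("APEX_WORKSPACE")
    if env_workspace:  # assumed: env var wins over the file
        cfg["workspace_path"] = env_workspace
    return cfg
```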

Shared workspace (NAS / network drives)

Every job and dev session mounts the workspace directory into the container at /workspace. Pointing multiple team members' machines at the same network share means scripts, datasets, and checkpoints are visible to everyone without manual copying.

# 1. Mount the shared drive at the OS level (NFS example)
sudo mount -t nfs 192.168.1.10:/shared /mnt/nas

# 2. Point Apex at it
apex config set workspace /mnt/nas/apex-workspace

# 3. Restart
apex stop && apex start

Samba (SMB) mounts work the same way — mount it at the OS level, then run apex config set workspace <path>. The workspace directory is created automatically if it does not exist.
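Under the hood the workspace mount is just a bind volume. In docker-py terms it plausibly looks like this — a sketch, not Apex's actual code, and the rw mode is an assumption:

```python
import os

def workspace_mount(workspace_path: str) -> dict:
    """Build a docker-py volumes mapping that binds the workspace at /workspace."""
    host_path = os.path.expanduser(workspace_path)
    return {host_path: {"bind": "/workspace", "mode": "rw"}}

# e.g. client.containers.run(image, cmd, volumes=workspace_mount("/mnt/nas/apex-workspace"))
print(workspace_mount("/mnt/nas/apex-workspace"))
```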

API

Route                  Method            Description
/api/health            GET               {"ok": true}
/api/metrics/stream    GET (SSE)         Live GPU/CPU metrics every 2s
/api/metrics/current   GET               Single snapshot of current metrics
/api/jobs              GET               List jobs (filters: status, limit, offset)
/api/jobs              POST              Submit a new job
/api/jobs/{id}         GET               Job detail
/api/jobs/{id}         DELETE            Cancel running job or remove completed one
/api/jobs/{id}/logs    WS                Stream container logs line-by-line
/api/sessions          GET/POST/DELETE   List, launch, stop dev sessions
/api/images            GET               List Docker images from the host daemon
/api/users/login       POST              JWT login
/api/users/me          GET               Current user
/api/users/invite      POST              Invite a new team member (admin only)
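These routes compose into a tiny client with nothing beyond the standard library. A sketch — the Bearer header scheme and the `token` response field are assumptions on top of the documented JWT login:

```python
import json
import urllib.request

class ApexClient:
    """Minimal stdlib-only client for the routes in the table above."""

    def __init__(self, base="http://localhost:7000"):
        self.base = base
        self.token = None

    def _request(self, method, path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(self.base + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        if self.token:
            req.add_header("Authorization", "Bearer " + self.token)  # scheme assumed
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def login(self, email, password):
        out = self._request("POST", "/api/users/login",
                            {"email": email, "password": password})
        self.token = out.get("token")  # response field name assumed
        return self.token

    def health(self):
        return self._request("GET", "/api/health")

    def list_jobs(self):
        return self._request("GET", "/api/jobs")
```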

Development

git clone https://github.com/apexhq-dev/apex
cd apex
pip install -e .
apex start --skip-docker-check  # dev mode without Docker

The frontend is plain HTML/CSS/JS in apex/static/ — edit and refresh, no build step.

License

Apache License 2.0 © 2026 Apex contributors

Download files

apex_ml-0.2.1.tar.gz (source distribution, 54.4 kB)
  SHA256:      3de51000e05c2405ad9ca959e75e5452b31ab44268c67d295623c372cba961a9
  MD5:         00f62cbf329f802a2af8f075b4991442
  BLAKE2b-256: d42743476fe2350e58df2525345e8cff562bb6538b2d073426bc75d8c9b0a273

apex_ml-0.2.1-py3-none-any.whl (Python 3 wheel, 61.4 kB)
  SHA256:      e6c73f967dc612709c100e0efa5dd9a43505af06080402ca1584721b2b073011
  MD5:         1ab3b633b3baf742a6fd412a14476696
  BLAKE2b-256: 353fc0fceee67869a57fda08685982ea24e07f07422512bd17b068818eeb7e5c

Both files were uploaded via twine/6.2.0 (CPython 3.12.11), without Trusted Publishing.
