Backup and restore orchestration for AI-Hub data services

swiss-ai-hub-backup

The centralized backup, restore, and PostgreSQL-maintenance service for Swiss AI Hub — a self-contained Dagster instance that snapshots every stateful service to S3.


What is Swiss AI Hub?

Swiss AI Hub is an open-source, self-hosted AI platform for enterprises. One docker compose up starts ~30 integrated containers across several stateful stores — PostgreSQL, FerretDB, Milvus, Neo4j, ClickHouse, Valkey, and NATS JetStream. This package keeps that data safe.

What is this package?

swiss-ai-hub-backup is the platform's backup/restore and database-maintenance plane. It runs as its own independent Dagster instance (separate from the data pipelines, with its own SQLite storage) and:

  • Backs up PostgreSQL (×2), Milvus, Neo4j, ClickHouse, Valkey, and NATS JetStream to S3 (SeaweedFS) on a schedule — gracefully stopping and restarting the managed containers around each run for consistent snapshots.
  • Restores any service from a chosen backup timestamp.
  • Maintains the platform PostgreSQL online: prunes verbose Dagster event_logs, tunes autovacuum, and runs pg_repack, so database disk usage stays bounded over time without downtime.
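
The stop, snapshot, restart sequence in the first bullet can be sketched as a context manager. This is a minimal illustration, not the package's implementation: FakeDocker and containers_stopped are hypothetical names, and the fake client only records calls instead of touching real containers.

```python
from contextlib import contextmanager
from typing import Iterator, Sequence


class FakeDocker:
    """Hypothetical stand-in for a Docker client; it only records calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def stop(self, name: str) -> None:
        self.calls.append(("stop", name))

    def start(self, name: str) -> None:
        self.calls.append(("start", name))


@contextmanager
def containers_stopped(docker: FakeDocker, names: Sequence[str]) -> Iterator[None]:
    """Stop the named containers, run the body, then restart them.

    The restart sits in a finally block, so containers come back up
    even when the snapshot step raises.
    """
    stopped: list[str] = []
    try:
        for name in names:
            docker.stop(name)
            stopped.append(name)
        yield
    finally:
        # Restart only what was actually stopped, in reverse stop order.
        for name in reversed(stopped):
            docker.start(name)


docker = FakeDocker()
with containers_stopped(docker, ["neo4j", "milvus"]):
    # A real handler would dump service state to S3 here.
    docker.calls.append(("snapshot", "neo4j+milvus"))
print(docker.calls)
```

Keeping the restart in a finally block is the point of the sketch: a failed snapshot should not leave platform services down.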

Each stateful service has a BackupHandler (postgres, milvus, neo4j, clickhouse, valkey, nats); the whole thing is wired into a Dagster asset graph by backup_definitions(). Because it operates on the storage layer and needs to stop containers, it requires read access to the Docker socket (/var/run/docker.sock), which it uses to discover platform containers via their com.docker.compose.project label.
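
Discovery by Compose label can be illustrated without a Docker daemon. The sketch below is an assumption about the general technique, not the package's code: ContainerInfo, compose_label_filter, and containers_in_project are hypothetical names, and "swiss-ai-hub" is a made-up project name. The filter mapping has the shape that the Docker SDK for Python accepts as the filters argument of containers.list.

```python
from dataclasses import dataclass, field

# Label that Docker Compose attaches to every container it creates.
COMPOSE_PROJECT_LABEL = "com.docker.compose.project"


@dataclass
class ContainerInfo:
    """Hypothetical minimal view of a container: its name and labels."""

    name: str
    labels: dict[str, str] = field(default_factory=dict)


def compose_label_filter(project: str) -> dict[str, str]:
    """Build a label filter of the form {"label": "key=value"}."""
    return {"label": f"{COMPOSE_PROJECT_LABEL}={project}"}


def containers_in_project(
    containers: list[ContainerInfo], project: str
) -> list[ContainerInfo]:
    """Keep only the containers whose Compose project label matches."""
    return [c for c in containers if c.labels.get(COMPOSE_PROJECT_LABEL) == project]


# Made-up inventory; the project name "swiss-ai-hub" is an assumption.
inventory = [
    ContainerInfo("postgres", {COMPOSE_PROJECT_LABEL: "swiss-ai-hub"}),
    ContainerInfo("unrelated-db", {COMPOSE_PROJECT_LABEL: "other-stack"}),
    ContainerInfo("no-labels"),
]
print(compose_label_filter("swiss-ai-hub"))
print([c.name for c in containers_in_project(inventory, "swiss-ai-hub")])
```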

Unlike the other Swiss AI Hub packages, this is an operational service, not a library you build agents/APIs on. It is licensed under AGPL-3.0-or-later (the rest of the SDK is Apache-2.0).

Should you use this package?

Most operators don't install it directly — it ships with the platform as the backup-* containers (a gRPC code server, a daemon, and a webserver UI on :3004). You'd reach for this PyPI package to run the backup plane standalone, embed its logic, or extend it — for example, adding a BackupHandler for a stateful service of your own.

What it does

| Job | Schedule | Stops containers? |
|---|---|---|
| Full backup (all services → S3) | daily | Yes (consistent snapshots) |
| Restore (service ← chosen timestamp) | on demand | Yes |
| event_logs cleanup + autovacuum tuning | weekly | No (online-safe) |
| pg_repack (reclaim disk) | monthly | No (online-safe) |

Installation

pip install swiss-ai-hub-backup
# or
uv add swiss-ai-hub-backup

Requires Python 3.13.


Quick start

The backup plane is a Dagster code location built by backup_definitions():

# my_backup/__init__.py
from swiss_ai_hub.backup.dagster.definitions import backup_definitions

defs = backup_definitions()   # 26 assets, 4 jobs: backup, restore, cleanup, repack

Inspect and run it with the Dagster UI (it keeps its own state in DAGSTER_HOME):

export DAGSTER_HOME=/tmp/backup-dagster && mkdir -p "$DAGSTER_HOME"
set -a && source .env && set +a          # S3 + DB credentials, BACKUP_* settings
dagster dev -m my_backup                 # http://localhost:3000

From the UI you can materialize the online-safe maintenance jobs (cleanup, pg_repack) against a running stack without disruption. The full backup/restore jobs stop and restart containers, so run those deliberately — and note they need access to the Docker socket and to all the stateful services. dagster definitions validate -m my_backup loads the whole code location without running anything (a fast CI/sanity check).

Settings are not auto-loaded from a .env file. Connection and BACKUP_* settings are read from the process environment once, when the settings objects are constructed, so the variables must already be exported in the process that runs Dagster (set -a && source .env && set +a).
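
One reading of this note is that a settings object captures environment variables at the moment it is built, so values exported later are not seen by an instance that already exists. The sketch below illustrates that behaviour under stated assumptions: BackupSettings and the variable name BACKUP_S3_BUCKET are hypothetical and are not taken from the package.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BackupSettings:
    """Hypothetical settings object that reads the environment at construction."""

    # BACKUP_S3_BUCKET is an assumed variable name, used for illustration only.
    s3_bucket: str = field(
        default_factory=lambda: os.environ.get("BACKUP_S3_BUCKET", "")
    )


os.environ.pop("BACKUP_S3_BUCKET", None)
early = BackupSettings()  # built before the variable is exported

os.environ["BACKUP_S3_BUCKET"] = "hub-backups"
late = BackupSettings()  # built after the variable is exported

# The early instance keeps its empty value; only the late one sees the bucket.
print(repr(early.s3_bucket), repr(late.s3_bucket))
```

If the real settings behave this way, exporting variables after Dagster has started has no effect on a running process, which is why the export has to happen in the shell that launches it.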


How it's deployed

In production the backup plane runs as three containers from one image, forming a self-contained Dagster instance:

| Container | Role | Notes |
|---|---|---|
| backup-code | Dagster gRPC code server (dagster api grpc … :4266) | mounts /var/run/docker.sock:ro to stop/start containers; on data + storage |
| backup-daemon | Dagster daemon | runs the schedules and sensors; on data |
| backup-webserver | Dagster UI (:3004) | inspect runs, trigger restores; on proxy + data |

Because it needs the Docker socket and the platform's stateful services, the canonical deployment is the platform's own backup compose. See infra/deployment and the documentation for the full container setup, retention config, and the BACKUP_* environment variables. If you run your own variant, mirror that three-container shape and grant the code server read access to the Docker socket.

Extending — add a service to back up

Implement the BackupHandler ABC for your service in services/, then register it in HANDLER_FACTORIES — the Dagster asset wiring picks it up automatically (handlers are synchronous by design):

from swiss_ai_hub.backup.services.base import BackupHandler

class MyServiceHandler(BackupHandler):
    def backup(self, context) -> ...:
        ...   # dump your service's state to S3
    def restore(self, context) -> ...:
        ...   # restore it from a backup

If the handler needs Docker access, type-hint a DockerManager parameter in __init__ and the factory injects it. The maintenance subsystem follows the same pattern (MaintenanceHandler + CLEANUP_HANDLER_NAMES). See the documentation for the full handler contract.
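
Injection driven by a type hint can be sketched with the standard library alone. Everything below is an assumption made for illustration: the DockerManager and BackupHandler classes are empty stand-ins rather than the package's own, and build_handler and HANDLER_CLASSES are hypothetical names that may not match the shape of the real HANDLER_FACTORIES.

```python
from typing import get_type_hints


class DockerManager:
    """Empty stand-in for the package's DockerManager."""


class BackupHandler:
    """Empty stand-in for the package's BackupHandler ABC."""


class PlainHandler(BackupHandler):
    def __init__(self) -> None:
        self.docker = None


class DockerAwareHandler(BackupHandler):
    # The type hint is what tells the factory to inject a DockerManager.
    def __init__(self, docker: DockerManager) -> None:
        self.docker = docker


def build_handler(cls: type[BackupHandler], docker: DockerManager) -> BackupHandler:
    """Instantiate cls, passing docker to any __init__ parameter hinted as DockerManager."""
    hints = get_type_hints(cls.__init__)
    kwargs = {
        name: docker
        for name, hint in hints.items()
        if name != "return" and hint is DockerManager
    }
    return cls(**kwargs)


# Hypothetical registry keyed by service name.
HANDLER_CLASSES: dict[str, type[BackupHandler]] = {
    "plain": PlainHandler,
    "docker_aware": DockerAwareHandler,
}

manager = DockerManager()
handlers = {name: build_handler(cls, manager) for name, cls in HANDLER_CLASSES.items()}
print(handlers["docker_aware"].docker is manager, handlers["plain"].docker is None)
```

Matching on the annotation rather than the parameter name means a handler can call the argument whatever it likes, as long as the hint is present.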


License

AGPL-3.0-or-later — see packages/backup/LICENSE. Note this differs from the Apache-2.0 SDK packages; for the full per-package license matrix, see LICENSES.md.


Part of Swiss AI Hub. Built in Switzerland by bbv Software Services.
