
Ship AI agents safely with release diffs, runtime evidence, and policy gates.


FlightDeck


FlightDeck is local-first (CLI + SQLite + optional flightdeck serve UI). By default, run evidence, pricing tables, and the ledger stay on disk in your environment, and no trace or billing payload is sent to FlightDeck as a vendor. That matters for regulated, air-gapped, and data-sovereignty teams that cannot ship telemetry to a third-party SaaS observability backend.

FlightDeck is not an agent framework, prompt IDE, tracing dashboard, or gateway. It is where what shipped, what ran, and what it cost are recorded and compared, and where the decision on whether a promote is allowed is made.

In ~20 seconds

  1. Register immutable agent releases (release.yaml + bundle checksum).
  2. Ingest run evidence (RunEvent JSONL or POST /v1/events).
  3. Diff baseline vs candidate: cost, latency, errors, and confidence (optional pricing catalog lines on top).
  4. Promote only when policy passes; optional human approval (request → confirm) before the ledger moves.
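Step 1 relies on a bundle checksum to make a release immutable. The sketch below shows one deterministic way to compute such a checksum, assuming SHA-256 over each file's relative path plus length-prefixed contents in sorted order. FlightDeck's own checksum scheme is not described here and may differ; the function name bundle_checksum is hypothetical.

```python
import hashlib
from pathlib import Path


def bundle_checksum(bundle_dir: str) -> str:
    """Return a SHA-256 hex digest over every file in bundle_dir.

    Hypothetical scheme for illustration; FlightDeck's may differ.
    """
    root = Path(bundle_dir)
    # Pair each file with its POSIX-style relative path and sort by that path,
    # so the digest does not depend on filesystem iteration order or OS.
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    digest = hashlib.sha256()
    for rel_path, path in files:
        data = path.read_bytes()
        digest.update(rel_path.encode("utf-8") + b"\0")  # path, NUL-terminated
        digest.update(len(data).to_bytes(8, "big"))      # length prefix for contents
        digest.update(data)
    return digest.hexdigest()
```

Sorting relative paths keeps the digest independent of directory iteration order, and the length prefix prevents two different file layouts from producing the same hashed byte stream.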

Example outcome

You ship a candidate whose system prompt drifts by a handful of tokens. Under your tariffs, the diff shows cost per run up ~31%, while your policy caps spend. flightdeck release promote (or the HTTP promote path) stays blocked until you change the model, deliberately relax the policy, or widen the evidence. The block comes from the governed ledger, not from a slow CI job. (The ~31% figure uses the two custom pricing YAMLs in examples/quickstart/. flightdeck init on its own seeds a bundled pricing snapshot, so your first cost-aware diff does not start from an empty pricing ledger.)
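The arithmetic behind this example is a relative change in cost per run, compared against a cap. A minimal sketch with illustrative numbers (not the quickstart's actual tariffs); the cap parameter max_increase_pct and both helper functions are hypothetical, since the policy.yaml schema is not shown here.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    total_cost_usd: float
    runs: int

    @property
    def cost_per_run(self) -> float:
        return self.total_cost_usd / self.runs


def cost_change_pct(baseline: ReleaseStats, candidate: ReleaseStats) -> float:
    """Relative change in cost per run, in percent."""
    return (candidate.cost_per_run - baseline.cost_per_run) / baseline.cost_per_run * 100.0


def promote_allowed(change_pct: float, max_increase_pct: float) -> bool:
    """Hypothetical gate: block promote when cost per run rises past the cap."""
    return change_pct <= max_increase_pct


baseline = ReleaseStats(total_cost_usd=10.00, runs=1000)   # $0.0100 per run
candidate = ReleaseStats(total_cost_usd=13.10, runs=1000)  # $0.0131 per run
change = cost_change_pct(baseline, candidate)
print(f"cost per run: {change:+.1f}%")                     # cost per run: +31.0%
print("promote allowed:", promote_allowed(change, 10.0))   # promote allowed: False
```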

Who should use this?

  • Primary buyer (ideal customer profile): platform or ML engineering teams (often 5–30 people) at growth-stage companies shipping two or more LLM agents to production, especially teams that have already had a cost or regression incident from a prompt or model change and need a governed promote path.
  • Teams that version agent builds (prompts, tools, model pins) and need a durable audit trail.
  • Engineers who want one command to answer “is this candidate safe to roll forward?” with numbers, not gut feel.
  • Healthcare, fintech, and enterprise operators who cannot default to sending traces or cost data to a hosted observability vendor—local-first evidence and pricing imports are the default integration model.
  • Anyone who has outgrown ad hoc folder diffs or spreadsheet promote checklists.

How FlightDeck fits your stack

FlightDeck sits next to your agent runtime (not in the inference hot path): emit evidence, run flightdeck from a laptop or CI, gate promote with policy (and optional approval).

flowchart LR
  subgraph runtime [Your agent runtime]
    agent[Agent or service]
  end
  subgraph fd [FlightDeck workspace]
    ingest[Ingest RunEvents]
    ledger[(SQLite ledger)]
    diff[release diff]
    promote[promote or rollback]
  end
  subgraph automation [Automation]
    ci[CI job or operator]
  end
  agent -->|"JSONL or HTTP events"| ingest
  ingest --> ledger
  ledger --> diff
  diff --> ci
  ci -->|"policy pass"| promote
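On the runtime side of the diagram, emitting evidence can be as simple as appending one JSON object per run to a JSONL file. This sketch assumes hypothetical field names (run_id, release_id, cost_usd, latency_ms, error); the real RunEvent contract lives in schemas/v1/, so map the fields onto it before ingesting.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def append_run_event(path: Path, release_id: str, cost_usd: float,
                     latency_ms: int, error: bool) -> dict:
    """Append one run record to a JSONL file and return it.

    Hypothetical field names: map them onto the RunEvent schema in schemas/v1/.
    """
    event = {
        "run_id": str(uuid.uuid4()),  # unique per run
        "release_id": release_id,     # ID printed by `flightdeck release register`
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "cost_usd": cost_usd,
        "latency_ms": latency_ms,
        "error": error,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")  # JSONL: one JSON object per line
    return event
```

Once the fields match the schema, a file like this is what flightdeck runs ingest consumes; the same records could instead be sent to POST /v1/events.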

Comparison at a glance

| | FlightDeck | Langfuse | Arize Phoenix / Cloud | Git / CI alone |
|---|---|---|---|---|
| Primary job | Release + promote governance for agents (ledger, diff, policy) | Tracing, sessions, evals, LLM observability | ML / model observability and monitoring | Source control and generic pipelines |
| Immutable release artifact | Yes (release.yaml + checksum) | No | No | Only if you build it |
| Evidence + cost/latency diff | Yes (runs + pricing tables / optional catalog) | Different lens (trace-level) | Different lens | DIY |
| Default data residency | On your machine (CLI / SQLite / local HTTP) | Typically SaaS-hosted | Cloud offerings | Your repo |
| Policy gate on promote | First-class | No | No | DIY |

Try the UI: run flightdeck serve, then open http://127.0.0.1:8765/ to see the Overview, Diff, and Actions views (see docs/web-ui.md).

Why it exists

Small prompt or model changes can silently move cost, latency, and error rate. FlightDeck turns those moves into explicit promote decisions backed by ingested runs — before production pointers advance.

Current local spine: versioned release.yaml + checksums · RunEvent ingest (JSONL or arrays) · bundled default pricing on flightdeck init (plus optional pricing import) · flightdeck release diff · policy-gated release promote / rollback · full audit history.

Status

FlightDeck is local-first and ships as a Python CLI backed by SQLite.

v1.0.0 froze SemVer-stable public contracts for the documented CLI, the committed schemas/v1/, and POST /v1/events with api_version v1. Subsequent releases:

  • v1.1.x adds, without breaking those v1.0 shapes: catalog-aware diffs (optional pricing catalog on diffs), approval flows (promotion request/confirm), forensics slices (read-only runs listing, GET /v1/workspace for UI and automation), and Helm/fleet examples.
  • v1.2.0 raises the Python floor to 3.11+, tightens Bearer gating for POST /v1/events and GET /v1/* when FLIGHTDECK_LOCAL_API_TOKEN is set, and adds optional PostgreSQL support, bundled default pricing on flightdeck init, and experimental flightdeck.integrations.

See RELEASE_NOTES.md and CHANGELOG.md. The product scope is still intentionally narrow: release governance, not a hosted agent platform.

Maintenance and sustainability: the project is Apache-2.0 with no required commercial license. If FlightDeck matters to your production stack, see SUPPORT.md for security, commercial, and sponsorship contacts, and use the Sponsor button on github.com/flightdeckdev/flightdeck when it is enabled. Signals like these answer “what happens if maintenance stops?” more credibly than roadmap prose alone.

Not implemented yet:

  • hosted control plane
  • automated traffic routing
  • tool-cost pricing
  • OpenTelemetry import/export mapping (the optional telemetry extra, installed with uv sync --extra telemetry or pip install 'flightdeck-ai[telemetry]', exists for this future work)

Shipped locally:

  • flightdeck serve + JSON routes under /v1/* (read + diff/promote/rollback + event ingest); see Local HTTP API below
  • minimal Python SDK (flightdeck.sdk.client)
  • flightdeck release rollback (policy-gated, audited)
  • optional promotion_requires_approval in flightdeck.yaml with POST /v1/promote/request and POST /v1/promote/confirm
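The optional approval flow (promotion_requires_approval, then request and confirm) can be pictured as a small state machine layered on top of the policy check. This is a toy model for intuition, not FlightDeck's implementation; the class ApprovalGate and its method names are hypothetical.

```python
from enum import Enum


class RequestState(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"


class ApprovalGate:
    """Toy model of request -> confirm approval; not FlightDeck's implementation."""

    def __init__(self, requires_approval: bool) -> None:
        self.requires_approval = requires_approval
        self._requests: dict = {}  # release_id -> RequestState

    def request(self, release_id: str) -> None:
        self._requests[release_id] = RequestState.PENDING

    def confirm(self, release_id: str) -> None:
        if self._requests.get(release_id) is not RequestState.PENDING:
            raise ValueError(f"no pending promotion request for {release_id}")
        self._requests[release_id] = RequestState.CONFIRMED

    def can_promote(self, release_id: str, policy_passed: bool) -> bool:
        # Policy must pass in every case; approval is an extra gate on top.
        if not policy_passed:
            return False
        if not self.requires_approval:
            return True
        return self._requests.get(release_id) is RequestState.CONFIRMED
```

With requires_approval=True, can_promote stays False after request() and becomes True only after confirm(), and only for a release whose policy check passes.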

Local HTTP API

With flightdeck serve (default bind 127.0.0.1), the app exposes:

  • GET routes: GET /health, GET /v1/workspace (read-only workspace flags for scripts and the bundled UI), GET /v1/metrics, GET /v1/releases, GET /v1/promoted, GET /v1/actions, GET /v1/promotion-requests, GET /v1/runs
  • POST routes: POST /v1/events, POST /v1/diff, POST /v1/promote, POST /v1/promote/request, POST /v1/promote/confirm, POST /v1/rollback

Access rules:

  • POST /v1/promote, POST /v1/promote/request, POST /v1/promote/confirm, POST /v1/rollback, and POST /v1/events accept requests only from loopback clients unless FLIGHTDECK_LOCAL_API_TOKEN is set.
  • When FLIGHTDECK_LOCAL_API_TOKEN is set, callers of those endpoints must send Authorization: Bearer <token>, and the same header is required for the GET /v1/* read APIs (the bundled UI supplies it via VITE_FLIGHTDECK_LOCAL_API_TOKEN).
  • POST /v1/diff stays unauthenticated.

See docs/http-api.md and SECURITY.md.
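For scripts outside the bundled UI, a client mainly needs to attach the Bearer header when FLIGHTDECK_LOCAL_API_TOKEN is set. A standard-library sketch, assuming the default 127.0.0.1:8765 address and JSON request and response bodies; build_request and get_json are hypothetical helpers, not part of flightdeck.sdk.client.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed default `flightdeck serve` address (same host/port as the UI URL).
BASE_URL = "http://127.0.0.1:8765"


def build_request(path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET (no body) or JSON POST request for the local API.

    If FLIGHTDECK_LOCAL_API_TOKEN is set, attach it as a Bearer token. Sending it
    on every call (even POST /v1/diff) is a simplification in this sketch.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path, data=data, method="POST" if data is not None else "GET"
    )
    if data is not None:
        req.add_header("Content-Type", "application/json")
    token = os.environ.get("FLIGHTDECK_LOCAL_API_TOKEN")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def get_json(path: str, timeout: float = 5.0):
    """Send a GET to a running `flightdeck serve` and decode the JSON reply."""
    with urllib.request.urlopen(build_request(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With flightdeck serve running, get_json("/v1/releases") should return the registered releases. Request bodies for POST routes such as /v1/events or /v1/diff must follow docs/http-api.md.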

Quickstart

Install uv, then from the repo root:

uv sync --extra dev
uv run flightdeck --help

Or with pip and a venv:

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
python -m pip install -e ".[dev]"
flightdeck --help

Run the cross-platform quickstart smoke (same as CI):

uv run flightdeck-quickstart-verify

(or python -m flightdeck.quickstart_smoke / python scripts/quickstart_smoke.py inside an activated venv)

Or use the bash wrapper (Git Bash / WSL on Windows):

./scripts/smoke.sh

Bundled pricing (default init): flightdeck init migrates the ledger, imports OpenAI, Anthropic, and Google (Gemini-class) pricing tables at pricing_version flightdeck-bundled-2026-05, writes .flightdeck/pricing-catalog.yaml, and sets pricing_catalog_path in flightdeck.yaml. In release.yaml, set spec.pricing_reference to { provider: openai | anthropic | google, pricing_version: flightdeck-bundled-2026-05 } to get per-table and catalog cost lines on diffs without authoring any pricing YAML. These rates are a convenience snapshot, not live vendor billing; for production, load your own tables with flightdeck pricing import. Use flightdeck init --no-bundled-pricing to start with an empty pricing ledger.
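A spec.pricing_reference acts as a lookup key into the imported pricing tables. A minimal sketch of that resolution and of per-run token cost, assuming rates expressed in USD per million input and output tokens; the table layout, model name, and rates are placeholders, not the flightdeck-bundled-2026-05 values.

```python
from dataclasses import dataclass

# Placeholder tables keyed by (provider, pricing_version), rates in USD per
# 1M tokens. These are NOT the flightdeck-bundled-2026-05 figures.
PRICING = {
    ("openai", "flightdeck-bundled-2026-05"): {
        "example-model": {"input": 2.00, "output": 8.00},
    },
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def run_cost(pricing_reference: dict, usage: Usage) -> float:
    """Resolve a pricing_reference-style dict to rates and price one run."""
    key = (pricing_reference["provider"], pricing_reference["pricing_version"])
    try:
        rates = PRICING[key][usage.model]
    except KeyError:
        raise KeyError(f"no rates for {usage.model!r} under {key}") from None
    return (
        usage.input_tokens * rates["input"] + usage.output_tokens * rates["output"]
    ) / 1_000_000


ref = {"provider": "openai", "pricing_version": "flightdeck-bundled-2026-05"}
print(run_cost(ref, Usage("example-model", input_tokens=1500, output_tokens=500)))  # 0.007
```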

Or walk through the full quickstart, which adds a policy and the two custom tariffs behind the ~31% example (the same flow CI runs):

flightdeck init   # omit --no-bundled-pricing; bundled tables are additive with the imports below
flightdeck pricing import examples/quickstart/pricing-baseline.yaml
flightdeck pricing import examples/quickstart/pricing-candidate.yaml
flightdeck policy set examples/quickstart/policy.yaml

BASELINE=$(flightdeck release register examples/quickstart/baseline-release)
CANDIDATE=$(flightdeck release register examples/quickstart/candidate-release)

sed "s/__BASELINE_RELEASE_ID__/${BASELINE}/g" examples/quickstart/baseline-events.jsonl > baseline-events.jsonl
sed "s/__CANDIDATE_RELEASE_ID__/${CANDIDATE}/g" examples/quickstart/candidate-events.jsonl > candidate-events.jsonl

flightdeck runs ingest baseline-events.jsonl
flightdeck runs ingest candidate-events.jsonl

flightdeck release diff "$BASELINE" "$CANDIDATE" --window 7d
flightdeck release promote "$BASELINE" --env local --window 7d --reason "initial baseline"
flightdeck release history --agent agent_support --env local

The static event files in examples/quickstart use placeholder release IDs so the repo can ship stable examples; the sed commands above substitute the registered IDs before ingestion. Alternatively, run uv run flightdeck-quickstart-verify, python -m flightdeck.quickstart_smoke (inside a venv), or ./scripts/smoke.sh from Git Bash/WSL on Windows.
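The two sed lines in the quickstart need a POSIX shell. A cross-platform sketch of the same placeholder substitution in Python; substitute_release_ids is a hypothetical helper, and the placeholder tokens are the ones the quickstart's sed commands replace.

```python
from pathlib import Path


def substitute_release_ids(src: Path, dst: Path, ids: dict) -> int:
    """Copy src to dst, replacing each placeholder token with its release ID.

    Mirrors the quickstart's sed commands; returns the number of lines written.
    """
    written = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            for placeholder, release_id in ids.items():
                line = line.replace(placeholder, release_id)
            fout.write(line)
            written += 1
    return written
```

For example, substitute_release_ids(Path("examples/quickstart/baseline-events.jsonl"), Path("baseline-events.jsonl"), {"__BASELINE_RELEASE_ID__": baseline_id}), where baseline_id holds the output of flightdeck release register. Writing with newline="\n" keeps LF line endings on Windows too.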

Examples: examples/quickstart/ · examples/ci/ (policy gate + Actions) · examples/deploy/ (serve via Docker/Compose) · examples/integration/ (HTTP event emitter) · examples/integration/adoption/ (framework hooks).

Documentation

Development

uv sync --frozen --extra dev
uv run python -m ruff check src tests
uv run python -m pytest
uv run flightdeck-quickstart-verify
uv run flightdeck --help

If you change web/ or Pydantic models, also run the static/ and schemas/ drift checks from DEVELOPMENT.md (same gates as .github/workflows/ci.yml). AGENTS.md and .cursor/rules/flightdeck-ci-artifacts.mdc summarize them for humans and Cursor.

See DEVELOPMENT.md for uv and pip setup, verification, troubleshooting, and PyPI releases (tag-driven; not on merge to main).

License

FlightDeck is licensed under the Apache License, Version 2.0 — see LICENSE and NOTICE.

The canonical public repository: https://github.com/flightdeckdev/flightdeck.



Download files


Source Distribution

flightdeck_ai-1.2.0.tar.gz (469.5 kB)

Built Distribution


flightdeck_ai-1.2.0-py3-none-any.whl (158.0 kB)

File details

Details for the file flightdeck_ai-1.2.0.tar.gz.

File metadata

  • Download URL: flightdeck_ai-1.2.0.tar.gz
  • Size: 469.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flightdeck_ai-1.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7da6d7bff75443fa1e03a5f7a439f6fab962f163f4ffe92df81981ea5e0f361 |
| MD5 | dbb99ae9e7406063ad6deff7c96533d0 |
| BLAKE2b-256 | 6ebcb476631705d9fe4d4479ee0c4ebe9332714975afbdf2d3742dc8314b573e |


Provenance

The following attestation bundles were made for flightdeck_ai-1.2.0.tar.gz:

Publisher: release-pypi.yml on flightdeckdev/flightdeck

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flightdeck_ai-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: flightdeck_ai-1.2.0-py3-none-any.whl
  • Size: 158.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flightdeck_ai-1.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2faaaa4ad403716435d64281aea48b0884f943c1b60a2f96f9399b02b72ae92b |
| MD5 | 63766667df4577e9f2a071402b5129e7 |
| BLAKE2b-256 | 28368f29cb51cc2e99f38ec703d5068af3fd549425b7a37aee42f82cecb9f755 |


Provenance

The following attestation bundles were made for flightdeck_ai-1.2.0-py3-none-any.whl:

Publisher: release-pypi.yml on flightdeckdev/flightdeck

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
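To check a downloaded file against the SHA256 digests listed for the 1.2.0 files before installing it, a short standard-library sketch follows; sha256_of and verify are hypothetical helper names. pip's hash-checking mode (--require-hashes with pinned hashes in a requirements file) performs the same check at install time.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "Hashes for ..." listings for 1.2.0.
EXPECTED_SHA256 = {
    "flightdeck_ai-1.2.0.tar.gz": "c7da6d7bff75443fa1e03a5f7a439f6fab962f163f4ffe92df81981ea5e0f361",
    "flightdeck_ai-1.2.0-py3-none-any.whl": "2faaaa4ad403716435d64281aea48b0884f943c1b60a2f96f9399b02b72ae92b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```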
