FluxLoop CLI — Agent evaluation framework

FluxLoop CLI

Command-line interface for the FluxLoop agent evaluation framework.

Installation

pip install fluxloop-cli

Quick Start

# Authenticate
fluxloop auth login

# Create a project and scenario
fluxloop projects create --name my-project
fluxloop init scenario smoke-test
fluxloop scenarios create --name smoke-test --goal "Validate agent accuracy"

# Run a skill test
fluxloop skill validate ./SKILL.md
fluxloop skill test ./SKILL.md --input "Generate a summary" --copy-files src/

# Full test workflow (pull inputs → run → push results)
fluxloop test --scenario smoke-test

Commands

Core

Command                         Description
fluxloop run                    Run the agent over the configured inputs using the selected executor
fluxloop skill validate         Validate a SKILL.md against its static contract
fluxloop skill test             Execute a skill in a sandbox and evaluate its behavior contract
fluxloop skill benchmark        Run N benchmark iterations and report statistics
fluxloop init scenario <name>   Scaffold a new scenario directory
fluxloop context show           Display the current project, scenario, and workspace state
fluxloop test                   Full workflow: pull → run → push
fluxloop test results           View local or remote test results
fluxloop evaluate               Trigger server-side evaluation and wait for it to complete

Authentication & Projects

Command                         Description
fluxloop auth login             Authenticate via the OAuth device code flow
fluxloop auth logout            Remove stored credentials
fluxloop auth status            Show login state and token expiry
fluxloop projects list          List available projects
fluxloop projects create        Create a new project
fluxloop projects select        Set the active project
fluxloop apikeys create         Generate an API key (saved to .fluxloop/.env)
fluxloop apikeys list           List existing API keys

Scenarios & Data Pipeline

Command                         Description
fluxloop scenarios create       Create a scenario on the server
fluxloop scenarios select       Set the active scenario locally
fluxloop scenarios refine       Refine scenario contracts
fluxloop personas suggest       Generate user personas with an LLM
fluxloop inputs synthesize      Generate test inputs from personas
fluxloop inputs list            List available input sets
fluxloop inputs qc              Quality-check generated inputs
fluxloop inputs refine          Refine inputs iteratively
fluxloop bundles publish        Publish input sets as versioned bundles
fluxloop bundles list           List published bundles
fluxloop manifests show         Display the current manifest
fluxloop manifests publish      Publish the manifest to the server
fluxloop data push              Upload knowledge or ground-truth data
fluxloop data bind              Bind uploaded data to a scenario
fluxloop data gt status         Check ground-truth materialization status
fluxloop intent refine          Refine the agent profile and test intent

Sync

Command                         Description
fluxloop sync pull              Pull a bundle (inputs, personas, criteria) from the server
fluxloop sync push              Upload test results to the server

Configuration

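Each scenario lives in its own directory under scenarios/<name>/. YAML configuration files go in configs/, contract definitions in contracts/, and data fetched by fluxloop sync pull in pulled/: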

scenarios/
  smoke-test/
    configs/
      simulation.yaml    # Runner, iterations, conversation settings
      input.yaml         # Input source and items
      scenario.yaml      # Scenario metadata
    contracts/
      static.yaml        # SKILL.md structure rules
      behavior.yaml      # Execution assertions
    pulled/              # Data from sync pull

Runner Types

Configure the executor in simulation.yaml:

Function — call a Python handler directly:

runner:
  type: function
  target: "my_agent:handler"
  timeout_seconds: 30
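The target value uses a module:attribute form. The sketch below shows one common way such a string can be resolved with importlib; it assumes the runner imports the module and looks up the attribute. FluxLoop's actual loader, and the handler signature it expects, are not documented here, so resolve_target is a hypothetical helper.

```python
import importlib
from typing import Callable


def resolve_target(target: str) -> Callable:
    """Resolve a "module:attr" string (e.g. "my_agent:handler") to a callable.

    Hypothetical helper for illustration; FluxLoop's own loader may differ.
    """
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes so "module:Class.method" also works.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    if not callable(obj):
        raise TypeError(f"{target!r} does not resolve to a callable")
    return obj


if __name__ == "__main__":
    dumps = resolve_target("json:dumps")
    print(dumps({"ok": True}))  # prints {"ok": true}
```

With the configuration above, resolve_target("my_agent:handler") would import my_agent.py from the import path and return its handler function.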

Skill — run a SKILL.md in a Claude Agent SDK sandbox:

runner:
  type: skill
  skill_path: ./SKILL.md
  harness: claude
  allowed_tools: ["Read", "Write", "Shell"]
  skill_max_turns: 10
  budget: 0.50

Process — invoke a subprocess via NDJSON protocol:

runner:
  type: process
  command: ["python", "agent.py"]
  protocol: ndjson
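With protocol: ndjson, the runner exchanges newline-delimited JSON with the subprocess. The message schema is not specified here, so this sketch of an agent.py loop assumes, hypothetically, one request object per line on stdin carrying an "input" field and one response object per line on stdout carrying an "output" field. Check the real field names before relying on it.

```python
import json
from typing import TextIO


def respond(text: str) -> str:
    # Placeholder agent logic; replace with a real model call.
    return f"Echo: {text}"


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON object per line and write one JSON object per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        request = json.loads(line)
        # Hypothetical schema: {"input": "..."} in, {"output": "..."} out.
        reply = {"output": respond(request.get("input", ""))}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush per message so the parent process sees it immediately
```

In agent.py, the entry point would call serve(sys.stdin, sys.stdout). Flushing after every line matters because a parent process reading the pipe line by line would otherwise block on buffered output.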

Input Sources

input:
  source: inline          # inline | generated | bundle | pulled
  items:
    - text: "Hello, summarize this document"
    - text: "What are the key takeaways?"

When source is pulled, inputs are loaded from pulled/inputs.json, which is written by fluxloop sync pull.

Environment Variable Substitution

YAML config values support ${VAR} syntax, resolved from environment variables.
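A minimal sketch of ${VAR} substitution applied recursively to a loaded config, using a regular expression. How FluxLoop treats unset variables (raising an error, substituting an empty string, or leaving the placeholder) is not stated here; this sketch assumes the placeholder is left unchanged. The function name substitute_env is hypothetical.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${VAR} in strings inside dicts and lists."""
    if isinstance(value, str):
        # Assumption: unset variables are left as the literal ${VAR}.
        return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value  # numbers, booleans, and None pass through unchanged
```

For example, with a hypothetical AGENT_TARGET=my_agent:handler in the environment, a runner entry of target: "${AGENT_TARGET}" would resolve to my_agent:handler while timeout_seconds: 30 stays an integer.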

Contracts

Static Contract

Validates SKILL.md structure before execution:

  • Required sections (e.g., # Purpose, # Instructions)
  • File size limits
  • Encoding checks
  • Forbidden pattern detection
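The four checks listed above can be approximated in a few lines. This is a hypothetical sketch, not FluxLoop's validator: the required headings mirror the example above, but the 50,000-byte limit and the forbidden patterns are illustrative values. The real rules come from contracts/static.yaml, whose schema is not shown here.

```python
import re
from pathlib import Path

# Illustrative rule values; real ones would be read from contracts/static.yaml.
REQUIRED_SECTIONS = ["# Purpose", "# Instructions"]
MAX_BYTES = 50_000
FORBIDDEN_PATTERNS = [r"rm\s+-rf\s+/", r"(?i)api[_-]?key\s*="]


def check_skill_bytes(raw: bytes) -> list[str]:
    """Return a list of violations for the raw contents of a SKILL.md file."""
    problems: list[str] = []
    # File size limit.
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes, limit is {MAX_BYTES}")
    # Encoding check: stop early if the text cannot be decoded.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return problems + [f"not valid UTF-8: {exc}"]
    # Required sections: each heading must appear as its own line.
    lines = {line.strip() for line in text.splitlines()}
    for section in REQUIRED_SECTIONS:
        if section not in lines:
            problems.append(f"missing required section {section!r}")
    # Forbidden pattern detection.
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, text):
            problems.append(f"forbidden pattern matched: {pattern}")
    return problems


def check_skill_file(path: str) -> list[str]:
    return check_skill_bytes(Path(path).read_bytes())
```

An empty list means the file passed every rule; returning all violations at once, rather than stopping at the first, mirrors how a validator can report everything in a single run.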

Behavior Contract

Asserts conditions on execution results:

  • tool_called / tool_not_called
  • turn_count (min/max)
  • output_contains / output_matches
  • file_exists
  • cost_below / duration_below
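A hypothetical sketch of how assertions like these could be evaluated against a single run result. The RunResult fields, the assertion dictionary shape, and the check function are assumptions for illustration; the actual schema of behavior.yaml is not documented here.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class RunResult:
    # Hypothetical shape of one execution trace.
    output: str
    tools_called: list[str] = field(default_factory=list)
    turns: int = 0
    cost_usd: float = 0.0
    duration_s: float = 0.0


def check(result: RunResult, assertion: dict[str, Any], workdir: Path = Path(".")) -> bool:
    """Evaluate one assertion such as {"type": "tool_called", "tool": "Read"}."""
    kind = assertion["type"]
    if kind == "tool_called":
        return assertion["tool"] in result.tools_called
    if kind == "tool_not_called":
        return assertion["tool"] not in result.tools_called
    if kind == "turn_count":
        lo = assertion.get("min", 0)
        hi = assertion.get("max", float("inf"))
        return lo <= result.turns <= hi
    if kind == "output_contains":
        return assertion["text"] in result.output
    if kind == "output_matches":
        return re.search(assertion["pattern"], result.output) is not None
    if kind == "file_exists":
        return (workdir / assertion["path"]).exists()
    if kind == "cost_below":
        return result.cost_usd < assertion["max"]
    if kind == "duration_below":
        return result.duration_s < assertion["max"]
    raise ValueError(f"unknown assertion type: {kind}")
```

Raising on an unknown assertion type, instead of silently passing, keeps a typo in a contract file from hiding a failure.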

Authentication

FluxLoop uses OAuth device code flow for interactive login:

fluxloop auth login              # Opens browser for approval
fluxloop auth login --no-browser # Manual code entry
fluxloop auth login --no-wait    # Save pending, resume later
fluxloop auth login --resume     # Resume pending login

Tokens are stored in ~/.fluxloop/auth.json. In CI environments, where interactive login is not possible, set FLUXLOOP_API_KEY instead.
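In the OAuth device code flow (RFC 8628), the CLI obtains a device code plus a short user code, the user approves it in a browser, and the CLI polls the token endpoint until approval. The sketch below covers only that polling step, following the error codes RFC 8628 defines. The HTTP call is injected as a function so the sketch runs offline; FluxLoop's endpoints and client code are not shown here, so poll_for_token and request_token are hypothetical names.

```python
import time
from typing import Callable, Mapping

TokenResponse = Mapping[str, object]


class DeviceFlowError(RuntimeError):
    pass


def poll_for_token(
    request_token: Callable[[], TokenResponse],
    interval: float = 5.0,      # RFC 8628 default when the server sends none
    expires_in: float = 900.0,  # illustrative; the server supplies the real value
    sleep: Callable[[float], None] = time.sleep,
) -> TokenResponse:
    """Poll the token endpoint, handling the errors in RFC 8628, section 3.5."""
    waited = 0.0
    while waited < expires_in:
        sleep(interval)
        waited += interval
        response = request_token()
        error = response.get("error")
        if error is None:
            return response  # success: contains access_token and related fields
        if error == "authorization_pending":
            continue  # the user has not approved yet
        if error == "slow_down":
            interval += 5  # RFC 8628 requires adding 5 seconds to the interval
            continue
        # access_denied, expired_token, or any other error is terminal.
        raise DeviceFlowError(str(error))
    raise DeviceFlowError("device code expired before approval")
```

Honoring slow_down matters: a client that keeps polling at the original rate after that response is violating the specification and may be throttled further.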

Environment Variables

Variable                Purpose
FLUXLOOP_API_URL        Backend API base URL
FLUXLOOP_API_KEY        API key for authenticated requests
FLUXLOOP_SYNC_API_KEY   API key used specifically for sync operations
ANTHROPIC_API_KEY       Required for the multi-turn UserSimulator
OPENAI_API_KEY          Alternative provider for the UserSimulator

Workspace-level variables can also be set in .fluxloop/.env.
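A minimal sketch of reading KEY=VALUE lines from a file like .fluxloop/.env. Whether FluxLoop lets variables already set in the shell override the file (as assumed here) or the reverse is not stated, and load_env_file is a hypothetical name. For real projects, python-dotenv's load_dotenv handles more of the format.

```python
import os
from pathlib import Path
from typing import MutableMapping


def load_env_file(path: Path, environ: MutableMapping[str, str] = os.environ) -> dict[str, str]:
    """Load KEY=VALUE pairs without overriding variables that are already set."""
    loaded: dict[str, str] = {}
    if not path.is_file():
        return loaded
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip().removeprefix("export ").strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip matching surrounding quotes
        if key and key not in environ:  # assumption: shell variables take precedence
            environ[key] = value
            loaded[key] = value
    return loaded
```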

Output

Test runs produce standardized output in .fluxloop/results/<experiment>-<timestamp>/:

File                    Content
trace_summary.jsonl     Per-run execution traces (tool calls, tokens, cost)
summary.json            Aggregated statistics (success rate, duration, cost)
errors.json             Failure inventory with diagnostics
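To inspect a run outside the CLI, trace_summary.jsonl can be read one JSON object per line. The field names used below (success, duration_s, cost_usd) are assumptions, not FluxLoop's documented schema; adjust them to the keys in your own files. summarize_traces is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_traces(path: Path) -> dict[str, float]:
    """Aggregate a JSONL trace file into success rate, mean duration, and total cost."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # ignore blank lines
                records.append(json.loads(line))
    if not records:
        return {"runs": 0, "success_rate": 0.0, "mean_duration_s": 0.0, "total_cost_usd": 0.0}
    # Hypothetical field names; check a real trace_summary.jsonl for the actual keys.
    successes = sum(1 for r in records if r.get("success"))
    durations = [float(r.get("duration_s", 0.0)) for r in records]
    costs = [float(r.get("cost_usd", 0.0)) for r in records]
    return {
        "runs": len(records),
        "success_rate": successes / len(records),
        "mean_duration_s": sum(durations) / len(durations),
        "total_cost_usd": sum(costs),
    }
```

Reading line by line keeps memory flat even for large result sets, which is the main reason the traces use JSONL rather than one JSON array.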

Developing

cd cli/python

# Install dependencies
uv sync --group dev

# Run in development mode
uv run fluxloop --help

# Run tests
uv run pytest

# Lint
uv run ruff check .

Building & Publishing

# Build
uv build

# Publish to PyPI
uv publish
# or
twine upload dist/*

Tech Stack

Library             Purpose
Typer               CLI framework
Pydantic            Data validation
ruamel.yaml         YAML parsing
httpx               HTTP client
Rich                Terminal output formatting
claude-agent-sdk    Skill execution in the Claude sandbox

License

Apache-2.0

Download files

Source Distribution

fluxloop_cli-0.4.0.tar.gz (153.5 kB)

Built Distribution

fluxloop_cli-0.4.0-py3-none-any.whl (86.0 kB)

File details

Details for the file fluxloop_cli-0.4.0.tar.gz.

File metadata

  • Download URL: fluxloop_cli-0.4.0.tar.gz
  • Size: 153.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 (macOS)

File hashes

Hashes for fluxloop_cli-0.4.0.tar.gz
Algorithm      Hash digest
SHA256         6c0af643051d39e88614e874895290410157bbea7ecac1edfd114e61db66afb9
MD5            27fc82b4b766b245c19caae5b6e97bc8
BLAKE2b-256    fab79bf0df6d07d4cf348fd70ac3aeb209a3b64fad6802cb59e63b02f85ebb2f

File details

Details for the file fluxloop_cli-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: fluxloop_cli-0.4.0-py3-none-any.whl
  • Size: 86.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2 (macOS)

File hashes

Hashes for fluxloop_cli-0.4.0-py3-none-any.whl
Algorithm      Hash digest
SHA256         aea1d56f80f66233a04f0a125ed14ff0613430149dc85eb00567565305d9c2bd
MD5            3ac5e2e8a15adf6ec8b0c01e8499145a
BLAKE2b-256    83f6ef0e4b4d7bca731eee9f88abbc17855cb51c4ba9e5f721ee1346679f3ca3