FluxLoop CLI — Agent evaluation framework
FluxLoop CLI
Command-line interface for the FluxLoop agent evaluation framework.
Installation
pip install fluxloop-cli
Quick Start
# Authenticate
fluxloop auth login
# Create a project and scenario
fluxloop projects create --name my-project
fluxloop init scenario smoke-test
fluxloop scenarios create --name smoke-test --goal "Validate agent accuracy"
# Run a skill test
fluxloop skill validate ./SKILL.md
fluxloop skill test ./SKILL.md --input "Generate a summary" --copy-files src/
# Full test workflow (pull inputs → run → push results)
fluxloop test --scenario smoke-test
Commands
Core
| Command | Description |
|---|---|
| fluxloop run | Run agent over configured inputs using the selected executor |
| fluxloop skill validate | Validate SKILL.md against static contracts |
| fluxloop skill test | Execute a skill in sandbox with behavior contract evaluation |
| fluxloop skill benchmark | Run N benchmark iterations and report stats |
| fluxloop init scenario <name> | Scaffold a new scenario directory |
| fluxloop context show | Display current project, scenario, and workspace state |
| fluxloop test | Full workflow: pull → run → push |
| fluxloop test results | View local or remote test results |
| fluxloop evaluate | Trigger server-side evaluation and wait for completion |
Authentication & Projects
| Command | Description |
|---|---|
| fluxloop auth login | Authenticate via device code flow |
| fluxloop auth logout | Remove stored credentials |
| fluxloop auth status | Show login state and token expiry |
| fluxloop projects list | List available projects |
| fluxloop projects create | Create a new project |
| fluxloop projects select | Set active project |
| fluxloop apikeys create | Generate an API key (saved to .fluxloop/.env) |
| fluxloop apikeys list | List existing API keys |
Scenarios & Data Pipeline
| Command | Description |
|---|---|
| fluxloop scenarios create | Create a scenario on the server |
| fluxloop scenarios select | Set active scenario locally |
| fluxloop scenarios refine | Refine scenario contracts |
| fluxloop personas suggest | Generate user personas via LLM |
| fluxloop inputs synthesize | Generate test inputs from personas |
| fluxloop inputs list | List available input sets |
| fluxloop inputs qc | Quality-check generated inputs |
| fluxloop inputs refine | Refine inputs iteratively |
| fluxloop bundles publish | Publish input sets as versioned bundles |
| fluxloop bundles list | List published bundles |
| fluxloop manifests show | Display current manifest |
| fluxloop manifests publish | Publish manifest to server |
| fluxloop data push | Upload knowledge or ground-truth data |
| fluxloop data bind | Bind uploaded data to a scenario |
| fluxloop data gt status | Check ground-truth materialization status |
| fluxloop intent refine | Refine agent profile and test intent |
Sync
| Command | Description |
|---|---|
| fluxloop sync pull | Pull bundle (inputs, personas, criteria) from server |
| fluxloop sync push | Upload test results to server |
Configuration
Scenario configuration lives in YAML files under scenarios/<name>/configs/:
scenarios/
  smoke-test/
    configs/
      simulation.yaml   # Runner, iterations, conversation settings
      input.yaml        # Input source and items
      scenario.yaml     # Scenario metadata
    contracts/
      static.yaml       # SKILL.md structure rules
      behavior.yaml     # Execution assertions
    pulled/             # Data from sync pull
Runner Types
Configure the executor in simulation.yaml:
Function — call a Python handler directly:
runner:
  type: function
  target: "my_agent:handler"
  timeout_seconds: 30
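The target string above reads like a module:attribute reference. As a minimal sketch of what that could mean in practice, the block below shows a hypothetical handler and one way to resolve such a string with importlib. The handler signature (one text argument, one string returned) and the resolve_target helper are assumptions for illustration, not FluxLoop's documented contract.

```python
import importlib
from typing import Callable


def handler(text: str) -> str:
    """Hypothetical agent entry point: takes input text, returns a reply."""
    return f"echo: {text.strip()}"


def resolve_target(target: str) -> Callable:
    """Resolve a 'module:attribute' string to a callable (illustrative helper)."""
    module_name, sep, attr = target.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{target!r} does not point at a callable")
    return func
```

With a file my_agent.py on the import path that defines handler, resolve_target("my_agent:handler") would return that function.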
Skill — run a SKILL.md in Claude Agent SDK sandbox:
runner:
  type: skill
  skill_path: ./SKILL.md
  harness: claude
  allowed_tools: ["Read", "Write", "Shell"]
  skill_max_turns: 10
  budget: 0.50
Process — invoke a subprocess via NDJSON protocol:
runner:
  type: process
  command: ["python", "agent.py"]
  protocol: ndjson
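NDJSON means one JSON object per line. The sketch below shows the general shape an agent.py could take under that protocol: read a request per line, write a response per line, and flush so the parent process sees it immediately. The field names input, output, and error are assumptions; the message schema FluxLoop expects is not described here.

```python
import io
import json
from typing import TextIO


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Answer each NDJSON request line with exactly one NDJSON response line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        try:
            request = json.loads(line)
            if not isinstance(request, dict):
                raise ValueError("request must be a JSON object")
            # Hypothetical schema: {"input": "..."} in, {"output": "..."} out.
            response = {"output": f"received: {request.get('input', '')}"}
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            response = {"error": str(exc)}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()


# Demonstration with in-memory streams instead of real pipes.
demo_out = io.StringIO()
serve(io.StringIO('{"input": "hi"}\n'), demo_out)
print(demo_out.getvalue(), end="")
```

In a real agent.py the call would be serve(sys.stdin, sys.stdout).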
Input Sources
input:
  source: inline  # inline | generated | bundle | pulled
  items:
    - text: "Hello, summarize this document"
    - text: "What are the key takeaways?"
When source is pulled, inputs are loaded from pulled/inputs.json, so run fluxloop sync pull first.
Environment Variable Substitution
YAML config values support ${VAR} syntax, resolved from environment variables.
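To make the substitution concrete, here is a minimal sketch of ${VAR} expansion over an already-parsed config. The substitute helper is hypothetical, and raising on an unset variable is an assumption; the CLI may behave differently, for example by leaving the placeholder in place.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively expand ${VAR} in strings nested inside dicts and lists."""
    if isinstance(value, str):
        def repl(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                # Assumption: an unset variable is treated as an error.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _VAR.sub(repl, value)
    if isinstance(value, dict):
        return {key: substitute(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Passing an explicit mapping as env keeps the helper easy to test without touching the real environment.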
Contracts
Static Contract
Validates SKILL.md structure before execution:
- Required sections (e.g., # Purpose, # Instructions)
- File size limits
- Encoding checks
- Forbidden pattern detection
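The four checks listed above can be sketched in a few lines. This is an illustration of the idea only: the check_skill_text helper, the 64,000-byte limit, the UTF-8 requirement, and the sample forbidden pattern are all assumptions, not values taken from FluxLoop or its static.yaml format.

```python
import re
from typing import List, Sequence


def check_skill_text(
    raw: bytes,
    required_sections: Sequence[str] = ("# Purpose", "# Instructions"),
    max_bytes: int = 64_000,                            # assumed limit
    forbidden_patterns: Sequence[str] = (r"rm\s+-rf",),  # assumed pattern
) -> List[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems: List[str] = []
    if len(raw) > max_bytes:
        problems.append(f"file too large: {len(raw)} > {max_bytes} bytes")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Without decodable text the remaining checks cannot run.
        return problems + ["file is not valid UTF-8"]
    headings = {line.strip() for line in text.splitlines() if line.startswith("#")}
    for section in required_sections:
        if section not in headings:
            problems.append(f"missing section: {section}")
    for pattern in forbidden_patterns:
        if re.search(pattern, text):
            problems.append(f"forbidden pattern: {pattern}")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a single validation run report everything that needs fixing.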
Behavior Contract
Asserts conditions on execution results:
- tool_called / tool_not_called
- turn_count (min/max)
- output_contains / output_matches
- file_exists
- cost_below / duration_below
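As a rough illustration of how these assertions might be evaluated, the sketch below checks a result dictionary against a contract dictionary. The shapes of both dictionaries (keys such as tool_calls, turns, output, cost, duration) and the check_behavior helper are assumptions; the real behavior.yaml schema is not shown in this README.

```python
import re
from pathlib import Path
from typing import Any, Dict, List


def check_behavior(
    result: Dict[str, Any], contract: Dict[str, Any], workdir: str = "."
) -> List[str]:
    """Return the failed assertions; an empty list means the contract holds."""
    failures: List[str] = []
    tools = set(result.get("tool_calls", []))
    for name in contract.get("tool_called", []):
        if name not in tools:
            failures.append(f"tool_called: {name}")
    for name in contract.get("tool_not_called", []):
        if name in tools:
            failures.append(f"tool_not_called: {name}")

    bounds = contract.get("turn_count", {})
    turns = result.get("turns", 0)
    if "min" in bounds and turns < bounds["min"]:
        failures.append(f"turn_count: {turns} < min {bounds['min']}")
    if "max" in bounds and turns > bounds["max"]:
        failures.append(f"turn_count: {turns} > max {bounds['max']}")

    output = result.get("output", "")
    for needle in contract.get("output_contains", []):
        if needle not in output:
            failures.append(f"output_contains: {needle}")
    for pattern in contract.get("output_matches", []):
        if not re.search(pattern, output):
            failures.append(f"output_matches: {pattern}")

    for rel in contract.get("file_exists", []):
        if not (Path(workdir) / rel).exists():
            failures.append(f"file_exists: {rel}")

    # Both limits are strict upper bounds on a numeric field of the result.
    for key, field in (("cost_below", "cost"), ("duration_below", "duration")):
        if key in contract and not result.get(field, 0.0) < contract[key]:
            failures.append(f"{key}: {contract[key]}")
    return failures
```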
Authentication
FluxLoop uses OAuth device code flow for interactive login:
fluxloop auth login # Opens browser for approval
fluxloop auth login --no-browser # Manual code entry
fluxloop auth login --no-wait # Save pending, resume later
fluxloop auth login --resume # Resume pending login
Tokens are stored in ~/.fluxloop/auth.json. For CI environments, use FLUXLOOP_API_KEY instead.
Environment Variables
| Variable | Purpose |
|---|---|
| FLUXLOOP_API_URL | Backend API base URL |
| FLUXLOOP_API_KEY | API key for authenticated requests |
| FLUXLOOP_SYNC_API_KEY | API key specifically for sync operations |
| ANTHROPIC_API_KEY | Required for multi-turn UserSimulator |
| OPENAI_API_KEY | Alternative provider for UserSimulator |
Workspace-level variables can also be set in .fluxloop/.env.
Output
Test runs produce standardized output in .fluxloop/results/<experiment>-<timestamp>/:
| File | Content |
|---|---|
| trace_summary.jsonl | Per-run execution traces (tool calls, tokens, cost) |
| summary.json | Aggregated statistics (success rate, duration, cost) |
| errors.json | Failure inventory with diagnostics |
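For post-processing results in a script, a small reader along these lines may be enough. It locates the most recently modified run directory and iterates over a JSONL file one record at a time. The helper names are hypothetical, and since the field names inside each record are not documented here, the sketch returns the parsed objects untouched.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Optional


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def latest_result_dir(root: Path) -> Optional[Path]:
    """Return the most recently modified subdirectory of root, if any."""
    if not root.is_dir():
        return None
    dirs = [entry for entry in root.iterdir() if entry.is_dir()]
    return max(dirs, key=lambda entry: entry.stat().st_mtime, default=None)
```

A typical call would be latest_result_dir(Path(".fluxloop/results")), followed by read_jsonl(run_dir / "trace_summary.jsonl") on the returned directory.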
Developing
cd cli/python
# Install dependencies
uv sync --group dev
# Run in development mode
uv run fluxloop --help
# Run tests
uv run pytest
# Lint
uv run ruff check .
Building & Publishing
# Build
uv build
# Publish to PyPI
uv publish
# or
twine upload dist/*
Tech Stack
| Library | Purpose |
|---|---|
| Typer | CLI framework |
| Pydantic | Data validation |
| ruamel.yaml | YAML parsing |
| httpx | HTTP client |
| Rich | Terminal output formatting |
| claude-agent-sdk | Skill execution in Claude sandbox |
License
Apache-2.0