CLI for running and managing Connexity agent evaluations
connexity-cli
Command-line client for Connexity — drive eval runs, manage agents and test cases, and gate CI on regressions, all from the terminal.
connexity-cli is a thin wrapper over the Connexity REST API. It covers the public surface used to drive eval workflows from CI: auth, agents, eval configs, test cases, runs (with SSE streaming), custom metrics, prompt editor, integrations, environments (including deploy + deployment history), calls, config, and health. Account self-service (signup, password reset) stays in the web UI.
Installation
pip install connexity-cli
The wheel pulls in only click, httpx, and httpx-sse — no FastAPI, no SQLModel, no LLM SDKs.
Authentication
The CLI authenticates against a Connexity API server using a Bearer JWT.
| Source | When used |
|---|---|
| --token / --api-url flags | Highest precedence: explicit per invocation |
| CONNEXITY_CLI_API_TOKEN / CONNEXITY_CLI_API_URL env vars | Typical CI usage |
| ~/.config/connexity-cli/credentials.json (mode 0600) | Written by connexity-cli login --save |
# One-time interactive login (writes credentials file)
connexity-cli login --email me@example.com --save
# Or set env vars in CI
export CONNEXITY_CLI_API_URL=https://evals.example.com
export CONNEXITY_CLI_API_TOKEN="$CI_EVALS_TOKEN"
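The precedence order in the table can be pictured as a simple fall-through lookup. The sketch below is an illustration only, not the CLI's actual implementation: the function name `resolve_setting` and the JSON key passed as `file_key` are hypothetical, since this README does not document the layout of credentials.json.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Path taken from the table above.
CREDENTIALS_PATH = Path.home() / ".config" / "connexity-cli" / "credentials.json"


def resolve_setting(
    flag_value: Optional[str],
    env_name: str,
    file_key: str,
    env: Mapping[str, str] = os.environ,
    credentials_path: Path = CREDENTIALS_PATH,
) -> Optional[str]:
    """Return the first value found: flag, then env var, then credentials file."""
    # 1. An explicit flag wins over everything else.
    if flag_value:
        return flag_value
    # 2. Environment variable (the typical CI path).
    if env.get(env_name):
        return env[env_name]
    # 3. Saved credentials file; file_key is a hypothetical key name.
    try:
        stored = json.loads(credentials_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(stored, dict):
        return None
    return stored.get(file_key)
```

For example, `resolve_setting(None, "CONNEXITY_CLI_API_TOKEN", "token")` would consult the environment first and only then fall back to the saved file.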
Quick start
# Inspect resources
connexity-cli agents list
connexity-cli eval-configs list
connexity-cli test-cases list --tag smoke
# Author resources from JSON files (use "-" to read stdin)
connexity-cli agents create --from-file ./agent.json
connexity-cli eval-configs members replace <eval-config-id> --from-file ./members.json
# End-to-end: trigger a run, wait for completion, mark as baseline
connexity-cli run \
--agent my-agent \
--eval-config smoke-suite \
--stream \
--set-baseline
# CI gate: trigger a run AND fail if it doesn't clear the eval-config thresholds
# (exits 1 when metrics_passed=false or cases_passed=false; --no-fail-on-thresholds opts out)
connexity-cli run \
--agent my-agent \
--eval-config smoke-suite \
--metrics-pass-threshold 80 \
--cases-pass-threshold 100
# CI gate: regression check against the baseline (exits 1 on regression
# OR when the candidate fails its own CS-127 thresholds)
connexity-cli compare --candidate <run-id> --against-baseline
# Deploy a pre-validated agent version to Retell via a configured environment
# (eval-gated environments reject the deploy when thresholds fail)
connexity-cli environments deploy <env-id> --agent-version 7
# Stream agent execution events live
connexity-cli runs stream <run-id>
# AI-assisted prompt editing — SSE events go to stderr, final assistant
# message + edited_prompt to stdout (drops to non-streaming when piping)
connexity-cli prompt-editor chat <session-id> --message "tighten the refusal prose"
# JSON output for piping into jq
connexity-cli --output json agents list | jq '.data[].name'
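When jq is not available, the same extraction can be done from a script. This is a minimal sketch that assumes the JSON output has the shape implied by the jq filter above (a top-level `data` array whose items carry a `name`); the helper names are hypothetical.

```python
import json
import subprocess
from typing import List


def agent_names(payload: str) -> List[str]:
    """Mirror the jq filter `.data[].name` on JSON captured from stdout."""
    document = json.loads(payload)
    return [item["name"] for item in document["data"]]


def list_agent_names() -> List[str]:
    """Run the CLI in JSON mode and return the agent names."""
    completed = subprocess.run(
        ["connexity-cli", "--output", "json", "agents", "list"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return agent_names(completed.stdout)
```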
Authoring patterns
Every command that creates or updates a resource takes a single --from-file PATH (or --from-file - for stdin) with a JSON body that matches the backend Pydantic schema (e.g. AgentCreate, RunCreate, EvalConfigCreate, CustomMetricCreate). The CLI does no schema duplication — the server validates and returns clear errors.
# Create an agent from a file
echo '{"name": "support-bot", "endpoint_url": "https://my-agent.example/api"}' \
| connexity-cli agents create --from-file -
# Patch an eval config
connexity-cli eval-configs update smoke-suite --from-file ./patch.json
# Run with a full RunConfig (judge_config, simulator_config, metrics_selection, ...)
connexity-cli runs create --from-file ./run.json --auto-execute
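The stdin form is convenient when the body is generated by a script rather than stored on disk. A minimal sketch, reusing the two fields from the example above; the helper names are hypothetical, and any further AgentCreate fields are left to the server-side schema.

```python
import json
import subprocess
from typing import List, Tuple


def build_create_agent_call(name: str, endpoint_url: str) -> Tuple[List[str], str]:
    """Return the argv and JSON body for `agents create --from-file -`."""
    argv = ["connexity-cli", "agents", "create", "--from-file", "-"]
    body = json.dumps({"name": name, "endpoint_url": endpoint_url})
    return argv, body


def create_agent(name: str, endpoint_url: str) -> int:
    """Pipe the JSON body to the CLI on stdin and return its exit code."""
    argv, body = build_create_agent_call(name, endpoint_url)
    completed = subprocess.run(argv, input=body, text=True)
    return completed.returncode
```

Because the CLI does not duplicate the schema, a malformed body surfaces as a server validation error rather than a local one.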
Pass/fail thresholds (CS-127)
Every run carries two run-level pass/fail dimensions, snapshotted from the eval config and overridable per run:
| Threshold | Meaning | Default |
|---|---|---|
| metrics_pass_threshold | Weighted average of the judge overall_score across cases that produced a verdict (0-100) | 80 |
| cases_pass_threshold | Fraction of cases that pass / total executions, errored cases counting as not-passed (0-100) | 100 |
connexity-cli run and connexity-cli compare gate their exit code on these by default. Override per invocation:
connexity-cli run \
--agent my-agent \
--eval-config smoke-suite \
--metrics-pass-threshold 75 \
--cases-pass-threshold 95
Pass --no-fail-on-thresholds to print the verdict but exit 0 regardless. Full formula and rationale: docs/scoring-and-thresholds.md.
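To make the two dimensions concrete, here is a rough sketch of how a verdict could be derived from per-case results. It is not the authoritative formula (that lives in docs/scoring-and-thresholds.md): the `CaseResult` type, the per-case `weight` field, and the use of `>=` for the comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class CaseResult:
    passed: bool
    overall_score: Optional[float]  # None when the case errored or has no verdict
    weight: float = 1.0  # hypothetical weighting; the real scheme is not shown here


def evaluate_run(
    results: List[CaseResult],
    metrics_pass_threshold: float = 80.0,
    cases_pass_threshold: float = 100.0,
) -> Dict[str, Union[float, bool]]:
    """Compute both run-level scores (0-100) and compare them to the thresholds."""
    # Metrics dimension: only cases that produced a verdict contribute.
    scored = [r for r in results if r.overall_score is not None]
    total_weight = sum(r.weight for r in scored)
    metrics_score = (
        sum(r.overall_score * r.weight for r in scored) / total_weight
        if total_weight
        else 0.0
    )
    # Cases dimension: errored cases stay in the denominator as not passed.
    cases_score = (
        100.0 * sum(1 for r in results if r.passed) / len(results)
        if results
        else 0.0
    )
    return {
        "metrics_score": metrics_score,
        "cases_score": cases_score,
        "metrics_passed": metrics_score >= metrics_pass_threshold,  # assumed >=
        "cases_passed": cases_score >= cases_pass_threshold,  # assumed >=
    }
```

With two scored cases (90 and 80) and one errored case, this sketch yields a metrics score of 85 and a cases score of about 66.7, so the run would clear the default metrics threshold but not the default cases threshold.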
Output formats
Two formats are supported, switchable per-command via --output or globally via --output on the root group:
- table (default): human-readable tables with auto-detected column widths
- json: pretty-printed JSON, friendly to jq / gron / scripting
Command tree
Each top-level group mirrors a backend router:
| Group | Purpose |
|---|---|
| login / logout / whoami | Auth & session |
| agents | CRUD, draft/publish/rollback, versions, version diff, guidelines |
| eval-configs | CRUD, member (test-case) management |
| test-cases | CRUD, bulk import/export, generate, AI editor |
| test-case-results | Per-test-case run result CRUD |
| runs | CRUD, execute, cancel, stream (SSE), baselines, compare, suggestions |
| custom-metrics | CRUD plus LLM-backed metric preview generation |
| prompt-editor | Sessions, messages, presets, streaming chat |
| integrations | Third-party providers (Retell), connection test, list provider-side agents |
| environments | Bindings + deploy, retell-versions, deployments list (history) |
| calls | Observed external calls (Retell), refresh / mark-seen |
| config | Read-only API metadata, available metrics, LLM models |
| health | Server health probe |
| run / compare / baseline | Top-level convenience wrappers for common one-shot CI workflows |
Run connexity-cli <group> --help (or connexity-cli <group> <subcommand> --help) to see flags and arguments.
Subcommand reference (selected)
Not exhaustive — run --help for the full set. These are the commands you'll reach for in CI and day-to-day work:
agents list | show <ref> | create | update <id> | delete <id>
versions list <ref> | versions show <ref> <n> | versions diff <ref> <a> <b>
draft get <ref> | draft set <ref> | draft discard <ref>
publish <ref> | rollback <ref> --to-version <n>
guidelines get <ref> | guidelines update <ref>
runs list | show <id> | create | update <id> | delete <id>
execute <id> | cancel <id> | stream <id>
baseline get --agent <ref> --eval-config <ref> | baseline set <id>
compare --baseline <id> --candidate <id> [--include-analysis] [--fail-on-thresholds]
compare-suggestions --baseline <id> --candidate <id>
environments list --agent <ref> | create | delete <id>
deploy <env-id> --agent-version <n>
retell-versions <env-id>
deployments list (--agent <ref> | --env-id <id>)
prompt-editor sessions list | sessions show <id> | sessions create | sessions delete <id>
messages list <session-id> | chat <session-id> --message "..."
presets list
custom-metrics list | show <id> | create | update <id> | delete <id> | preview | generate
test-cases list | show <id> | create | update <id> | delete <id>
import <file> [--overwrite] | export | generate | ai create
# Top-level convenience wrappers for the most common CI flows:
run --agent <ref> --eval-config <ref> [--metrics-pass-threshold N] [--cases-pass-threshold N] [--stream] [--set-baseline]
compare --candidate <id> (--baseline <id> | --against-baseline)
baseline get | set <id>
Exit codes
- 0: success
- 1: the operation completed but indicates failure (run failed / cancelled, regression detected, candidate failed its CS-127 thresholds (default-on, opt out with --no-fail-on-thresholds), deploy returned status=failed, or import returned errors)
- 2: argument / configuration error, timeout, network failure
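A CI wrapper may want to treat these codes differently, for instance retrying on transport problems but never on a genuine gate failure. The sketch below shows one way to do that; the retry policy and the names `ExitCode`, `classify_exit`, and `should_retry` are suggestions for illustration, not features of the CLI.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    FAILURE = 1  # run failed, regression, thresholds missed, failed deploy/import
    USAGE_OR_TRANSPORT = 2  # bad arguments/config, timeout, network failure


def classify_exit(code: int) -> str:
    """Map a connexity-cli exit code to a coarse CI outcome label."""
    if code == ExitCode.SUCCESS:
        return "pass"
    if code == ExitCode.FAILURE:
        return "gate-failed"
    if code == ExitCode.USAGE_OR_TRANSPORT:
        return "infrastructure-error"
    return "unknown"


def should_retry(code: int) -> bool:
    """Retry only when the failure might be transient (timeout or network)."""
    # Exit code 2 also covers argument errors, which a retry will not fix,
    # so callers should cap the number of attempts.
    return code == ExitCode.USAGE_OR_TRANSPORT
```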
License
MIT — see LICENSE.