Skip to main content

Unified CLI for the Agent Quality Toolkit (agentmd, coderace, agentlint, agentreflect)

Project description

agentkit-cli

Unified CLI for the Agent Quality Toolkit (agentmd, coderace, agentlint, agentreflect).

Installation

pip install agentkit-cli

Quick Start

pip install agentkit-cli
agentkit quickstart    # ๐Ÿš€ fastest path to a score โ€” start here

agentkit quickstart checks your toolchain, runs a fast composite score (agentlint + agentmd), prints a beautiful Rich summary, and optionally publishes a shareable score card โ€” all in under 60 seconds.

agentkit run           # run the full pipeline
agentkit score         # compute composite score
agentkit gate          # fail if score < threshold
agentkit org github:vercel   # score every public repo in a GitHub org

Configuration

agentkit uses .agentkit.toml for project-level configuration.

agentkit config init       # create .agentkit.toml with defaults
agentkit config show       # show effective config with sources
agentkit config set gate.min_score 80
agentkit config get gate.min_score

Config Precedence

CLI flags > env vars > project .agentkit.toml > user config > defaults

Profiles

Profiles are named presets for gate thresholds, notify config, and sweep targets. Switch your entire quality policy in one command.

Built-in Presets

Profile Min Score Max Drop Notify On Gate
strict 85 3 fail enabled
balanced 70 10 never enabled
minimal 50 20 never disabled

Usage

# Switch to strict quality standards
agentkit profile use strict

# List all profiles (built-in + user-defined)
agentkit profile list

# Show profile details
agentkit profile show strict

# Run gate with a specific profile
agentkit gate --profile strict

# Create a custom profile based on strict
agentkit profile create myprofile --from strict --min-score 90

# Export a profile as JSON or TOML
agentkit profile export strict --format json

Using Profiles with Commands

All major commands support --profile:

agentkit gate --profile strict
agentkit run --profile balanced
agentkit sweep --profile minimal owner/repo1 owner/repo2
agentkit score --profile balanced
agentkit analyze --profile strict github:owner/repo

Explicit CLI flags always override profile values:

# Uses strict profile but overrides min-score to 99
agentkit gate --profile strict --min-score 99

Commands

  • agentkit quickstart โ€” ๐Ÿš€ fastest path to a score (start here)
  • agentkit run โ€” run the full pipeline
  • agentkit score โ€” compute composite score
  • agentkit gate โ€” fail if score < threshold
  • agentkit redteam [PATH] โ€” adversarial eval: score how well your agent context resists attacks
  • agentkit analyze <target> โ€” analyze any GitHub repo
  • agentkit sweep <targets> โ€” batch analyze multiple repos
  • agentkit duel <repo1> <repo2> โ€” head-to-head agent-readiness comparison
  • agentkit tournament <repo1> ... <repoN> โ€” round-robin bracket across 4-16 repos
  • agentkit profile <sub> โ€” manage quality profiles
  • agentkit config <sub> โ€” manage configuration
  • agentkit history โ€” show score history
  • agentkit timeline โ€” visual quality timeline (HTML chart from history DB)
  • agentkit leaderboard โ€” compare runs by label
  • agentkit insights โ€” cross-repo pattern synthesis
  • agentkit trending โ€” fetch and rank trending GitHub repos by agent quality
  • agentkit org <owner> โ€” score every public repo in a GitHub org or user account
  • agentkit pr github:<owner>/<repo> โ€” submit a CLAUDE.md PR to any public GitHub repo
  • agentkit campaign <target> โ€” batch PR submission to multiple repos in one command

Campaign: Batch PR Submission

agentkit campaign finds repos missing CLAUDE.md and submits PRs to all of them in one command.

# Submit CLAUDE.md PRs to all public repos in an org (up to 5, default)
agentkit campaign github:pallets

# Discover repos without submitting PRs (dry run)
agentkit campaign github:pallets --dry-run --limit 10

# Target by topic
agentkit campaign topic:ai-agents --language python --min-stars 500

# Use a file of repos
agentkit campaign repos-file:my-targets.txt

# Only discover repos (no PRs)
agentkit campaign github:pallets --skip-pr

# Generate and share an HTML report
agentkit campaign github:pallets --share

Example output:

Campaign ID: abc12345
Target: github:pallets  Limit: 5  File: CLAUDE.md

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Repo       โ”‚ Stars  โ”‚ Status โ”‚ PR URL / Note                  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ flask      โ”‚ โ˜… 68k  โ”‚ โœ… PR  โ”‚ https://github.com/.../pull/42 โ”‚
โ”‚ click      โ”‚ โ˜… 15k  โ”‚ โœ… PR  โ”‚ https://github.com/.../pull/7  โ”‚
โ”‚ jinja      โ”‚ โ˜… 10k  โ”‚ โญ skipโ”‚ Already has context file       โ”‚
โ”‚ werkzeug   โ”‚ โ˜… 7k   โ”‚ โœ… PR  โ”‚ https://github.com/.../pull/12 โ”‚
โ”‚ markupsafe โ”‚ โ˜… 600  โ”‚ โŒ err โ”‚ Fork creation failed           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Campaign complete. 3 PRs opened, 1 skipped, 1 failed.

Options:

  • --limit N โ€” max repos to target (default: 5)
  • --language TEXT โ€” filter by language (e.g. python, typescript)
  • --min-stars N โ€” minimum stars threshold (default: 100)
  • --file TEXT โ€” context file name (default: CLAUDE.md)
  • --force โ€” submit PR even if context file exists
  • --dry-run โ€” show what would happen, no PRs opened
  • --json โ€” output CampaignResult as JSON
  • --no-filter โ€” skip the "already has context file" check
  • --skip-pr โ€” only discover repos, don't submit PRs
  • --share โ€” upload HTML report to here.now

agentkit track โ€” Monitor Campaign PR Outcomes

After running agentkit campaign, use agentkit track to see which PRs got merged, closed, or are still open.

# Show last 20 tracked PRs
agentkit track

# Filter to a specific campaign
agentkit track --campaign-id abc12345

# Show all PRs (no limit)
agentkit track --all

# JSON output for CI/automation
agentkit track --json

# Upload a shareable HTML status report
agentkit track --share

Example output:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Repo         โ”‚ PR # โ”‚ Status โ”‚ Days Open โ”‚ Reviews โ”‚ Submitted  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ pallets/flaskโ”‚ 6001 โ”‚ merged โ”‚ 3         โ”‚ 2       โ”‚ 2026-03-14 โ”‚
โ”‚ encode/httpx โ”‚ 892  โ”‚ open   โ”‚ 1         โ”‚ 0       โ”‚ 2026-03-16 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
2 merged, 1 open, 0 closed

Options:

  • --campaign-id TEXT โ€” filter to a specific campaign
  • --limit N โ€” max PRs to show (default: 20)
  • --all โ€” show all tracked PRs (no limit)
  • --json โ€” output structured JSON
  • --share โ€” upload dark-theme HTML report to here.now

View campaign history with:

agentkit history --campaigns
agentkit history --campaign-id <id>

Org Analysis

agentkit org answers: "Which repos in this GitHub org are most AI-agent-ready?"

# Score all public repos in an org or user account
agentkit org github:vercel

# Include forked and archived repos, cap at 20
agentkit org github:microsoft --include-forks --include-archived --limit 20

# Parallel analysis with 5 workers, save HTML report
agentkit org github:anthropics --parallel 5 --output report.html

# Share report online
agentkit org github:openai --share

# JSON output for scripting
agentkit org github:tiangolo --json

# Use GitHub token to avoid rate limits
agentkit org github:google --token ghp_xxx

# Auto-generate CLAUDE.md for repos below 80 and show before/after score lift
agentkit org github:pallets --generate

# Only generate for repos scoring below 60, share an HTML before/after report
agentkit org github:pallets --generate --generate-only-below 60 --share

--generate flag

--generate turns the audit from read-only to actionable: for every repo below the threshold (default: 80), it clones the repo locally, runs agentmd generate to create a CLAUDE.md, re-scores the repo, and shows the before/after lift.

Before: pallets/flask  28.6/F
After:  pallets/flask  91.4/A  (+62.8 pts)

All generation is done in temporary local clones โ€” no remote writes to GitHub.

Options:

  • --generate-only-below N โ€” only generate for repos scoring below N (default: 80)
  • --share with --generate โ€” HTML report shows Before / After columns with color-coded delta badges

Trending Analysis

agentkit trending answers: "Which repos blowing up on GitHub are most AI-agent-ready today?"

# Rank this week's trending AI repos (default)
agentkit trending

# Fast mode: list repos without scoring
agentkit trending --no-analyze

# Filter by topic, publish a shareable report
agentkit trending --topic ai-agent --share

# Weekly trending, top 15, min 500 stars, JSON output
agentkit trending --period week --limit 15 --min-stars 500 --json

# Use a GitHub token for higher rate limits
agentkit trending --token ghp_xxx

Output: a ranked Rich table (Rank | Repo | Stars | Score | Grade | URL) and optionally a dark-theme HTML report published to here.now.

Tournament

agentkit tournament runs a round-robin bracket across 4-16 repos and ranks them by win/loss record with avg score tiebreak.

# Run a 4-repo tournament
agentkit tournament github:fastapi/fastapi github:tiangolo/starlette github:django/django github:pallets/flask

# Publish a shareable HTML bracket report
agentkit tournament github:fastapi/fastapi github:tiangolo/starlette github:django/django github:pallets/flask --share

# JSON output for CI/scripting
agentkit tournament github:fastapi/fastapi github:tiangolo/starlette github:django/django github:pallets/flask --json

# Sequential (no parallel), quiet mode, save HTML
agentkit tournament github:fastapi/fastapi github:tiangolo/starlette github:django/django github:pallets/flask \
  --no-parallel --quiet --output bracket.html

Output: standings table (Rank | Repo | W-L | Avg Score | Grade), match results matrix, and winner banner. Use --share to publish a dark-theme HTML bracket to here.now.

Portfolio Insights

Once you've analyzed multiple repos with agentkit analyze or agentkit run, the agentkit insights command synthesizes patterns across all historical runs:

# Portfolio health summary (avg score, best/worst repo, top issue)
agentkit insights

# Most common agentlint findings across all repos
agentkit insights --common-findings

# Repos scoring in the bottom quartile
agentkit insights --outliers

# Repos with significant score movement between runs
agentkit insights --trending

# All sections in one view
agentkit insights --all

# Machine-readable JSON (useful for scripts/dashboards)
agentkit insights --json

# Use a specific history DB
agentkit insights --db /path/to/history.db

Store agentlint findings alongside scores for richer cross-repo analysis:

agentkit run --record-findings
agentkit analyze github:owner/repo --record-findings

JSON output schema:

{
  "portfolio_summary": {
    "avg_score": 74.5,
    "total_runs": 12,
    "unique_repos": 4,
    "top_issue": "missing-tools-section",
    "best_repo": "owner/repo-a",
    "worst_repo": "owner/repo-d"
  },
  "common_findings": [
    {"finding": "missing-tools-section", "repo_count": 3, "total_occurrences": 5}
  ],
  "outliers": [
    {"project": "owner/repo-d", "latest_score": 42.0, "avg_score": 48.5, "run_count": 2}
  ],
  "trending": [
    {"project": "owner/repo-b", "previous_score": 55.0, "latest_score": 80.0, "delta": 25.0, "direction": "up"}
  ]
}

Sharing Results

Share your agent quality score card with a single command:

# Generate and upload a score card to here.now
agentkit share

# Share from a saved JSON report
agentkit share --report agentkit-report.json

# Hide raw numbers (show pass/fail only)
agentkit share --no-scores

# Output JSON with URL and score
agentkit share --json

# Auto-share after a run
agentkit run --share

# Auto-share after generating a report
agentkit report --share

# Quickest way to get a score + share URL for any repo
agentkit quickstart github:owner/repo

# Full analyze with share (more detail, slower)
agentkit analyze github:owner/repo --share

# Batch analyze repos and share a combined scorecard
agentkit sweep github:owner/repo1 github:owner/repo2 --share

Score cards are standalone HTML pages (dark theme) showing: composite score, per-tool breakdown, project name, git ref, and timestamp. Anonymous cards expire in 24h; set HERENOW_API_KEY for persistent links.

GitHub Actions

Use the agentkit GitHub Action to run quality checks on every PR:

- uses: mikiships/agentkit-cli@v0.7.0
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    min-score: 70

Or install and run directly:

- uses: actions/checkout@v4
- run: pip install agentkit-cli
- run: agentkit gate --profile strict

See agentkit setup-ci for automated workflow generation.

Local Dashboard

agentkit serve starts a lightweight local web dashboard showing all toolkit runs from the history database:

agentkit serve [OPTIONS]

Options:
  --port PORT    Port to serve on (default: 7890)
  --open         Auto-open the dashboard in your browser on start
  --once         Render dashboard HTML to stdout and exit (no server)
  --json         Print server URL as JSON and exit (useful for scripts)

The dashboard shows a dark-theme summary of every project run: latest score, grade (Aโ€“F), per-tool breakdown, timestamp, and run ID. Scores are color-coded green (โ‰ฅ80), yellow (โ‰ฅ60), and red (<60). The page auto-refreshes every 30 seconds.

Quick start:

agentkit serve --open           # start server + open browser
agentkit run --serve            # run pipeline, then print dashboard URL
agentkit serve --once > out.html  # render to file

No external dependencies โ€” uses Python stdlib only (http.server, threading, webbrowser).

Live Dashboard

Run once and watch scores update in real-time:

# Combined: watch files + serve dashboard (updates without reload)
agentkit watch --serve --port 7890

# Or start server in live mode (polls for external writes):
agentkit serve --live

The dashboard connects via SSE (/events) and re-renders the runs table in-place when new pipeline results arrive. A โ— Live indicator shows connection status; it drops to โ—‹ Offline if the server stops.

agentkit pr โ€” Submit CLAUDE.md PRs to Open Source Repos

agentkit pr is a viral distribution mechanic: one command generates a CLAUDE.md for any public GitHub repo and opens a PR against it.

# Submit a CLAUDE.md PR to a public repo
agentkit pr github:owner/repo

# Preview what would happen (no git or API calls)
agentkit pr github:owner/repo --dry-run

# Generate AGENTS.md instead
agentkit pr github:owner/repo --file AGENTS.md

# Force overwrite if CLAUDE.md already exists
agentkit pr github:owner/repo --force

# JSON output
agentkit pr github:owner/repo --json

Requires: GITHUB_TOKEN environment variable with repo and workflow scopes.

export GITHUB_TOKEN=ghp_...
agentkit pr github:vercel/next.js

What it does:

  1. Clones the repo (shallow, depth 1)
  2. Runs agentmd generate . to create CLAUDE.md
  3. Forks the repo under your authenticated GitHub account (if needed)
  4. Creates a branch agentkit/add-claude-md
  5. Commits and pushes the generated file
  6. Opens a PR against the original repo

Release Check

agentkit release-check verifies the 4-part release surface to confirm a package is truly shipped, not just locally complete:

agentkit release-check [PATH] [OPTIONS]

Options:
  --version VERSION   Version to verify (default: from pyproject.toml/package.json)
  --package NAME      Package name (default: from pyproject.toml/package.json)
  --registry          pypi|npm|auto (default: auto-detected)
  --skip-tests        Skip the pytest/npm test step for quick checks
  --json              Output structured JSON for CI integration

Example output:

agentkit release-check โ€” /your/project

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Check      โ”‚ Status โ”‚ Detail                          โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ tests      โ”‚ โœ“ PASS โ”‚ 42 passed in 1.23s              โ”‚
โ”‚ git_push   โ”‚ โœ“ PASS โ”‚ Local HEAD abc12345 matches rem โ”‚
โ”‚ git_tag    โ”‚ โœ“ PASS โ”‚ Tag v1.0.0 found on remote.     โ”‚
โ”‚ registry   โ”‚ โœ“ PASS โ”‚ PyPI: mypkg==1.0.0 is live.    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Verdict: SHIPPED

Verdict levels:

  • SHIPPED โ€” all 4 surfaces confirmed (exit 0)
  • RELEASE-READY โ€” tests + git confirmed, registry not yet live (exit 1)
  • BUILT โ€” tests pass locally, not yet pushed (exit 1)
  • UNKNOWN โ€” tests failing (exit 1)

Integrate with agentkit gate --release-check or agentkit run --release-check to add release verification to your pipeline.

Architecture

All quartet tool invocations (agentmd, agentlint, coderace, agentreflect) go through ToolAdapter in agentkit_cli/tools.py. This ensures canonical correct flags are used everywhere and flag-wiring bugs cannot recur across subcommands.

Run pytest -m smoke before any release to catch integration regressions.

agentkit certify

Generate a dated, shareable certification report proving a repo passed all agentkit quality checks.

# Run cert on current directory
agentkit certify .

# Output JSON cert (for CI integration)
agentkit certify . --json

# Write HTML cert card to file
agentkit certify . --output cert.html

# Share HTML report via here.now (requires HERENOW_API_KEY)
agentkit certify . --output cert.html --share

# Fail exit if composite score < 80
agentkit certify . --min-score 80

# Inject/update cert badge in README.md
agentkit certify . --badge

# Preview badge change without writing
agentkit certify . --badge --dry-run

The cert report includes:

  • cert_id: 8-char hex fingerprint (prefix of SHA256)
  • timestamp: UTC ISO 8601
  • verdict: PASS / WARN / FAIL
  • Composite Score (agentkit score) โ€” PASS โ‰ฅ 80
  • Redteam Resistance (agentkit redteam) โ€” PASS โ‰ฅ 70
  • Context Freshness (agentlint check-context) โ€” PASS โ‰ฅ 70
  • Tests Found (agentkit doctor)
  • SHA256 content hash for tamper detection

License

MIT

Timeline

agentkit timeline generates a dark-theme HTML chart showing your composite score progression over time. Reads from the existing SQLite history DB populated by agentkit run.

# Generate timeline for all projects
agentkit timeline

# Filter to one project
agentkit timeline --project my-agent

# Show only the last 20 runs since a date
agentkit timeline --limit 20 --since 2026-01-01

# Output raw chart data as JSON
agentkit timeline --json

# Publish and share
agentkit timeline --share

# Auto-generate timeline after a run
agentkit run --timeline

The report includes:

  • Main chart: line chart (x = date, y = composite score), one line per project
  • Per-tool breakdown: CSS-bar sparklines for lint score, code quality, context freshness, test count
  • Stats panel: min/max/avg, trend direction (โ†‘โ†“โ†’), streak badge (e.g. "12 runs above 80")
  • Project summary table: run count, latest score, trend per project

Red-Team Your Agent Setup

agentkit redteam scores how well your agent context file (CLAUDE.md / AGENTS.md) resists adversarial attacks. Static analysis only โ€” no LLM required. Truly model-agnostic.

# Analyze current directory
agentkit redteam

# Analyze a specific project
agentkit redteam ./my-agent-project

# CI gate: fail if resistance score < 70
agentkit redteam --min-score 70

# JSON output for programmatic use
agentkit redteam --json

# Save HTML report
agentkit redteam --output redteam-report.html

# Share HTML report via here.now
agentkit redteam --share

Categories checked:

  • prompt_injection โ€” attempts to inject instructions via user input
  • jailbreak โ€” persona and restriction bypass attempts
  • context_confusion โ€” fake context and history injection
  • instruction_override โ€” priority and mode override attempts
  • data_extraction โ€” system prompt and credential extraction
  • role_escalation โ€” privilege and authority escalation

CI integration:

- name: Red-team agent config
  run: agentkit redteam --min-score 70

Exit code 1 if --min-score threshold is not met. Combine with agentkit run --redteam to add adversarial eval to your full pipeline.

Distribution angle: After OpenAI's $86M acquisition of Promptfoo, teams using non-OpenAI models need a neutral red-team tool. Static analysis = no model dependency = truly model-agnostic.

Auto-Harden Your Agent Context

agentkit harden is the detectโ†’fix loop closed in one command. Run it after agentkit redteam to auto-patch all detected vulnerabilities.

# Analyze and auto-remediate CLAUDE.md / AGENTS.md in cwd
agentkit harden

# Harden a specific file or directory
agentkit harden ./my-agent-project

# Preview what would change without writing
agentkit harden --dry-run

# Write hardened file to a different path
agentkit harden --output hardened-CLAUDE.md

# JSON output for CI integration
agentkit harden --json

# Generate dark-theme HTML score-card report
agentkit harden --report

# Apply fix flag in redteam command
agentkit redteam --fix

# Auto-apply with dry-run preview
agentkit redteam --fix --dry-run

# Run harden after full pipeline
agentkit run --harden

What agentkit harden does:

  1. Detects all 6 vulnerability categories (prompt injection, jailbreak, context confusion, instruction override, data extraction, role escalation)
  2. Applies targeted, idempotent remediations (never duplicates existing sections)
  3. Creates a backup (.bak) before modifying files
  4. Re-scores the hardened file and shows a before/after table

Idempotent: Running it multiple times on an already-hardened file makes no additional changes.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentkit_cli-0.44.0.tar.gz (395.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentkit_cli-0.44.0-py3-none-any.whl (225.4 kB view details)

Uploaded Python 3

File details

Details for the file agentkit_cli-0.44.0.tar.gz.

File metadata

  • Download URL: agentkit_cli-0.44.0.tar.gz
  • Upload date:
  • Size: 395.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for agentkit_cli-0.44.0.tar.gz
Algorithm Hash digest
SHA256 8a7e519af348ad4201931d50906d6a12b1781d40872975376488c84603ffe250
MD5 fffa00b82d65a21c6b77e5f75f8d8b47
BLAKE2b-256 df2892610898c0da342ac4c8659c821bce2c2295cd154347e561c998f927cf4a

See more details on using hashes here.

File details

Details for the file agentkit_cli-0.44.0-py3-none-any.whl.

File metadata

  • Download URL: agentkit_cli-0.44.0-py3-none-any.whl
  • Upload date:
  • Size: 225.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for agentkit_cli-0.44.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5317203a4027c2fb5e3a5714b6311148e714e344359043b2dcc7c3d6992f73cb
MD5 f3d1b3da25352b0d29fd672a7928ab6f
BLAKE2b-256 412cf945a09cb0fb614c5bb754d5fd1a0dff5e8a9ca91427202eb24bc8e550fb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page