LLM-solvable challenge-response authentication for AI agent APIs

🧩 agent-challenge

Drop-in LLM authentication for any API endpoint.

📖 Full documentation, live demo, and interactive examples: challenge.llm.kaveenk.com


Why?

You built an API. Now bots are hitting it: not the smart kind, the dumb kind. Automated scripts cycling through endpoints, low-effort crawlers scraping your data, or spammy throwaway clients burning through your resources.

Traditional CAPTCHAs block everyone who isn't a human sitting in a browser. API keys work, but they require manual signup, email verification, and approval flows, all friction that kills adoption for legitimate AI agents.

agent-challenge sits in the middle: it blocks automated scripts and low-capability bots while letting any competent LLM walk right through. The challenge requires actual reasoning (reversing strings, solving arithmetic, decoding ciphers), which a real language model handles instantly but a curl loop or a Python script calling requests.post() can't fake.

Think of it as a proof of intelligence gate:

  • ✅ GPT-4, Claude, Gemini, Llama: pass instantly
  • ✅ Any capable LLM-powered agent: solves in one shot
  • ❌ Automated scripts: can't reason about the prompt
  • ❌ Spammy low-effort bots: can't parse randomized templates
  • ❌ Dumb wrappers just forwarding requests: no LLM to solve with

It's the ultimate automated-script buster. If the other end of your API can't do basic thinking, it doesn't get in. This is "prove you ARE a robot", not "prove you're not a robot"!

# Before: unprotected endpoint
@app.route("/api/screenshots", methods=["POST"])
def screenshot():
    return take_screenshot(request.json["url"])

# After: agents solve a puzzle once, then pass through with the saved token
@app.route("/api/screenshots", methods=["POST"])
def screenshot():
    result = ac.gate_http(request.headers, request.get_json(silent=True))
    if result.status != "authenticated":
        return jsonify(result.to_dict()), 401
    return take_screenshot(request.json["url"])

How It Works

Agent                          Your API
  │                               │
  ├──POST /api/your-endpoint────►│
  │                               ├── gate() → no token
  │◄──401 { challenge_required }──┤
  │                               │
  │  LLM reads prompt, answers    │
  │                               │
  ├──POST { answer, token }─────►│
  │                               ├── gate() → correct!
  │◄──200 { token: "eyJpZ..." }───┤
  │                               │
  │  ┌─────────────────────┐      │
  │  │ Saves token forever │      │
  │  └─────────────────────┘      │
  │                               │
  ├──POST + Bearer eyJpZ...─────►│
  │                               ├── gate() → valid token
  │◄──200 { authenticated }───────┤   (instant, no puzzle)

One endpoint. Three interactions. Zero database.

Install

pip install agent-challenge
npm install agent-challenge

Quick Start

Python (Flask)

from flask import Flask, jsonify, request
from agentchallenge import AgentChallenge

app = Flask(__name__)
ac = AgentChallenge(secret="your-secret-key-min-8-chars")

@app.route("/api/data", methods=["POST"])
def protected_endpoint():
    # The first request may carry no JSON body, so fall back to an empty dict.
    body = request.get_json(silent=True) or {}
    result = ac.gate(
        token=request.headers.get("Authorization", "").removeprefix("Bearer ") or None,
        challenge_token=body.get("challenge_token"),
        answer=body.get("answer"),
    )
    if result.status != "authenticated":
        return jsonify(result.to_dict()), 401

    # Your logic here: the agent is verified
    return jsonify({"data": "secret stuff"})

Node.js (Express)

import { AgentChallenge } from 'agent-challenge';

const ac = new AgentChallenge({ secret: 'your-secret-key-min-8-chars' });

app.post('/api/data', (req, res) => {
  const gate = ac.gateSync({
    token: req.headers.authorization?.slice(7),
    challengeToken: req.body?.challenge_token,
    answer: req.body?.answer,
  });
  if (gate.status !== 'authenticated')
    return res.status(401).json(gate);

  // Your logic here: the agent is verified
  res.json({ data: 'secret stuff' });
});

The gate() API

One function handles everything. Three modes based on what's passed in:

Arguments                  Behavior                              Returns
(none)                     Generate a new challenge              { status: "challenge_required", prompt, challenge_token }
challenge_token + answer   Verify answer, issue permanent token  { status: "authenticated", token: "eyJpZ..." }
token                      Validate saved token                  { status: "authenticated" }

# Mode 1: No args → challenge
result = ac.gate()
# → GateResult(status="challenge_required", prompt="Reverse: NOHTYP", ...)

# Mode 2: Answer → permanent token
result = ac.gate(challenge_token="eyJ...", answer="PYTHON")
# → GateResult(status="authenticated", token="eyJpZCI6ImF0Xy...")

# Mode 3: Token → instant pass
result = ac.gate(token="eyJpZCI6ImF0Xy...")
# → GateResult(status="authenticated")

gate_http() / gateHttp(): Zero-Boilerplate HTTP

Instead of manually extracting the Bearer token from headers and fields from the body, pass them directly:

# Python: works with Flask, Django, FastAPI, or anything with headers + body
result = ac.gate_http(request.headers, request.get_json(silent=True))

// JavaScript: works with Express, Koa, Fastify, or anything with headers + body
const result = ac.gateHttp(req.headers, req.body);

It reads Authorization: Bearer <token> from headers and challenge_token / answer from the body automatically. Same result as gate(), less wiring.
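
The extraction step described above can be pictured roughly as follows. This is a minimal sketch, not the library's implementation: extract_gate_args is a hypothetical helper, and the case-insensitive header lookup and scheme check are assumptions about what a careful implementation would do.

```python
from typing import Any, Mapping, Optional


def extract_gate_args(
    headers: Mapping[str, str],
    body: Optional[Mapping[str, Any]],
) -> dict:
    """Hypothetical helper: pull gate() arguments out of a raw HTTP request."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization", "")

    # Only accept the "Bearer <token>" scheme; anything else yields no token.
    token = None
    scheme, _, credentials = auth.partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        token = credentials.strip()

    body = body or {}  # tolerate a missing or non-JSON body
    return {
        "token": token,
        "challenge_token": body.get("challenge_token"),
        "answer": body.get("answer"),
    }
```

Under those assumptions, the returned dict has the same keyword names that gate() takes in the Quick Start, so it could be passed along as ac.gate(**extract_gate_args(headers, body)).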

Challenge Types

25 challenge types across 4 difficulty tiers. All use randomized inputs โ€” no fixed word lists.

Easy (6 types)

Type            Example
reverse_string  Reverse "PYTHON" → NOHTYP
simple_math     234 + 567 = 801
pattern         2, 4, 8, 16, ? → 32
counting        Count vowels in "CHALLENGE" → 3
string_length   How many characters in "HELLO"? → 5
first_last      First and last char of "PYTHON" → p, n

Medium (11 types)

Type             Example
rot13            Decode "URYYB" → HELLO
letter_position  A=1,B=2.. sum of "CAT" → 24
extract_letters  Every 2nd char of "HWEOLRLLOD" → WORLD
sorting          Sort [7,2,9,1] ascending → 1,2,7,9
binary           Convert 42 to binary → 101010
ascii_value      ASCII code for 'M' → 77
string_math      "CAT" has 3 letters, "DOG" has 3 → 3×3 = 9
+ all easy types

Hard (14 types)

Type        Example
caesar      Decrypt "KHOOR" with shift 3 → HELLO
word_math   7 + 8 as a word → fifteen
transform   Uppercase + reverse "hello" → OLLEH
substring   Characters 3–6 of "PROGRAMMING" → ogra
zigzag      Read "ABCDEF" in zigzag with 2 rows → ACEBDF
+ all medium types

Agentic (8 types): for top-tier LLMs only

Type                   Example
chained_transform      Reverse "PYTHON", then ROT13 → ABUGLC
multi_step_math        17 × 23, then digit sum → 13
base_conversion_chain  Binary 11010 → decimal, +15, → binary = 101001
word_extraction_chain  First letter of each word, sorted alphabetically
letter_math            Sum letter values of "BVJCSX" (A=1..Z=26) → 80
nested_operations      ((15 + 7) × 3) - 12 → 54
string_interleave      Interleave "ABC" and "DEF" → ADBECF
caesar                 Decrypt with shift 1–13

Agentic challenges require multi-step reasoning and working memory, so smaller models and humans generally can't solve them under time pressure.

Each type has multiple prompt templates (450+ in total) with randomized phrasing. Agentic types use dynamic prompt assembly with roughly 10,000 or more structural variations per type, making regex-based solvers impractical even with full source code access.
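
To make the expected answers concrete, here is a small reference computation for a few of the example challenges. It is only a sketch for checking answers by hand: the helper names are illustrative and are not part of the agent-challenge API.

```python
import codecs


def reverse(text: str) -> str:
    return text[::-1]


def rot13(text: str) -> str:
    # The standard library ships a ROT13 text codec.
    return codecs.encode(text, "rot_13")


def caesar_decrypt(text: str, shift: int) -> str:
    """Shift each A-Z letter back by `shift` positions, wrapping around."""
    return "".join(
        chr((ord(ch) - ord("A") - shift) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text.upper()
    )


def letter_sum(text: str) -> int:
    """Sum letter values with A=1 .. Z=26."""
    return sum(ord(ch) - ord("A") + 1 for ch in text.upper() if "A" <= ch <= "Z")


def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))


print(rot13(reverse("PYTHON")))        # chained_transform     -> ABUGLC
print(digit_sum(17 * 23))              # multi_step_math       -> 13
print(bin(int("11010", 2) + 15)[2:])   # base_conversion_chain -> 101001
print(letter_sum("BVJCSX"))            # letter_math           -> 80
print(caesar_decrypt("KHOOR", 3))      # caesar                -> HELLO
```

Each computation is trivial once the task is known; the difficulty for a script lies in working out which task the randomized prompt is asking for.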

Dynamic Challenges (Optional)

Use an LLM to generate novel, never-before-seen challenges:

ac = AgentChallenge(secret="your-secret")

# Set an API key (or use OPENAI_API_KEY / ANTHROPIC_API_KEY / GOOGLE_API_KEY env vars)
ac.set_openai_api_key("sk-...")

# Enable dynamic mode
ac.enable_dynamic_mode()  # Auto-detects provider from available keys

Dynamic mode generates a challenge with one LLM call and verifies the answer with another. It falls back to static challenges after 3 failures. OpenAI, Anthropic, and Google Gemini are supported, and the provider is auto-detected from environment variables.

Challenge Every Time (No Persistent Tokens)

By default, agents solve once and get a permanent token. To require a challenge on every request:

ac = AgentChallenge(
    secret="your-secret",
    persistent=False,  # No tokens issued; challenge every time
)

When persistent=False:

  • Solving a challenge returns { "status": "authenticated" } with no token
  • Passing a saved token returns an error
  • Every request requires solving a new puzzle

This is useful for high-security endpoints, rate-limited operations, or when you want proof of LLM capability on every call.

Agent-Only Mode (Block Humans)

Combine a tight time limit with a high difficulty tier to create endpoints that only AI agents can access. A human can't read a caesar cipher, decode it mentally, and type the answer in 10 seconds, but an LLM handles it in under 2.

ac = AgentChallenge(
    secret="your-secret",
    difficulty="agentic",   # multi-step chains; only top-tier LLMs pass
    ttl=10,                 # 10 seconds: impossible for humans
    persistent=False,       # challenge every request
)

This is useful for:

  • Agent-to-agent APIs where human access is unwanted
  • Internal tooling that should only be called by AI systems
  • Preventing manual API abuse even by authenticated users with the endpoint URL

The ttl parameter controls how long an agent has to solve the challenge after it's issued. At difficulty="agentic" with ttl=10, the challenge requires multi-step reasoning (chained transforms, base conversions, letter arithmetic) that no human can solve in time and weaker models fail at consistently.

Configuration

ac = AgentChallenge(
    secret="your-secret",       # Required: HMAC signing key (min 8 chars)
    difficulty="medium",        # "easy" | "medium" | "hard" | "agentic" (default: "easy")
    ttl=300,                    # Challenge expiry in seconds (default: 300)
    types=["rot13", "caesar"],  # Restrict to specific challenge types
    persistent=True,            # Issue permanent tokens (default: True)
)

# Dynamic mode is enabled separately:
# ac.set_openai_api_key("sk-...")
# ac.enable_dynamic_mode()

Token Architecture

Stateless. No database. No session store.

Tokens are HMAC-SHA256 signed JSON payloads:

base64url(payload).HMAC-SHA256(payload, secret)

Two token types:

Token      Prefix  Lifetime   Contains
Challenge  ch_     5 minutes  answer hash, expiry, type
Agent      at_     Permanent  agent ID, created timestamp

  • Tokens can't be forged: HMAC verification catches any tampering
  • Challenge tokens are single-use: answer hash prevents replay
  • Agent tokens are permanent: verify_token() validates signature only
  • No database lookups: everything is in the token itself
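
The stateless scheme above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's actual token format: the payload field names (answer_hash, exp), the base64url encoding of the signature, and the answer normalisation are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(data: bytes, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), data, hashlib.sha256).digest()


def sign_token(payload: dict, secret: str) -> str:
    """Build base64url(payload).signature, signing the raw payload bytes."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return f"{_b64url(raw)}.{_b64url(_mac(raw, secret))}"


def verify_token(token: str, secret: str) -> Optional[dict]:
    """Return the payload if the signature is valid and it has not expired."""
    body, sep, sig = token.partition(".")
    if not sep:
        return None
    try:
        raw = _b64url_decode(body)
        given = _b64url_decode(sig)
    except ValueError:  # malformed base64
        return None
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(_mac(raw, secret), given):
        return None
    payload = json.loads(raw)
    if "exp" in payload and payload["exp"] < time.time():
        return None
    return payload


def hash_answer(answer: str, secret: str) -> str:
    # Assumed normalisation: ignore surrounding whitespace and letter case.
    return _mac(answer.strip().lower().encode("utf-8"), secret).hex()


def issue_challenge(answer: str, secret: str, ttl: int = 300) -> str:
    payload = {"answer_hash": hash_answer(answer, secret), "exp": time.time() + ttl}
    return sign_token(payload, secret)


def check_answer(token: str, answer: str, secret: str) -> bool:
    payload = verify_token(token, secret)
    if payload is None:
        return False
    return hmac.compare_digest(payload.get("answer_hash", ""), hash_answer(answer, secret))
```

Because the token carries only a keyed hash of the answer plus an expiry, the server can check a submission without storing anything, which is the property the list above relies on.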

Lower-Level API

If you don't want the gate() pattern:

ac = AgentChallenge(secret="your-secret-key")

# Create a challenge
challenge = ac.create()
# challenge.prompt       → "Reverse the following string: NOHTYP"
# challenge.token        → "eyJpZCI6ImNoXz..."
# challenge.to_dict()    → dict for JSON responses

# Verify an answer
result = ac.verify(token=challenge.token, answer="PYTHON")
# result.valid           → True
# result.challenge_type  → "reverse_string"

# Create a persistent agent token directly
token = ac.create_token("agent-name")
# token → "eyJpZCI6ImF0Xy..."  (base64url-encoded signed payload)

# Verify a token
ac.verify_token(token)  # → True

Agent Integration

Agents don't need an SDK. They just call your endpoint normally:

import requests

def call_api(payload):
    endpoint = "https://your-api.com/api/data"
    token = load_saved_token()  # from disk/env

    r = requests.post(endpoint,
        headers={"Authorization": f"Bearer {token}"} if token else {},
        json=payload)

    if r.status_code != 401:
        return r  # success (or other error)

    # Got a challenge: solve it
    data = r.json()
    if data.get("status") != "challenge_required":
        return r

    answer = llm.complete(data["prompt"])  # any LLM
    r = requests.post(endpoint, json={
        "challenge_token": data["challenge_token"],
        "answer": answer, **payload
    })

    if "token" in r.json():
        save_token(r.json()["token"])  # persist for next time

    return r

Document this pattern in your API's SKILL.md or agent docs, and any LLM-powered agent can authenticate autonomously.

Security

agent-challenge is fully open source: security through transparency, not obscurity.

Prompt Injection Defense

When agents call APIs protected by agent-challenge, they receive challenge prompts. A malicious API operator could theoretically embed prompt injection in that text. The library ships client-side defenses:

validate_prompt() checks prompts before your LLM sees them:

from agentchallenge import validate_prompt

result = validate_prompt(challenge["prompt"])
if not result["safe"]:
    raise ValueError(f"Blocked: {result['reason']} (score: {result['score']})")

Catches: URLs, code injection, role hijacking ("you are now", "pretend to be"), override instructions ("ignore previous"), data exfiltration ("send me your API key"), oversized prompts, structural anomalies.
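
For intuition, a check of this kind can be approximated with a handful of regular expressions. The sketch below is not the validate_prompt() implementation: the function name check_prompt, the patterns, and the 500-character cap are assumptions chosen only to illustrate the categories listed above.

```python
import re

# Hypothetical patterns, one per category; a real validator would need many more.
SUSPICIOUS_PATTERNS = {
    "url": re.compile(r"https?://|www\.", re.IGNORECASE),
    "role_hijack": re.compile(r"\b(you are now|pretend to be)\b", re.IGNORECASE),
    "override": re.compile(r"\bignore (all )?(previous|prior)\b", re.IGNORECASE),
    "exfiltration": re.compile(r"\b(api key|password|secret)\b", re.IGNORECASE),
}
MAX_PROMPT_CHARS = 500  # assumed cap; challenge prompts are short


def check_prompt(prompt: str) -> dict:
    """Return {"safe": bool, "reason": str or None} for a challenge prompt."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return {"safe": False, "reason": "oversized"}
    for reason, pattern in SUSPICIOUS_PATTERNS.items():
        if pattern.search(prompt):
            return {"safe": False, "reason": reason}
    return {"safe": True, "reason": None}
```

A pattern list like this is easy to evade, so it is best treated as one layer among several rather than a complete defense.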

safe_solve() is a sandboxed solver with isolation:

import openai
from agentchallenge import safe_solve

def my_llm(system_prompt, user_prompt):
    return openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=50,      # short answers only
        temperature=0,      # deterministic
    ).choices[0].message.content

answer = safe_solve(challenge["prompt"], llm_fn=my_llm)

Three layers: input validation → LLM isolation (no tools, strict system prompt) → output validation (length cap, no URLs/code in answer).
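
The last of those layers, output validation, might look something like the following. This is a hedged sketch rather than what safe_solve() does internally: validate_answer, the 100-character cap, and the code-detection patterns are all hypothetical.

```python
import re
from typing import Optional

MAX_ANSWER_CHARS = 100  # assumed cap; challenge answers are a word or a number
_URL = re.compile(r"https?://|www\.", re.IGNORECASE)
_CODE = re.compile(r"```|<script|\bimport\s+\w+|\beval\(", re.IGNORECASE)


def validate_answer(raw: str) -> Optional[str]:
    """Return the trimmed answer, or None if it looks unsafe to send back."""
    answer = raw.strip()
    if not answer or len(answer) > MAX_ANSWER_CHARS:
        return None
    if _URL.search(answer) or _CODE.search(answer):
        return None
    return answer
```

Rejecting anything that is not a short plain answer limits what an injected prompt could make the agent send back to the server.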

// Node.js
import { validatePrompt, safeSolve } from 'agent-challenge';

const result = validatePrompt(challenge.prompt);
const answer = await safeSolve(challenge.prompt, myLlmFn);

Anti-Scripting

Even with full source code access, building a deterministic solver is impractical:

  • 450+ prompt templates across all types with randomized phrasing
  • Dynamic prompt assembly for agentic tier (~10,000+ structural variations per type)
  • Decoy injection: session IDs, timestamps, reference numbers mixed into prompts
  • Data position randomization: challenge data appears at different positions in the sentence
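
A toy version of the techniques listed above shows why pattern matching on prompts is brittle. The templates, the decoy format, and the build_prompt helper are invented for illustration and are far simpler than the library's own generator.

```python
import random

# Hypothetical templates; the data appears at a different position in each.
TEMPLATES = [
    "Reverse the following string: {data}",
    "Take {data} and write it backwards.",
    "What do you get when the characters of {data} are read right to left?",
]


def build_prompt(data: str, rng: random.Random) -> str:
    """Pick a random phrasing and mix in a decoy reference number."""
    prompt = rng.choice(TEMPLATES).format(data=data)
    decoy = f"[ref {rng.randint(100000, 999999)}]"
    # Randomise whether the decoy comes before or after the real task.
    return f"{decoy} {prompt}" if rng.random() < 0.5 else f"{prompt} {decoy}"


rng = random.Random()
for _ in range(3):
    print(build_prompt("NOHTYP", rng))
```

A solver keyed to one phrasing or one data position fails as soon as a different template or decoy placement comes up, while an LLM reads every variant the same way.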

Full security analysis: challenge.llm.kaveenk.com/#security

Testing

# Python
PYTHONPATH=src python3 run_tests.py

# JavaScript (syntax check)
node --check src/agentchallenge.js

Live Demo

Try it interactively at challenge.llm.kaveenk.com

Used By

  • SnapService: Screenshot-as-a-Service API for AI agents

License

MIT
