pytest plugin that auto-generates resilience tests for LLM apps using Lark MCP and TrueFoundry AI Gateway

Maintainers

golikovichev

Project description

pytest-resilience-agent

Auto-generated resilience tests for LLM applications. Powered by Lark MCP and TrueFoundry AI Gateway.

Built for: DevNetwork AI + ML Hackathon 2026
Targeted sponsor challenges: Lark, TrueFoundry
Stack: Python · pytest · Lark MCP · TrueFoundry AI Gateway · httpx
Demo video: YouTube (3 min); also available offline as videos/pytest-resilience-agent-demo.mp4. See videos/CREDITS.md for music attribution.

The problem

You ship an LLM feature. Your eval suite is green. Then one of these happens in production:

Your primary model browns out at 2:14am on a Saturday.
An MCP server starts returning tool errors for the one tool your agent uses most.
A rate limit kicks in halfway through a long completion.
A retry loop hides a 5-second latency spike and your users see a spinner for 30 seconds.

Existing LLM eval frameworks measure correctness on a clean path. They do not measure what happens when the infrastructure underneath cracks. The agent might still answer correctly, or it might silently fall back to a dumber model, or it might just hang.

pytest-resilience-agent closes that gap. It runs your test suite under controlled chaos (gateway timeouts, model brownouts, MCP errors, rate limits, partial outages) and asserts that the agent still meets its contract: it responds, it surfaces a clear error, it logs the fallback path. When tests fail, the plugin reports back to Lark so the failing scenario shows up next to the regular test results in the Lark UI.
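One way to picture the outcome contract is as a small check over what a single call produced. The sketch below is an illustration only: `Outcome`, its fields, and `meets_contract` are hypothetical names chosen for this example, not part of the plugin's API.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical record of one agent call made under an injected failure."""
    content: str = ""             # text returned to the user, if any
    error: str = ""               # user-facing error message, if any
    used_fallback: bool = False   # True if a non-primary model answered
    log_lines: list[str] = field(default_factory=list)


def meets_contract(outcome: Outcome) -> bool:
    """Return True when the agent either answered or failed loudly,
    and any fallback it took is visible in the logs."""
    responded_or_errored = bool(outcome.content) or bool(outcome.error)
    fallback_logged = (not outcome.used_fallback) or any(
        "fallback" in line.lower() for line in outcome.log_lines
    )
    return responded_or_errored and fallback_logged
```

Under this reading, a call that hangs and returns nothing fails the contract, and so does a correct answer produced by a silent, unlogged fallback.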

How it works

flowchart LR
    A[pytest test<br/>@mark.resilience]
    B[ChaosController<br/>respx mock router]
    C[Application<br/>under test]
    D[TrueFoundry<br/>AI Gateway]
    E[Lark MCP<br/>server]

    A --> B
    A --> C
    C -->|httpx POST| D
    C -->|tool call| E
    B -.intercepts.-> D
    B -.intercepts.-> E
    A -->|resolution| E

The four pieces:

Plugin registers the resilience marker and the ai_gateway + chaos fixtures.
ChaosController owns a respx.MockRouter that intercepts every outbound httpx call to the gateway and Lark URLs. Each named scenario (for example llm_timeout, llm_5xx, rate_limit, mcp_error, partial_outage) installs a route handler that injects the failure mode. The agent code does not need to know it is under test.
AI Gateway client is a thin OpenAI-compatible wrapper that points at a TrueFoundry gateway. The gateway config decides the fallback chain (primary model → secondary → tertiary) and retries; we just call it.
Lark MCP client lists failing tests in the host repo (input signal) and reports back when a resilience scenario is reproduced (closes the loop, so failure → resolution traceability lives next to the test result in the Lark UI).
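The interception idea can be modelled without any HTTP library. The sketch below is a simplified stand-in, not the plugin's code: it uses plain functions instead of respx and httpx, and `MiniChaosController`, `Response`, `Event`, `rate_limit`, and `healthy_upstream` are all hypothetical names. It shows the shape of the mechanism: a named scenario installs a handler, the handler decides per request whether to inject a failure, and every injection is recorded as an event.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    headers: dict
    body: str


@dataclass
class Event:
    scenario: str
    url: str


# A handler returns a Response to inject, or None to let the call through.
Handler = Callable[[str], Optional[Response]]


class MiniChaosController:
    def __init__(self, upstream: Callable[[str], Response]):
        self._upstream = upstream
        self._handlers: dict[str, Handler] = {}
        self.events: list[Event] = []

    def install(self, scenario: str, handler: Handler) -> None:
        self._handlers[scenario] = handler

    def send(self, url: str) -> Response:
        # First handler that injects something wins; otherwise call upstream.
        for scenario, handler in self._handlers.items():
            injected = handler(url)
            if injected is not None:
                self.events.append(Event(scenario, url))
                return injected
        return self._upstream(url)


def rate_limit(url: str) -> Optional[Response]:
    """Inject a 429 with Retry-After on chat completion calls only."""
    if "/chat/completions" in url:
        return Response(429, {"Retry-After": "1"}, "")
    return None


def healthy_upstream(url: str) -> Response:
    return Response(200, {}, "ok")
```

Because the failure is injected at the transport boundary, the code under test calls `send` the same way whether or not a scenario is installed.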

Quickstart

pip install pytest-resilience-agent

export TFY_GATEWAY_URL=https://your-tfy-gateway.example.com/v1
export LARK_MCP_URL=https://your-lark-instance.example.com

Write a resilience test:

import pytest

@pytest.mark.resilience(scenarios=["llm_timeout", "rate_limit"])
def test_chat_keeps_responding(ai_gateway, chaos):
    reply = ai_gateway.chat([{"role": "user", "content": "summarise the quarter"}])
    assert reply.content, "agent must respond even under chaos"
    assert any(e.scenario == "llm_timeout" for e in chaos.events), \
        "chaos must record the timeout it injected"

Run with the standard pytest invocation:

pytest -m resilience -v

Multi-turn chaos

Real agents hold a conversation, and infrastructure can degrade partway through it. Use turns= to bind a scenario set to each conversation turn and advance with chaos.next_turn(). Each turn is an independent window: counters reset on every turn, so chaos can appear and clear mid-conversation.

@pytest.mark.resilience(turns=[
    [],            # turn 1: clean
    ["llm_5xx"],   # turn 2: gateway 5xx
    [],            # turn 3: recovered
])
def test_agent_recovers_mid_conversation(ai_gateway, chaos):
    reply1 = ai_gateway.chat([{"role": "user", "content": "start a plan"}])
    assert reply1.content

    chaos.next_turn()
    reply2 = ai_gateway.chat([{"role": "user", "content": "add a step"}])
    assert reply2.content, "agent must survive the brownout on turn 2"

    chaos.next_turn()
    reply3 = ai_gateway.chat([{"role": "user", "content": "summarise"}])
    assert reply3.content

turns= and scenarios= are mutually exclusive. Each turn boundary emits a chaos.turn.N OpenTelemetry span.
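The per-turn window can be pictured as a small schedule object. This is a sketch under assumptions, not the plugin's implementation: `TurnSchedule`, `record_call`, and `calls_in_turn` are hypothetical names, and raising when advancing past the last declared turn is a choice made for this example only.

```python
class TurnSchedule:
    """Tracks which scenarios are active on the current conversation turn."""

    def __init__(self, turns: list[list[str]]):
        if not turns:
            raise ValueError("at least one turn is required")
        self._turns = turns
        self._index = 0
        self.calls_in_turn = 0  # per-turn counter, reset on every turn

    @property
    def active(self) -> list[str]:
        """Scenarios bound to the current turn (empty list means clean)."""
        return list(self._turns[self._index])

    def record_call(self) -> None:
        self.calls_in_turn += 1

    def next_turn(self) -> None:
        # Assumption for this sketch: advancing past the last turn is an error.
        if self._index + 1 >= len(self._turns):
            raise IndexError("no more turns declared")
        self._index += 1
        self.calls_in_turn = 0
```

Resetting the counter on each turn is what lets a scenario such as "fail the first N calls" start fresh when it reappears later in the conversation.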

Built-in chaos scenarios

| Scenario | What it does |
| --- | --- |
| `llm_timeout` | Gateway sleeps past the request timeout |
| `llm_5xx` | Gateway returns 502/503 a configurable share of the time |
| `rate_limit` | Gateway returns 429 with `Retry-After` |
| `mcp_error` | Lark MCP server raises a tool error mid-conversation |
| `partial_outage` | First call fails, retry succeeds (verifies retry logic) |
| `cost_exceeded` | Gateway returns 402 quota_exceeded |
| `wrong_model_returned` | Gateway silently routes to an unintended model |
| `stream_stall` | 200 with empty content (silent quality bug) |
| `network_blip` | ConnectError on first N calls |
| `malformed_json` | 200 with an HTML error body instead of JSON (proxy swallowed the failure) |
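Several of these scenarios return HTTP 200, so a status check alone will not catch them. The sketch below shows one way application code might classify such responses. It assumes an OpenAI-shaped chat completion body (`model` plus `choices[0].message.content`); `classify_response` is a hypothetical helper, and reusing the scenario names as return labels is purely illustrative.

```python
import json


def classify_response(status: int, body: str, expected_model: str) -> str:
    """Label a gateway response; anything other than "ok" needs handling."""
    if status != 200:
        return f"http_{status}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # e.g. a proxy returned an HTML error page with a 200 status
        return "malformed_json"
    if not isinstance(payload, dict):
        return "malformed_json"
    choices = payload.get("choices") or []
    content = ""
    if choices and isinstance(choices[0], dict):
        content = (choices[0].get("message") or {}).get("content") or ""
    if not content.strip():
        return "stream_stall"
    if payload.get("model") != expected_model:
        return "wrong_model_returned"
    return "ok"
```

An agent that passes tests under `malformed_json`, `stream_stall`, and `wrong_model_returned` usually has some check of this kind between the HTTP call and the code that consumes the reply.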

Live sponsor integration

Beyond the mock servers (which let judges clone and run the full loop without accounts), the plugin is wired against three real sponsor surfaces:

Lark Open Platform. LarkMCPClient(app_id, app_secret) issues a real tenant_access_token via POST /auth/v3/tenant_access_token/internal, caches it for the 7200 s lifetime, and refreshes within 60 s of expiry. Verified against cli_aa9ced2266389e15 (live app in the author's workspace).
TrueFoundry AI Gateway. Personal Access Token acquired; Custom Endpoint configured via tfy apply -f .secrets/tf-crusoe-cloud-v2.applied.yaml, registering crusoe-cloud/crusoe-llama-3.3-70b as a proxied model under the Crusoe upstream. The TF /models endpoint confirms the registration. Direct server-side requests to /proxy-api/*/chat/completions are stopped by a Cloudflare JS challenge (they need a cf_clearance solution); the same call from a TF dashboard session passes through to the backend, so the wire-up is functional through browser-origin paths and through the standard TF SDK.
Crusoe Cloud Intelligence. OpenAI-compatible /v1/chat/completions, verified against meta-llama/Llama-3.3-70B-Instruct. The AIGatewayClient sets a User-Agent header (Crusoe's edge requires it) and otherwise needs no changes; same code path as TF or any other OpenAI-shaped backend.
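The token handling described for Lark (cache for the token lifetime, refresh within 60 s of expiry) follows a common pattern that can be sketched in a few lines. This is not the plugin's `LarkMCPClient`: `TokenCache` is a hypothetical class, and the `fetch` callable stands in for the real token request, returning a token and its lifetime in seconds. The injectable clock is there only to make the behaviour testable.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], tuple[str, float]],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch            # returns (token, lifetime_seconds)
        self._margin = refresh_margin  # refresh this many seconds early
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

With a 7200 s lifetime and a 60 s margin, a token fetched at t = 0 is reused until t = 7140 and replaced on the first call at or after that point.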

Credentials live in .env (gitignored). Smoke test:

python -X utf8 scripts/smoke_live_integrations.py

What is and is not covered

Covered

Infrastructure-level failures: gateway, model, MCP, rate limiter.
Assertions on outcome contract (must respond, must log fallback, must surface error).
Multi-turn conversation chaos: failures injected and cleared per turn via turns= and chaos.next_turn() (added in v0.2).
Reporting to Lark so failure → resolution traceability lives in one place.

Not covered (yet)

Semantic regressions (use phoenix2pytest or DeepEval for that).
Distributed-system chaos (network partitions across services).

Roadmap

v0.1 (hackathon submission, May 2026): nine built-in chaos scenarios, live Lark + TrueFoundry + Crusoe integration, mock-server fallbacks, reference tests, end-to-end demo.
v0.2 (June 2026): multi-turn conversation chaos (failure injected and cleared per turn), OpenTelemetry spans for every chaos event and turn boundary.
v0.3: semantic assertion hooks, chaos scenario composition (e.g. rate_limit then partial_outage), property-based fuzzing of timing.

Why this is different

Existing LLM testing tooling falls in two buckets:

Eval frameworks (DeepEval, Opik, pytest-evals) score model output quality on a clean path.
Trace-to-test tools (phoenix2pytest, my other project) turn observed production failures into pytest cases.

Neither of those tests the infrastructure layer between your code and the model. pytest-resilience-agent is the missing third piece: prove your agent survives the chaos that production will throw at it.

Try it locally in two minutes

No accounts required. The chaos controller mocks the gateway at the HTTP layer, so you can see the full loop without spending on credits.

git clone <this repo>
cd pytest-resilience-agent
python -m pip install -e ".[demo,dev]"

Three entry points, ordered by how much of the story they show.

1. Run the test suite (17 tests, all chaos scenarios verified)

python -X utf8 -m pytest -v -m "not slow"

2. Run the sample FastAPI agent against every chaos scenario

python -X utf8 -m demo.run_demo

Prints a table of what was injected, how the agent reacted, retry count, fallback flag, and verdict.

3. Full loop: Lark failures → generated tests → run → resolution reported

python -X utf8 -m demo.run_full_loop

Starts the mock Lark MCP server in a background thread, lists three failing tests from it, generates one resilience pytest file per failure (scenarios chosen by matching the failure text), runs them in a subprocess, and reports each passing test back to Lark as a resolution. End-to-end product story in one command.
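Choosing scenarios "by matching the failure text" could be as simple as a keyword lookup. The sketch below is a guess at the general shape, not the generator's actual rules: the keyword table, the `pick_scenarios` name, and the `partial_outage` default are all assumptions made for this example.

```python
# Hypothetical keyword table: (substring in failure text, scenario name).
KEYWORD_TO_SCENARIO = [
    ("timeout", "llm_timeout"),
    ("timed out", "llm_timeout"),
    ("429", "rate_limit"),
    ("rate limit", "rate_limit"),
    ("502", "llm_5xx"),
    ("503", "llm_5xx"),
    ("tool error", "mcp_error"),
]


def pick_scenarios(failure_text: str, default: str = "partial_outage") -> list[str]:
    """Return the scenarios whose keywords appear in the failure text,
    in table order and without duplicates; fall back to a default."""
    text = failure_text.lower()
    picked: list[str] = []
    for keyword, scenario in KEYWORD_TO_SCENARIO:
        if keyword in text and scenario not in picked:
            picked.append(scenario)
    return picked or [default]
```

The returned list maps directly onto the `scenarios=` argument of the resilience marker in a generated test.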

CLI

After install, the pytest-resilience-agent command is on PATH:

pytest-resilience-agent scenarios                       # list registered scenarios
pytest-resilience-agent --lark-url URL discover         # list failing tests
pytest-resilience-agent --lark-url URL generate --out . # generate resilience tests
pytest-resilience-agent run --path generated_resilience_tests/
pytest-resilience-agent --lark-url URL report --test-name X --pytest-path P

License

MIT.

Acknowledgements

Built on Lark MCP, TrueFoundry AI Gateway, and pytest. Thanks to the maintainers of all three.

Project details

Release history

This version

0.2.0

Jun 13, 2026

Download files

Source Distribution

pytest_resilience_agent-0.2.0.tar.gz (19.2 MB view details)

Uploaded Jun 13, 2026 Source

Built Distribution

pytest_resilience_agent-0.2.0-py3-none-any.whl (23.3 kB view details)

Uploaded Jun 13, 2026 Python 3

File details

Details for the file pytest_resilience_agent-0.2.0.tar.gz.

File metadata

Download URL: pytest_resilience_agent-0.2.0.tar.gz
Upload date: Jun 13, 2026
Size: 19.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pytest_resilience_agent-0.2.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `76afe3d60b1e335898c9719f8d24d418d9f9fc409ca0122a374708394ead4c2a` |
| MD5 | `84923dff5557e8b1981abcc68d996286` |
| BLAKE2b-256 | `2a1d2c33881d0da2c157d295f1f2b723f19a464ac8248b44d58c1cae1c357d2d` |

Provenance

The following attestation bundles were made for pytest_resilience_agent-0.2.0.tar.gz:

Publisher: publish.yml on golikovichev/pytest-resilience-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_resilience_agent-0.2.0.tar.gz
- Subject digest: 76afe3d60b1e335898c9719f8d24d418d9f9fc409ca0122a374708394ead4c2a
- Sigstore transparency entry: 1810594153
- Sigstore integration time: Jun 13, 2026
Source repository:
- Permalink: golikovichev/pytest-resilience-agent@72cc14cfa930e8e96bd3070019867d888e04f6e1
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/golikovichev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@72cc14cfa930e8e96bd3070019867d888e04f6e1
- Trigger Event: release

File details

Details for the file pytest_resilience_agent-0.2.0-py3-none-any.whl.

File metadata

Download URL: pytest_resilience_agent-0.2.0-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 23.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pytest_resilience_agent-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `d779cb6464216c7f4a1af700404da694f06d9a4ae06b39a9dd6a39422423aebd` |
| MD5 | `426c4c9befcfdce7e6c43d46cccd177d` |
| BLAKE2b-256 | `8495eb50e0927931a14a56acf535dc850518bddcf38585481e698b040a849362` |

Provenance

The following attestation bundles were made for pytest_resilience_agent-0.2.0-py3-none-any.whl:

Publisher: publish.yml on golikovichev/pytest-resilience-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_resilience_agent-0.2.0-py3-none-any.whl
- Subject digest: d779cb6464216c7f4a1af700404da694f06d9a4ae06b39a9dd6a39422423aebd
- Sigstore transparency entry: 1810594174
- Sigstore integration time: Jun 13, 2026
Source repository:
- Permalink: golikovichev/pytest-resilience-agent@72cc14cfa930e8e96bd3070019867d888e04f6e1
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/golikovichev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@72cc14cfa930e8e96bd3070019867d888e04f6e1
- Trigger Event: release
