LLM pipeline integrity testing — catch the failures your tests never will
SilentFail
pip install silentefail
Origin
While building data extraction pipelines at Fullcast, I kept hitting the same class of bug: the pipeline returned something that looked valid — no exception, no null, a real dict — but the data was wrong, truncated, or hallucinated. Unit tests passed because we were testing the happy path. Production broke because LLMs don't always follow the schema. SilentFail is the harness I wish I'd had. It runs your pipeline against adversarial inputs designed to expose the four failure modes that slip past every eval framework I tried.
Quick Start
from pydantic import BaseModel
from silentefail import Auditor, FailureClass

class InputData(BaseModel):
    text: str
    context: str

class ExtractedData(BaseModel):
    name: str
    value: float
    category: str

def my_pipeline(input_data: dict) -> dict:
    # Your LLM call here
    ...

auditor = Auditor(
    pipeline=my_pipeline,
    input_schema=InputData,
    output_schema=ExtractedData,
    golden_dataset=[
        ("What is 2+2?", "4", ["4", "four"]),
        ("Capital of France?", "Paris", ["Paris"]),
    ],
    context_window=128000,
    test_inputs=[
        {"text": "Revenue was $1.2M", "context": "Q3 report"},
    ],
)

report = auditor.run([
    FailureClass.SCHEMA_DRIFT,
    FailureClass.HALLUCINATED_STRUCTURE,
])
report.summary()
# SilentFail Audit Report
# ========================
# Tests run: 24
# Failures detected: 3
# Pass rate: 87.5%
#
# HIGH severity: 2
# • SCHEMA_DRIFT: None returned on missing 'context' field
# • HALLUCINATED_STRUCTURE: Invented key 'confidence_score' not in schema
# MEDIUM severity: 1
# • SCHEMA_DRIFT: Unhandled KeyError on extra field 'metadata'
report.export("silentefail_report.html")
LangChain Integration
from silentefail import Auditor
from silentefail.runners import LangChainRunner

# InputData and ExtractedData are the models defined in Quick Start.
runner = LangChainRunner(your_langchain_chain)
auditor = Auditor(
    pipeline=runner,
    input_schema=InputData,
    output_schema=ExtractedData,
    # ... remaining arguments as in Quick Start
)
report = auditor.run()
The Four Failure Classes
Class 1 — Schema Drift
Your pipeline receives a slightly malformed input: a missing required field, a None where a string is expected, an extra unexpected key. What happens?
- Silent None return: the pipeline swallows the bad input and returns nothing. No error, no log. Your downstream code crashes mysteriously.
- Unhandled exception: KeyError, AttributeError, or TypeError bubbles up instead of a clear ValidationError.
SilentFail generates a battery of adversarial input variants from your input_schema and runs them all.
auditor = Auditor(pipeline=my_fn, input_schema=MyInputModel)
report = auditor.run([FailureClass.SCHEMA_DRIFT])
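To make the idea concrete, here is a minimal standard-library sketch of how such variants could be generated and classified. It is not SilentFail's implementation: `drift_variants`, `classify`, and `fragile_pipeline` are hypothetical names, the variants are derived from one valid input dict rather than from a pydantic model, and `ValueError` stands in for a clear validation error.

```python
from typing import Any, Callable, Optional


def drift_variants(valid_input: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (label, malformed_input) pairs derived from one valid input."""
    variants: list[tuple[str, dict[str, Any]]] = []
    for field in valid_input:
        # Drop a required field entirely.
        missing = {k: v for k, v in valid_input.items() if k != field}
        variants.append((f"missing:{field}", missing))
        # Keep the field but set it to None.
        variants.append((f"none:{field}", {**valid_input, field: None}))
    # Add a key the schema does not declare.
    variants.append(("extra:metadata", {**valid_input, "metadata": {"source": "test"}}))
    return variants


def classify(pipeline: Callable[[dict], Any], malformed: dict[str, Any]) -> str:
    """Run one variant and classify how the pipeline reacted."""
    try:
        result = pipeline(malformed)
    except ValueError:
        # Assumption: ValueError represents a clear, intentional validation error.
        return "loud_validation_error"
    except (KeyError, AttributeError, TypeError):
        return "unhandled_exception"
    return "silent_none" if result is None else "returned_value"


def fragile_pipeline(data: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical pipeline that mishandles bad input in two ways."""
    if data["text"] is None:
        return None  # swallows the bad input
    return {"summary": data["text"].upper(), "context": data["context"]}


if __name__ == "__main__":
    sample = {"text": "Revenue was $1.2M", "context": "Q3 report"}
    for label, variant in drift_variants(sample):
        print(label, "->", classify(fragile_pipeline, variant))
```

The two outcomes worth flagging are `silent_none` and `unhandled_exception`; a pipeline that raises a clear validation error on bad input is behaving as intended.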
Class 2 — Confident Wrong Answers
Given a golden dataset of known question→answer pairs, SilentFail checks two things: is the answer wrong, and does the pipeline express any uncertainty?
A pipeline that answers confidently and incorrectly 30% of the time is a calibration failure. This class finds it.
golden = [
    ("What is 2+2?", "4", ["4", "four"]),
    ("Capital of France?", "Paris", ["Paris"]),
]
auditor = Auditor(pipeline=my_fn, golden_dataset=golden)
report = auditor.run([FailureClass.CONFIDENT_WRONG])
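The two checks can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's grading logic: the hedge phrases in `HEDGES`, the substring keyword match, and the helper names `grade`, `confident_wrong_rate`, and `toy_pipeline` are all hypothetical, and the pipeline is assumed to take the question string and return the answer string.

```python
# Hypothetical list of phrases that count as expressed uncertainty.
HEDGES = ("not sure", "i think", "possibly", "might be", "uncertain", "i don't know")


def grade(answer: str, keywords: list[str]) -> str:
    """Classify one answer as correct, hedged_wrong, or confident_wrong."""
    text = answer.lower()
    # Check 1: does the answer contain any accepted keyword?
    if any(keyword.lower() in text for keyword in keywords):
        return "correct"
    # Check 2: if wrong, did the pipeline at least express uncertainty?
    if any(hedge in text for hedge in HEDGES):
        return "hedged_wrong"
    return "confident_wrong"


def confident_wrong_rate(pipeline, golden) -> float:
    """Fraction of golden questions answered wrongly without any hedge."""
    grades = [grade(pipeline(question), keywords) for question, _expected, keywords in golden]
    return grades.count("confident_wrong") / len(grades)


golden = [
    ("What is 2+2?", "4", ["4", "four"]),
    ("Capital of France?", "Paris", ["Paris"]),
]


def toy_pipeline(question: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return {"What is 2+2?": "The answer is four."}.get(question, "It is Lyon.")


if __name__ == "__main__":
    print(confident_wrong_rate(toy_pipeline, golden))
```

Substring matching is deliberately crude here: the keyword "4" also matches "14", so a real grader would need word-boundary or semantic matching.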
Class 3 — Silent Truncation
At 50% context fill your pipeline returns 800 tokens. At 95% context fill it returns 80 tokens. No error — just a quietly shorter answer. Required fields are present but empty. Reasoning stops mid-sentence.
SilentFail pads inputs to 50%, 75%, 90%, 95%, and 99% of your declared context window and compares output length and completeness.
auditor = Auditor(pipeline=my_fn, context_window=128000)
report = auditor.run([FailureClass.SILENT_TRUNCATION])
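A rough sketch of the padding and comparison steps follows. It makes several assumptions that may not match SilentFail: tokens are approximated as whitespace-separated words, filler is prepended to the prompt, and an output counts as suspect when it falls below half the length seen at the lowest fill level. The names `pad_to_fill` and `truncation_suspects` are hypothetical.

```python
FILL_LEVELS = (0.50, 0.75, 0.90, 0.95, 0.99)


def pad_to_fill(prompt: str, context_window: int, fill: float, filler: str = "lorem") -> str:
    """Prepend filler words until the input is roughly fill * context_window tokens.

    Tokens are approximated as whitespace-separated words.
    """
    target = round(context_window * fill)
    missing = max(0, target - len(prompt.split()))
    return " ".join([filler] * missing + [prompt])


def truncation_suspects(output_lengths: dict[float, int], drop_ratio: float = 0.5) -> list[float]:
    """Return fill levels whose output is shorter than drop_ratio times the baseline.

    The baseline is the output length at the lowest fill level measured.
    """
    baseline = output_lengths[min(output_lengths)]
    return [
        fill
        for fill, length in sorted(output_lengths.items())
        if length < baseline * drop_ratio
    ]


if __name__ == "__main__":
    padded = pad_to_fill("Summarize the report.", context_window=1000, fill=0.50)
    print(len(padded.split()))
    # Output lengths (in tokens) that a pipeline might produce at each fill level.
    observed = {0.50: 800, 0.75: 790, 0.90: 760, 0.95: 80, 0.99: 40}
    print(truncation_suspects(observed))
```

Length alone does not prove truncation, which is why the text above also compares completeness, for example checking that required fields are non-empty.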
Class 4 — Hallucinated Structure
Your output schema says {name, value, category}. The model returns {name, value, category, confidence_score, reasoning, source_url}. Or it returns {name} and drops the rest. Or the types are wrong — value is a string, not a float.
This class runs real inputs through your pipeline and validates every output against your output_schema.
auditor = Auditor(
    pipeline=my_fn,
    output_schema=ExtractedData,
    test_inputs=[{"text": "Revenue was $1.2M", "context": "Q3"}],
)
report = auditor.run([FailureClass.HALLUCINATED_STRUCTURE])
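The three kinds of mismatch described above (invented keys, dropped keys, wrong types) can be checked with a small standard-library function. This is only a sketch: SilentFail validates against your pydantic `output_schema`, whereas the hypothetical `structure_issues` helper below uses a plain name-to-type mapping called `EXPECTED`.

```python
# Plain mapping standing in for the ExtractedData schema.
EXPECTED = {"name": str, "value": float, "category": str}


def structure_issues(output: dict, expected: dict = EXPECTED) -> list[str]:
    """List every way an output dict deviates from the expected structure."""
    issues = []
    # Keys the model invented.
    for key in sorted(set(output) - set(expected)):
        issues.append(f"invented key: {key}")
    # Keys the model dropped.
    for key in sorted(set(expected) - set(output)):
        issues.append(f"missing key: {key}")
    # Keys that are present but carry the wrong type.
    for key, expected_type in expected.items():
        if key in output and not isinstance(output[key], expected_type):
            actual = type(output[key]).__name__
            issues.append(f"wrong type: {key} is {actual}, expected {expected_type.__name__}")
    return issues


if __name__ == "__main__":
    drifted = {
        "name": "Revenue",
        "value": "1.2M",  # string instead of float
        "category": "finance",
        "confidence_score": 0.9,  # not in the schema
    }
    for issue in structure_issues(drifted):
        print(issue)
```

The type check is strict: an integer `value` would be reported as a wrong type because `isinstance(3, float)` is false. Whether to coerce or reject such values is a design choice for your own schema.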
Why This Is Different
| Tool | What it tests |
|---|---|
| Evals / LLM-as-judge | Output quality |
| Pytest + mocks | Happy-path logic |
| SilentFail | Pipeline integrity under adversarial conditions |
Eval frameworks tell you if your model is smart. SilentFail tells you if your pipeline is safe — whether it will fail silently or loudly when reality diverges from your assumptions.
Report
report.export("report.html") generates a self-contained dark-themed HTML file — no external dependencies, no CDN calls. Each failure shows the input, the output, the failure type, a severity badge, and a one-line fix recommendation. Share it with your team or drop it in a PR.
API Reference
Auditor(
    pipeline: Callable,              # Any callable: fn, chain.invoke, LangChainRunner(chain)
    input_schema: type[BaseModel],   # For SCHEMA_DRIFT: the model class, not an instance
    output_schema: type[BaseModel],  # For HALLUCINATED_STRUCTURE: the model class
    golden_dataset: list[tuple],     # For CONFIDENT_WRONG: (question, answer, [keywords])
    context_window: int,             # For SILENT_TRUNCATION: token limit
    test_inputs: list,               # For HALLUCINATED_STRUCTURE: real inputs
)
auditor.run(failure_classes=[...]) -> AuditReport
report.summary() # Rich console output
report.export(path) # HTML file
report.to_dict() # Machine-readable dict
Installation
# Core (no LangChain)
pip install silentefail
# With LangChain support
pip install "silentefail[langchain]"
# For development
pip install "silentefail[dev]"
Portfolio
Built by Dharsha Reddy.