SilentFail

LLM pipeline integrity testing — catch the failures your tests never will.

pip install silentefail

Origin

While building data extraction pipelines at Fullcast, I kept hitting the same class of bug: the pipeline returned something that looked valid — no exception, no null, a real dict — but the data was wrong, truncated, or hallucinated. Unit tests passed because we were testing the happy path. Production broke because LLMs don't always follow the schema. SilentFail is the harness I wish I'd had. It runs your pipeline against adversarial inputs designed to expose the four failure modes that slip past every eval framework I tried.


Quick Start

from pydantic import BaseModel
from silentefail import Auditor, FailureClass

class InputData(BaseModel):
    text: str
    context: str

class ExtractedData(BaseModel):
    name: str
    value: float
    category: str

def my_pipeline(input_data: dict) -> dict:
    # Your LLM call here
    ...

auditor = Auditor(
    pipeline=my_pipeline,
    input_schema=InputData,
    output_schema=ExtractedData,
    golden_dataset=[
        ("What is 2+2?", "4", ["4", "four"]),
        ("Capital of France?", "Paris", ["Paris"]),
    ],
    context_window=128000,
    test_inputs=[
        {"text": "Revenue was $1.2M", "context": "Q3 report"},
    ],
)

report = auditor.run([
    FailureClass.SCHEMA_DRIFT,
    FailureClass.HALLUCINATED_STRUCTURE,
])

report.summary()
# SilentFail Audit Report
# ========================
# Tests run:         24
# Failures detected: 3
# Pass rate:         87.5%
#
# HIGH severity: 2
#   • SCHEMA_DRIFT: None returned on missing 'context' field
#   • HALLUCINATED_STRUCTURE: Invented key 'confidence_score' not in schema
# MEDIUM severity: 1
#   • SCHEMA_DRIFT: Unhandled KeyError on extra field 'metadata'

report.export("silentefail_report.html")

LangChain Integration

from silentefail import Auditor
from silentefail.runners import LangChainRunner

runner = LangChainRunner(your_langchain_chain)

auditor = Auditor(
    pipeline=runner,
    input_schema=InputData,
    output_schema=ExtractedData,
    # ... remaining Auditor arguments, as in Quick Start
)
report = auditor.run()

The Four Failure Classes

Class 1 — Schema Drift

Your pipeline receives a slightly malformed input: a missing required field, a None where a string is expected, an extra unexpected key. What happens?

  • Silent None return: the pipeline swallows the bad input and returns nothing. No error, no log. Your downstream code then crashes far from the real cause.
  • Unhandled exception: a KeyError, AttributeError, or TypeError bubbles up instead of a clear ValidationError.

SilentFail generates a battery of adversarial input variants from your input_schema and runs them all.

auditor = Auditor(pipeline=my_fn, input_schema=MyInputModel)
report = auditor.run([FailureClass.SCHEMA_DRIFT])
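
The variant generation described above can be sketched in plain Python. This illustrates the idea and is not SilentFail's internals: make_drift_variants, classify_outcome, and fragile_pipeline are hypothetical names, and the three mutations (drop a field, set it to None, add an extra key) are the ones named in this section. Treating ValueError as the "clear" rejection is also an assumption of the sketch.

```python
from copy import deepcopy
from typing import Any, Callable


def make_drift_variants(valid_input: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (label, mutated_input) pairs derived from one valid input."""
    variants: list[tuple[str, dict[str, Any]]] = []
    for field in valid_input:
        # Mutation 1: the field is missing entirely.
        missing = deepcopy(valid_input)
        del missing[field]
        variants.append((f"missing:{field}", missing))
        # Mutation 2: the field is present but None.
        nulled = deepcopy(valid_input)
        nulled[field] = None
        variants.append((f"none:{field}", nulled))
    # Mutation 3: one unexpected extra key.
    extra = deepcopy(valid_input)
    extra["unexpected_key"] = "surprise"
    variants.append(("extra:unexpected_key", extra))
    return variants


def classify_outcome(pipeline: Callable[[dict], Any], bad_input: dict[str, Any]) -> str:
    """Run the pipeline on one mutated input and name what happened."""
    try:
        result = pipeline(bad_input)
    except (KeyError, AttributeError, TypeError) as exc:
        # The raw exception leaked out instead of a clear rejection.
        return f"unhandled:{type(exc).__name__}"
    except ValueError:
        # Assumption: a deliberate ValueError counts as a clear rejection.
        return "clear_validation_error"
    return "silent_none" if result is None else "returned_value"


def fragile_pipeline(data: dict[str, Any]) -> Any:
    """Hypothetical pipeline that shows both failure modes."""
    if data.get("context") is None:
        return None  # swallows the bad input
    return {"name": data["text"].upper()}


if __name__ == "__main__":
    valid = {"text": "Revenue was $1.2M", "context": "Q3 report"}
    for label, variant in make_drift_variants(valid):
        print(label, "->", classify_outcome(fragile_pipeline, variant))
```

A pipeline that validates its input up front would turn every "silent_none" and "unhandled:..." outcome here into a clear rejection.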

Class 2 — Confident Wrong Answers

Given a golden dataset of known question→answer pairs, SilentFail checks two things: is the answer wrong, and does the pipeline express any uncertainty?

A pipeline that answers confidently and incorrectly 30% of the time is a calibration failure. This class finds it.

golden = [
    ("What is 2+2?", "4", ["4", "four"]),
    ("Capital of France?", "Paris", ["Paris"]),
]
auditor = Auditor(pipeline=my_fn, golden_dataset=golden)
report = auditor.run([FailureClass.CONFIDENT_WRONG])
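
The two checks above (is the answer wrong, and is it hedged) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's scoring logic: the hedge phrase list, the whole-word keyword match, and the names is_correct, is_hedged, classify_answer, and confident_wrong_rate are all hypothetical. The sketch also assumes the pipeline takes a question string and returns an answer string.

```python
import re
from typing import Callable

# Assumed hedging phrases; a real list would be tuned to the model's wording.
HEDGE_PATTERNS = [
    r"\bnot sure\b",
    r"\bi think\b",
    r"\bmight\b",
    r"\bpossibly\b",
    r"\bprobably\b",
    r"\buncertain\b",
    r"\bi don't know\b",
]


def is_correct(answer: str, keywords: list[str]) -> bool:
    """True if any accepted keyword appears as a whole word in the answer."""
    return any(
        re.search(rf"\b{re.escape(kw)}\b", answer, flags=re.IGNORECASE)
        for kw in keywords
    )


def is_hedged(answer: str) -> bool:
    """True if the answer contains any of the assumed hedging phrases."""
    return any(re.search(p, answer, flags=re.IGNORECASE) for p in HEDGE_PATTERNS)


def classify_answer(answer: str, keywords: list[str]) -> str:
    """Label one answer as correct, hedged_wrong, or confident_wrong."""
    if is_correct(answer, keywords):
        return "correct"
    return "hedged_wrong" if is_hedged(answer) else "confident_wrong"


def confident_wrong_rate(
    pipeline: Callable[[str], str],
    golden: list[tuple[str, str, list[str]]],
) -> float:
    """Fraction of golden questions answered wrongly with no hedging."""
    labels = [classify_answer(pipeline(q), kws) for q, _expected, kws in golden]
    return labels.count("confident_wrong") / len(labels)
```

Whole-word matching keeps "14" from counting as a correct "4"; a looser substring match would hide exactly the errors this class is meant to surface.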

Class 3 — Silent Truncation

At 50% context fill your pipeline returns 800 tokens. At 95% context fill it returns 80 tokens. No error — just a quietly shorter answer. Required fields are present but empty. Reasoning stops mid-sentence.

SilentFail pads inputs to 50%, 75%, 90%, 95%, and 99% of your declared context window and compares output length and completeness.

auditor = Auditor(pipeline=my_fn, context_window=128000)
report = auditor.run([FailureClass.SILENT_TRUNCATION])
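
A rough sketch of the padding-and-compare idea, under loud assumptions: tokens are approximated as whitespace-separated words, the pipeline is taken to accept and return plain strings, and an output counts as truncated when it is shorter than half the length seen at the lowest fill level. The names pad_to_fill, find_truncation, and fake_pipeline, and the drop_ratio threshold, are hypothetical and not part of SilentFail.

```python
from typing import Callable

FILL_LEVELS = (0.50, 0.75, 0.90, 0.95, 0.99)


def pad_to_fill(prompt: str, context_window: int, fill: float, filler: str = "lorem") -> str:
    """Prefix filler words so the prompt occupies about fill * context_window words."""
    target = round(context_window * fill)
    padding = max(0, target - len(prompt.split()))
    return " ".join([filler] * padding + [prompt])


def find_truncation(
    pipeline: Callable[[str], str],
    prompt: str,
    context_window: int,
    drop_ratio: float = 0.5,
) -> list[float]:
    """Return fill levels whose output is much shorter than the 50% baseline."""
    lengths = {
        fill: len(pipeline(pad_to_fill(prompt, context_window, fill)).split())
        for fill in FILL_LEVELS
    }
    baseline = lengths[FILL_LEVELS[0]]
    return [fill for fill, n in lengths.items() if n < baseline * drop_ratio]


def fake_pipeline(text: str, window: int = 1000) -> str:
    """Hypothetical model: wants to emit 400 words but only has the leftover window."""
    remaining = max(0, window - len(text.split()))
    return " ".join(["word"] * min(400, remaining))


if __name__ == "__main__":
    print(find_truncation(fake_pipeline, "Summarize the report", context_window=1000))
```

Output length alone misses the "required fields present but empty" case, so a fuller check would also inspect each field of the parsed output.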

Class 4 — Hallucinated Structure

Your output schema says {name, value, category}. The model returns {name, value, category, confidence_score, reasoning, source_url}. Or it returns {name} and drops the rest. Or the types are wrong — value is a string, not a float.

This class runs real inputs through your pipeline and validates every output against your output_schema.

auditor = Auditor(
    pipeline=my_fn,
    output_schema=ExtractedData,
    test_inputs=[{"text": "Revenue was $1.2M", "context": "Q3"}],
)
report = auditor.run([FailureClass.HALLUCINATED_STRUCTURE])
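
The three mismatches described above (invented keys, dropped keys, wrong types) can be sketched with a plain dict of expected types. This is an illustration only: diff_structure and EXPECTED_FIELDS are hypothetical names, and the real check runs against your pydantic output_schema rather than a hand-written type map.

```python
from typing import Any

# Mirrors the ExtractedData fields from Quick Start.
EXPECTED_FIELDS: dict[str, type] = {"name": str, "value": float, "category": str}


def diff_structure(
    output: dict[str, Any],
    expected: dict[str, type] = EXPECTED_FIELDS,
) -> dict[str, list[str]]:
    """Compare an output dict to the expected fields and list every mismatch."""
    extra = sorted(set(output) - set(expected))
    missing = sorted(set(expected) - set(output))
    # Strict check: an int where a float is expected is reported too.
    wrong_type = sorted(
        key
        for key, typ in expected.items()
        if key in output and not isinstance(output[key], typ)
    )
    return {"extra": extra, "missing": missing, "wrong_type": wrong_type}


if __name__ == "__main__":
    drifted = {
        "name": "Revenue",
        "value": "1.2M",  # string instead of float
        "category": "finance",
        "confidence_score": 0.9,  # not in the schema
    }
    print(diff_structure(drifted))
```

One design note: pydantic models ignore unknown fields by default, so validating against a model alone would not flag the invented keys unless the model is configured to forbid extras. The explicit set difference above makes that check visible.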

Why This Is Different

Tool                    What it tests
Evals / LLM-as-judge    Output quality
Pytest + mocks          Happy-path logic
SilentFail              Pipeline integrity under adversarial conditions

Eval frameworks tell you if your model is smart. SilentFail tells you if your pipeline is safe — whether it will fail silently or loudly when reality diverges from your assumptions.


Report

report.export("report.html") generates a self-contained dark-themed HTML file — no external dependencies, no CDN calls. Each failure shows the input, the output, the failure type, a severity badge, and a one-line fix recommendation. Share it with your team or drop it in a PR.


API Reference

Auditor(
    pipeline: Callable,                 # Any callable: fn, chain.invoke, LangChainRunner(chain)
    input_schema: type[BaseModel],      # For SCHEMA_DRIFT (pass the model class, not an instance)
    output_schema: type[BaseModel],     # For HALLUCINATED_STRUCTURE (pass the model class, not an instance)
    golden_dataset: list[tuple],        # For CONFIDENT_WRONG: (question, answer, [keywords])
    context_window: int,                # For SILENT_TRUNCATION: token limit
    test_inputs: list,                  # For HALLUCINATED_STRUCTURE: real inputs
)

auditor.run(failure_classes=[...]) -> AuditReport

report.summary()         # Rich console output
report.export(path)      # HTML file
report.to_dict()         # Machine-readable dict
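
One possible use of the machine-readable dict is gating CI on high-severity findings. The sketch below assumes a shape for that dict that this README does not document: a top-level "failures" list whose items carry a "severity" string. Those key names, and the helper names high_severity_failures and exit_code_for, are hypothetical, so check the real output of report.to_dict() before relying on them.

```python
from typing import Any


def high_severity_failures(report_dict: dict[str, Any]) -> list[dict[str, Any]]:
    """Pick out HIGH-severity entries.

    ASSUMPTION: the "failures" and "severity" keys are guesses at the shape
    of report.to_dict(); adjust them to match the real structure.
    """
    return [
        failure
        for failure in report_dict.get("failures", [])
        if failure.get("severity") == "HIGH"
    ]


def exit_code_for(report_dict: dict[str, Any]) -> int:
    """Return 1 if any HIGH-severity failure exists, else 0."""
    return 1 if high_severity_failures(report_dict) else 0


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not real SilentFail output.
    sample = {
        "failures": [
            {"failure_class": "SCHEMA_DRIFT", "severity": "HIGH"},
            {"failure_class": "SCHEMA_DRIFT", "severity": "MEDIUM"},
        ]
    }
    raise SystemExit(exit_code_for(sample))
```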

Installation

# Core (no LangChain)
pip install silentefail

# With LangChain support
pip install "silentefail[langchain]"

# For development
pip install "silentefail[dev]"

Portfolio

Built by Dharsha Reddy.
