Sediment
Mine behavioral invariants from LLM production logs. Auto-generate tests.
Quickstart · CLI · Invariant types · CI integration · Formats · API
Sediment reads your production logs, discovers what your LLM system actually does (not what you think it does), and turns those discoveries into runnable pytest tests and CI checks.
```bash
pip install sediment
sediment discover logs/prod.jsonl
```

```
Discovered 14 invariants from logs/prod.jsonl

  [structural]   output_never_empty        confidence=100%  support=2841
  [structural]   output_always_json        confidence=98%   support=2784
  [pattern]      no_email_in_output        confidence=100%  support=2841  ← PII guard
  [pattern]      no_credit_card_in_output  confidence=100%  support=2841  ← PII guard
  [statistical]  latency_p95_threshold     confidence=94%   support=2672  p95=1240ms
  [temporal]     output_length_drift       confidence=91%   support=2841
  [semantic]     semantic_consistency      confidence=87%   support=2841
  ...
```
What it does
| Step | Description |
|---|---|
| Ingest | Reads logs in any format — JSONL, CSV, Parquet, gzip, OpenAI, LangSmith, OTel, and more |
| Infer | Auto-detects format and field schema (input, output, latency, model, session, …) |
| Discover | Mines behavioral invariants across 7 miner types |
| Generate | Writes a pytest test file you can drop straight into CI |
| Track | Saves a baseline and alerts when production behavior drifts |
Install
```bash
pip install sediment                            # core — zero required dependencies
pip install "sediment[parquet]"                 # + Parquet / Arrow support
pip install "sediment[avro]"                    # + Avro support
pip install "sediment[cloud]"                   # + S3 / GCS / Azure Blob sources
pip install "sediment[openai]"                  # + OpenAI embedding backend
pip install "sediment[sentence-transformers]"   # + sentence-transformers backend
pip install "sediment[config]"                  # + .sediment.yml config file support
pip install "sediment[full]"                    # everything
```
Quickstart
Python API
```python
from sediment import LogAnalyzer

a = LogAnalyzer("logs/prod.jsonl")

# Inspect what was detected
print(a.summary())

# Discover invariants
invariants = a.discover(min_confidence=0.8)
for inv in invariants:
    print(inv)

# Generate a pytest test file
a.emit_tests("test_invariants.py", function_hint="call_llm")
# → Run with: pytest test_invariants.py -v

# Generate an interactive HTML report
a.report("report.html")
```
CLI
```bash
# Explore what's in your logs
sediment summary logs/prod.jsonl

# Discover and print invariants
sediment discover logs/prod.jsonl --min-confidence 0.8

# Save a baseline for future staleness checks
sediment save logs/prod.jsonl baseline.json

# Check for drift against a new batch of logs
sediment check-staleness logs/today.jsonl baseline.json

# Compare invariants between two log snapshots
sediment compare logs/v1.jsonl logs/v2.jsonl

# Generate an HTML report
sediment report logs/prod.jsonl -o report.html

# Scaffold a .sediment.yml config and first baseline
sediment init logs/prod.jsonl
```
Invariant types
Sediment runs 7 miner types in parallel. Each produces typed, confidence-annotated invariants.
Structural
What your outputs always look like:
- `output_never_empty` — output is always non-null, non-empty
- `output_length_range` — character length stays within observed bounds
- `output_always_json` — every output is valid JSON
- `output_json_keys_consistent` — JSON outputs always contain the same keys
- `output_type_consistent` — output type (str / list / dict) is stable
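As an illustration of what a structural check like `output_json_keys_consistent` verifies (this is a hypothetical sketch, not Sediment's actual implementation), an invariant of this kind holds when every JSON output parses to an object with one shared key set:

```python
import json

def json_keys_consistent(outputs):
    """Return (holds, keys): holds is True when every output is a JSON
    object and all objects share exactly the same key set."""
    key_sets = set()
    for out in outputs:
        try:
            obj = json.loads(out)
        except (TypeError, ValueError):
            return False, None  # at least one output is not valid JSON
        if not isinstance(obj, dict):
            return False, None  # JSON, but not an object
        key_sets.add(frozenset(obj))
    holds = len(key_sets) == 1
    return holds, (next(iter(key_sets)) if holds else None)

holds, keys = json_keys_consistent(['{"answer": "hi", "tokens": 3}',
                                    '{"answer": "yo", "tokens": 2}'])
```

The mined invariant would then record the shared key set and fail whenever a future output adds, drops, or renames a key.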
Statistical
Distributional properties of your system:
- `latency_p95_threshold` — p95 latency stays under threshold
- `cost_p95_threshold` — p95 cost per request stays under threshold
- `error_rate` — error rate at or below observed baseline
- `model_consistency` — a single model is used throughout
Pattern — PII & safety
What must never appear in outputs:
- `no_email_in_output` — no email addresses leaked 🔴 critical
- `no_phone_us_in_output` — no US phone numbers leaked 🔴 critical
- `no_ssn_in_output` — no Social Security numbers leaked 🔴 critical
- `no_credit_card_in_output` — no credit card numbers leaked (Luhn-validated) 🔴 critical
- `no_ipv4_in_output` — no IP addresses leaked 🔴 critical
PII detection uses validated regex — SSNs checked against SSA rules, credit cards validated with the Luhn algorithm, phone numbers validated against NANP rules.
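The Luhn check mentioned above is what separates a real card number from an arbitrary 16-digit string. A minimal standalone version (the standard algorithm, not Sediment's internal code) looks like this:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 from any doubled digit above 9, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:  # shorter than any real card number
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:    # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Running the regex first and the checksum second keeps false positives low: most random digit runs that match a card-number pattern fail the checksum.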
Relational
Input → output relationships:
- `output_minimum_length` — outputs stay above a safe minimum length relative to input
- `refusal_rate` — model refuses or apologises within observed bounds
- `input_output_length_correlation` — longer inputs produce longer outputs (when expected)
Semantic
Meaning-level consistency:
- `semantic_consistency` — outputs remain semantically similar to baseline
- `semantic_outliers` — no outputs diverge more than 2σ from the centroid
- `near_duplicate_outputs` — outputs are not near-identical (flags stuck / looping models)
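The 2σ-from-centroid rule can be sketched on plain embedding vectors (an illustrative implementation, assuming embeddings are lists of floats; Sediment's embedder backends are covered later):

```python
import math

def outliers_2sigma(vectors):
    """Return indices of vectors whose distance to the centroid
    exceeds mean distance + 2 standard deviations."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    dists = [math.dist(v, centroid) for v in vectors]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    cutoff = mean + 2 * math.sqrt(var)
    return [i for i, d in enumerate(dists) if d > cutoff]

# Ten near-identical outputs plus one that diverges sharply
idx = outliers_2sigma([[0.0, 0.0]] * 10 + [[5.0, 5.0]])
```

Note the usual caveat: a single extreme point inflates σ itself, so this check is most reliable when outliers are rare relative to the sample.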
Temporal
Drift over time:
- `output_length_drift` — output length distribution hasn't shifted
- `latency_drift` — latency distribution hasn't shifted
- `model_drift` — model hasn't silently changed
- `error_rate_drift` — error rate hasn't crept up
Session
Multi-turn conversation patterns:
- `session_turn_count_range` — turns per session stays within expected range
- `session_avg_turns` — average session length is stable
- `session_user_return_rate` — returning user rate is stable
Staleness tracking & CI
Save a baseline once
```bash
sediment save logs/prod.jsonl .sediment-baseline.json
```
Check daily in CI
```bash
sediment check-staleness logs/today.jsonl .sediment-baseline.json
# exits 1 if any invariants are violated
```
```
Staleness Report — checked 2024-03-15 09:00 UTC
Original discovery: 2024-03-01   source: logs/prod.jsonl

  ✓ Holds:     11/14
  ↓ Degraded:   2/14  (confidence dropped > 10pp)
  ✗ Violated:   1/14  (confidence dropped > 30pp)  ← CI fails here
  ? Missing:    0/14
```
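The bucketing logic behind a report like this follows directly from the thresholds shown (10pp and 30pp). A hypothetical sketch, not Sediment's internal code:

```python
def classify(baseline_conf, current_conf):
    """Map a confidence change to a staleness bucket.
    Confidences are fractions in [0, 1]; drops are measured
    in percentage points (pp)."""
    if current_conf is None:
        return "missing"    # invariant could not be re-evaluated
    drop = (baseline_conf - current_conf) * 100
    if drop > 30:
        return "violated"   # CI fails here
    if drop > 10:
        return "degraded"
    return "holds"
```

Counting the buckets across all baselined invariants yields the four report lines, and any `violated` entry drives the non-zero exit code.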
GitHub Actions
```yaml
# .github/workflows/sediment.yml
- name: Check invariant staleness
  run: sediment check-staleness ${{ env.LOG_SOURCE }} .sediment-baseline.json
```
pytest plugin
Collect *.sediment.json baselines as native pytest test items:
```bash
pytest --sediment-source=logs/today.jsonl
```
Each invariant becomes a separate test. Violated invariants fail; degraded ones warn.
Compare two releases
```bash
sediment compare logs/v1.jsonl logs/v2.jsonl
```

```
Sediment Compare: logs/v1.jsonl → logs/v2.jsonl
────────────────────────────────────────────────────────────
  New:       2 invariants appeared
  Removed:   0 invariants disappeared
  Improved:  3 confidence increased ≥5%
  Degraded:  1 confidence decreased ≥5%
  Stable:   10 no meaningful change

  ⚠️ DEGRADED  latency_p95_threshold  87% (-8%)

✅ No regressions detected.
```
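The five categories in the comparison are a set/threshold diff over the two invariant snapshots. A sketch of that bucketing, assuming invariants are represented as `{id: confidence}` maps (an illustration, not Sediment's actual data model):

```python
def compare_invariants(old, new, threshold=0.05):
    """Bucket invariant ids into new / removed / improved / degraded /
    stable, using a ±5% confidence-change threshold by default."""
    buckets = {"new": [], "removed": [], "improved": [], "degraded": [], "stable": []}
    for inv_id in sorted(set(old) | set(new)):
        if inv_id not in old:
            buckets["new"].append(inv_id)
        elif inv_id not in new:
            buckets["removed"].append(inv_id)
        elif new[inv_id] - old[inv_id] >= threshold:
            buckets["improved"].append(inv_id)
        elif old[inv_id] - new[inv_id] >= threshold:
            buckets["degraded"].append(inv_id)
        else:
            buckets["stable"].append(inv_id)
    return buckets
```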
Supported formats
| Format | Auto-detected | Notes |
|---|---|---|
| JSONL / NDJSON | ✅ | Streaming, nested field paths |
| JSON array | ✅ | [{…}, {…}] |
| CSV / TSV | ✅ | Any delimiter, quoted fields |
| logfmt | ✅ | key=value key="quoted value" |
| Apache / nginx | ✅ | Combined log format |
| Parquet | ✅ | Requires pyarrow |
| Avro | ✅ | Requires fastavro |
| gzip | ✅ | .jsonl.gz, .csv.gz, etc. |
| OpenAI API logs | ✅ | Auto-detected |
| LangSmith traces | ✅ | Auto-detected |
| LangFuse generations | ✅ | Auto-detected |
| OpenTelemetry GenAI | ✅ | Auto-detected |
| Helicone | ✅ | Auto-detected |
| W&B Weave | ✅ | Auto-detected |
| MLflow traces | ✅ | Auto-detected |
| Datadog LLM Obs | ✅ | Auto-detected |
| S3 / GCS / Azure Blob | ✅ | Requires sediment[cloud] |
| stdin | ✅ | sediment discover - |
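For intuition on how format auto-detection can work, here is a deliberately simplified sniffer based on the first non-blank line. This is an illustrative heuristic only; Sediment's real detector handles far more cases (and the binary formats above cannot be sniffed from text at all):

```python
import json

def sniff_format(first_line: str) -> str:
    """Guess a text log format from its first non-blank line."""
    stripped = first_line.strip()
    if stripped.startswith("["):
        return "json-array"           # [{…}, {…}]
    if stripped.startswith("{"):
        try:
            json.loads(stripped)      # one object per line
            return "jsonl"
        except ValueError:
            pass
    if "=" in stripped and "," not in stripped and "\t" not in stripped:
        return "logfmt"               # key=value key="quoted value"
    if "\t" in stripped:
        return "tsv"
    if "," in stripped:
        return "csv"
    return "unknown"
```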
Glob patterns, directories, and cloud URIs all work:
```python
LogAnalyzer("logs/*.jsonl.gz")
LogAnalyzer("logs/")
LogAnalyzer("s3://my-bucket/logs/*.jsonl")
LogAnalyzer("-")   # stdin
```
Sampling
For large log files:
```python
LogAnalyzer("huge.jsonl", sample=10_000, sampling_strategy="importance")
```
| Strategy | Description |
|---|---|
| `random` | Uniform random sample (default) |
| `stratified` | Preserves output-length distribution |
| `importance` | Oversamples rare / anomalous entries |
| `time_windowed` | Weights recent entries higher |
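To illustrate what a recency-weighted strategy like `time_windowed` might do (a sketch under stated assumptions: exponential position decay and the exponential-sort trick for weighted sampling without replacement; the names and parameters here are hypothetical, not Sediment's API):

```python
import random

def time_windowed_sample(entries, k, half_life=1000, seed=42):
    """Sample k entries without replacement, weighting recent entries
    higher: an entry half_life positions older gets half the weight."""
    n = len(entries)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]  # newest ≈ 1.0
    rng = random.Random(seed)
    # Exponential-sort trick: key u**(1/w) favors high-weight entries
    ranked = sorted(range(n),
                    key=lambda i: rng.random() ** (1.0 / weights[i]),
                    reverse=True)
    return [entries[i] for i in sorted(ranked[:k])]
```

With 10,000 entries and a 1,000-entry half-life, the sample is dominated by the most recent few thousand entries while still occasionally reaching further back.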
Configuration
Create .sediment.yml in your project root (or run sediment init logs/prod.jsonl):
```yaml
# .sediment.yml
min_confidence: 0.8
min_support: 2
baseline: .sediment-baseline.json
types:
  - structural
  - statistical
  - pattern
  - relational
  - semantic
  - temporal
  - session
# sample: 10000
# sampling_strategy: random
report:
  format: html
  output: sediment_report.html
```
All CLI commands pick this up automatically.
Custom miners
Register your own miner function to discover domain-specific invariants:
```python
from sediment import LogAnalyzer
from sediment.discovery.base import InvariantResult

def apology_rate_miner(entries):
    count = sum(1 for e in entries if "sorry" in str(e.output).lower())
    rate = count / len(entries)
    return [InvariantResult(
        id="apology_rate",
        type="custom",
        description=f"Model apologises in {rate:.0%} of responses",
        confidence=1.0 - rate,
        support=count,
        total=len(entries),
        severity="warning" if rate > 0.1 else "info",
    )]

results = LogAnalyzer("logs.jsonl").register_miner(apology_rate_miner).discover()
```
Embedding backends
Used by the semantic miner. Swap for better accuracy:
```python
from sediment import LogAnalyzer
from sediment.embeddings.openai_emb import OpenAIEmbedder

a = LogAnalyzer("logs.jsonl")
results = a.discover(embedder=OpenAIEmbedder(api_key="sk-..."))
```
| Backend | Class | Quality | Install |
|---|---|---|---|
| TF-IDF | `TfidfEmbedder` | Basic | built-in |
| OpenAI `text-embedding-3-small` | `OpenAIEmbedder` | High | `sediment[openai]` |
| `all-MiniLM-L6-v2` | `SentenceTransformerEmbedder` | High | `sediment[sentence-transformers]` |
Schema evolution detection
Detects when field names change mid-stream — e.g. a deploy that renamed prompt → input:
```python
drifts = LogAnalyzer("logs.jsonl").check_schema_evolution()
for d in drifts:
    print(d)
# [SCHEMA DRIFT] input: 'prompt' → 'input' (around entry 5000, early=94% late=97%)
```
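The core signal behind this kind of detection is simple: a field's presence rate in the early part of the log versus the late part. A minimal sketch (illustrative only; Sediment additionally locates the change point and maps old names to new ones):

```python
def field_presence_shift(entries, field, split=None):
    """Compare how often `field` appears in the early vs late
    half of the log. A 1.0 → 0.0 shift suggests a rename."""
    split = split if split is not None else len(entries) // 2
    early, late = entries[:split], entries[split:]
    early_rate = sum(field in e for e in early) / max(len(early), 1)
    late_rate = sum(field in e for e in late) / max(len(late), 1)
    return early_rate, late_rate

# Simulate a deploy that renamed prompt → input halfway through
entries = [{"prompt": "hi"}] * 50 + [{"input": "hi"}] * 50
```

A field whose presence collapses while another field's presence rises by a matching amount is the classic rename signature.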
Jupyter
```python
a = LogAnalyzer("logs.jsonl")
a.show()   # renders interactive HTML report inline
```
API reference
```python
LogAnalyzer(
    source,                       # file, glob, directory, s3://, gs://, az://, or "-"
    schema=None,                  # override inferred schema
    sample=None,                  # max entries to load
    sampling_strategy="random",   # random | stratified | importance | time_windowed
    format_hint=None,             # skip auto-detection
)

# Exploration
.summary()               → Summary
.infer()                 → SchemaMap
.entries()               → Iterator[LogEntry]
async .async_entries()   → AsyncIterator[LogEntry]

# Discovery
.discover(
    min_confidence=0.8,
    min_support=2,
    types=None,      # list of miner type strings, or None for all
    dedup=True,
    embedder=None,
) → list[InvariantResult]

# Output
.emit_tests(output_path, min_confidence=0.8, function_hint="my_function")
.report(output_path, fmt="html", min_confidence=0.5)
.show(min_confidence=0.5)   # Jupyter inline display

# Staleness
.save_invariants(path, min_confidence=0.8)
.check_staleness(invariants_path)   → StalenessReport
.check_schema_evolution()           → list[SchemaDrift]

# Extension
.register_miner(fn) → LogAnalyzer   (chainable)
```
Development
```bash
git clone https://github.com/sediment-py/sediment
cd sediment
pip install -e ".[dev]"
pytest tests/ -v
```
210 tests · zero required dependencies · Python 3.9+
License
MIT © Sediment Contributors