
Sediment

Mine behavioral invariants from LLM production logs. Auto-generate tests.


Quickstart · CLI · Invariant types · CI integration · Formats · API


Sediment reads your production logs, discovers what your LLM system actually does (not what you think it does), and turns those discoveries into runnable pytest tests and CI checks.

pip install sediment
sediment discover logs/prod.jsonl
Discovered 14 invariants from logs/prod.jsonl

[structural]  output_never_empty          confidence=100%  support=2841
[structural]  output_always_json          confidence=98%   support=2784
[pattern]     no_email_in_output          confidence=100%  support=2841  ← PII guard
[pattern]     no_credit_card_in_output    confidence=100%  support=2841  ← PII guard
[statistical] latency_p95_threshold       confidence=94%   support=2672  p95=1240ms
[temporal]    output_length_drift         confidence=91%   support=2841
[semantic]    semantic_consistency        confidence=87%   support=2841
...

What it does

| Step | Description |
|----------|-------------|
| Ingest | Reads logs in any format — JSONL, CSV, Parquet, gzip, OpenAI, LangSmith, OTel, and more |
| Infer | Auto-detects format and field schema (input, output, latency, model, session, …) |
| Discover | Mines behavioral invariants across 7 miner types |
| Generate | Writes a pytest test file you can drop straight into CI |
| Track | Saves a baseline and alerts when production behavior drifts |
Install

pip install sediment                          # core — zero required dependencies
pip install "sediment[parquet]"               # + Parquet / Arrow support
pip install "sediment[avro]"                  # + Avro support
pip install "sediment[cloud]"                 # + S3 / GCS / Azure Blob sources
pip install "sediment[openai]"                # + OpenAI embedding backend
pip install "sediment[sentence-transformers]" # + sentence-transformers backend
pip install "sediment[config]"                # + .sediment.yml config file support
pip install "sediment[full]"                  # everything

Quickstart

Python API

from sediment import LogAnalyzer

a = LogAnalyzer("logs/prod.jsonl")

# Inspect what was detected
print(a.summary())

# Discover invariants
invariants = a.discover(min_confidence=0.8)
for inv in invariants:
    print(inv)

# Generate a pytest test file
a.emit_tests("test_invariants.py", function_hint="call_llm")
# → Run with: pytest test_invariants.py -v

# Generate an interactive HTML report
a.report("report.html")

CLI

# Explore what's in your logs
sediment summary logs/prod.jsonl

# Discover and print invariants
sediment discover logs/prod.jsonl --min-confidence 0.8

# Save a baseline for future staleness checks
sediment save logs/prod.jsonl baseline.json

# Check for drift against a new batch of logs
sediment check-staleness logs/today.jsonl baseline.json

# Compare invariants between two log snapshots
sediment compare logs/v1.jsonl logs/v2.jsonl

# Generate an HTML report
sediment report logs/prod.jsonl -o report.html

# Scaffold a .sediment.yml config and first baseline
sediment init logs/prod.jsonl

Invariant types

Sediment runs 7 miner types in parallel. Each produces typed, confidence-annotated invariants.

Structural

What your outputs always look like:

  • output_never_empty — output is always non-null, non-empty
  • output_length_range — character length stays within observed bounds
  • output_always_json — every output is valid JSON
  • output_json_keys_consistent — JSON outputs always contain the same keys
  • output_type_consistent — output type (str / list / dict) is stable
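As an illustration (not Sediment's internals), the two JSON-related checks reduce to a few lines:

```python
import json

def check_always_json(outputs):
    """Return (True, parsed) if every output string parses as JSON."""
    try:
        parsed = [json.loads(o) for o in outputs]
    except (TypeError, ValueError):
        return False, []
    return True, parsed

def check_keys_consistent(parsed):
    """Return True if all JSON object outputs share one key set."""
    key_sets = {frozenset(p) for p in parsed if isinstance(p, dict)}
    return len(key_sets) <= 1

outputs = ['{"answer": "yes", "score": 1}', '{"answer": "no", "score": 0}']
ok, parsed = check_always_json(outputs)
```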

Statistical

Distributional properties of your system:

  • latency_p95_threshold — p95 latency stays under threshold
  • cost_p95_threshold — p95 cost per request stays under threshold
  • error_rate — error rate at or below observed baseline
  • model_consistency — a single model is used throughout
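For reference, a nearest-rank p95 looks like this (a sketch; Sediment's exact estimator isn't documented here):

```python
import math

def p95(values):
    """Nearest-rank 95th percentile: smallest value covering 95% of samples."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

latencies_ms = list(range(1, 101))  # 1 .. 100 ms
```

An invariant like `latency_p95_threshold` then asserts that `p95(latencies_ms)` stays under the baseline value.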

Pattern — PII & safety

What must never appear in outputs:

  • no_email_in_output — no email addresses leaked  🔴 critical
  • no_phone_us_in_output — no US phone numbers leaked  🔴 critical
  • no_ssn_in_output — no Social Security numbers leaked  🔴 critical
  • no_credit_card_in_output — no credit card numbers leaked (Luhn-validated)  🔴 critical
  • no_ipv4_in_output — no IP addresses leaked  🔴 critical

PII detection uses validated regex — SSNs checked against SSA rules, credit cards validated with the Luhn algorithm, phone numbers validated against NANP rules.
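The Luhn step matters because it filters out random 16-digit strings that merely look like card numbers. A minimal sketch of the checksum (illustrative, not the library's code):

```python
def luhn_valid(number: str) -> bool:
    """True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:          # too short to be a card number
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:            # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```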

Relational

Input → output relationships:

  • output_minimum_length — outputs stay above a safe minimum length relative to input
  • refusal_rate — model refuses or apologises within observed bounds
  • input_output_length_correlation — longer inputs produce longer outputs (when expected)
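The length-correlation invariant can be pictured as a Pearson correlation over (input length, output length) pairs; a sketch with made-up lengths:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

input_lens  = [10, 50, 100, 200]   # hypothetical prompt lengths
output_lens = [30, 120, 260, 510]  # roughly proportional responses
```

A strongly positive r over the logs supports emitting the invariant; a flat or negative r suppresses it.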

Semantic

Meaning-level consistency:

  • semantic_consistency — outputs remain semantically similar to baseline
  • semantic_outliers — no outputs diverge more than 2σ from the centroid
  • near_duplicate_outputs — outputs are not near-identical (flags stuck / looping models)
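The 2σ rule needs only distances to an embedding centroid; a toy sketch with 2-D vectors standing in for embeddings:

```python
import math

def outliers_2sigma(vectors):
    """Indices of vectors more than 2 standard deviations from the centroid."""
    n, dim = len(vectors), len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    dists = [math.dist(v, centroid) for v in vectors]
    mean = sum(dists) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / n)
    return [i for i, d in enumerate(dists) if d > mean + 2 * std]
```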

Temporal

Drift over time:

  • output_length_drift — output length distribution hasn't shifted
  • latency_drift — latency distribution hasn't shifted
  • model_drift — model hasn't silently changed
  • error_rate_drift — error rate hasn't crept up
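As a simplified picture of length drift (a real drift test would compare full distributions, e.g. via a KS statistic; this toy compares the means of an early and a late window):

```python
def length_drift(early_lens, late_lens, threshold=0.25):
    """Flag drift when mean output length shifts relatively by more than `threshold`."""
    mean_early = sum(early_lens) / len(early_lens)
    mean_late = sum(late_lens) / len(late_lens)
    shift = abs(mean_late - mean_early) / mean_early
    return shift > threshold, shift
```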

Session

Multi-turn conversation patterns:

  • session_turn_count_range — turns per session stays within expected range
  • session_avg_turns — average session length is stable
  • session_user_return_rate — returning user rate is stable
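These invariants reduce to grouping entries by session id; a sketch (the `session_id` field name is an assumption):

```python
from collections import Counter

def turns_per_session(entries):
    """Count turns per session from entries carrying a session id."""
    return Counter(e["session_id"] for e in entries)

def within_range(counts, lo, hi):
    """True if every session's turn count falls inside [lo, hi]."""
    return all(lo <= n <= hi for n in counts.values())

entries = [{"session_id": s} for s in ["a", "a", "a", "b", "b"]]
counts = turns_per_session(entries)
```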

Staleness tracking & CI

Save a baseline once

sediment save logs/prod.jsonl .sediment-baseline.json

Check daily in CI

sediment check-staleness logs/today.jsonl .sediment-baseline.json
# exits 1 if any invariants are violated
Staleness Report — checked 2024-03-15 09:00 UTC
Original discovery: 2024-03-01  source: logs/prod.jsonl

  ✓ Holds:    11/14
  ↓ Degraded:  2/14   (confidence dropped > 10pp)
  ✗ Violated:  1/14   (confidence dropped > 30pp)  ← CI fails here
  ? Missing:   0/14
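The Holds / Degraded / Violated buckets follow directly from the confidence drop in percentage points; a sketch of that classification (thresholds taken from the report output):

```python
def classify(baseline_conf, current_conf):
    """Bucket an invariant by how far its confidence dropped, in percentage points."""
    drop_pp = (baseline_conf - current_conf) * 100
    if drop_pp > 30:
        return "violated"   # CI exits non-zero
    if drop_pp > 10:
        return "degraded"   # warn only
    return "holds"
```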

GitHub Actions

# .github/workflows/sediment.yml
- name: Check invariant staleness
  run: sediment check-staleness ${{ env.LOG_SOURCE }} .sediment-baseline.json

pytest plugin

Collect *.sediment.json baselines as native pytest test items:

pytest --sediment-source=logs/today.jsonl

Each invariant becomes a separate test. Violated invariants fail; degraded ones warn.

Compare two releases

sediment compare logs/v1.jsonl logs/v2.jsonl
Sediment Compare: logs/v1.jsonl  →  logs/v2.jsonl
────────────────────────────────────────────────────────────
  New:        2   invariants appeared
  Removed:    0   invariants disappeared
  Improved:   3   confidence increased ≥5%
  Degraded:   1   confidence decreased ≥5%
  Stable:    10   no meaningful change

⚠️  DEGRADED  latency_p95_threshold  87% (-8%)
✅ No regressions detected.

Supported formats

| Format | Notes |
|--------|-------|
| JSONL / NDJSON | Streaming, nested field paths |
| JSON array | `[{…}, {…}]` |
| CSV / TSV | Any delimiter, quoted fields |
| logfmt | `key=value key="quoted value"` |
| Apache / nginx | Combined log format |
| Parquet | Requires `pyarrow` |
| Avro | Requires `fastavro` |
| gzip | `.jsonl.gz`, `.csv.gz`, etc. |
| OpenAI API logs | Auto-detected |
| LangSmith traces | Auto-detected |
| LangFuse generations | Auto-detected |
| OpenTelemetry GenAI | Auto-detected |
| Helicone | Auto-detected |
| W&B Weave | Auto-detected |
| MLflow traces | Auto-detected |
| Datadog LLM Obs | Auto-detected |
| S3 / GCS / Azure Blob | Requires `sediment[cloud]` |
| stdin | `sediment discover -` |
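As an example of the line-oriented formats above, logfmt with quoted values parses with a small regex (illustrative; the real parser presumably handles more edge cases such as escapes and bare keys):

```python
import re

PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

def parse_logfmt(line):
    """Parse one logfmt line into a dict, stripping surrounding quotes."""
    record = {}
    for key, value in PAIR.findall(line):
        if value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        record[key] = value
    return record
```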

Glob patterns, directories, and cloud URIs all work:

LogAnalyzer("logs/*.jsonl.gz")
LogAnalyzer("logs/")
LogAnalyzer("s3://my-bucket/logs/*.jsonl")
LogAnalyzer("-")   # stdin

Sampling

For large log files:

LogAnalyzer("huge.jsonl", sample=10_000, sampling_strategy="importance")

| Strategy | Description |
|----------|-------------|
| `random` | Uniform random sample (default) |
| `stratified` | Preserves output-length distribution |
| `importance` | Oversamples rare / anomalous entries |
| `time_windowed` | Weights recent entries higher |
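To make `stratified` concrete: bucket entries by output length, then sample each bucket proportionally (a sketch; the bucket count and remainder handling are assumptions, not Sediment's algorithm):

```python
import random

def stratified_sample(entries, k, buckets=4, seed=0):
    """Draw ~k entries while preserving the output-length distribution."""
    rng = random.Random(seed)
    ordered = sorted(entries, key=len)
    size = len(ordered) // buckets
    per_bucket = k // buckets
    sample = []
    for b in range(buckets):
        stratum = ordered[b * size:(b + 1) * size]
        sample.extend(rng.sample(stratum, min(per_bucket, len(stratum))))
    return sample

entries = ["x" * n for n in range(1, 101)]  # outputs of length 1..100
picked = stratified_sample(entries, 20)
```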

Configuration

Create .sediment.yml in your project root (or run sediment init logs/prod.jsonl):

# .sediment.yml
min_confidence: 0.8
min_support: 2
baseline: .sediment-baseline.json

types:
  - structural
  - statistical
  - pattern
  - relational
  - semantic
  - temporal
  - session

# sample: 10000
# sampling_strategy: random

report:
  format: html
  output: sediment_report.html

All CLI commands pick this up automatically.


Custom miners

Register your own miner function to discover domain-specific invariants:

from sediment import LogAnalyzer
from sediment.discovery.base import InvariantResult

def apology_rate_miner(entries):
    count = sum(1 for e in entries if "sorry" in str(e.output).lower())
    rate  = count / len(entries)
    return [InvariantResult(
        id="apology_rate",
        type="custom",
        description=f"Model apologises in {rate:.0%} of responses",
        confidence=1.0 - rate,
        support=count,
        total=len(entries),
        severity="warning" if rate > 0.1 else "info",
    )]

results = LogAnalyzer("logs.jsonl").register_miner(apology_rate_miner).discover()

Embedding backends

Used by the semantic miner. Swap for better accuracy:

from sediment.embeddings.openai_emb import OpenAIEmbedder

a = LogAnalyzer("logs.jsonl")
results = a.discover(embedder=OpenAIEmbedder(api_key="sk-..."))

| Backend | Class | Quality | Install |
|---------|-------|---------|---------|
| TF-IDF | `TfidfEmbedder` | Basic | built-in |
| OpenAI `text-embedding-3-small` | `OpenAIEmbedder` | High | `sediment[openai]` |
| `all-MiniLM-L6-v2` | `SentenceTransformerEmbedder` | High | `sediment[sentence-transformers]` |

Schema evolution detection

Detects when field names change mid-stream — e.g. a deploy that renamed `prompt` to `input`:

drifts = LogAnalyzer("logs.jsonl").check_schema_evolution()
for d in drifts:
    print(d)
# [SCHEMA DRIFT] input: 'prompt' → 'input'  (around entry 5000, early=94% late=97%)

Jupyter

a = LogAnalyzer("logs.jsonl")
a.show()   # renders interactive HTML report inline

API reference

LogAnalyzer(
    source,                      # file, glob, directory, s3://, gs://, az://, or "-"
    schema=None,                 # override inferred schema
    sample=None,                 # max entries to load
    sampling_strategy="random",  # random | stratified | importance | time_windowed
    format_hint=None,            # skip auto-detection
)

# Exploration
.summary()                         Summary
.infer()                           SchemaMap
.entries()                         Iterator[LogEntry]
async .async_entries()             AsyncIterator[LogEntry]

# Discovery
.discover(
    min_confidence=0.8,
    min_support=2,
    types=None,                  # list of miner type strings, or None for all
    dedup=True,
    embedder=None,
)                                  list[InvariantResult]

# Output
.emit_tests(output_path, min_confidence=0.8, function_hint="my_function")
.report(output_path, fmt="html", min_confidence=0.5)
.show(min_confidence=0.5)        # Jupyter inline display

# Staleness
.save_invariants(path, min_confidence=0.8)
.check_staleness(invariants_path)  StalenessReport
.check_schema_evolution()          list[SchemaDrift]

# Extension
.register_miner(fn)                LogAnalyzer  (chainable)

Development

git clone https://github.com/sediment-py/sediment
cd sediment
pip install -e ".[dev]"
pytest tests/ -v

210 tests · zero required dependencies · Python 3.9+


License

MIT © Sediment Contributors
