Detect silent information loss at system boundaries — semantic exception analysis and round-trip data loss fuzzing for Python

Crossing

Detect silent information loss at system boundaries in Python codebases.

Two Tools

1. Semantic Scanner — Exception Pattern Analysis

Find where the same exception type carries different meanings depending on the code path, but handlers can't distinguish them.

# Basic scan
crossing-semantic /path/to/project

# With implicit raises (dict access, getattr, etc.)
crossing-semantic --implicit /path/to/project

# JSON output for tooling
crossing-semantic --format json /path/to/project

# CI mode: fail if elevated/high risk crossings found
crossing-semantic --ci --min-risk elevated /path/to/project

Example: a KeyError that means "config key missing" and a KeyError that means "factor-filtered to empty" arrive at the same except KeyError handler. The handler assumes one meaning. The bug is silent.
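
The pattern can be reproduced in a few lines of plain Python. This is an illustrative sketch only: `load_setting`, `setting_or_default`, the config layout, and the factor names are hypothetical and are not taken from tox or from Crossing itself.

```python
# Hypothetical illustration: one exception type, two meanings.
CONFIG = {"deps": {"py311": ["pytest"], "py312": ["pytest", "coverage"]}}


def load_setting(config, key, active_factors):
    """Return the values of `key` that apply to the active factors."""
    per_factor = config[key]  # KeyError here means "config key missing"
    selected = {f: v for f, v in per_factor.items() if f in active_factors}
    if not selected:
        # KeyError here means "key exists, but filtering left nothing"
        raise KeyError(key)
    return selected


def setting_or_default(config, key, active_factors, default):
    try:
        return load_setting(config, key, active_factors)
    except KeyError:
        # The handler assumes "key missing" and cannot tell the two cases apart.
        return default


missing = setting_or_default(CONFIG, "commands", {"py311"}, default={})
filtered = setting_or_default(CONFIG, "deps", {"py39"}, default={})
print(missing == filtered)  # both code paths collapse to the same default
```

Both calls succeed and return the same value, so nothing in the caller signals that two different situations occurred.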

2. Data Loss Fuzzer — Round-Trip Testing

Test whether information survives boundary crossings: serialization, API calls, database writes, format conversions.

import json

from crossing import Crossing, cross

c = Crossing(
    encode=lambda d: json.dumps(d),
    decode=lambda s: json.loads(s),
)

report = cross(c, samples=1000)
report.print()  # shows what was lost, where, and how

This isn't fuzzing for crashes. It's fuzzing for silent data loss — the operation succeeds but the output is missing something the input had.
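
To make "succeeds but loses something" concrete, here is a small standard-library sketch that does not use Crossing at all. The sample record is made up; `default=str` is used as an example of a common "never crash" fallback.

```python
import json

# A made-up record with a tuple, an integer key, and a set.
original = {"point": (1, 2), 1: "int key", "tags": {"a"}}

# default=str turns unsupported types (here the set) into their string form.
encoded = json.dumps(original, default=str)
decoded = json.loads(encoded)

print(decoded)
print(decoded == original)  # the round trip raised nothing, yet the data changed
```

The tuple comes back as a list, the integer key comes back as a string, and the set comes back as the text of its `repr`. No exception is raised at any point.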


Semantic Scanner

What It Finds

  • Polymorphic exceptions: Multiple raise sites for the same exception type, caught by handlers that don't distinguish between them
  • Cross-function crossings: Exceptions raised in called functions, caught by handlers in the caller
  • Cross-file crossings: Same pattern across module boundaries via import resolution
  • Implicit raises: dict[key] -> KeyError, getattr(obj, name) -> AttributeError, int(x) -> ValueError
  • Inheritance crossings: except ValueError catching subclass raises like ValidationError
  • Scope analysis: Whether handlers catch exceptions from direct raises or from called functions
  • Message differentiation: Risk downgraded when all raise sites pass distinct string messages
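
As a rough illustration of two items in this list (implicit raises and inheritance crossings), consider the sketch below. `ValidationError`, `parse_port`, and `port_or_default` are hypothetical names written for this example, not part of Crossing.

```python
class ValidationError(ValueError):
    """Hypothetical domain error that subclasses ValueError."""


def parse_port(raw):
    port = int(raw)  # implicit raise: ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValidationError(f"port out of range: {port}")  # explicit raise
    return port


def port_or_default(raw, default=8080):
    try:
        return parse_port(raw)
    except ValueError:
        # Catches the implicit ValueError and the ValidationError subclass,
        # so "not a number" and "out of range" are handled identically.
        return default


print(port_or_default("abc"))    # non-numeric input falls back to the default
print(port_or_default("70000"))  # out-of-range input falls back the same way
```

The handler sits in the caller while both raise sites sit in the callee, which also makes this a cross-function crossing in the sense used above.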

Risk Levels

Level      Meaning
low        Single raise site, or polymorphic with matching handler strategies
medium     Multiple raise sites with uniform handler treatment
elevated   Scope mismatches or cross-function reachability
high       Many raise sites, few handlers, mixed implicit/explicit

CLI Options

crossing-semantic [OPTIONS] PATH

Options:
  --implicit          Detect implicit raises (dict access, getattr, etc.)
  --format FORMAT     Output format: text (default), json, markdown
  --min-risk LEVEL    Minimum risk to report: low, medium, elevated, high
  --exclude PATTERN   Exclude directories (repeatable)
  --ci                Exit code 1 if elevated/high risk crossings found

Example Output

============================================================
Semantic Crossing Scan: /path/to/tox
============================================================
Files scanned:        42
Exception raises:     87 (58 explicit, 29 implicit)
Exception handlers:   34
Semantic crossings:   12
  Polymorphic (multi-raise):  8
  Elevated risk:              3

--- KeyError: 3 raise sites, 14 handlers --- high risk ---
  3 raise sites across different loaders (API, TOML, INI),
  14 handlers catching without distinguishing source
============================================================

Information-Theoretic Scoring

Each crossing reports quantitative metrics based on Shannon entropy:

Metric                  What it measures
Semantic entropy        Bits of information carried by the exception type at raise sites (log2 of distinct origins)
Handler discrimination  Bits preserved by handlers (re-raise = full, return/pass = zero)
Information loss        Bits destroyed: entropy minus discrimination
Collapse ratio          Normalized loss: 0% (no collapse) to 100% (total meaning erasure)

--- AttributeError: 4 raise sites, 3 handlers --- high risk ---
  Information: 2.0 bits entropy, 0.3 bits lost, 83% collapse

In JSON output, each crossing includes an information_theory object, and the summary includes total_information_loss_bits and mean_collapse_ratio across all crossings.
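
The metric definitions above can be worked through by hand. The sketch below assumes that every origin is equally likely (so entropy is log2 of the number of distinct origins, as the table states) and that the collapse ratio is loss divided by entropy. How Crossing derives handler discrimination is not described here, so it is passed in as an input. `collapse_metrics` is a hypothetical helper, not part of the package.

```python
import math


def collapse_metrics(distinct_origins, discrimination_bits):
    """Compute entropy, loss, and collapse ratio under the stated assumptions."""
    entropy = math.log2(distinct_origins) if distinct_origins > 0 else 0.0
    # Handlers cannot preserve more information than the raise sites carry.
    preserved = min(discrimination_bits, entropy)
    loss = entropy - preserved
    ratio = loss / entropy if entropy > 0 else 0.0
    return {"entropy_bits": entropy, "loss_bits": loss, "collapse_ratio": ratio}


# Four distinct origins, handlers that only return or pass (zero bits preserved).
total = collapse_metrics(4, 0.0)

# Four distinct origins, handlers that preserve one bit.
partial = collapse_metrics(4, 1.0)

print(total)
print(partial)
```

With four origins the entropy is 2.0 bits. Preserving nothing gives a 100% collapse, and preserving one bit gives 50%.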

Real Bugs Found

The semantic scanner has identified real bugs in production codebases:

  • tox #3809: KeyError meaning "factor-filtered to empty" caught by handler expecting "key doesn't exist"
  • Rich #3960: Exception __notes__ leaking across chained exceptions
  • pytest #14214: Verbosity config not propagated across internal call boundary

Data Loss Fuzzer

Built-in Crossings

Crossing                What it tests                     Typical loss rate
json_crossing()         JSON with default=str             ~24% lossy, 34% crashes
json_crossing_strict()  JSON without fallback             ~6% lossy, 52% crashes
pickle_crossing()       Python pickle                     0% (lossless baseline)
yaml_crossing()         YAML safe_load                    ~0% lossy, 49% crashes
toml_crossing()         TOML via tomllib/tomli_w          varies
csv_crossing()          CSV (everything becomes strings)  ~82% lossy
env_file_crossing()     .env files (KEY=VALUE)            ~83% lossy
url_query_crossing()    URL query string encoding         ~80% lossy
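
The "everything becomes strings" note for CSV is easy to see with the standard library alone. This sketch uses a made-up row and the `csv` module directly. It does not show how `csv_crossing()` is implemented.

```python
import csv
import io

# A made-up row with an int, a bool, a float, and None.
rows = [{"id": 1, "active": True, "score": 9.5, "note": None}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Read the same text back without any type hints.
buffer.seek(0)
decoded = list(csv.DictReader(buffer))

print(decoded)
```

Every value comes back as a string, and `None` comes back as an empty string, so a reader can no longer tell a missing value from an empty one.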

Custom Crossings

from crossing import Crossing, cross

# Test your API serialization
c = Crossing(
    encode=lambda d: my_api_serialize(d),
    decode=lambda s: my_api_deserialize(s),
    name="My API boundary",
)
report = cross(c, samples=1000)
report.print()

CLI

# Test a single format
crossing test json -n 500 --seed 42

# Test all built-in formats
crossing test -n 200

# Compare how two formats compose
crossing compose json csv -n 300

# Measure how loss scales with repeated crossings
crossing scale json --max-n 5

# List all available crossings
crossing list

Compose Pipelines

from crossing import compose, json_crossing, string_truncation_crossing, cross

# Simulate: serialize -> store in VARCHAR(100) -> deserialize
pipeline = compose(
    json_crossing(),
    string_truncation_crossing(100),
)
report = cross(pipeline, samples=500)
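
For intuition about what such a pipeline exercises, the following standard-library sketch chains a JSON encode with a truncating store by hand. `store_in_varchar` and `round_trip` are hypothetical stand-ins. The behavior of `compose` and `string_truncation_crossing` themselves is not shown here.

```python
import json


def store_in_varchar(text, limit=100):
    """Hypothetical stand-in for a column that silently truncates."""
    return text[:limit]


def round_trip(record, limit=100):
    """Encode, store, and decode; return None if the stored text is unusable."""
    stored = store_in_varchar(json.dumps(record), limit)
    try:
        return json.loads(stored)
    except json.JSONDecodeError:
        return None  # truncation cut the document mid-value


short_record = {"user": "ada"}
long_record = {"user": "ada", "bio": "x" * 200}

short_result = round_trip(short_record)
long_result = round_trip(long_record)

print(short_result)
print(long_result)
```

Each stage is harmless on its own for the short record. The failure only appears once the encoded form is longer than the storage limit, which is why the boundaries are tested together.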

Diff

Compare how two boundaries handle the same data:

from crossing import diff, json_crossing, pickle_crossing

report = diff(json_crossing(), pickle_crossing(), samples=500)
print(f"{report.divergent_count} samples differ between JSON and pickle")

Scaling Analysis

Measure how loss rate changes when data passes through N copies of a boundary:

from crossing import scaling, json_crossing

sr = scaling(json_crossing(), max_n=5, samples=200)
# JSON is idempotent: loss happens on first pass, then saturates (exponent ≈ 0)
# Non-idempotent crossings show positive scaling exponents
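
The saturation described in the comments can be checked by hand with the standard library. This is a minimal sketch with a made-up sample and a hypothetical `json_pass` helper. It does not compute the scaling exponent that `scaling()` reports.

```python
import json


def json_pass(value):
    """One encode/decode pass through JSON with a str fallback."""
    return json.loads(json.dumps(value, default=str))


original = {"when": (2026, 2, 1), 7: "seven"}
once = json_pass(original)
twice = json_pass(once)

print(original == once)  # first pass changes the data
print(once == twice)     # later passes change nothing further
```

After the first pass the value contains only types JSON can represent, so every later pass returns it unchanged.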

Codebase Scanning

python3 scan.py /path/to/project

Finds encode/decode pairs for: JSON, YAML, pickle, TOML, base64, URL encoding, CSV, struct, zlib, gzip.
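
One way such a scan could work is to walk the syntax tree and look for matching encode and decode calls. The sketch below is an assumption about the general approach, not a description of `scan.py`. The `PAIRS` table, `find_calls`, and `find_pairs` are hypothetical, and only `module.function(...)` style calls are recognized.

```python
import ast

# Hypothetical pairing table; the real scanner may use different rules.
PAIRS = {
    ("json", "dumps"): ("json", "loads"),
    ("pickle", "dumps"): ("pickle", "loads"),
}


def find_calls(source):
    """Return the set of (module, function) attribute calls in `source`."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            calls.add((node.func.value.id, node.func.attr))
    return calls


def find_pairs(source):
    """Return encoders whose matching decoder is also called in `source`."""
    calls = find_calls(source)
    return sorted(enc for enc, dec in PAIRS.items() if enc in calls and dec in calls)


SAMPLE = """
import json
def save(d): return json.dumps(d)
def load(s): return json.loads(s)
"""

pairs = find_pairs(SAMPLE)
print(pairs)
```

A pair found this way marks a boundary that may be worth fuzzing with a custom crossing.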


GitHub Action

Add Crossing to your CI pipeline:

# .github/workflows/crossing.yml
name: Exception Analysis
on: [pull_request]

jobs:
  crossing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: worksbyfriday/crossing@main
        with:
          path: 'src/'
          fail-on-risk: 'elevated'

Inputs: path, min-risk, format, implicit, exclude, fail-on-risk.


Benchmarks

Scanned 11 popular Python projects (Feb 2026):

Project     Files  Crossings  High Risk  Info Loss
pydantic    402    119        12         22.9 bits
sqlalchemy  661    103        16         79.8 bits
django      902    80         6
aiohttp     166    53         11         25.5 bits
click       62     14         5          7.4 bits
celery      161    12         3
flask       24     6          2
requests    18     5          2
rich        100    5          1
astroid     96     5          0
fastapi     47     0          0          0 bits

FastAPI reported zero crossings, which shows the scanner does not flag every codebase. Sample audit reports: SQLAlchemy, Django, Celery, Flask, Requests.


API

Scan any installed Python package via HTTP:

curl https://api.fridayops.xyz/crossing/package/flask

Returns JSON with full crossing analysis, information theory metrics, and risk levels.

Audit report — full markdown report with findings, recommendations, and benchmarks:

curl https://api.fridayops.xyz/crossing/report/flask

Badge — embed in your README:

![crossing](https://api.fridayops.xyz/crossing/badge/flask)

All endpoints:

  • POST /crossing — scan raw Python source
  • GET /crossing/package/{name} — JSON scan results
  • GET /crossing/report/{name} — full markdown audit report
  • GET /crossing/badge/{name} — SVG badge
  • GET /crossing/benchmark — comparison data from 17 projects
  • GET /crossing/packages — list of example packages
  • GET /crossing/example — demo snippet
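
The package endpoint can also be called from Python. This is a minimal standard-library sketch: `package_url` and `fetch_package_scan` are hypothetical helper names, and the shape of the returned JSON is not assumed beyond it being valid JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.fridayops.xyz/crossing"


def package_url(name):
    """Build the GET /crossing/package/{name} URL for a package name."""
    return f"{BASE_URL}/package/{urllib.parse.quote(name, safe='')}"


def fetch_package_scan(name, timeout=30):
    """Fetch and decode the JSON scan result for a package (network call)."""
    with urllib.request.urlopen(package_url(name), timeout=timeout) as response:
        return json.load(response)


print(package_url("flask"))
```

Calling `fetch_package_scan("flask")` performs the same request as the `curl` example above and returns the decoded JSON.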

Install

pip install crossing

Or copy the files directly — no external dependencies. Python 3.10+.

License

MIT
