Detect silent information loss at system boundaries — semantic exception analysis and round-trip data loss fuzzing for Python

Crossing

Detect silent information loss at system boundaries in Python codebases.

Two Tools

1. Semantic Scanner — Exception Pattern Analysis

Find where the same exception type carries different meanings depending on the code path, but handlers can't distinguish them.

# Basic scan
crossing-semantic /path/to/project

# With implicit raises (dict access, getattr, etc.)
crossing-semantic --implicit /path/to/project

# JSON output for tooling
crossing-semantic --format json /path/to/project

# CI mode: fail if elevated/high risk crossings found
crossing-semantic --ci --min-risk elevated /path/to/project

Example: a KeyError that means "config key missing" and a KeyError that means "factor-filtered to empty" arrive at the same except KeyError handler. The handler assumes one meaning. The bug is silent.
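
A minimal sketch of that pattern in plain Python. The function names and config layout are hypothetical and are not taken from tox or any other project; the point is only that two different failures reach one handler.

```python
def load_setting(config: dict, env: str, key: str) -> str:
    """Look up one setting for one environment (hypothetical helper)."""
    section = config[env]  # KeyError here means: the environment is unknown
    return section[key]    # KeyError here means: the key is not set


def get_timeout(config: dict, env: str) -> str:
    try:
        return load_setting(config, env, "timeout")
    except KeyError:
        # The handler assumes "key not set, use the default".
        # A misspelled environment name lands here too and is hidden.
        return "30"


config = {"prod": {"timeout": "10"}, "dev": {}}
print(get_timeout(config, "dev"))     # key not set: the default is reasonable
print(get_timeout(config, "stagng"))  # typo in the environment: same default, no error
```

Both calls print the same default, so nothing downstream can tell a missing key from a wrong environment name.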

2. Data Loss Fuzzer — Round-Trip Testing

Test whether information survives boundary crossings: serialization, API calls, database writes, format conversions.

import json

from crossing import Crossing, cross

c = Crossing(
    encode=lambda d: json.dumps(d),
    decode=lambda s: json.loads(s),
)

report = cross(c, samples=1000)
report.print()  # shows what was lost, where, and how

This isn't fuzzing for crashes. It's fuzzing for silent data loss — the operation succeeds but the output is missing something the input had.
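
To make "succeeds but loses something" concrete, here is a small illustration that uses only the standard library `json` module, not the fuzzer itself. The sample data is made up.

```python
import json

original = {"point": (1, 2), 1: "one"}

# The round trip raises nothing...
restored = json.loads(json.dumps(original))

# ...but the tuple came back as a list and the int key as a string.
print(restored)
print(restored == original)
```

The second line printed is `False`: the call succeeded, yet the output no longer equals the input.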


Semantic Scanner

What It Finds

  • Polymorphic exceptions: Multiple raise sites for the same exception type, caught by handlers that don't distinguish between them
  • Cross-function crossings: Exceptions raised in called functions, caught by handlers in the caller
  • Cross-file crossings: Same pattern across module boundaries via import resolution
  • Implicit raises: dict[key] -> KeyError, getattr(obj, name) -> AttributeError, int(x) -> ValueError
  • Inheritance crossings: except ValueError catching subclass raises like ValidationError
  • Scope analysis: Whether handlers catch exceptions from direct raises or from called functions
  • Message differentiation: Risk downgraded when all raise sites pass distinct string messages
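
Two of the categories above, implicit raises and inheritance crossings, can meet in one handler. The sketch below uses hypothetical names (`ValidationError`, `parse_age`, `age_or_default`) purely for illustration.

```python
class ValidationError(ValueError):
    """Hypothetical domain error that subclasses ValueError."""


def parse_age(raw: str) -> int:
    age = int(raw)  # implicit raise: ValueError for non-numeric input
    if age < 0:
        # explicit raise of a ValueError subclass
        raise ValidationError("age must be non-negative")
    return age


def age_or_default(raw: str) -> int:
    try:
        return parse_age(raw)
    except ValueError:
        # Catches the implicit ValueError from int() and the
        # ValidationError subclass alike; the causes are merged here.
        return 0


print(age_or_default("abc"))  # malformed input
print(age_or_default("-5"))   # well-formed but invalid input
```

Both calls return the same fallback, although one input was unparseable and the other violated a business rule.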

Risk Levels

| Level | Meaning |
| --- | --- |
| low | Single raise site, or polymorphic with matching handler strategies |
| medium | Multiple raise sites with uniform handler treatment |
| elevated | Scope mismatches or cross-function reachability |
| high | Many raise sites, few handlers, mixed implicit/explicit |

CLI Options

crossing-semantic [OPTIONS] PATH

Options:
  --implicit          Detect implicit raises (dict access, getattr, etc.)
  --format FORMAT     Output format: text (default), json, markdown
  --min-risk LEVEL    Minimum risk to report: low, medium, elevated, high
  --exclude PATTERN   Exclude directories (repeatable)
  --ci                Exit code 1 if elevated/high risk crossings found

Example Output

============================================================
Semantic Crossing Scan: /path/to/tox
============================================================
Files scanned:        42
Exception raises:     87 (58 explicit, 29 implicit)
Exception handlers:   34
Semantic crossings:   12
  Polymorphic (multi-raise):  8
  Elevated risk:              3

--- KeyError: 3 raise sites, 14 handlers --- high risk ---
  3 raise sites across different loaders (API, TOML, INI),
  14 handlers catching without distinguishing source
============================================================

Information-Theoretic Scoring

Each crossing reports quantitative metrics based on Shannon entropy:

| Metric | What it measures |
| --- | --- |
| Semantic entropy | Bits of information carried by the exception type at raise sites (log2 of distinct origins) |
| Handler discrimination | Bits preserved by handlers (re-raise = full, return/pass = zero) |
| Information loss | Bits destroyed: entropy minus discrimination |
| Collapse ratio | Normalized loss: 0% (no collapse) to 100% (total meaning erasure) |

--- AttributeError: 4 raise sites, 3 handlers --- high risk ---
  Information: 2.0 bits entropy, 0.3 bits lost, 83% collapse

In JSON output, each crossing includes an information_theory object, and the summary includes total_information_loss_bits and mean_collapse_ratio across all crossings.
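
A minimal sketch of how the four metrics in the table relate to each other. How the scanner weights individual handlers is not stated here, so averaging a per-handler "fraction preserved" is an assumption, and `collapse_metrics` is a hypothetical helper rather than part of the package.

```python
import math


def collapse_metrics(raise_sites: int, handler_preserved: list[float]) -> dict[str, float]:
    """Relate entropy, discrimination, loss and collapse ratio.

    handler_preserved holds one value per handler: 1.0 if it keeps the
    meaning (for example by re-raising), 0.0 if it discards it
    (return/pass). Averaging these values is an assumption of this sketch.
    """
    # Semantic entropy: log2 of the number of distinct origins.
    entropy = math.log2(raise_sites) if raise_sites > 1 else 0.0
    # With no handlers nothing is swallowed, so treat everything as kept.
    kept = sum(handler_preserved) / len(handler_preserved) if handler_preserved else 1.0
    discrimination = entropy * kept
    loss = entropy - discrimination
    ratio = loss / entropy if entropy else 0.0
    return {
        "entropy_bits": entropy,
        "discrimination_bits": discrimination,
        "loss_bits": loss,
        "collapse_ratio": ratio,
    }


# Four raise sites; one handler re-raises, three swallow the exception.
print(collapse_metrics(4, [1.0, 0.0, 0.0, 0.0]))
```

With these inputs the entropy is 2.0 bits, 0.5 bits are preserved, 1.5 bits are lost, and the collapse ratio is 0.75.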

Real Bugs Found

The semantic scanner has identified real bugs in production codebases:

  • tox #3809: KeyError meaning "factor-filtered to empty" caught by handler expecting "key doesn't exist"
  • Rich #3960: Exception __notes__ leaking across chained exceptions
  • pytest #14214: Verbosity config not propagated across internal call boundary

Data Loss Fuzzer

Built-in Crossings

| Crossing | What it tests | Typical loss rate |
| --- | --- | --- |
| json_crossing() | JSON with default=str | ~24% lossy, 34% crashes |
| json_crossing_strict() | JSON without fallback | ~6% lossy, 52% crashes |
| pickle_crossing() | Python pickle | 0% (lossless baseline) |
| yaml_crossing() | YAML safe_load | ~0% lossy, 49% crashes |
| toml_crossing() | TOML via tomllib/tomli_w | varies |
| csv_crossing() | CSV (everything becomes strings) | ~82% lossy |
| env_file_crossing() | .env files (KEY=VALUE) | ~83% lossy |
| url_query_crossing() | URL query string encoding | ~80% lossy |

Custom Crossings

from crossing import Crossing, cross

# Test your API serialization
# (my_api_serialize and my_api_deserialize stand in for your own functions)
c = Crossing(
    encode=lambda d: my_api_serialize(d),
    decode=lambda s: my_api_deserialize(s),
    name="My API boundary",
)
report = cross(c, samples=1000)
report.print()
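
As an illustration of the kind of function pair that could be passed as `encode` and `decode`, here is a hypothetical KEY=VALUE serializer written in plain Python. The names `encode_env` and `decode_env` are made up for this sketch, and it is not the implementation behind `env_file_crossing()`.

```python
def encode_env(data: dict) -> str:
    """Write a flat dict as KEY=VALUE lines (hypothetical encoder)."""
    return "\n".join(f"{key}={value}" for key, value in data.items())


def decode_env(text: str) -> dict:
    """Parse KEY=VALUE lines back into a dict of strings."""
    return dict(line.split("=", 1) for line in text.splitlines() if line)


original = {"PORT": 8080, "DEBUG": True}
restored = decode_env(encode_env(original))
print(restored)  # every value is now a string
```

The round trip succeeds, but the int and the bool both come back as strings, which is the type of loss the fuzzer is meant to surface.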

CLI

# Test a single format
crossing test json -n 500 --seed 42

# Test all built-in formats
crossing test -n 200

# Compare how two formats compose
crossing compose json csv -n 300

# Measure how loss scales with repeated crossings
crossing scale json --max-n 5

# List all available crossings
crossing list

Compose Pipelines

from crossing import compose, json_crossing, string_truncation_crossing, cross

# Simulate: serialize -> store in VARCHAR(100) -> deserialize
pipeline = compose(
    json_crossing(),
    string_truncation_crossing(100),
)
report = cross(pipeline, samples=500)
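
The same serialize, truncate, deserialize idea can be tried by hand with the standard library. This is a sketch under the assumption that truncation simply cuts the stored string; `store_and_load` is a hypothetical helper, not part of the package.

```python
import json


def store_and_load(data, width=100):
    """Serialize to JSON, keep only `width` characters, then parse."""
    stored = json.dumps(data)[:width]  # stand-in for a VARCHAR(width) column
    return json.loads(stored)


print(store_and_load({"id": 1}))        # fits in the column, survives intact
print(store_and_load(123456, width=3))  # "123456" is cut to "123" and still parses

try:
    store_and_load({"notes": "x" * 200})
except json.JSONDecodeError:
    print("truncated payload no longer parses")
```

The middle case is the dangerous one: the truncated text is still valid JSON, so the pipeline returns a wrong value without raising.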

Diff

Compare how two boundaries handle the same data:

from crossing import diff, json_crossing, pickle_crossing

report = diff(json_crossing(), pickle_crossing(), samples=500)
print(f"{report.divergent_count} samples differ between JSON and pickle")
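
What "divergent" means can be shown on a handful of hand-picked values using only `json` and `pickle` from the standard library. The sample list is made up, and the fuzzer generates its own inputs.

```python
import json
import pickle

samples = ["text", [1, 2], (1, 2), {1: "a"}, {"k": None}]


def via_json(value):
    return json.loads(json.dumps(value))


def via_pickle(value):
    return pickle.loads(pickle.dumps(value))


# A sample diverges when the two boundaries return different results.
divergent = [s for s in samples if via_json(s) != via_pickle(s)]
print(len(divergent), divergent)
```

Only the tuple and the dict with an int key diverge; the other three values come back identical through both boundaries.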

Scaling Analysis

Measure how loss rate changes when data passes through N copies of a boundary:

from crossing import scaling, json_crossing

sr = scaling(json_crossing(), max_n=5, samples=200)
# JSON is idempotent: loss happens on first pass, then saturates (exponent ≈ 0)
# Non-idempotent crossings show positive scaling exponents
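
The saturation described in the comments can be checked by hand. This sketch assumes a JSON round trip with `default=str`, matching the description of `json_crossing()` in the table of built-in crossings; `roundtrip` is a hypothetical helper.

```python
import json
from datetime import date


def roundtrip(value):
    # default=str turns values JSON cannot encode into strings
    return json.loads(json.dumps(value, default=str))


original = {"released": date(2026, 2, 1)}
once = roundtrip(original)
twice = roundtrip(once)

print(once)             # the date became a plain string
print(once != original) # loss happened on the first pass
print(twice == once)    # the second pass changed nothing further
```

After the first pass the data contains only JSON-native types, so further passes have nothing left to lose.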

Codebase Scanning

python3 scan.py /path/to/project

Finds encode/decode pairs for: JSON, YAML, pickle, TOML, base64, URL encoding, CSV, struct, zlib, gzip.
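
One way such a scan could work is to walk the syntax tree and look for matching call names. The sketch below is an assumption about the general approach, not a description of how scan.py is implemented; `KNOWN_PAIRS` and `find_pairs` are hypothetical, and only three of the listed formats are covered.

```python
import ast

# Encode call -> matching decode call, as (module, function) pairs.
KNOWN_PAIRS = {
    ("json", "dumps"): ("json", "loads"),
    ("pickle", "dumps"): ("pickle", "loads"),
    ("base64", "b64encode"): ("base64", "b64decode"),
}


def find_pairs(source: str) -> list[str]:
    """Return encode/decode pairs whose two halves both appear as calls."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        # Only match calls of the form module.function(...)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            calls.add((node.func.value.id, node.func.attr))
    return sorted(
        f"{enc[0]}.{enc[1]} / {dec[0]}.{dec[1]}"
        for enc, dec in KNOWN_PAIRS.items()
        if enc in calls and dec in calls
    )


sample = '''
import base64, json

def save(data):
    return json.dumps(data)

def load(text):
    return json.loads(text)

def wrap(raw):
    return base64.b64encode(raw)
'''
print(find_pairs(sample))
```

The sample contains both halves of the JSON pair but only the encode half of base64, so only the JSON pair is reported. Aliased imports such as `from json import dumps` would be missed by this simple matcher.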


GitHub Action

Add Crossing to your CI pipeline:

# .github/workflows/crossing.yml
name: Exception Analysis
on: [pull_request]

jobs:
  crossing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: worksbyfriday/crossing@main
        with:
          path: 'src/'
          fail-on-risk: 'elevated'

Inputs: path, min-risk, format, implicit, exclude, fail-on-risk.


Benchmarks

Scanned 11 popular Python projects (Feb 2026):

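| Project | Files | Crossings | High Risk | Info Loss |
| --- | --- | --- | --- | --- |
| pydantic | 402 | 119 | 12 | 22.9 bits |
| sqlalchemy | 661 | 103 | 16 | 79.8 bits |
| django | 902 | 80 | 6 | |
| aiohttp | 166 | 53 | 11 | 25.5 bits |
| click | 62 | 14 | 5 | 7.4 bits |
| celery | 161 | 12 | 3 | |
| flask | 24 | 6 | 2 | |
| requests | 18 | 5 | 2 | |
| rich | 100 | 5 | 1 | |
| astroid | 96 | 5 | 0 | |
| fastapi | 47 | 0 | 0 | 0 bits |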

FastAPI reporting zero crossings indicates that the scanner does not flag every codebase. Sample audit reports: SQLAlchemy, Django, Celery, Flask, Requests.


API

Scan any installed Python package via HTTP:

curl https://api.fridayops.xyz/crossing/package/flask

Returns JSON with full crossing analysis, information theory metrics, and risk levels.
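
The same request can be made from Python with the standard library. This is a sketch: the helper names are hypothetical, and the shape of the returned JSON is not documented here, so the result is returned as parsed without assuming any field names.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.fridayops.xyz/crossing"


def package_url(name: str) -> str:
    """Build the scan URL for a package name."""
    return f"{BASE_URL}/package/{quote(name, safe='')}"


def fetch_package_scan(name: str, timeout: float = 10.0):
    """Fetch and parse the JSON scan result (performs a network request)."""
    with urllib.request.urlopen(package_url(name), timeout=timeout) as response:
        return json.load(response)


print(package_url("flask"))
```

Calling `fetch_package_scan("flask")` would perform the HTTP request; it is left uncalled here so the snippet runs offline.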

Audit report — full markdown report with findings, recommendations, and benchmarks:

curl https://api.fridayops.xyz/crossing/report/flask

Badge — embed in your README:

![crossing](https://api.fridayops.xyz/crossing/badge/flask)

All endpoints:

  • POST /crossing — scan raw Python source
  • GET /crossing/package/{name} — JSON scan results
  • GET /crossing/report/{name} — full markdown audit report
  • GET /crossing/badge/{name} — SVG badge
  • GET /crossing/benchmark — comparison data from 17 projects
  • GET /crossing/packages — list of example packages
  • GET /crossing/example — demo snippet

Install

pip install crossing

Or copy the files directly — no external dependencies. Python 3.10+.

License

MIT
