Detect silent information loss at system boundaries — semantic exception analysis and round-trip data loss fuzzing for Python

Crossing

Detect silent information loss at system boundaries in Python codebases.

Two Tools

1. Semantic Scanner — Exception Pattern Analysis

Find where the same exception type carries different meanings depending on the code path, but handlers can't distinguish them.

# Basic scan
crossing-semantic /path/to/project

# With implicit raises (dict access, getattr, etc.)
crossing-semantic --implicit /path/to/project

# JSON output for tooling
crossing-semantic --format json /path/to/project

# CI mode: fail if elevated/high risk crossings found
crossing-semantic --ci --min-risk elevated /path/to/project

Example: a KeyError that means "config key missing" and a KeyError that means "factor-filtered to empty" arrive at the same except KeyError handler. The handler assumes one meaning. The bug is silent.
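The pattern can be reproduced in a few lines of plain Python. This is a minimal sketch and not code from tox or from Crossing: `load_setting`, `setting_or_default`, and the sample config are hypothetical names chosen for illustration.

```python
def load_setting(config: dict, env: str, factor: str) -> str:
    """Look up a setting; both failure modes surface as KeyError."""
    section = config[env]  # KeyError, meaning 1: the environment key is missing
    filtered = {k: v for k, v in section.items() if k.startswith(factor)}
    if not filtered:
        # KeyError, meaning 2: the key exists, but filtering left nothing
        raise KeyError(factor)
    return next(iter(filtered.values()))


def setting_or_default(config: dict, env: str, factor: str) -> str:
    try:
        return load_setting(config, env, factor)
    except KeyError:
        # The handler assumes "key missing" and falls back silently,
        # even when the real cause was "filtered to empty".
        return "default"


config = {"py312": {"linux-deps": "uvloop"}}
missing = setting_or_default(config, "py313", "linux")     # env not configured
filtered_out = setting_or_default(config, "py312", "win")  # env configured, filter empty
print(missing, filtered_out)
```

Both calls return the same fallback, so the caller cannot tell a missing environment from an over-aggressive filter.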

2. Data Loss Fuzzer — Round-Trip Testing

Test whether information survives boundary crossings: serialization, API calls, database writes, format conversions.

import json

from crossing import Crossing, cross

c = Crossing(
    encode=lambda d: json.dumps(d),
    decode=lambda s: json.loads(s),
)

report = cross(c, samples=1000)
report.print()  # shows what was lost, where, and how

This isn't fuzzing for crashes. It's fuzzing for silent data loss — the operation succeeds but the output is missing something the input had.
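To make "succeeds but loses something" concrete, here is a hand-rolled round trip using only the standard library. It is a sketch of the idea, not of how `cross` generates or compares samples; `round_trip` and the sample values are assumptions made for illustration.

```python
import json
from datetime import date


def round_trip(value):
    """Encode to JSON (falling back to str) and decode again."""
    return json.loads(json.dumps(value, default=str))


samples = [
    {"a": 1},                    # survives unchanged
    {"point": (1, 2)},           # tuple comes back as a list
    {1: "one"},                  # int key comes back as a str key
    {"when": date(2026, 2, 1)},  # date comes back as a plain string
]

# None of these raise; the loss only shows up when comparing input and output.
lossy = [s for s in samples if round_trip(s) != s]
for s in lossy:
    print(f"{s!r} -> {round_trip(s)!r}")
```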


Semantic Scanner

What It Finds

  • Polymorphic exceptions: Multiple raise sites for the same exception type, caught by handlers that don't distinguish between them
  • Cross-function crossings: Exceptions raised in called functions, caught by handlers in the caller
  • Cross-file crossings: Same pattern across module boundaries via import resolution
  • Implicit raises: dict[key] -> KeyError, getattr(obj, name) -> AttributeError, int(x) -> ValueError
  • Inheritance crossings: except ValueError catching subclass raises like ValidationError
  • Scope analysis: Whether handlers catch exceptions from direct raises or from called functions
  • Message differentiation: Risk downgraded when all raise sites pass distinct string messages
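Two of these categories, implicit raises and inheritance crossings, can meet in a single handler. The sketch below shows the shape of such code; `ValidationError`, `parse_age`, and `age_or_none` are hypothetical, and the example says nothing about how the scanner itself detects the pattern.

```python
class ValidationError(ValueError):
    """Hypothetical domain error that subclasses ValueError."""


def parse_age(raw: str) -> int:
    age = int(raw)  # implicit raise: ValueError on non-numeric input
    if age < 0:
        # explicit raise of a subclass, still caught by `except ValueError`
        raise ValidationError("age must be non-negative")
    return age


def age_or_none(raw: str):
    try:
        return parse_age(raw)
    except ValueError:
        # "not a number" and "a number that failed validation" collapse here
        return None


print(age_or_none("abc"), age_or_none("-5"), age_or_none("30"))
```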

Risk Levels

| Level | Meaning |
|---|---|
| low | Single raise site, or polymorphic with matching handler strategies |
| medium | Multiple raise sites with uniform handler treatment |
| elevated | Scope mismatches or cross-function reachability |
| high | Many raise sites, few handlers, mixed implicit/explicit |

CLI Options

crossing-semantic [OPTIONS] PATH

Options:
  --implicit          Detect implicit raises (dict access, getattr, etc.)
  --format FORMAT     Output format: text (default), json, markdown
  --min-risk LEVEL    Minimum risk to report: low, medium, elevated, high
  --exclude PATTERN   Exclude directories (repeatable)
  --ci                Exit code 1 if elevated/high risk crossings found
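The interaction between `--min-risk` and `--ci` amounts to a threshold on an ordered scale. The following sketch illustrates that reading of the options above; it assumes the four levels are ordered as listed, and `meets_threshold` and `ci_exit_code` are hypothetical helpers, not part of the package.

```python
# Assumed ordering, taken from the order in which the levels are documented.
RISK_ORDER = ["low", "medium", "elevated", "high"]


def meets_threshold(risk: str, min_risk: str) -> bool:
    """True if `risk` is at or above `min_risk` on the assumed scale."""
    return RISK_ORDER.index(risk) >= RISK_ORDER.index(min_risk)


def ci_exit_code(risks: list[str], min_risk: str = "elevated") -> int:
    """Return 1 if any finding reaches the threshold, else 0."""
    return 1 if any(meets_threshold(r, min_risk) for r in risks) else 0


print(ci_exit_code(["low", "medium"]))
print(ci_exit_code(["low", "high"]))
```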

Example Output

============================================================
Semantic Crossing Scan: /path/to/tox
============================================================
Files scanned:        42
Exception raises:     87 (58 explicit, 29 implicit)
Exception handlers:   34
Semantic crossings:   12
  Polymorphic (multi-raise):  8
  Elevated risk:              3

--- KeyError: 3 raise sites, 14 handlers --- high risk ---
  3 raise sites across different loaders (API, TOML, INI),
  14 handlers catching without distinguishing source
============================================================

Information-Theoretic Scoring

Each crossing reports quantitative metrics based on Shannon entropy:

| Metric | What it measures |
|---|---|
| Semantic entropy | Bits of information carried by the exception type at raise sites (log2 of distinct origins) |
| Handler discrimination | Bits preserved by handlers (re-raise = full, return/pass = zero) |
| Information loss | Bits destroyed: entropy minus discrimination |
| Collapse ratio | Normalized loss: 0% (no collapse) to 100% (total meaning erasure) |

--- AttributeError: 4 raise sites, 3 handlers --- high risk ---
  Information: 2.0 bits entropy, 0.3 bits lost, 83% collapse

In JSON output, each crossing includes an information_theory object, and the summary includes total_information_loss_bits and mean_collapse_ratio across all crossings.
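The relationships between the four metrics can be written out directly. This is a sketch under stated assumptions: raise sites are treated as equally likely (so entropy is log2 of their count), handler discrimination is supplied as an input rather than derived from handler code, and the dictionary keys are illustrative rather than the actual JSON field names. The scanner's own weighting may differ.

```python
import math


def crossing_metrics(raise_sites: int, discrimination_bits: float) -> dict:
    """Compute entropy, loss, and collapse ratio for one crossing."""
    entropy = math.log2(raise_sites) if raise_sites > 1 else 0.0
    # Handlers cannot preserve more information than the raise sites carry.
    preserved = min(max(discrimination_bits, 0.0), entropy)
    loss = entropy - preserved
    collapse = loss / entropy if entropy > 0 else 0.0
    return {
        "entropy_bits": entropy,
        "information_loss_bits": loss,
        "collapse_ratio": collapse,
    }


# Four origins, handlers that swallow everything: all 2 bits are lost.
swallowed = crossing_metrics(raise_sites=4, discrimination_bits=0.0)
# Four origins, handlers that re-raise: nothing is lost.
reraised = crossing_metrics(raise_sites=4, discrimination_bits=2.0)
print(swallowed)
print(reraised)
```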

Real Bugs Found

The semantic scanner has identified real bugs in production codebases:

  • tox #3809: KeyError meaning "factor-filtered to empty" caught by handler expecting "key doesn't exist"
  • Rich #3960: Exception __notes__ leaking across chained exceptions
  • pytest #14214: Verbosity config not propagated across internal call boundary

Data Loss Fuzzer

Built-in Crossings

| Crossing | What it tests | Typical loss rate |
|---|---|---|
| json_crossing() | JSON with default=str | ~24% lossy, 34% crashes |
| json_crossing_strict() | JSON without fallback | ~6% lossy, 52% crashes |
| pickle_crossing() | Python pickle | 0% (lossless baseline) |
| yaml_crossing() | YAML safe_load | ~0% lossy, 49% crashes |
| toml_crossing() | TOML via tomllib/tomli_w | varies |
| csv_crossing() | CSV (everything becomes strings) | ~82% lossy |
| env_file_crossing() | .env files (KEY=VALUE) | ~83% lossy |
| url_query_crossing() | URL query string encoding | ~80% lossy |
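The high loss rate for CSV follows from the format itself: every cell is text. A standard-library sketch makes this visible; `csv_round_trip` is a hypothetical helper and is not the implementation behind `csv_crossing()`.

```python
import csv
import io


def csv_round_trip(row: dict) -> dict:
    """Write one row to CSV text and read it back."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    buf.seek(0)
    return next(csv.DictReader(buf))


original = {"id": 7, "ratio": 0.5, "active": True, "note": None}
restored = csv_round_trip(original)

# Every value is now a str; None has become an empty string.
for key in original:
    print(f"{key}: {original[key]!r} -> {restored[key]!r}")
```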

Custom Crossings

from crossing import Crossing, cross

# Test your API serialization
c = Crossing(
    encode=lambda d: my_api_serialize(d),
    decode=lambda s: my_api_deserialize(s),
    name="My API boundary",
)
report = cross(c, samples=1000)
report.print()

Compose Pipelines

from crossing import compose, json_crossing, string_truncation_crossing, cross

# Simulate: serialize -> store in VARCHAR(100) -> deserialize
pipeline = compose(
    json_crossing(),
    string_truncation_crossing(100),
)
report = cross(pipeline, samples=500)
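What a truncating storage step does to serialized data can be seen without the library. The sketch below chains plain functions by hand; `compose_steps` and `truncate` are hypothetical stand-ins, and their signatures are not those of `compose` or `string_truncation_crossing`.

```python
import json


def compose_steps(*steps):
    """Chain single-argument functions left to right."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def truncate(limit: int):
    """Simulate a fixed-width column such as VARCHAR(limit)."""
    return lambda text: text[:limit]


store_and_load = compose_steps(json.dumps, truncate(100), json.loads)

small = {"name": "ok"}
large = {"name": "x" * 200}
big_number = 10**120  # 121 digits once serialized

print(store_and_load(small) == small)  # short payloads survive

try:
    store_and_load(large)
    outcome = "survived"
except json.JSONDecodeError:
    outcome = "crashed on decode"  # loud failure: truncated JSON is invalid
print(outcome)

# Silent failure: the first 100 digits still parse as a valid, different integer.
print(store_and_load(big_number) == big_number)
```

The last case is the kind a crash-oriented test would miss: decoding succeeds and returns a plausible value.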

Codebase Scanning

python3 scan.py /path/to/project

Finds encode/decode pairs for: JSON, YAML, pickle, TOML, base64, URL encoding, CSV, struct, zlib, gzip.
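One plausible way to locate such pairs is to walk the syntax tree and look for matching calls. The sketch below uses the standard `ast` module and covers only three `module.function` pairs; it is an illustration of the idea, not a description of how `scan.py` works, and `PAIRS`, `find_calls`, and `find_pairs` are hypothetical names.

```python
import ast

# Hypothetical subset of encode -> decode pairs, keyed by (module, function).
PAIRS = {
    ("json", "dumps"): ("json", "loads"),
    ("pickle", "dumps"): ("pickle", "loads"),
    ("base64", "b64encode"): ("base64", "b64decode"),
}


def find_calls(source: str) -> set[tuple[str, str]]:
    """Collect every `name.attr(...)` call in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            found.add((node.func.value.id, node.func.attr))
    return found


def find_pairs(source: str) -> list[str]:
    """Report pairs where both the encoder and the decoder are called."""
    calls = find_calls(source)
    return sorted(
        f"{mod}.{enc} / {dmod}.{dec}"
        for (mod, enc), (dmod, dec) in PAIRS.items()
        if (mod, enc) in calls and (dmod, dec) in calls
    )


sample = '''
import json, base64
def save(d): return json.dumps(d)
def load(s): return json.loads(s)
def token(b): return base64.b64encode(b)
'''
pairs = find_pairs(sample)
print(pairs)
```

Only the JSON pair is reported here, because `base64.b64encode` has no matching decode call in the sample.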


GitHub Action

Add Crossing to your CI pipeline:

# .github/workflows/crossing.yml
name: Exception Analysis
on: [pull_request]

jobs:
  crossing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: worksbyfriday/crossing@main
        with:
          path: 'src/'
          fail-on-risk: 'elevated'

Inputs: path, min-risk, format, implicit, exclude, fail-on-risk.


Benchmarks

Scanned 11 popular Python projects (Feb 2026):

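| Project | Files | Crossings | High Risk | Info Loss |
|---|---|---|---|---|
| pydantic | 402 | 119 | 12 | 22.9 bits |
| sqlalchemy | 661 | 103 | 16 | 79.8 bits |
| django | 902 | 80 | 6 | |
| aiohttp | 166 | 53 | 11 | 25.5 bits |
| click | 62 | 14 | 5 | 7.4 bits |
| celery | 161 | 12 | 3 | |
| flask | 24 | 6 | 2 | |
| requests | 18 | 5 | 2 | |
| rich | 100 | 5 | 1 | |
| astroid | 96 | 5 | 0 | |
| fastapi | 47 | 0 | 0 | 0 bits |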

FastAPI came back with zero crossings, a sign that the scanner does not simply flag every codebase. Sample audit reports are available for SQLAlchemy, Django, Celery, Flask, and Requests.


API

Scan any installed Python package via HTTP:

curl https://api.fridayops.xyz/crossing/package/flask

Returns JSON with full crossing analysis, information theory metrics, and risk levels.

Audit report — full markdown report with findings, recommendations, and benchmarks:

curl https://api.fridayops.xyz/crossing/report/flask

Badge — embed in your README:

![crossing](https://api.fridayops.xyz/crossing/badge/flask)

All endpoints:

  • POST /crossing — scan raw Python source
  • GET /crossing/package/{name} — JSON scan results
  • GET /crossing/report/{name} — full markdown audit report
  • GET /crossing/badge/{name} — SVG badge
  • GET /crossing/benchmark — comparison data from 17 projects
  • GET /crossing/packages — list of example packages
  • GET /crossing/example — demo snippet
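From Python, the GET endpoints can be called with the standard library alone. This is a minimal client sketch: the URL layout comes from the list above, while `endpoint_url`, `fetch_package_scan`, and the timeout value are assumptions, and the shape of the returned JSON is not specified here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://api.fridayops.xyz/crossing"


def endpoint_url(kind: str, package: str) -> str:
    """Build the URL for a per-package endpoint (package, report, or badge)."""
    if kind not in {"package", "report", "badge"}:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    return f"{BASE_URL}/{kind}/{quote(package, safe='')}"


def fetch_package_scan(package: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON scan results for a package (performs a network call)."""
    with urlopen(endpoint_url("package", package), timeout=timeout) as resp:
        return json.load(resp)


print(endpoint_url("package", "flask"))
print(endpoint_url("report", "flask"))
```

Calling `fetch_package_scan("flask")` performs the same request as the first `curl` command in this section.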

Install

pip install crossing

Or copy the files directly — no external dependencies. Python 3.10+.

License

MIT
