
Detect silent information loss at system boundaries — semantic exception analysis and round-trip data loss fuzzing for Python


Crossing

Detect silent information loss at system boundaries in Python codebases.

Two Tools

1. Semantic Scanner — Exception Pattern Analysis

Find where the same exception type carries different meanings depending on the code path, but handlers can't distinguish them.

# Basic scan
crossing-semantic /path/to/project

# With implicit raises (dict access, getattr, etc.)
crossing-semantic --implicit /path/to/project

# JSON output for tooling
crossing-semantic --format json /path/to/project

# CI mode: fail if elevated/high risk crossings found
crossing-semantic --ci --min-risk elevated /path/to/project

Example: a KeyError that means "config key missing" and a KeyError that means "factor-filtered to empty" arrive at the same except KeyError handler. The handler assumes one meaning. The bug is silent.
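A minimal, self-contained sketch of that pattern. The names (`CONFIG`, `load_env`, `filter_by_factor`, `resolve`) are hypothetical and only loosely modeled on the tox case; this is not tox's actual code. Two unrelated failures both surface as `KeyError`, and the single handler treats them identically.

```python
# Hypothetical config layout, for illustration only.
CONFIG = {"envs": {"py311": ["pytest"]}}


def load_env(name: str) -> list[str]:
    # Raise site 1 (implicit): KeyError means "env is not configured".
    return CONFIG["envs"][name]


def filter_by_factor(deps: list[str], factor: str) -> list[str]:
    kept = [d for d in deps if factor in d]
    if not kept:
        # Raise site 2 (explicit): KeyError means "factor filtered everything out".
        raise KeyError(factor)
    return kept


def resolve(name: str, factor: str) -> list[str]:
    try:
        return filter_by_factor(load_env(name), factor)
    except KeyError:
        # The handler assumes meaning 1 ("missing key"), so meaning 2 is
        # silently turned into "use defaults" as well.
        return ["<defaults>"]


print(resolve("missing", "pyt"))   # env not configured -> defaults
print(resolve("py311", "nomatch")) # filtered to empty  -> also defaults
```

Both calls return the same value, so the caller cannot tell which failure occurred. One common remedy is to raise distinct exception subclasses (or at least distinct messages) at each site so handlers can tell the cases apart.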

2. Data Loss Fuzzer — Round-Trip Testing

Test whether information survives boundary crossings: serialization, API calls, database writes, format conversions.

import json

from crossing import Crossing, cross

c = Crossing(
    encode=json.dumps,  # dict -> str
    decode=json.loads,  # str -> dict
)

report = cross(c, samples=1000)
report.print()  # shows what was lost, where, and how

This isn't fuzzing for crashes. It's fuzzing for silent data loss — the operation succeeds but the output is missing something the input had.
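To make the idea concrete, here is a stdlib-only sketch of round-trip loss fuzzing. It is not the `crossing` library's implementation: `random_value`, `diff`, `round_trip`, and `fuzz` are hypothetical helpers. Each sample is encoded, decoded, and structurally compared with the original; a mismatch counts as lossy and an exception counts as a crash.

```python
import json
import random
from typing import Any, Callable


def random_value(rng: random.Random, depth: int = 0) -> Any:
    # Small generator of nested Python values (ints, floats, bools, None,
    # short strings, tuples, and dicts with str or int keys).
    choices = [
        lambda: rng.randint(-10**6, 10**6),
        lambda: rng.random(),
        lambda: rng.choice([True, False, None]),
        lambda: "".join(rng.choice("abc\u00e9") for _ in range(rng.randint(0, 5))),
        lambda: (rng.randint(0, 9), rng.randint(0, 9)),
    ]
    if depth < 2:
        choices.append(lambda: {rng.choice(["k", 1, "x"]): random_value(rng, depth + 1)})
    return rng.choice(choices)()


def diff(before: Any, after: Any, path: str = "$") -> list[str]:
    """Describe every place where `after` no longer matches `before`."""
    if type(before) is not type(after):
        return [f"{path}: {type(before).__name__} -> {type(after).__name__}"]
    if isinstance(before, dict):
        problems = []
        for key in before:
            if key not in after:
                problems.append(f"{path}: key {key!r} missing")
            else:
                problems.extend(diff(before[key], after[key], f"{path}[{key!r}]"))
        for key in after:
            if key not in before:
                problems.append(f"{path}: unexpected key {key!r}")
        return problems
    if isinstance(before, (list, tuple)):
        if len(before) != len(after):
            return [f"{path}: length {len(before)} -> {len(after)}"]
        problems = []
        for i, (b, a) in enumerate(zip(before, after)):
            problems.extend(diff(b, a, f"{path}[{i}]"))
        return problems
    return [] if before == after else [f"{path}: {before!r} -> {after!r}"]


def round_trip(value: Any, encode: Callable, decode: Callable) -> list[str]:
    try:
        restored = decode(encode(value))
    except Exception as exc:  # a crash is reported, not raised
        return [f"crash: {type(exc).__name__}: {exc}"]
    return diff(value, restored)


def fuzz(encode: Callable, decode: Callable, samples: int = 1000, seed: int = 0) -> dict:
    rng = random.Random(seed)
    lossy = crashes = 0
    for _ in range(samples):
        problems = round_trip(random_value(rng), encode, decode)
        if problems and problems[0].startswith("crash"):
            crashes += 1
        elif problems:
            lossy += 1
    return {"samples": samples, "lossy": lossy, "crashes": crashes}


print(round_trip({"a": (1, 2)}, json.dumps, json.loads))  # tuple silently became list
print(fuzz(json.dumps, json.loads, samples=500))
```

The key design point is that the operation "succeeds" in the lossy cases: `json.loads` returns a value without complaint, and only the structural comparison reveals that tuples became lists or integer keys became strings.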


Semantic Scanner

What It Finds

  • Polymorphic exceptions: Multiple raise sites for the same exception type, caught by handlers that don't distinguish between them
  • Cross-function crossings: Exceptions raised in called functions, caught by handlers in the caller
  • Cross-file crossings: Same pattern across module boundaries via import resolution
  • Implicit raises: dict[key] -> KeyError, getattr(obj, name) -> AttributeError, int(x) -> ValueError
  • Inheritance crossings: except ValueError catching subclass raises like ValidationError
  • Scope analysis: Whether handlers catch exceptions from direct raises or from called functions
  • Message differentiation: Risk downgraded when all raise sites pass distinct string messages
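The core of the polymorphic-exception check can be approximated with the standard `ast` module. This is a deliberately simplified sketch, not the scanner's implementation: it only counts explicit `raise Name(...)` sites and `except Name` handlers within one source string, and ignores implicit raises, inheritance, call-graph scope, and cross-file import resolution. The function names are hypothetical.

```python
import ast
from collections import Counter


def exception_sites(source: str) -> tuple[Counter, Counter]:
    """Count explicit raise sites and except handlers per exception name."""
    raises: Counter = Counter()
    handlers: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Raise) and node.exc is not None:
            # `raise KeyError(k)` is a Call; `raise KeyError` is a bare Name.
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(exc, ast.Name):
                raises[exc.id] += 1
        elif isinstance(node, ast.ExceptHandler) and node.type is not None:
            # `except (A, B):` lists several types in a Tuple node.
            types = node.type.elts if isinstance(node.type, ast.Tuple) else [node.type]
            for t in types:
                if isinstance(t, ast.Name):
                    handlers[t.id] += 1
    return raises, handlers


def polymorphic(source: str) -> dict[str, int]:
    """Exception types with more than one raise site that are also caught somewhere."""
    raises, handlers = exception_sites(source)
    return {name: n for name, n in raises.items() if n > 1 and handlers[name] > 0}


SAMPLE = """
def load(d, k):
    if k not in d:
        raise KeyError(k)
    return d[k]

def narrow(items):
    if not items:
        raise KeyError('filtered to empty')
    return items

def parse(x):
    if x < 0:
        raise ValueError('negative')
    return x

def resolve(d, k):
    try:
        return narrow(load(d, k))
    except KeyError:
        return None
"""

print(polymorphic(SAMPLE))
```

The real scanner additionally weighs implicit raises such as `d[k]` and whether each handler's `try` body actually reaches each raise site; this sketch only shows the counting step.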

Risk Levels

Level      Meaning
low        Single raise site, or polymorphic with matching handler strategies
medium     Multiple raise sites with uniform handler treatment
elevated   Scope mismatches or cross-function reachability
high       Many raise sites, few handlers, mixed implicit/explicit

CLI Options

crossing-semantic [OPTIONS] PATH

Options:
  --implicit          Detect implicit raises (dict access, getattr, etc.)
  --format FORMAT     Output format: text (default), json, markdown
  --min-risk LEVEL    Minimum risk to report: low, medium, elevated, high
  --exclude PATTERN   Exclude directories (repeatable)
  --ci                Exit code 1 if elevated/high risk crossings found

Example Output

============================================================
Semantic Crossing Scan: /path/to/tox
============================================================
Files scanned:        42
Exception raises:     87 (58 explicit, 29 implicit)
Exception handlers:   34
Semantic crossings:   12
  Polymorphic (multi-raise):  8
  Elevated risk:              3

--- KeyError: 3 raise sites, 14 handlers --- high risk ---
  3 raise sites across different loaders (API, TOML, INI),
  14 handlers catching without distinguishing source
============================================================

Information-Theoretic Scoring

Each crossing reports quantitative metrics based on Shannon entropy:

Metric                   What it measures
Semantic entropy         Bits of information carried by the exception type at raise sites (log2 of distinct origins)
Handler discrimination   Bits preserved by handlers (re-raise = full, return/pass = zero)
Information loss         Bits destroyed: entropy minus discrimination
Collapse ratio           Normalized loss: 0% (no collapse) to 100% (total meaning erasure)

Example output for a single crossing:

--- AttributeError: 4 raise sites, 3 handlers — high risk ---
  Information: 2.0 bits entropy, 0.3 bits lost, 83% collapse

In JSON output, each crossing includes an information_theory object, and the summary includes total_information_loss_bits and mean_collapse_ratio across all crossings.
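The metric definitions can be written out as a short calculation. This is a sketch under stated assumptions, not the tool's scoring code: entropy is `log2(n)` for `n` distinct origins, per the table; the per-handler weights (re-raise = 1.0, return/pass = 0.0) come from the table; averaging those weights across handlers, and defining the collapse ratio as loss divided by entropy, are assumptions made here for illustration.

```python
import math

# Assumed preservation weight per handler action (re-raise = full, return/pass = zero).
PRESERVED = {"reraise": 1.0, "return": 0.0, "pass": 0.0}


def semantic_entropy(distinct_origins: int) -> float:
    # log2 of the number of distinct raise origins (uniform distribution).
    return math.log2(distinct_origins) if distinct_origins > 0 else 0.0


def score(distinct_origins: int, handler_actions: list[str]) -> dict[str, float]:
    entropy = semantic_entropy(distinct_origins)
    # Assumption: discrimination = entropy scaled by the mean handler weight.
    if handler_actions:
        fraction = sum(PRESERVED[a] for a in handler_actions) / len(handler_actions)
    else:
        fraction = 0.0
    discrimination = entropy * fraction
    loss = entropy - discrimination
    # Assumption: collapse ratio normalizes loss by entropy (0.0 when nothing to lose).
    collapse = loss / entropy if entropy > 0 else 0.0
    return {
        "entropy": entropy,
        "discrimination": discrimination,
        "loss": loss,
        "collapse_ratio": collapse,
    }


# 4 origins, one handler re-raises and two swallow the exception.
print(score(4, ["reraise", "pass", "pass"]))
```

Under these assumptions, four origins give 2.0 bits of entropy; with one of three handlers re-raising, about 0.67 bits survive, 1.33 bits are lost, and the collapse ratio is about 67%. A single raise site carries 0 bits, so it can never collapse.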

Real Bugs Found

The semantic scanner has identified real bugs in production codebases:

  • tox #3809: KeyError meaning "factor-filtered to empty" caught by handler expecting "key doesn't exist"
  • Rich #3960: Exception __notes__ leaking across chained exceptions
  • pytest #14214: Verbosity config not propagated across internal call boundary

Data Loss Fuzzer

Built-in Crossings

Crossing                 What it tests                        Typical loss rate
json_crossing()          JSON with default=str                ~24% lossy, 34% crashes
json_crossing_strict()   JSON without fallback                ~6% lossy, 52% crashes
pickle_crossing()        Python pickle                        0% (lossless baseline)
yaml_crossing()          YAML safe_load                       ~0% lossy, 49% crashes
toml_crossing()          TOML via tomllib/tomli_w             varies
csv_crossing()           CSV (everything becomes strings)     ~82% lossy
env_file_crossing()      .env files (KEY=VALUE)               ~83% lossy
url_query_crossing()     URL query string encoding            ~80% lossy
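The high loss rates for CSV and URL query strings follow from the fact that both formats only carry text. The stdlib-only sketch below (it does not use the library's built-in crossings; `csv_round_trip` and `query_round_trip` are hypothetical helpers) shows ints, bools, and `None` coming back as strings:

```python
import csv
import io
from urllib.parse import parse_qs, urlencode


def csv_round_trip(row: dict) -> dict:
    # Write one row with a header, then read it back.
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)  # csv writes None as an empty string
    buf.seek(0)
    return next(csv.DictReader(buf))


def query_round_trip(params: dict) -> dict:
    # urlencode() calls str() on non-string values; parse_qs() returns lists.
    return parse_qs(urlencode(params))


row = {"id": 1, "active": True, "score": None}
print(csv_round_trip(row))    # {'id': '1', 'active': 'True', 'score': ''}
print(query_round_trip(row))  # {'id': ['1'], 'active': ['True'], 'score': ['None']}
```

Neither call raises, which is exactly the "succeeds but loses something" case the fuzzer is built to count.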

Custom Crossings

from crossing import Crossing, cross

# Test your API serialization.
# my_api_serialize / my_api_deserialize are placeholders for your own functions.
c = Crossing(
    encode=my_api_serialize,
    decode=my_api_deserialize,
    name="My API boundary",
)
report = cross(c, samples=1000)
report.print()

Compose Pipelines

from crossing import compose, json_crossing, string_truncation_crossing, cross

# Simulate: serialize -> store in VARCHAR(100) -> deserialize
pipeline = compose(
    json_crossing(),
    string_truncation_crossing(100),
)
report = cross(pipeline, samples=500)
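Conceptually, composing boundaries means chaining the encoders in order and the decoders in reverse. The sketch below illustrates that idea with a hypothetical `Pair` dataclass; it is not the library's `compose()` or `Crossing` class. The truncation step models a `VARCHAR(limit)` column: writes are cut off and reads return whatever was stored.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pair:
    # Hypothetical stand-in for one encode/decode boundary.
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]


def compose(*pairs: Pair) -> Pair:
    def encode(value: Any) -> Any:
        for p in pairs:  # outermost boundary first
            value = p.encode(value)
        return value

    def decode(value: Any) -> Any:
        for p in reversed(pairs):  # undo in reverse order
            value = p.decode(value)
        return value

    return Pair(encode, decode)


json_pair = Pair(json.dumps, json.loads)


def truncation_pair(limit: int) -> Pair:
    # Writes truncate to `limit` characters; reads cannot restore what was cut.
    return Pair(lambda s: s[:limit], lambda s: s)


pipeline = compose(json_pair, truncation_pair(100))
tiny = compose(json_pair, truncation_pair(3))

print(pipeline.decode(pipeline.encode({"a": "x"})))  # short value survives
print(tiny.decode(tiny.encode(123456)))              # silently becomes 123
```

Truncation usually breaks the JSON and causes a decode crash, but not always: a number cut from `"123456"` to `"123"` is still valid JSON, so the loss is silent.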

Codebase Scanning

python3 scan.py /path/to/project

Finds encode/decode pairs for: JSON, YAML, pickle, TOML, base64, URL encoding, CSV, struct, zlib, gzip.
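A rough sketch of how such pairs can be located with the `ast` module. It is an illustration, not the contents of `scan.py`: `KNOWN_PAIRS`, `called_functions`, and `find_pairs` are hypothetical, only `module.function(...)` call syntax is recognized, and aliases such as `from json import dumps` are not resolved.

```python
import ast

# Encoder -> decoder, as (module, function) names. Only a subset of the formats listed.
KNOWN_PAIRS = {
    ("json", "dumps"): ("json", "loads"),
    ("pickle", "dumps"): ("pickle", "loads"),
    ("base64", "b64encode"): ("base64", "b64decode"),
    ("zlib", "compress"): ("zlib", "decompress"),
    ("gzip", "compress"): ("gzip", "decompress"),
    ("struct", "pack"): ("struct", "unpack"),
    ("yaml", "safe_dump"): ("yaml", "safe_load"),
}


def called_functions(source: str) -> set[tuple[str, str]]:
    """Collect (module, attr) for every call written as module.attr(...)."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            calls.add((node.func.value.id, node.func.attr))
    return calls


def find_pairs(source: str) -> list[str]:
    calls = called_functions(source)
    return [
        f"{enc[0]}.{enc[1]} / {dec[0]}.{dec[1]}"
        for enc, dec in KNOWN_PAIRS.items()
        if enc in calls and dec in calls
    ]


SAMPLE = """
import json, zlib
blob = zlib.compress(json.dumps(config).encode())
config = json.loads(zlib.decompress(blob))
"""

print(find_pairs(SAMPLE))
```

Each pair found marks a boundary worth feeding to the round-trip fuzzer.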


GitHub Action

Add Crossing to your CI pipeline:

# .github/workflows/crossing.yml
name: Exception Analysis
on: [pull_request]

jobs:
  crossing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: worksbyfriday/crossing@main
        with:
          path: 'src/'
          fail-on-risk: 'elevated'

Inputs: path, min-risk, format, implicit, exclude, fail-on-risk.


Benchmarks

Scanned 11 popular Python projects (Feb 2026):

Project      Files   Crossings   High Risk   Info Loss
pydantic     402     119         12          22.9 bits
sqlalchemy   661     103         16          79.8 bits
django       902     80          6
aiohttp      166     53          11          25.5 bits
click        62      14          5           7.4 bits
celery       161     12          3
flask        24      6           2
requests     18      5           2
rich         100     5           1
astroid      96      5           0
fastapi      47      0           0           0 bits

FastAPI's clean score (0 crossings) is a useful sanity check: the scanner does not flag every codebase. Sample audit reports: SQLAlchemy, Django, Celery, Flask, Requests.


API

Scan any installed Python package via HTTP:

curl https://api.fridayops.xyz/crossing/package/flask

Returns JSON with full crossing analysis, information theory metrics, and risk levels.

Audit report — full markdown report with findings, recommendations, and benchmarks:

curl https://api.fridayops.xyz/crossing/report/flask

Badge — embed in your README:

![crossing](https://api.fridayops.xyz/crossing/badge/flask)


All endpoints:

  • POST /crossing — scan raw Python source
  • GET /crossing/package/{name} — JSON scan results
  • GET /crossing/report/{name} — full markdown audit report
  • GET /crossing/badge/{name} — SVG badge
  • GET /crossing/benchmark — comparison data from 17 projects
  • GET /crossing/packages — list of example packages
  • GET /crossing/example — demo snippet
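For scripted use, the GET endpoints can be called with the standard library alone. This is a sketch: `endpoint`, `fetch_scan`, and `fetch_report` are hypothetical helpers, the JSON response schema is not documented here, and the report body is assumed to be UTF-8 text. The `POST /crossing` endpoint is left out because its request format is not described above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.fridayops.xyz/crossing"


def endpoint(kind: str, package: str) -> str:
    """Build the URL for the package, report, or badge endpoint."""
    if kind not in {"package", "report", "badge"}:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    return f"{BASE_URL}/{kind}/{urllib.parse.quote(package, safe='')}"


def fetch_scan(package: str, timeout: float = 30.0) -> dict:
    # JSON scan results for an installed package.
    with urllib.request.urlopen(endpoint("package", package), timeout=timeout) as resp:
        return json.load(resp)


def fetch_report(package: str, timeout: float = 30.0) -> str:
    # Markdown audit report (assumed UTF-8).
    with urllib.request.urlopen(endpoint("report", package), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(endpoint("package", "flask"))
    # scan = fetch_scan("flask")  # requires network access
```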

Install

pip install crossing

Or copy the files directly — no external dependencies. Python 3.10+.

License

MIT



Download files


Source Distribution

crossing-1.1.0.tar.gz (36.3 kB)

Built Distribution

crossing-1.1.0-py3-none-any.whl (38.2 kB)

File details

Details for the file crossing-1.1.0.tar.gz.

File metadata

  • Download URL: crossing-1.1.0.tar.gz
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crossing-1.1.0.tar.gz
Algorithm Hash digest
SHA256 45d1ccfd7d7e78a97798db96fe471892729e339a9eab3f024d59e40471255189
MD5 74e421009bc72147f7a45a34b28179d0
BLAKE2b-256 d8f5b51b90e1b438a9ae38f61541dd9d64f4bdfedc2e31555e1c793827d10831


File details

Details for the file crossing-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: crossing-1.1.0-py3-none-any.whl
  • Size: 38.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crossing-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e5242070e42df1f2e2fea837d6dbce4a249eff39c98ed251025c4b607e940e6c
MD5 61868d89b4dfdda1288b09696b297762
BLAKE2b-256 a91422fbf56b4236c3a1a7606f64ebd6c4d6068d20faf71ccb86b8e376363203

