Catch real Python bugs before production — 99%+ accuracy, Z3 symbolic execution, LLM-powered false-positive filtering, zero-config GitHub CI
Project description
PythonFromScratch
A static analysis tool for Python that finds real bugs in large codebases using bytecode analysis, barrier-certificate proofs, and Z3-backed symbolic execution.
Tested on Microsoft DeepSpeed (5,000+ functions) — found 6 confirmed true positives including silent data corruption and unguarded division-by-zero bugs, while automatically proving 87.6% of candidates as false positives.
Install
git clone https://github.com/halleyyoung/PythonFromScratch.git
cd PythonFromScratch
pip install -e .
Requires Python ≥ 3.11 and z3-solver (installed automatically).
Walkthrough: Analyze a Real Project
1. Clone a target repo
git clone https://github.com/microsoft/DeepSpeed.git external_tools/DeepSpeed
2. Run the analyzer
python3.11 -m pyfromscratch external_tools/DeepSpeed/deepspeed/
This runs the full pipeline automatically:
======================================================================
PythonFromScratch — Full Project Analysis
Target: external_tools/DeepSpeed/deepspeed
======================================================================
STEP 1: BUILDING CALL GRAPH
Functions: 5003 (2.2s)
STEP 2: COMPUTING CRASH SUMMARIES
Summaries: 5003 (329.4s)
STEP 3: BUILDING CODE OBJECTS FOR DSE
Code objects: 5003 (0.0s)
STEP 4: BUG TYPE COVERAGE
2928 NULL_PTR
689 BOUNDS
358 ASSERT_FAIL
119 DIV_ZERO
35 RUNTIME_ERROR
...
STEP 5: BARRIER CERTIFICATE + DSE ANALYSIS
Total bug instances: 4613
Fully guarded (guards): 3008
Unguarded: 1605
Barrier results (35.8s):
Proven FP: 1031/1605
Remaining: 574
STEP 6: DSE RESULTS
DSE confirmed FP: 4
DSE confirmed TP: 493
STEP 7: TRUE POSITIVE CANDIDATES
Production code bugs: 571
Test-only code bugs: 3
TRUE POSITIVES (DSE-confirmed reachable):
⚠️ DIV_ZERO in utils.groups._ensure_divisibility
⚠️ DIV_ZERO in utils.timer.ThroughputTimer._is_report_boundary
⚠️ DIV_ZERO in inference.v2.inference_utils.ceil_div
...
SUMMARY
Functions analysed: 5003
Total bug instances: 4613
Proven false positive: 4039 (87.6%)
Remaining candidates: 574
DSE-confirmed TPs: 493
Results saved to results/deepspeed_results.pkl
3. Filter remaining false positives with Copilot
The analyzer's barrier certificates and DSE eliminate ~88% of false positives automatically. The remaining candidates include bugs that are technically reachable but may be guarded by framework invariants invisible at the bytecode level (e.g., "this parameter is always non-None because PyTorch guarantees it").
Ask GitHub Copilot (or any LLM) to triage the remaining candidates:
Look at the output from
python3.11 -m pyfromscratch external_tools/DeepSpeed/deepspeed/. For each remaining TP candidate, read the actual source code and callers to determine if it's a real bug or a false positive. Classify each as:
- REAL_BUG — genuinely reachable crash from user input or config
- INTENTIONAL_GUARD — deliberate
raise(working as designed)- FP_SELF — attribute access on
self(never None)- FP_FRAMEWORK — parameter guaranteed by framework (pytest, argparse, etc.)
- FP_INTERNAL — parameter guaranteed by internal plumbing
Write up the confirmed true positives in a markdown report.
This step typically reduces 500+ candidates down to 5–10 real bugs with source-level evidence.
See docs/TRUE_POSITIVE_ANALYSIS.md for our full DeepSpeed investigation.
Single-File Analysis
# Analyze one file
python3.11 -m pyfromscratch myfile.py
# Security analysis — treats each function as an entry point with tainted params
python3.11 -m pyfromscratch myfile.py --functions
# Verbose output
python3.11 -m pyfromscratch myfile.py --verbose
Exit codes: 0 = SAFE, 1 = BUG found, 2 = UNKNOWN, 3 = error
All Options
| Option | Description |
|---|---|
--verbose |
Detailed output |
--functions |
Treat each function as a tainted entry point |
--all-functions |
Analyze ALL functions as entry points |
--interprocedural |
Cross-function taint analysis with call graph |
--entry-points NAME,... |
Specify entry point functions |
--min-confidence 0.0-1.0 |
Filter bugs by confidence score |
--deduplicate |
Deduplicate findings by type + location |
--save-results PATH |
Custom output path (default: results/<name>_results.pkl) |
--context-depth N |
k-CFA context sensitivity (0, 1, 2, ...) |
--check-termination |
Detect non-terminating loops |
--synthesize-invariants |
Generate inductive loop invariants |
--no-concolic |
Pure symbolic analysis (no concrete execution) |
Detected Bug Types
Security Vulnerabilities (47 types)
Injection
SQL_INJECTION— Unsanitized input in SQL queriesCOMMAND_INJECTION— Shell command injectionCODE_INJECTION— Eval/exec with untrusted dataPATH_INJECTION— Path traversal attacksLDAP_INJECTION,XPATH_INJECTION,NOSQL_INJECTIONREGEX_INJECTION— ReDoS via user-controlled patternsHEADER_INJECTION,COOKIE_INJECTION
Cross-Site Scripting (XSS)
REFLECTED_XSS— User input reflected in HTML output
Server-Side Request Forgery
SSRF— Requests to user-controlled URLsPARTIAL_SSRF— Partial URL control
Deserialization
UNSAFE_DESERIALIZATION— Pickle/YAML with untrusted dataXXE— XML External Entity injectionXML_BOMB— Billion laughs attack
Sensitive Data
CLEARTEXT_LOGGING— Passwords/secrets in logsCLEARTEXT_STORAGE— Unencrypted sensitive dataHARDCODED_CREDENTIALS
Cryptography
WEAK_CRYPTO— MD5/SHA1 for securityWEAK_CRYPTO_KEY— Insufficient key sizesBROKEN_CRYPTO_ALGORITHM— DES, RC4, etc.INSECURE_PROTOCOL— HTTP, FTP, Telnet
Web Security
URL_REDIRECT— Open redirect vulnerabilitiesCSRF_PROTECTION_DISABLEDFLASK_DEBUG— Debug mode in productionINSECURE_COOKIE— Missing Secure/HttpOnly flagsJINJA2_AUTOESCAPE_FALSE
File System
TAR_SLIP— Tar extraction path traversalINSECURE_TEMPORARY_FILEWEAK_FILE_PERMISSIONS
Network
BIND_TO_ALL_INTERFACES— 0.0.0.0 bindingMISSING_HOST_KEY_VALIDATIONCERT_VALIDATION_DISABLED
Regex
REDOS— Catastrophic backtrackingPOLYNOMIAL_REDOSBAD_TAG_FILTERINCOMPLETE_HOSTNAME_REGEXP
Core Bug Types (20 types)
DIV_ZERO— Division by zeroNULL_PTR— None dereferenceBOUNDS— Index out of boundsTYPE_CONFUSION— Type errorsASSERT_FAIL— Failed assertionsINTEGER_OVERFLOWNON_TERMINATION— Infinite loopsMEMORY_LEAK,USE_AFTER_FREE,DOUBLE_FREEDATA_RACE,DEADLOCKINFO_LEAK,TIMING_CHANNEL
Examples
Finding SQL Injection
# vulnerable.py
import sqlite3
def get_user(user_id):
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}") # BUG!
return cursor.fetchone()
$ pyfromscratch vulnerable.py --functions
Analyzing: vulnerable.py
Function-level entry points: 1
get_user: BUG
SQL_INJECTION: Tainted value flows to SQL query at line 7
Total bugs found: 1
Verifying Safe Code
# safe.py
import sqlite3
def get_user(user_id):
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,)) # Safe!
return cursor.fetchone()
$ pyfromscratch safe.py --functions
Analyzing: safe.py
Function-level entry points: 1
get_user: SAFE
Total bugs found: 0
How It Works
The analyzer runs a 7-step pipeline on a project directory:
- Call Graph — Builds a whole-program call graph from all
.pyfiles - Crash Summaries — Disassembles bytecode, finds unguarded divisions, None-dereferences, out-of-bounds accesses, etc.
- Code Objects — Extracts Python code objects for symbolic execution
- Guard Detection — Identifies bugs already protected by
if,try/except,assert,isinstancechecks - Barrier Certificates — 10 proof patterns (assume-guarantee, post-condition, refinement types, inductive invariants, control flow, dataflow, disjunctive, callee return-guarantee, validated params, DSE confirmation) attempt to formally prove each remaining bug is unreachable
- DSE (Z3) — Dynamic symbolic execution confirms whether a concrete input can trigger each surviving bug
- Classification — Separates production code from test code, reports true positive candidates
The tool produces one of three verdicts per bug:
- FP (proven) — barrier certificate or DSE proves the bug is unreachable
- TP candidate — no proof found; needs human/LLM triage
- DSE-confirmed TP — Z3 found a satisfying assignment that reaches the bug
Architecture
pyfromscratch/
├── __main__.py # python -m pyfromscratch entry point
├── cli.py # CLI: single-file and project-directory analysis
├── analyzer.py # Core analysis engine
├── frontend/ # Python loading, bytecode compilation
├── cfg/ # Control-flow graph + call graph construction
├── semantics/ # Symbolic bytecode execution, crash summaries
├── z3model/ # Z3 value/heap modeling
├── unsafe/ # Bug type predicates (67 types)
├── contracts/ # External call modeling, taint sources/sinks
├── dse/ # Concolic execution oracle (Z3-backed)
└── barriers/ # Barrier certificate synthesis (10 patterns)
Docker
# Build
docker build -t pyfromscratch .
# Analyze a directory
docker run --rm -v $(pwd)/my_project:/target pyfromscratch /target
# Analyze a single file
docker run --rm -v $(pwd):/code pyfromscratch /code/myfile.py --functions
Development
pytest # Run tests
pytest --cov=pyfromscratch # With coverage
License
See LICENSE file.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file a3_python-0.1.11.tar.gz.
File metadata
- Download URL: a3_python-0.1.11.tar.gz
- Upload date:
- Size: 285.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c63d07bfe79ab9b01e4461a370a511933676e4ed14adf20267d502c210bdd2c2
|
|
| MD5 |
987e6535630559c1fc9f8172922cc1a2
|
|
| BLAKE2b-256 |
a080ce313ff07067e8841c725a34434cb4007214138b3f02d7b6c1327224aaa7
|
Provenance
The following attestation bundles were made for a3_python-0.1.11.tar.gz:
Publisher:
publish.yml on thehalleyyoung/PythonFromScratch
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
a3_python-0.1.11.tar.gz -
Subject digest:
c63d07bfe79ab9b01e4461a370a511933676e4ed14adf20267d502c210bdd2c2 - Sigstore transparency entry: 938252606
- Sigstore integration time:
-
Permalink:
thehalleyyoung/PythonFromScratch@831e2b078b9ec88511d1ed9b4d3b82b54aea119d -
Branch / Tag:
refs/tags/v0.1.11 - Owner: https://github.com/thehalleyyoung
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@831e2b078b9ec88511d1ed9b4d3b82b54aea119d -
Trigger Event:
push
-
Statement type:
File details
Details for the file a3_python-0.1.11-py3-none-any.whl.
File metadata
- Download URL: a3_python-0.1.11-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
001d3f92bafa14beece4748ef5b90383a07cc9457a41cfba2f84d6011f5bde58
|
|
| MD5 |
676e7500cfdef902ab070e7a614f0bee
|
|
| BLAKE2b-256 |
dfc2162025959148ec2580f6cb5dd6589a30e78d69bcd51a3a50abdb34b885e2
|
Provenance
The following attestation bundles were made for a3_python-0.1.11-py3-none-any.whl:
Publisher:
publish.yml on thehalleyyoung/PythonFromScratch
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
a3_python-0.1.11-py3-none-any.whl -
Subject digest:
001d3f92bafa14beece4748ef5b90383a07cc9457a41cfba2f84d6011f5bde58 - Sigstore transparency entry: 938252610
- Sigstore integration time:
-
Permalink:
thehalleyyoung/PythonFromScratch@831e2b078b9ec88511d1ed9b4d3b82b54aea119d -
Branch / Tag:
refs/tags/v0.1.11 - Owner: https://github.com/thehalleyyoung
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@831e2b078b9ec88511d1ed9b4d3b82b54aea119d -
Trigger Event:
push
-
Statement type: