ScubaTrace
Next-Generation Codebase Analysis Toolkit.
Features
- Multi-Language Support (C, C++, Java, Python, JavaScript, Go)
- No Need To Compile
- Statement-Based AST Abstraction
- Code Call Graph
- Code Control Flow Graph
- Code Data/Control Dependency Graph
- Reference Inference
- CPG-Based Multi-Granularity Slicing
Install
pip install scubatrace
Usage
Project-Level Analysis
Load a project (codebase)
proj = scubatrace.CProject("path/to/your/codebase", enable_lsp=True)
Call Graph
# Get the call graph of the project
callgraph = proj.callgraph
# Export call graph to a dot file
proj.export_callgraph("callgraph.dot")
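Once the call graph is exported, the dot file can be post-processed without ScubaTrace. The sketch below is not part of the library: `read_dot_edges` is a hypothetical helper, and it assumes edges are written one per line as plain `a -> b` statements with node names that contain no spaces. A real export may instead use numeric node IDs with label attributes, in which case a proper DOT parser is the better choice.

```python
import re
from collections import defaultdict

# Matches edges such as:  "main" -> "parse_args";   or   a -> b
EDGE_RE = re.compile(r'"?([^"\s;\[]+)"?\s*->\s*"?([^"\s;\[]+)"?')


def read_dot_edges(dot_text: str) -> dict[str, set[str]]:
    """Return {caller: {callee, ...}} for every `a -> b` edge in the DOT text."""
    graph: dict[str, set[str]] = defaultdict(set)
    for line in dot_text.splitlines():
        match = EDGE_RE.search(line)
        if match:
            caller, callee = match.groups()
            graph[caller].add(callee)
    return dict(graph)


# Hand-written sample; the real file layout is an assumption.
sample = '''
digraph callgraph {
    "main" -> "parse_args";
    "main" -> "run";
    "run" -> "cleanup" [label="line 42"];
}
'''
edges = read_dot_edges(sample)
print(sorted(edges["main"]))
```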
Code Search
# Find the function that starts at the given line of a file
func = proj.search_function("relative/path/to/your/file.c", start_line=20)
File-Level Analysis
Load a file from a project
file = proj.files["relative/path/to/your/file.c"]
Function-Level Analysis
Load a function from a file
the_first_func = file.functions[0]
func_in_tenth_line = file.function_by_line(10)
Call Relationships
def callers(self) -> dict[Function, list[Statement]]: ...
def callees(self) -> dict[Function, list[Statement]]: ...
def calls(self) -> list[Statement]: ...
Function Control Flow Graph
# Export the control flow graph to a dot file
func.export_cfg_dot("cfg.dot")
Function Data Dependency Graph
# Export the data dependency graph to a dot file
func.export_cfg_dot("ddg.dot", with_ddg=True)
Function Control Dependency Graph
# Export the control dependency graph to a dot file
func.export_cfg_dot("cdg.dot", with_cdg=True)
Function Code Walk
statements_you_interest = list(
    func.walk_backward(
        filter=lambda x: x.is_jump_statement,
        stop_by=lambda x: x.is_jump_statement,
        depth=-1,
        base="control",
    )
)
statements_you_interest = list(
    func.walk_forward(
        filter=lambda x: x.is_jump_statement,
        stop_by=lambda x: x.is_jump_statement,
        depth=-1,
        base="control",
    )
)
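The `filter`, `stop_by`, and `depth` arguments can be pictured on a toy graph. The sketch below is only one plausible reading of those parameters, not ScubaTrace's implementation: `walk` and the `preds` map are hypothetical, `filter` decides which visited nodes are yielded, `stop_by` prevents expansion past a node, and `depth=-1` is taken to mean "no limit".

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterator

Graph = dict[str, list[str]]
Predicate = Callable[[str], bool]


def walk(
    graph: Graph,
    start: str,
    filter: Predicate | None = None,  # name mirrors the README's keyword
    stop_by: Predicate | None = None,
    depth: int = -1,
) -> Iterator[str]:
    """Breadth-first walk from `start`; `start` itself is not yielded."""
    seen = {start}
    queue = deque((nxt, 1) for nxt in graph.get(start, []))
    while queue:
        node, dist = queue.popleft()
        if node in seen or (depth >= 0 and dist > depth):
            continue
        seen.add(node)
        if filter is None or filter(node):
            yield node
        if stop_by is not None and stop_by(node):
            continue  # do not expand past a stopping node
        queue.extend((nxt, dist + 1) for nxt in graph.get(node, []))


# Toy predecessor map: walking it corresponds to a backward walk.
preds = {"return": ["if"], "if": ["assign"], "assign": ["entry"], "entry": []}

everything = list(walk(preds, "return"))
stopped = list(walk(preds, "return", stop_by=lambda n: n == "if"))
two_hops = list(walk(preds, "return", depth=2))
only_assign = list(walk(preds, "return", filter=lambda n: n == "assign"))
```

A forward walk would use the same routine over a successor map instead of a predecessor map.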
Multi-Granularity Slicing
# Slicing by lines
lines_you_interest = [4, 5, 19]
slice_statements = func.slice_by_lines(
    lines=lines_you_interest,
    control_depth=3,
    data_dependent_depth=5,
    control_dependent_depth=2,
)
# Slicing by statements
statements_you_interest = func.statements[0:3]
slice_statements = func.slice_by_statements(
    statements=statements_you_interest,
    control_depth=3,
    data_dependent_depth=5,
    control_dependent_depth=2,
)
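To make the three depth parameters concrete, here is a minimal sketch of depth-bounded slicing on hand-written dependency maps. It assumes, without confirmation from the library, that a slice is the union of the seed lines and everything reachable within the given number of hops along each relation. `reachable`, `slice_lines`, and the sample maps are all hypothetical.

```python
def reachable(adj: dict[int, set[int]], seeds: set[int], depth: int) -> set[int]:
    """Lines within `depth` hops of `seeds` along `adj` (seeds included)."""
    frontier, found = set(seeds), set(seeds)
    for _ in range(depth):
        frontier = {n for s in frontier for n in adj.get(s, ())} - found
        if not frontier:
            break
        found |= frontier
    return found


def slice_lines(
    seeds: list[int],
    relations: dict[str, dict[int, set[int]]],
    depths: dict[str, int],
) -> list[int]:
    """Union of the per-relation neighbourhoods, each with its own depth."""
    result = set(seeds)
    for name, adj in relations.items():
        result |= reachable(adj, set(seeds), depths.get(name, 0))
    return sorted(result)


# Toy maps: line -> lines it depends on (made up for illustration).
relations = {
    "data": {19: {5}, 5: {4}, 4: {2}, 2: {1}},
    "control": {19: {17}, 17: {10}},
}
sliced = slice_lines([19], relations, {"data": 2, "control": 1})
print(sliced)
```

With a data depth of 2 the walk stops at line 4 and never reaches lines 2 or 1. With a control depth of 1 it stops at line 17.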
Statement-Level Analysis
Load a statement from a function
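the_first_stmt = the_first_func.statements[0]
stmt_in_second_line = the_first_func.statement_by_line(2)
# 'tree-sitter Queries' is a placeholder for the statement type to match
stmt_by_type = the_first_func.statements_by_type('tree-sitter Queries', recursive=True)
# The examples below refer to a statement as `stat`
stat = the_first_stmt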
Statement Controls
pre_controls: list[Statement] = stat.pre_controls
post_controls: list[Statement] = stat.post_controls
Statement Data Dependencies
pre_data_dependents: dict[Identifier, list[Statement]] = stat.pre_data_dependents
post_data_dependents: dict[Identifier, list[Statement]] = stat.post_data_dependents
Statement Control Dependencies
pre_control_dependents: list[Statement] = stat.pre_control_dependents
post_control_dependents: list[Statement] = stat.post_control_dependents
Statement References
references: dict[Identifier, list[Statement]] = stat.references
Statement Definitions
definitions: dict[Identifier, list[Statement]] = stat.definitions
Taint Analysis
# Check if the statement is tainted from function entry
is_taint_from_entry: bool = stat.is_taint_from_entry
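The README does not say how this property is computed. As a rough intuition only, the sketch below propagates taint from function parameters through straight-line assignments. `tainted_lines` and its input format are hypothetical and ignore branches, loops, and calls.

```python
def tainted_lines(
    params: set[str],
    assignments: list[tuple[int, str, list[str]]],
) -> list[int]:
    """Return lines whose assigned value depends on a parameter.

    `assignments` holds (line, target, sources) tuples in program order.
    """
    tainted = set(params)
    hits = []
    for line, target, sources in assignments:
        if tainted & set(sources):
            tainted.add(target)      # taint flows into the target
            hits.append(line)
        else:
            tainted.discard(target)  # target overwritten with clean data
    return hits


# Toy function body: f(buf) { n = buf; k = 0; m = n + k; n = 1; z = n; }
body = [
    (2, "n", ["buf"]),
    (3, "k", []),
    (4, "m", ["n", "k"]),
    (5, "n", []),
    (6, "z", ["n"]),
]
hits = tainted_lines({"buf"}, body)
print(hits)
```

Line 6 is not reported because `n` was overwritten with a constant on line 5 before it was read.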
AST Node
You can also get the AST node from a file, function, or statement.
file_ast = file.node
func_ast = func.node
stmt_ast = stat.node
ScubaTrace Landscape
Comparison with Other Tools
| Tool | Type | Capabilities | Requires Compilation (Instruction) | Supported Languages | Limitations |
|---|---|---|---|---|---|
| ScubaTrace | Lib | CG/CFG/DataFlow/Slicing | ✅ No | Multiple Languages | |
| Soot | CLI/Lib (Java) | CG/CFG/DataFlow | ❌ Yes | Java (Bytecode) | Cannot analyze source code directly |
| LLVM | CLI/Lib (C) | CG/CFG/DataFlow | ❌ Yes | C/C++ (IR) | Cannot analyze source code directly |
| pycallgraph | CLI | CG | ✅ No | Python | Does not provide a library; requires parsing the tool output |
| pycg | CLI | CG | ✅ No | Python | Low precision; requires parsing the tool output; no longer maintained |
| Jelly | CLI | CG | ✅ No | JavaScript | Incomplete call graph (CG); the generated output requires further processing |
| Infer | OCaml | CG/CFG/DataFlow | ❌ Yes | Multiple Languages | High cost of adaptation |
| CodeQL | QL | CG/CFG/DataFlow | ❌ Required for compiled languages; ✅ not required for interpreted languages | Multiple Languages | 1. Compiled languages require compilation 2. Requires learning QL and using it for analysis 3. Lower performance, slow for large-scale projects |
| Joern | CLI/Scala | CG/CFG/DataFlow | ✅ No | Multiple Languages | 1. The generated CG and other results cannot be used directly and require further processing 2. Generated CGs are prone to errors and resolution failures 3. Lower performance, slow for large-scale projects |