ScubaTrace

Next-Generation Codebase Analysis Toolkit.


Features

  • Multi-Language Support (C, C++, Java, Python, JavaScript, Go)
  • No Need To Compile
  • Statement-Based AST Abstraction
  • Code Call Graph
  • Code Control Flow Graph
  • Code Data/Control Dependency Graph
  • References Inference
  • CPG Based Multi-Granularity Slicing

Install

pip install scubatrace

Usage

Project-Level Analysis

Load a project (codebase)

import scubatrace

# Load a C project; enable_lsp=True turns on language-server support
proj = scubatrace.CProject("path/to/your/codebase", enable_lsp=True)

Call Graph

# Get the call graph of the project
callgraph = proj.callgraph
# Export call graph to a dot file
proj.export_callgraph("callgraph.dot")
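
For readers unfamiliar with the DOT format, the following standalone sketch shows roughly what a call graph export can look like. It does not use ScubaTrace: the `to_dot` helper and the toy `calls` mapping are hypothetical, and the node and edge attributes in ScubaTrace's real output may differ.

```python
def to_dot(calls: dict[str, list[str]], name: str = "callgraph") -> str:
    """Render a caller -> callees mapping as a Graphviz DOT digraph.

    Hypothetical helper for illustration only; not part of ScubaTrace.
    """
    lines = [f"digraph {name} {{"]
    for caller in sorted(calls):
        for callee in sorted(calls[caller]):
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


# Toy call graph: main calls parse and run; run calls parse.
calls = {"main": ["parse", "run"], "run": ["parse"]}
dot_text = to_dot(calls)
print(dot_text)
```

A DOT file can be rendered with Graphviz, for example `dot -Tpng callgraph.dot -o callgraph.png`.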

Code Search

stat = proj.search_function("relative/path/to/your/file.c", start_line=20)

File-Level Analysis

Load a file from a project

file = proj.files["relative/path/to/your/file.c"]

Function-Level Analysis

Load a function from a file

the_first_func = file.functions[0]
func_in_tenth_line = file.function_by_line(10)
# The examples below use `func` for whichever function you loaded
func = the_first_func

Call Relationships

callers = func.callers
callfrom, callto, callsite_line, callsite_column = (
    callers[0].src,
    callers[0].dst,
    callers[0].line,
    callers[0].column,
)
callees = func.callees
callfrom, callto, callsite_line, callsite_column = (
    callees[0].src,
    callees[0].dst,
    callees[0].line,
    callees[0].column,
)
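
The call objects above expose `src`, `dst`, `line`, and `column`. The sketch below formats such objects for printing. `CallEdge` is a hypothetical stand-in so the snippet runs without a codebase, and `describe_calls` is an illustrative helper, not a ScubaTrace function. With a real project you would presumably pass `func.callers` or `func.callees` instead, assuming they are iterable as the indexing above suggests.

```python
from dataclasses import dataclass


@dataclass
class CallEdge:
    """Hypothetical stand-in exposing the four attributes used above."""
    src: str
    dst: str
    line: int
    column: int


def describe_calls(edges) -> list[str]:
    """Format each call edge as 'src -> dst @ line:column'."""
    return [f"{e.src} -> {e.dst} @ {e.line}:{e.column}" for e in edges]


edges = [CallEdge("main", "parse", 12, 4), CallEdge("run", "parse", 30, 8)]
for text in describe_calls(edges):
    print(text)
```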

Function Control Flow Graph

# Export the control flow graph to a dot file
func.export_cfg_dot("cfg.dot")

Function Data Dependency Graph

# Export the data dependency graph to a dot file
func.export_cfg_dot("ddg.dot", with_ddg=True)

Function Control Dependency Graph

# Export the control dependency graph to a dot file
func.export_cfg_dot("cdg.dot", with_cdg=True)

Function Code Walk

statements_you_interest = list(
    func.walk_backward(
        filter=lambda x: x.is_jump_statement,
        stop_by=lambda x: x.is_jump_statement,
        depth=-1,
        base="control",
    )
)
statements_you_interest = list(
    func.walk_forward(
        filter=lambda x: x.is_jump_statement,
        stop_by=lambda x: x.is_jump_statement,
        depth=-1,
        base="control",
    )
)
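
The walk parameters are not documented here, so the following standalone sketch shows one plausible reading of them on a toy graph. It assumes that `filter` selects which visited nodes are returned, `stop_by` marks nodes the walk does not expand past, and `depth=-1` means no depth limit. These semantics are assumptions, and the function below is not ScubaTrace's implementation.

```python
from collections import deque
from typing import Callable, Iterator


def walk_backward(
    start: str,
    preds: dict[str, list[str]],
    filter: Callable[[str], bool] = lambda n: True,
    stop_by: Callable[[str], bool] = lambda n: False,
    depth: int = -1,
) -> Iterator[str]:
    """Breadth-first walk over predecessor edges, starting from `start`.

    Parameter names mirror the API above; their meaning here is assumed.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node != start and filter(node):
            yield node
        # Do not expand past a stop node or beyond the depth limit.
        if (node != start and stop_by(node)) or (depth >= 0 and dist >= depth):
            continue
        for pred in preds.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append((pred, dist + 1))


# Toy control flow: entry -> check -> (body | exit), body -> check (a loop).
preds = {"check": ["entry", "body"], "body": ["check"], "exit": ["check"]}
everything = list(walk_backward("exit", preds))
one_step = list(walk_backward("exit", preds, depth=1))
stopped = list(walk_backward("exit", preds, stop_by=lambda n: n == "check"))
```

The `seen` set keeps the walk from looping forever on the `check`/`body` cycle.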

Multi-Granularity Slicing

# Slicing by lines
lines_you_interest = [4, 5, 19]
slice_statements = func.slice_by_lines(
    lines=lines_you_interest,
    control_depth=3,
    data_dependent_depth=5,
    control_dependent_depth=2,
)

# Slicing by statements
statements_you_interest = func.statements[0:3]
slice_statements = func.slice_by_statements(
    statements=statements_you_interest,
    control_depth=3,
    data_dependent_depth=5,
    control_dependent_depth=2,
)
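
The depth arguments presumably bound how far the slice follows each kind of edge from the lines or statements of interest. The standalone sketch below illustrates that idea on a toy dependency map. It only follows backward dependencies, and `bounded_closure`, `toy_slice`, and the example maps are hypothetical; ScubaTrace's slicing may differ in direction and detail.

```python
def bounded_closure(seeds: set[int], edges: dict[int, set[int]], depth: int) -> set[int]:
    """Lines reachable from `seeds` in at most `depth` hops along `edges`."""
    reached, frontier = set(seeds), set(seeds)
    for _ in range(depth):
        frontier = {m for n in frontier for m in edges.get(n, set())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def toy_slice(lines, data_deps, control_deps, data_depth, control_depth) -> list[int]:
    """Union of depth-bounded backward data and control dependencies."""
    seeds = set(lines)
    return sorted(
        bounded_closure(seeds, data_deps, data_depth)
        | bounded_closure(seeds, control_deps, control_depth)
    )


# line -> lines it depends on (toy program of 6 lines)
data_deps = {6: {4, 5}, 4: {2}, 5: {3}, 2: {1}}
control_deps = {6: {3}, 4: {3}}
shallow = toy_slice([6], data_deps, control_deps, data_depth=1, control_depth=1)
deep = toy_slice([6], data_deps, control_deps, data_depth=3, control_depth=1)
```

Raising `data_depth` from 1 to 3 pulls in the definitions that lines 4 and 5 themselves depend on, which is the trade-off the depth arguments control: a larger slice with more context.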

Statement-Level Analysis

Load a statement from a function

the_first_stmt = the_first_func.statements[0]
stmt_in_second_line = the_first_func.statement_by_line(2)
stmt_by_type = func.statements_by_type('tree-sitter Queries', recursive=True)
# The examples below use `stat` for whichever statement you loaded
stat = the_first_stmt

Statement Controls

pre_controls: list[Statement] = stat.pre_controls
post_controls: list[Statement] = stat.post_controls

Statement Data Dependencies

pre_data_dependents: dict[Identifier, list[Statement]] = stat.pre_data_dependents
post_data_dependents: dict[Identifier, list[Statement]] = stat.post_data_dependents
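
To make the idea of a "pre" data dependency concrete, the standalone sketch below computes it for straight-line Python code with the standard `ast` module: for each line, every variable it reads is mapped to the line that last assigned it. This is a simplified illustration, not ScubaTrace's algorithm. It ignores branches, loops, and augmented assignments, and the return shape (line numbers rather than `Identifier` and `Statement` objects) is chosen for brevity.

```python
import ast


def pre_data_dependents(source: str) -> dict[int, dict[str, int]]:
    """For each line of straight-line code, map every variable it reads
    to the line of the most recent assignment to that variable."""
    last_def: dict[str, int] = {}
    result: dict[int, dict[str, int]] = {}
    for stmt in ast.parse(source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        writes = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        # Resolve reads before recording this statement's own writes.
        result[stmt.lineno] = {v: last_def[v] for v in reads if v in last_def}
        for v in writes:
            last_def[v] = stmt.lineno
    return result


code = "a = 1\nb = a + 2\na = b * 3\nc = a + b\n"
deps = pre_data_dependents(code)
```

Line 4 reads `a` from line 3 rather than line 1, because the later assignment overwrites the earlier one.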

Statement Control Dependencies

pre_control_dependents: list[Statement] = stat.pre_control_dependents
post_control_dependents: list[Statement] = stat.post_control_dependents

Statement References

references: dict[Identifier, list[Statement]] = stat.references

Statement Definitions

definitions: dict[Identifier, list[Statement]] = stat.definitions

Taint Analysis

# Check if the statement is tainted from function entry
is_taint_from_entry: bool = stat.is_taint_from_entry
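
One plausible reading of "tainted from function entry" is that a statement reads a value derived, directly or transitively, from the function's parameters. The standalone sketch below implements that reading for straight-line Python function bodies using the standard `ast` module. Both the interpretation and the `tainted_lines` helper are assumptions for illustration; ScubaTrace's actual definition may differ.

```python
import ast


def tainted_lines(func_source: str) -> set[int]:
    """Line numbers of statements that read parameter-derived values."""
    func = ast.parse(func_source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    tainted = {a.arg for a in func.args.args}  # parameters start tainted
    lines: set[int] = set()
    for stmt in func.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        writes = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        if reads & tainted:
            lines.add(stmt.lineno)
            tainted |= writes  # taint flows into the assigned names
        else:
            tainted -= writes  # overwritten with untainted data
    return lines


code = (
    "def f(x):\n"
    "    y = x + 1\n"
    "    z = 2\n"
    "    x = 0\n"
    "    w = x + z\n"
    "    return y\n"
)
result = tainted_lines(code)
```

Line 5 is not tainted even though it reads `x`, because line 4 has already replaced the parameter value with a constant.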

AST Node

You can also get the AST node from a file, function, or statement.

file_ast = file.node
func_ast = func.node
stmt_ast = stat.node

ScubaTrace Landscape

[Figure: ScubaTrace Landscape]

Comparison with Other Tools

| Tool | Type | Capabilities | Requires Compilation (Instruction) | Supported Languages | Limitations |
|---|---|---|---|---|---|
| ScubaTrace | Lib | CG/CFG/DataFlow/Slicing | ✅ No | Multiple Languages | |
| Soot | CLI/Lib (Java) | CG/CFG/DataFlow | ❌ Yes | Java (Bytecode) | Cannot directly analyze the source code |
| LLVM | CLI/Lib (C) | CG/CFG/DataFlow | ❌ Yes | C/C++ (IR) | Cannot directly analyze the source code |
| pycallgraph | CLI | CG | ✅ No | Python | Does not provide a library, requires parsing the tool output |
| pycg | CLI | CG | ✅ No | Python | Precision is low, requires parsing the tool output, no longer maintained |
| Jelly | CLI | CG | ✅ No | JavaScript | Incomplete call graph (CG), the generated output requires further processing |
| Infer | OCaml | CG/CFG/DataFlow | ❌ Yes | Multiple Languages | 1. High cost of adaptation |
| CodeQL | QL | CG/CFG/DataFlow | ❌ Required for compiled languages; ✅ Not required for interpreted languages | Multiple Languages | 1. Compiled languages require compilation; 2. Requires learning QL and using it for analysis; 3. Lower performance, slow for large-scale projects |
| Joern | CLI/Scala | CG/CFG/DataFlow | ✅ No | Multiple Languages | 1. The generated CG and other results cannot be directly used, require further processing; 2. Generated CG graphs are prone to errors in resolving output failures; 3. Lower performance, slow for large-scale projects |
