
A call graph generator designed for codebase RAG


Scope

A call graph generator designed for codebase RAG. It combines LSP and AST parsing to achieve very high accuracy, even for dynamic languages.

  • Supports 10+ popular languages
    • JavaScript
    • Python
    • TypeScript
    • Rust
    • C#
    • Java
    • Go
    • Ruby
    • Dart
    • C
    • C++
    • PHP
  • Can be used programmatically or via the command-line
  • Provides easy retrieval methods (e.g. definitions(), references(), calltree())
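The AST half of the approach above can be illustrated with Python's standard `ast` module. The sketch below is not Scope's implementation. It builds a rough, purely syntactic call graph for a single Python source string, matching calls by name only. The function name `approximate_call_graph` is hypothetical. This kind of name-based matching is exactly what LSP resolution is meant to make more precise.

```python
import ast
from collections import defaultdict


def approximate_call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the names it calls (syntactic, name-based only).

    Calls like obj.foo() are recorded by attribute name ("foo"), and no type or
    import resolution is attempted, so this is only an approximation.
    """
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)

    class Visitor(ast.NodeVisitor):
        def __init__(self) -> None:
            self.stack: list[str] = []  # enclosing function names, innermost last

        def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
            graph[node.name]  # touching the key makes call-free functions appear too
            self.stack.append(node.name)
            self.generic_visit(node)
            self.stack.pop()

        visit_AsyncFunctionDef = visit_FunctionDef

        def visit_Call(self, node: ast.Call) -> None:
            if self.stack:
                func = node.func
                if isinstance(func, ast.Name):
                    graph[self.stack[-1]].add(func.id)
                elif isinstance(func, ast.Attribute):
                    graph[self.stack[-1]].add(func.attr)
            self.generic_visit(node)

    Visitor().visit(tree)
    return dict(graph)


src = '''
def helper(x):
    return x + 1

def main():
    y = helper(2)
    print(y)
'''
print(approximate_call_graph(src))
```

Because edges are matched by bare name, two unrelated functions called `run` in different modules would be conflated. That ambiguity is the main reason to bring in a language server.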

Install

pip install codescope

To use the LSP Callgraph, the toolchain for each language you want to parse must be installed on your machine and available on your PATH.
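A quick pre-flight check can catch a missing toolchain before indexing starts. The sketch below uses `shutil.which` to report which executables are absent from PATH. The mapping from language to executable is a hypothetical assumption for illustration only: the exact binaries Scope or multilspy require are not documented here, so adjust the table to your setup.

```python
import shutil

# Hypothetical mapping: executables one might expect on PATH per language.
# This is an assumption, not a list documented by codescope or multilspy.
EXPECTED_EXECUTABLES: dict[str, list[str]] = {
    "python": ["python3"],
    "javascript": ["node"],
    "typescript": ["node"],
    "go": ["go"],
    "rust": ["cargo"],
    "java": ["java"],
    "ruby": ["ruby"],
    "dart": ["dart"],
    "c#": ["dotnet"],
}


def missing_executables(languages: list[str]) -> dict[str, list[str]]:
    """Return, per requested language, the expected executables not found on PATH."""
    missing: dict[str, list[str]] = {}
    for lang in languages:
        expected = EXPECTED_EXECUTABLES.get(lang.lower(), [])
        absent = [exe for exe in expected if shutil.which(exe) is None]
        if absent:
            missing[lang] = absent
    return missing


if __name__ == "__main__":
    # An empty dict means every expected executable was found.
    print(missing_executables(["python", "go", "rust"]))
```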

LSP Callgraph Usage

from scope import CallGraph
from withrepo import repo
from scope.enums.CallTreeType import CallTreeType

# Build a call graph from a directory, it works with multi-language codebases too
cg = CallGraph.build("./my_codebase")

# ...or build one from a GitHub repo fetched with withrepo's repo()
with repo("shobrook", "openlimit") as r:
    cg = CallGraph.build(r.path)

# Save/load call graphs
json_str = cg.json()  # serialize to JSON
cg = CallGraph.from_json(json_str)  # load from JSON

# Get all file paths in the call graph
paths = cg.paths()
# Filter paths with a callback
python_files = cg.paths(lambda path: path.endswith('.py'))

# Get all function/class definitions
definitions = cg.definitions()
# Filter definitions with a callback
class_defs = cg.definitions(lambda path, defn: defn.type == 'class')

# Get all references (function calls)
references = cg.references()
# Filter references with callbacks
filtered_refs = cg.references(
    cb_defn=lambda path, defn: defn.name == 'main',
    cb_ref=lambda path, ref: 'test' not in ref.path
)

# Generate call trees (who calls what)
def_obj = definitions[0]  # get a specific definition
# Get downstream calls (what this function calls)
downstream = cg.calltree(def_obj, CallTreeType.DOWN, depth=2)
# Get upstream calls (who calls this function)
upstream = cg.calltree(def_obj, CallTreeType.UP, depth=2)
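In the LSP approach, lookups like references() ultimately come from a language server. As a hedged illustration of what such a query looks like on the wire (not of Scope's internals, which delegate this to multilspy), the sketch below builds a `textDocument/references` request framed per the LSP base protocol: a `Content-Length` header giving the UTF-8 byte length of the JSON-RPC body, a blank line, then the body. The file URI and position are made-up example values.

```python
import json


def encode_lsp_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload per the LSP base protocol (header + blank line + body)."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def references_request(request_id: int, uri: str, line: int, character: int) -> bytes:
    """Build a textDocument/references request; line and character are zero-based."""
    return encode_lsp_message({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": False},
        },
    })


# Example values only: ask for references to the symbol at line 11, column 5.
msg = references_request(1, "file:///tmp/my_codebase/main.py", 10, 4)
print(msg.decode("utf-8"))
```

The server replies with a list of `Location` objects (URI plus range), which is the raw material a call graph builder turns into reference edges.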

Approximate Callgraph Usage

from dataclasses import dataclass

from scope import ScopeAST, ASTNode, ApproximateCallGraph
from withrepo import repo


# Provide your own File class: ScopeAST needs just path, abs_path, and content
@dataclass
class File:
    path: str
    abs_path: str
    content: str


# Build an approximate callgraph from a repo
with repo("shobrook", "openlimit") as r:
    files = r.tree()
    files_to_dto = []
    for file in files:
        files_to_dto.append(File(file.path, file.abs_path, content=file.content))

    ast = ScopeAST(files_to_dto, timeit=True)
    callgraph = ApproximateCallGraph.build(ast, timeit=True, progress_bar=True)
    print(callgraph)

    # serialize (edges are int tuples, so no need to explicitly serialize)
    nodes_to_dict = [node.to_dict() for node in callgraph.nodes]
    # deserialize callgraph from nodes and edges
    back_to_nodes = [ASTNode.from_dict(d) for d in nodes_to_dict]
    callgraph_from_nodes = ApproximateCallGraph.from_nodes_and_edges(
        back_to_nodes, callgraph.edges
    )

    # convert to an igraph for network analysis or visualization
    igraph = callgraph.to_igraph()

    # more advanced usage
    print(callgraph.indexing_queue())
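The serialization pattern above (nodes converted with to_dict()/from_dict(), edges kept as integer pairs) can be sketched with the standard library. `Node`, `dump_graph`, and `load_graph` are hypothetical stand-ins, not Scope's `ASTNode` or its API. One detail worth noting: JSON has no tuple type, so integer-pair edges come back as lists and need converting if you compare or hash them as tuples.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Node:
    # Hypothetical stand-in for ASTNode; the real class has its own fields.
    id: int
    name: str
    path: str

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> "Node":
        return cls(**d)


def dump_graph(nodes: list[Node], edges: list[tuple[int, int]]) -> str:
    # Int pairs are already JSON-friendly; they are written as 2-element arrays.
    return json.dumps({"nodes": [n.to_dict() for n in nodes], "edges": edges})


def load_graph(data: str) -> tuple[list[Node], list[tuple[int, int]]]:
    obj = json.loads(data)
    nodes = [Node.from_dict(d) for d in obj["nodes"]]
    # JSON decodes arrays as lists, so turn each edge back into a tuple.
    edges = [tuple(e) for e in obj["edges"]]
    return nodes, edges


nodes = [Node(0, "main", "app.py"), Node(1, "helper", "util.py")]
edges = [(0, 1)]
restored_nodes, restored_edges = load_graph(dump_graph(nodes, edges))
assert restored_nodes == nodes and restored_edges == edges
```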

Contribution and Development

We welcome PRs! To run tests: pytest tests/scope

Roadmap

| Category | Feature | Status |
| --- | --- | --- |
| Performance | Async multilspy support | In Progress |
| | Tree-sitter fallback for unsupported languages | In Progress |
| | LSP-free fastpath (tree-sitter only) for approximate callgraphs | In Progress |
| | Caching for common operations | In Progress |
| Core Architecture | ID-based indexing and serialization | In Progress |
| | Enhanced logging system | In Progress |
| | Pydantic schema migration | Planned |
| Features | Incremental graph upserting/updating | Planned |
| | Subgraph extraction | Planned |
| | Definition/Reference deduplication | Planned |
| Visualization | GraphViz schema support | Planned |
| | Mermaid schema support | Planned |
| Documentation | API documentation | In Progress |
| | CLI documentation | Planned |
| Tools | CLI | Planned |
| | Interactive debugging mode | Planned |
| | Standalone server | Planned |
| Research | Dataflow diagram extraction | Exploring |
| | Temporal callgraphs | Planned |
| | Filegraphs | In Progress |
| | Evals for RAG performance | In Progress |

Architecture

See scope/callgraph/README.md for details on how the LSP and tree-sitter callgraphs work under the hood.

Limitations

For LSP Callgraphs ONLY: Scope isn't yet fully optimized for indexing performance or callgraph size. Larger codebases may take anywhere from 30 seconds to a few minutes to index, but you should only need to do that infrequently. CallTree objects are also not true trees but rather a list of CallStack objects, i.e. each possible path from the root to a leaf. We also don't yet support languages such as C/C++ or Zig, nor common mobile languages such as Swift, Kotlin, or Objective-C. Let us know if you'd like support for your language, and we'll prioritize it.
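To make the "list of call stacks" shape concrete, here is a minimal sketch, assuming a plain caller-to-callees dictionary rather than Scope's own graph or CallStack types. `call_stacks` enumerates every path from a root up to a depth limit (a downstream tree); running it on the reversed graph gives the upstream ("who calls me") view. The names `call_stacks` and `reverse` are hypothetical, and skipping functions already on the current path is a choice made here to avoid looping on recursion.

```python
CallStack = list[str]


def call_stacks(graph: dict[str, list[str]], root: str, depth: int) -> list[CallStack]:
    """Enumerate every call path from root, following at most `depth` edges.

    A path ends early at a function with no further callees, or when all of its
    callees are already on the current path (recursion/cycles).
    """
    stacks: list[CallStack] = []

    def walk(path: CallStack) -> None:
        callees = [c for c in graph.get(path[-1], []) if c not in path]
        if len(path) - 1 == depth or not callees:
            stacks.append(list(path))  # record a copy of this root-to-leaf path
            return
        for callee in callees:
            path.append(callee)
            walk(path)
            path.pop()

    walk([root])
    return stacks


def reverse(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip caller->callee edges to callee->caller for upstream call stacks."""
    rev: dict[str, list[str]] = {}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


graph = {"main": ["parse", "run"], "run": ["helper"], "parse": ["helper"]}
print(call_stacks(graph, "main", depth=2))
# [['main', 'parse', 'helper'], ['main', 'run', 'helper']]
print(call_stacks(reverse(graph), "helper", depth=2))
# [['helper', 'run', 'main'], ['helper', 'parse', 'main']]
```

Note that `helper` appears in two separate stacks: a list of paths repeats shared subtrees instead of merging them, which is why this representation is not a true tree and can grow quickly with depth.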

Acknowledgements

Scope was built in part in collaboration with Microsoft Research. Adrenaline AI also contributes to Microsoft Research's library, multilspy.

@inproceedings{NEURIPS2023_662b1774,
 author = {Agrawal, Lakshya A and Kanade, Aditya and Goyal, Navin and Lahiri, Shuvendu and Rajamani, Sriram},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {32270--32298},
 publisher = {Curran Associates, Inc.},
 title = {Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/662b1774ba8845fc1fa3d1fc0177ceeb-Paper-Conference.pdf},
 volume = {36},
 year = {2023}
}

License

Apache 2.0

