A call graph generator designed for codebase RAG
Scope
A call graph generator designed for codebase RAG. Uses a combination of LSP and AST parsing to achieve very high accuracy, even for dynamic languages.
- Supports 10+ popular languages
  - JavaScript
  - Python
  - TypeScript
  - Rust
  - C#
  - Java
  - Go
  - Ruby
  - Dart
  - C
  - C++
  - PHP
- Can be used programmatically or via the command-line
- Provides easy retrieval methods (e.g. definitions(), references(), calltree(), etc.)
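To illustrate the AST half of that combination, here is a minimal sketch of a name-based approximate call graph. It is not Scope's implementation: it uses Python's standard `ast` module as a stand-in parser, handles a single Python source string, and the `approximate_calls` helper and `SAMPLE` source are hypothetical names introduced for this example.

```python
import ast


def approximate_calls(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the top-level functions it calls.

    Calls are matched purely by name, so this is an approximation:
    it cannot tell apart two functions that share a name, and it
    ignores method calls such as obj.method().
    """
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    # Every definition becomes a node, even if it calls nothing.
    graph: dict[str, set[str]] = {name: set() for name in defined}
    for func in functions:
        for node in ast.walk(func):
            # Only plain calls like helper(); attribute calls are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    graph[func.name].add(node.func.id)
    return graph


SAMPLE = """
def helper():
    return 1

def main():
    print(helper())
"""

calls = approximate_calls(SAMPLE)
```

Name matching of this kind is cheap but ambiguous, which is roughly the gap that language-server lookups are meant to close.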
Install
> pip install codescope
If you want to use the LSP Callgraph, you'll need to have each language you'd like to parse installed on your machine and in your PATH.
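A quick way to check the PATH requirement before building is sketched below. The executable names in `TOOLCHAIN_EXECUTABLES` are assumptions about a typical setup, not a list taken from Scope, and `missing_toolchains` is a hypothetical helper; only `shutil.which` from the standard library is used.

```python
import shutil

# Hypothetical mapping from language to an executable commonly shipped with
# its toolchain; adjust it to whatever your setup actually needs.
TOOLCHAIN_EXECUTABLES = {
    "Python": "python3",
    "JavaScript": "node",
    "Go": "go",
    "Rust": "cargo",
}


def missing_toolchains(executables: dict[str, str]) -> list[str]:
    """Return the languages whose executable cannot be found on PATH."""
    return [
        language
        for language, executable in executables.items()
        if shutil.which(executable) is None
    ]


# Languages from the mapping above that are not available on this machine.
missing = missing_toolchains(TOOLCHAIN_EXECUTABLES)
```

An empty `missing` list only means the executables were found; it does not guarantee that the matching language server will start.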
LSP Callgraph Usage
from scope import CallGraph
from withrepo import repo
from scope.enums.CallTreeType import CallTreeType
# Build a call graph from a directory (multi-language codebases work too)
cg = CallGraph.build("./my_codebase")
# or build from a repository fetched with withrepo
with repo("shobrook", "openlimit") as r:
    cg = CallGraph.build(r.path)
# Save/load call graphs
json_str = cg.json() # serialize to JSON
cg = CallGraph.from_json(json_str) # load from JSON
# Get all file paths in the call graph
paths = cg.paths()
# Filter paths with a callback
python_files = cg.paths(lambda path: path.endswith('.py'))
# Get all function/class definitions
definitions = cg.definitions()
# Filter definitions with a callback
class_defs = cg.definitions(lambda path, defn: defn.type == 'class')
# Get all references (function calls)
references = cg.references()
# Filter references with callbacks
filtered_refs = cg.references(
    cb_defn=lambda path, defn: defn.name == 'main',
    cb_ref=lambda path, ref: 'test' not in ref.path
)
# Generate call trees (who calls what)
def_obj = definitions[0] # get a specific definition
# Get downstream calls (what this function calls)
downstream = cg.calltree(def_obj, CallTreeType.DOWN, depth=2)
# Get upstream calls (who calls this function)
upstream = cg.calltree(def_obj, CallTreeType.UP, depth=2)
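The difference between the two directions can be sketched on a toy graph. The snippet below does not use Scope; `CALLS`, `downstream`, and `upstream` are hypothetical names, and treating `depth` as a number of call hops is an assumption made for illustration.

```python
# Toy call graph: caller -> list of callees (hypothetical data).
CALLS = {
    "main": ["load", "run"],
    "run": ["step"],
    "step": ["log"],
    "load": ["log"],
    "log": [],
}


def downstream(graph: dict[str, list[str]], start: str, depth: int) -> set[str]:
    """Collect everything reachable from `start` within `depth` call hops."""
    seen: set[str] = set()
    frontier = {start}
    for _ in range(depth):
        # Expand one hop, dropping names that were already visited.
        frontier = {c for fn in frontier for c in graph.get(fn, [])} - seen
        seen |= frontier
    return seen


def upstream(graph: dict[str, list[str]], start: str, depth: int) -> set[str]:
    """Collect every caller that reaches `start` within `depth` call hops."""
    # Reverse the edges, then reuse the downstream traversal.
    reversed_graph: dict[str, list[str]] = {}
    for caller, callees in graph.items():
        for callee in callees:
            reversed_graph.setdefault(callee, []).append(caller)
    return downstream(reversed_graph, start, depth)


down = downstream(CALLS, "main", depth=2)
up = upstream(CALLS, "log", depth=2)
```

Here `down` holds what `main` reaches in two hops, and `up` holds the callers that reach `log` in two hops.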
Approximate Callgraph Usage
from scope import ScopeAST, ASTNode, ApproximateCallGraph
from withrepo import repo
# Build an approximate callgraph from a repo
with repo("shobrook", "openlimit") as r:
    files = r.tree()
    files_to_dto = []
    # provide your own File class (just path, abs_path, content)
    for file in files:
        files_to_dto.append(File(file.path, file.abs_path, content=file.content))
ast = ScopeAST(files_to_dto, timeit=True)
callgraph = ApproximateCallGraph.build(ast, timeit=True, progress_bar=True)
print(callgraph)
# serialize (edges are int tuples, so no need to explicitly serialize)
nodes_to_dict = [node.to_dict() for node in callgraph.nodes]
# deserialize callgraph from nodes and edges
back_to_nodes = [ASTNode.from_dict(d) for d in nodes_to_dict]
callgraph_from_nodes = ApproximateCallGraph.from_nodes_and_edges(
    back_to_nodes, callgraph.edges
)
# convert to an igraph for network analysis or visualization
igraph = callgraph.to_igraph()
# more advanced usage
print(callgraph.indexing_queue())
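The `File` class used above is left to the caller. A minimal sketch follows, assuming only the three attributes named in the comment (`path`, `abs_path`, `content`) are needed. Whether `ScopeAST` expects anything beyond those attributes is not confirmed here, and the `files_from_directory` helper is hypothetical: it reads a local directory instead of going through `withrepo`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class File:
    path: str      # assumed to be relative to the repository root
    abs_path: str  # absolute path on disk
    content: str   # full text of the file


def files_from_directory(root: str, suffix: str = ".py") -> list[File]:
    """Read every file under `root` ending in `suffix` into a File object."""
    root_path = Path(root).resolve()
    files: list[File] = []
    for p in sorted(root_path.rglob(f"*{suffix}")):
        if p.is_file():
            files.append(
                File(
                    p.relative_to(root_path).as_posix(),
                    str(p),
                    content=p.read_text(encoding="utf-8", errors="replace"),
                )
            )
    return files
```

The positional order (`path`, `abs_path`) and the `content` keyword mirror the way `File` is constructed in the usage example.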
Contribution and Development
We welcome PRs! To run tests: pytest tests/scope
Roadmap
Category | Feature | Status |
---|---|---|
Performance | Async multilspy support | In Progress |
Performance | Tree-sitter fallback for unsupported languages | In Progress |
Performance | LSP-free fastpath (only tree-sitter) for approximate callgraphs | In Progress |
Performance | Caching for common operations | In Progress |
Core Architecture | ID-based indexing and serialization | In Progress |
Core Architecture | Enhanced logging system | In Progress |
Core Architecture | Pydantic schema migration | Planned |
Features | Incremental graph upserting/updating | Planned |
Features | Subgraph extraction | Planned |
Features | Definition/Reference deduplication | Planned |
Visualization | GraphViz schema support | Planned |
Visualization | Mermaid schema support | Planned |
Documentation | API documentation | In Progress |
Documentation | CLI documentation | Planned |
Tools | CLI | Planned |
Tools | Interactive debugging mode | Planned |
Tools | Standalone server | Planned |
Research | Dataflow diagram extraction | Exploring |
Research | Temporal callgraphs | Planned |
Research | Filegraphs | In Progress |
Research | Evals for RAG Performance | In Progress |
Architecture
Please see the scope/callgraph/README.md for more information about how both the LSP and tree-sitter callgraphs work under the hood.
Limitations
For LSP Callgraphs ONLY: Scope isn't yet fully optimized for indexing performance or callgraph size. Larger codebases may take 30 seconds to a few minutes to index, but you'll need to do that very infrequently. CallTree objects are also not true trees, but rather a list of CallStack objects, i.e. each possible path from the root to a leaf. We also don't yet support languages like C/C++ or Zig, nor common mobile languages like Swift, Kotlin, or Objective-C. Let us know if you'd like support for your language, and we'll prioritize it.
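To make the "list of call stacks" shape concrete, here is a small sketch that enumerates every root-to-leaf path in a toy graph. It does not use Scope's CallTree or CallStack classes; `call_stacks` and the sample `CALLS` data are hypothetical, and skipping callees already on the current path is just one simple way to avoid looping on recursion.

```python
def call_stacks(graph: dict[str, list[str]], root: str) -> list[list[str]]:
    """Return each path from `root` to a leaf as its own list of names."""
    stacks: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        path = path + [node]
        # Skip callees already on the path so recursion cannot loop forever.
        callees = [c for c in graph.get(node, []) if c not in path]
        if not callees:
            stacks.append(path)
            return
        for callee in callees:
            walk(callee, path)

    walk(root, [])
    return stacks


# Toy call graph: caller -> list of callees (hypothetical data).
CALLS = {"main": ["load", "run"], "run": ["step"], "step": ["log"], "load": ["log"]}
stacks = call_stacks(CALLS, "main")
```

Because `log` is reached along two different paths, it appears in two separate stacks rather than as a single shared node, which is what distinguishes this shape from a true tree.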
Acknowledgements
Scope was built in part in collaboration with Microsoft Research. Adrenaline AI also contributes to their library, multilspy.
@inproceedings{NEURIPS2023_662b1774,
author = {Agrawal, Lakshya A and Kanade, Aditya and Goyal, Navin and Lahiri, Shuvendu and Rajamani, Sriram},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
pages = {32270--32298},
publisher = {Curran Associates, Inc.},
title = {Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/662b1774ba8845fc1fa3d1fc0177ceeb-Paper-Conference.pdf},
volume = {36},
year = {2023}
}
License
Apache 2.0