REPOMAP
A Python library and CLI tool for generating repository AST maps and analyzing code structure from GitLab and GitHub repositories. Built on ast-grep for fast, accurate parsing across 10 programming languages.
Features
- Generate repository AST trees from GitLab and GitHub repositories
- Analyze function calls, class structures, and import relationships
- Create call stacks from specific lines in source code
- Smart caching with commit hash comparison to skip unnecessary regeneration
- High-performance local cloning with multiprocessing support
- Output as Pydantic models for type-safe analysis
Supported Languages
| Language | Extensions |
|---|---|
| Python | .py, .pyx, .pyi |
| C | .c, .h |
| C++ | .cpp, .hpp, .cc, .cxx |
| Go | .go |
| Java | .java |
| PHP | .php |
| C# | .cs |
| JavaScript | .js, .jsx, .mjs |
| TypeScript | .ts |
| TSX | .tsx |
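The table above can be read as a simple extension lookup. The following is a minimal sketch of that idea using only the standard library; `EXTENSION_TO_LANGUAGE` and `detect_language` are hypothetical names for illustration and are not part of the repomap API, whose own detection logic may differ.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension-to-language mapping copied from the table above.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python", ".pyx": "Python", ".pyi": "Python",
    ".c": "C", ".h": "C",
    ".cpp": "C++", ".hpp": "C++", ".cc": "C++", ".cxx": "C++",
    ".go": "Go",
    ".java": "Java",
    ".php": "PHP",
    ".cs": "C#",
    ".js": "JavaScript", ".jsx": "JavaScript", ".mjs": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TSX",
}


def detect_language(file_path: str) -> Optional[str]:
    """Return the language for a path, or None if its extension is not listed."""
    suffix = PurePosixPath(file_path).suffix  # last extension only, e.g. ".ts"
    return EXTENSION_TO_LANGUAGE.get(suffix)


print(detect_language("src/app.tsx"))
print(detect_language("README.md"))
```

Files whose extension is not in the table return `None` here; how the library itself treats such files is not stated in this document.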
Installation
pip install repomap-suite
Usage as a Library
Note: The library uses multiprocessing internally for building repository AST trees. Wrap your code in an `if __name__ == "__main__":` block when running scripts directly.
Environment Setup
export GITLAB_TOKEN=your_gitlab_token
export GITHUB_TOKEN=your_github_token
Basic Usage
import os

from repomap import RepoTreeGenerator


def main():
    # Initialize generator
    generator = RepoTreeGenerator(
        token=os.getenv("GITHUB_TOKEN"),
        use_multiprocessing=True,  # default: True
        use_local_clone=True,  # default: True (faster via shallow clone)
    )
    # Generate repository AST tree
    tree = generator.generate_repo_tree(
        "https://github.com/user/repo",
        ref="main"  # optional: branch, tag, or commit
    )
    # Save to file
    generator.save_repo_tree(tree, "output.json")


if __name__ == "__main__":
    main()
Smart Caching (Skip Unchanged Repositories)
import os

from repomap import RepoTreeGenerator


def main():
    generator = RepoTreeGenerator(token=os.getenv("GITHUB_TOKEN"))
    repo_url = "https://github.com/user/repo"
    output_path = "repo-tree.json"

    # Check if existing tree is still current (compares commit hashes)
    if generator.is_repo_tree_up_to_date(repo_url, ref="main", output_path=output_path):
        print("Tree is up to date, skipping regeneration")
    else:
        print("Repository changed, regenerating...")
        tree = generator.generate_repo_tree(repo_url, ref="main")
        generator.save_repo_tree(tree, output_path)

    # Or use the convenience method that handles this automatically:
    tree = generator.generate_repo_tree_if_needed(
        repo_url,
        ref="main",
        output_path=output_path,
        force=False  # set True to regenerate regardless
    )


if __name__ == "__main__":
    main()
Analyzing Repository Structure
import os

from repomap import RepoTreeGenerator


def analyze_functions():
    generator = RepoTreeGenerator(token=os.getenv("GITHUB_TOKEN"))
    tree = generator.generate_repo_tree("https://github.com/user/repo")
    for file_path, file_data in tree["files"].items():
        language = file_data["language"]
        ast_data = file_data["ast"]
        # Get all functions and their calls
        for func_name, func_info in ast_data["functions"].items():
            print(f"Function: {func_name}")
            print(f"  Language: {language}")
            print(f"  Lines: {func_info['start_line']}-{func_info['end_line']}")
            print(f"  Calls: {func_info['calls']}")


if __name__ == "__main__":
    analyze_functions()
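The nested layout traversed above (`files` → `ast` → `functions` → `calls`) can also be inverted to answer "who calls this function?". The sketch below does that on a hand-written sample dictionary; `build_called_by_index` and `sample_tree` are hypothetical, and the assumption that `calls` is a list of plain function names is taken from the call stack output format shown in this README, not from the library's schema.

```python
from collections import defaultdict


def build_called_by_index(tree: dict) -> dict:
    """Map each callee name to a sorted list of (file_path, caller_name) pairs."""
    index = defaultdict(set)
    for file_path, file_data in tree["files"].items():
        for func_name, func_info in file_data["ast"]["functions"].items():
            # Assumption: "calls" holds plain function names.
            for callee in func_info.get("calls", []):
                index[callee].add((file_path, func_name))
    return {callee: sorted(callers) for callee, callers in index.items()}


# Hypothetical sample data in the shape used above (not real repomap output).
sample_tree = {
    "files": {
        "src/main.py": {
            "language": "python",
            "ast": {"functions": {
                "main": {"start_line": 1, "end_line": 5, "calls": ["load", "run"]},
                "run": {"start_line": 7, "end_line": 9, "calls": ["load"]},
            }},
        },
        "src/io.py": {
            "language": "python",
            "ast": {"functions": {
                "load": {"start_line": 1, "end_line": 3, "calls": []},
            }},
        },
    }
}

index = build_called_by_index(sample_tree)
for callee, callers in sorted(index.items()):
    print(callee, "<-", callers)
```

Matching by name alone cannot tell apart functions that share a name across files, so this index is only a rough approximation of the `called_by` data exposed by the Pydantic models.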
Working with Pydantic Models
import os

from repomap import RepoTreeGenerator
from repomap.schemas import RepoStructureModel, FileASTModel


def main():
    # Generate and save repository tree
    generator = RepoTreeGenerator(token=os.getenv("GITHUB_TOKEN"))
    tree = generator.generate_repo_tree("https://github.com/user/repo")
    generator.save_repo_tree(tree, "output.json")

    # Load as Pydantic model for type-safe access
    with open("output.json", "r") as f:
        repo_structure = RepoStructureModel.model_validate_json(f.read())

    # Check if cross-reference population succeeded
    if repo_structure.is_called_by_population_failed:
        print("Warning: called_by field population failed")

    # Access file AST data
    file_ast: FileASTModel = repo_structure.files.get("src/main.py")
    if file_ast:
        print(f"Language: {file_ast.language}")
        print(f"Imports: {file_ast.ast.imports}")
        # Iterate functions
        for func_name, func_details in file_ast.ast.functions.items():
            print(f"Function: {func_name}")
            print(f"  Lines: {func_details.start_line}-{func_details.end_line}")
            print(f"  Is method: {func_details.is_method}")
            # See where this function is called from
            for call_site in func_details.called_by:
                print(f"  Called by: {call_site.caller_function_name} "
                      f"in {call_site.file_path}:{call_site.line_number}")


if __name__ == "__main__":
    main()
Working with Call Stacks
import os

from repomap import CallStackGenerator


def main():
    generator = CallStackGenerator(token=os.getenv("GITHUB_TOKEN"))

    # Get function content by line number
    file_url = "https://github.com/user/repo/blob/main/src/utils.py"
    content = generator.get_function_content_by_line(
        file_url=file_url,
        line_number=42
    )
    print(content)

    # Get function content by name (searches entire repo tree)
    # Returns dict mapping "file_path:class_or_global" to function content
    contents = generator.get_function_content_by_name(
        ast_tree="repo-tree.json",  # path to JSON or dict
        function_name="my_function",
        file_path="src/utils.py"  # optional: limit search scope
    )
    for location, code in contents.items():
        print(f"Found in {location}:")
        print(code)

    # Generate call stack for a specific line
    call_stack = generator.generate_call_stack(file_url, line_number=42)
    for call in call_stack:
        print(f"Function: {call['function']}")
        print(f"  File: {call['file']}")
        print(f"  Line: {call['line']}")
        print(f"  Calls: {call['calls']}")


if __name__ == "__main__":
    main()
CLI Usage
Generate Repository AST Tree
# Basic usage
repomap https://github.com/user/repo --repo-tree -o output.json
# With specific branch/tag/commit
repomap https://github.com/user/repo --repo-tree --ref main -o output.json
# Use API mode instead of local clone (slower but less disk space)
repomap https://github.com/user/repo --repo-tree --no-local-clone -o output.json
The CLI automatically checks if an existing tree is up to date and skips regeneration when the commit hash matches.
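The commit hash comparison described above can be pictured roughly as follows. This is only a sketch under stated assumptions: `cached_tree_is_current` is a hypothetical helper, and the `commit_hash` key is an assumed field name, since this README does not document where the saved tree records its commit.

```python
import json
from pathlib import Path


def cached_tree_is_current(output_path: str, current_commit: str,
                           hash_key: str = "commit_hash") -> bool:
    """Return True if the saved tree records the same commit as current_commit.

    hash_key is a hypothetical field name; the real file may store it elsewhere.
    """
    path = Path(output_path)
    if not current_commit or not path.is_file():
        return False  # nothing to compare against, so treat as stale
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # an unreadable or corrupt cache counts as stale
    return isinstance(saved, dict) and saved.get(hash_key) == current_commit
```

Treating a missing or unreadable file as stale keeps the check conservative: in doubt, the tree is regenerated. In real code, prefer the library's own `is_repo_tree_up_to_date` or `generate_repo_tree_if_needed` shown earlier.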
Print Function by Line
repomap --print-function \
--target-file https://github.com/user/repo/blob/main/src/utils.py \
--line 42
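The `--target-file` value is a web URL to a file, which bundles the repository, the ref, and the file path into one string. The sketch below shows one way such a URL could be split apart with the standard library; `parse_blob_url` and `FileLocation` are hypothetical names, and this is not necessarily how repomap parses the URL. It assumes the ref is a single path segment, so a branch name containing `/` would be split incorrectly.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class FileLocation(NamedTuple):
    host: str
    repo: str
    ref: str
    path: str


def parse_blob_url(file_url: str) -> FileLocation:
    """Split a GitHub or GitLab file URL into host, repo, ref, and file path."""
    parsed = urlparse(file_url)
    parts = parsed.path.strip("/").split("/")
    if "blob" not in parts:
        raise ValueError(f"not a file URL: {file_url}")
    blob_at = parts.index("blob")
    repo_parts = parts[:blob_at]
    # GitLab URLs place a "-" segment before "blob"; drop it if present.
    if repo_parts and repo_parts[-1] == "-":
        repo_parts = repo_parts[:-1]
    rest = parts[blob_at + 1:]
    if not repo_parts or len(rest) < 2:
        raise ValueError(f"incomplete file URL: {file_url}")
    # Assumption: the ref is exactly one segment (e.g. "main" or a tag).
    return FileLocation(parsed.netloc, "/".join(repo_parts), rest[0], "/".join(rest[1:]))


print(parse_blob_url("https://github.com/user/repo/blob/main/src/utils.py"))
```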
Print Function by Name
# Search entire repository
repomap --print-function-by-name \
--name my_function \
--repo-tree-path repo-tree.json
# Search in specific file
repomap --print-function-by-name \
--name my_function \
--repo-tree-path repo-tree.json \
--file-path src/utils.py
Generate Call Stack
repomap --call-stack \
--target-file https://github.com/user/repo/blob/main/src/main.py \
--line 42 \
--output-stack call-stack.json
Output format:
[
  {
    "function": "main",
    "file": "https://github.com/user/repo/blob/main/src/main.py",
    "line": 42,
    "calls": ["helper1", "helper2"]
  }
]
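A file in this format can be consumed with nothing more than the `json` module. The following sketch turns each entry into a one-line summary; `summarize_call_stack` is a hypothetical helper, and the sample string simply repeats the example above.

```python
import json


def summarize_call_stack(raw_json: str) -> list:
    """Return one 'function (file:line) -> callees' line per call stack entry."""
    lines = []
    for frame in json.loads(raw_json):
        calls = ", ".join(frame["calls"]) or "(none)"
        lines.append(f'{frame["function"]} ({frame["file"]}:{frame["line"]}) -> {calls}')
    return lines


# Same data as the output format example above.
example = """
[
  {
    "function": "main",
    "file": "https://github.com/user/repo/blob/main/src/main.py",
    "line": 42,
    "calls": ["helper1", "helper2"]
  }
]
"""

for line in summarize_call_stack(example):
    print(line)
```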
CLI Options Reference
| Option | Description |
|---|---|
| -t, --token | GitLab/GitHub access token (overrides env var) |
| -o, --output | Output file path (default: repomap.json) |
| -v, --verbose | Enable debug logging |
| --version | Show version |
| --repo-tree | Generate repository AST tree |
| --ref | Git reference (branch, tag, commit) |
| --no-local-clone | Use API instead of local clone |
| --call-stack | Generate call stack mode |
| --print-function | Print function at line |
| --print-function-by-name | Print function by name |
| --target-file | URL to target file |
| --line | Line number |
| --structure-file | Path to repo structure JSON |
| --output-stack | Output path for call stack |
| --name | Function name to search |
| --repo-tree-path | Path to repo-tree JSON |
| --file-path | Limit search to specific file |
Configuration
Disabling Multiprocessing
For debugging or environments where multiprocessing causes issues:
generator = RepoTreeGenerator(use_multiprocessing=False)
Using API Mode (No Local Clone)
For environments with limited disk space or when you need to avoid git operations:
generator = RepoTreeGenerator(use_local_clone=False)
Note: API mode is slower but doesn't require disk space for cloning.
Development
Setup
git clone https://github.com/StanleyOneG/repomap.git
cd repomap
poetry install
Testing
poetry run pytest
Code Quality
poetry run black repomap/
poetry run isort repomap/
poetry run flake8 repomap/
poetry run mypy repomap/
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
File details
Details for the file repomap_suite-0.2.0.tar.gz.
File metadata
- Download URL: repomap_suite-0.2.0.tar.gz
- Upload date:
- Size: 33.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74e621c95810eff652a20394a71c48c7f3e0f03186716d5574368fc85155b294 |
| MD5 | 83d5ce2a813f1fc6702f59206177c133 |
| BLAKE2b-256 | 82b5133b8d4f376b4deab715f87cf09c14c8c18207ece861113cb673fca2f0b0 |
File details
Details for the file repomap_suite-0.2.0-py3-none-any.whl.
File metadata
- Download URL: repomap_suite-0.2.0-py3-none-any.whl
- Upload date:
- Size: 36.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad107ca2e85c051394806c6220f6df7fb06165bf0cf20338ffbe289b911f9e4e |
| MD5 | 46c538f47741dfb0b80720f7f8e60e6c |
| BLAKE2b-256 | b91fe7f76e0090fbf863bb86aaa8086ccbe4d417365fbd74ede582c4eee2f80b |