Repository maps for LLMs

RepoScape

RepoScape is a Python library for mapping and analyzing repository structure, with a focus on code dependencies and the relative importance of components. It parses code files, builds a graph representation, and ranks components with a choice of scoring algorithms.

Installation

pip install reposcape

Requires Python 3.12 or higher.

Quick Start

from reposcape import RepoMapper, DetailLevel

# Create mapper with default settings
mapper = RepoMapper()

# Generate overview of entire repository
overview = mapper.create_overview(
    repo_path="path/to/repo",
    detail=DetailLevel.SIGNATURES,
    token_limit=2000  # Optional token limit for output
)

# Generate focused view of specific files
focused = mapper.create_focused_view(
    files=["main.py", "utils.py"],
    repo_path="path/to/repo",
    detail=DetailLevel.DOCSTRINGS
)

Core Components

RepoMapper

The main entry point for repository analysis. Configurable with custom analyzers, scorers, and serializers.

class RepoMapper:
    def __init__(
        self,
        *,
        analyzers: Sequence[CodeAnalyzer] | None = None,
        scorer: GraphScorer | None = None,
        serializer: CodeSerializer | None = None,
    ): ...

    def create_overview(
        self,
        repo_path: str | PathLike[str],
        *,
        token_limit: int | None = None,
        detail: DetailLevel = DetailLevel.SIGNATURES,
        exclude_patterns: list[str] | None = None,
    ) -> str: ...

    def create_focused_view(
        self,
        files: Sequence[str | PathLike[str]],
        repo_path: str | PathLike[str],
        *,
        token_limit: int | None = None,
        detail: DetailLevel = DetailLevel.SIGNATURES,
        exclude_patterns: list[str] | None = None,
    ) -> str: ...

Detail Levels

Control how much information is included in the output:

class DetailLevel(Enum):
    STRUCTURE   # Just names and hierarchy
    SIGNATURES  # Include function/class signatures
    DOCSTRINGS  # Include signatures + docstrings
    FULL_CODE   # Include complete implementations

Code Analysis

RepoScape includes analyzers for different file types:

PythonAstAnalyzer

Analyzes Python files using AST parsing:

  • Extracts classes, functions, methods, variables
  • Tracks references between symbols
  • Collects docstrings and signatures

analyzer = PythonAstAnalyzer()
nodes = analyzer.analyze_file("main.py")

TextAnalyzer

Basic analyzer for text files:

  • Handles .txt, .md, .rst files
  • Extracts sections from markdown files
  • Preserves file content and first paragraph as docstring

Importance Scoring

RepoScape offers different algorithms for calculating code importance:

ReferenceScorer

Simple reference-based scoring that considers:

  • Number of incoming references (highest weight)
  • Number of outgoing references (medium weight)
  • Being referenced by important files (high boost)
  • Distance from important files (decreasing boost)

from reposcape.importance import ReferenceScorer

scorer = ReferenceScorer(
    ref_weight=1.0,
    outref_weight=0.5,
    important_ref_boost=2.0,
    distance_decay=0.5,
)
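
The README does not spell out how these four weights are combined, so the following is only one plausible reading, written as a standalone sketch. The function `reference_scores` and its formula are assumptions for illustration: incoming and outgoing reference counts are weighted and summed, and nodes reachable from an important file receive a boost that shrinks by `distance_decay` with each extra hop.

```python
from collections import deque


def reference_scores(
    edges: dict[str, list[str]],
    important: tuple[str, ...] = (),
    *,
    ref_weight: float = 1.0,
    outref_weight: float = 0.5,
    important_ref_boost: float = 2.0,
    distance_decay: float = 0.5,
) -> dict[str, float]:
    """Score nodes of a reference graph (hypothetical formula).

    `edges` maps each node to the nodes it references.
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}

    # Base score: weighted incoming plus weighted outgoing reference counts.
    incoming = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            incoming[target] += 1
    scores = {
        node: ref_weight * incoming[node] + outref_weight * len(edges.get(node, ()))
        for node in nodes
    }

    # Breadth-first search from the important nodes gives each reachable
    # node its hop distance along reference edges.
    distance = {node: 0 for node in important if node in nodes}
    queue = deque(distance)
    while queue:
        current = queue.popleft()
        for target in edges.get(current, ()):
            if target not in distance:
                distance[target] = distance[current] + 1
                queue.append(target)

    # Directly referenced nodes get the full boost; each further hop decays it.
    for node, hops in distance.items():
        if hops > 0:
            scores[node] += important_ref_boost * distance_decay ** (hops - 1)
    return scores


edges = {
    "main.py": ["core.py", "utils.py"],
    "core.py": ["utils.py"],
    "utils.py": ["helpers.py"],
    "helpers.py": [],
}
scores = reference_scores(edges, important=("main.py",))
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score}")
```

In this toy graph `utils.py` ranks highest because it is referenced twice and sits one hop from the important file, while `helpers.py` only receives the decayed boost.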

PageRankScorer

Uses the PageRank algorithm to score nodes based on the graph structure:

  • Considers connection patterns
  • Handles cycles in dependencies
  • Supports personalization for focused analysis

from reposcape.importance import PageRankScorer

scorer = PageRankScorer()

Output Serialization

Multiple serializers are available for different output formats:

MarkdownSerializer

Generates detailed markdown with:

  • Hierarchical structure using headers
  • Code blocks for signatures/implementations
  • Emojis for different node types
  • Optional details based on importance scores

CompactSerializer

Produces a compact, indented format:

  • Single line per node
  • Indentation shows hierarchy
  • Abbreviated signatures
  • Good for quick overviews

TreeSerializer

ASCII tree-style output:

  • Uses box-drawing characters
  • Shows clear parent-child relationships
  • Similar to tree command output
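
The tree-style layout can be illustrated with a short recursive renderer. This is a sketch of the general technique rather than TreeSerializer's code; `render_tree` and the nested-dict input format are hypothetical.

```python
def render_tree(tree: dict[str, dict], prefix: str = "") -> list[str]:
    """Render a nested dict (name -> children) as box-drawing tree lines."""
    lines: list[str] = []
    entries = list(tree.items())
    for index, (name, children) in enumerate(entries):
        last = index == len(entries) - 1
        # The last sibling closes the branch; earlier siblings keep it open.
        lines.append(prefix + ("└── " if last else "├── ") + name)
        child_prefix = prefix + ("    " if last else "│   ")
        lines.extend(render_tree(children, child_prefix))
    return lines


tree = {
    "src": {
        "core.py": {"Engine": {}, "run": {}},
        "utils.py": {},
    },
    "README.md": {},
}
lines = render_tree(tree)
print("\n".join(lines))
# ├── src
# │   ├── core.py
# │   │   ├── Engine
# │   │   └── run
# │   └── utils.py
# └── README.md
```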

Example usage:

from reposcape import RepoMapper
from reposcape.serializers import MarkdownSerializer, CompactSerializer, TreeSerializer

# Create mapper with specific serializer
mapper = RepoMapper(serializer=TreeSerializer())

Advanced Usage

Custom Analyzers

Implement CodeAnalyzer for custom file analysis:

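import os
from os import PathLike

class CustomAnalyzer(CodeAnalyzer):
    def can_handle(self, path: str | PathLike[str]) -> bool:
        # os.fspath() accepts both str and PathLike, so Path objects work too
        return os.fspath(path).endswith(".custom")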

    def analyze_file(
        self,
        path: str | PathLike[str],
        content: str | None = None
    ) -> list[CodeNode]: ...

Focused Analysis

Analyze specific files and their relationships:

mapper = RepoMapper()

# Focus on specific files
focused_view = mapper.create_focused_view(
    files=["src/core.py", "src/utils.py"],
    repo_path=".",
    detail=DetailLevel.DOCSTRINGS,
    exclude_patterns=["**/test_*.py", "**/__pycache__/*"]
)

Token Limits

Control output size for large repositories:

# Limit output to approximately 2000 tokens
overview = mapper.create_overview(
    repo_path=".",
    token_limit=2000,
    detail=DetailLevel.SIGNATURES
)
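
How RepoScape enforces the limit is not described here. One common approach, shown purely as an assumption, is to estimate the token cost of each entry and greedily keep the most important entries that still fit. Both `estimate_tokens` (a rough four-characters-per-token heuristic) and `select_within_budget` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic, assumed for this sketch: about four characters per token."""
    return max(1, len(text) // 4)


def select_within_budget(items: list[tuple[float, str]], token_limit: int) -> list[str]:
    """Keep the most important (importance, text) entries that fit the budget.

    Entries are considered from most to least important, then returned
    in their original order so the output keeps its structure.
    """
    ranked = sorted(range(len(items)), key=lambda i: items[i][0], reverse=True)
    kept: set[int] = set()
    used = 0
    for i in ranked:
        cost = estimate_tokens(items[i][1])
        if used + cost <= token_limit:
            kept.add(i)
            used += cost
    return [items[i][1] for i in sorted(kept)]


items = [
    (0.9, "def run(config: Config) -> Result: ..."),
    (0.2, "def _debug_dump(state: dict) -> None: ..."),
    (0.6, "class Engine: ..."),
]
selected = select_within_budget(items, token_limit=15)
print(selected)
```

With a budget of 15 estimated tokens the low-importance debug helper is dropped while the two higher-scoring entries are kept in their original order.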

Models

CodeNode

Immutable representation of code elements:

@dataclass(frozen=True)
class CodeNode:
    name: str
    node_type: NodeType
    path: str
    content: str | None = None
    docstring: str | None = None
    signature: str | None = None
    children: Mapping[str, CodeNode] | None = None
    references_to: Sequence[Reference] | None = None
    referenced_by: Sequence[Reference] | None = None
    importance: float = 0.0

NodeType

Available node types:

  • DIRECTORY
  • FILE
  • CLASS
  • FUNCTION
  • METHOD
  • VARIABLE

Reference

Tracks symbol references:

@dataclass(frozen=True)
class Reference:
    name: str
    path: str
    line: int
    column: int
