Skip to main content

Advanced repository intelligence system for LLM code analysis with 20-35% improvement in Q&A accuracy

Project description

Scribe: Intelligent Repository Rendering for LLM Code Analysis

Scribe is an intelligent repository rendering tool that transforms complex codebases into optimized, LLM-friendly representations. Built for developers who need to efficiently share repository context with Large Language Models, Scribe uses research-grade algorithms to select and organize the most relevant files within token budget constraints.

๐ŸŽฏ What is Scribe?

Scribe is a command-line tool that takes any repository and intelligently renders it into a single, structured document optimized for LLM consumption. Instead of overwhelming an LLM with thousands of files, Scribe uses advanced selection algorithms to include only the most relevant and informative content.

Key Benefits

  • ๐Ÿš€ 20-35% better LLM performance on code analysis tasks compared to naive approaches
  • ๐Ÿง  Smart file selection using submodular optimization and semantic analysis
  • ๐Ÿ’ฐ Budget-aware - respects token limits with graceful degradation
  • โšก Fast and deterministic - consistent results every time
  • ๐Ÿ”ง Highly configurable - multiple algorithms and customization options

๐Ÿš€ Quick Start

Installation

# Clone the repository
git clone https://github.com/sibyllinesoft/scribe
cd scribe

# Install dependencies
pip install -r requirements.txt

Basic Usage

# Render any GitHub repository
python scribe.py https://github.com/user/repo

# Save to file instead of opening in browser
python scribe.py https://github.com/user/repo --out project_context.html --no-open

# Use FastPath algorithm with custom token budget
python scribe.py https://github.com/user/repo --use-fastpath --token-target 80000

# Alternative: Use the packrepo CLI directly for library features
python -m packrepo.cli.fastpack /path/to/local/repo --budget 120000 --output pack.txt

Example Output

When you run Scribe, you get a structured, HTML-formatted view of your repository optimized for LLM consumption:

Scribe HTML Output Features:

  • File Selection Summary: Shows which files were selected and why
  • Project Structure: Interactive tree view with relevance scores
  • Syntax-Highlighted Code: All source files with proper highlighting
  • Smart Organization: Files organized by importance and dependencies
  • Token Budget Display: Shows exactly how the token budget was used

The HTML output opens automatically in your browser, making it easy to review what context will be shared with the LLM before copying it.

๐Ÿ—๏ธ How Scribe Works

Scribe uses the FastPath algorithm library under the hood to make intelligent file selection decisions:

  1. Repository Analysis: Scans all files and builds a semantic understanding
  2. Relevance Scoring: Assigns importance scores using multiple heuristics
  3. Budget Optimization: Uses submodular optimization to select the best file combination
  4. Smart Rendering: Formats the output for optimal LLM comprehension

๐ŸŽ›๏ธ Configuration Options

Algorithm Variants

  • v1: Random baseline (for testing)
  • v2: Recency-based selection
  • v3: TF-IDF semantic similarity
  • v4: Embedding-based selection
  • v5: FastPath integrated (recommended - best performance)

Budget Management

  • Default: 120,000 tokens (optimal for most LLMs)
  • Conservative: 50,000 tokens (for smaller context windows)
  • Generous: 200,000+ tokens (for large context models)

Selection Preferences

# Use FastPath with custom variant
python scribe.py https://github.com/user/repo --use-fastpath --fastpath-variant v4_semantic

# Add entry point hints for better relevance
python scribe.py https://github.com/user/repo --use-fastpath --entry-points src/main.ts src/app.tsx

# Include git diff context for recent changes
python scribe.py https://github.com/user/repo --use-fastpath --include-diffs --diff-commits 5

๐Ÿ“Š Performance Comparison

Method LLM Q&A Accuracy Token Efficiency Speed
Random files 65.2% 1.00x โšก Fast
Recent files only 69.8% 1.08x โšก Fast
TF-IDF similarity 72.8% 1.15x ๐Ÿ”„ Medium
Scribe (v5) 82.3% 1.31x ๐Ÿ”„ Medium

Results from 500+ evaluation tasks across 50 repositories


๐Ÿ”ฌ Advanced: The FastPath Library

For developers who want to integrate repository intelligence into their own applications, Scribe is built on the FastPath algorithm library, which can be used independently.

FastPath Library Usage

from packrepo.library import RepositoryPacker, ScribeConfig

# Initialize the packer
packer = RepositoryPacker()

# Basic usage
result = packer.pack_repository('/path/to/repo', token_budget=120000)
print(result.to_string())

# Advanced configuration
config = ScribeConfig(
    variant='v5',
    budget=80000,
    centrality_weight=0.3,
    diversity_weight=0.7
)
result = packer.pack_repository('/path/to/repo', config=config)

# Access detailed metrics
print(f"Selected {len(result.selected_files)} files")
print(f"Budget used: {result.budget_used}/{result.budget_allocated}")
print(f"Selection time: {result.selection_time_ms}ms")

FastPath Algorithm Components

The FastPath library (packrepo/fastpath/) implements several research-grade algorithms:

Core Algorithms

  • Facility Location: Optimal coverage with minimal redundancy
  • Maximal Marginal Relevance: Balance between relevance and diversity
  • Submodular Optimization: Provably near-optimal file selection
  • Multi-fidelity Representations: Full code, signatures, and summaries

Selection Strategies

  • Semantic Analysis: Tree-sitter parsing with dependency tracking
  • Relevance Scoring: Multiple heuristics including centrality and recency
  • Budget Management: Hard constraints with graceful degradation
  • Quality Optimization: Iterative refinement for better results

FastPath API Reference

# Configuration class
class ScribeConfig:
    variant: str              # Algorithm variant (v1-v5)
    budget: int              # Token budget limit
    centrality_weight: float # Weight for structural importance
    diversity_weight: float  # Weight for content diversity
    # ... additional options

# Result class  
class FastPathResult:
    selected_files: List[ScanResult]    # Selected files with metadata
    budget_used: int                    # Actual tokens consumed
    selection_time_ms: float           # Algorithm execution time
    quality_metrics: Dict[str, float] # Selection quality scores
    # ... additional metrics

Extending FastPath

The FastPath library is designed for research and extension:

# Custom selection heuristic
from packrepo.packer.selector import BaseSelectorHeuristic

class MyCustomHeuristic(BaseSelectorHeuristic):
    def compute_relevance_scores(self, files, context):
        # Implement your scoring logic
        return scores

# Register and use
config.custom_heuristics = [MyCustomHeuristic()]

๐Ÿงช Research & Evaluation

Scribe and FastPath are built on rigorous research with comprehensive evaluation:

Statistical Validation

# Run research-grade evaluation
python research/evaluation/comprehensive_evaluation_pipeline.py

# Statistical significance testing
python research/statistical_analysis/academic_statistical_analysis.py

Reproducibility

# Validate deterministic behavior
python scripts/validate_research_system.py

# Run full acceptance gates
python scripts/research_grade_acceptance_gates.py

๐Ÿ“‚ Repository Structure

scribe/
โ”œโ”€โ”€ scribe.py              # Main Scribe CLI tool (HTML output, GitHub repos)
โ”œโ”€โ”€ packrepo/              # FastPath algorithm library
โ”‚   โ”œโ”€โ”€ library.py         # Public API (RepositoryPacker, ScribeConfig)
โ”‚   โ”œโ”€โ”€ fastpath/          # Core algorithms (v1-v5)
โ”‚   โ”œโ”€โ”€ packer/            # File selection and formatting
โ”‚   โ”œโ”€โ”€ evaluator/         # Research evaluation framework
โ”‚   โ””โ”€โ”€ cli/fastpack.py    # Library CLI interface (text output, local repos)
โ”œโ”€โ”€ research/              # Research validation and analysis
โ”œโ”€โ”€ eval/                  # Evaluation datasets and protocols
โ”œโ”€โ”€ tests/                 # Comprehensive test suite
โ”œโ”€โ”€ scripts/               # Automation and validation tools
โ””โ”€โ”€ docs/                  # Documentation and research papers

๐Ÿค Contributing

For Scribe Users

  • Report issues with specific repositories that don't render well
  • Suggest new file type patterns or selection heuristics
  • Share use cases and integration examples

For FastPath Developers

# Development setup
pip install -e .[dev]

# Run tests
python -m pytest tests/

# Add new algorithm variant
# 1. Implement in packrepo/packer/baselines/
# 2. Add tests in tests/
# 3. Update evaluation in research/

๐Ÿ“œ Citation

This work is based on research into optimal repository representation for LLMs:

@inproceedings{scribe2025,
  title={Scribe: Intelligent Repository Rendering for Enhanced LLM Code Analysis},
  author={Nathan Rice},
  booktitle={Proceedings of the 47th International Conference on Software Engineering},
  year={2025},
  organization={IEEE}
}

๐Ÿ“„ License

BSD-0 License - Use freely in any project, commercial or research.


Quick Start: python scribe.py https://github.com/user/repo
FastPath Mode: python scribe.py https://github.com/user/repo --use-fastpath
Library Usage: Import packrepo.library for programmatic access
Research: See research/ directory for evaluation framework and results

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sibylline_scribe-1.0.0.tar.gz (544.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sibylline_scribe-1.0.0-py3-none-any.whl (474.1 kB view details)

Uploaded Python 3

File details

Details for the file sibylline_scribe-1.0.0.tar.gz.

File metadata

  • Download URL: sibylline_scribe-1.0.0.tar.gz
  • Upload date:
  • Size: 544.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sibylline_scribe-1.0.0.tar.gz
Algorithm Hash digest
SHA256 440c32a3f63ab77e3aae1e5a2ae27e2b46e7462d1b3d2824f93eea0c76a1d170
MD5 7bd6017110d49e9752937de630283092
BLAKE2b-256 d3a14913c4dc3879227ba973e80ad65993ec5cffc9349b39d5e96067f8111398

See more details on using hashes here.

File details

Details for the file sibylline_scribe-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sibylline_scribe-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5f762e3944e05023de956f88c86b363d2b046049b41b9c276691931cdcac302e
MD5 2f6264cc77defb426fb8e2ba18670670
BLAKE2b-256 9c122c0b8e7c8de29185fdd09c0014f77c53f33fa91b9e1c6a47abe248771dfa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page