Advanced repository intelligence system for LLM code analysis with 20-35% improvement in Q&A accuracy
Project description
Scribe: Intelligent Repository Rendering for LLM Code Analysis
Scribe is an intelligent repository rendering tool that transforms complex codebases into optimized, LLM-friendly representations. Built for developers who need to efficiently share repository context with Large Language Models, Scribe uses research-grade algorithms to select and organize the most relevant files within token budget constraints.
๐ฏ What is Scribe?
Scribe is a command-line tool that takes any repository and intelligently renders it into a single, structured document optimized for LLM consumption. Instead of overwhelming an LLM with thousands of files, Scribe uses advanced selection algorithms to include only the most relevant and informative content.
Key Benefits
- ๐ 20-35% better LLM performance on code analysis tasks compared to naive approaches
- ๐ง Smart file selection using submodular optimization and semantic analysis
- ๐ฐ Budget-aware - respects token limits with graceful degradation
- โก Fast and deterministic - consistent results every time
- ๐ง Highly configurable - multiple algorithms and customization options
๐ Quick Start
Installation
# Clone the repository
git clone https://github.com/sibyllinesoft/scribe
cd scribe
# Install dependencies
pip install -r requirements.txt
Basic Usage
# Render any GitHub repository
python scribe.py https://github.com/user/repo
# Save to file instead of opening in browser
python scribe.py https://github.com/user/repo --out project_context.html --no-open
# Use FastPath algorithm with custom token budget
python scribe.py https://github.com/user/repo --use-fastpath --token-target 80000
# Alternative: Use the packrepo CLI directly for library features
python -m packrepo.cli.fastpack /path/to/local/repo --budget 120000 --output pack.txt
Example Output
When you run Scribe, you get a structured, HTML-formatted view of your repository optimized for LLM consumption:
Scribe HTML Output Features:
- File Selection Summary: Shows which files were selected and why
- Project Structure: Interactive tree view with relevance scores
- Syntax-Highlighted Code: All source files with proper highlighting
- Smart Organization: Files organized by importance and dependencies
- Token Budget Display: Shows exactly how the token budget was used
The HTML output opens automatically in your browser, making it easy to review what context will be shared with the LLM before copying it.
๐๏ธ How Scribe Works
Scribe uses the FastPath algorithm library under the hood to make intelligent file selection decisions:
- Repository Analysis: Scans all files and builds a semantic understanding
- Relevance Scoring: Assigns importance scores using multiple heuristics
- Budget Optimization: Uses submodular optimization to select the best file combination
- Smart Rendering: Formats the output for optimal LLM comprehension
๐๏ธ Configuration Options
Algorithm Variants
- v1: Random baseline (for testing)
- v2: Recency-based selection
- v3: TF-IDF semantic similarity
- v4: Embedding-based selection
- v5: FastPath integrated (recommended - best performance)
Budget Management
- Default: 120,000 tokens (optimal for most LLMs)
- Conservative: 50,000 tokens (for smaller context windows)
- Generous: 200,000+ tokens (for large context models)
Selection Preferences
# Use FastPath with custom variant
python scribe.py https://github.com/user/repo --use-fastpath --fastpath-variant v4_semantic
# Add entry point hints for better relevance
python scribe.py https://github.com/user/repo --use-fastpath --entry-points src/main.ts src/app.tsx
# Include git diff context for recent changes
python scribe.py https://github.com/user/repo --use-fastpath --include-diffs --diff-commits 5
๐ Performance Comparison
| Method | LLM Q&A Accuracy | Token Efficiency | Speed |
|---|---|---|---|
| Random files | 65.2% | 1.00x | โก Fast |
| Recent files only | 69.8% | 1.08x | โก Fast |
| TF-IDF similarity | 72.8% | 1.15x | ๐ Medium |
| Scribe (v5) | 82.3% | 1.31x | ๐ Medium |
Results from 500+ evaluation tasks across 50 repositories
๐ฌ Advanced: The FastPath Library
For developers who want to integrate repository intelligence into their own applications, Scribe is built on the FastPath algorithm library, which can be used independently.
FastPath Library Usage
from packrepo.library import RepositoryPacker, ScribeConfig
# Initialize the packer
packer = RepositoryPacker()
# Basic usage
result = packer.pack_repository('/path/to/repo', token_budget=120000)
print(result.to_string())
# Advanced configuration
config = ScribeConfig(
variant='v5',
budget=80000,
centrality_weight=0.3,
diversity_weight=0.7
)
result = packer.pack_repository('/path/to/repo', config=config)
# Access detailed metrics
print(f"Selected {len(result.selected_files)} files")
print(f"Budget used: {result.budget_used}/{result.budget_allocated}")
print(f"Selection time: {result.selection_time_ms}ms")
FastPath Algorithm Components
The FastPath library (packrepo/fastpath/) implements several research-grade algorithms:
Core Algorithms
- Facility Location: Optimal coverage with minimal redundancy
- Maximal Marginal Relevance: Balance between relevance and diversity
- Submodular Optimization: Provably near-optimal file selection
- Multi-fidelity Representations: Full code, signatures, and summaries
Selection Strategies
- Semantic Analysis: Tree-sitter parsing with dependency tracking
- Relevance Scoring: Multiple heuristics including centrality and recency
- Budget Management: Hard constraints with graceful degradation
- Quality Optimization: Iterative refinement for better results
FastPath API Reference
# Configuration class
class ScribeConfig:
variant: str # Algorithm variant (v1-v5)
budget: int # Token budget limit
centrality_weight: float # Weight for structural importance
diversity_weight: float # Weight for content diversity
# ... additional options
# Result class
class FastPathResult:
selected_files: List[ScanResult] # Selected files with metadata
budget_used: int # Actual tokens consumed
selection_time_ms: float # Algorithm execution time
quality_metrics: Dict[str, float] # Selection quality scores
# ... additional metrics
Extending FastPath
The FastPath library is designed for research and extension:
# Custom selection heuristic
from packrepo.packer.selector import BaseSelectorHeuristic
class MyCustomHeuristic(BaseSelectorHeuristic):
def compute_relevance_scores(self, files, context):
# Implement your scoring logic
return scores
# Register and use
config.custom_heuristics = [MyCustomHeuristic()]
๐งช Research & Evaluation
Scribe and FastPath are built on rigorous research with comprehensive evaluation:
Statistical Validation
# Run research-grade evaluation
python research/evaluation/comprehensive_evaluation_pipeline.py
# Statistical significance testing
python research/statistical_analysis/academic_statistical_analysis.py
Reproducibility
# Validate deterministic behavior
python scripts/validate_research_system.py
# Run full acceptance gates
python scripts/research_grade_acceptance_gates.py
๐ Repository Structure
scribe/
โโโ scribe.py # Main Scribe CLI tool (HTML output, GitHub repos)
โโโ packrepo/ # FastPath algorithm library
โ โโโ library.py # Public API (RepositoryPacker, ScribeConfig)
โ โโโ fastpath/ # Core algorithms (v1-v5)
โ โโโ packer/ # File selection and formatting
โ โโโ evaluator/ # Research evaluation framework
โ โโโ cli/fastpack.py # Library CLI interface (text output, local repos)
โโโ research/ # Research validation and analysis
โโโ eval/ # Evaluation datasets and protocols
โโโ tests/ # Comprehensive test suite
โโโ scripts/ # Automation and validation tools
โโโ docs/ # Documentation and research papers
๐ค Contributing
For Scribe Users
- Report issues with specific repositories that don't render well
- Suggest new file type patterns or selection heuristics
- Share use cases and integration examples
For FastPath Developers
# Development setup
pip install -e .[dev]
# Run tests
python -m pytest tests/
# Add new algorithm variant
# 1. Implement in packrepo/packer/baselines/
# 2. Add tests in tests/
# 3. Update evaluation in research/
๐ Citation
This work is based on research into optimal repository representation for LLMs:
@inproceedings{scribe2025,
title={Scribe: Intelligent Repository Rendering for Enhanced LLM Code Analysis},
author={Nathan Rice},
booktitle={Proceedings of the 47th International Conference on Software Engineering},
year={2025},
organization={IEEE}
}
๐ License
BSD-0 License - Use freely in any project, commercial or research.
Quick Start: python scribe.py https://github.com/user/repo
FastPath Mode: python scribe.py https://github.com/user/repo --use-fastpath
Library Usage: Import packrepo.library for programmatic access
Research: See research/ directory for evaluation framework and results
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sibylline_scribe-1.0.0.tar.gz.
File metadata
- Download URL: sibylline_scribe-1.0.0.tar.gz
- Upload date:
- Size: 544.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
440c32a3f63ab77e3aae1e5a2ae27e2b46e7462d1b3d2824f93eea0c76a1d170
|
|
| MD5 |
7bd6017110d49e9752937de630283092
|
|
| BLAKE2b-256 |
d3a14913c4dc3879227ba973e80ad65993ec5cffc9349b39d5e96067f8111398
|
File details
Details for the file sibylline_scribe-1.0.0-py3-none-any.whl.
File metadata
- Download URL: sibylline_scribe-1.0.0-py3-none-any.whl
- Upload date:
- Size: 474.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5f762e3944e05023de956f88c86b363d2b046049b41b9c276691931cdcac302e
|
|
| MD5 |
2f6264cc77defb426fb8e2ba18670670
|
|
| BLAKE2b-256 |
9c122c0b8e7c8de29185fdd09c0014f77c53f33fa91b9e1c6a47abe248771dfa
|