Advanced repository intelligence system for LLM code analysis with 20-35% improvement in Q&A accuracy

Scribe: Intelligent Repository Rendering for LLM Code Analysis

Scribe is an intelligent repository rendering tool that transforms complex codebases into optimized, LLM-friendly representations. Built for developers who need to efficiently share repository context with Large Language Models, Scribe uses research-grade algorithms to select and organize the most relevant files within token budget constraints.

🎯 What is Scribe?

Scribe is a command-line tool that renders a repository into a single, structured document for LLM consumption. Rather than passing an LLM thousands of files, it selects the most relevant and informative content using submodular optimization and semantic analysis, within a configurable token budget.

Key Benefits

🚀 20-35% better LLM performance on code analysis tasks compared to naive approaches
🧠 Smart file selection using submodular optimization and semantic analysis
💰 Budget-aware - respects token limits with graceful degradation
⚡ Fast and deterministic - consistent results every time
🔧 Highly configurable - multiple algorithms and customization options

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/sibyllinesoft/scribe
cd scribe

# Install dependencies
pip install -r requirements.txt

Basic Usage

# Render any GitHub repository
python scribe.py https://github.com/user/repo

# Save to file instead of opening in browser
python scribe.py https://github.com/user/repo --out project_context.html --no-open

# Use FastPath algorithm with custom token budget
python scribe.py https://github.com/user/repo --use-fastpath --token-target 80000

# Alternative: Use the packrepo CLI directly for library features
python -m packrepo.cli.fastpack /path/to/local/repo --budget 120000 --output pack.txt

Example Output

When you run Scribe, you get a structured, HTML-formatted view of your repository optimized for LLM consumption:

Scribe HTML Output Features:

File Selection Summary: Shows which files were selected and why
Project Structure: Interactive tree view with relevance scores
Syntax-Highlighted Code: All source files with proper highlighting
Smart Organization: Files organized by importance and dependencies
Token Budget Display: Shows exactly how the token budget was used

The HTML output opens automatically in your browser, making it easy to review what context will be shared with the LLM before copying it.

🏗️ How Scribe Works

Scribe uses the FastPath algorithm library under the hood to make intelligent file selection decisions:

1. Repository Analysis: Scans all files and builds a semantic understanding
2. Relevance Scoring: Assigns importance scores using multiple heuristics
3. Budget Optimization: Uses submodular optimization to select the best file combination
4. Smart Rendering: Formats the output for optimal LLM comprehension
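
The stages above can be pictured as a small pipeline. The sketch below is illustrative only: the `FileInfo` record, the sample scores and the score-per-token greedy rule are assumptions made for this example, not Scribe's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileInfo:          # hypothetical record, not a Scribe type
    path: str
    tokens: int          # estimated token cost of including the file
    score: float         # relevance score from some heuristic

def select_files(files, budget):
    """Greedily pick files by score-per-token until the budget is spent."""
    # Sort by value density; break ties by path so results are deterministic.
    ranked = sorted(files, key=lambda f: (-f.score / f.tokens, f.path))
    chosen, used = [], 0
    for f in ranked:
        if used + f.tokens <= budget:
            chosen.append(f)
            used += f.tokens
    return chosen, used

def render(chosen):
    """Emit one header line per selected file, most valuable first."""
    return "\n".join(f"=== {f.path} ({f.tokens} tokens) ===" for f in chosen)

files = [
    FileInfo("README.md", 400, 0.9),
    FileInfo("src/main.py", 1200, 0.8),
    FileInfo("tests/test_main.py", 900, 0.3),
    FileInfo("docs/changelog.md", 2000, 0.1),
]
chosen, used = select_files(files, budget=2000)
print(render(chosen))
print(f"budget used: {used}/2000")
```

With a 2,000-token budget, the two densest files fit and the remaining two are skipped, so the budget is never exceeded.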

🎛️ Configuration Options

Algorithm Variants

v1: Random baseline (for testing)
v2: Recency-based selection
v3: TF-IDF semantic similarity
v4: Embedding-based selection
v5: FastPath integrated (recommended - best performance)
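
To give a feel for what a TF-IDF variant such as v3 does, here is a from-scratch sketch that ranks files by cosine similarity to a query. The tokenizer, the smoothed IDF formula, the toy file contents and the query are all assumptions for illustration; this is not the code shipped in packrepo.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z_]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)   # document frequency
    # Smoothed IDF so terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

files = {
    "auth.py": "def login(user, password): check password hash for user",
    "db.py": "def connect(url): open database connection pool",
    "README.md": "project overview install usage license",
}
vectors, idf = tfidf_vectors(list(files.values()))
query = Counter(tokenize("how is the password checked at login"))
# Terms the corpus has never seen get zero weight.
query_vec = {t: tf * idf.get(t, 0.0) for t, tf in query.items()}

ranked = sorted(zip(files, vectors), key=lambda kv: -cosine(query_vec, kv[1]))
print([name for name, _ in ranked])
```

Only `auth.py` shares terms with the query, so it ranks first; the other two files tie at zero similarity.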

Budget Management

Default: 120,000 tokens (fits within the context window of many current LLMs)
Conservative: 50,000 tokens (for smaller context windows)
Generous: 200,000+ tokens (for large context models)
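
Picking a tier comes down to how much of the model's context window is left after the prompt and the expected response. The sketch below uses the three budget figures listed above; the 4-characters-per-token estimate, the `reserve` parameter and the selection rule are assumptions for illustration, not Scribe behavior.

```python
# Budget tiers taken from the list above; the selection rule is an assumption.
TIERS = {"conservative": 50_000, "default": 120_000, "generous": 200_000}

def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return -(-len(text) // chars_per_token)   # ceiling division

def pick_tier(context_window, reserve=8_000):
    """Return the largest tier that fits once `reserve` tokens are set aside."""
    available = context_window - reserve
    fitting = [(size, name) for name, size in TIERS.items() if size <= available]
    return max(fitting)[1] if fitting else None

print(estimate_tokens("def add(a, b):\n    return a + b\n"))
print(pick_tier(128_000))
print(pick_tier(32_000))
```

The chosen figure can then be passed on the command line with `--token-target`, as in the Basic Usage examples.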

Selection Preferences

# Use FastPath with custom variant
python scribe.py https://github.com/user/repo --use-fastpath --fastpath-variant v4_semantic

# Add entry point hints for better relevance
python scribe.py https://github.com/user/repo --use-fastpath --entry-points src/main.ts src/app.tsx

# Include git diff context for recent changes
python scribe.py https://github.com/user/repo --use-fastpath --include-diffs --diff-commits 5

📊 Performance Comparison

Method	LLM Q&A Accuracy	Token Efficiency	Speed
Random files	65.2%	1.00x	⚡ Fast
Recent files only	69.8%	1.08x	⚡ Fast
TF-IDF similarity	72.8%	1.15x	🔄 Medium
Scribe (v5)	82.3%	1.31x	🔄 Medium

Results from 500+ evaluation tasks across 50 repositories
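
The accuracy column can be converted into relative improvements over each baseline. The snippet below uses only the figures from the table; the variable names are illustrative.

```python
# Accuracy figures copied from the table above (percent).
accuracy = {
    "Random files": 65.2,
    "Recent files only": 69.8,
    "TF-IDF similarity": 72.8,
}
scribe_v5 = 82.3

relative_gain = {}
for method, baseline in accuracy.items():
    absolute = scribe_v5 - baseline                 # percentage points
    relative = 100 * absolute / baseline            # percent of baseline
    relative_gain[method] = relative
    print(f"vs {method}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

On these figures, the relative gain is about 26% over random selection, about 18% over recency-based selection and about 13% over TF-IDF similarity.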

🔬 Advanced: The FastPath Library

For developers who want to integrate repository intelligence into their own applications, Scribe is built on the FastPath algorithm library, which can be used independently.

FastPath Library Usage

from packrepo.library import RepositoryPacker, ScribeConfig

# Initialize the packer
packer = RepositoryPacker()

# Basic usage
result = packer.pack_repository('/path/to/repo', token_budget=120000)
print(result.to_string())

# Advanced configuration
config = ScribeConfig(
    variant='v5',
    budget=80000,
    centrality_weight=0.3,
    diversity_weight=0.7
)
result = packer.pack_repository('/path/to/repo', config=config)

# Access detailed metrics
print(f"Selected {len(result.selected_files)} files")
print(f"Budget used: {result.budget_used}/{result.budget_allocated}")
print(f"Selection time: {result.selection_time_ms}ms")

FastPath Algorithm Components

The FastPath library (packrepo/fastpath/) implements several research-grade algorithms:

Core Algorithms

Facility Location: Optimal coverage with minimal redundancy
Maximal Marginal Relevance: Balance between relevance and diversity
Submodular Optimization: Provably near-optimal file selection
Multi-fidelity Representations: Full code, signatures, and summaries
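
As an illustration of the facility location objective, the sketch below greedily picks the items that add the most coverage to a set, given a matrix of pairwise similarities. Facility location with non-negative similarities is monotone submodular, so under a cardinality constraint this greedy rule reaches at least a (1 - 1/e) fraction of the optimal value. The toy matrix and function names are assumptions; this is not FastPath's implementation.

```python
def facility_location_gain(sim, selected, candidate):
    """Increase in total coverage from adding `candidate` to `selected`."""
    gain = 0.0
    for i in range(len(sim)):
        # How well item i is already covered by its best selected neighbour.
        current = max((sim[i][j] for j in selected), default=0.0)
        gain += max(sim[i][candidate] - current, 0.0)
    return gain

def greedy_select(sim, k):
    """Pick up to k items, each time adding the largest marginal gain."""
    selected = []
    for _ in range(k):
        remaining = [j for j in range(len(sim)) if j not in selected]
        if not remaining:
            break
        # Ties go to the lowest index, which keeps the result deterministic.
        best = max(
            remaining,
            key=lambda j: (facility_location_gain(sim, selected, j), -j),
        )
        selected.append(best)
    return selected

# Toy similarity matrix: files 0 and 1 are near-duplicates, file 2 is distinct.
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(greedy_select(sim, k=2))
```

The second pick is the distinct file rather than the near-duplicate, which is the "coverage with minimal redundancy" behavior described above.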

Selection Strategies

Semantic Analysis: Tree-sitter parsing with dependency tracking
Relevance Scoring: Multiple heuristics including centrality and recency
Budget Management: Hard constraints with graceful degradation
Quality Optimization: Iterative refinement for better results
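
One way to read "hard constraints with graceful degradation" alongside the multi-fidelity representations: when a file does not fit in full, fall back to a cheaper representation before dropping it. The sketch below assumes three representations with known token costs; the fidelity names, costs and function are hypothetical.

```python
# Fidelity levels ordered from most to least detailed (names are illustrative).
FIDELITIES = ("full", "signatures", "summary")

def pack_with_degradation(files, budget):
    """files: list of (path, {fidelity: token_cost}) in priority order.

    Returns ([(path, fidelity)], tokens_used) and never exceeds `budget`.
    """
    plan, used = [], 0
    for path, costs in files:
        for level in FIDELITIES:
            cost = costs.get(level)
            if cost is not None and used + cost <= budget:
                plan.append((path, level))
                used += cost
                break   # keep the richest representation that fits
        # If no level fits, the file is dropped and the loop moves on.
    return plan, used

files = [
    ("src/core.py",  {"full": 3000, "signatures": 600, "summary": 120}),
    ("src/utils.py", {"full": 2500, "signatures": 500, "summary": 100}),
    ("src/cli.py",   {"full": 1800, "signatures": 400, "summary": 80}),
]
plan, used = pack_with_degradation(files, budget=4000)
print(plan, used)
```

Here the highest-priority file is kept in full and the other two degrade to signatures, so all three remain visible within the 4,000-token limit.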

FastPath API Reference

# Configuration class (schematic field listing)
class ScribeConfig:
    variant: str               # Algorithm variant (v1-v5)
    budget: int                # Token budget limit
    centrality_weight: float   # Weight for structural importance
    diversity_weight: float    # Weight for content diversity
    # ... additional options

# Result class (schematic field listing)
class FastPathResult:
    selected_files: List[ScanResult]    # Selected files with metadata
    budget_used: int                    # Actual tokens consumed
    selection_time_ms: float            # Algorithm execution time
    quality_metrics: Dict[str, float]   # Selection quality scores
    # ... additional metrics

Extending FastPath

The FastPath library is designed for research and extension:

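# Custom selection heuristic
from packrepo.packer.selector import BaseSelectorHeuristic

class MyCustomHeuristic(BaseSelectorHeuristic):
    def compute_relevance_scores(self, files, context):
        # Implement your scoring logic here. This placeholder gives every
        # file the same score; the expected return type is an assumption,
        # so check the base class before relying on it.
        scores = [1.0 for _ in files]
        return scores

# Register and use (config is a ScribeConfig instance, as in the usage example)
config.custom_heuristics = [MyCustomHeuristic()]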

🧪 Research & Evaluation

Scribe and FastPath are built on rigorous research with comprehensive evaluation:

Statistical Validation

# Run research-grade evaluation
python research/evaluation/comprehensive_evaluation_pipeline.py

# Statistical significance testing
python research/statistical_analysis/academic_statistical_analysis.py

Reproducibility

# Validate deterministic behavior
python scripts/validate_research_system.py

# Run full acceptance gates
python scripts/research_grade_acceptance_gates.py
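
A determinism check can be as simple as running the same selection several times, including with shuffled input, and comparing digests of the results. The sketch below shows that idea with a stand-in selector; it is an assumption about the general approach, not the logic of the scripts above.

```python
import hashlib
import json

def select(files, budget):
    """Stand-in selector: highest score first, path as the tie-breaker."""
    chosen, used = [], 0
    for path, tokens, score in sorted(files, key=lambda f: (-f[2], f[0])):
        if used + tokens <= budget:
            chosen.append(path)
            used += tokens
    return chosen

def digest(selection):
    """Stable SHA-256 fingerprint of a selection."""
    payload = json.dumps(selection, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

files = [("b.py", 300, 0.5), ("a.py", 300, 0.5), ("c.py", 500, 0.9)]

# Input order should not matter, so also feed the files in reverse.
runs = [
    digest(select(files, 900)),
    digest(select(files[::-1], 900)),
    digest(select(list(files), 900)),
]
print("deterministic" if len(set(runs)) == 1 else "non-deterministic")
```

The explicit tie-breaker on path is what makes the result independent of input order when two files have equal scores.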

📂 Repository Structure

scribe/
├── scribe.py              # Main Scribe CLI tool (HTML output, GitHub repos)
├── packrepo/              # FastPath algorithm library
│   ├── library.py         # Public API (RepositoryPacker, ScribeConfig)
│   ├── fastpath/          # Core algorithms (v1-v5)
│   ├── packer/            # File selection and formatting
│   ├── evaluator/         # Research evaluation framework
│   └── cli/fastpack.py    # Library CLI interface (text output, local repos)
├── research/              # Research validation and analysis
├── eval/                  # Evaluation datasets and protocols
├── tests/                 # Comprehensive test suite
├── scripts/               # Automation and validation tools
└── docs/                  # Documentation and research papers

🤝 Contributing

For Scribe Users

Report issues with specific repositories that don't render well
Suggest new file type patterns or selection heuristics
Share use cases and integration examples

For FastPath Developers

# Development setup (quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"

# Run tests
python -m pytest tests/

# Add new algorithm variant
# 1. Implement in packrepo/packer/baselines/
# 2. Add tests in tests/
# 3. Update evaluation in research/

📜 Citation

This work is based on research into optimal repository representation for LLMs:

@inproceedings{scribe2025,
  title={Scribe: Intelligent Repository Rendering for Enhanced LLM Code Analysis},
  author={Nathan Rice},
  booktitle={Proceedings of the 47th International Conference on Software Engineering},
  year={2025},
  organization={IEEE}
}

📄 License

BSD Zero Clause License (0BSD) - Use freely in any project, commercial or research.

Quick Start: python scribe.py https://github.com/user/repo
FastPath Mode: python scribe.py https://github.com/user/repo --use-fastpath
Library Usage: Import packrepo.library for programmatic access
Research: See research/ directory for evaluation framework and results

This version

1.0.0

Aug 30, 2025

Download files

Source Distribution

sibylline_scribe-1.0.0.tar.gz (544.8 kB)

Uploaded Aug 30, 2025 Source

Built Distribution

sibylline_scribe-1.0.0-py3-none-any.whl (474.1 kB)

Uploaded Aug 30, 2025 Python 3

File details

Details for the file sibylline_scribe-1.0.0.tar.gz.

File metadata

Download URL: sibylline_scribe-1.0.0.tar.gz
Upload date: Aug 30, 2025
Size: 544.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sibylline_scribe-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`440c32a3f63ab77e3aae1e5a2ae27e2b46e7462d1b3d2824f93eea0c76a1d170`
MD5	`7bd6017110d49e9752937de630283092`
BLAKE2b-256	`d3a14913c4dc3879227ba973e80ad65993ec5cffc9349b39d5e96067f8111398`
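
The published SHA256 digest can be checked against a downloaded archive before installing it. The sketch below uses Python's standard `hashlib`; the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for sibylline_scribe-1.0.0.tar.gz.
EXPECTED_SHA256 = "440c32a3f63ab77e3aae1e5a2ae27e2b46e7462d1b3d2824f93eea0c76a1d170"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    """True only if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected

archive = Path("sibylline_scribe-1.0.0.tar.gz")   # assumed download location
if archive.exists():
    print("OK" if verify(archive) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

A mismatch means the file differs from the one that was uploaded and should not be installed.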

File details

Details for the file sibylline_scribe-1.0.0-py3-none-any.whl.

File metadata

Download URL: sibylline_scribe-1.0.0-py3-none-any.whl
Upload date: Aug 30, 2025
Size: 474.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sibylline_scribe-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5f762e3944e05023de956f88c86b363d2b046049b41b9c276691931cdcac302e`
MD5	`2f6264cc77defb426fb8e2ba18670670`
BLAKE2b-256	`9c122c0b8e7c8de29185fdd09c0014f77c53f33fa91b9e1c6a47abe248771dfa`

sibylline-scribe 1.0.0
