repominify
A Python package that optimizes codebase representations for Large Language Models (LLMs) by generating compact, context-rich summaries that minimize token usage while preserving essential structural information.
Overview
repominify helps you provide detailed context about your codebase to LLMs without consuming excessive space in their context windows. It processes Repomix output to create optimized representations that maintain critical structural information while significantly reducing token usage. This enables more efficient and effective code-related conversations with AI models by maximizing the amount of useful context that can fit within token limits.
⚠️ Warning: repominify performs deep code analysis which can be resource-intensive for large codebases. Please start with a small subset of your code to understand the process and resource requirements.
Features
- Automatic Dependency Management
  - Checks and installs Node.js and npm dependencies
  - Automatically installs Repomix if not present
  - Handles version compatibility checks
- Code Analysis
  - Parses and analyzes code structure
  - Extracts imports, classes, and functions
  - Captures function signatures and docstrings
  - Identifies and extracts constants and environment variables
  - Builds dependency graphs
  - Performance optimized for large codebases
- Multiple Output Formats
  - GraphML for visualization tools
  - JSON for web-based tools
  - YAML for statistics
  - Text for human-readable analysis
- Rich Code Context
  - Complete function/method signatures
  - Full docstrings with parameter descriptions
  - Constants and their values
  - Environment variables and configurations
  - Module-level documentation
  - Import relationships
  - Class hierarchies and dependencies
- Size Optimization
  - Generates minified code structure representation
  - Provides detailed size reduction statistics
  - Shows character and token reduction percentages
  - Maintains semantic meaning while reducing size
- Security Awareness
  - Detects potentially sensitive patterns
  - Provides security recommendations
  - Flags suspicious file content
  - Helps maintain security best practices
- Debug Support
  - Comprehensive logging
  - Performance tracking
  - Detailed error messages
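To make the Code Analysis bullets more concrete, here is a minimal sketch of how imports, classes, function signatures, and docstrings can be pulled out of Python source with the standard-library `ast` module. It is an illustration only and not repominify's actual parser; the function name `summarize_source` and the layout of the returned dictionary are hypothetical.

```python
import ast


def summarize_source(source: str) -> dict:
    """Collect imports, classes, and function signatures from Python source."""
    tree = ast.parse(source)
    summary = {"imports": [], "classes": [], "functions": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have module=None.
            summary["imports"].append(node.module or ".")
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional parameter names only; enough for a compact summary.
            params = ", ".join(arg.arg for arg in node.args.args)
            summary["functions"].append(
                {
                    "signature": f"{node.name}({params})",
                    "docstring": ast.get_docstring(node),
                }
            )
    return summary


example = '''
import os
from pathlib import Path

class Loader:
    def load(self, path):
        """Read a file and return its text."""
        return Path(path).read_text()
'''

result = summarize_source(example)
print(result["imports"])
print(result["classes"])
print(result["functions"])
```

A summary like this keeps names, signatures, and docstrings while dropping function bodies, which is the general idea behind trading full source text for a smaller structural description.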
Installation
pip install repominify
Requirements
- Python 3.7 or higher
- Node.js 12+ (will be checked during runtime)
- npm 6+ (will be checked during runtime)
- Repomix (will be installed automatically if not present)
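If you want to check these requirements yourself before running the tool, the sketch below shows one way to do it with the standard library. It is not how repominify's own `ensure_dependencies` works; the helper names `major_version` and `check_tool` are hypothetical, and the minimum versions are simply copied from the list above.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the Requirements list above.
MIN_PYTHON = (3, 7)
MIN_MAJOR = {"node": 12, "npm": 6}


def major_version(version_text: str) -> int:
    """Return the major number from strings like 'v12.22.1' or '6.14.4'."""
    match = re.match(r"v?(\d+)", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_text!r}")
    return int(match.group(1))


def check_tool(name: str) -> str:
    """Describe whether a tool is on PATH and meets the minimum major version."""
    path = shutil.which(name)
    if path is None:
        return f"{name}: not found"
    output = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    ).stdout
    try:
        found = major_version(output)
    except ValueError:
        return f"{name}: version unknown"
    status = "ok" if found >= MIN_MAJOR[name] else "too old"
    return f"{name}: major version {found} ({status})"


if __name__ == "__main__":
    print("python:", "ok" if sys.version_info >= MIN_PYTHON else "too old")
    for tool in MIN_MAJOR:
        print(check_tool(tool))
```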
Usage
Command Line
# Basic usage
repominify path/to/repomix-output.txt
# Specify output directory
repominify path/to/repomix-output.txt -o output_dir
# Enable debug logging
repominify path/to/repomix-output.txt --debug
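When the command needs to be launched from a script, for example in a build step, the same invocations can be assembled programmatically. The sketch below uses only the `-o` and `--debug` options shown above; `build_command` and `run_repominify` are hypothetical helper names, and running the command assumes the `repominify` executable is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_file: str, output_dir: Optional[str] = None, debug: bool = False
) -> List[str]:
    """Assemble the repominify argument list from the options shown above."""
    command = ["repominify", input_file]
    if output_dir is not None:
        command += ["-o", output_dir]
    if debug:
        command.append("--debug")
    return command


def run_repominify(input_file: str, **options) -> int:
    """Run the CLI and return its exit code (requires repominify on PATH)."""
    return subprocess.run(build_command(input_file, **options), check=False).returncode


print(build_command("path/to/repomix-output.txt", output_dir="output_dir", debug=True))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.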
Python API
from repominify import CodeGraphBuilder, ensure_dependencies, configure_logging

# Enable debug logging (optional)
configure_logging(debug=True)

# Check dependencies
if ensure_dependencies():
    # Create graph builder
    builder = CodeGraphBuilder()

    # Parse the Repomix output file
    file_entries = builder.parser.parse_file("repomix-output.txt")

    # Build the graph
    graph = builder.build_graph(file_entries)

    # Save outputs and get comparison
    text_content, comparison = builder.save_graph(
        "output_directory",
        input_file="repomix-output.txt"
    )

    # Print comparison
    print(comparison)
Example Output
Analysis Complete!
📊 File Stats:
────────────────
Total Files: 29
Total Chars: 143,887
Total Tokens: 14,752
Output: input.txt
Security: ✔ No suspicious files detected
📊 File Stats:
────────────────
Total Files: 29
Total Chars: 26,254
Total Tokens: 3,254
Output: code_graph.txt
Security: ✔ No suspicious files detected
📈 Comparison:
────────────────
Char Reduction: 81.8%
Token Reduction: 77.9%
Security Notes: ✔ No issues found
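The reduction figures can be checked by hand. The sketch below assumes each percentage is computed as (before - after) / before, which reproduces the numbers in the example output; `reduction_percent` is a hypothetical helper, not part of the repominify API.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Character and token counts copied from the example output above.
char_reduction = reduction_percent(143_887, 26_254)
token_reduction = reduction_percent(14_752, 3_254)
print(f"Char Reduction: {char_reduction:.1f}%")    # Char Reduction: 81.8%
print(f"Token Reduction: {token_reduction:.1f}%")  # Token Reduction: 77.9%
```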
Output Files
When you run repominify, it generates several files in your output directory:
- code_graph.graphml: Graph representation in GraphML format
- code_graph.json: Graph data in JSON format for web visualization
- graph_statistics.yaml: Statistical analysis of the codebase
- code_graph.txt: Human-readable text representation including:
  - Module structure and dependencies
  - Function signatures and docstrings
  - Class definitions and hierarchies
  - Constants and their values
  - Environment variables
  - Import relationships
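In a script or CI job it can be handy to confirm that a run produced all four files. The following is a small sketch using only the file names listed above; the helper `missing_outputs` is hypothetical, and the temporary directory stands in for a real output directory.

```python
import tempfile
from pathlib import Path
from typing import List

# File names as listed in the Output Files section.
EXPECTED_OUTPUTS = (
    "code_graph.graphml",
    "code_graph.json",
    "graph_statistics.yaml",
    "code_graph.txt",
)


def missing_outputs(output_dir: str) -> List[str]:
    """Return the expected output files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


# Demonstration against a temporary directory holding only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "code_graph.txt").write_text("placeholder")
    demo_missing = missing_outputs(tmp)
    print(demo_missing)
```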
Project Structure
repominify/
├── repominify/             # Source code
│   ├── graph.py            # Graph building and analysis
│   ├── parser.py           # Repomix file parsing
│   ├── types.py            # Core types and data structures
│   ├── exporters.py        # Graph export functionality
│   ├── formatters.py       # Text representation formatting
│   ├── dependencies.py     # Dependency management
│   ├── logging.py          # Logging configuration
│   ├── stats.py            # Statistics and comparison
│   ├── constants.py        # Shared constants
│   ├── exceptions.py       # Custom exceptions
│   ├── cli.py              # Command-line interface
│   └── __init__.py         # Package initialization
├── tests/                  # Test suite
│   ├── test_end2end.py     # End-to-end tests
│   └── data/               # Test data files
├── setup.py                # Package configuration
├── LICENSE                 # MIT License
└── README.md               # This file
Code Style
The project follows these coding standards for consistency and maintainability:
- Comprehensive docstrings with Examples sections for all public APIs
- Type hints for all functions, methods, and class attributes
- Custom exceptions for proper error handling and reporting
- Clear separation of concerns between modules
- Consistent code formatting and naming conventions
- Detailed logging with configurable debug support
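As an illustration of these conventions, the sketch below shows a small function with type hints, a docstring that includes an Examples section, and a custom exception. It is a made-up example and does not come from the repominify source; `EmptyInputError` and `count_characters` are hypothetical names.

```python
class EmptyInputError(ValueError):
    """Raised when there is no text to measure (hypothetical exception)."""


def count_characters(text: str, ignore_whitespace: bool = False) -> int:
    """Count the characters in a piece of text.

    Args:
        text: The text to measure.
        ignore_whitespace: If True, whitespace characters are not counted.

    Returns:
        The number of counted characters.

    Raises:
        EmptyInputError: If ``text`` is empty.

    Examples:
        >>> count_characters("a b")
        3
        >>> count_characters("a b", ignore_whitespace=True)
        2
    """
    if not text:
        raise EmptyInputError("text must not be empty")
    if ignore_whitespace:
        return sum(1 for char in text if not char.isspace())
    return len(text)
```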
Development
To set up for development:
# Clone the repository
git clone https://github.com/mikewcasale/repominify.git
cd repominify
# Install in development mode with test dependencies
pip install -e '.[dev]'
# Run tests
pytest tests/
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. By contributing to this project, you agree to abide by its terms.
Please ensure your code follows the project's coding standards, including proper docstrings, type hints, and error handling.
Authors
Mike Casale
- Email: mike@casale.xyz
- GitHub: @mikewcasale
- Website: casale.xyz
License
MIT License - see the LICENSE file for details.
Acknowledgments
This project makes use of or was influenced by several excellent open source projects:
- Repomix - Our analysis pipeline integrates with this Node.js tool for initial code scanning
- NetworkX - Core graph algorithms and data structures
- PyYAML - YAML file handling
- GraphRAG Accelerator - Graph-based code analysis patterns and implementation concepts
How to Get Help
- For bugs and feature requests, please open an issue
- For usage questions, please start a discussion
- For security concerns, please email security@casale.xyz directly