A tool to convert a repository into flattened files for easier LLM upload
Repo Flattener
A Python package to convert a repository into flattened files for easier uploading to Large Language Models (LLMs).
Features
- Flattens repository structure by creating single files with path information
- Creates a manifest file showing the original structure
- Configurable ignore lists for directories and file extensions
- Interactive mode for selective file processing
- Type-safe with full type hints
- Robust error handling with custom exceptions
- Configurable logging with verbose and quiet modes
- Progress bar for visual feedback during processing
- Parallel processing for faster performance on large repositories
- Memory optimization with configurable file size limits
- Intelligent caching for instant manifest generation on unchanged repositories
- Configuration file support (.repo-flattener.yml)
- Simple command-line interface
- Clean Python API for programmatic access
Installation
From PyPI
pip install repo-flattener
From Source
git clone https://github.com/CruiseDevice/repo-flattener.git
cd repo-flattener
pip install -e .
Usage
Command Line
# Basic usage
repo-flattener /path/to/repository
# Specify output directory
repo-flattener /path/to/repository --output flattened_files
# Interactive mode - select files interactively
repo-flattener /path/to/repository --interactive
# Add custom directories to ignore
repo-flattener /path/to/repository --ignore-dirs build,dist
# Add custom file extensions to ignore
repo-flattener /path/to/repository --ignore-exts .log,.tmp
# Verbose output (DEBUG level)
repo-flattener /path/to/repository --verbose
# Quiet mode (errors only)
repo-flattener /path/to/repository --quiet
# Disable progress bar
repo-flattener /path/to/repository --no-progress
# Parallel processing with 4 workers
repo-flattener /path/to/repository --workers 4
# Auto-detect optimal number of workers
repo-flattener /path/to/repository --workers 0
# Set maximum file size (10MB = 10485760 bytes)
repo-flattener /path/to/repository --max-file-size 10485760
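The --ignore-dirs and --ignore-exts options shown above can be pictured as a filter applied while walking the repository tree. The following is a minimal sketch of that idea, not the package's actual implementation: the helper name iter_files is hypothetical, and the real matching rules (case sensitivity, built-in defaults) may differ.

```python
import os
from typing import Iterable, Iterator


def iter_files(root: str,
               ignore_dirs: Iterable[str] = (),
               ignore_exts: Iterable[str] = ()) -> Iterator[str]:
    """Yield file paths relative to root, skipping ignored directories and extensions.

    Hypothetical helper for illustration; not part of repo-flattener's API.
    """
    skip_dirs = set(ignore_dirs)
    skip_exts = {ext.lower() for ext in ignore_exts}
    for current, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in skip_exts:
                continue
            yield os.path.relpath(os.path.join(current, name), root)
```

Pruning dirnames in place means an ignored directory such as build is never read at all, which matters more than filtering afterwards when the ignored directories are large.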
Progress Bar
By default, repo-flattener shows a progress bar when processing files:
Processing files: 100%|██████████| 1523/1523 [00:02<00:00, 615.24file/s]
The progress bar is automatically disabled in:
- Quiet mode (--quiet)
- When explicitly disabled (--no-progress)
- Non-interactive environments (e.g., CI/CD pipelines)
# With progress bar (default)
repo-flattener /path/to/repository
# Without progress bar
repo-flattener /path/to/repository --no-progress
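The three conditions that disable the progress bar can be expressed as a single check. This is a sketch under assumptions: the function name should_show_progress is hypothetical, and detecting a non-interactive environment by testing whether stderr is a terminal is a common convention, not something confirmed for this package.

```python
import sys
from typing import Optional


def should_show_progress(quiet: bool = False,
                         no_progress: bool = False,
                         is_tty: Optional[bool] = None) -> bool:
    """Return True only when none of the three disabling conditions applies."""
    if is_tty is None:
        # Assumption: "non-interactive" means stderr is not attached to a terminal.
        is_tty = sys.stderr.isatty()
    return not quiet and not no_progress and is_tty
```

Passing is_tty explicitly makes the decision easy to exercise in tests, where no real terminal is attached.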
Parallel Processing
For large repositories, parallel processing can significantly speed up file processing:
# Use 4 parallel workers
repo-flattener /path/to/repository --workers 4
# Auto-detect optimal number of workers
repo-flattener /path/to/repository --workers 0
# Combine with other options
repo-flattener /path/to/repository --workers 4 --verbose
Performance Tips:
- Use 2-8 workers for best performance on most systems
- --workers 0 auto-detects: min(32, CPU_count + 4)
- More workers = faster for I/O-bound operations (reading/writing files)
- Single worker (default) has lowest memory overhead
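The auto-detect rule above can be written out as a small function. This is an illustrative sketch: resolve_workers is a hypothetical name, and rejecting negative values is an assumption. The formula min(32, CPU_count + 4) is the same one concurrent.futures.ThreadPoolExecutor has used for its default worker count since Python 3.8.

```python
import os
from typing import Optional


def resolve_workers(requested: int = 1, cpu_count: Optional[int] = None) -> int:
    """Map a --workers value to a concrete worker count.

    0 means auto-detect using min(32, CPU_count + 4); positive values are used as given.
    """
    if requested < 0:
        raise ValueError("workers must be 0 (auto) or a positive integer")
    if requested > 0:
        return requested
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return min(32, cpu_count + 4)
```

On an 8-core machine this yields 12 workers, and the cap of 32 applies from 28 cores upward.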
Memory Optimization
For repositories with very large files, you can set a maximum file size to prevent loading huge files into memory:
# Skip files larger than 10MB
repo-flattener /path/to/repository --max-file-size 10485760
# Skip files larger than 50MB
repo-flattener /path/to/repository --max-file-size 52428800
# Combine with parallel processing
repo-flattener /path/to/repository --workers 4 --max-file-size 10485760
Usage Tips:
- --max-file-size accepts size in bytes (e.g., 10485760 for 10MB)
- Default is 0 (no limit): all files will be processed
- Files exceeding the limit are skipped and logged as warnings
- Skipped files still appear in the manifest but are not flattened
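The byte values used above follow the 1 MB = 1024 * 1024 bytes convention. The sketch below shows that conversion and one way a size limit could partition files into those to flatten and those to skip. Both helper names are hypothetical, and treating a file exactly at the limit as allowed is an assumption.

```python
import os
from typing import Iterable, List, Tuple


def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to bytes using 1 MB = 1024 * 1024 bytes."""
    return megabytes * 1024 * 1024


def split_by_size(paths: Iterable[str],
                  max_file_size: int = 0) -> Tuple[List[str], List[str]]:
    """Return (to_flatten, skipped); a limit of 0 means no limit."""
    keep: List[str] = []
    skipped: List[str] = []
    for path in paths:
        # os.path.getsize reads file metadata only, so large files are never loaded.
        if max_file_size > 0 and os.path.getsize(path) > max_file_size:
            skipped.append(path)
        else:
            keep.append(path)
    return keep, skipped
```

For example, mb_to_bytes(10) gives 10485760 and mb_to_bytes(50) gives 52428800, the two values used in the commands above.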
Manifest Caching
Repo-flattener automatically caches manifest generation to speed up repeated runs on unchanged repositories. The cache uses file modification times and sizes to detect changes.
# Default behavior - caching enabled
repo-flattener /path/to/repository
# Disable caching
repo-flattener /path/to/repository --no-cache
# Use custom cache directory
repo-flattener /path/to/repository --cache-dir /path/to/custom/cache
How Caching Works:
- On first run, the manifest is generated and cached with a signature based on file paths, modification times, and sizes
- On subsequent runs, if the repository hasn't changed (same files with the same modification times and sizes), the cached manifest is used instantly
- If any file is modified, added, or removed, the cache is invalidated and the manifest is regenerated
- Cache is stored in .repo_flattener_cache/ by default (ignored by git)
- Each repository/output directory combination has its own cache entry
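A signature built from file paths, modification times, and sizes can be sketched as follows. This is an illustration of the general technique, not the package's code: the function name repo_signature, the choice of SHA-256, and the exact byte layout fed to the hash are all assumptions.

```python
import hashlib
import os
from typing import Iterable


def repo_signature(root: str, relative_paths: Iterable[str]) -> str:
    """Hash each file's path, modification time, and size into one hex digest."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on directory traversal order.
    for rel in sorted(relative_paths):
        info = os.stat(os.path.join(root, rel))
        record = f"{rel}\0{info.st_mtime_ns}\0{info.st_size}\n"
        digest.update(record.encode("utf-8"))
    return digest.hexdigest()
```

A cached manifest would be reused only when the stored signature equals the freshly computed one. Adding, removing, or modifying a file changes at least one record and therefore the digest. Because file contents are never read, an edit that preserves both size and modification time would go unnoticed by this scheme.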
Performance Benefits:
- Instant manifest generation for unchanged repositories (no file scanning needed)
- Particularly useful when running repo-flattener multiple times during development
- Cache automatically invalidates when files change, ensuring accuracy
Cache Management:
- Cache files are small (typically a few KB)
- No manual cache clearing needed - cache auto-invalidates on changes
- Use --no-cache to bypass cache for debugging or one-time runs
- Add .repo_flattener_cache/ to your .gitignore (recommended)
Interactive Mode
Interactive mode allows you to manually select which files to process. This is useful when you want fine-grained control over which files to include.
repo-flattener /path/to/repository --interactive
In interactive mode, you'll see a list of all files and can use commands to select/deselect them:
- all or a - Select all files
- none or n - Deselect all files
- toggle N or t N - Toggle selection for file #N
- range N-M or r N-M - Toggle selection for files #N through #M
- show or s - Show current selection
- done or d - Finish selection and proceed
- quit or q - Cancel and exit
Example session:
> none # Deselect all files
> range 1-5 # Select files 1 through 5
> toggle 10 # Also select file 10
> show # Review selection
> done # Process selected files
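The selection commands can be modeled as operations on a set of 1-based file numbers. The sketch below is a hypothetical model of that behavior, not the package's interactive_file_selection: the name apply_command, the use of symmetric difference for toggling, and the handling of trailing "# ..." comments are assumptions made for illustration.

```python
from typing import Optional, Set


def apply_command(selected: Set[int], total: int, line: str) -> Optional[Set[int]]:
    """Apply one selection command and return the new selection.

    Returns None for input that does not change the selection
    (show, done, quit, blank or unrecognized lines).
    """
    parts = line.split("#", 1)[0].split()  # drop any trailing "# comment"
    if not parts:
        return None
    cmd, args = parts[0].lower(), parts[1:]
    if cmd in ("all", "a"):
        return set(range(1, total + 1))
    if cmd in ("none", "n"):
        return set()
    if cmd in ("toggle", "t") and len(args) == 1:
        changed = {int(args[0])}
    elif cmd in ("range", "r") and len(args) == 1:
        start, end = (int(part) for part in args[0].split("-", 1))
        changed = set(range(start, end + 1))
    else:
        return None
    if not changed <= set(range(1, total + 1)):
        raise ValueError(f"file numbers must be between 1 and {total}")
    # Toggling flips membership: selected files are removed, others are added.
    return selected ^ changed
```

Replaying the example session against 12 files (none, range 1-5, toggle 10) leaves files 1 through 5 and file 10 selected.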
Python API
from repo_flattener import export, process_repository, scan_repository
# Simplest usage with export function
count, skipped, manifest = export('/path/to/repository', 'output')
print(f"Processed {count} files, skipped {skipped}")
# Export with options
count, skipped, manifest = export(
    '/path/to/repository',
    output_dir='flattened_files',
    ignore_dirs=['build', 'dist'],
    ignore_exts=['.log', '.tmp']
)
# Export with interactive mode
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    interactive=True  # Opens interactive file selector
)
# Export without progress bar
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    show_progress=False
)
# Parallel processing with 4 workers
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_workers=4
)
# Auto-detect optimal number of workers
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_workers=0  # Auto-detect
)
# Skip files larger than 10MB
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_file_size=10 * 1024 * 1024  # 10MB in bytes (10485760), as in the CLI examples
)
# Combine parallel processing with file size limit
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_workers=4,
    max_file_size=10 * 1024 * 1024
)
# Disable caching
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    use_cache=False
)
# Custom cache directory
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    cache_dir='/path/to/custom/cache'
)
# Using process_repository (lower-level API)
process_repository('/path/to/repository', 'flattened_files', max_workers=4)
# Scan repository to get list of files
files = scan_repository('/path/to/repository')
print(f"Found {len(files)} files")
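# Interactive selection (in a script)
# Assumption: interactive_file_selection is importable from the package top level
from repo_flattener import interactive_file_selection
files = scan_repository('/path/to/repository')
selected_files = interactive_file_selection(files)
process_repository('/path/to/repository', 'output', file_list=selected_files)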
# Process specific files only
process_repository(
    '/path/to/repository',
    'flattened_files',
    file_list=['README.md', 'src/main.py', 'src/utils.py']
)
# Error handling
from repo_flattener import InvalidRepositoryError, OutputDirectoryError
try:
    export('/path/to/repository', 'output')
except InvalidRepositoryError as e:
    print(f"Invalid repository: {e}")
except OutputDirectoryError as e:
    print(f"Cannot create output: {e}")
Output
The tool creates a directory with:
- Flattened files named according to their original path (with path separators replaced by underscores)
- A file_manifest.txt showing the original repository structure
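The naming rule for flattened files can be illustrated with a short helper. This is a sketch of the described rule only: flattened_name is a hypothetical name, and normalizing Windows-style backslashes is an assumption.

```python
from pathlib import PurePosixPath


def flattened_name(relative_path: str) -> str:
    """Replace path separators with underscores, e.g. src/main.py -> src_main.py."""
    # Normalize backslashes so Windows-style paths flatten the same way.
    normalized = relative_path.replace("\\", "/")
    return "_".join(PurePosixPath(normalized).parts)
```

Under this simple rule, src/utils.py and src_utils.py would map to the same output name. How the package resolves such collisions is not described here, which is one reason the manifest of original paths is useful.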
Configuration File
You can create a .repo-flattener.yml configuration file in your repository for default settings:
# .repo-flattener.yml
ignore_dirs:
- build
- dist
- coverage
ignore_exts:
- .log
- .tmp
- .cache
output_dir: flattened_output
The CLI will automatically load this file if present. Command-line arguments override configuration file settings.
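The precedence rule (command-line arguments override configuration file settings) amounts to layering two dictionaries. The sketch below assumes the YAML file has already been parsed into a dict and that an option not given on the command line is represented as None. The name merge_settings is hypothetical, and whether the real tool replaces or extends list values such as ignore_dirs is not specified here; this sketch replaces them.

```python
from typing import Any, Dict, Optional


def merge_settings(file_config: Dict[str, Any],
                   cli_args: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Layer settings so that command-line values win over config-file values.

    CLI values of None mean "not given" and leave the file setting in place.
    """
    merged = dict(file_config)  # copy so the caller's dict is not mutated
    merged.update({key: value for key, value in cli_args.items() if value is not None})
    return merged
```

With the configuration file above and --output out on the command line, the merged settings would keep the three ignore_dirs entries from the file while output_dir becomes out.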
Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=repo_flattener --cov-report=html
# Run in verbose mode
pytest -v
Installing Development Dependencies
pip install -e ".[dev]"
License
MIT License
File details
Details for the file repo_flattener-0.2.1.tar.gz.
File metadata
- Download URL: repo_flattener-0.2.1.tar.gz
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eec1230157c872e2c46081e1a9026454a7a12a26113bb8c29ac1855734a46989 |
| MD5 | beb1c14743878b0b856f6d1d69ad4366 |
| BLAKE2b-256 | 27933cc4bbfa45be4e81e3957cfc2546dbbfbfaa51b055727e71f60f48fb1e47 |
File details
Details for the file repo_flattener-0.2.1-py3-none-any.whl.
File metadata
- Download URL: repo_flattener-0.2.1-py3-none-any.whl
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 58054cbd9e17fc7f722b0ff7bb2a2dadc0f00b47bb54abeab7db587a87f512b3 |
| MD5 | e42b2ba26f57b828009e7e2c46468b23 |
| BLAKE2b-256 | 45ac4b8f99b47db966d2c4bb419be12e9b65fcd5cd554a61db1a792fdc518494 |