A tool to convert a repository into flattened files for easier LLM upload

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Repo Flattener

A Python package to convert a repository into flattened files for easier uploading to Large Language Models (LLMs).

Features

Flattens repository structure by creating single files with path information
Creates a manifest file showing the original structure
Configurable ignore lists for directories and file extensions
Interactive mode for selective file processing
Type-safe with full type hints
Robust error handling with custom exceptions
Configurable logging with verbose and quiet modes
Progress bar for visual feedback during processing
Parallel processing for faster performance on large repositories
Memory optimization with configurable file size limits
Intelligent caching for instant manifest generation on unchanged repositories
Configuration file support (.repo-flattener.yml)
Simple command-line interface
Clean Python API for programmatic access

Installation

From PyPI

pip install repo-flattener

From Source

git clone https://github.com/CruiseDevice/repo-flattener.git
cd repo-flattener
pip install -e .

Usage

Command Line

# Basic usage
repo-flattener /path/to/repository

# Specify output directory
repo-flattener /path/to/repository --output flattened_files

# Interactive mode - select files interactively
repo-flattener /path/to/repository --interactive

# Add custom directories to ignore
repo-flattener /path/to/repository --ignore-dirs build,dist

# Add custom file extensions to ignore
repo-flattener /path/to/repository --ignore-exts .log,.tmp

# Verbose output (DEBUG level)
repo-flattener /path/to/repository --verbose

# Quiet mode (errors only)
repo-flattener /path/to/repository --quiet

# Disable progress bar
repo-flattener /path/to/repository --no-progress

# Parallel processing with 4 workers
repo-flattener /path/to/repository --workers 4

# Auto-detect optimal number of workers
repo-flattener /path/to/repository --workers 0

# Set maximum file size (10MB = 10485760 bytes)
repo-flattener /path/to/repository --max-file-size 10485760

Progress Bar

By default, repo-flattener shows a progress bar when processing files:

Processing files: 100%|██████████| 1523/1523 [00:02<00:00, 615.24file/s]

The progress bar is automatically disabled in:

Quiet mode (--quiet)
When explicitly disabled (--no-progress)
Non-interactive environments (e.g., CI/CD pipelines)

# With progress bar (default)
repo-flattener /path/to/repository

# Without progress bar
repo-flattener /path/to/repository --no-progress
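
The three conditions above can be pictured as one small check. The sketch below is an illustration only, not repo-flattener's code: `should_show_progress` is a hypothetical helper, and treating "non-interactive" as "the output stream is not a terminal" is an assumption.

```python
import sys


def should_show_progress(quiet: bool, no_progress: bool, stream=sys.stderr) -> bool:
    """Decide whether a progress bar makes sense (hypothetical helper)."""
    # --quiet and --no-progress both switch the bar off explicitly.
    if quiet or no_progress:
        return False
    # Assumption: a stream that is not a terminal (CI logs, pipes, files)
    # counts as a non-interactive environment.
    return hasattr(stream, "isatty") and stream.isatty()


if __name__ == "__main__":
    print(should_show_progress(quiet=False, no_progress=False))
```

Run under a CI job or with the output redirected to a file, a check like this returns False, which would match the behavior described for pipelines.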

Parallel Processing

For large repositories, parallel processing can significantly speed up file processing:

# Use 4 parallel workers
repo-flattener /path/to/repository --workers 4

# Auto-detect optimal number of workers
repo-flattener /path/to/repository --workers 0

# Combine with other options
repo-flattener /path/to/repository --workers 4 --verbose

Performance Tips:

Start with 2-8 workers; this range works well on most systems
--workers 0 auto-detects the worker count as min(32, CPU_count + 4)
Extra workers help most when the run is I/O-bound (reading and writing many files)
A single worker (the default) has the lowest memory overhead
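
The auto-detect formula quoted above is the same one Python's `concurrent.futures.ThreadPoolExecutor` has used for its default worker count since Python 3.8. A minimal sketch of how a `--workers` value could be resolved under that rule; `resolve_workers` is a hypothetical name and is not part of repo-flattener:

```python
import os


def resolve_workers(requested: int) -> int:
    """Map a --workers value to a concrete worker count (hypothetical helper)."""
    if requested < 0:
        raise ValueError("worker count must be >= 0")
    if requested == 0:
        # Auto-detect with the formula from the tips: min(32, CPU_count + 4).
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
        return min(32, cpu_count + 4)
    # Any positive value is used as given.
    return requested


if __name__ == "__main__":
    print("auto-detected workers:", resolve_workers(0))
    print("explicit workers:", resolve_workers(4))
```

On an 8-core machine the formula gives 12 workers; the cap of 32 only matters on machines with 28 or more cores.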

Memory Optimization

For repositories with very large files, you can set a maximum file size to prevent loading huge files into memory:

# Skip files larger than 10MB
repo-flattener /path/to/repository --max-file-size 10485760

# Skip files larger than 50MB
repo-flattener /path/to/repository --max-file-size 52428800

# Combine with parallel processing
repo-flattener /path/to/repository --workers 4 --max-file-size 10485760

Usage Tips:

--max-file-size accepts size in bytes (e.g., 10485760 for 10MB)
Default is 0 (no limit) - all files will be processed
Files exceeding the limit are skipped and logged as warnings
Skipped files still appear in the manifest but are not flattened
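
Because the limit is given in raw bytes, a small conversion helper can make scripts easier to read. This is a sketch with hypothetical helper names; it assumes, based on the wording "exceeding the limit", that a file exactly at the limit is still processed.

```python
def mib_to_bytes(mib: float) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return int(mib * 1024 * 1024)


def exceeds_limit(size_bytes: int, max_file_size: int) -> bool:
    """Return True if a file of this size would be skipped.

    A limit of 0 means "no limit", as in the tips above.
    Assumption: only files strictly larger than the limit are skipped.
    """
    return max_file_size > 0 and size_bytes > max_file_size


if __name__ == "__main__":
    limit = mib_to_bytes(10)
    print(limit)                             # 10485760
    print(exceeds_limit(20_000_000, limit))  # True
    print(exceeds_limit(20_000_000, 0))      # False
```

The values 10485760 and 52428800 used in the commands above are 10 and 50 multiplied by 1024 * 1024.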

Manifest Caching

repo-flattener automatically caches the generated manifest to speed up repeated runs on unchanged repositories. The cache uses file modification times and sizes to detect changes.

# Default behavior - caching enabled
repo-flattener /path/to/repository

# Disable caching
repo-flattener /path/to/repository --no-cache

# Use custom cache directory
repo-flattener /path/to/repository --cache-dir /path/to/custom/cache

How Caching Works:

On first run, the manifest is generated and cached with a signature based on file paths, modification times, and sizes
On subsequent runs, if the repository hasn't changed (same files with the same modification times and sizes), the cached manifest is reused instead of being regenerated
If any file is modified, added, or removed, the cache is invalidated and the manifest is regenerated
Cache is stored in .repo_flattener_cache/ by default
Each repository/output directory combination has its own cache entry
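
The signature idea described above can be sketched with the standard library alone. This is one possible way to build such a signature, not necessarily how repo-flattener does it: `repo_signature` is a hypothetical function, and the use of SHA-256 and nanosecond modification times is an assumption.

```python
import hashlib
import os


def repo_signature(root: str) -> str:
    """Hash every file's relative path, modification time, and size."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a deterministic order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            rel_path = os.path.relpath(full_path, root)
            # Only metadata is read; file contents are never opened.
            entries.append(f"{rel_path}\0{info.st_mtime_ns}\0{info.st_size}")
    digest = hashlib.sha256("\n".join(entries).encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    print(repo_signature("."))
```

If the signature of the current tree equals the stored one, the cached manifest can be reused; adding, removing, or modifying a file changes at least one entry and therefore the hash.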

Performance Benefits:

Near-instant manifest generation for unchanged repositories (the manifest is not rebuilt)
Particularly useful when running repo-flattener multiple times during development
Cache automatically invalidates when files change, ensuring accuracy

Cache Management:

Cache files are small (typically a few KB)
No manual cache clearing needed - cache auto-invalidates on changes
Use --no-cache to bypass cache for debugging or one-time runs
Add .repo_flattener_cache/ to your .gitignore (recommended)

Interactive Mode

Interactive mode allows you to manually select which files to process. This is useful when you want fine-grained control over which files to include.

repo-flattener /path/to/repository --interactive

In interactive mode, you'll see a list of all files and can use commands to select/deselect them:

all or a - Select all files
none or n - Deselect all files
toggle N or t N - Toggle selection for file #N
range N-M or r N-M - Toggle selection for files #N through #M
show or s - Show current selection
done or d - Finish selection and proceed
quit or q - Cancel and exit

Example session:

> none          # Deselect all files
> range 1-5     # Select files 1 through 5
> toggle 10     # Also select file 10
> show          # Review selection
> done          # Process selected files
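
The selection commands behave like simple set operations. The sketch below models them under stated assumptions: `apply_command` is a hypothetical function, files are numbered from 1, numbers outside the list are ignored, and `show`, `done`, and `quit` are left to the surrounding loop.

```python
from typing import Set


def apply_command(command: str, selected: Set[int], total: int) -> Set[int]:
    """Return the selection after one command (files are numbered 1..total)."""
    parts = command.split()
    if not parts:
        return selected
    name, args = parts[0].lower(), parts[1:]
    everything = set(range(1, total + 1))
    if name in ("all", "a"):
        return everything
    if name in ("none", "n"):
        return set()
    if name in ("toggle", "t") and len(args) == 1:
        # Symmetric difference flips membership of file #N.
        return selected ^ ({int(args[0])} & everything)
    if name in ("range", "r") and len(args) == 1:
        start, end = (int(x) for x in args[0].split("-", 1))
        # Flip every file from #N through #M inclusive.
        return selected ^ (set(range(start, end + 1)) & everything)
    raise ValueError(f"unknown command: {command!r}")


if __name__ == "__main__":
    selection = set(range(1, 13))  # assume 12 files, all selected at start
    for cmd in ("none", "range 1-5", "toggle 10"):
        selection = apply_command(cmd, selection, total=12)
    print(sorted(selection))  # [1, 2, 3, 4, 5, 10]
```

Since `range` toggles rather than selects, running it on files that are already selected deselects them; starting from `none`, as in the example session, makes it act as a plain select.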

Python API

from repo_flattener import export, process_repository, scan_repository

# Simplest usage with export function
count, skipped, manifest = export('/path/to/repository', 'output')
print(f"Processed {count} files, skipped {skipped}")

# Export with options
count, skipped, manifest = export(
    '/path/to/repository',
    output_dir='flattened_files',
    ignore_dirs=['build', 'dist'],
    ignore_exts=['.log', '.tmp']
)

# Export with interactive mode
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    interactive=True  # Opens interactive file selector
)

# Export without progress bar
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    show_progress=False
)

# Parallel processing with 4 workers
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_workers=4
)

# Auto-detect optimal number of workers
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_workers=0  # Auto-detect
)

# Skip files larger than 10MB
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_file_size=10_485_760  # 10MB in bytes (10 * 1024 * 1024), as in the CLI examples
)

# Combine parallel processing with file size limit
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    max_workers=4,
    max_file_size=10_485_760
)

# Disable caching
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    use_cache=False
)

# Custom cache directory
count, skipped, manifest = export(
    '/path/to/repository',
    'output',
    cache_dir='/path/to/custom/cache'
)

# Using process_repository (lower-level API)
process_repository('/path/to/repository', 'flattened_files', max_workers=4)

# Scan repository to get list of files
files = scan_repository('/path/to/repository')
print(f"Found {len(files)} files")

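# Interactive selection (in a script)
# Assumption: interactive_file_selection is importable from the package
# top level; the import at the top of this example does not include it.
from repo_flattener import interactive_file_selection
files = scan_repository('/path/to/repository')
selected_files = interactive_file_selection(files)
process_repository('/path/to/repository', 'output', file_list=selected_files)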

# Process specific files only
process_repository(
    '/path/to/repository',
    'flattened_files',
    file_list=['README.md', 'src/main.py', 'src/utils.py']
)

# Error handling
from repo_flattener import InvalidRepositoryError, OutputDirectoryError

try:
    export('/path/to/repository', 'output')
except InvalidRepositoryError as e:
    print(f"Invalid repository: {e}")
except OutputDirectoryError as e:
    print(f"Cannot create output: {e}")

Output

The tool creates a directory with:

Flattened files named according to their original path (with path separators replaced by underscores)
A file_manifest.txt showing the original repository structure
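
The naming rule for flattened files can be illustrated with a one-line transformation. This is a sketch of the rule as stated above, with a hypothetical `flattened_name` helper; the tool's exact handling of edge cases is not documented here.

```python
def flattened_name(relative_path: str) -> str:
    """Replace path separators with underscores (hypothetical helper)."""
    # Normalize Windows separators first so both styles give the same name.
    return relative_path.replace("\\", "/").replace("/", "_")


if __name__ == "__main__":
    print(flattened_name("src/main.py"))           # src_main.py
    print(flattened_name("docs/api/index.md"))     # docs_api_index.md
    print(flattened_name("src\\utils\\paths.py"))  # src_utils_paths.py
```

With a plain replacement like this, `src/a_b.py` and `src/a/b.py` would both map to `src_a_b.py`, which is one reason the manifest of original paths is useful alongside the flattened files.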

Configuration File

You can create a .repo-flattener.yml configuration file in your repository for default settings:

# .repo-flattener.yml
ignore_dirs:
  - build
  - dist
  - coverage
ignore_exts:
  - .log
  - .tmp
  - .cache
output_dir: flattened_output

The CLI will automatically load this file if present. Command-line arguments override configuration file settings.
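
The precedence rule (command-line arguments over configuration file settings) can be pictured as a layered merge. The sketch below is an assumption-laden illustration: `merge_settings` and the `DEFAULTS` values are hypothetical, and whether CLI ignore lists replace or extend the configured lists is not stated in this README (here they replace).

```python
from typing import Any, Dict

# Hypothetical built-in defaults, used only for this illustration.
DEFAULTS: Dict[str, Any] = {
    "ignore_dirs": [],
    "ignore_exts": [],
    "output_dir": "output",
}


def merge_settings(
    defaults: Dict[str, Any],
    file_config: Dict[str, Any],
    cli_args: Dict[str, Any],
) -> Dict[str, Any]:
    """Layer settings: defaults, then config file, then CLI arguments."""
    merged = dict(defaults)
    merged.update(file_config)
    # A CLI option left unset (None) must not hide the config file value.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


if __name__ == "__main__":
    file_config = {"ignore_dirs": ["build", "dist"], "output_dir": "flattened_output"}
    cli_args = {"output_dir": None, "ignore_exts": [".log"]}
    print(merge_settings(DEFAULTS, file_config, cli_args))
```

In this example the output directory comes from the configuration file because the CLI did not set it, while `ignore_exts` comes from the command line.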

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=repo_flattener --cov-report=html

# Run in verbose mode
pytest -v

Installing Development Dependencies

pip install -e ".[dev]"

License

MIT License

Release history

0.2.1 (this version): Nov 13, 2025
0.1.1: Nov 13, 2025
0.1.0: Mar 14, 2025

Download files

Source Distribution

repo_flattener-0.2.1.tar.gz (19.1 kB), uploaded Nov 13, 2025 (Source)

Built Distribution

repo_flattener-0.2.1-py3-none-any.whl (18.7 kB), uploaded Nov 13, 2025 (Python 3)

File details

Details for the file repo_flattener-0.2.1.tar.gz.

File metadata

Download URL: repo_flattener-0.2.1.tar.gz
Upload date: Nov 13, 2025
Size: 19.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for repo_flattener-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`eec1230157c872e2c46081e1a9026454a7a12a26113bb8c29ac1855734a46989`
MD5	`beb1c14743878b0b856f6d1d69ad4366`
BLAKE2b-256	`27933cc4bbfa45be4e81e3957cfc2546dbbfbfaa51b055727e71f60f48fb1e47`

File details

Details for the file repo_flattener-0.2.1-py3-none-any.whl.

File metadata

Download URL: repo_flattener-0.2.1-py3-none-any.whl
Upload date: Nov 13, 2025
Size: 18.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for repo_flattener-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`58054cbd9e17fc7f722b0ff7bb2a2dadc0f00b47bb54abeab7db587a87f512b3`
MD5	`e42b2ba26f57b828009e7e2c46468b23`
BLAKE2b-256	`45ac4b8f99b47db966d2c4bb419be12e9b65fcd5cd554a61db1a792fdc518494`
