Convert directory structure into a single text file while respecting .gitignore rules
c2c
A Python package that converts a directory structure into a single text file, preserving file contents and directory hierarchy while respecting .gitignore rules. Perfect for sharing codebase context with AI language models or creating project snapshots.
Features
- Smart Directory Scanning: Recursively scans directories and outputs contents as a single well-formatted text file with clear delimiters between files
- Git-Aware:
- Fully respects .gitignore rules at both root and subdirectory levels
- Handles negative patterns (patterns starting with !) correctly
- Supports multiple .gitignore files in subdirectories, just like Git
- Intelligent File Handling:
- Automatically detects and excludes binary files to maintain output integrity
- Full UTF-8 encoding support with proper error handling
- Generates unique, collision-free delimiters to clearly separate files
- Memory-efficient processing for large files
- Flexible Configuration:
- Custom exclude patterns via command line arguments or Python API
- Debug mode for troubleshooting pattern matching
- Easy integration with both CLI and Python applications
- AI-Ready Output:
- Generates output specifically formatted for optimal use with AI language models
- Supports large language models like Claude, GPT-4, etc.
- Preserves directory structure and file relationships for better context
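The feature list does not say how delimiters are kept collision-free. One plausible approach is to draw a random UUID and retry if the resulting marker already occurs in any file being packed. The sketch below assumes that approach. The helper `make_delimiter` is hypothetical and is not c2c's `create_delimiter()`, which the Python API example calls with no arguments. It also uses the hex form of the UUID, while the exact form c2c emits is not stated here.

```python
import uuid
from collections.abc import Sequence


def make_delimiter(contents: Sequence[str], max_attempts: int = 10) -> str:
    """Return a "### FILE_<hex uuid>" marker that appears in none of `contents`.

    Hypothetical helper for illustration, not c2c's create_delimiter().
    """
    for _ in range(max_attempts):
        candidate = f"### FILE_{uuid.uuid4().hex}"
        # Reject the candidate if any file already contains it verbatim.
        if not any(candidate in text for text in contents):
            return candidate
    raise RuntimeError("no collision-free delimiter found")
```

A version-4 UUID carries 122 random bits, so a collision is extremely unlikely. The retry loop mainly makes the guarantee explicit.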
Installation
Install from PyPI:
pip install c2c
Or install from source:
git clone https://github.com/kawataki-yoshika/c2c.git
cd c2c
pip install .
Usage
Command Line Interface
Basic usage - scan current directory:
c2c .
Scan specific directory:
c2c /path/to/directory
Exclude specific patterns:
c2c . -e "*.log" -e "temp/*"
Enable debug mode to see pattern matching details:
c2c . --debug
Save output to file:
c2c . > project_snapshot.txt
Python API
The package provides a flexible Python API for integration into your tools:
from c2c import scan_directory, create_delimiter
# Generate a unique delimiter
delimiter = create_delimiter()
# Create a file to write the output
with open('output.txt', 'w', encoding='utf-8') as output_file:
    # Basic usage with default excludes
    scan_directory(
        directory=".",
        exclude_patterns=[".git"],  # Default exclude pattern
        delimiter=delimiter,
        output_file=output_file
    )

# With custom exclude patterns
with open('output.txt', 'w', encoding='utf-8') as output_file:
    scan_directory(
        directory="/path/to/project",
        exclude_patterns=[
            ".git",  # Default
            "*.log",
            "temp/*"
        ],
        delimiter=delimiter,
        output_file=output_file,
        debug=True
    )
Using with AI Language Models
- Generate a snapshot of your project:
c2c . > context.txt
- Use in your prompts:
Here's my project structure and contents:
[paste contents of context.txt]
Could you help me understand the code structure and suggest improvements?
The output format is specifically designed to help AI models understand:
- Project structure and hierarchical relationships
- File contents with clear, unambiguous boundaries
- Complete directory hierarchy and organization
- Metadata about excluded files and patterns
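For scripted workflows, the prompt template above can be filled in programmatically. This is a minimal sketch that assumes the snapshot was saved as context.txt by `c2c . > context.txt`. `build_prompt` is a hypothetical helper and is not part of c2c.

```python
from pathlib import Path


def build_prompt(context_path: str, question: str) -> str:
    """Wrap a saved c2c snapshot in a simple prompt (hypothetical helper)."""
    context = Path(context_path).read_text(encoding="utf-8")
    return (
        "Here's my project structure and contents:\n\n"
        f"{context}\n\n"
        f"{question}\n"
    )
```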
Output Format
The generated output follows this structure:
# Project Directory Contents
# Format: Files are separated by a delimiter line starting with "### FILE_[uuid]"
# Each delimiter line is followed by the file path, then the file contents.
# Note: Binary files and patterns matching any .gitignore are excluded.
# DELIMITER=### FILE_[uuid]
### FILE_[uuid] src/main.py
[contents of main.py]
### FILE_[uuid] src/utils/helper.py
[contents of helper.py]
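Every file boundary uses the same delimiter, which is announced once in the `# DELIMITER=` header line. Because of that, a snapshot can be split back into individual files. The sketch below assumes the layout shown above exactly:

- header lines come first;
- each delimiter line is followed by a space and the file path;
- everything up to the next delimiter line is that file's contents.

How c2c handles blank lines between files is not documented here, so trailing whitespace may differ from the originals. `parse_snapshot` is a hypothetical helper.

```python
def parse_snapshot(text: str) -> dict[str, str]:
    """Split c2c-style output into {path: contents} (hypothetical helper)."""
    # Find the delimiter announced in the header.
    delimiter = None
    for line in text.splitlines():
        if line.startswith("# DELIMITER="):
            delimiter = line[len("# DELIMITER="):]
            break
    if delimiter is None:
        raise ValueError("no '# DELIMITER=' header found")

    files: dict[str, str] = {}
    current_path = None
    buffer: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith(delimiter + " "):
            # A new file starts: flush the previous one.
            if current_path is not None:
                files[current_path] = "".join(buffer)
            current_path = line[len(delimiter) + 1:].rstrip("\r\n")
            buffer = []
        elif current_path is not None:
            buffer.append(line)  # header lines before the first file are skipped
    if current_path is not None:
        files[current_path] = "".join(buffer)
    return files
```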
Default Excludes
By default, c2c excludes:
- .git directories and all Git-related files
- Binary files (automatically detected)
- Files matching any .gitignore patterns
You can add additional patterns using the -e flag or through the Python API.
Advanced Features
GitignoreRule Handling
The GitignoreRule system provides full Git-compatible pattern matching:
- Base directory-specific patterns for scoped ignores
- Negative patterns with ! for pattern negation
- Path matching with / prefix for root-relative patterns
- Directory-only patterns (ending with /)
- Pattern normalization and **/pattern support
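To make these rule types concrete, here is a simplified, hypothetical model of a single parsed pattern. It is not c2c's GitignoreRule class, whose interface is not shown here. It covers:

- negation;
- root-anchored and directory-only patterns;
- a leading `**/`;
- scoping to the directory that holds the .gitignore.

As a simplification it relies on Python's `fnmatch`, whose `*` also matches `/` (Git's does not). It ignores escaping and trailing-space rules.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Rule:
    """Simplified, hypothetical model of one .gitignore line."""

    base_dir: str   # directory holding the .gitignore ("" for the repo root)
    pattern: str    # glob with "!", leading "/", "**/" and trailing "/" removed
    negated: bool   # line started with "!"
    dir_only: bool  # line ended with "/"
    anchored: bool  # pattern is relative to base_dir rather than any depth

    @classmethod
    def parse(cls, line: str, base_dir: str = "") -> "Rule":
        negated = line.startswith("!")
        if negated:
            line = line[1:]
        dir_only = line.endswith("/")
        line = line.rstrip("/")
        if line.startswith("**/"):
            # "**/foo" matches "foo" at any depth.
            line, anchored = line[3:], False
        else:
            # Git anchors any pattern containing a leading or middle "/".
            anchored = "/" in line
            line = line.lstrip("/")
        return cls(base_dir.strip("/"), line, negated, dir_only, anchored)

    def matches(self, path: str, is_dir: bool) -> bool:
        """True if the pattern matches `path` (relative to the repo root)."""
        if self.dir_only and not is_dir:
            return False
        if self.base_dir:
            prefix = self.base_dir + "/"
            if not path.startswith(prefix):
                return False  # rule only applies inside its own directory
            path = path[len(prefix):]
        if self.anchored:
            return fnmatch.fnmatchcase(path, self.pattern)
        # Unanchored: try every trailing run of path components.
        parts = path.split("/")
        return any(
            fnmatch.fnmatchcase("/".join(parts[i:]), self.pattern)
            for i in range(len(parts))
        )
```

Whether a matching rule ignores or re-includes the path is decided by `negated` together with rule order.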
Binary File Detection
- Smart UTF-8 decoding attempt to detect binary files
- Configurable detection threshold
- Ensures output integrity by excluding non-text content
- Proper handling of various text encodings
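Here is a minimal sketch of the UTF-8 decoding heuristic described above. It assumes a file is treated as binary when a sample of its leading bytes contains a NUL byte or fails to decode as UTF-8. The `sample_size` parameter is an illustrative stand-in for the configurable threshold; c2c's actual parameter name and default are not given here. `looks_binary` is hypothetical.

```python
import codecs


def looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Heuristic: binary if the sample contains NUL or is not valid UTF-8."""
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    if b"\x00" in chunk:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off by the sample boundary.
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```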
Gitignore Processing
- Support for multiple .gitignore files, with proper precedence rules
- Pattern processing order matches Git behavior
- Scoped rules based on .gitignore file location
- Full support for pattern negation and complex rule combinations
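Git evaluates the patterns that apply to a path in order, shallowest .gitignore first, and the last matching pattern decides. This means a negated pattern in src/.gitignore can re-include a file that the root .gitignore excluded. The sketch below models only that precedence. To stay short, it matches patterns against the file's basename and does not model directory exclusion; Git itself cannot re-include a file whose parent directory is excluded. `ScopedPattern` and `is_ignored` are hypothetical names, not c2c's API.

```python
import fnmatch
from typing import NamedTuple


class ScopedPattern(NamedTuple):
    pattern: str   # glob matched against the file's basename (simplification)
    negated: bool  # True for a "!pattern" line


def is_ignored(path: str, patterns_by_dir: dict[str, list[ScopedPattern]]) -> bool:
    """Decide whether `path` (relative to the repo root) is ignored.

    `patterns_by_dir` maps a directory ("" for the root) to the patterns of
    the .gitignore it contains, in file order.
    """
    parts = path.split("/")
    ignored = False
    # Visit every ancestor directory, root first, so deeper files apply later.
    for depth in range(len(parts)):
        base = "/".join(parts[:depth])
        for p in patterns_by_dir.get(base, []):
            if fnmatch.fnmatchcase(parts[-1], p.pattern):
                # Later matches override earlier ones: last match wins.
                ignored = not p.negated
    return ignored
```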
Memory Efficiency
- Efficient file handling using buffered I/O
- Streaming output for large files
- Minimal memory footprint even with large codebases
- Proper resource cleanup
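One way to keep memory use flat regardless of file size is to copy each file into the output in fixed-size chunks rather than reading it whole. Here is a minimal sketch using the standard library's `shutil.copyfileobj`. The `errors="replace"` decoding policy and the 64 KiB chunk size are assumptions, not c2c's documented behavior, and `stream_file` is hypothetical.

```python
import shutil
from typing import TextIO


def stream_file(path: str, out: TextIO, chunk_size: int = 64 * 1024) -> None:
    """Copy a UTF-8 text file into `out` chunk by chunk (hypothetical helper)."""
    # errors="replace" is an assumed policy for stray invalid bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as src:
        # copyfileobj reads at most chunk_size characters at a time.
        shutil.copyfileobj(src, out, chunk_size)
```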
Contributing
We welcome contributions! Here's how you can help:
- Submit pull requests for bug fixes or new features
- Report bugs and suggest improvements
- Improve documentation and examples
- Share use cases and feature ideas
Please feel free to open issues or submit pull requests on GitHub.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file c2c-1.0.4.tar.gz.
File metadata
- Download URL: c2c-1.0.4.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fb76f8ee55ef765c2e151fb424659c20c2f05958d97b69cc73eb492a28490f8 |
| MD5 | ac92418e4a8c1a4aaeecd48fe25e4825 |
| BLAKE2b-256 | 603c8994b8454654a4a409da833b1a1bc8b5b54be61b4ba80e12057236504f41 |
File details
Details for the file c2c-1.0.4-py3-none-any.whl.
File metadata
- Download URL: c2c-1.0.4-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a935b3965b1fff58e9fa5819161a7e230a841da837375a71d68484e82587ff5a |
| MD5 | 1d66fbf448ee9054457415e91ac2a964 |
| BLAKE2b-256 | fc13888ad3496cef784f6b0b26ff6349f7301749086eeac6006f2e46e7ec86d2 |
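To check a downloaded archive against the SHA256 digests listed above, hash the file in chunks with Python's standard `hashlib` module and compare hex digests. This is a minimal sketch; `sha256_of` and `verify_download` are illustrative helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()
```

For an unmodified download, verify_download("c2c-1.0.4.tar.gz", "7fb76f8ee55ef765c2e151fb424659c20c2f05958d97b69cc73eb492a28490f8") should return True.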