Convert directory structure into a single text file while respecting .gitignore rules

c2c

A Python package that converts a directory structure into a single text file, preserving file contents and directory hierarchy while respecting .gitignore rules. Perfect for sharing codebase context with AI language models or creating project snapshots.

Features

  • Smart Directory Scanning: Recursively scans directories and outputs contents as a single well-formatted text file with clear delimiters between files
  • Git-Aware:
    • Fully respects .gitignore rules at both root and subdirectory levels
    • Handles negative patterns (patterns starting with !) correctly
    • Supports multiple .gitignore files in subdirectories, just like Git
  • Intelligent File Handling:
    • Automatically detects and excludes binary files to maintain output integrity
    • Full UTF-8 encoding support with proper error handling
    • Generates unique, collision-free delimiters to clearly separate files
    • Memory-efficient processing for large files
  • Flexible Configuration:
    • Custom exclude patterns via command line arguments or Python API
    • Debug mode for troubleshooting pattern matching
    • Easy integration with both CLI and Python applications
  • AI-Ready Output:
    • Generates output specifically formatted for optimal use with AI language models
    • Works with any large language model that accepts plain text, such as Claude or GPT-4
    • Preserves directory structure and file relationships for better context

Installation

Install from PyPI:

pip install c2c

Or install from source:

git clone https://github.com/kawataki-yoshika/c2c.git
cd c2c
pip install .

Usage

Command Line Interface

Basic usage - scan current directory:

c2c .

Scan specific directory:

c2c /path/to/directory

Exclude specific patterns:

c2c . -e "*.log" -e "temp/*"

Enable debug mode to see pattern matching details:

c2c . --debug

Save output to file:

c2c . > project_snapshot.txt

Python API

The package provides a flexible Python API for integration into your tools:

from c2c import scan_directory, create_delimiter

# Generate a unique delimiter
delimiter = create_delimiter()

# Create a file to write the output
with open('output.txt', 'w', encoding='utf-8') as output_file:
    # Basic usage with default excludes
    scan_directory(
        directory=".",
        exclude_patterns=[".git"],  # Default exclude pattern
        delimiter=delimiter,
        output_file=output_file
    )

# With custom exclude patterns and debug output (overwrites output.txt from the first example)
with open('output.txt', 'w', encoding='utf-8') as output_file:
    scan_directory(
        directory="/path/to/project",
        exclude_patterns=[
            ".git",  # Default
            "*.log",
            "temp/*"
        ],
        delimiter=delimiter,
        output_file=output_file,
        debug=True
    )

Using with AI Language Models

  1. Generate a snapshot of your project:
c2c . > context.txt
  2. Use in your prompts:
Here's my project structure and contents:

[paste contents of context.txt]

Could you help me understand the code structure and suggest improvements?

The output format is specifically designed to help AI models understand:

  • Project structure and the complete directory hierarchy
  • File contents with clear, unambiguous boundaries
  • Which kinds of files were left out (binary files and .gitignore matches)

Output Format

The generated output follows this structure:

# Project Directory Contents
# Format: Files are separated by a delimiter line starting with "### FILE_[uuid]"
# Each delimiter line is followed by the file path, then the file contents.
# Note: Binary files and patterns matching any .gitignore are excluded.

# DELIMITER=### FILE_[uuid]

### FILE_[uuid] src/main.py
[contents of main.py]

### FILE_[uuid] src/utils/helper.py
[contents of helper.py]
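
Because the header announces the delimiter, a consumer can split a snapshot back into individual files. The following is a minimal sketch, assuming the layout shown above: a `# DELIMITER=` header line, then one delimiter line per file followed by that file's contents. `parse_snapshot` is a hypothetical helper written for illustration and is not part of the c2c API. Treating blank lines around each file body as separators is also an assumption.

```python
from typing import Dict, List, Optional

HEADER_PREFIX = "# DELIMITER="  # header line shown in the format above


def parse_snapshot(text: str) -> Dict[str, str]:
    """Split snapshot text into {relative_path: contents} (hypothetical helper)."""
    delimiter: Optional[str] = None
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None

    for line in text.splitlines():
        if delimiter is None:
            # Skip header comments until the delimiter is announced.
            if line.startswith(HEADER_PREFIX):
                delimiter = line[len(HEADER_PREFIX):]
            continue
        if line.startswith(delimiter + " "):
            # A delimiter line carries the file path after one space.
            current = line[len(delimiter) + 1:]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)

    if delimiter is None:
        raise ValueError("no '# DELIMITER=' header found")
    # Assumption: blank lines around each file body are only separators.
    return {path: "\n".join(lines).strip("\n") for path, lines in chunks.items()}
```

Reading the delimiter from the header instead of hard-coding it keeps the parser working with whatever unique delimiter a given run produced.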

Default Excludes

By default, c2c excludes:

  • .git directories and all Git-related files
  • Binary files (automatically detected)
  • Files matching any .gitignore patterns

You can add additional patterns using the -e flag or through the Python API.

Advanced Features

GitignoreRule Handling

The GitignoreRule system provides full Git-compatible pattern matching:

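  • Patterns scoped to the directory that contains their .gitignore file
  • Negation patterns (starting with !) that re-include paths matched by an earlier pattern
  • Patterns with a leading / that are anchored to the directory of the .gitignore file rather than matching at any depth
  • Directory-only patterns (ending with /)
  • Pattern normalization and support for **/ patterns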
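
To make the rule types listed above concrete, here is a simplified sketch of gitignore-style matching built on the standard `fnmatch` module. It is not c2c's `GitignoreRule` implementation: `SimpleRule`, `parse_rule`, and `is_ignored` are hypothetical names. The sketch deliberately skips `**/` handling, and `fnmatch` lets `*` match across `/`, which Git does not. It also tests only the path itself, whereas a real directory walker would additionally skip everything under an ignored directory.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable


@dataclass
class SimpleRule:
    pattern: str    # glob with the markers below stripped off
    negated: bool   # line started with "!"
    dir_only: bool  # line ended with "/"
    anchored: bool  # line had a "/" at the start or in the middle


def parse_rule(line: str) -> SimpleRule:
    negated = line.startswith("!")
    if negated:
        line = line[1:]
    dir_only = line.endswith("/")
    if dir_only:
        line = line.rstrip("/")
    # In Git, a slash at the start or in the middle anchors the pattern
    # to the directory that holds the .gitignore file.
    anchored = "/" in line
    return SimpleRule(line.lstrip("/"), negated, dir_only, anchored)


def rule_matches(rule: SimpleRule, rel_path: str, is_dir: bool) -> bool:
    if rule.dir_only and not is_dir:
        return False
    if rule.anchored:
        return fnmatchcase(rel_path, rule.pattern)
    # Unanchored patterns are compared against the last path component.
    return fnmatchcase(rel_path.rsplit("/", 1)[-1], rule.pattern)


def is_ignored(rules: Iterable[SimpleRule], rel_path: str, is_dir: bool = False) -> bool:
    ignored = False
    for rule in rules:
        if rule_matches(rule, rel_path, is_dir):
            # The last matching rule wins; a negated rule re-includes the path.
            ignored = not rule.negated
    return ignored
```

For example, with the rules `*.log`, `!keep.log`, and `/build/`, the path `app/debug.log` is ignored, `app/keep.log` is re-included by the negation, and `build` is ignored only when it is a directory at the top level.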

Binary File Detection

  • Detects binary files by attempting to decode their contents as UTF-8
  • Configurable detection threshold
  • Ensures output integrity by excluding non-text content
  • Proper handling of various text encodings
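
The decoding-based check described above can be sketched as follows. This is an illustration under stated assumptions, not c2c's actual detector: the function names, the 8192-byte sample size, and the extra NUL-byte shortcut are all hypothetical choices. An incremental decoder is used so that a multi-byte character cut off at the end of the sample is not mistaken for invalid data.

```python
import codecs


def is_binary_sample(sample: bytes) -> bool:
    """Return True if the bytes do not look like UTF-8 text."""
    # Assumption: a NUL byte is treated as a strong hint of binary data.
    if b"\x00" in sample:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates an incomplete character at the very end.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def is_binary_file(path: str, sample_size: int = 8192) -> bool:
    """Check only the first sample_size bytes, so large files stay cheap."""
    with open(path, "rb") as handle:
        return is_binary_sample(handle.read(sample_size))
```

Sampling only the start of the file trades a little accuracy for speed: a file whose first block is valid text but which contains binary data later would be classified as text by this sketch.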

Gitignore Processing

  • Multiple .gitignore files support with proper precedence rules
  • Pattern processing order matches Git behavior
  • Scoped rules based on .gitignore file location
  • Full support for pattern negation and complex rule combinations
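
The precedence idea behind these points can be illustrated with a small sketch: each rule remembers which directory its .gitignore file lives in, rules from shallower directories are applied first, and the last matching rule decides. This mirrors Git's documented behavior in simplified form and is not taken from c2c's source. `ScopedRule` and `is_ignored_scoped` are hypothetical names, and matching is reduced to a basename glob to keep the example short.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (directory containing the .gitignore, pattern line); "" means the root.
ScopedRule = Tuple[str, str]


def _relative_to(rel_path: str, base_dir: str) -> Optional[str]:
    """Return rel_path relative to base_dir, or None if it lies outside."""
    if base_dir == "":
        return rel_path
    prefix = base_dir + "/"
    return rel_path[len(prefix):] if rel_path.startswith(prefix) else None


def _depth(base_dir: str) -> int:
    return 0 if base_dir == "" else base_dir.count("/") + 1


def is_ignored_scoped(rules: List[ScopedRule], rel_path: str) -> bool:
    # Shallow .gitignore files first, so deeper ones can override them.
    # sorted() is stable, which preserves the order of lines within a file.
    ordered = sorted(rules, key=lambda rule: _depth(rule[0]))
    ignored = False
    for base_dir, line in ordered:
        local = _relative_to(rel_path, base_dir)
        if local is None:
            continue  # the rule's .gitignore does not govern this path
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        # Simplification: compare the glob with the file name only.
        if fnmatchcase(local.rsplit("/", 1)[-1], pattern):
            ignored = not negated
    return ignored
```

With a root rule `*.log` and a rule `!important.log` in `src/.gitignore`, the file `src/important.log` is kept while `other/important.log` is still ignored, because the negation applies only inside `src`.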

Memory Efficiency

  • Efficient file handling using buffered I/O
  • Streaming output for large files
  • Minimal memory footprint even with large codebases
  • Proper resource cleanup
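
One way to keep memory use flat regardless of file size is to copy each file to the output in fixed-size chunks instead of reading it whole. The sketch below shows that general technique and is not c2c's actual code. `append_file_streaming`, the 64 KiB chunk size, and the `errors="replace"` policy are assumptions made for illustration.

```python
from typing import TextIO


def append_file_streaming(
    src_path: str,
    out: TextIO,
    delimiter: str,
    rel_path: str,
    chunk_size: int = 64 * 1024,
) -> None:
    """Write one delimiter line, then stream the file body chunk by chunk."""
    out.write(f"{delimiter} {rel_path}\n")
    # The with-block closes the source file even if writing fails.
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        while True:
            chunk = src.read(chunk_size)  # at most chunk_size characters
            if not chunk:
                break
            out.write(chunk)
    out.write("\n")
```

At most one chunk is held in memory at a time, so peak usage depends on `chunk_size` and not on the size of the largest file in the project.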

Contributing

We welcome contributions! Here's how you can help:

  • Submit pull requests for bug fixes or new features
  • Report bugs and suggest improvements
  • Improve documentation and examples
  • Share use cases and feature ideas

Please feel free to open issues or submit pull requests on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.
