Convert directory structure into a single text file while respecting .gitignore rules

c2c

A Python package that converts a directory structure into a single text file, preserving file contents and directory hierarchy while respecting .gitignore rules. Perfect for sharing codebase context with AI language models or creating project snapshots.

Features

  • Smart Directory Scanning: Recursively scans directories and outputs contents as a single well-formatted text file with clear delimiters between files
  • Git-Aware:
    • Fully respects .gitignore rules at both root and subdirectory levels
    • Handles negative patterns (patterns starting with !) correctly
    • Supports multiple .gitignore files in subdirectories, just like Git
  • Intelligent File Handling:
    • Automatically detects and excludes binary files to maintain output integrity
    • Full UTF-8 encoding support with proper error handling
    • Generates unique, collision-free delimiters to clearly separate files
  • Flexible Configuration:
    • Custom exclude patterns via command line arguments or Python API
    • Debug mode for troubleshooting pattern matching
    • Easy integration with both CLI and Python applications
  • AI-Ready Output:
    • Generates output specifically formatted for optimal use with AI language models
    • Works with large language models such as Claude and GPT-4
    • Preserves directory structure and file relationships for better context

Installation

Install from PyPI:

pip install c2c

Or install from source:

git clone https://github.com/kawataki-yoshika/c2c.git
cd c2c
pip install .

Usage

Command Line Interface

Basic usage - scan current directory:

c2c .

Scan specific directory:

c2c /path/to/directory

Exclude specific patterns:

c2c . -e "*.log" -e "temp/*"

Enable debug mode to see pattern matching details:

c2c . --debug

Save output to file:

c2c . > project_snapshot.txt

Python API

The package provides a flexible Python API for integration into your tools:

from c2c import scan_directory, create_delimiter

# Generate a unique delimiter
delimiter = create_delimiter()

# Basic usage with default excludes
scan_directory(
    directory=".",
    exclude_patterns=[".git"],  # Default exclude pattern
    delimiter=delimiter
)

# With custom exclude patterns
scan_directory(
    directory="/path/to/project",
    exclude_patterns=[
        ".git",  # Default
        "*.log",
        "temp/*"
    ],
    delimiter=delimiter,
    debug=True
)

Using with AI Language Models

  1. Generate a snapshot of your project:
c2c . > context.txt
  2. Use in your prompts:
Here's my project structure and contents:

[paste contents of context.txt]

Could you help me understand the code structure and suggest improvements?

The output format is specifically designed to help AI models understand:

  • Project structure and hierarchical relationships
  • File contents with clear, unambiguous boundaries
  • Complete directory hierarchy and organization
  • Metadata about excluded files and patterns

Output Format

The generated output follows this structure:

# Project Directory Contents
# Format: Files are separated by a delimiter line starting with "### FILE_[uuid]"
# Each delimiter line is followed by the file path, then the file contents.
# Note: Binary files and patterns matching any .gitignore are excluded.

# DELIMITER=### FILE_[uuid]

### FILE_[uuid] src/main.py
[contents of main.py]

### FILE_[uuid] src/utils/helper.py
[contents of helper.py]
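
Because the header announces the delimiter, a snapshot can be split back into individual files. The following is a minimal sketch, assuming the layout shown above (a `# DELIMITER=` header line, then one delimiter line per file followed by its path). `parse_snapshot` is a hypothetical helper written for illustration; it is not part of the c2c API.

```python
def parse_snapshot(text: str) -> dict[str, str]:
    """Split a c2c-style snapshot into {path: contents}.

    Assumes a '# DELIMITER=<delimiter>' header line, then blocks that
    each start with '<delimiter> <path>'. Hypothetical helper, not c2c API.
    """
    lines = text.splitlines()

    # Find the delimiter announced in the header
    delimiter = None
    for line in lines:
        if line.startswith("# DELIMITER="):
            delimiter = line[len("# DELIMITER="):].strip()
            break
    if not delimiter:
        raise ValueError("no '# DELIMITER=' header found")

    files: dict[str, list[str]] = {}
    current = None
    for line in lines:
        if line.startswith(delimiter + " "):
            # A delimiter line opens a new file block; the rest is the path
            current = line[len(delimiter) + 1:].strip()
            files[current] = []
        elif current is not None:
            files[current].append(line)

    # Join each block and drop the blank separator lines around it
    return {path: "\n".join(body).strip("\n") for path, body in files.items()}
```

Note that this sketch trims leading and trailing blank lines from each file, so it is suited to inspection rather than byte-exact restoration.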

Default Excludes

By default, c2c excludes:

  • .git directories and all Git-related files
  • Binary files (automatically detected)
  • Files matching any .gitignore patterns

You can add additional patterns using the -e flag or through the Python API.

Advanced Features

GitignoreRule Handling

The GitignoreRule system provides full Git-compatible pattern matching:

  • Base directory-specific patterns for scoped ignores
  • Negative patterns with ! for pattern negation
  • Path matching with / prefix for root-relative patterns
  • Directory-only patterns (ending with /)
  • Pattern normalization and **/ pattern support
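
To make the pattern features above concrete, here is a rough sketch of how such rules could be parsed and matched. It is an illustration only and does not reproduce c2c's `GitignoreRule` class: the names `Rule`, `parse_rule`, `matches`, and `is_ignored` are hypothetical, only a leading `**/` is handled (not `**` in the middle of a pattern), and paths are assumed to be relative, `/`-separated, and expressed relative to the directory holding the .gitignore.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Rule:
    pattern: str    # pattern text with '!', leading '/', and trailing '/' removed
    negated: bool   # line started with '!'
    dir_only: bool  # line ended with '/'
    anchored: bool  # must match from the .gitignore's own directory


def parse_rule(line: str) -> Optional[Rule]:
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no rule
    negated = line.startswith("!")
    if negated:
        line = line[1:]
    dir_only = line.endswith("/")
    line = line.rstrip("/")
    if line.startswith("**/"):
        anchored, line = False, line[3:]  # leading '**/' means "at any depth"
    else:
        anchored = "/" in line            # a remaining slash pins the pattern to the base dir
        line = line.lstrip("/")
    return Rule(line, negated, dir_only, anchored)


def matches(rule: Rule, path: str, is_dir: bool) -> bool:
    if rule.dir_only and not is_dir:
        return False
    parts = path.split("/")
    pat_parts = rule.pattern.split("/")
    # Anchored rules match the whole path; others may match any trailing slice
    candidates = [parts] if rule.anchored else [parts[i:] for i in range(len(parts))]
    # Compare component by component so '*' never crosses a '/'
    return any(
        len(c) == len(pat_parts)
        and all(fnmatchcase(seg, pat) for seg, pat in zip(c, pat_parts))
        for c in candidates
    )


def is_ignored(rules: List[Rule], path: str, is_dir: bool = False) -> bool:
    ignored = False
    for rule in rules:  # later rules override earlier ones
        if matches(rule, path, is_dir):
            ignored = not rule.negated
    return ignored
```

In this sketch a matched directory is reported as ignored, and a directory walker would be expected to skip its contents rather than test each file inside it.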

Binary File Detection

  • Attempts UTF-8 decoding to detect binary files
  • Configurable detection threshold
  • Ensures output integrity by excluding non-text content
  • Proper handling of various text encodings
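
As an illustration of decode-based detection with a threshold, the sketch below flags data as binary when it contains a NUL byte or when too large a share of it fails to decode as UTF-8. The function name `looks_binary`, the NUL-byte shortcut, and the default `threshold` of 0.1 are assumptions made for this example; c2c's actual heuristic and defaults may differ.

```python
def looks_binary(data: bytes, threshold: float = 0.1) -> bool:
    """Heuristic binary check (hypothetical, not c2c's implementation).

    `threshold` is the tolerated fraction of characters that fail to
    decode as UTF-8 before the data is considered binary.
    """
    if not data:
        return False  # empty files are treated as text
    if b"\x00" in data:
        return True   # NUL bytes rarely appear in text files
    # Undecodable byte sequences become U+FFFD replacement characters
    decoded = data.decode("utf-8", errors="replace")
    bad = decoded.count("\ufffd")  # also counts any literal U+FFFD in the text
    return bad / len(decoded) > threshold
```

Typical usage would be to read the first few kilobytes of a file in binary mode and pass them to `looks_binary` before deciding whether to include the file.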

Gitignore Processing

  • Multiple .gitignore files support with proper precedence rules
  • Pattern processing order matches Git behavior
  • Scoped rules based on .gitignore file location
  • Full support for pattern negation and complex rule combinations
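
The scoping and ordering described above can be sketched as follows. This is a simplified illustration, not c2c's code: `is_ignored_scoped` is a hypothetical name, the .gitignore files are represented as a plain `{directory: patterns}` mapping, and patterns are matched against the file name only so that the example stays focused on which rules apply and in what order.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored_scoped(path: str, gitignores: dict[str, list[str]]) -> bool:
    """Decide whether `path` is ignored, given {directory: patterns}.

    Simplified: each pattern is compared with the file name only.
    """
    target = PurePosixPath(path)
    ignored = False
    # Apply shallower .gitignore files first so deeper ones can override them
    for base in sorted(gitignores, key=lambda d: len(PurePosixPath(d).parts)):
        # A rule only applies to paths inside its .gitignore's directory
        if PurePosixPath(base) not in target.parents:
            continue
        for pattern in gitignores[base]:
            negated = pattern.startswith("!")
            if negated:
                pattern = pattern[1:]
            if fnmatchcase(target.name, pattern):
                ignored = not negated  # the last matching rule wins
    return ignored
```

With a root-level `*.log` rule and a `!keep.log` rule in `src`, `src/keep.log` is kept while a `keep.log` at the root is still ignored, because the negation is scoped to `src`.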

Contributing

We welcome contributions! Here's how you can help:

  • Submit pull requests for bug fixes or new features
  • Report bugs and suggest improvements
  • Improve documentation and examples
  • Share use cases and feature ideas

Please feel free to open issues or submit pull requests on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.
