collect-context: Makes the process of collecting and sending context to an LLM like ChatGPT-4o as easy as possible.

ccontext

ccontext (collect-context) is a cross-platform utility designed to streamline the process of gathering and sending the context of a directory to large language models (LLMs) like ChatGPT-4o. Our mission is to make collecting and sending context to an LLM as easy as possible.

🚀 Demo: Witness ccontext in Action! 🎥

⚠️ Warning: You May Be Amazed! 🤯

https://github.com/user-attachments/assets/c0a98dbc-d971-41dc-abe1-dad4be42e1ee

Features

  • 🌟 Easy Setup: Quick installation and configuration.
  • 🌍 Cross-Platform Support: Supports Windows, macOS, and Linux.
  • 💾 Binary File Support: Handle various binary files including PDFs, Word documents, images, audio, and video files.
  • 📄 Markdown and PDF Generation: Generate detailed Markdown and PDF files of the directory structure and file contents.
  • 🌐 Crawling of (documentation) Sites: Crawl and gather data from multiple sites using a specified list of URLs.
  • ✂️ Tokenization and Chunking: Automatically handles tokenization and chunking to stay within LLM token limits.
  • 🔧 Configurable Exclusions and Inclusions: Flexibly specify which files and directories to include or exclude.
  • 🗣️ Verbose Output: Optional verbose mode for detailed output and debugging.
  • 📝 Prompt Templates (Upcoming): Create and use custom templates for different types of prompts.

Installation

Using pipx (Recommended)

We recommend installing ccontext using pipx. pipx is a tool that lets you install and run Python applications in isolated environments, ensuring clean installation and easy management of CLI applications.

  1. First, install pipx if you haven't already:

    # On macOS
    brew install pipx
    pipx ensurepath
    
    # On Ubuntu/Debian
    sudo apt install pipx
    pipx ensurepath
    
    # On Windows
    python -m pip install --user pipx
    python -m pipx ensurepath
    # or read https://pipx.pypa.io/stable/installation/#on-windows
    
  2. Install ccontext using pipx:

    pipx install ccontext
    

Why use pipx?

  • Isolated Environment: Each application runs in its own virtual environment
  • No Dependency Conflicts: Avoids conflicts with other Python packages
  • Easy Updates: Simple command to upgrade (pipx upgrade ccontext)
  • Clean Uninstallation: Remove everything with one command (pipx uninstall ccontext)
  • Global Access: Installed applications are available system-wide

Alternative: Installing from Source

If you prefer to install from source:

  1. Clone the repository:

    git clone https://github.com/oxillix/ccontext.git
    cd ccontext
    
  2. Set up a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Install the package:

    pip install .
    

Usage

Basic Usage

  1. Run ccontext in the folder you want to collect, using the default settings defined in ~/.ccontext/config.json:

    ccontext
    
  2. Specify a root path, exclusions, and inclusions:

    ccontext -p /path/to/directory -e ".git|node_modules" -i "important_file.txt|docs"
    

Command-Line Arguments

  • -h, --help: Show help message.
  • -p, --root_path: The root path to start the directory tree (default: current directory).
  • -e, --excludes: Additional files or directories to exclude, separated by |, e.g., node_modules|.git.
  • -i, --includes: Files or directories to include, separated by |, e.g., important_file.txt|docs.
  • -m, --max_tokens: Maximum number of tokens allowed before chunking.
  • -c, --config: Path to a custom configuration file.
  • -v, --verbose: Enable verbose output to stdout.
  • -ig, --ignore_gitignore: Ignore the .gitignore file for exclusions.
  • -g, --generate-pdf: Generate a PDF of the directory tree and file contents.
  • -gm, --generate-md: Generate a Markdown file of the directory tree and file contents.
  • --crawl: Crawls the sites specified in the config.

Example

ccontext -p /home/user/project -e ".git|build" -i "README.md|src"
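
When ccontext is driven from a script, the |-separated values can be assembled from ordinary lists. The following is a minimal sketch that uses only the flags documented above; `build_ccontext_command` is a hypothetical helper, not part of ccontext itself.

```python
from typing import List, Optional, Sequence


def build_ccontext_command(
    root_path: str = ".",
    excludes: Sequence[str] = (),
    includes: Sequence[str] = (),
    max_tokens: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for the ccontext CLI (hypothetical helper)."""
    command = ["ccontext", "-p", root_path]
    if excludes:
        # -e expects a single string with entries separated by "|"
        command += ["-e", "|".join(excludes)]
    if includes:
        # -i uses the same "|"-separated format
        command += ["-i", "|".join(includes)]
    if max_tokens is not None:
        command += ["-m", str(max_tokens)]
    return command


if __name__ == "__main__":
    print(build_ccontext_command("/home/user/project", [".git", "build"], ["README.md", "src"]))
```

Passing the resulting list to `subprocess.run` avoids having to quote the `|` characters for a shell.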

Configuration

Configuration File Location

ccontext looks for configuration in the following order:

  1. Custom config file specified via -c argument
  2. .ccontext-config.json in the current directory
    • If present, ccontext will automatically detect and use this local configuration file
    • Create this file in the same directory where you run the ccontext command
  3. ~/.ccontext/config.json (default user configuration)
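
The lookup order above can be sketched as a small function. This is an illustration of the documented precedence only, not ccontext's actual code; `resolve_config_path` and its parameters are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(
    cli_config: Optional[str] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Path:
    """Return the config file that would win under the documented order."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()

    # 1. A file passed via -c always wins.
    if cli_config:
        return Path(cli_config)

    # 2. A local .ccontext-config.json in the directory where the command runs.
    local = cwd / ".ccontext-config.json"
    if local.is_file():
        return local

    # 3. Fall back to the default user configuration.
    return home / ".ccontext" / "config.json"


if __name__ == "__main__":
    print(resolve_config_path())
```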

Configuration Options

{
  "verbose": false, // Enable detailed output
  "max_tokens": 115000, // Maximum tokens before chunking
  "model_type": "gpt-4o", // LLM model type for tokenization
  "buffer_size": 0.05, // Token buffer size (0-1)

  // System prompt for LLM context
  "context_prompt": "[[SYSTEM INSTRUCTIONS]] The following output represents...",

  // Web crawler configuration
  "urls_to_crawl": [
    {
      "url": "https://www.django-rest-framework.org/",
      "match": ["https://www.django-rest-framework.org/**"],
      "exclude": ["https://www.django-rest-framework.org/community/**"],
      "selector": "",
      "maxPagesToCrawl": 100,
      "outputFileName": "django-rest-framework.org.json",
      "maxTokens": 10000000
    }
  ],

  // Files/folders to explicitly include
  "included_folders_files": [],

  // Files/folders to exclude (supports glob patterns)
  "excluded_folders_files": [
    "**/.git",
    "**/bin",
    "**/build",
    "**/node_modules/**",
    "**/venv",
    "**/__pycache__",
    "**/package-lock.json",
    "**/ccontext.egg-info",
    "**/dist",
    "**/__tests__",
    "**/coverage",
    "**/.next",
    "**/pnpm-lock.yaml",
    "**/poetry.lock",
    "**/ccontext-output.pdf",
    "**/ccontext-output.md",
    "**/*.phpstorm.meta.php",
    "**/*.min.js",
    "**/composer.lock",
    "**/*.lock",
    "**/vendor",
    "**/laravel_access.log",
    "**/gpt-crawler",
    "**/*.DS_Store",
    "**/*.tox"
  ],

  // File extensions that can be uploaded to LLMs
  "uploadable_extensions": [
    // Documents
    ".pdf",
    ".doc",
    ".docx",
    ".xls",
    ".xlsx",
    ".ppt",
    ".pptx",

    // Images
    ".jpg",
    ".jpeg",
    ".png",
    ".gif",
    ".bmp",
    ".tiff",
    ".webp",
    ".heic",

    // Audio
    ".mp3",
    ".wav",
    ".ogg",
    ".flac",
    ".aac",
    ".m4a",

    // Video
    ".mp4",
    ".mkv",
    ".avi",
    ".mov",
    ".wmv",
    ".webm",

    // Archives
    ".zip",
    ".rar",
    ".7z",
    ".tar",
    ".gz",

    // Binary/System
    ".exe",
    ".dll",
    ".iso",
    ".dmg",
    ".bin",
    ".dat",
    ".apk",
    ".img",
    ".so",
    ".swf",
    ".psd"
  ]
}

Understanding Glob Patterns

ccontext uses the wcmatch library for glob pattern matching, which gives you powerful but easy-to-use file matching capabilities. Here's a simple guide to using glob patterns:

  1. Important Wildcards Explained:

    • * (single star): Matches anything in the current folder only

      "*.txt"      # Matches: a.txt, b.txt  (in current folder)
      "*.txt"      # Won't match: sub/a.txt, deep/sub/b.txt
      
    • ** (double star): Matches any number of folders

      "**/temp"    # Matches: temp, sub/temp, deep/sub/temp
      "**/temp"    # Won't match: temp/file.txt
      
    • **/* (double star slash star): Matches everything in all folders

      "**/*.txt"   # Matches: a.txt, sub/b.txt, very/deep/c.txt
      "**/*"       # Matches everything, everywhere
      
    • ? matches any single character

    • .txt matches exact file extension

  2. Simple Examples:

    {
      "excluded_folders_files": [
        // Basic matching
        "temp.txt", // Matches exact file temp.txt
        "*.txt", // Matches all .txt files in root folder
        "**/*.txt", // Matches all .txt files in any folder
    
        // Folder matching
        "temp/*", // Matches everything in temp folder
        "**/temp", // Matches temp folder anywhere
        "**/temp/**", // Matches everything in any temp folder
    
        // Common use cases
        "**/node_modules", // Matches node_modules folders anywhere
        "**/__pycache__", // Matches Python cache folders
        "**/*.pyc", // Matches Python compiled files
        "build/*" // Matches everything in build folder
      ]
    }
    
  3. Tips for Beginners:

    • Start simple! Use *.ext for file extensions
    • Use **/ when you want to match in any folder
    • Test your patterns with a small folder first
    • When in doubt, be more specific
    • Remember, patterns are case-sensitive

The glob system is very forgiving - if you make a mistake, it usually just won't match anything rather than causing errors. Feel free to experiment!
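
To make the wildcard rules above concrete, here is a small stand-alone sketch that translates a pattern into a regular expression. It is not how wcmatch or ccontext implements matching; `glob_to_regex` and `matches` are hypothetical helpers, and details such as hidden-file handling, brace expansion, and character classes are deliberately ignored.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified glob (*, **, ?) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # exactly one character, not a slash
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def matches(path: str, pattern: str) -> bool:
    """True if the whole path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    for path in ["a.txt", "sub/a.txt", "temp", "deep/sub/temp", "temp/file.txt"]:
        print(path, matches(path, "*.txt"), matches(path, "**/temp"))
```

For real patterns, test against ccontext itself in a small folder, as the tips above suggest.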

Configuration Options Explained

Option                   Description                        Default
verbose                  Enable detailed output             false
max_tokens               Maximum tokens before chunking     115000
model_type               LLM model type for tokenization    "gpt-4o"
buffer_size              Token buffer size (0-1)            0.05
excluded_folders_files   Glob patterns for exclusion        [".git", ...]
included_folders_files   Glob patterns for inclusion        []
uploadable_extensions    File extensions to upload          [".pdf", ...]
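
One plausible way max_tokens and buffer_size could interact is sketched below. This is an assumption, not ccontext's confirmed behavior: the sketch treats buffer_size as a fraction of max_tokens held back as a safety margin, and it uses whitespace-separated words as a stand-in for real model tokens. `chunk_tokens` is a hypothetical helper.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_tokens: int, buffer_size: float) -> List[List[str]]:
    """Split tokens into chunks that stay below max_tokens minus a buffer.

    Assumption: buffer_size is the fraction of max_tokens kept in reserve.
    """
    if not 0 <= buffer_size < 1:
        raise ValueError("buffer_size must be in the range [0, 1)")
    budget = int(max_tokens * (1 - buffer_size))
    if budget < 1:
        raise ValueError("max_tokens is too small for the requested buffer")
    return [tokens[i:i + budget] for i in range(0, len(tokens), budget)]


if __name__ == "__main__":
    # Whitespace splitting is only a stand-in for a model-specific tokenizer.
    words = ("lorem ipsum " * 30).split()
    chunks = chunk_tokens(words, max_tokens=10, buffer_size=0.5)
    print(len(chunks), [len(chunk) for chunk in chunks])
```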

Binary File Handling

ccontext supports handling binary files through the uploadable_extensions configuration.

Supported Binary Files

  • Documents: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx
  • Images: .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .heic
  • Audio: .mp3, .wav, .ogg, .flac, .aac, .m4a
  • Video: .mp4, .mkv, .avi, .mov, .wmv, .webm
  • Archives: .zip, .rar, .7z, .tar, .gz
  • Binary/System: .exe, .dll, .iso, .dmg, .bin, .dat, .apk, .img, .so, .swf, .psd

Binary File Processing

  • Binary files matching uploadable_extensions are prepared for upload to LLMs
  • File references are automatically copied to clipboard
  • Most LLM providers cap the number of binary files per prompt; check your provider's limit
  • Rate limits may apply based on your LLM provider

Example configuration for handling specific file types:

{
  "uploadable_extensions": [".pdf", ".jpg", ".png", ".xlsx"]
}
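
As an illustration of what the uploadable_extensions list decides, the sketch below separates file paths into uploadable binaries and everything else. `split_by_uploadable` is a hypothetical helper, and the case-insensitive comparison is an assumption rather than documented ccontext behavior.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple

# Mirrors the example configuration above.
UPLOADABLE_EXTENSIONS: Set[str] = {".pdf", ".jpg", ".png", ".xlsx"}


def split_by_uploadable(
    paths: Iterable[str],
    uploadable_extensions: Set[str] = UPLOADABLE_EXTENSIONS,
) -> Tuple[List[str], List[str]]:
    """Return (uploadable, other) based on each path's final extension."""
    uploadable: List[str] = []
    other: List[str] = []
    for path in paths:
        # Assumption: extensions are compared case-insensitively.
        if Path(path).suffix.lower() in uploadable_extensions:
            uploadable.append(path)
        else:
            other.append(path)
    return uploadable, other


if __name__ == "__main__":
    print(split_by_uploadable(["report.PDF", "src/main.py", "logo.png"]))
```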

Document Crawling

The crawling feature allows you to gather documentation from websites for context.

Crawler Configuration

{
  "urls_to_crawl": [
    {
      "url": "https://docs.example.com",
      "match": ["https://docs.example.com/**"],
      "exclude": ["https://docs.example.com/internal/**"],
      "selector": "",
      "maxPagesToCrawl": 100,
      "outputFileName": "docs.json",
      "maxTokens": 2000000
    }
  ]
}

Crawler Options

  • url: Starting URL for crawling
  • match: Glob patterns for URLs to include
  • exclude: Glob patterns for URLs to exclude
  • selector: CSS selector for content extraction
  • maxPagesToCrawl: Limit on pages to crawl
  • outputFileName: Name of output file
  • maxTokens: Maximum tokens to collect
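
The interplay of match and exclude can be sketched as follows. This uses Python's standard `fnmatch` module as a stand-in for the crawler's own glob matching, so edge cases may differ; `should_crawl` is a hypothetical helper, and letting exclude take precedence over match is an assumption.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def should_crawl(url: str, match: Sequence[str], exclude: Sequence[str] = ()) -> bool:
    """Return True if url fits a match pattern and no exclude pattern."""
    # Assumption: an exclude hit always wins over a match hit.
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(url, pattern) for pattern in match)


if __name__ == "__main__":
    match = ["https://docs.example.com/**"]
    exclude = ["https://docs.example.com/internal/**"]
    for url in [
        "https://docs.example.com/guide/intro",
        "https://docs.example.com/internal/secrets",
        "https://other.example.com/page",
    ]:
        print(url, should_crawl(url, match, exclude))
```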

Best Practices

  • Use specific match patterns
  • Respect robots.txt and site policies

Use Cases and Examples

Common Usage Patterns

  1. Analyzing a Python Project

    ccontext -p /path/to/project -e "venv|__pycache__|*.pyc"

  2. Processing Documentation

    ccontext -p ./docs --crawl -gm

  3. Including Specific Files

    ccontext -i "README.md|docs/*|*.py"

  4. Generating PDF and Markdown

    ccontext -g -gm  # Generates both PDF and Markdown

Integration Examples

  1. With GitHub Copilot

    ccontext -p . -e "node_modules|dist" -i "src/**/*.ts"

  2. With ChatGPT (the web app accepts at most 32k tokens)

    ccontext -p . --max_tokens 32000

Troubleshooting

Common Issues

  1. Clipboard Issues in SSH

    • Issue: Cannot copy to clipboard in SSH session
    • Solution:
      • Use SSH with X11 forwarding (ssh -X user@host); verify that forwarding works by running xeyes
      • On Mac, install XQuartz (brew install --cask xquartz)
  2. Token Limit Exceeded

    • Issue: Content too large for LLM
    • Solution: Adjust max_tokens or use the chunking feature
  3. Binary File Handling

    • Issue: Binary files not being processed
    • Solution: Check uploadable_extensions configuration

Platform-Specific Issues

Windows: Use WSL if possible!

Otherwise:

  • Issue: Path separators in configuration
  • Solution: Use forward slashes or escaped backslashes
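
Both options can be produced with the Python standard library. A minimal sketch, with `to_config_path` as a hypothetical helper name:

```python
import json
from pathlib import PureWindowsPath


def to_config_path(windows_path: str) -> str:
    """Convert a Windows-style path to forward slashes for use in JSON config."""
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    raw = r"C:\Users\me\project\build"
    # Option 1: forward slashes
    print(to_config_path(raw))
    # Option 2: escaped backslashes, as json.dumps writes them
    print(json.dumps(raw))
```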

Linux

  • Issue: X11 clipboard support
  • Solution: Install xclip or xsel

macOS

  • Issue: Clipboard permissions
  • Solution: Grant terminal app accessibility permissions

Development Guide

Project Structure

ccontext/
├── ccontext/           # Main package directory
│   ├── __init__.py
│   ├── main.py         # Entry point
│   ├── file_tree.py    # Tree operations
│   └── ...
├── tests/              # Test directory
├── docs/               # Documentation
└── examples/           # Example configurations

Development Setup

  1. Clone the repository
  2. Create a virtual environment
  3. Install development dependencies
  4. Run tests

    git clone https://github.com/oxillix/ccontext.git
    # or
    git clone git@github.com:NicolasArnouts/ccontext.git
    cd ccontext
    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
    pip3 install -e .

Contributing Guidelines

  1. Fork the repository
  2. Create a feature branch
  3. Write tests for new features
  4. Submit a pull request

Code Style

  • Follow PEP 8 guidelines
  • Use isort and black
  • Use type hints
  • Keep functions focused and small

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Thanks to all contributors! 😊
  • Inspired by the need for better context handling in AI interactions.
  • Built with love and passion for the developer community! 💖

Feel free to raise issues or contribute to the project. We appreciate your support!

Happy coding adventures! 🚀 Nicolas Arnouts

Looking for a skilled freelancer? I’m available for hire! Let’s collaborate — reach out to me at: arnouts.software@gmail.com

