collect-context: Makes the process of collecting and sending context to an LLM like ChatGPT-4o as easy as possible.
ccontext
ccontext (collect-context) is a cross-platform utility designed to streamline the process of gathering and sending the context of a directory to large language models (LLMs) like ChatGPT-4o. Our mission is to make collecting and sending context to an LLM as easy as possible.
🚀 Demo: Witness ccontext in Action! 🎥
⚠️ Warning: You May Be Amazed! 🤯
https://github.com/user-attachments/assets/c0a98dbc-d971-41dc-abe1-dad4be42e1ee
Features
- 🌟 Easy Setup: Quick installation and configuration.
- 🌍 Cross-Platform Support: Supports Windows, macOS, and Linux.
- 💾 Binary File Support: Handle various binary files including PDFs, Word documents, images, audio, and video files.
- 📄 Markdown and PDF Generation: Generate detailed Markdown and PDF files of the directory structure and file contents.
- 🌐 Crawling of (documentation) Sites: Crawl and gather data from multiple sites using a specified list of URLs.
- ✂️ Tokenization and Chunking: Automatically handles tokenization and chunking to stay within LLM token limits.
- 🔧 Configurable Exclusions and Inclusions: Flexibly specify which files and directories to include or exclude.
- 🗣️ Verbose Output: Optional verbose mode for detailed output and debugging.
- 📝 Prompt Templates (Upcoming): Create and use custom templates for different types of prompts.
Table of Contents
- Installation
- Usage
- Configuration
- Binary File Handling
- Document Crawling
- Use Cases and Examples
- Troubleshooting
- Development Guide
Installation
Using pipx (Recommended)
We recommend installing ccontext using pipx. pipx is a tool that lets you install and run Python applications in isolated environments, ensuring clean installation and easy management of CLI applications.
- First, install pipx if you haven't already:
  # On macOS
  brew install pipx
  pipx ensurepath
  # On Ubuntu/Debian
  sudo apt install pipx
  pipx ensurepath
  # On Windows
  python -m pip install --user pipx
  python -m pipx ensurepath
  # or read https://pipx.pypa.io/stable/installation/#on-windows
- Install ccontext using pipx:
  pipx install ccontext
Why use pipx?
- Isolated Environment: Each application runs in its own virtual environment
- No Dependency Conflicts: Avoids conflicts with other Python packages
- Easy Updates: Simple command to upgrade (pipx upgrade ccontext)
- Clean Uninstallation: Remove everything with one command (pipx uninstall ccontext)
- Global Access: Installed applications are available system-wide
Alternative: Installing from Source
If you prefer to install from source:
- Clone the repository:
  git clone https://github.com/oxillix/ccontext.git
  cd ccontext
- Set up a virtual environment:
  python3 -m venv venv
  source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install dependencies:
  pip install -r requirements.txt
- Install the package:
  pip install .
Usage
Basic Usage
- Run ccontext in the folder you want to collect, using the default settings defined in ~/.ccontext/config.json:
  ccontext
- Specify a root path, exclusions, and inclusions:
  ccontext -p /path/to/directory -e ".git|node_modules" -i "important_file.txt|docs"
Command-Line Arguments
- -h, --help: Show help message.
- -p, --root_path: The root path to start the directory tree (default: current directory).
- -e, --excludes: Additional files or directories to exclude, separated by |, e.g., node_modules|.git.
- -i, --includes: Files or directories to include, separated by |, e.g., important_file.txt|docs.
- -m, --max_tokens: Maximum number of tokens allowed before chunking.
- -c, --config: Path to a custom configuration file.
- -v, --verbose: Enable verbose output to stdout.
- -ig, --ignore_gitignore: Ignore the .gitignore file for exclusions.
- -g, --generate-pdf: Generate a PDF of the directory tree and file contents.
- -gm, --generate-md: Generate a Markdown file of the directory tree and file contents.
- --crawl: Crawls the sites specified in the config.
Example
ccontext -p /home/user/project -e ".git|build" -i "README.md|src"
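The |-separated values passed to -e and -i can be pictured with a small argparse sketch. This is an illustration only, not ccontext's actual parser: the program name `ccontext-sketch` and the helper `split_patterns` are hypothetical, and only a subset of the documented flags is mirrored.

```python
import argparse
from typing import List


def split_patterns(value: str) -> List[str]:
    """Split a '|'-separated argument into a list, dropping empty entries."""
    return [part for part in value.split("|") if part]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring a few of the documented flags.
    parser = argparse.ArgumentParser(prog="ccontext-sketch")
    parser.add_argument("-p", "--root_path", default=".")
    parser.add_argument("-e", "--excludes", type=split_patterns, default=[])
    parser.add_argument("-i", "--includes", type=split_patterns, default=[])
    parser.add_argument("-m", "--max_tokens", type=int, default=None)
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["-p", "/home/user/project", "-e", ".git|build", "-i", "README.md|src"]
)
print(args.excludes)  # ['.git', 'build']
print(args.includes)  # ['README.md', 'src']
```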
Configuration
Configuration File Location
ccontext looks for configuration in the following order:
- Custom config file specified via the -c argument
- .ccontext-config.json in the current directory
  - If present, ccontext will automatically detect and use this local configuration file
  - Create this file in the same directory where you run the ccontext command
- ~/.ccontext/config.json (default user configuration)
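The lookup order above can be sketched as a first-match search. The function name `resolve_config_path` is hypothetical, and falling through to the next candidate when a file is missing is an assumption, not confirmed ccontext behavior.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(
    cli_config: Optional[str], cwd: Path, home: Path
) -> Optional[Path]:
    """Return the first existing config file in the documented order.

    Assumption: a missing candidate is skipped rather than treated as an error.
    """
    candidates = []
    if cli_config:
        candidates.append(Path(cli_config))  # 1. path given via -c
    candidates.append(cwd / ".ccontext-config.json")  # 2. local project config
    candidates.append(home / ".ccontext" / "config.json")  # 3. user default
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A call such as `resolve_config_path(None, Path.cwd(), Path.home())` would then return the local file if it exists and the user default otherwise.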
Configuration Options
{
"verbose": false, // Enable detailed output
"max_tokens": 115000, // Maximum tokens before chunking
"model_type": "gpt-4o", // LLM model type for tokenization
"buffer_size": 0.05, // Token buffer size (0-1)
// System prompt for LLM context
"context_prompt": "[[SYSTEM INSTRUCTIONS]] The following output represents...",
// Web crawler configuration
"urls_to_crawl": [
{
"url": "https://www.django-rest-framework.org/",
"match": ["https://www.django-rest-framework.org/**"],
"exclude": ["https://www.django-rest-framework.org/community/**"],
"selector": "",
"maxPagesToCrawl": 100,
"outputFileName": "django-rest-framework.org.json",
"maxTokens": 10000000
}
],
// Files/folders to explicitly include
"included_folders_files": [],
// Files/folders to exclude (supports glob patterns)
"excluded_folders_files": [
"**/.git",
"**/bin",
"**/build",
"**/node_modules/**",
"**/venv",
"**/__pycache__",
"**/package-lock.json",
"**/ccontext.egg-info",
"**/dist",
"**/__tests__",
"**/coverage",
"**/.next",
"**/pnpm-lock.yaml",
"**/poetry.lock",
"**/ccontext-output.pdf",
"**/ccontext-output.md",
"**/*.phpstorm.meta.php",
"**/*.min.js",
"**/composer.lock",
"**/*.lock",
"**/vendor",
"**/laravel_access.log",
"**/gpt-crawler",
"**/*.DS_Store",
"**/*.tox"
],
// File extensions that can be uploaded to LLMs
"uploadable_extensions": [
// Documents
".pdf",
".doc",
".docx",
".xls",
".xlsx",
".ppt",
".pptx",
// Images
".jpg",
".jpeg",
".png",
".gif",
".bmp",
".tiff",
".webp",
".heic",
// Audio
".mp3",
".wav",
".ogg",
".flac",
".aac",
".m4a",
// Video
".mp4",
".mkv",
".avi",
".mov",
".wmv",
".webm",
// Archives
".zip",
".rar",
".7z",
".tar",
".gz",
// Binary/System
".exe",
".dll",
".iso",
".dmg",
".bin",
".dat",
".apk",
".img",
".so",
".swf",
".psd"
]
}
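The // comments in the listing above are annotations, and a strict JSON parser such as Python's json module rejects them. Whether ccontext accepts comments in its config file is not stated here, so the following is only a sketch of how such a file could be read: the helper `strip_line_comments` is hypothetical and removes // comments while leaving // inside strings (for example in URLs) untouched.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside of JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip everything up to (but not including) the newline.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """{
  "verbose": false, // Enable detailed output
  "urls_to_crawl": [{"url": "https://www.django-rest-framework.org/"}]
}"""
config = json.loads(strip_line_comments(annotated))
print(config["verbose"])  # False
```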
Understanding Glob Patterns
ccontext uses the wcmatch library for glob pattern matching, which gives you powerful but easy-to-use file matching capabilities. Here's a simple guide to using glob patterns:
- Important Wildcards Explained:
  - * (single star): Matches anything in the current folder only
    "*.txt" # Matches: a.txt, b.txt (in current folder)
    "*.txt" # Won't match: sub/a.txt, deep/sub/b.txt
  - ** (double star): Matches any number of folders
    "**/temp" # Matches: temp, sub/temp, deep/sub/temp
    "**/temp" # Won't match: temp/file.txt
  - **/* (double star slash star): Matches everything in all folders
    "**/*.txt" # Matches: a.txt, sub/b.txt, very/deep/c.txt
    "**/*" # Matches everything, everywhere
  - ? matches any single character
  - .txt matches exact file extension
- Simple Examples:
  {
    "excluded_folders_files": [
      // Basic matching
      "temp.txt", // Matches exact file temp.txt
      "*.txt", // Matches all .txt files in root folder
      "**/*.txt", // Matches all .txt files in any folder
      // Folder matching
      "temp/*", // Matches everything in temp folder
      "**/temp", // Matches temp folder anywhere
      "**/temp/**", // Matches everything in any temp folder
      // Common use cases
      "**/node_modules", // Matches node_modules folders anywhere
      "**/__pycache__", // Matches Python cache folders
      "**/*.pyc", // Matches Python compiled files
      "build/*" // Matches everything in build folder
    ]
  }
- Tips for Beginners:
  - Start simple! Use *.ext for file extensions
  - Use **/ when you want to match in any folder
  - Test your patterns with a small folder first
  - When in doubt, be more specific
  - Remember, patterns are case-sensitive
The glob system is very forgiving - if you make a mistake, it usually just won't match anything rather than causing errors. Feel free to experiment!
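To make the wildcard rules concrete, here is a simplified model of them using only the standard library. It is not wcmatch's implementation and ignores most of what wcmatch supports; the functions `glob_to_regex` and `matches` are hypothetical and cover only *, **, and ? on /-separated paths.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate *, ** and ? into a regular expression ('/' separates folders)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including nested folders
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within a single folder
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(path: str, pattern: str) -> bool:
    """Case-sensitive match of a whole path against a pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


print(matches("a.txt", "*.txt"))  # True
print(matches("sub/a.txt", "*.txt"))  # False
print(matches("deep/sub/temp", "**/temp"))  # True
print(matches("temp/file.txt", "**/temp"))  # False
```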
Configuration Options Explained
| Option | Description | Default |
|---|---|---|
| verbose | Enable detailed output | false |
| max_tokens | Maximum tokens before chunking | 115000 |
| model_type | LLM model type for tokenization | "gpt-4o" |
| buffer_size | Token buffer size (0-1) | 0.05 |
| excluded_folders_files | Glob patterns for exclusion | ["**/.git", ...] |
| included_folders_files | Glob patterns for inclusion | [] |
| uploadable_extensions | File extensions to upload | [".pdf", ...] |
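One way to picture how max_tokens and buffer_size might interact is the sketch below. It assumes the buffer is a fraction of max_tokens held back as a safety margin, and it uses whitespace-separated words as a stand-in for real model tokens; neither assumption is confirmed as ccontext's behavior, and the function names are hypothetical.

```python
from typing import List


def effective_budget(max_tokens: int, buffer_size: float) -> int:
    """Tokens usable per chunk after reserving the buffer (assumed semantics)."""
    if not 0 <= buffer_size < 1:
        raise ValueError("buffer_size must be in [0, 1)")
    return max(1, int(max_tokens * (1 - buffer_size)))


def chunk_text(text: str, max_tokens: int, buffer_size: float) -> List[str]:
    """Split text into chunks of at most effective_budget() words."""
    words = text.split()  # stand-in tokenizer, not a real LLM tokenizer
    budget = effective_budget(max_tokens, buffer_size)
    return [
        " ".join(words[start:start + budget])
        for start in range(0, len(words), budget)
    ]


print(chunk_text("a b c d e", max_tokens=4, buffer_size=0.5))
# ['a b', 'c d', 'e']
```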
Binary File Handling
ccontext supports handling binary files through the uploadable_extensions configuration.
Supported Binary Files
- Documents: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx
- Images: .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .heic
- Audio: .mp3, .wav, .ogg, .flac, .aac, .m4a
- Video: .mp4, .mkv, .avi, .mov, .wmv, .webm
- Archives: .zip, .rar, .7z, .tar, .gz
- Binary/System: .exe, .dll, .iso, .dmg, .bin, .dat, .apk, .img, .so, .swf, .psd
Binary File Processing
- Binary files matching uploadable_extensions are prepared for upload to LLMs
- File references are automatically copied to the clipboard
- Most LLM providers cap the number of binary files per prompt; check your provider's limit
- Rate limits may apply based on your LLM provider
Example configuration for handling specific file types:
{
"uploadable_extensions": [".pdf", ".jpg", ".png", ".xlsx"]
}
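As a rough illustration of how an extension list like the one above could separate uploadable files from files whose text is collected, consider this sketch. The function `partition_files` is hypothetical, and comparing extensions case-insensitively is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple

# Mirrors the example configuration above.
UPLOADABLE_EXTENSIONS: Set[str] = {".pdf", ".jpg", ".png", ".xlsx"}


def partition_files(
    paths: Iterable[str], uploadable: Set[str] = UPLOADABLE_EXTENSIONS
) -> Tuple[List[str], List[str]]:
    """Return (files to upload, files to read as text)."""
    uploads: List[str] = []
    text_files: List[str] = []
    for path in paths:
        # Assumption: extensions are matched case-insensitively.
        if Path(path).suffix.lower() in uploadable:
            uploads.append(path)
        else:
            text_files.append(path)
    return uploads, text_files


uploads, text_files = partition_files(["report.PDF", "src/main.py", "img/logo.png"])
print(uploads)  # ['report.PDF', 'img/logo.png']
print(text_files)  # ['src/main.py']
```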
Document Crawling
The crawling feature allows you to gather documentation from websites for context.
Crawler Configuration
{
"urls_to_crawl": [
{
"url": "https://docs.example.com",
"match": ["https://docs.example.com/**"],
"exclude": ["https://docs.example.com/internal/**"],
"selector": "",
"maxPagesToCrawl": 100,
"outputFileName": "docs.json",
"maxTokens": 2000000
}
]
}
Crawler Options
- url: Starting URL for crawling
- match: Glob patterns for URLs to include
- exclude: Glob patterns for URLs to exclude
- selector: CSS selector for content extraction
- maxPagesToCrawl: Limit on pages to crawl
- outputFileName: Name of output file
- maxTokens: Maximum tokens to collect
Best Practices
- Use specific match patterns
- Respect robots.txt and site policies
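The interplay of match, exclude, and maxPagesToCrawl can be sketched as a URL filter. This uses the standard library's fnmatch as a stand-in for whatever glob engine the crawler really uses, and the functions `should_crawl` and `select_urls` are hypothetical; giving exclude patterns priority over match patterns is also an assumption.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def should_crawl(url: str, match: List[str], exclude: List[str]) -> bool:
    """A URL is crawled if it matches a 'match' glob and no 'exclude' glob."""
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(url, pattern) for pattern in match)


def select_urls(
    candidates: Iterable[str], match: List[str], exclude: List[str], max_pages: int
) -> List[str]:
    """Keep matching URLs in order, stopping once max_pages is reached."""
    selected: List[str] = []
    for url in candidates:
        if len(selected) >= max_pages:
            break
        if should_crawl(url, match, exclude):
            selected.append(url)
    return selected


urls = [
    "https://docs.example.com/guide/intro",
    "https://docs.example.com/internal/secret",
    "https://other.example.com/page",
    "https://docs.example.com/api/reference",
]
print(
    select_urls(
        urls,
        match=["https://docs.example.com/**"],
        exclude=["https://docs.example.com/internal/**"],
        max_pages=100,
    )
)
# ['https://docs.example.com/guide/intro', 'https://docs.example.com/api/reference']
```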
Use Cases and Examples
Common Usage Patterns
- Analyzing a Python Project
ccontext -p /path/to/project -e "venv|__pycache__|*.pyc"
- Processing Documentation
ccontext -p ./docs --crawl -gm
- Including Specific Files
ccontext -i "README.md|docs/*|*.py"
- Generating PDF and Markdown
ccontext -g -gm # Generates both PDF and Markdown
Integration Examples
- With GitHub Copilot
ccontext -p . -e "node_modules|dist" -i "src/**/*.ts"
- With ChatGPT (the web app has a 32k-token maximum)
ccontext -p . --max_tokens 32000
Troubleshooting
Common Issues
- Clipboard Issues in SSH
  - Issue: Cannot copy to clipboard in SSH session
  - Solution:
    - Use SSH with X11 forwarding (ssh -X user@host), test using xeyes
    - On Mac, install XQuartz (brew install --cask xquartz)
- Token Limit Exceeded
  - Issue: Content too large for LLM
  - Solution: Adjust max_tokens or use chunking feature
- Binary File Handling
  - Issue: Binary files not being processed
  - Solution: Check uploadable_extensions configuration
Platform-Specific Issues
Windows: Use WSL if possible!
Otherwise:
- Issue: Path separators in configuration
- Solution: Use forward slashes or escaped backslashes
Linux
- Issue: X11 clipboard support
- Solution: Install xclip or xsel
macOS
- Issue: Clipboard permissions
- Solution: Grant terminal app accessibility permissions
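For the Linux clipboard issue, a quick check for the two tools named above might look like the sketch below. The function `clipboard_hint` is hypothetical and is not part of ccontext; it only tests whether xclip or xsel is on the PATH.

```python
import shutil
import sys
from typing import Callable, Optional


def clipboard_hint(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a hint if no known clipboard tool is found on Linux, else None."""
    if platform.startswith("linux"):
        if which("xclip") or which("xsel"):
            return None
        return "Install xclip or xsel to enable clipboard support."
    # Other platforms are not checked in this sketch.
    return None


hint = clipboard_hint()
if hint:
    print(hint)
```

The `platform` and `which` parameters exist only so the check can be exercised without changing the machine it runs on.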
Development Guide
Project Structure
ccontext/
├── ccontext/ # Main package directory
│ ├── __init__.py
│ ├── main.py # Entry point
│ ├── file_tree.py # Tree operations
│ └── ...
├── tests/ # Test directory
├── docs/ # Documentation
└── examples/ # Example configurations
Development Setup
- Clone the repository
- Create a virtual environment
- Install development dependencies
- Run tests
git clone https://github.com/oxillix/ccontext.git
# or
git clone git@github.com:NicolasArnouts/ccontext.git
cd ccontext
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -e .
Contributing Guidelines
- Fork the repository
- Create a feature branch
- Write tests for new features
- Submit a pull request
Code Style
- Follow PEP 8 guidelines
- Use isort and black
- Use type hints
- Keep functions focused and small
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Thanks to all contributors! 😊
- Inspired by the need for better context handling in AI interactions.
- Built with love and passion for the developer community! 💖
Feel free to raise issues or contribute to the project. We appreciate your support!
Happy coding adventures! 🚀 Nicolas Arnouts
Looking for a skilled freelancer? I’m available for hire! Let’s collaborate — reach out to me at: arnouts.software@gmail.com