Convert your codebase into a single LLM prompt
CodeToPrompt
codetoprompt is a powerful command-line tool that transforms local codebases, GitHub repositories, web pages, and online documents into a single, context-rich prompt optimized for Large Language Models (LLMs).
It streamlines the process of providing comprehensive context to LLMs by intelligently selecting, compressing, and formatting project files and remote content.
Key Features
- Universal Context Sources: Ingests code from local directories, GitHub repos, web pages, YouTube transcripts, ArXiv papers, and PDFs.
- Intelligent Code Compression: Uses tree-sitter to parse code into a structural summary, drastically reducing token count while preserving the high-level architecture.
- Interactive TUI Mode: Launch a fast, lazy-loaded terminal UI to visually select the exact files and directories you need.
- Flexible Output Formats: Generate prompts in a simple default format, as a single Markdown file, or in Claude-friendly XML.
- Automatic File Handling: Natively processes Jupyter Notebooks, samples large data files (like .csv or .json), and respects your .gitignore rules.
- Powerful Filtering: Fine-tune your context with --include and --exclude glob patterns.
- In-Depth Analysis: Run the analyse command to get a full breakdown of your project's languages, token counts, and file sizes before generating a prompt.
- Snapshots and Diffs: Save a JSON snapshot of a project and generate a unified diff against it. By default the diff is copied to the clipboard (only a summary is shown in the terminal); pass --output to write it to a file instead.
Installation
Install from PyPI:
pip install codetoprompt
For clipboard functionality on Linux, you may need to install xclip or wl-clipboard:
# Debian/Ubuntu
sudo apt-get install xclip
# Arch Linux
sudo pacman -S xclip
Quick Start
The two core commands are codetoprompt (or ctp) for generating prompts and analyse for inspecting your project.
1. Generate a Prompt from a Local Codebase
Scan your current project, respect .gitignore, and copy a context-rich prompt to your clipboard.
# Long version
codetoprompt .
# Short version
ctp .
2. Generate a Prompt from any URL
Pass a supported URL to fetch and process remote content automatically.
# From a GitHub Repository
ctp https://github.com/yash9439/codetoprompt
# From a documentation page
ctp https://python-poetry.org/docs/
# From a YouTube video transcript
ctp https://www.youtube.com/watch?v=cAkMcPfY_Ns
3. Create a Snapshot (local only)
# Save a JSON snapshot of the current project
codetoprompt snapshot . --output snap.json
4. Diff Against a Snapshot (local only)
# Copies the full diff to the clipboard; terminal shows only a summary
codetoprompt diff . --snapshot snap.json
# Save the full diff to a file instead of copying to clipboard
codetoprompt diff . --snapshot snap.json --output diff.txt
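The snapshot and diff workflow above can be pictured with a small standard-library sketch. This is an illustration only, not codetoprompt's implementation: the function names `take_snapshot` and `diff_against_snapshot` and the flat path-to-content snapshot layout are assumptions made for the example, and the real snapshot.json schema may differ.

```python
import difflib
import json
from pathlib import Path


def take_snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its text content."""
    snapshot = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            snapshot[path.relative_to(root).as_posix()] = path.read_text(
                encoding="utf-8", errors="replace"
            )
    return snapshot


def diff_against_snapshot(root: Path, snapshot: dict[str, str]) -> str:
    """Return a unified diff from the snapshot state to the current state."""
    current = take_snapshot(root)
    chunks = []
    # Visit files present on either side so additions and deletions both show up.
    for name in sorted(set(snapshot) | set(current)):
        old = snapshot.get(name, "").splitlines(keepends=True)
        new = current.get(name, "").splitlines(keepends=True)
        chunks.extend(
            difflib.unified_diff(old, new, fromfile=f"a/{name}", tofile=f"b/{name}")
        )
    return "".join(chunks)


def save_snapshot(root: Path, target: Path) -> None:
    """Write the snapshot as JSON (hypothetical layout, for illustration)."""
    target.write_text(json.dumps(take_snapshot(root), indent=2), encoding="utf-8")
```

Unchanged files produce no diff output, so an untouched project yields an empty string.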
Features in Detail
1. Universal Context Gathering
codetoprompt can pull in context from almost anywhere.
| Source Type | Example Command | Description |
|---|---|---|
| Local Directory | ctp path/to/your/project | Scans a local codebase, respecting .gitignore and applying filters. |
| GitHub Repo | ctp https://github.com/user/repo | Fetches all text-based files and builds a complete project prompt. |
| Web Page | ctp https://en.wikipedia.org/wiki/API | Strips boilerplate and extracts the core text content. |
| YouTube Video | ctp <youtube_url> | Automatically extracts the full video transcript. |
| ArXiv Paper | ctp https://arxiv.org/abs/2203.02155 | Downloads the full PDF from an abstract page and extracts its text. |
| PDF Document | ctp <url_to_pdf> | Directly downloads and extracts text from any public PDF link. |
| Jupyter Notebook | (automatic) | .ipynb files in a local project are automatically converted to Python code. |
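To make the notebook row concrete, here is a minimal sketch of one way an .ipynb file could be flattened into Python source. It relies only on the public notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`); the function name `notebook_to_python` is hypothetical, and codetoprompt's own conversion may handle markdown cells or outputs differently.

```python
import json


def notebook_to_python(notebook_json: str) -> str:
    """Concatenate the code cells of an .ipynb document into one script."""
    notebook = json.loads(notebook_json)
    blocks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped in this sketch
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        blocks.append(source.rstrip())
    return "\n\n".join(blocks) + "\n"
```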
2. Interactive File Selection (--interactive or -i)
For ultimate control, the --interactive flag launches a Terminal User Interface (TUI) allowing you to manually select which files and directories to include. It's perfect for cherry-picking specific features or excluding noisy test files.
ctp . --interactive
Optimized for Large Projects: The interactive file tree uses lazy loading, meaning it only loads a directory's contents when you expand it. This keeps the interface fast and responsive, even in massive codebases.
──────────────────────────────────────────────────────────────────────
FileSelectorApp
Navigate: ↑/↓/w/s | Expand/Collapse: ↔/a/d
Toggle Select: Space | Confirm: Enter
✓ = All selected | - = Some selected | ◦ = None selected
──────────────────────────────────────────────────────────────────────
  ▶ 📁 .github
  ▼ - 📁 codetoprompt
  >   ▼ ✓ 📁 compressor
          ✓ 📁 analysers
          ✓ 📁 formatters
          ✓ 📄 __init__.py
          ✓ 📄 compressor.py
      ◦ 📄 __init__.py
      ◦ 📄 analysis.py
      ◦ 📄 arg_parser.py
      ◦ 📄 cli.py
      ◦ 📄 config.py
      ◦ 📄 core.py
      ◦ 📄 interactive.py
      ◦ 📄 utils.py
      ◦ 📄 version.py
  ▶ 📁 codetoprompt.egg-info
  ▶ 📁 tests
  ◦ 📄 .gitignore
  ◦ 📄 CHANGELOG.md
  ◦ 📄 CONTRIBUTING.md
  ◦ 📄 LICENSE
  ◦ 📄 MANIFEST.in
  ◦ 📄 pyproject.toml
  ◦ 📄 pytest.ini
  ◦ 📄 README.md

q Quit
──────────────────────────────────────────────────────────────────────
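The lazy-loading idea behind the file tree can be sketched in a few lines. This is an illustrative model rather than the TUI's actual code: the `LazyNode` class is hypothetical, and it only shows that a directory is read from disk the first time it is expanded.

```python
import os
from dataclasses import dataclass, field


@dataclass
class LazyNode:
    """A file-tree node whose children are read from disk on first expand."""

    path: str
    is_dir: bool
    children: list["LazyNode"] = field(default_factory=list)
    loaded: bool = False

    def expand(self) -> list["LazyNode"]:
        # Only touch the filesystem once, and only for directories.
        if self.is_dir and not self.loaded:
            with os.scandir(self.path) as entries:
                self.children = sorted(
                    (LazyNode(e.path, e.is_dir()) for e in entries),
                    # Directories first, then alphabetical by name.
                    key=lambda n: (not n.is_dir, os.path.basename(n.path)),
                )
            self.loaded = True
        return self.children
```

Because subdirectories stay unloaded until `expand()` is called on them, opening the root of a very large repository costs a single directory listing.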
3. Smart Code Compression (--compress)
For large codebases, the --compress flag is essential. It analyzes supported code files and generates a high-level summary instead of including the full code, drastically reducing the final token count.
ctp . --compress
Supported Languages: Python, JavaScript, TypeScript, Java, C, C++, and Rust. Other files (like README.md) are included in full.
Example Compressed Output for a Python File:
# File: codetoprompt/core.py
# Language: python
## Imports:
- import platform
- from pathlib import Path
- ...
## Classes:
### class CodeToPrompt:
"""Convert code files to prompt format."""
def __init__(self, root_dir, ...): ...
def generate_prompt(self, progress): ...
def analyse(self, progress, top_n): ...
Smart Data File Handling:
codetoprompt treats .csv, .json, and .jsonl files specially to keep prompts useful without exploding token counts:
- Small datasets pass through in full. A file is only truncated if it exceeds data_file_threshold_lines (default 50), so a 12-row config or sample dataset is included verbatim.
- Large datasets are cut to the first 5 lines with a truncation note appended.
- Repetitive per-record layouts are clustered. When a single directory contains more than data_file_max_per_dir files of the same data extension (default 5), only that many are kept and the rest are summarized in an Omitted Data File Groups section. This stops 50 near-identical JSON record files from drowning out the rest of your code.
- Toggle off entirely with --no-truncate-data-files (or set truncate_data_files = false in config) to always include data files in full. Set --data-file-max-per-dir 0 to disable clustering.
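The threshold rule described above can be written down as a short sketch. The defaults (50 and 5) come from the text; the function name `truncate_data_file` and the exact wording of the truncation note are assumptions for illustration, not the tool's real output.

```python
def truncate_data_file(text: str, threshold_lines: int = 50, head_lines: int = 5) -> str:
    """Keep small data files whole; cut large ones to their first few lines."""
    lines = text.splitlines()
    if len(lines) <= threshold_lines:
        return text  # at or below the threshold: included verbatim
    omitted = len(lines) - head_lines
    # Hypothetical note format; the real tool's wording may differ.
    note = f"... [truncated: {omitted} more lines omitted]"
    return "\n".join(lines[:head_lines] + [note])
```

A 12-row file passes through unchanged, while a 51-row file comes back as its first five rows plus the note.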
4. Adaptable Output Formats
Tailor the output for different LLMs or use cases using format flags.
| Format | Flag | Description |
|---|---|---|
| Default | (none) | Clean, human-readable format with file paths and fenced code blocks. |
| Markdown | --markdown or -m | A single Markdown document, great for viewing or sharing. |
| Claude XML | --cxml or -c | Wraps each file in <document> tags, a format Claude models handle exceptionally well. |
Example CXML Output (-c):
<documents>
<document index="1">
<source>main.py</source>
<document_content>
def main():
print("Hello, World!")
</document_content>
</document>
</documents>
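For readers who want to produce the same layout by hand, the following sketch builds that structure from a mapping of file paths to contents. The helper name `to_cxml` is hypothetical, and, like the example above, it inserts file content without XML escaping; whether codetoprompt escapes special characters is not stated here.

```python
def to_cxml(files: dict[str, str]) -> str:
    """Wrap each file in indexed <document> tags inside a <documents> root."""
    parts = ["<documents>"]
    for index, (source, content) in enumerate(files.items(), start=1):
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{source}</source>")
        parts.append("<document_content>")
        parts.append(content.rstrip("\n"))  # content is inserted as-is, unescaped
        parts.append("</document_content>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)
```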
5. In-Depth Project Analysis (analyse)
Before generating a prompt, get a high-level overview of your local project's composition and token count. This helps you decide which filters or compression strategies to apply.
ctp analyse .
Example Analysis:
╭───────────────── Codebase Analysis ──────────────────╮
│ Configuration for this run:                          │
│ Root Directory: .                                    │
│ Include Patterns: ['*']                              │
│ Exclude Patterns: []                                 │
│ Respect .gitignore: True                             │
╰──────────────────────────────────────────────────────╯
╭─ Overall Project Summary ─╮
│ Total Files: 47           │
│ Total Lines: 6,033        │
│ Total Tokens: 49,834      │
╰───────────────────────────╯
Analysis by File Type (Top 10)
┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Extension ┃ Files ┃ Tokens ┃ Lines ┃ Avg Tokens/File ┃
┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ .py       │    32 │ 37,901 │ 4,650 │           1,184 │
│ .md       │     3 │  5,069 │   544 │           1,690 │
│ .<no_ext> │     3 │  4,807 │   559 │           1,602 │
│ .toml     │     1 │    827 │   117 │             827 │
│ .txt      │     4 │    582 │    68 │             146 │
│ .yml      │     1 │    361 │    56 │             361 │
│ .yaml     │     1 │    229 │    30 │             229 │
│ .ini      │     1 │     45 │     6 │              45 │
│ .in       │     1 │     13 │     3 │              13 │
└───────────┴───────┴────────┴───────┴─────────────────┘
Largest Files by Tokens (Top 10)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ File Path                                 ┃ Tokens ┃ Lines ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ codetoprompt/core.py                      │  5,037 │   538 │
│ codetoprompt.egg-info/PKG-INFO            │  4,453 │   493 │
│ README.md                                 │  2,910 │   285 │
│ codetoprompt/compressor/analysers/cpp.py  │  2,276 │   272 │
│ codetoprompt/compressor/analysers/rust.py │  2,214 │   271 │
│ codetoprompt/interactive.py               │  2,125 │   270 │
│ codetoprompt/compressor/analysers/java.py │  2,090 │   245 │
│ codetoprompt/cli.py                       │  1,877 │   207 │
│ tests/test_core.py                        │  1,743 │   179 │
│ CHANGELOG.md                              │  1,701 │   183 │
└───────────────────────────────────────────┴────────┴───────┘
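The per-extension table is essentially a group-by over the project's files. The sketch below shows that aggregation under two loud assumptions: the function name `analyse_by_extension` is hypothetical, and whitespace-separated words stand in for tokens because the tokenizer codetoprompt uses is not specified in this document, so the numbers will not match the tool's token counts.

```python
from collections import defaultdict
from pathlib import Path


def analyse_by_extension(root: Path) -> dict[str, dict[str, int]]:
    """Count files, lines, and approximate tokens per file extension."""
    stats = defaultdict(lambda: {"files": 0, "lines": 0, "tokens": 0})
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Files without a suffix (LICENSE, .gitignore) share one bucket.
        entry = stats[path.suffix or ".<no_ext>"]
        entry["files"] += 1
        entry["lines"] += len(text.splitlines())
        # Assumption: word count is only a rough proxy for real token counts.
        entry["tokens"] += len(text.split())
    return dict(stats)
```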
Command-Line Reference
Here is the full list of options for the main codetoprompt command.
| Option | Alias | Description | Scope |
|---|---|---|---|
| --output <file> | | Save the prompt to a file instead of the clipboard. | All |
| --markdown | -m | Format output as a single Markdown document. | All |
| --cxml | -c | Format output using Claude-friendly XML tags. | All |
| --max-tokens <num> | | Warn if token count exceeds this limit. Does not truncate. | All |
| --include <pats> | | Comma-separated glob patterns for files to include (e.g., "*.py,*.js"). | Local |
| --exclude <pats> | | Comma-separated glob patterns for files to exclude (e.g., "*.pyc,dist/*"). | Local |
| --interactive | -i | Launch an interactive TUI to select files. | Local |
| --compress | | Use smart code compression to summarize files. | Local |
| --show-line-numbers | | Prepend line numbers to code. | Local |
| --file-max-lines <num> | | Truncate any file exceeding this many lines. | Local |
| --file-max-bytes <num> | | Truncate any file exceeding this many bytes. | Local |
| --no-truncate-data-files | | Always include data files (.csv/.json/.jsonl) in full. | Local |
| --data-file-threshold-lines <num> | | Only truncate data files larger than this. Default: 50. | Local |
| --data-file-max-per-dir <num> | | Cap on same-extension data files kept per directory; 0 disables. Default: 5. | Local |
| --respect-gitignore | | Respect .gitignore rules (default). Use --no-respect-gitignore to disable. | Local |
| --tree-depth <num> | | Set the maximum depth for the project structure tree. | Local |
| --version | -v | Display the installed version number. | N/A |
| --help | -h | Show the help message and exit. | N/A |
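As a rough mental model for --include and --exclude, the sketch below splits a comma-separated pattern string and tests relative paths with Python's standard `fnmatch`. The helper names are hypothetical and the precedence (exclude wins over include) is an assumption; codetoprompt's actual matching rules may differ, for example in how it treats directory patterns.

```python
from fnmatch import fnmatch


def parse_patterns(value: str) -> list[str]:
    """Split a comma-separated option value into clean glob patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def is_selected(rel_path: str, include: list[str], exclude: list[str]) -> bool:
    """Assumed rule: a path must match an include and no exclude pattern."""
    if any(fnmatch(rel_path, pat) for pat in exclude):
        return False
    return any(fnmatch(rel_path, pat) for pat in include)
```

Note that `fnmatch` lets `*` match path separators, so a pattern such as `*.py` also matches `src/app.py` in this sketch.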
Subcommands
- Analyse: codetoprompt analyse <PATH> [--include ...] [--exclude ...]
- Snapshot: codetoprompt snapshot <PATH> --output <snapshot.json> [--include ...] [--exclude ...] [--respect-gitignore|--no-respect-gitignore]
- Diff: codetoprompt diff <PATH> --snapshot <snapshot.json> [--use-snapshot-filters] [--include ...] [--exclude ...] [--output <file>]
Configuration
Set your preferred defaults once using the config command. Settings are saved in ~/.config/codetoprompt/config.toml.
- Interactive Wizard: ctp config
- Show Current Config: ctp config --show
- Reset to Defaults: ctp config --reset
Additional data-file settings (configurable via the wizard or directly in config.toml):
- Truncate Data Files: truncate_data_files (default: true). Master toggle.
- Data File Threshold Lines: data_file_threshold_lines (default: 50). Files at or below this size are kept in full even when truncation is on.
- Data File Max Per Dir: data_file_max_per_dir (default: 5). When a directory contains more same-extension data files than this, only the first N (sorted) are processed and the rest are summarized in the prompt. Set to 0 to disable.
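The per-directory cap can be illustrated with a small grouping sketch. It follows the rule as described (sort, keep the first N files per directory and extension, count the rest), but the function name `cluster_data_files` and the shape of the returned summary are assumptions, not the tool's internal API.

```python
from collections import defaultdict
from pathlib import PurePosixPath

DATA_EXTENSIONS = {".csv", ".json", ".jsonl"}


def cluster_data_files(paths, max_per_dir=5):
    """Return (kept, omitted); omitted maps (directory, extension) to a count."""
    if max_per_dir <= 0:  # 0 disables clustering
        return sorted(paths), {}
    groups = defaultdict(list)
    kept = []
    for p in sorted(paths):
        path = PurePosixPath(p)
        if path.suffix in DATA_EXTENSIONS:
            groups[(str(path.parent), path.suffix)].append(p)
        else:
            kept.append(p)  # non-data files are never capped
    omitted = {}
    for key, members in groups.items():
        kept.extend(members[:max_per_dir])
        if len(members) > max_per_dir:
            omitted[key] = len(members) - max_per_dir
    return sorted(kept), omitted
```

With eight JSON records in one directory and the default cap of 5, three files end up in the omitted summary.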
Additional snapshot-related settings:
- Snapshot Max Bytes: snapshot_max_bytes (default: 3 MB). If a text file exceeds this size, its content is not inlined into the snapshot.
- Snapshot Max Lines: snapshot_max_lines (default: 20,000). If a text file exceeds this line count, its content is not inlined into the snapshot.
Snapshot always requires --output. Diff copies to clipboard by default (summary only printed to terminal). Provide --output <file> to write the diff to a file instead of copying.
Python API
Use codetoprompt programmatically in your own Python scripts for maximum flexibility.
from codetoprompt import CodeToPrompt
# conceptual imports for a full use-case
# from some_llm_library import LlmClient
# 1. Process a local directory with compression and XML format
ctp_local = CodeToPrompt(
target="path/to/your/project",
compress=True,
output_format="cxml",
exclude_patterns=["tests/*", "docs/*"]
)
prompt = ctp_local.generate_prompt()
analysis = ctp_local.analyse()
print(f"Project Analysis: {analysis['overall']}")
print(f"Generated a prompt with {ctp_local.get_token_count()} tokens.")
# 2. Conceptually, you'd then use this with an LLM client
# client = LlmClient(api_key="...")
# response = client.completions.create(
# model="claude-3-opus-20240229", # CXML is great for Claude
# messages=[
# {"role": "user", "content": f"Here is a codebase:\n{prompt}\nPlease explain the main purpose of the `core.py` file."},
# ]
# )
# print(response)
# 3. Process a remote URL
ctp_remote = CodeToPrompt(target="https://github.com/yash9439/codetoprompt")
remote_prompt = ctp_remote.generate_prompt()
print(f"Generated a prompt from GitHub with {ctp_remote.get_token_count()} tokens.")
Contributing
We welcome contributions! Please see CONTRIBUTING.md for development setup and guidelines. Feel free to open a PR or issue to get started.
License
This project is licensed under the MIT License. See the LICENSE file for full details.