Convert your codebase into a single LLM prompt

CodeToPrompt

codetoprompt is a powerful command-line tool that transforms local codebases, GitHub repositories, web pages, and online documents into a single, context-rich prompt optimized for Large Language Models (LLMs).

It streamlines the process of providing comprehensive context to LLMs by intelligently selecting, compressing, and formatting project files and remote content.


✨ Key Features

  • Universal Context Sources: Ingests code from local directories, GitHub repos, web pages, YouTube transcripts, ArXiv papers, and PDFs.
  • Intelligent Code Compression: Uses tree-sitter to parse code into a structural summary, drastically reducing token count while preserving the high-level architecture.
  • Interactive TUI Mode: Launch a fast, lazy-loaded terminal UI to visually select the exact files and directories you need.
  • Flexible Output Formats: Generate prompts in a simple default format, as a single Markdown file, or in Claude-friendly XML.
  • Automatic File Handling: Natively processes Jupyter Notebooks, samples large data files (like .csv or .json), and respects your .gitignore rules.
  • Powerful Filtering: Fine-tune your context with --include and --exclude glob patterns.
  • In-Depth Analysis: Run the analyse command to get a full breakdown of your project's languages, token counts, and file sizes before generating a prompt.
  • Snapshots and Diffs: Save a JSON snapshot of a project and generate a unified diff against it. The diff is copied to the clipboard by default, with only a summary shown in the terminal, or written to a file with --output.

🔧 Installation

Install from PyPI:

pip install codetoprompt

For clipboard functionality on Linux, you may need to install xclip or wl-clipboard:

# Debian/Ubuntu
sudo apt-get install xclip

# Arch Linux
sudo pacman -S xclip

🚀 Quick Start

The two core commands are codetoprompt (or ctp) for generating prompts and analyse for inspecting your project.

1. Generate a Prompt from a Local Codebase

Scan your current project, respect .gitignore, and copy a context-rich prompt to your clipboard.

# Long version
codetoprompt .

# Short version
ctp .

2. Generate a Prompt from any URL

Pass a supported URL to fetch and process remote content automatically.

# From a GitHub Repository
ctp https://github.com/yash9439/codetoprompt

# From a documentation page
ctp https://python-poetry.org/docs/

# From a YouTube video transcript
ctp https://www.youtube.com/watch?v=cAkMcPfY_Ns

3. Create a Snapshot (local only)

# Save a JSON snapshot of the current project
codetoprompt snapshot . --output snap.json

4. Diff Against a Snapshot (local only)

# Copies the full diff to the clipboard; terminal shows only a summary
codetoprompt diff . --snapshot snap.json

# Save the full diff to a file instead of copying to clipboard
codetoprompt diff . --snapshot snap.json --output diff.txt
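
The snapshot and diff workflow above can be pictured with a small standard-library sketch. It is not the tool's implementation: the snapshot layout (a top-level "files" mapping) and the helper names make_snapshot and diff_against_snapshot are assumptions made for illustration, and the real snapshot schema may differ.

```python
import difflib
import json


def make_snapshot(files: dict[str, str]) -> str:
    """Serialize {path: text} to JSON (hypothetical layout, not the tool's schema)."""
    return json.dumps({"files": files}, indent=2, sort_keys=True)


def diff_against_snapshot(snapshot_json: str, current: dict[str, str]) -> str:
    """Return one unified diff covering changed, added, and removed files."""
    old = json.loads(snapshot_json)["files"]
    chunks = []
    for path in sorted(set(old) | set(current)):
        # A file missing on one side is treated as empty on that side.
        before = old.get(path, "").splitlines(keepends=True)
        after = current.get(path, "").splitlines(keepends=True)
        chunks.extend(
            difflib.unified_diff(before, after, fromfile=f"a/{path}", tofile=f"b/{path}")
        )
    return "".join(chunks)


if __name__ == "__main__":
    snap = make_snapshot({"main.py": "print('a')\n"})
    print(diff_against_snapshot(snap, {"main.py": "print('b')\n"}))
```

Unchanged files contribute nothing to the result, so an empty string means the project still matches the snapshot.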

🧠 Features in Detail

1. Universal Context Gathering

codetoprompt can pull in context from almost anywhere.

Source Type | Example Command | Description
--- | --- | ---
Local Directory | ctp path/to/your/project | Scans a local codebase, respecting .gitignore and applying filters.
GitHub Repo | ctp https://github.com/user/repo | Fetches all text-based files and builds a complete project prompt.
Web Page | ctp https://en.wikipedia.org/wiki/API | Strips boilerplate and extracts the core text content.
YouTube Video | ctp <youtube_url> | Automatically extracts the full video transcript.
ArXiv Paper | ctp https://arxiv.org/abs/2203.02155 | Downloads the full PDF from an abstract page and extracts its text.
PDF Document | ctp <url_to_pdf> | Directly downloads and extracts text from any public PDF link.
Jupyter Notebook | (automatic) | .ipynb files in a local project are automatically converted to Python code.
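
As a rough illustration of how a single argument could be routed to one of the source types in the table, here is a minimal sketch. The function classify_target and its rules are hypothetical; the source does not describe how codetoprompt actually detects source types.

```python
from urllib.parse import urlparse


def classify_target(target: str) -> str:
    """Return a coarse source type for a path or URL (hypothetical helper)."""
    parsed = urlparse(target)
    if parsed.scheme not in ("http", "https"):
        return "local"  # anything that is not an http(s) URL is treated as a path
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "github.com":
        return "github"
    if host in ("youtube.com", "youtu.be"):
        return "youtube"
    if host == "arxiv.org":
        return "arxiv"
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "web"


if __name__ == "__main__":
    for example in (".", "https://github.com/user/repo", "https://arxiv.org/abs/2203.02155"):
        print(example, "->", classify_target(example))
```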

2. Interactive File Selection (--interactive or -i)

For ultimate control, the --interactive flag launches a Terminal User Interface (TUI) allowing you to manually select which files and directories to include. It's perfect for cherry-picking specific features or excluding noisy test files.

ctp . --interactive

Optimized for Large Projects: The interactive file tree uses lazy loading, meaning it only loads a directory's contents when you expand it. This keeps the interface fast and responsive, even in massive codebases.

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│                        FileSelectorApp                              │
│           Navigate: ↑/↓/w/s  | Expand/Collapse: ←/a/d               │
│               Toggle Select: Space | Confirm: Enter                 │
│       ✓ = All selected | - = Some selected | ◦ = None selected      │
│ ────────────────────────────────────────────────────────────────────│
│                                                                     │
│ ▶ 📁 .github                                                        │
│ ▼ - 📁 codetoprompt                                                 │
│ >   ▼ ✓ 📁 compressor                                               │
│         ✓ 📁 analysers                                              │
│         ✓ 📁 formatters                                             │
│         ✓ 📄 __init__.py                                            │
│         ✓ 📄 compressor.py                                          │
│     ◦ 📄 __init__.py                                                │
│     ◦ 📄 analysis.py                                                │
│     ◦ 📄 arg_parser.py                                              │
│     ◦ 📄 cli.py                                                     │
│     ◦ 📄 config.py                                                  │
│     ◦ 📄 core.py                                                    │
│     ◦ 📄 interactive.py                                             │
│     ◦ 📄 utils.py                                                   │
│     ◦ 📄 version.py                                                 │
│ ▶ 📁 codetoprompt.egg-info                                          │
│ ▶ 📁 tests                                                          │
│ ◦ 📄 .gitignore                                                     │
│ ◦ 📄 CHANGELOG.md                                                   │
│ ◦ 📄 CONTRIBUTING.md                                                │
│ ◦ 📄 LICENSE                                                        │
│ ◦ 📄 MANIFEST.in                                                    │
│ ◦ 📄 pyproject.toml                                                 │
│ ◦ 📄 pytest.ini                                                     │
│ ◦ 📄 README.md                                                      │
│                                                                     │
│ q Quit                                                              │
└─────────────────────────────────────────────────────────────────────┘
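
The lazy-loading idea behind the file tree can be sketched in a few lines: a node lists its directory only the first time it is expanded and reuses the result afterwards. The LazyNode class below is a hypothetical illustration built on os.scandir, not the code behind FileSelectorApp.

```python
import os


class LazyNode:
    """A file-tree node whose children are read from disk only on first expand."""

    def __init__(self, path: str):
        self.path = path
        self.is_dir = os.path.isdir(path)
        self._children = None  # None means "not loaded yet"

    @property
    def loaded(self) -> bool:
        return self._children is not None

    def expand(self) -> list["LazyNode"]:
        """List the directory on the first call; return the cached list afterwards."""
        if self._children is None:
            if self.is_dir:
                with os.scandir(self.path) as entries:
                    names = sorted(entry.name for entry in entries)
                self._children = [LazyNode(os.path.join(self.path, n)) for n in names]
            else:
                self._children = []  # files have no children
        return self._children


if __name__ == "__main__":
    root = LazyNode(".")
    for child in root.expand():  # only the top level is read here
        print(("dir  " if child.is_dir else "file ") + os.path.basename(child.path))
```

Because subdirectories stay unloaded until expand() is called on them, the cost of opening the tree depends on the size of the visible level rather than the whole project.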

3. Smart Code Compression (--compress)

For large codebases, the --compress flag is essential. It analyzes supported code files and generates a high-level summary instead of including the full code, drastically reducing the final token count.

ctp . --compress

Supported Languages: Python, JavaScript, TypeScript, Java, C, C++, and Rust. Other files (like README.md) are included in full.

Example Compressed Output for a Python File:

# File: codetoprompt/core.py
# Language: python

## Imports:
- import platform
- from pathlib import Path
- ...

## Classes:
### class CodeToPrompt:
    """Convert code files to prompt format."""
    def __init__(self, root_dir, ...): ...
    def generate_prompt(self, progress): ...
    def analyse(self, progress, top_n): ...
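
To give a feel for this kind of structural summary, the sketch below produces similar output for Python source. It swaps in Python's built-in ast module in place of tree-sitter, so it only handles Python and is not the tool's compressor; the function name summarize_python and the "## Functions:" section are assumptions for illustration.

```python
import ast


def summarize_python(source: str) -> str:
    """Return imports plus class/function signatures, dropping function bodies."""
    tree = ast.parse(source)
    imports, classes, functions = [], [], []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            imports.append(f"- {ast.unparse(node)}")
        elif isinstance(node, ast.ClassDef):
            classes.append(f"### class {node.name}:")
            doc = ast.get_docstring(node)
            if doc:
                # Keep only the first docstring line as a one-line description.
                classes.append(f'    """{doc.splitlines()[0]}"""')
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    classes.append(f"    def {item.name}({ast.unparse(item.args)}): ...")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(f"def {node.name}({ast.unparse(node.args)}): ...")
    sections = []
    if imports:
        sections.append("## Imports:\n" + "\n".join(imports))
    if classes:
        sections.append("## Classes:\n" + "\n".join(classes))
    if functions:
        sections.append("## Functions:\n" + "\n".join(functions))
    return "\n\n".join(sections)


if __name__ == "__main__":
    sample = (
        "from pathlib import Path\n\n"
        "class Reader:\n"
        '    """Read files."""\n'
        "    def read(self, path):\n"
        "        return Path(path).read_text()\n"
    )
    print(summarize_python(sample))
```

The token savings come from discarding function bodies while keeping names, signatures, and the first docstring line.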

Smart Data File Handling: codetoprompt treats .csv, .json, and .jsonl files specially to keep prompts useful without exploding token counts:

  • Small datasets pass through in full. A file is only truncated if it exceeds data_file_threshold_lines (default 50), so a 12-row config or sample dataset is included verbatim.
  • Large datasets are cut to the first 5 lines with a truncation note appended.
  • Repetitive per-record layouts are clustered. When a single directory contains more than data_file_max_per_dir files of the same data extension (default 5), only that many are kept and the rest are summarized in an Omitted Data File Groups section. This stops 50 near-identical JSON record files from drowning out the rest of your code.
  • Toggle off entirely with --no-truncate-data-files (or set truncate_data_files = false in config) to always include data files in full. Set --data-file-max-per-dir 0 to disable clustering.
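
The rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the tool's code: the helper names, the wording of the truncation note, and the exact grouping key (directory plus extension) are hypothetical, while the defaults of 50, 5 kept lines, and 5 files per directory come from the description above.

```python
from collections import defaultdict
from pathlib import PurePosixPath

DATA_EXTENSIONS = {".csv", ".json", ".jsonl"}


def truncate_data_text(text: str, threshold_lines: int = 50, keep_lines: int = 5) -> str:
    """Keep small files whole; cut larger ones to the first few lines plus a note."""
    lines = text.splitlines()
    if len(lines) <= threshold_lines:
        return text
    omitted = len(lines) - keep_lines
    note = f"... [truncated: {omitted} more lines omitted]"  # hypothetical wording
    return "\n".join(lines[:keep_lines] + [note])


def cluster_data_files(paths, max_per_dir: int = 5):
    """Split paths into (kept, omitted); a cap of 0 disables clustering."""
    if max_per_dir <= 0:
        return sorted(paths), []
    groups = defaultdict(list)
    kept = []
    for path in sorted(paths):
        p = PurePosixPath(path)
        if p.suffix.lower() in DATA_EXTENSIONS:
            groups[(str(p.parent), p.suffix.lower())].append(path)
        else:
            kept.append(path)  # non-data files are never capped
    omitted = []
    for members in groups.values():
        kept.extend(members[:max_per_dir])
        omitted.extend(members[max_per_dir:])
    return sorted(kept), sorted(omitted)


if __name__ == "__main__":
    records = [f"data/rec{i:02d}.json" for i in range(8)] + ["data/main.py"]
    kept, omitted = cluster_data_files(records)
    print("kept:", kept)
    print("omitted:", omitted)
```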

4. Adaptable Output Formats

Tailor the output for different LLMs or use cases using format flags.

Format | Flag | Description
--- | --- | ---
Default | (none) | Clean, human-readable format with file paths and fenced code blocks.
Markdown | --markdown or -m | A single Markdown document, great for viewing or sharing.
Claude XML | --cxml or -c | Wraps each file in <document> tags, a format Claude models handle exceptionally well.

Example CXML Output (-c):

<documents>
  <document index="1">
    <source>main.py</source>
    <document_content>
def main():
    print("Hello, World!")
    </document_content>
  </document>
</documents>
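
A minimal sketch of how this layout could be assembled from a mapping of paths to file contents is shown below. The function to_cxml is hypothetical, and, like the example above, it inserts file contents without XML escaping; whether the tool escapes anything is not stated in the source.

```python
def to_cxml(files: dict[str, str]) -> str:
    """Wrap each file in a numbered <document> element (contents are not escaped)."""
    parts = ["<documents>"]
    for index, (path, content) in enumerate(files.items(), start=1):
        parts.append(f'  <document index="{index}">')
        parts.append(f"    <source>{path}</source>")
        parts.append("    <document_content>")
        parts.append(content.rstrip("\n"))  # avoid a blank line before the closing tag
        parts.append("    </document_content>")
        parts.append("  </document>")
    parts.append("</documents>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_cxml({"main.py": 'def main():\n    print("Hello, World!")\n'}))
```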

5. In-Depth Project Analysis (analyse)

Before generating a prompt, get a high-level overview of your local project's composition and token count. This helps you decide which filters or compression strategies to apply.

ctp analyse .

Example Analysis:

╭────────────────── Codebase Analysis ─────────────────╮
│ Configuration for this run:                          │
│ Root Directory: .                                    │
│ Include Patterns: ['*']                              │
│ Exclude Patterns: []                                 │
│ Respect .gitignore: True                             │
╰──────────────────────────────────────────────────────╯
╭─ Overall Project Summary ─╮
│ Total Files: 47           │
│ Total Lines: 6,033        │
│ Total Tokens: 49,834      │
╰───────────────────────────╯
             Analysis by File Type (Top 10)
┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Extension ┃ Files ┃ Tokens ┃ Lines ┃ Avg Tokens/File ┃
┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ .py       │    32 │ 37,901 │ 4,650 │           1,184 │
│ .md       │     3 │  5,069 │   544 │           1,690 │
│ .<no_ext> │     3 │  4,807 │   559 │           1,602 │
│ .toml     │     1 │    827 │   117 │             827 │
│ .txt      │     4 │    582 │    68 │             146 │
│ .yml      │     1 │    361 │    56 │             361 │
│ .yaml     │     1 │    229 │    30 │             229 │
│ .ini      │     1 │     45 │     6 │              45 │
│ .in       │     1 │     13 │     3 │              13 │
└───────────┴───────┴────────┴───────┴─────────────────┘
               Largest Files by Tokens (Top 10)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ File Path                                 ┃ Tokens ┃ Lines ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ codetoprompt/core.py                      │  5,037 │   538 │
│ codetoprompt.egg-info/PKG-INFO            │  4,453 │   493 │
│ README.md                                 │  2,910 │   285 │
│ codetoprompt/compressor/analysers/cpp.py  │  2,276 │   272 │
│ codetoprompt/compressor/analysers/rust.py │  2,214 │   271 │
│ codetoprompt/interactive.py               │  2,125 │   270 │
│ codetoprompt/compressor/analysers/java.py │  2,090 │   245 │
│ codetoprompt/cli.py                       │  1,877 │   207 │
│ tests/test_core.py                        │  1,743 │   179 │
│ CHANGELOG.md                              │  1,701 │   183 │
└───────────────────────────────────────────┴────────┴───────┘
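
The per-extension table is a grouped sum with an average column. A rough sketch of that aggregation is shown below, assuming per-file token and line counts are already available; the function name, the input shape, and the use of Python's round() for the average are assumptions, and the tool's own tokenizer is not reproduced here.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_by_extension(files):
    """files: iterable of (path, tokens, lines). Returns rows sorted by tokens, descending."""
    totals = defaultdict(lambda: {"files": 0, "tokens": 0, "lines": 0})
    for path, tokens, lines in files:
        # Files without a suffix (LICENSE, .gitignore) share one bucket.
        ext = PurePosixPath(path).suffix or ".<no_ext>"
        row = totals[ext]
        row["files"] += 1
        row["tokens"] += tokens
        row["lines"] += lines
    rows = [
        {"extension": ext, **row, "avg_tokens": round(row["tokens"] / row["files"])}
        for ext, row in totals.items()
    ]
    return sorted(rows, key=lambda r: r["tokens"], reverse=True)


if __name__ == "__main__":
    sample = [
        ("codetoprompt/core.py", 5037, 538),
        ("codetoprompt/cli.py", 1877, 207),
        ("README.md", 2910, 285),
    ]
    for row in summarize_by_extension(sample):
        print(row)
```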

🎛️ Command-Line Reference

Here is the full list of options for the main codetoprompt command.

Option | Alias | Description | Scope
--- | --- | --- | ---
--output <file> | | Save the prompt to a file instead of the clipboard. | All
--markdown | -m | Format output as a single Markdown document. | All
--cxml | -c | Format output using Claude-friendly XML tags. | All
--max-tokens <num> | | Warn if token count exceeds this limit. Does not truncate. | All
--include <pats> | | Comma-separated glob patterns for files to include (e.g., "*.py,*.js"). | Local
--exclude <pats> | | Comma-separated glob patterns for files to exclude (e.g., "*.pyc,dist/*"). | Local
--interactive | -i | Launch an interactive TUI to select files. | Local
--compress | | Use smart code compression to summarize files. | Local
--show-line-numbers | | Prepend line numbers to code. | Local
--file-max-lines <num> | | Truncate any file exceeding this many lines. | Local
--file-max-bytes <num> | | Truncate any file exceeding this many bytes. | Local
--no-truncate-data-files | | Always include data files (.csv/.json/.jsonl) in full. | Local
--data-file-threshold-lines <num> | | Only truncate data files larger than this. Default: 50. | Local
--data-file-max-per-dir <num> | | Cap on same-extension data files kept per directory; 0 disables. Default: 5. | Local
--respect-gitignore | | Respect .gitignore rules (default). Use --no-respect-gitignore to disable. | Local
--tree-depth <num> | | Set the maximum depth for the project structure tree. | Local
--version | -v | Display the installed version number. | N/A
--help | -h | Show the help message and exit. | N/A
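
To make the include and exclude semantics concrete, here is a small sketch of comma-separated glob filtering using the standard fnmatch module. It assumes that exclusion wins over inclusion and that patterns are matched against the whole relative path; the source does not specify either, and parse_patterns and is_selected are hypothetical helpers.

```python
from fnmatch import fnmatch


def parse_patterns(value: str) -> list[str]:
    """Split a comma-separated option value into clean glob patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def is_selected(path: str, include: list[str], exclude: list[str]) -> bool:
    """Keep a path if it matches an include pattern and no exclude pattern."""
    if any(fnmatch(path, pattern) for pattern in exclude):
        return False
    return any(fnmatch(path, pattern) for pattern in include)


if __name__ == "__main__":
    include = parse_patterns("*.py,*.js")
    exclude = parse_patterns("*.pyc,dist/*")
    for candidate in ("src/app.py", "dist/bundle.js", "README.md"):
        print(candidate, is_selected(candidate, include, exclude))
```

Note that fnmatch lets "*" match path separators, so "*.py" selects Python files at any depth in this sketch.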

Subcommands

  • Analyse: codetoprompt analyse <PATH> [--include ...] [--exclude ...]
  • Snapshot: codetoprompt snapshot <PATH> --output <snapshot.json> [--include ...] [--exclude ...] [--respect-gitignore|--no-respect-gitignore]
  • Diff: codetoprompt diff <PATH> --snapshot <snapshot.json> [--use-snapshot-filters] [--include ...] [--exclude ...] [--output <file>]

⚙️ Configuration

Set your preferred defaults once using the config command. Settings are saved in ~/.config/codetoprompt/config.toml.

  • Interactive Wizard: ctp config
  • Show Current Config: ctp config --show
  • Reset to Defaults: ctp config --reset

Additional data-file settings (configurable via the wizard or directly in config.toml):

  • Truncate Data Files: truncate_data_files (default: true). Master toggle.
  • Data File Threshold Lines: data_file_threshold_lines (default: 50). Files at or below this size are kept in full even when truncation is on.
  • Data File Max Per Dir: data_file_max_per_dir (default: 5). When a directory contains more same-extension data files than this, only the first N (sorted) are processed and the rest are summarized in the prompt. Set to 0 to disable.

Additional snapshot-related settings:

  • Snapshot Max Bytes: snapshot_max_bytes (default: 3 MB). If a text file exceeds this size, its content is not inlined into the snapshot.
  • Snapshot Max Lines: snapshot_max_lines (default: 20,000). If a text file exceeds this line count, its content is not inlined into the snapshot.
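
The two snapshot limits amount to a simple size check before a file's content is stored. The sketch below illustrates that check; the function should_inline is hypothetical, and reading "3 MB" as 3 × 1024 × 1024 bytes is an assumption, since the source does not say whether the limit is decimal or binary.

```python
SNAPSHOT_MAX_BYTES = 3 * 1024 * 1024  # assumption: "3 MB" interpreted as 3 MiB
SNAPSHOT_MAX_LINES = 20_000


def should_inline(text: str,
                  max_bytes: int = SNAPSHOT_MAX_BYTES,
                  max_lines: int = SNAPSHOT_MAX_LINES) -> bool:
    """Return True when the content is within both limits and can be stored inline."""
    size = len(text.encode("utf-8"))
    line_count = len(text.splitlines())
    return size <= max_bytes and line_count <= max_lines


if __name__ == "__main__":
    print(should_inline("print('hello')\n"))  # a small file fits both limits
    print(should_inline("row\n" * 25_000))    # too many lines
```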

The snapshot command always requires --output. The diff command copies the diff to the clipboard by default and prints only a summary to the terminal; pass --output <file> to write the diff to a file instead.


🐍 Python API

Use codetoprompt programmatically in your own Python scripts for maximum flexibility.

from codetoprompt import CodeToPrompt
# conceptual imports for a full use-case
# from some_llm_library import LlmClient 

# 1. Process a local directory with compression and XML format
ctp_local = CodeToPrompt(
    target="path/to/your/project",
    compress=True,
    output_format="cxml",
    exclude_patterns=["tests/*", "docs/*"]
)
prompt = ctp_local.generate_prompt()
analysis = ctp_local.analyse()

print(f"Project Analysis: {analysis['overall']}")
print(f"Generated a prompt with {ctp_local.get_token_count()} tokens.")

# 2. Conceptually, you'd then use this with an LLM client
# client = LlmClient(api_key="...")
# response = client.completions.create(
#     model="claude-3-opus-20240229", # CXML is great for Claude
#     messages=[
#         {"role": "user", "content": f"Here is a codebase:\n{prompt}\nPlease explain the main purpose of the `core.py` file."},
#     ]
# )
# print(response)

# 3. Process a remote URL
ctp_remote = CodeToPrompt(target="https://github.com/yash9439/codetoprompt")
remote_prompt = ctp_remote.generate_prompt()
print(f"Generated a prompt from GitHub with {ctp_remote.get_token_count()} tokens.")

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for development setup and guidelines. Feel free to open a PR or issue to get started.

📄 License

This project is licensed under the MIT License. See the LICENSE file for full details.
