Convert code repositories into structured PDF collections for LLM collaboration.
pixcode
📉 SAVE UP TO 90% TOKENS
Turn Codebases into Visual Context for Multimodal LLMs
Based on DeepSeek-OCR research and local benchmarking, encoding code visually (as PDF) can use far fewer tokens than plain-text ingestion for very large repositories.
📖 Introduction
pixcode is a developer tool designed to bridge the gap between large code repositories and Multimodal Large Language Models.
Instead of feeding raw text that consumes massive context windows, pixcode converts your repository into a structured, hierarchical set of PDFs. This allows you to:
- Save up to 90% of Tokens: Visual encoding can be far more token-efficient than text tokenization.
- Test for Free: Easily share your entire codebase with premium models (like Claude Opus 4.6) on platforms like arena.ai without hitting text limits.
🚀 Why Visual Code? (The 90% Claim)
Traditional RAG (Retrieval-Augmented Generation) relies on raw text. However, recent research (including the DeepSeek-OCR paper) indicates that visual encoders can represent dense information more efficiently than textual tokenizers.
- Text Tokenization: 1 page of dense code ≈ 500-800 text tokens.
- Visual Tokenization: 1 page of code (PDF image) ≈ Fixed patch count (e.g., 85-256 tokens depending on the model).
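To make the headline figure concrete, here is a small worked calculation using only the per-page estimates quoted above. The function name `token_savings` is illustrative and not part of pixcode; the inputs are the README's estimates, not measurements.

```python
def token_savings(text_tokens: int, visual_tokens: int) -> float:
    """Fraction of tokens saved when a page is sent as an image instead of text."""
    if text_tokens <= 0:
        raise ValueError("text_tokens must be positive")
    return 1.0 - visual_tokens / text_tokens


# Per-page figures quoted above (model-dependent estimates).
best_case = token_savings(text_tokens=800, visual_tokens=85)    # densest text, fewest patches
worst_case = token_savings(text_tokens=500, visual_tokens=256)  # lightest text, most patches

print(f"best case:  {best_case:.0%}")   # 89%
print(f"worst case: {worst_case:.0%}")  # 49%
```

With these numbers the savings range from roughly 49% to 89% per page, which is why the claim is phrased as "up to 90%": the actual figure depends on how dense the code is and how many image tokens the target model spends per page.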
pixcode creates a layered PDF structure:
- Macro View (00_INDEX.pdf): A visual map of the directory tree and project statistics.
- Micro View (File PDFs): Syntax-highlighted, line-numbered renderings of individual code files.
This approach enables an Agentic workflow: Read the Index -> Identify relevant files -> Ingest only specific PDFs.
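The workflow above can be sketched as a small helper that always returns the index first and then only the file PDFs judged relevant. This is a hypothetical illustration, not part of pixcode: in practice the model picks files after reading the index, and the keyword match here is only a stand-in for that step. `select_pdfs` and `keywords` are assumed names.

```python
from pathlib import Path

INDEX_NAME = "00_INDEX.pdf"  # index file name as documented in this README


def select_pdfs(output_dir: Path, keywords: list[str]) -> list[Path]:
    """Return the index first, then only the file PDFs whose names match a keyword."""
    index = output_dir / INDEX_NAME
    selected = [index] if index.exists() else []
    for pdf in sorted(output_dir.glob("*.pdf")):
        if pdf.name == INDEX_NAME:
            continue  # already placed at the front
        # Stand-in for "identify relevant files": simple case-insensitive substring match.
        if any(k.lower() in pdf.name.lower() for k in keywords):
            selected.append(pdf)
    return selected
```

For example, `select_pdfs(Path("pixcode_output/pixcode"), ["cli"])` would return the index plus any PDF whose name contains "cli", leaving the rest of the repository out of the context window.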
✨ Features
- 📉 High Efficiency: Drastically reduces context window usage for large repos.
- 🎨 Syntax Highlighting: Supports 50+ languages (Python, JS, Rust, Go, C++, etc.) with a "One Dark" inspired theme.
- 🗂️ Hierarchical Output: Generates a clean 00_INDEX.pdf summary and separate files for granular access.
- 🌏 CJK Support: Built-in font fallback for Chinese/Japanese/Korean characters (auto-detects OS fonts).
- 🛡️ Smart Filtering: Respects .gitignore patterns and supports custom ignore rules.
- 📊 Insightful Stats: Calculates line counts and language distribution automatically.
📦 Installation
pip install pixcode
🛠️ Usage
Quick Start
Convert the current directory to PDFs in the default output folder (./pixcode_output/<repo_name>):
pixcode .
Common Commands
Generate PDFs for a specific repo:
pixcode generate /path/to/my-project -o ./my-project-pdfs
Preview structure and stats (without generating PDFs):
pixcode list /path/to/my-project
Show only top 5 languages in the summary:
pixcode list . --top-languages 5
CLI Reference
| Argument | Description | Default |
|---|---|---|
| repo | Path to the code repository. | . (current directory) |
| -o, --output | Directory to save the generated PDFs. | ./pixcode_output/<repo> |
| --max-size | Max file size to process (in KB). Files larger than this are skipped. | 512 KB |
| --ignore | Additional glob patterns to ignore (e.g., *.json test/*). | [] |
| --index-only | Generate only the 00_INDEX.pdf (directory tree and stats). | False |
| --list-only | Print the directory tree and stats to the console, then exit. | False |
| -V, --version | Show version information. | - |
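To illustrate how --max-size and --ignore interact, here is a minimal sketch of a skip rule built from the descriptions in the table. It is an assumption about the behavior, not pixcode's actual implementation: `should_skip` is a hypothetical name, and pixcode's real pattern matching (for instance gitignore-style rules) may differ from the plain shell-style globbing used here.

```python
from fnmatch import fnmatchcase


def should_skip(rel_path: str, size_bytes: int, ignore: list[str],
                max_size_kb: int = 512) -> bool:
    """Hypothetical filter: skip files over the size cap or matching any ignore glob."""
    # "Files larger than this are skipped": a file exactly at the cap is kept.
    if size_bytes > max_size_kb * 1024:
        return True
    # Shell-style matching; note that '*' here also matches '/' in the path.
    return any(fnmatchcase(rel_path, pattern) for pattern in ignore)


patterns = ["*.json", "test/*"]  # the example patterns from the table
print(should_skip("data.json", 10, patterns))    # True
print(should_skip("test/a.py", 10, patterns))    # True
print(should_skip("src/main.py", 10, patterns))  # False
```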
📂 Output Structure
After running pixcode ., you will get a folder structure optimized for LLM upload:
pixcode_output/pixcode/
├── 00_INDEX.pdf # <--- Upload this first! Contains tree & stats
├── 001_LICENSE.pdf
├── 002_README.md.pdf
├── 003_pixcode___init__.py.pdf
├── 005_pixcode_cli.py.pdf
└── ...
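The file names in this listing appear to follow a simple pattern: a three-digit sequence number, then the repo-relative path with directory separators replaced by underscores, then a .pdf suffix. The sketch below reproduces that pattern as inferred from the listing only; `pdf_name` is a hypothetical helper and the real naming logic in pixcode may handle more cases.

```python
def pdf_name(index: int, rel_path: str) -> str:
    """Build a flat PDF file name from a sequence number and a repo-relative path."""
    # Normalize Windows separators, then flatten the path into a single name.
    flat = rel_path.replace("\\", "/").replace("/", "_")
    return f"{index:03d}_{flat}.pdf"


print(pdf_name(1, "LICENSE"))              # 001_LICENSE.pdf
print(pdf_name(3, "pixcode/__init__.py"))  # 003_pixcode___init__.py.pdf
print(pdf_name(5, "pixcode/cli.py"))       # 005_pixcode_cli.py.pdf
```

Knowing the pattern makes it easy to map an entry in 00_INDEX.pdf back to the PDF that contains the corresponding source file.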
🧩 Supported Languages
pixcode automatically detects and highlights syntax for:
- Core: Python, C, C++, Java, Rust, Go
- Web: HTML, CSS, JavaScript, TypeScript, Vue, Svelte
- Config: JSON, YAML, TOML, XML, Dockerfile, INI
- Scripting: Bash, Lua, Perl, Ruby, PHP
- And more: Swift, Kotlin, Scala, Haskell, OCaml, etc.
🤝 Contributing
We welcome contributions! Please feel free to submit a Pull Request.
- Fork the repository.
- Create your feature branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
📄 License
Distributed under the MIT License. See LICENSE for more information.