
Convert code repositories into structured PDF collections for LLM collaboration.


pixcode

📉 SAVE UP TO 90% OF TOKENS

Turn Codebases into Visual Context for Multimodal LLMs

According to DeepSeek-OCR research and local benchmarking, visual encoding (rendering code as PDF pages) can represent massive repositories in far fewer tokens than plain-text ingestion.



📖 Introduction

pixcode is a developer tool designed to bridge the gap between large code repositories and Multimodal Large Language Models.

Instead of feeding in raw text that consumes a large share of the context window, pixcode converts your repository into a structured, hierarchical set of PDFs. This allows you to:

  • Save Up to 90% of Tokens: Visual encoding can be far more token-efficient than text tokenization.
  • Test for Free: Easily share your entire codebase with premium models (such as Claude Opus 4.6) on platforms like arena.ai without hitting text-input limits.

🚀 Why Visual Code? (The 90% Claim)

Traditional RAG (Retrieval-Augmented Generation) relies on raw text. However, recent research (including the DeepSeek-OCR paper) indicates that visual encoders can represent dense information more efficiently than textual tokenizers.

  • Text Tokenization: 1 page of dense code ≈ 500-800 text tokens.
  • Visual Tokenization: 1 page of code (rendered as a PDF image) ≈ a fixed patch count (e.g., 85-256 tokens, depending on the model).
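As a rough check on the headline figure, the per-page saving can be computed as 1 - visual_tokens / text_tokens. The sketch below plugs in only the ranges quoted above; real counts depend on the model, page layout, and font size, so treat the result as an estimate rather than a benchmark.

```python
def token_savings(text_tokens: float, visual_tokens: float) -> float:
    """Return the fraction of tokens saved by sending a page as an image instead of text."""
    if text_tokens <= 0:
        raise ValueError("text_tokens must be positive")
    return 1.0 - visual_tokens / text_tokens


# Per-page ranges quoted above: 500-800 text tokens vs. 85-256 visual tokens.
best_case = token_savings(text_tokens=800, visual_tokens=85)    # dense page, small patch budget
worst_case = token_savings(text_tokens=500, visual_tokens=256)  # sparser page, large patch budget

print(f"best case:  {best_case:.0%} fewer tokens")   # 89%
print(f"worst case: {worst_case:.0%} fewer tokens")  # 49%
```

The best case lands near 89%, which is consistent with the "up to 90%" figure; at the other end of the quoted ranges the saving is closer to half.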

pixcode creates a layered PDF structure:

  1. Macro View (00_INDEX.pdf): A visual map of the directory tree and project statistics.
  2. Micro View (File PDFs): Syntax-highlighted, line-numbered renderings of individual code files.

This approach enables an agentic workflow: read the index -> identify the relevant files -> ingest only those specific PDFs.
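A minimal sketch of the selection step, assuming the output folder layout shown under Output Structure. The helper name select_pdfs and the keyword matching are hypothetical; in a real agentic loop the model would choose files after reading 00_INDEX.pdf rather than by matching file names.

```python
import tempfile
from pathlib import Path

INDEX_NAME = "00_INDEX.pdf"


def select_pdfs(output_dir: Path, keywords: list[str]) -> list[Path]:
    """Hypothetical helper: return the index first, then only the file PDFs that match."""
    index = output_dir / INDEX_NAME
    selected = [index] if index.exists() else []  # step 1: always start from the macro view
    for pdf in sorted(output_dir.glob("*.pdf")):
        if pdf.name == INDEX_NAME:
            continue
        # step 2 (stand-in): treat a keyword hit in the file name as "relevant"
        if any(k.lower() in pdf.name.lower() for k in keywords):
            selected.append(pdf)
    return selected  # step 3: upload only these


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        for name in [INDEX_NAME, "001_LICENSE.pdf", "005_pixcode_cli.py.pdf"]:
            (out / name).touch()
        print([p.name for p in select_pdfs(out, ["cli"])])
        # ['00_INDEX.pdf', '005_pixcode_cli.py.pdf']
```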

✨ Features

  • 📉 High Efficiency: Drastically reduces context window usage for large repos.
  • ⚡ Faster Scanning: Single-pass file loading (binary check + line count + optional content decode) to reduce I/O overhead.
  • 🎨 Syntax Highlighting: Supports 50+ languages (Python, JS, Rust, Go, C++, etc.) with a "One Dark" inspired theme.
  • 🧠 Semantic Minimap: Auto-generates per-file micro UML / call graph summaries to expose structure at a glance.
  • 🔥 Linter Heatmap: Integrates ruff / eslint findings and marks risky lines with red/yellow visual overlays.
  • 🗂️ Hierarchical Output: Generates a clean 00_INDEX.pdf summary and separate files for granular access.
  • 🌏 CJK Support: Built-in font fallback for Chinese/Japanese/Korean characters (Auto-detects OS fonts).
  • 🛡️ Smart Filtering: Respects .gitignore patterns and supports custom ignore rules.
  • 📊 Insightful Stats: Calculates line counts and language distribution automatically.
  • 🧾 Scan Diagnostics: Prints scan summary (seen/loaded/ignored/binary/errors) for faster troubleshooting.
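To give a feel for what a per-file call-graph summary contains, here is a minimal sketch for Python sources using the standard-library ast module. It is not pixcode's implementation (this README does not describe how the minimap is built); it only records which plain names each function calls.

```python
import ast


def call_graph(source: str) -> dict[str, list[str]]:
    """Map each function defined in `source` to the sorted names it calls.

    Only simple call targets are resolved: `foo()` is recorded as "foo" and
    `obj.bar()` as "bar". Calls inside nested functions are also counted for
    the enclosing function, and same-named methods in different classes merge.
    """
    graph: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        callees: set[str] = set(graph.get(node.name, []))
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call):
                if isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
                elif isinstance(inner.func, ast.Attribute):
                    callees.add(inner.func.attr)
        graph[node.name] = sorted(callees)
    return graph


SAMPLE = '''
def load(path):
    return parse(read(path))

def main():
    data = load("config.toml")
    print(data)
'''

for name, callees in call_graph(SAMPLE).items():
    print(f"{name} -> {', '.join(callees) or '(none)'}")
# load -> parse, read
# main -> load, print
```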

📦 Installation

pip install pixcode

🛠️ Usage

Quick Start

Convert the current directory to PDFs in the default output folder (./pixcode_output/<repo_name>):

pixcode .

Common Commands

Generate PDFs for a specific repo:

pixcode generate /path/to/my-project -o ./my-project-pdfs

Pack core code into a single minimized PDF (all-in-one):

pixcode onepdf /path/to/my-project -o ./ONEPDF_CORE.pdf

Notes:

  • Defaults to git ls-files (tracked files) when available.
  • Defaults to "core-only" filtering (skips docs/tests); use --no-core-only to include them.
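The two defaults in these notes can be sketched as follows, assuming git is on the PATH. The fallback to a plain directory walk and the set of "non-core" directory names are illustrative assumptions; pixcode's actual core-only rules are not spelled out here.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Assumed "non-core" directory names; pixcode's real core-only heuristics may differ.
NON_CORE_DIRS = {"doc", "docs", "test", "tests"}


def list_files(repo: Path, core_only: bool = True) -> list[str]:
    """Return repo-relative POSIX paths, preferring git's list of tracked files."""
    try:
        result = subprocess.run(
            ["git", "ls-files", "-z"],  # -z: NUL-separated output, no path quoting
            cwd=repo, capture_output=True, text=True, check=True,
        )
        files = [f for f in result.stdout.split("\0") if f]
    except (OSError, subprocess.CalledProcessError):
        # git missing or not a repository: fall back to every file on disk.
        files = [
            Path(root, name).relative_to(repo).as_posix()
            for root, _dirs, names in os.walk(repo)
            for name in names
        ]
        files = [f for f in files if not f.startswith(".git/")]
    if core_only:
        # Drop files that live under any non-core directory.
        files = [f for f in files if not NON_CORE_DIRS.intersection(f.split("/")[:-1])]
    return sorted(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ["src/app.py", "tests/test_app.py", "docs/guide.md"]:
            target = Path(tmp, rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("print('hi')\n")
        # Outside a git repository this exercises the directory-walk fallback.
        print(list_files(Path(tmp)))
        print(list_files(Path(tmp), core_only=False))
```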

Preview structure and stats (without generating PDFs):

pixcode list /path/to/my-project

The list command uses lightweight scanning (file contents are not decoded), so it responds significantly faster on large repos.

Show only top 5 languages in the summary:

pixcode list . --top-languages 5
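A sketch of how a "top N languages" summary could be computed. It weights languages by line count and uses a tiny hypothetical extension map; pixcode's own detection covers far more languages and may weight the distribution differently.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, deliberately small extension map for illustration only.
EXT_TO_LANG = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".js": "JavaScript",
    ".rs": "Rust",
    ".md": "Markdown",
}


def top_languages(line_counts: dict[str, int], n: int) -> list[tuple[str, int]]:
    """Return the n languages with the most lines, given {relative_path: line_count}."""
    per_language: Counter = Counter()
    for path, lines in line_counts.items():
        language = EXT_TO_LANG.get(PurePosixPath(path).suffix.lower(), "Other")
        per_language[language] += lines
    return per_language.most_common(n)


# Made-up line counts for illustration.
stats = {"pixcode/cli.py": 300, "pixcode/__init__.py": 5, "README.md": 120, "web/app.ts": 80}
print(top_languages(stats, 2))  # [('Python', 305), ('Markdown', 120)]
```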

CLI Reference

| Argument | Description | Default |
|---|---|---|
| repo | Path to the code repository. | . (Current Dir) |
| -o, --output | Directory to save the generated PDFs. | ./pixcode_output/<repo> |
| --max-size | Max file size to process (in KB). Files larger than this are skipped. | 512 KB |
| --ignore | Additional glob patterns to ignore (e.g., *.json test/*). | [] |
| --index-only | Generate only the 00_INDEX.pdf (Directory tree & stats). | False |
| --disable-semantic-minimap | Turn off per-file semantic UML/callgraph panel. | False |
| --disable-lint-heatmap | Turn off linter-based line heatmap background. | False |
| --linter-timeout | Timeout seconds for each linter command. | 20 |
| --list-only | Print the directory tree and stats to console, then exit. | False |
| -V, --version | Show version information. | - |
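The --ignore and --max-size options amount to a per-file skip decision. The function below is a hedged sketch of that decision using the standard-library fnmatch module; it assumes 1 KB = 1024 bytes and does not attempt .gitignore semantics such as negation or directory anchoring.

```python
import fnmatch


def should_skip(rel_path: str, size_bytes: int, ignore: list[str], max_size_kb: int = 512) -> bool:
    """Return True if a file would be excluded by --max-size or an --ignore glob (sketch)."""
    if size_bytes > max_size_kb * 1024:  # assumes 1 KB = 1024 bytes
        return True
    file_name = rel_path.rsplit("/", 1)[-1]
    # Try each glob against the full relative path and against the bare file name,
    # so both "test/*" and "*.json" behave as a user would expect.
    return any(
        fnmatch.fnmatch(rel_path, pattern) or fnmatch.fnmatch(file_name, pattern)
        for pattern in ignore
    )


patterns = ["*.json", "test/*"]  # the example patterns from the table
for path in ["pixcode/cli.py", "data/config.json", "test/test_cli.py"]:
    print(path, "skipped" if should_skip(path, 2_048, patterns) else "kept")
# pixcode/cli.py kept
# data/config.json skipped
# test/test_cli.py skipped
```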

⚙️ Performance Notes

pixcode uses two execution paths:

  1. Light scan path (pixcode list, pixcode generate --index-only, --list-only): only metadata and line counts are collected; file content is not loaded.
  2. Full scan path (regular pixcode generate): file content is decoded only when needed for PDF rendering.

This reduces memory pressure and disk I/O for repository exploration workflows.
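The single-pass idea behind both paths can be sketched as one read loop that checks for binary data, counts lines, and keeps the bytes only when the full path needs them. The NUL-byte binary heuristic, the chunk size, and UTF-8 decoding are assumptions for illustration, not pixcode's documented behavior.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ScannedFile:
    path: Path
    is_binary: bool
    line_count: int
    content: Optional[str]  # None on the light path and for binary files


def scan_file(path: Path, load_content: bool, chunk_size: int = 64 * 1024) -> ScannedFile:
    """Read `path` once; decode it only when `load_content` is True (full path)."""
    chunks: list[bytes] = []
    line_count = 0
    last_byte = b""
    first = True
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            # Assumed heuristic: a NUL byte near the start of the file means binary.
            if first and b"\x00" in chunk[:8192]:
                return ScannedFile(path, True, 0, None)
            first = False
            line_count += chunk.count(b"\n")
            last_byte = chunk[-1:]
            if load_content:
                chunks.append(chunk)  # only the full path keeps the bytes around
    if last_byte not in (b"", b"\n"):
        line_count += 1  # final line without a trailing newline
    content = b"".join(chunks).decode("utf-8", errors="replace") if load_content else None
    return ScannedFile(path, False, line_count, content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp, "example.py")
        sample.write_bytes(b"import os\nprint(os.getcwd())")
        light = scan_file(sample, load_content=False)  # list / --index-only style
        full = scan_file(sample, load_content=True)    # regular generate
        print(light.line_count, light.content is None)        # 2 True
        print(full.line_count, full.content.splitlines()[0])  # 2 import os
```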

📂 Output Structure

After running pixcode ., you will get a folder structure optimized for LLM upload:

pixcode_output/pixcode/
├── 00_INDEX.pdf             # <--- Upload this first! Contains tree & stats
├── 001_LICENSE.pdf
├── 002_README.md.pdf
├── 003_pixcode___init__.py.pdf
├── 005_pixcode_cli.py.pdf
└── ...
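The file names above appear to follow a simple pattern: a zero-padded sequence number, the repository-relative path with "/" replaced by "_", and a .pdf suffix. The sketch below reproduces that pattern; the pattern is inferred from this listing, and numbering files in sorted-path order is an assumption.

```python
def pdf_names(relative_paths: list[str]) -> list[str]:
    """Rebuild output names in the style of the listing above (inferred pattern)."""
    names = ["00_INDEX.pdf"]  # the index always comes first
    for number, rel_path in enumerate(sorted(relative_paths), start=1):
        flat = rel_path.replace("/", "_")  # "pixcode/__init__.py" -> "pixcode___init__.py"
        names.append(f"{number:03d}_{flat}.pdf")
    return names


print(pdf_names(["pixcode/__init__.py", "README.md", "LICENSE"]))
# ['00_INDEX.pdf', '001_LICENSE.pdf', '002_README.md.pdf', '003_pixcode___init__.py.pdf']
```

One side effect of flattening is that two different paths can map to the same name (for example a/b_c.py and a_b/c.py); the sequence-number prefix keeps such files distinct.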

🧩 Supported Languages

pixcode automatically detects and highlights syntax for:

  • Core: Python, C, C++, Java, Rust, Go
  • Web: HTML, CSS, JavaScript, TypeScript, Vue, Svelte
  • Config: JSON, YAML, TOML, XML, Dockerfile, INI
  • Scripting: Bash, Lua, Perl, Ruby, PHP
  • And more: Swift, Kotlin, Scala, Haskell, OCaml, etc.

🤝 Contributing

We welcome contributions! Please feel free to submit a Pull Request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

📄 License

Distributed under the MIT License. See LICENSE for more information.


Download files


Source Distribution

pixcode-0.1.7.tar.gz (34.5 kB)

Uploaded Source

Built Distribution


pixcode-0.1.7-py3-none-any.whl (34.8 kB)

Uploaded Python 3

File details

Details for the file pixcode-0.1.7.tar.gz.

File metadata

  • Download URL: pixcode-0.1.7.tar.gz
  • Size: 34.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pixcode-0.1.7.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb08baadb462e745a0117a5324824887c7f67e5c759d4530430d1e8915da2a4f |
| MD5 | 7887863ab58e4706804188a1449ab246 |
| BLAKE2b-256 | 4b4964f51ed271e42cd324641d39f8965433df0fe9b1001ae4b3409053f06bd9 |


Provenance

The following attestation bundles were made for pixcode-0.1.7.tar.gz:

Publisher: publish.yml on TingjiaInFuture/pixcode

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pixcode-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: pixcode-0.1.7-py3-none-any.whl
  • Size: 34.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pixcode-0.1.7-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b1d00bb7d6408085e614962161a3fef80443050f6f34dacb457b565396aabd3 |
| MD5 | 60880551f89f76e08420cff9967d5e74 |
| BLAKE2b-256 | 47faf5750e2cabc8c903ef791cd39082eabbaa4b11ec590ada52ffa3354fcc69 |


Provenance

The following attestation bundles were made for pixcode-0.1.7-py3-none-any.whl:

Publisher: publish.yml on TingjiaInFuture/pixcode

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
