
A tool for processing code repositories into semantic chunks for analysis with LLMs, especially NotebookLM.


Pyragify

Pyragify turns a code repository into plain-text chunks that are easier to load into NotebookLM and other LLM tools. It extracts semantic units from source files, writes .txt output grouped by file type, and stores metadata for incremental re-runs.

What It Does

  • Chunks Python code into functions, classes, and comments
  • Splits Markdown files by header sections
  • Processes common repository files into LLM-friendly text output
  • Respects .gitignore and .dockerignore patterns
  • Tracks file hashes so unchanged files can be skipped on later runs
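
To make the first bullet concrete, here is a minimal sketch of splitting Python source into function and class chunks with the standard-library ast module. It is an illustration only, not Pyragify's actual implementation: the function name chunk_python_source and the chunk dictionary layout are assumptions, and comments are not captured because they do not appear in the AST.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class (hypothetical helper)."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append(
                {
                    "kind": kind,
                    "name": node.name,
                    # Exact source text of the definition (decorators excluded).
                    "text": ast.get_source_segment(source, node),
                }
            )
    return chunks


example = '''
def greet(name):
    return f"hello {name}"


class Greeter:
    def run(self):
        return greet("world")
'''

for chunk in chunk_python_source(example):
    print(chunk["kind"], chunk["name"])
```

Working from the AST rather than from regular expressions keeps each chunk aligned with a complete definition, which is what makes it a "semantic" unit.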

Supported Inputs

Pyragify has dedicated handling for:

  • Python: .py
  • Markdown: .md, .markdown
  • HTML: .html
  • CSS: .css
  • Other common repository files are included as plain text when they can be read as UTF-8
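
As an illustration of what dedicated Markdown handling can look like, the sketch below splits a document at ATX headers (lines starting with one to six # characters). The function name split_markdown_by_headers is hypothetical and the logic is an assumption about the general technique, not Pyragify's own parser. It deliberately ignores fenced code blocks, so a # comment inside a code fence would be treated as a header.

```python
import re

# An ATX header: 1-6 '#' characters, whitespace, then some text.
HEADER = re.compile(r"^#{1,6}\s+\S")


def split_markdown_by_headers(text: str) -> list[str]:
    """Split Markdown into sections, each starting at a header line."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A header closes the section collected so far, if any.
        if HEADER.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that were only whitespace.
    return [section for section in sections if section]


doc = "intro\n# Install\npip install pyragify\n## From source\ngit clone ...\n"
for section in split_markdown_by_headers(doc):
    print(repr(section))
```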

Installation

Install From PyPI

uv pip install pyragify

or

pip install pyragify

Install From Source

git clone https://github.com/ThomasBury/pyragify.git
cd pyragify
uv sync --group dev

Quick Start

Run With A Config File

The default entrypoint is pyragify.

uv run pyragify --config-file config.yaml

You can also run it as a module:

python -m pyragify --config-file config.yaml

Run Without A Config File

If you do not use config.yaml, pass every setting you want to rely on directly on the command line.

uv run pyragify \
  --repo-path /path/to/repository \
  --output-dir /path/to/output \
  --max-words 200000 \
  --max-file-size 10485760 \
  --skip-patterns "*.log" \
  --skip-patterns "*.tmp" \
  --skip-dirs "__pycache__" \
  --skip-dirs "node_modules" \
  --verbose

CLI Notes

  • Use pyragify --help for the full option list
  • Command-line options override values loaded from config.yaml
  • Repeat --skip-patterns once per pattern
  • Repeat --skip-dirs once per directory name
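
The override rule in the notes above can be pictured as a dictionary merge in which only explicitly supplied command-line values replace values from the config file. This is a sketch under assumptions: merge_settings is a hypothetical helper, and treating None or an empty list as "option not given" is an illustrative convention, not a description of how Pyragify parses its arguments.

```python
def merge_settings(file_config: dict, cli_options: dict) -> dict:
    """Overlay explicitly supplied CLI options on top of config-file values."""
    merged = dict(file_config)
    for key, value in cli_options.items():
        # Assumption: None or an empty list/tuple means the option was not passed.
        if value is None or value == [] or value == ():
            continue
        merged[key] = value
    return merged


file_config = {
    "repo_path": "/path/to/repository",
    "max_words": 200000,
    "skip_dirs": ["__pycache__", "node_modules"],
}
cli_options = {"repo_path": None, "max_words": 50000, "skip_dirs": []}

print(merge_settings(file_config, cli_options))
```

Here only max_words changes, because it is the only option given a value on the command line.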

Configuration

Example config.yaml:

repo_path: /path/to/repository
output_dir: /path/to/output
max_words: 200000
max_file_size: 10485760  # 10 MB
skip_patterns:
  - "*.log"
  - "*.tmp"
skip_dirs:
  - "__pycache__"
  - "node_modules"
verbose: false
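
One plausible reading of skip_patterns and skip_dirs is sketched below: patterns are shell-style globs matched against the file name, and directory entries are matched against each parent directory name. The helper should_skip is hypothetical and the exact matching semantics in Pyragify may differ, so treat this as a way to reason about the config rather than a specification.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def should_skip(relative_path: str, skip_patterns: list[str], skip_dirs: list[str]) -> bool:
    """Decide whether a repository-relative path would be excluded."""
    path = PurePosixPath(relative_path)
    # Skip when any parent directory name is listed in skip_dirs.
    if any(part in skip_dirs for part in path.parts[:-1]):
        return True
    # Skip when the file name matches any glob pattern (case-sensitive).
    return any(fnmatchcase(path.name, pattern) for pattern in skip_patterns)


patterns = ["*.log", "*.tmp"]
dirs = ["__pycache__", "node_modules"]

for candidate in ["src/main.py", "logs/app.log", "node_modules/pkg/index.js"]:
    print(candidate, should_skip(candidate, patterns, dirs))
```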

Example Workflow

  1. Point repo_path at the repository you want to process.
  2. Choose an output_dir where generated chunks and metadata should be written.
  3. Run uv run pyragify --config-file config.yaml or pass the same settings on the command line.
  4. Open the generated files in output/, especially output/remaining/chunk_0.txt, in NotebookLM or another LLM workflow.

Output Structure

The generated output is grouped by content type:

  • python/: Python functions, classes, and comment chunks
  • markdown/: Markdown sections split by headers
  • html/: HTML script and style chunks
  • css/: CSS rule chunks
  • other/: Readable files that do not have a dedicated parser
  • remaining/: Overflow chunks once grouped outputs reach the word limit
  • metadata.json: Summary of processed files
  • hashes.json: MD5 hashes used for incremental processing
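
The incremental behaviour implied by hashes.json can be sketched as follows: compute an MD5 digest per file, compare it with the digest stored from the previous run, and reprocess only the files whose digest changed. The function names and the flat path-to-digest JSON layout are assumptions made for this example. The real hashes.json format may be different.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in blocks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(paths: list[Path], hashes_file: Path) -> list[Path]:
    """Return files whose digest differs from the stored one, then update the store."""
    known = json.loads(hashes_file.read_text()) if hashes_file.exists() else {}
    changed = []
    for path in paths:
        current = file_md5(path)
        if known.get(str(path)) != current:
            changed.append(path)
        known[str(path)] = current
    hashes_file.write_text(json.dumps(known, indent=2))
    return changed
```

On a first run every file counts as changed. On later runs an unchanged file produces the same digest and is left out of the returned list.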

NotebookLM Workflow

  1. Run Pyragify on the repository you care about.
  2. Upload one or more generated .txt chunks to a NotebookLM notebook.
  3. Ask questions about the codebase and use the generated citations to trace answers back to the source text.


Development

Set up the local environment:

uv sync --group dev

Run the test suite:

uv run pytest

Run a focused test slice while iterating:

uv run pytest tests/test_processor.py -k markdown

Contributing

Contributions are welcome. Open an issue for bugs or feature requests, then send a pull request with focused changes and matching tests.

License

This project is licensed under the MIT License. See LICENSE.

