A tool for processing code repositories into semantic chunks for analysis with LLMs, especially NotebookLM.
Pyragify
Pyragify turns a code repository into plain-text chunks that are easier to load into NotebookLM and other LLM tools. It extracts semantic units from source files, writes .txt output grouped by file type, and stores metadata for incremental re-runs.
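The incremental re-run idea can be sketched with the standard library. This is a minimal illustration, not Pyragify's code: it assumes a hashes file that maps each path string to its MD5 hex digest, which may differ from the real `hashes.json` layout, and the function names `file_md5` and `changed_files` are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in blocks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(files: List[Path], hashes_path: Path) -> List[Path]:
    """Return files whose MD5 differs from the stored value, then save the new hashes.

    Assumed layout of the hashes file: {"path string": "md5 hex digest"}.
    """
    previous: Dict[str, str] = {}
    if hashes_path.exists():
        previous = json.loads(hashes_path.read_text(encoding="utf-8"))
    current = {str(path): file_md5(path) for path in files}
    # A file is "changed" if it is new or its digest no longer matches.
    changed = [path for path in files if previous.get(str(path)) != current[str(path)]]
    hashes_path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return changed
```

Calling `changed_files` twice on untouched files returns an empty list the second time, which is what lets a re-run skip work. MD5 is adequate for this purpose because it only needs to detect accidental edits, not tampering.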
What It Does
- Chunks Python code into functions, classes, and comments
- Splits Markdown files by header sections
- Processes common repository files into LLM-friendly text output
- Respects `.gitignore` and `.dockerignore` patterns
- Tracks file hashes so unchanged files can be skipped on later runs
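One way to chunk Python source into functions, classes, and comments is the standard `ast` module, with `tokenize` for comments (which the AST discards). The sketch below handles top-level definitions only and uses a hypothetical `python_chunks` helper; Pyragify's own parser may split code differently, for example in how it treats nested definitions or decorators.

```python
import ast
import io
import tokenize
from typing import Dict, List


def python_chunks(source: str) -> List[Dict[str, str]]:
    """Return function, class, and comment chunks from Python source.

    Only top-level definitions become chunks; nested ones stay inside their
    parent. Decorators are not included in the definition text.
    """
    chunks: List[Dict[str, str]] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # get_source_segment recovers the exact source text of the node (Python 3.8+).
            text = ast.get_source_segment(source, node) or ""
            chunks.append({"type": kind, "name": node.name, "text": text})
    # Comments are not in the AST, so collect them from the token stream instead.
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            chunks.append({"type": "comment", "name": f"line {token.start[0]}", "text": token.string})
    return chunks
```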
Supported Inputs
Pyragify has dedicated handling for:
- Python: `.py`
- Markdown: `.md`, `.markdown`
- HTML: `.html`
- CSS: `.css`
- Other common repository files are included as plain text when they can be read as UTF-8
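The routing described above can be approximated with a suffix lookup plus a UTF-8 read check. The `DEDICATED` mapping and `classify` function are hypothetical names; the group names mirror the output folders listed under Output Structure, and exact-case suffix matching is an assumption.

```python
from pathlib import Path
from typing import Optional

# Suffixes with dedicated handling, per the list above; the mapping itself is illustrative.
DEDICATED = {
    ".py": "python",
    ".md": "markdown",
    ".markdown": "markdown",
    ".html": "html",
    ".css": "css",
}


def classify(path: Path) -> Optional[str]:
    """Return the output group for a file, or None when it is not readable as UTF-8."""
    group = DEDICATED.get(path.suffix)
    if group is not None:
        return group
    try:
        # Reading the whole file is the simplest UTF-8 check; a real tool might sample it.
        path.read_text(encoding="utf-8")
    except (UnicodeDecodeError, OSError):
        return None
    return "other"
```

Binary files such as images fail the UTF-8 check and fall out of the output entirely, while text files without a dedicated parser land in `other`.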
Installation
Install From PyPI
```bash
uv pip install pyragify
```

or

```bash
pip install pyragify
```
Install From Source
```bash
git clone https://github.com/ThomasBury/pyragify.git
cd pyragify
uv sync --group dev
```
Quick Start
Run With A Config File
The default entry point is the `pyragify` command.
```bash
uv run pyragify --config-file config.yaml
```
You can also run it as a module:
```bash
python -m pyragify --config-file config.yaml
```
Run Without A Config File
If you do not use config.yaml, pass every setting you want to rely on directly on the command line.
```bash
uv run pyragify \
  --repo-path /path/to/repository \
  --output-dir /path/to/output \
  --max-words 200000 \
  --max-file-size 10485760 \
  --skip-patterns "*.log" \
  --skip-patterns "*.tmp" \
  --skip-dirs "__pycache__" \
  --skip-dirs "node_modules" \
  --verbose
```
CLI Notes
- Use `pyragify --help` for the full option list
- Command-line options override values loaded from `config.yaml`
- Repeat `--skip-patterns` once per pattern
- Repeat `--skip-dirs` once per directory name
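The override and repetition rules can be illustrated with `argparse`. This hypothetical parser mirrors a few documented flags only to show the behavior; Pyragify's real CLI may be built with a different library, and whether a repeated list replaces or extends the config-file list is an assumption (here it replaces).

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring a few documented flags; not pyragify's actual CLI code.
    parser = argparse.ArgumentParser(prog="pyragify-sketch")
    parser.add_argument("--repo-path")
    parser.add_argument("--max-words", type=int)
    # action="append" collects one value per repetition of the flag.
    parser.add_argument("--skip-patterns", action="append")
    parser.add_argument("--skip-dirs", action="append")
    # default=None distinguishes "not passed" from an explicit --verbose.
    parser.add_argument("--verbose", action="store_true", default=None)
    return parser


def merge_settings(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Overlay options given on the command line onto values loaded from a config file."""
    args = vars(build_parser().parse_args(argv))
    merged = dict(config)
    for key, value in args.items():
        # Options that were not passed stay None and leave the config value in place.
        if value is not None:
            merged[key] = value
    return merged
```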
Configuration
Example config.yaml:
```yaml
repo_path: /path/to/repository
output_dir: /path/to/output
max_words: 200000
max_file_size: 10485760 # 10 MB
skip_patterns:
  - "*.log"
  - "*.tmp"
skip_dirs:
  - "__pycache__"
  - "node_modules"
verbose: false
```
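A rough sketch of how `skip_patterns` and `skip_dirs` could be applied to repository-relative paths. Matching glob patterns against file names and exact names against parent directories are assumptions, and `should_skip` is a hypothetical helper; real `.gitignore` syntax (negation with `!`, anchored paths, `**`) needs a dedicated matcher.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def should_skip(rel_path: str, skip_patterns: Iterable[str], skip_dirs: Iterable[str]) -> bool:
    """Return True when a repository-relative path matches a skip rule."""
    path = PurePosixPath(rel_path)
    dirs = set(skip_dirs)
    # Any parent directory whose name is listed in skip_dirs excludes the file.
    if any(part in dirs for part in path.parts[:-1]):
        return True
    # Glob patterns such as "*.log" are matched case-sensitively against the file name.
    return any(fnmatchcase(path.name, pattern) for pattern in skip_patterns)
```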
Example Workflow
- Point `repo_path` at the repository you want to process.
- Choose an `output_dir` where generated chunks and metadata should be written.
- Run `uv run pyragify --config-file config.yaml`, or pass the same settings on the command line.
- Open the generated files in your output directory, especially `remaining/chunk_0.txt`, in NotebookLM or another LLM workflow.
Output Structure
The generated output is grouped by content type:
- `python/`: Python functions, classes, and comment chunks
- `markdown/`: Markdown sections split by headers
- `html/`: HTML script and style chunks
- `css/`: CSS rule chunks
- `other/`: Readable files that do not have a dedicated parser
- `remaining/`: Overflow chunks once grouped outputs reach the word limit
- `metadata.json`: Summary of processed files
- `hashes.json`: MD5 hashes used for incremental processing
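The `max_words` limit can be pictured as greedy packing of chunks into word-bounded groups, each of which would become one numbered file such as `chunk_0.txt`. The packing rule and the `pack_chunks` name are assumptions for illustration; Pyragify's actual grouping and overflow logic may differ.

```python
from typing import List


def pack_chunks(chunks: List[str], max_words: int) -> List[List[str]]:
    """Greedily pack text chunks into groups whose total word count stays within max_words.

    A chunk longer than max_words on its own still gets a group by itself.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    count = 0
    for chunk in chunks:
        words = len(chunk.split())
        # Start a new group when adding this chunk would exceed the limit.
        if current and count + words > max_words:
            groups.append(current)
            current, count = [], 0
        current.append(chunk)
        count += words
    if current:
        groups.append(current)
    return groups
```

Greedy packing keeps chunks in their original order, so related units from the same file tend to stay in the same output file.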
NotebookLM Workflow
- Run Pyragify on the repository you care about.
- Upload one or more generated `.txt` chunks to a NotebookLM notebook.
- Ask questions about the codebase and use NotebookLM's citations to trace answers back to the source text.
Development
Set up the local environment:
```bash
uv sync --group dev
```
Run the test suite:
```bash
uv run pytest
```
Run a focused test slice while iterating:
```bash
uv run pytest tests/test_processor.py -k markdown
```
Contributing
Contributions are welcome. Open an issue for bugs or feature requests, then send a pull request with focused changes and matching tests.
License
This project is licensed under the MIT License. See LICENSE.
Download files
Source Distribution
Built Distribution
File details
Details for the file pyragify-0.2.0.tar.gz.
File metadata
- Download URL: pyragify-0.2.0.tar.gz
- Upload date:
- Size: 153.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f8fd6ad7ef0e8e515883e1b6faf33fbd1fb8b1a4e3d5a822b1d812ebf904a81 |
| MD5 | 4133a8fc11971c4c0f496fb7cfd9a6c8 |
| BLAKE2b-256 | ebdf8cee923cc17f08196590155c5ceb5e7d6f71ec5483cad2ddf69ad8601e00 |
File details
Details for the file pyragify-0.2.0-py3-none-any.whl.
File metadata
- Download URL: pyragify-0.2.0-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c40069eb43751d6d937c2e687747ec14c8cd25740c4889152a4f3e685b821e8 |
| MD5 | 44f4bdc36db2fef223705d5125730bb1 |
| BLAKE2b-256 | 2ac85220f96dd0f19bd3326ad733afefa7566d18f123388f9213215ac475c311 |