Semantic chunking utilities for scientific code and documentation corpora.
Chunky
Chunky is a Python package for intelligently chunking scientific and technical repositories. It provides a modular pipeline that powers the Nancy Brain knowledge base and MCP services, while remaining useful as a standalone library for retrieval systems that need deterministic, metadata-rich chunks.
Documentation lives on Read the Docs: https://chunky.readthedocs.io
Installation
Install from source using the pyproject.toml metadata:
# clone the repo (if you haven't already)
git clone https://github.com/AmberLee2427/chunky.git
cd chunky
# install the library
pip install .
For development and documentation builds, install the optional extras:
pip install -e ".[dev,docs]"
- `-e` performs an editable install so local changes take effect immediately.
- `.[dev,docs]` installs the tooling declared under the `dev` and `docs` extras in `pyproject.toml`.
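To confirm that the install worked, one option is to ask Python's packaging metadata which version it can see. The snippet below is a minimal sketch, not part of Chunky itself. The distribution name `chunky-files` is an assumption inferred from the published file names (`chunky_files-0.2.2...`), and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "chunky-files"):
    """Return the installed version of a distribution, or None if absent.

    The default name "chunky-files" is an assumption based on the
    published file names; adjust it if your install reports otherwise.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"chunky-files: {found or 'not installed'}")
```

With an editable install, the reported version should follow whatever the local `pyproject.toml` declared when the install was made.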
Tooling
- Code style: Ruff (`ruff check src tests` or `ruff check src tests --fix`)
- Tests: Pytest (`pytest --cov=chunky`)
- Docs: Sphinx + MyST + Furo (`sphinx-build -b html docs docs/_build/html`)
- Packaging: Hatchling build backend
- Versioning: bump-my-version (driven by tags and the release workflow)
Workflows
- CI tests run on Linux, macOS, and Windows for Python 3.8 through 3.12.
- Pushing a tag that matches the form `vX.Y.Z` triggers the release workflow. It validates that the tag matches the version in `pyproject.toml`, builds the distribution, and publishes to PyPI using the `PYPI_API_TOKEN` secret.
- Read the Docs builds the documentation automatically for pushes to the default branch. Local builds use `sphinx-build -b html docs docs/_build/html`.
Release checklist:
- Review and update `CHANGELOG.md`, keeping the `[Unreleased]` section accurate.
- Run `bump-my-version bump <part>` to update version metadata and append a dated entry in the changelog.
- Build distributions locally (`rm -rf dist && python -m build`) and verify metadata with `python -m twine check dist/*`.
- Commit the changes and push to `main`.
- Tag the commit (`git tag vX.Y.Z && git push origin vX.Y.Z`) to trigger the Release workflow.
- Verify the PyPI publish job and Read the Docs build succeed.
Contributing
- Know your audience: most contributors will be scientific coders. Write docs assuming limited familiarity with packaging internals.
- Use Ruff for style checks and keep numpy-style docstrings on all non-test functions.
- Target test coverage above 70% and ensure existing CI jobs pass before opening a PR.
- In pull requests, summarise code changes, testing/validation, doc updates, and provide a brief TL;DR when the description runs long.
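For contributors new to the convention, here is a small illustration of a numpy-style docstring. The function `split_into_windows` is hypothetical and is not part of Chunky's API. It only shows the expected section layout (`Parameters`, `Returns`, `Raises`).

```python
from typing import List, Sequence


def split_into_windows(lines: Sequence[str], size: int) -> List[List[str]]:
    """Split a sequence of lines into consecutive fixed-size windows.

    Parameters
    ----------
    lines : Sequence[str]
        Lines of text to group.
    size : int
        Maximum number of lines per window. Must be positive.

    Returns
    -------
    list of list of str
        Windows in their original order. The last window may be shorter.

    Raises
    ------
    ValueError
        If ``size`` is less than 1.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the input in strides of `size`, copying each slice.
    return [list(lines[i : i + size]) for i in range(0, len(lines), size)]
```

Each section header is underlined with dashes of matching length, and parameter types follow the name after a colon.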
License
Chunky is released under the MIT License.
Glossary
| Term | Meaning |
|---|---|
| PR | GitHub pull request – a request to merge one branch or fork with another |
| Release | Publishing a tagged version of the project to PyPI |
| ChangeLog | A document describing changes between releases |
| PyPI | Python Package Index – where published distributions live |
| Ruff | A fast Python linter/formatter used for style enforcement |
| origin | The upstream GitHub repository |
| fork | A downstream copy of the origin repo used for contributing |
| master/main | The default branch |
| CI | Continuous Integration – automated checks that run on every push/PR |
| GitHub Workflows | GitHub’s automation runner configured via YAML files |
| pyproject.toml | Core metadata and build configuration for the package |
| bump-my-version | CLI used to bump version numbers consistently |
| Read the Docs | Hosted documentation service that builds from the repo |
File details
Details for the file chunky_files-0.2.2.tar.gz.
File metadata
- Download URL: chunky_files-0.2.2.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5ac25b98ae3d1fc079eec9d9aed4b945fdcd98fa6a39de4c196dadaa78440a9 |
| MD5 | bc5d0b5def15829fe123276b98754883 |
| BLAKE2b-256 | a8d806634a6e4256c8846f81308a4f1297a52b234f9de769cf8352caa833695e |
File details
Details for the file chunky_files-0.2.2-py3-none-any.whl.
File metadata
- Download URL: chunky_files-0.2.2-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06419a5c6958e5e08b15393131444e0b1f9b909ea59bae4721125d66e9b3cd5f |
| MD5 | 6abba3ffc3a5045826bdedb666badfae |
| BLAKE2b-256 | 965bc84ebfd6d3f7e1c49946b2d212214c87043b4b818fcefac68a3d4332d2b1 |
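A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches_published` are hypothetical, and the local file path is whatever location the file was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("chunky_files-0.2.2.tar.gz"), "d5ac25b9...")` with the full SHA256 value from the table should return `True` for an intact download of the source distribution.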