Chunkipy is an easy-to-use library for chunking text based on the size estimator function you provide.
Chunkipy
chunkipy is a modular and extensible text chunking library for Python, built for NLP and LLM pipelines.
Why Chunkipy?
- ✅ Lightweight core with optional extras
- ✅ Configurable overlap support via overlap_ratio
- ✅ Composable architecture (chunkers + splitters + size estimators + language detectors)
- ✅ Practical defaults with customizable behavior
Quick Example
from chunkipy import FixedSizeTextChunker
text = "Chunkipy makes text processing modular, flexible, and powerful!"
chunker = FixedSizeTextChunker(chunk_size=20, overlap_ratio=0.2)
chunks = chunker.chunk(text)
for i, c in enumerate(chunks):
    print(f"Chunk {i + 1}: {c}")
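To make the roles of chunk_size, overlap_ratio, and the size estimator more concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is not chunkipy's implementation: the names `word_count` and `chunk_with_overlap` are hypothetical, and the sketch assumes the text is split on whitespace and that size is measured by whatever estimator function is passed in.

```python
from typing import Callable, List


def word_count(text: str) -> int:
    """Hypothetical size estimator: one unit per whitespace-separated word."""
    return len(text.split())


def chunk_with_overlap(
    text: str,
    chunk_size: int,
    overlap_ratio: float = 0.0,
    size_estimator: Callable[[str], int] = word_count,
) -> List[str]:
    """Greedily pack words into chunks, repeating a tail of each chunk in the next."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    # Largest size the repeated tail may have, in estimator units.
    overlap_budget = int(chunk_size * overlap_ratio)
    chunks: List[str] = []
    current: List[str] = []

    for word in text.split():
        candidate = current + [word]
        if current and size_estimator(" ".join(candidate)) > chunk_size:
            chunks.append(" ".join(current))
            # Carry over as many trailing words as fit in the overlap budget.
            carried: List[str] = []
            for previous in reversed(current):
                if size_estimator(" ".join([previous] + carried)) > overlap_budget:
                    break
                carried.insert(0, previous)
            current = carried + [word]
        else:
            current = candidate

    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "one two three four five six seven eight nine ten"
print(chunk_with_overlap(sample, chunk_size=4, overlap_ratio=0.5))
# ['one two three four', 'three four five six', 'five six seven eight', 'seven eight nine ten']
```

Passing a different estimator (for example `len` for characters, or a tokenizer-based counter) changes what "size" means without touching the chunking loop, which is the idea behind a pluggable size estimator.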
Implemented vs Roadmap
| Status | Strategy |
|---|---|
| ✅ Implemented | FixedSizeTextChunker |
| ✅ Implemented | RecursiveTextChunker |
| 🚧 Roadmap | Document-based chunking |
| 🚧 Roadmap | Semantic chunker |
| 🚧 Roadmap | LLM-based chunker |
Semantic sentence splitters and language detectors are already available and can be used today.
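For readers unfamiliar with the recursive strategy, the general idea can be sketched as follows: split on the coarsest separator first, pack the pieces greedily, and fall back to finer separators only for pieces that are still too large. This is a conceptual illustration, not the code behind RecursiveTextChunker; the function name `recursive_chunk`, the separator list, and the character-based default estimator are all assumptions.

```python
from typing import Callable, List, Sequence


def recursive_chunk(
    text: str,
    max_size: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
    size_estimator: Callable[[str], int] = len,
) -> List[str]:
    """Split text into chunks of at most max_size, preferring coarse separators."""
    if size_estimator(text) <= max_size:
        return [text] if text.strip() else []

    if not separators:
        # No separators left: halve the text until every piece fits.
        if len(text) <= 1:
            return [text]
        mid = len(text) // 2
        return (
            recursive_chunk(text[:mid], max_size, (), size_estimator)
            + recursive_chunk(text[mid:], max_size, (), size_estimator)
        )

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    buffer = ""
    for part in text.split(sep):
        candidate = part if not buffer else buffer + sep + part
        if size_estimator(candidate) <= max_size:
            buffer = candidate  # still fits: keep packing
            continue
        if buffer:
            chunks.append(buffer)
        if size_estimator(part) <= max_size:
            buffer = part
        else:
            # This piece alone is too large: retry with finer separators.
            chunks.extend(recursive_chunk(part, max_size, finer, size_estimator))
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks


print(recursive_chunk("alpha beta gamma. delta epsilon. zeta", max_size=16))
# ['alpha beta gamma', 'delta epsilon', 'zeta']
```

Note that this sketch drops the separator at chunk boundaries; a production chunker may choose to preserve it.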
Installation
Install core package:
pip install chunkipy
Install optional feature groups:
pip install "chunkipy[language-detection]" # Language detection (langdetect + fasttext)
pip install "chunkipy[nlp]" # NLP backends (spacy + stanza)
pip install "chunkipy[ai]" # LLM integration (openai + tiktoken)
pip install "chunkipy[all]" # All optional dependencies
Or install individual packages:
pip install "chunkipy[spacy]"
pip install "chunkipy[stanza]"
pip install "chunkipy[langdetect]"
pip install "chunkipy[fasttext]"
pip install "chunkipy[openai]"
pip install "chunkipy[tiktoken]"
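Because these backends are optional, it can be handy to check which ones are importable in the current environment before relying on them. The sketch below uses only the standard library; the grouping mirrors the extras listed above, and it assumes each package's import name matches its extra name. The helper `available_backends` is hypothetical and not part of chunkipy.

```python
from importlib.util import find_spec
from typing import Dict

# Extras group -> import names (assumed to match the package names above).
OPTIONAL_BACKENDS = {
    "language-detection": ("langdetect", "fasttext"),
    "nlp": ("spacy", "stanza"),
    "ai": ("openai", "tiktoken"),
}


def available_backends() -> Dict[str, Dict[str, bool]]:
    """Report, per extras group, whether each optional module can be imported."""
    return {
        group: {name: find_spec(name) is not None for name in modules}
        for group, modules in OPTIONAL_BACKENDS.items()
    }


for group, modules in available_backends().items():
    for name, installed in modules.items():
        status = "installed" if installed else "missing"
        print(f"{group:20s} {name:12s} {status}")
```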
Documentation
Full guides and API reference: 👉 https://gioelecrispo.github.io/chunkipy
Examples: 👉 https://github.com/gioelecrispo/chunkipy/tree/main/examples
Contributing
Issues and pull requests are welcome: 👉 https://github.com/gioelecrispo/chunkipy/issues
For local setup, see CONTRIBUTING.md.
License
chunkipy is released under the MIT License.
File details
Details for the file chunkipy-1.2.0.tar.gz.
File metadata
- Download URL: chunkipy-1.2.0.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 012ba791c4d02e0c7c8e42784793b918f40ff91384e8d4fd0824f2d08f8e4f7f |
| MD5 | 6335b04a7e3bb9a0879edb8a75f0eaab |
| BLAKE2b-256 | 23ef5e00cee9bdc0ef8588012ab1d93e0e7b6b5ce67e7415cbeabf36d36d98dc |
File details
Details for the file chunkipy-1.2.0-py3-none-any.whl.
File metadata
- Download URL: chunkipy-1.2.0-py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06812a3fdf5fb2b41e35562960090f3795fafa8343594b2fda6a22185b33a008 |
| MD5 | b77fc99c14bb048c0e68dbd4b4640e7c |
| BLAKE2b-256 | e29d9080700800339da110d2c375916b0aa5d745952c91087c20a8a1cd7f3d3a |
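A downloaded file can be checked against the digests listed here with Python's standard hashlib module. The following is a minimal sketch: the SHA256 value is the one listed for chunkipy-1.2.0.tar.gz, while the local file path and the helper names `sha256_of` and `matches` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for chunkipy-1.2.0.tar.gz.
EXPECTED_SHA256 = "012ba791c4d02e0c7c8e42784793b918f40ff91384e8d4fd0824f2d08f8e4f7f"


def sha256_of(path, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed location: the archive sits in the current working directory.
archive = Path("chunkipy-1.2.0.tar.gz")
if archive.exists():
    print("OK" if matches(archive, EXPECTED_SHA256) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

pip can perform the same check at install time through its hash-checking mode, which reads expected digests from a requirements file.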