
Chunkipy is an easy-to-use library for chunking text based on the size estimator function you provide.


Chunkipy

Python 3.10–3.13 | PyPI version | codecov | Docs | License: MIT


chunkipy is a modular and extensible text chunking library for Python, built for NLP and LLM pipelines.

Why Chunkipy?

  • ✅ Lightweight core with optional extras
  • ✅ Configurable overlap support via overlap_ratio
  • ✅ Composable architecture (chunkers + splitters + size estimators + language detectors)
  • ✅ Practical defaults with customizable behavior
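
The composable design listed above (a splitter and a size estimator feeding a chunker) can be pictured with a minimal sketch in plain Python. It does not use or reproduce chunkipy's classes: `greedy_chunk`, `word_size`, and `char_size` are hypothetical names invented for this illustration, and the library's real interfaces may look different.

```python
from typing import Callable, List


def word_size(text: str) -> int:
    """Hypothetical size estimator: number of whitespace-separated words."""
    return len(text.split())


def char_size(text: str) -> int:
    """Hypothetical size estimator: number of characters."""
    return len(text)


def greedy_chunk(
    text: str,
    chunk_size: int,
    split: Callable[[str], List[str]],
    size: Callable[[str], int],
) -> List[str]:
    """Pack pieces produced by `split` into chunks whose summed `size` stays
    within `chunk_size`. A single oversized piece still becomes its own chunk.
    The joining spaces are not counted, so character sizes are approximate.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for piece in split(text):
        piece_size = size(piece)
        # Close the current chunk if adding this piece would exceed the budget.
        if current and current_size + piece_size > chunk_size:
            chunks.append(" ".join(current))
            current, current_size = [], 0
        current.append(piece)
        current_size += piece_size
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Swapping `word_size` for `char_size` changes how the same text is packed without touching the chunking loop, which is the point of keeping the pieces separate.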

Quick Example

from chunkipy import FixedSizeTextChunker

text = "Chunkipy makes text processing modular, flexible, and powerful!"

# Target chunk size of 20, with an overlap ratio of 0.2 between chunks.
chunker = FixedSizeTextChunker(chunk_size=20, overlap_ratio=0.2)
chunks = chunker.chunk(text)

# Print each chunk with a 1-based index.
for i, c in enumerate(chunks, start=1):
    print(f"Chunk {i}: {c}")
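
To make the `chunk_size` and `overlap_ratio` parameters more concrete, here is a rough sketch of fixed-size chunking with overlap written in plain Python. It is not chunkipy's implementation: the function name `chunk_words_with_overlap` is hypothetical, sizes are assumed to be counted in words, and the overlap is assumed to be rounded down, none of which is confirmed by the project. Under those assumptions, `chunk_size=20` and `overlap_ratio=0.2` would share 4 words between neighbouring chunks.

```python
from typing import List


def chunk_words_with_overlap(
    text: str, chunk_size: int, overlap_ratio: float
) -> List[str]:
    """Split text into word windows of `chunk_size`, where each window repeats
    the last int(chunk_size * overlap_ratio) words of the previous one.
    Illustrative only; units and rounding are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    words = text.split()
    overlap = int(chunk_size * overlap_ratio)
    # Advance by the non-overlapping part; never by less than one word.
    step = max(1, chunk_size - overlap)

    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```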

Implemented vs Roadmap

| Status | Strategy |
| --- | --- |
| ✅ Implemented | FixedSizeTextChunker |
| ✅ Implemented | RecursiveTextChunker |
| 🚧 Roadmap | Document-based chunking |
| 🚧 Roadmap | Semantic chunker |
| 🚧 Roadmap | LLM-based chunker |

Semantic sentence splitters and language detectors are already available and can be used today.

Installation

Install core package:

pip install chunkipy

Install optional feature groups:

pip install "chunkipy[language-detection]"  # Language detection (langdetect + fasttext)
pip install "chunkipy[nlp]"                  # NLP backends (spacy + stanza)
pip install "chunkipy[ai]"                   # LLM integration (openai + tiktoken)
pip install "chunkipy[all]"                  # All optional dependencies

Or install individual packages:

pip install "chunkipy[spacy]"
pip install "chunkipy[stanza]"
pip install "chunkipy[langdetect]"
pip install "chunkipy[fasttext]"
pip install "chunkipy[openai]"
pip install "chunkipy[tiktoken]"
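
After installing extras, it can be handy to check which optional backends are importable in the current environment. The sketch below uses only the standard library. It assumes each extra installs a package importable under the same name as the extra (`spacy`, `stanza`, `langdetect`, `fasttext`, `openai`, `tiktoken`), and the helper name `available_optional_modules` is hypothetical, not part of chunkipy.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed import names for the optional dependencies listed above.
OPTIONAL_MODULES = ("spacy", "stanza", "langdetect", "fasttext", "openai", "tiktoken")


def available_optional_modules(
    names: Iterable[str] = OPTIONAL_MODULES,
) -> Dict[str, bool]:
    """Return a mapping of module name -> whether it can be found for import.
    Modules are located but not imported, so the check stays cheap.
    """
    result: Dict[str, bool] = {}
    for name in names:
        try:
            result[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed or partially initialised modules.
            result[name] = False
    return result


if __name__ == "__main__":
    for module, present in available_optional_modules().items():
        print(f"{module}: {'installed' if present else 'missing'}")
```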

Documentation

Full guides and API reference: 👉 https://gioelecrispo.github.io/chunkipy

Examples: 👉 https://github.com/gioelecrispo/chunkipy/tree/main/examples

Contributing

Issues and pull requests are welcome: 👉 https://github.com/gioelecrispo/chunkipy/issues

For local setup, see CONTRIBUTING.md.

License

chunkipy is released under the MIT License.


Download files

Download the file for your platform.

Source Distribution

chunkipy-1.2.0.tar.gz (18.0 kB)

Uploaded Source

Built Distribution


chunkipy-1.2.0-py3-none-any.whl (25.1 kB)

Uploaded Python 3

File details

Details for the file chunkipy-1.2.0.tar.gz.

File metadata

  • Download URL: chunkipy-1.2.0.tar.gz
  • Upload date:
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chunkipy-1.2.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 012ba791c4d02e0c7c8e42784793b918f40ff91384e8d4fd0824f2d08f8e4f7f |
| MD5 | 6335b04a7e3bb9a0879edb8a75f0eaab |
| BLAKE2b-256 | 23ef5e00cee9bdc0ef8588012ab1d93e0e7b6b5ce67e7415cbeabf36d36d98dc |
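
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `matches_digest` are made up for illustration, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the sdist was saved in the current directory):
# expected = "012ba791c4d02e0c7c8e42784793b918f40ff91384e8d4fd0824f2d08f8e4f7f"
# print(matches_digest(Path("chunkipy-1.2.0.tar.gz"), expected))
```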


File details

Details for the file chunkipy-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: chunkipy-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 25.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chunkipy-1.2.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 06812a3fdf5fb2b41e35562960090f3795fafa8343594b2fda6a22185b33a008 |
| MD5 | b77fc99c14bb048c0e68dbd4b4640e7c |
| BLAKE2b-256 | e29d9080700800339da110d2c375916b0aa5d745952c91087c20a8a1cd7f3d3a |
