Chunk long text with policies.
Project description
chunkle
Split big text into reader‑friendly pieces while respecting line and token budgets.
Install
pip install chunkle
Compatible with Python ≥ 3.11.
Quick start
```python
from chunkle import chunk

big_text = "\n".join(f"line {i}" for i in range(100))  # any long string works
for part in chunk(big_text, lines_per_chunk=20, tokens_per_chunk=500):
    ...  # stream, save, or send
```
The generator yields a chunk the moment both budgets are met.
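The sentence above leaves the exact emit rule implicit. Below is a minimal sketch of one literal reading of it. It is not chunkle's implementation. Lines accumulate until the buffer holds at least `lines_per_chunk` lines and at least `tokens_per_chunk` tokens, and any remainder is flushed at the end. The names `sketch_chunk` and `count_tokens` are hypothetical. Tokens are approximated by whitespace splitting, whereas the real API accepts a `tiktoken.Encoding`.

```python
from collections.abc import Callable, Iterator


def _whitespace_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer such as tiktoken.
    return len(text.split())


def sketch_chunk(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    count_tokens: Callable[[str], int] = _whitespace_tokens,
) -> Iterator[str]:
    """Hypothetical re-creation of 'yield once both budgets are met'."""
    buffer: list[str] = []
    tokens = 0
    for line in content.splitlines(keepends=True):
        buffer.append(line)
        tokens += count_tokens(line)
        # Emit only when BOTH the line budget and the token budget are reached.
        if len(buffer) >= lines_per_chunk and tokens >= tokens_per_chunk:
            yield "".join(buffer)
            buffer, tokens = [], 0
    # Flush whatever is left, even if it falls short of either budget.
    if buffer:
        yield "".join(buffer)
```

Under this reading, `sketch_chunk` is called with `lines_per_chunk=2` and `tokens_per_chunk=9` on ten three-word lines. It yields chunks of 3, 3, 3 and 1 lines. The line budget is reached after two lines, but the token budget is only reached after three.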
Defaults
```python
lines_per_chunk = 20
tokens_per_chunk = 500
```
API
```python
import typing

import tiktoken


def chunk(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    encoding: tiktoken.Encoding | None = None,
) -> typing.Generator[str, None, None]:
    ...  # implementation elided; signature shown for reference
```
Coming Next
- Benchmark batched vs. per‑char tokenization on a 10 MB multilingual file.
- Ship 0.1.1 with CRLF handling and an expanded README.
- Add a GitHub Action matrix (Python 3.11 & 3.12) to prevent regressions.
License
MIT © 2025 Allen Chou
Download files
Source Distribution
chunkle-0.2.0.tar.gz (3.9 kB)
Built Distribution
chunkle-0.2.0-py3-none-any.whl (4.5 kB)
File details
Details for the file chunkle-0.2.0.tar.gz.
File metadata
- Download URL: chunkle-0.2.0.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1426dfabcecc6e5645516a28a941e16e96a4b692e00b8e08fe792b7d45063b8 |
| MD5 | b00a494a677f5e3211b6e27986bcfa22 |
| BLAKE2b-256 | da667ce310b001650ef7d3fac1bcea73bf955ba5dfbcc713c80e6744c92c8759 |
File details
Details for the file chunkle-0.2.0-py3-none-any.whl.
File metadata
- Download URL: chunkle-0.2.0-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 72c8c28947fa3b293f401a326d330f3e93cf5030fd35684346c76eda5ea7692e |
| MD5 | 6bc86a9361bce6cc391869b89da77e38 |
| BLAKE2b-256 | c5e101798fdf8cec391fd89f5ef6c6d8cf9d5b2a52f60638a3352ce3964ad26d |
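You can check a downloaded file against the published SHA256 digests with a short standard-library sketch like the one below. `EXPECTED_SHA256`, `sha256_of`, and `verify` are illustrative names and are not part of chunkle. The digests are copied from the SHA256 rows of the two File hashes listings for 0.2.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for chunkle 0.2.0 (illustrative mapping).
EXPECTED_SHA256 = {
    "chunkle-0.2.0.tar.gz": "a1426dfabcecc6e5645516a28a941e16e96a4b692e00b8e08fe792b7d45063b8",
    "chunkle-0.2.0-py3-none-any.whl": "72c8c28947fa3b293f401a326d330f3e93cf5030fd35684346c76eda5ea7692e",
}


def sha256_of(path: Path, block_size: int = 65536) -> str:
    """Hash the file in blocks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if the file name is known and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

On Python 3.11 and later, `hashlib.file_digest(fh, "sha256")` can replace the manual read loop. pip's hash-checking mode (`--require-hashes`) enforces the same kind of check at install time.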