Chunk long text with policies.
Project description
chunkle
Split big text into reader‑friendly pieces while respecting line and token budgets.
Install
pip install chunkle
Compatible with Python ≥ 3.11.
Quick start
from chunkle import chunk

for part in chunk(big_text, lines_per_chunk=20, tokens_per_chunk=500):
    ...  # stream, save, or send
The generator yields a chunk the moment both budgets are met.
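To make the "both budgets" rule concrete, here is a minimal sketch of a generator that emits a chunk only once the line count and the token count have both reached their limits. It is not chunkle's actual implementation: the name `chunk_sketch`, the `count_tokens` parameter, the whitespace-based token count (a stand-in for a real tokenizer), and the choice to cut only at line boundaries are all assumptions made for illustration.

```python
from typing import Callable, Iterator


def chunk_sketch(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    # Assumption: whitespace-separated words stand in for real tokens.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> Iterator[str]:
    """Yield pieces of `content` once BOTH budgets are reached (illustrative only)."""
    buffer: list[str] = []
    tokens = 0
    for line in content.splitlines(keepends=True):
        buffer.append(line)
        tokens += count_tokens(line)
        # Emit only when the line budget AND the token budget are both met.
        if len(buffer) >= lines_per_chunk and tokens >= tokens_per_chunk:
            yield "".join(buffer)
            buffer, tokens = [], 0
    if buffer:
        # Whatever is left over forms the final, possibly short, chunk.
        yield "".join(buffer)
```

In this sketch a chunk can exceed either budget while it waits for the other one to be reached, and joining the yielded chunks reproduces the input exactly.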
Defaults
lines_per_chunk = 20
tokens_per_chunk = 500
API
def chunk(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    encoding: tiktoken.Encoding | None = None,
) -> typing.Generator[str, None, None]:
    ...
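Because `chunk` returns a generator of strings, its output can be consumed lazily by any code that accepts an iterable. The helper below is a hypothetical example of the "save" case: `save_chunks`, its `prefix` parameter, and the file naming scheme are assumptions for illustration and are not part of chunkle.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def save_chunks(
    parts: Iterable[str],
    out_dir: str | Path,
    prefix: str = "part",  # hypothetical naming scheme, not a chunkle option
) -> list[Path]:
    """Write each string in `parts` to its own numbered UTF-8 text file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    # Consume the iterable one item at a time so a generator is never
    # materialized in memory as a whole.
    for index, part in enumerate(parts, start=1):
        path = out / f"{prefix}-{index:04d}.txt"
        path.write_text(part, encoding="utf-8")
        written.append(path)
    return written
```

With chunkle installed, a call such as `save_chunks(chunk(big_text), "out")` would write one file per yielded chunk.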
Coming Next
- Benchmark batched vs. per‑char tokenization on a 10 MB multilingual file.
- Ship 0.1.1 with CRLF handling and an expanded README.
- Add a GitHub Action matrix (Python 3.11 & 3.12) to prevent regressions.
License
MIT © 2025 Allen Chou
Download files
Source Distribution
chunkle-0.1.0.tar.gz (3.4 kB)
Built Distribution
File details
Details for the file chunkle-0.1.0.tar.gz.
File metadata
- Download URL: chunkle-0.1.0.tar.gz
- Upload date:
- Size: 3.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e1875d7426f7d09fab51165651896ded734607f13ea2442099981a14c74a965d |
| MD5 | 60bdfa4f57547bdeeb54c98c2879a996 |
| BLAKE2b-256 | 6de95e58bbfb3056d7abcbf83159f20d2bfc746ce8a2108da8241f946ddb0fa3 |
File details
Details for the file chunkle-0.1.0-py3-none-any.whl.
File metadata
- Download URL: chunkle-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fe388075122d1467b968ba51d00860e8ff5a30ce10a9c050dbaa1b3dee0e980 |
| MD5 | 81f308da75192aebfb9f3be121f734f6 |
| BLAKE2b-256 | a71656a4e71c069a339e80860d627d860dddb261cff0d5402278473ccbc0b26e |