Chunk long text with policies.

chunkle

Split big text into reader‑friendly pieces while respecting line and token budgets.

Install

pip install chunkle

Compatible with Python ≥ 3.11.

Quick start

from chunkle import chunk

for part in chunk(big_text, lines_per_chunk=20, tokens_per_chunk=500):
    ...  # stream, save, or send

The generator yields a chunk the moment both budgets are met.
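The "both budgets" rule can be illustrated with a small stand-alone sketch. This is not chunkle's implementation: chunk_sketch, whitespace_tokens, and the count_tokens parameter are hypothetical names, and counting whitespace-separated words stands in for real tokenization. It only shows one plausible reading of the rule, where a piece is emitted once it holds at least lines_per_chunk lines and at least tokens_per_chunk tokens.

```python
from typing import Callable, Iterator


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def chunk_sketch(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> Iterator[str]:
    """Yield a piece as soon as BOTH the line and token budgets are reached."""
    buffer: list[str] = []
    tokens = 0
    for line in content.splitlines(keepends=True):
        buffer.append(line)
        tokens += count_tokens(line)
        # Emit only when the line count AND the token count have both been met.
        if len(buffer) >= lines_per_chunk and tokens >= tokens_per_chunk:
            yield "".join(buffer)
            buffer, tokens = [], 0
    if buffer:
        # Flush the remainder, even if it is under budget.
        yield "".join(buffer)


# Ten lines of five words each; small budgets keep the example readable.
sample = "".join(f"line {i} has five words\n" for i in range(10))
parts = list(chunk_sketch(sample, lines_per_chunk=3, tokens_per_chunk=20))
```

With these budgets, three lines carry only 15 stand-in tokens, so the first piece is held until a fourth line brings the count to 20. The real library may differ in how it counts tokens and where it breaks.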

Defaults

  • lines_per_chunk = 20
  • tokens_per_chunk = 500

API

import typing

import tiktoken


def chunk(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    encoding: tiktoken.Encoding | None = None,
) -> typing.Generator[str, None, None]:
    ...

Coming Next

  • Benchmark batched vs. per‑char tokenization on a 10 MB multilingual file.
  • Ship 0.1.1 with CRLF handling and an expanded README.
  • Add a GitHub Action matrix (Python 3.11 & 3.12) to prevent regressions.

License

MIT © 2025 Allen Chou
