
Chunk long text with policies.

Project description

chunkle

Split big text into reader‑friendly pieces while respecting line and token budgets.

Install

pip install chunkle

Compatible with Python ≥ 3.11.

Quick start

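from pathlib import Path

from chunkle import chunk

big_text = Path("notes.txt").read_text(encoding="utf-8")  # hypothetical input file; any str works
for part in chunk(big_text, lines_per_chunk=20, tokens_per_chunk=500):
    ...  # stream, save, or send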

The generator yields a chunk the moment both budgets are met.
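Here is a minimal stdlib sketch of the "both budgets" rule. It is not chunkle's implementation. `sketch_chunk` and `whitespace_tokens` are hypothetical names. The sketch works line by line rather than character by character, and it counts tokens by splitting on whitespace instead of using tiktoken. It also assumes that leftover text is flushed as a final, shorter chunk.

```python
from collections.abc import Callable, Iterator


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def sketch_chunk(
    content: str,
    *,
    lines_per_chunk: int = 20,
    tokens_per_chunk: int = 500,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> Iterator[str]:
    """Hypothetical 'both budgets' chunker; NOT chunkle's actual code."""
    buffer: list[str] = []
    for line in content.splitlines(keepends=True):
        buffer.append(line)
        # Only count tokens once the line budget is satisfied.
        if len(buffer) >= lines_per_chunk:
            text = "".join(buffer)
            if count_tokens(text) >= tokens_per_chunk:
                yield text  # both budgets met: emit immediately
                buffer = []
    # Assumption: leftover text that never met both budgets becomes a final chunk.
    if buffer:
        yield "".join(buffer)


text = "a b\n" * 5  # 5 lines, 2 whitespace tokens per line
parts = list(sketch_chunk(text, lines_per_chunk=2, tokens_per_chunk=5))
print(parts)  # ['a b\na b\na b\n', 'a b\na b\n']
```

In this example, two lines are not enough because they hold only 4 tokens. The first chunk closes at the third line, once both thresholds hold. Under this sketch's rule the budgets act as minimums, so a chunk can exceed either one.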

Defaults

  • lines_per_chunk = 20
  • tokens_per_chunk = 500

API

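import typing

import tiktoken


def chunk(
    content: str,
    *,                                          # everything after content is keyword-only
    lines_per_chunk: int = 20,                  # line budget per chunk
    tokens_per_chunk: int = 500,                # token budget per chunk
    encoding: tiktoken.Encoding | None = None,  # optional tiktoken encoding for counting tokens
) -> typing.Generator[str, None, None]:
    ...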
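The `encoding` parameter is typed as `tiktoken.Encoding`, whose `encode(text)` method returns a list of token ids. The sketch below assumes chunk sizes are measured that way. It reports `(lines, tokens)` for each chunk so you can compare the chunks with the budgets. `budget_report`, `Encoder`, and `WhitespaceEncoder` are hypothetical helpers. The whitespace encoder only stands in for a real tiktoken encoding, so the example runs without extra dependencies.

```python
from collections.abc import Iterable
from typing import Protocol


class Encoder(Protocol):
    """Anything with a tiktoken-style encode(text) returning a list of token ids."""

    def encode(self, text: str) -> list[int]: ...


class WhitespaceEncoder:
    """Hypothetical stand-in for a tiktoken encoding: one id per whitespace-separated word."""

    def encode(self, text: str) -> list[int]:
        return list(range(len(text.split())))


def budget_report(chunks: Iterable[str], encoding: Encoder) -> list[tuple[int, int]]:
    """Return (line_count, token_count) for each chunk."""
    return [(len(part.splitlines()), len(encoding.encode(part))) for part in chunks]


print(budget_report(["a b\nc\n", "d e f\n"], WhitespaceEncoder()))  # [(2, 3), (1, 3)]

# With chunkle and tiktoken installed, the same check could look like:
#   enc = tiktoken.get_encoding("cl100k_base")
#   budget_report(chunk(big_text, encoding=enc), enc)
```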

Coming Next

  • Benchmark batched vs. per‑char tokenization on a 10 MB multilingual file.
  • Ship 0.1.1 with CRLF handling and an expanded README.
  • Add a GitHub Action matrix (Python 3.11 & 3.12) to prevent regressions.

License

MIT © 2025 Allen Chou

Download files

Download the file for your platform.

Source Distribution

chunkle-0.2.0.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

chunkle-0.2.0-py3-none-any.whl (4.5 kB)

Uploaded Python 3

File details

Details for the file chunkle-0.2.0.tar.gz.

File metadata

  • Download URL: chunkle-0.2.0.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0

File hashes

Hashes for chunkle-0.2.0.tar.gz
Algorithm Hash digest
SHA256 a1426dfabcecc6e5645516a28a941e16e96a4b692e00b8e08fe792b7d45063b8
MD5 b00a494a677f5e3211b6e27986bcfa22
BLAKE2b-256 da667ce310b001650ef7d3fac1bcea73bf955ba5dfbcc713c80e6744c92c8759

File details

Details for the file chunkle-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: chunkle-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 4.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0

File hashes

Hashes for chunkle-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 72c8c28947fa3b293f401a326d330f3e93cf5030fd35684346c76eda5ea7692e
MD5 6bc86a9361bce6cc391869b89da77e38
BLAKE2b-256 c5e101798fdf8cec391fd89f5ef6c6d8cf9d5b2a52f60638a3352ce3964ad26d
