
A lightweight package for chunking iterables

Project description

Chonker is a lightweight Python package that makes it easy to chunkify your data.

Benefits of chonker are:

  • Simple and flexible user interface
  • Performant implementation using iterables

The design of chonker exposes chunking logic directly to the user, resulting in readable and extensible code. The entire chonker API can be expressed in a single example snippet:

from chonker import chonkify

for chonk, row in chonkify(data, size=100):
    print(chonk.size, chonk.index, chonk.subindex, chonk.rawindex)
    if chonk.is_start_of:
        print('Do something at the start of the chunk')
    elif chonk.is_end_of:
        print('Do something at the end of the chunk')

A common use case for chonker is checkpointing:

results = []
for chonk, row in chonkify(data, size=100):
    results.append(process(row))
    if chonk.is_end_of:
        # Save progress once per completed chunk
        checkpoint(results)

Notice how this code looks similar to the equivalent code using enumerate:

results = []
chunksize = 100
for idx, row in enumerate(data):
    results.append(process(row))
    # True on the last row of each full chunk
    if idx % chunksize == chunksize - 1:
        checkpoint(results)

Chonker's power is hidden in its simplicity. The similarities between enumerate and chonkify make it easy to adopt. The chonking syntax is readable, and abstracting away the use of // and % avoids common indexing errors. Chonk objects are dataclasses that can be pretty-printed, compared, and stored with your data if you like. Note that chonk objects are reused between iterations for performance reasons; don't modify chonks unless you understand the implications.
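To make the reuse caveat concrete, here is a minimal sketch of what can happen when a reused object is stored directly. It does not use chonker itself: Marker and mark are hypothetical stand-ins written for illustration, and chonker's own classes may behave differently. The sketch assumes that reuse means the same instance is updated in place on every iteration.

```python
from dataclasses import dataclass, replace


@dataclass
class Marker:
    """Hypothetical stand-in for a chonk; not chonker's actual class."""
    index: int = 0
    subindex: int = 0


def mark(rows, size):
    """Yield (marker, row) pairs, reusing ONE Marker instance throughout."""
    marker = Marker()
    for raw, row in enumerate(rows):
        marker.index, marker.subindex = divmod(raw, size)
        yield marker, row


# Storing the yielded object keeps four references to the same instance.
aliased = [m for m, _ in mark("abcd", size=2)]

# dataclasses.replace with no changes builds a fresh copy on each iteration.
copied = [replace(m) for m, _ in mark("abcd", size=2)]

print([(m.index, m.subindex) for m in aliased])  # every entry shows the final state
print([(m.index, m.subindex) for m in copied])   # each entry keeps its own state
```

If chonks are reused in this way, copying them before storing them alongside your data is the safer choice.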

Design considerations

The design of chonker was inspired by the tqdm library and by the Unix philosophy: do one simple thing and do it well. The design was also inspired by the Python built-in enumerate. The name chonk is chosen specifically to distinguish the chonk data structure from the concept of a chunk. A chunk is a subset of data from a larger dataset. A chonk is a data structure that augments the indexes of an iteration.
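As a rough illustration of what "augmenting the indexes of an iteration" could mean, the sketch below derives chunk-relative indexes from a plain enumerate counter. Position and positions are hypothetical names, the field names are borrowed from the example snippet earlier in this description, and their meanings here are assumptions based on those names rather than chonker's documented behavior.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple, TypeVar

T = TypeVar("T")


class Position(NamedTuple):
    """Hypothetical index bundle; not chonker's actual chonk class."""
    rawindex: int      # position in the whole iteration
    index: int         # which chunk the row falls in (assumed meaning)
    subindex: int      # position inside that chunk (assumed meaning)
    is_start_of: bool  # first row of a chunk
    is_end_of: bool    # last row of a full chunk


def positions(rows: Iterable[T], size: int) -> Iterator[Tuple[Position, T]]:
    """Pair each row with indexes computed from its raw position."""
    for raw, row in enumerate(rows):
        index, subindex = divmod(raw, size)
        yield Position(raw, index, subindex, subindex == 0, subindex == size - 1), row


for pos, row in positions("abcde", size=2):
    print(row, pos.index, pos.subindex, pos.is_start_of, pos.is_end_of)
```

In this sketch a short final chunk is never flagged as ended, because detecting it would require looking one row ahead. How chonker treats that case is not stated here.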

The authors of chonker considered other designs. The standard chunking approach usually looks like:

from chunker import chunkify

for chunk in chunkify(data, chunksize=100):
    do_something_special(chunk[0])
    for row in chunk[1:-1]:
        do_something_normal(row)
    do_something_else(chunk[-1])

This approach isn't particularly readable. When dealing with iterables, it becomes even less readable.
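One way to see the difficulty with iterables: a generator cannot be sliced, so each chunk has to be materialized into a list before chunk[0] and chunk[-1] can be indexed. The sketch below uses itertools.islice from the standard library; iter_chunks and squares are hypothetical helpers written for illustration and are not part of chonker or of the chunker package shown above.

```python
from itertools import islice


def iter_chunks(rows, chunksize):
    """Hypothetical helper: yield lists of up to `chunksize` rows from any iterable."""
    iterator = iter(rows)
    while True:
        # islice consumes at most `chunksize` items; list() materializes them
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def squares(n):
    """A generator has no len() and no indexing, unlike a list."""
    for i in range(n):
        yield i * i


chunks = list(iter_chunks(squares(5), chunksize=2))
print(chunks)  # [[0, 1], [4, 9], [16]]
```

Note that the final chunk holds a single row, so chunk[0] and chunk[-1] refer to the same row there. Code in the style of the snippet above would handle that row twice unless it adds an extra check.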

Another common approach is to use callbacks like:

from callchunk import chunkify

for row in chunkify(
    data,
    chunksize=100,
    start_callback=do_something_special,
    end_callback=do_something_else,
    skip_first=True,
    skip_last=True,
):
    do_something_normal(row)

This approach is even less readable and has the downside that the user can only specify callbacks for predetermined events.

These two approaches share a common pitfall: they try to hide too much logic from the user. With chonker, the user is exposed to the right level of abstraction and can handle their events as they see fit.
