

Project description

pychunkbuffers

An open-source python library for writing large amounts of data to buffers via chunks.

Description

This repository contains the source code for the pychunkbuffers library. I came up with the idea for this library while working on my other project, AdityaIyer2k7/image-file-hider. In that project, I often had to write large amounts of data (hundreds of megabytes) to lists and buffers. Doing this byte by byte took a long time, so I came up with chunking as a solution.

Suppose we have a for loop that runs 10^8 times and adds a value to a list on each iteration. In a chunked implementation, you would first pre-allocate the list so that every index already exists:

LIST = [0] * 10**8

and then write a function that fills the list from index startidx (inclusive) to endidx (exclusive):

def func(startidx, endidx):
  for i in range(startidx, endidx):
    LIST[i] = SOMEVALUE

However, calling func(0, 10**8) still runs all 10^8 iterations in sequence. Instead, we can split the range into chunks such as func(0, 10000), func(10000, 20000), and so on, and run them concurrently on separate threads. With this library, that takes a single line:

run_chunked(func, 10000, 0, 10**8) # Where 10000 is our chunk size, while 0 and 10**8 are our bounds
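For intuition about what a call like this involves, here is a minimal thread-per-chunk sketch. `run_chunked_sketch` is hypothetical and is not pychunkbuffers' implementation; it assumes end-exclusive bounds (matching `range(startidx, endidx)` in `func`), a shorter final chunk when the range does not divide evenly, and one status flag per chunk. Note that in CPython the global interpreter lock lets only one thread execute Python bytecode at a time, so pure-Python chunk functions interleave on threads rather than running truly in parallel.

```python
import threading


def run_chunked_sketch(func, chunksize, start, end):
    """Hypothetical stand-in for run_chunked (not the library's code).

    Splits [start, end) into chunks of `chunksize` (the last one may be
    shorter), calls func(a, b) for each chunk on its own thread, and returns
    a list of booleans, one per chunk, set to True when that call returns.
    If func raises, that chunk's flag stays False.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    bounds = [(a, min(a + chunksize, end)) for a in range(start, end, chunksize)]
    status = [False] * len(bounds)

    def worker(idx, a, b):
        func(a, b)
        status[idx] = True  # each thread writes only its own slot

    for idx, (a, b) in enumerate(bounds):
        threading.Thread(target=worker, args=(idx, a, b)).start()
    return status


# Usage: fill a small pre-allocated list in chunks of 4 -> (0,4), (4,8), (8,10)
data = [0] * 10

def fill(startidx, endidx):
    for i in range(startidx, endidx):
        data[i] = i * 10

status = run_chunked_sketch(fill, 4, 0, len(data))
while not all(status):
    pass
print(data)  # → [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Starting one thread per chunk keeps the sketch short; with 10^8 items and a chunk size of 10000 it would start 10000 threads, which is one reason a real implementation might use a bounded pool instead.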

Next, we need to know when every chunk has finished. The library tracks this with a completion-status list: run_chunked returns a list of booleans, one per chunk, which are all False when the chunks start. When a chunk finishes its task, its entry in the list is set to True. To wait for all the chunks to finish, we can use a line like this:

while not all(STATUS): pass
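The loop above spins at full speed on one core while it waits. A gentler variant, sketched here under the assumption that the returned status list behaves as described above, sleeps briefly between checks and supports an optional timeout. `wait_for_chunks` and its `poll_interval` and `timeout` parameters are hypothetical, not part of pychunkbuffers.

```python
import threading
import time


def wait_for_chunks(status, poll_interval=0.01, timeout=None):
    """Poll a list of booleans until every entry is True (hypothetical helper).

    Returns True once all entries are True, or False if `timeout` seconds
    pass first. timeout=None waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not all(status):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)  # give up the CPU instead of busy-spinning
    return True


# Demo: a background thread marks each "chunk" done after a short delay.
flags = [False, False, False]

def finish_all():
    for i in range(len(flags)):
        time.sleep(0.02)
        flags[i] = True

threading.Thread(target=finish_all).start()
print(wait_for_chunks(flags))                  # → True
print(wait_for_chunks([False], timeout=0.05))  # → False
```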

Example implementation:

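# Task: write the squares of the numbers 1 to 10**8 (inclusive)
# (assumes run_chunked has already been imported from pychunkbuffers)
squares = [0] * 10**8
CHUNKSIZE = 10**5

def func(startidx, endidx):
  # squares[i] holds the square of i+1, so index 0 stores 1**2
  for i in range(startidx, endidx):
    squares[i] = (i + 1)**2

status = run_chunked(func, CHUNKSIZE, 0, len(squares))
while not all(status): pass  # wait until every chunk reports done
print("Done")
print(squares[:100])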
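If you want to compare against the standard library, the same task can be sketched with `concurrent.futures.ThreadPoolExecutor`, which caps the number of worker threads and replaces the status list with `wait()`. This is a swapped-in technique, not how pychunkbuffers works, and the size is scaled down to N = 10**5 (an assumption for a quick run) instead of 10**8.

```python
from concurrent.futures import ThreadPoolExecutor, wait

N = 10**5          # scaled down from 10**8 so the sketch runs quickly
CHUNKSIZE = 10**4
squares = [0] * N

def func(startidx, endidx):
    # Same chunk function as the example above: squares[i] = (i+1)**2
    for i in range(startidx, endidx):
        squares[i] = (i + 1) ** 2

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(func, a, min(a + CHUNKSIZE, N))
               for a in range(0, N, CHUNKSIZE)]
    wait(futures)   # block until every chunk has finished
    for f in futures:
        f.result()  # re-raise any exception raised inside a chunk

print("Done")
print(squares[:5])  # → [1, 4, 9, 16, 25]
```

Unlike a plain status list, `f.result()` surfaces errors from individual chunks instead of leaving a flag False forever.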



Download files


Source Distribution

pychunkbuffers-1.0.4.tar.gz (15.4 kB)

Uploaded Source

Built Distribution

pychunkbuffers-1.0.4-py3-none-any.whl (15.8 kB)

Uploaded Python 3

File details

Details for the file pychunkbuffers-1.0.4.tar.gz.

File metadata

  • Download URL: pychunkbuffers-1.0.4.tar.gz
  • Upload date:
  • Size: 15.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for pychunkbuffers-1.0.4.tar.gz
Algorithm Hash digest
SHA256 98071e55bddcbec5fc6a5c351cebeb000bf25165720ea4bf77cb4c0d28d80f7a
MD5 b3645b545c406929bad9511818ffb25e
BLAKE2b-256 faa45280372fb448de383573cbd521c8912df9d08c5e63e510a34b9a535ab185


File details

Details for the file pychunkbuffers-1.0.4-py3-none-any.whl.


File hashes

Hashes for pychunkbuffers-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9e5b387ef41f1c8535b712cb7b39fd1c3d9d417a20cb0feef1e40f591119d518
MD5 998864e25d13bc6b0b506c0091adb8e8
BLAKE2b-256 3c839c7f8c41cb3c918d5e552cd4ffc6aca3e1fada9893137ce593abbcd74419

