Package biglist

Project description

biglist

biglist provides a class Biglist, which implements a persisted, out-of-memory list for Python. The main use case is processing large amounts of data on one or more machines, using local disk or a cloud blob store for storage. It is a pure Python utility with a familiar Pythonic interface.

Mutation is append-only. Updating existing elements of the list is not supported.

Random element access by index and slice is supported, but not optimized. Iteration is optimized, because sequential consumption of the data is the main target scenario.
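To make the trade-off above concrete, here is a minimal sketch of an append-only list persisted to disk in batches. It is not the Biglist class or its API: the name ToyDiskList, the batch_size parameter, and the pickle-file layout are all assumptions made for illustration. It supports integer indexing only (no slices).

```python
import pickle
import tempfile
from pathlib import Path


class ToyDiskList:
    """Hypothetical illustration only; NOT the Biglist class or its API."""

    def __init__(self, directory, batch_size=1000):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.batch_size = batch_size
        self._buffer = []        # elements appended but not yet written to disk
        self._file_lengths = []  # number of elements stored in each data file

    def _file_path(self, file_index):
        return self.directory / f"{file_index:08d}.pickle"

    def append(self, item):
        # The only mutation offered: add to the end, never update in place.
        self._buffer.append(item)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Persist the in-memory buffer as one new data file.
        if not self._buffer:
            return
        with open(self._file_path(len(self._file_lengths)), "wb") as f:
            pickle.dump(self._buffer, f)
        self._file_lengths.append(len(self._buffer))
        self._buffer = []

    def _load(self, file_index):
        with open(self._file_path(file_index), "rb") as f:
            return pickle.load(f)

    def __len__(self):
        return sum(self._file_lengths) + len(self._buffer)

    def __iter__(self):
        # Sequential scan: every data file is loaded exactly once.
        for file_index in range(len(self._file_lengths)):
            yield from self._load(file_index)
        yield from self._buffer

    def __getitem__(self, index):
        # Random access: a whole data file is loaded to return one element.
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        for file_index, length in enumerate(self._file_lengths):
            if index < length:
                return self._load(file_index)[index]
            index -= length
        return self._buffer[index]


with tempfile.TemporaryDirectory() as tmp:
    data = ToyDiskList(tmp, batch_size=3)
    for x in range(10):
        data.append(x)
    data.flush()
    total = sum(data)   # iteration reads each file once
    seventh = data[7]   # random access loads a full file for one element
    count = len(data)
```

In this toy layout, iterating over n elements loads each file once, whereas n random lookups may load a file n times. That is one plausible reason for iteration to be the cheaper access pattern; the actual storage format of Biglist may differ.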

Distributed reading and writing are supported. That is, multiple workers can append to or read from a Biglist concurrently. In the case of reading, the data of the Biglist is split between the workers, so each element is consumed by only one of them. When the storage is local, the workers are multiple threads or processes on the same machine. When the storage is remote (i.e. in a cloud blob store), the workers are multiple threads or processes on one or more machines.

Of course, reading the entire list concurrently by each of a number of independent workers is also possible. That, however, is not called "distributed" reading.
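The difference between the two reading modes can be sketched with standard-library threads. This is not how Biglist coordinates its workers; the shared task queue, the chunk granularity, and the function names distributed_read and independent_read are assumptions chosen only to illustrate the distinction.

```python
import queue
import threading

# Stand-ins for stored data files; a real store would keep these on disk
# or in a blob store. Five chunks hold the numbers 0..19.
chunks = [list(range(start, start + 4)) for start in range(0, 20, 4)]


def distributed_read(chunks, num_workers):
    """Split the chunks between workers: each chunk goes to exactly one worker."""
    tasks = queue.Queue()
    for chunk in chunks:
        tasks.put(chunk)
    results = [[] for _ in range(num_workers)]

    def worker(worker_id):
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            results[worker_id].extend(chunk)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def independent_read(chunks, num_workers):
    """Every worker reads the entire list; nothing is split."""
    return [[x for chunk in chunks for x in chunk] for _ in range(num_workers)]


shared = distributed_read(chunks, num_workers=3)
copies = independent_read(chunks, num_workers=3)
```

With distributed reading, the workers' results taken together contain each element once, although which worker gets which chunk is not deterministic. With independent reading, every worker ends up with a full copy of the data.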

A very early version of this work is described in a blog post.

Download files

Source Distribution

biglist-0.5.0.tar.gz (10.7 kB)

Uploaded Source

Built Distribution

biglist-0.5.0-py3-none-any.whl (10.5 kB)

Uploaded Python 3
