bcolz: columnar and compressed data containers

bcolz provides columnar, chunked data containers that can be compressed both in memory and on disk. Column storage allows tables to be queried efficiently and makes adding and removing columns cheap. bcolz is based on NumPy and uses it as the standard data container for communicating with bcolz objects, but it also supports import/export to and from HDF5/PyTables tables and pandas dataframes.
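To make the point about cheap column addition and removal concrete, here is a minimal sketch of a toy column store built from the standard library only. It is not the bcolz API: the `table` dictionary and the column names `id`, `price` and `qty` are made up for illustration, and nothing is compressed here.

```python
from array import array

# Hypothetical toy column store: one typed array per column (not the bcolz API).
table = {
    "id": array("q", range(6)),
    "price": array("d", [9.5, 12.0, 7.25, 30.0, 18.5, 4.0]),
}

# Adding a column only creates one new array; existing columns are untouched.
table["qty"] = array("q", [1, 3, 2, 1, 5, 8])

# A query on "price" scans that column alone and returns matching row indices.
hits = [i for i, p in enumerate(table["price"]) if p > 10.0]

# Only the columns needed for the result are read for the matching rows.
totals = [table["price"][i] * table["qty"][i] for i in hits]

# Removing a column drops a single array without rewriting any rows.
del table["id"]

print(hits, totals, sorted(table))
```

In a row-oriented layout, the same addition or removal would mean rewriting every row, which is the cost that column storage avoids.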

bcolz objects are compressed by default not only for reducing memory/disk storage, but also to improve I/O speed. The compression process is carried out internally by Blosc, a high-performance, multithreaded meta-compressor that is optimized for binary data (although it works with text data just fine too).

bcolz can also use numexpr internally (by default, if it detects that numexpr is installed) or dask to accelerate many vector and query operations (although it can use pure NumPy for this too). numexpr and dask can optimize memory usage and use multithreading for the computations, which makes these operations fast. Combined with the disk-based, compressed carray/ctable containers, this allows out-of-core computations to be performed efficiently and, most importantly, transparently.
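The idea behind chunked, compressed containers and out-of-core computation can be sketched with the standard library alone. This is only an illustration under stated assumptions: zlib stands in for Blosc, the helpers `compress_chunks` and `chunked_sum` are hypothetical names, and the chunk length is arbitrary. It is not how bcolz is implemented.

```python
import zlib
from array import array

CHUNK_LEN = 4096  # items per chunk; an arbitrary illustrative value


def compress_chunks(values, chunk_len=CHUNK_LEN):
    """Split values into fixed-size chunks and compress each one independently."""
    chunks = []
    for start in range(0, len(values), chunk_len):
        raw = array("q", values[start:start + chunk_len]).tobytes()
        chunks.append(zlib.compress(raw))  # zlib is a stand-in for Blosc
    return chunks


def chunked_sum(chunks):
    """Sum all items while holding only one decompressed chunk at a time."""
    total = 0
    for blob in chunks:
        block = array("q")
        block.frombytes(zlib.decompress(blob))
        total += sum(block)
    return total


values = list(range(100_000))
chunks = compress_chunks(values)
stored = sum(len(c) for c in chunks)  # compressed size in bytes

print(len(chunks), stored, 8 * len(values), chunked_sum(chunks))
```

Because each chunk is compressed on its own, a computation can stream through the container without ever materializing the whole uncompressed array, and the chunks could just as well live on disk.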

Just to whet your appetite, here is an example with real data, where bcolz is already fulfilling the promise of accelerating memory I/O by using compression.

Rationale

By using compression, you can deal with more data using the same amount of memory, which is very good in itself. But in case you are wondering about the price to pay in terms of performance, you should know that nowadays memory access is the most common bottleneck in many computational scenarios, and that CPUs spend most of their time waiting for data. Hence, keeping data compressed in memory can also reduce the stress on the memory subsystem.

Furthermore, columnar means that tabular datasets are stored in column-wise order, and this turns out to offer better opportunities to improve the compression ratio. This is because data tends to show more similarity between elements in the same column than between elements in the same row, so compressors generally do a much better job when data is laid out column-wise. In addition, when you have to deal with tables with a large number of columns and your operations only involve some of them, columnar storage tends to be much more effective because it minimizes the amount of data that travels to the CPU caches.
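The effect of layout on compression ratio can be checked with a small experiment. The sketch below is an illustration only: it uses zlib from the standard library rather than Blosc, and the three columns (`status`, `region`, `noise`) are made-up data chosen so that two columns are highly repetitive and one is not.

```python
import random
import struct
import zlib

N = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical table: a constant column, a low-cardinality column, a noisy column.
status = [7] * N
region = [i % 3 for i in range(N)]
noise = [rng.getrandbits(63) for _ in range(N)]

# Row-wise layout: the three fields of each row are stored next to each other.
row_wise = b"".join(
    struct.pack("<qqq", s, r, x) for s, r, x in zip(status, region, noise)
)

# Column-wise layout: each column is stored contiguously, one after another.
col_wise = b"".join(
    struct.pack(f"<{N}q", *col) for col in (status, region, noise)
)

row_size = len(zlib.compress(row_wise))
col_size = len(zlib.compress(col_wise))

print("row-wise compressed bytes:", row_size)
print("column-wise compressed bytes:", col_size)
```

Both buffers hold exactly the same values. In the row-wise buffer the repetitive fields are interrupted by noisy bytes in every row, so the compressor has to pay for a new match each time, whereas in the column-wise buffer the repetitive columns form long uniform runs.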

So, the ultimate goal for bcolz is not only to reduce the memory needs of large arrays/tables, but also to make bcolz operations faster than those on a traditional data container like the ones in NumPy or pandas. That is actually already the case in some real-life scenarios (see the notebook above), but it will become much more noticeable in combination with forthcoming, faster CPUs integrating more cores and wider vector units.

Requisites

  • Python >= 3.7
  • NumPy >= 1.16.5
  • Cython >= 0.22 (just for compiling the beast)
  • C-Blosc >= 1.8.0 (optional, as the internal Blosc will be used by default)

Optional:

  • numexpr >= 2.5.2
  • dask >= 0.9.0
  • pandas
  • tables (pytables)

Installing as wheel

There are wheels for Linux and Mac OS X that you can install with

$ pip install bcolz-zipline

Then also install NumPy with

$ pip install numpy

and test your installation with

$ python -c 'import bcolz; bcolz.test()'

Building

There are different ways to compile bcolz, depending on whether you want to link against an already installed Blosc library or not.

Compiling with an installed Blosc library (recommended)

Python and Blosc-powered extensions have a difficult relationship when compiled using GCC, which is why using an external C-Blosc library is recommended for maximum performance (for details, see https://github.com/Blosc/python-blosc/issues/110).

Go to https://github.com/Blosc/c-blosc/releases, then download and install the C-Blosc library. After that, you can tell bcolz where the C-Blosc library is in a couple of ways:

Using an environment variable:

$ BLOSC_DIR=/usr/local     (or "set BLOSC_DIR=\blosc" on Win)
$ export BLOSC_DIR         (not needed on Win)
$ python setup.py build_ext --inplace

Using a flag:

$ python setup.py build_ext --inplace --blosc=/usr/local

Compiling without an installed Blosc library

bcolz also ships with the Blosc sources so, assuming that you have a C++ compiler installed, do:

$ python setup.py build_ext --inplace

That's all. You can proceed to the testing section now.

Note: The requirement for the C++ compiler is just for the Snappy dependency. The other components of Blosc are pure C (including the LZ4 and Zlib libraries).

Testing

After compiling, you can quickly check that the package is sane by running:

$ PYTHONPATH=.   (or "set PYTHONPATH=." on Windows)
$ export PYTHONPATH    (not needed on Windows)
$ python -c "import bcolz; bcolz.test()"  # add `heavy=True` if desired

Installing

Install it as a typical Python package:

$ pip install -U .

Optionally, install the additional dependencies:

$ pip install .[optional]

Documentation

You can find the online manual at:

http://bcolz.blosc.org

but of course, you can always access docstrings from the console (e.g. help(bcolz.ctable)).

Also, you may want to look at the bench/ directory for some examples of use.

Resources

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Home of Blosc compressor: http://blosc.org

Users' mailing list: http://groups.google.com/group/bcolz (bcolz@googlegroups.com)

An introductory talk (20 min) about bcolz at EuroPython

  1. Slides here.

License

Please see BCOLZ.txt in LICENSES/ directory.

Share your experience

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data!
