
Columnar and compressed data containers.

Project description

bcolz: columnar and compressed data containers

bcolz provides columnar, chunked data containers that can be compressed either in memory or on disk. Columnar storage allows for efficient table queries, as well as cheap column addition and removal. It is based on NumPy and uses it as the standard data container for communicating with bcolz objects, but it also comes with import/export facilities to/from HDF5/PyTables tables and pandas dataframes.
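To make "chunked" and "compressed" concrete, here is a toy sketch in plain Python of a compressed column split into fixed-size chunks. It is not bcolz's carray API: the class name ChunkedColumn and its methods are hypothetical, and the standard-library zlib stands in for Blosc. Reading one element only decompresses the chunk that holds it.

```python
import zlib
from array import array


class ChunkedColumn:
    """Toy chunked, compressed column of float64 values.

    Hypothetical illustration only: zlib stands in for Blosc, and this is
    not the bcolz carray API.
    """

    def __init__(self, chunklen=1024, level=5):
        self.chunklen = chunklen    # items per chunk
        self.level = level          # zlib compression level
        self.chunks = []            # compressed, full chunks
        self.leftover = array("d")  # uncompressed tail (not yet a full chunk)

    def append(self, values):
        self.leftover.extend(values)
        # Compress every complete chunk that has accumulated in the tail.
        while len(self.leftover) >= self.chunklen:
            full = self.leftover[: self.chunklen]
            self.chunks.append(zlib.compress(full.tobytes(), self.level))
            self.leftover = self.leftover[self.chunklen :]

    def __len__(self):
        return len(self.chunks) * self.chunklen + len(self.leftover)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        nchunk, offset = divmod(i, self.chunklen)
        if nchunk < len(self.chunks):
            # Only the chunk holding item i is decompressed.
            chunk = array("d")
            chunk.frombytes(zlib.decompress(self.chunks[nchunk]))
            return chunk[offset]
        return self.leftover[offset]

    def nbytes_compressed(self):
        """Bytes used by compressed chunks plus the uncompressed tail."""
        tail = len(self.leftover) * self.leftover.itemsize
        return sum(len(c) for c in self.chunks) + tail
```

In bcolz itself, the corresponding containers are carray (a single column) and ctable (a set of columns); help(bcolz.carray) documents the real constructor and its compression parameters.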

bcolz objects are compressed by default, not only to reduce memory/disk storage but also to improve I/O speed. Compression is carried out internally by Blosc, a high-performance, multithreaded meta-compressor that is optimized for binary data (although it works just fine with text data too).
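One reason Blosc does well on binary data is its shuffle filter, which regroups the bytes of fixed-size items before compressing: all first bytes together, then all second bytes, and so on. The sketch below reimplements that idea in plain Python for illustration only (shuffle and unshuffle are hypothetical names, and zlib stands in for Blosc's own codecs) and compares compressed sizes for 100,000 consecutive 64-bit integers.

```python
import struct
import zlib


def shuffle(buf, typesize):
    """Byte-shuffle: byte 0 of every item, then byte 1 of every item, ..."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    return b"".join(buf[k::typesize] for k in range(typesize))


def unshuffle(buf, typesize):
    """Undo shuffle() by scattering each byte plane back into place."""
    nitems = len(buf) // typesize
    out = bytearray(len(buf))
    for k in range(typesize):
        out[k::typesize] = buf[k * nitems:(k + 1) * nitems]
    return bytes(out)


# 100,000 consecutive int64 values, little-endian, 8 bytes each.
n = 100_000
raw = struct.pack(f"<{n}q", *range(n))
plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffle(raw, 8)))
print(f"raw {len(raw)} B, zlib {plain_size} B, shuffle+zlib {shuffled_size} B")
```

For slowly varying numbers the high-order bytes are mostly identical, so after shuffling they form long runs that an LZ-style compressor handles very well; the exact ratio depends on the data.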

bcolz can also use numexpr internally (it does so by default when numexpr is installed) or dask to accelerate many vector and query operations, although it can use pure NumPy for this too. numexpr and dask can optimize memory usage and run computations on multiple threads, which makes them very fast. Combined with the disk-based, compressed carray/ctable containers, this allows out-of-core computations to be performed efficiently and, most importantly, transparently.
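The memory savings come from evaluating expressions block by block instead of materializing full-size temporaries. The following sketch evaluates 2*a + b over two chunk-compressed columns while keeping only one decompressed chunk per operand in memory at a time. The helper names are hypothetical, the code is single-threaded, and zlib replaces Blosc; numexpr and dask do this far more efficiently.

```python
import zlib
from array import array


def compress_chunks(values, chunklen):
    """Split a sequence of floats into zlib-compressed float64 chunks."""
    data = array("d", values)
    return [zlib.compress(data[i:i + chunklen].tobytes())
            for i in range(0, len(data), chunklen)]


def load(chunk):
    """Decompress one chunk back into a float64 array."""
    block = array("d")
    block.frombytes(zlib.decompress(chunk))
    return block


def blocked_eval(func, *columns):
    """Yield func applied element-wise, one chunk of every column at a time.

    At most one decompressed chunk per operand (plus one result block) is
    alive at once, so the decompressed working set depends on the chunk
    size, not on the column length. All columns must use the same chunk
    length.
    """
    if len({len(col) for col in columns}) > 1:
        raise ValueError("columns must have the same number of chunks")
    for chunk_group in zip(*columns):
        blocks = [load(chunk) for chunk in chunk_group]
        yield array("d", map(func, *blocks))


def blocked_sum(func, *columns):
    """Reduce the blocked results without ever building the full result."""
    return sum(sum(block) for block in blocked_eval(func, *columns))
```

For example, with a = compress_chunks(...) and b = compress_chunks(...) built with the same chunk length, blocked_sum(lambda x, y: 2 * x + y, a, b) computes the sum of 2*a + b without a full-length temporary.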

Just to whet your appetite, here is an example with real data, where bcolz is already fulfilling the promise of accelerating memory I/O by using compression.

Rationale

By using compression, you can handle more data with the same amount of memory, which is very good in itself. But if you are wondering about the price to pay in terms of performance, you should know that nowadays memory access is the most common bottleneck in many computational scenarios, and that CPUs spend most of their time waiting for data. Hence, keeping data compressed in memory can also reduce the stress on the memory subsystem.

Furthermore, columnar means that tabular datasets are stored in column-wise order, which turns out to offer better opportunities for improving the compression ratio. This is because elements in the same column tend to be more similar to each other than elements in the same row, so compressors generally do a much better job when data is laid out column-wise. In addition, when you deal with tables with a large number of columns and your operations involve only some of them, columnar storage tends to be much more efficient because it minimizes the amount of data that travels to the CPU caches.
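The cache argument in the paragraph above can be put in numbers with a simple, idealized model: 64-byte cache lines, float64 values, and buffers that start on a cache-line boundary (real hardware adds prefetching and alignment effects). The sketch stores the same hypothetical 20-column table row-wise and column-wise and counts the cache lines that a single-column scan touches in each layout.

```python
from array import array

NROWS, NCOLS = 10_000, 20
ITEMSIZE = 8     # bytes per float64
CACHE_LINE = 64  # assumed cache-line size in bytes


def value(r, c):
    return float(r * NCOLS + c)


# Row-major layout: all fields of row 0, then all fields of row 1, ...
row_store = array("d", (value(r, c) for r in range(NROWS) for c in range(NCOLS)))

# Column-major layout: one contiguous array per column.
col_store = [array("d", (value(r, c) for r in range(NROWS))) for c in range(NCOLS)]


def lines_touched(byte_offsets):
    """Distinct cache lines hit, assuming the buffer starts on a line boundary."""
    return len({off // CACHE_LINE for off in byte_offsets})


def scan_column_rowwise(col):
    """Sum one column in the row store; values are NCOLS items apart."""
    offsets = [(r * NCOLS + col) * ITEMSIZE for r in range(NROWS)]
    return sum(row_store[col::NCOLS]), lines_touched(offsets)


def scan_column_columnar(col):
    """Sum one column in the column store; values are contiguous."""
    offsets = [r * ITEMSIZE for r in range(NROWS)]
    return sum(col_store[col]), lines_touched(offsets)
```

In this model the row-wise scan touches one cache line per row (10,000 lines), because consecutive values of a column are 160 bytes apart, while the columnar scan touches 1,250 lines, 8 times fewer; compressing the column shrinks that figure further.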

So, the ultimate goal of bcolz is not only to reduce the memory needs of large arrays/tables, but also to make bcolz operations faster than with traditional data containers like those in NumPy or pandas. That is already the case in some real-life scenarios (see the notebook above), and the advantage will become more noticeable with forthcoming, faster CPUs that integrate more cores and wider vector units.

Requisites

  • Python >= 3.9
  • NumPy >= 1.16.5
  • Cython >= 0.22 (Cython > 3 for Python 3.12; only needed for compiling the beast)
  • C-Blosc >= 1.8.0 (optional, as the internal Blosc will be used by default)

Optional:

  • numexpr >= 2.5.2
  • dask >= 0.9.0
  • pandas
  • tables (pytables)
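Before building, a quick pre-flight check of the versions listed above can save a failed build. This is a hypothetical helper script, not part of bcolz; it uses importlib.metadata from the standard library and a deliberately naive version comparison. pandas and tables are left out because no minimum is listed for them.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements lists (numpy is required,
# numexpr and dask are optional).
MINIMUMS = {"numpy": "1.16.5", "numexpr": "2.5.2", "dask": "0.9.0"}


def as_tuple(ver):
    """Turn '1.16.5' into (1, 16, 5); trailing tags such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def check(minimums=MINIMUMS):
    """Map each package to True/False (meets minimum?) or None (not installed)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = None
            continue
        report[name] = as_tuple(installed) >= as_tuple(minimum)
    return report


if __name__ == "__main__":
    print("Python >= 3.9:", sys.version_info >= (3, 9))
    for name, ok in check().items():
        status = "not installed" if ok is None else ("OK" if ok else "too old")
        print(f"{name}: {status}")
```

The naive comparison treats a pre-release such as 2.0.0rc1 like 2.0.0; for anything beyond a sanity check, packaging.version.Version implements the full PEP 440 ordering.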

Installing as wheel

There are wheels for Linux, macOS, and Windows that you can install with

pip install bcolz-zipline

Then also install NumPy with

and test your installation with

python -c 'import bcolz;bcolz.test()'

Building

There are different ways to compile bcolz, depending on whether you want to link with an already installed Blosc library or not.

Compiling with an installed Blosc library (recommended)

Python and Blosc-powered extensions have a difficult relationship when compiled using GCC, so this is why using an external C-Blosc library is recommended for maximum performance (for details, see https://github.com/Blosc/python-blosc/issues/110).

Go to https://github.com/Blosc/c-blosc/releases, then download and install the C-Blosc library. You can then tell bcolz where the C-Blosc library is in a couple of ways:

Using an environment variable:

$ BLOSC_DIR=/usr/local     (or "set BLOSC_DIR=\blosc" on Win)
$ export BLOSC_DIR         (not needed on Win)
$ python setup.py build_ext --inplace

Using a flag:

$ python setup.py build_ext --inplace --blosc=/usr/local
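If the build cannot find Blosc, it can help to check the directory you pass via BLOSC_DIR or --blosc first. The helper below is a hypothetical sketch, not part of bcolz's setup.py; it assumes the layout a typical C-Blosc CMake install produces (blosc.h under include/ and the library under lib/ or lib64/).

```python
import os
from pathlib import Path


def check_blosc_dir(blosc_dir=None):
    """List problems with a C-Blosc install prefix (empty list = looks usable).

    Falls back to the BLOSC_DIR environment variable when no directory is
    given. Assumes include/blosc.h plus a library file with 'blosc' in its
    name under lib/ or lib64/.
    """
    raw = blosc_dir if blosc_dir is not None else os.environ.get("BLOSC_DIR", "")
    if not raw:
        return ["BLOSC_DIR is not set and no directory was given"]
    prefix = Path(raw)
    problems = []
    header = prefix / "include" / "blosc.h"
    if not header.is_file():
        problems.append(f"missing header: {header}")
    lib_dirs = [prefix / "lib", prefix / "lib64"]
    if not any(d.is_dir() and any(d.glob("*blosc*")) for d in lib_dirs):
        problems.append(f"no Blosc library found in {prefix / 'lib'} or {prefix / 'lib64'}")
    return problems


if __name__ == "__main__":
    # Checks $BLOSC_DIR; prints nothing if the prefix looks usable.
    for issue in check_blosc_dir():
        print("problem:", issue)
```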

Compiling without an installed Blosc library

bcolz also ships with the Blosc sources, so, assuming you have a C++ compiler installed, run:

$ python setup.py build_ext --inplace

That's all. You can now proceed to the Testing section.

Note: The C++ compiler is only required for the Snappy dependency. The other components of Blosc are pure C (including the LZ4 and Zlib libraries).

Testing

After compiling, you can quickly check that the package is sane by running:

$ PYTHONPATH=.   (or "set PYTHONPATH=." on Windows)
$ export PYTHONPATH    (not needed on Windows)
$ python -c "import bcolz; bcolz.test()"  # add `heavy=True` if desired

Installing

Install it as a typical Python package:

$ pip install -U .

Optionally, install the additional dependencies:

$ pip install .[optional]

Documentation

You can find the online manual at:

http://bcolz.blosc.org

but of course, you can always access docstrings from the console (e.g. help(bcolz.ctable)).

Also, you may want to look at the bench/ directory for some examples of use.

Resources

Visit the main bcolz repository at: http://github.com/Blosc/bcolz

Home of Blosc compressor: http://blosc.org

User's mail list: http://groups.google.com/group/bcolz (bcolz@googlegroups.com)

An introductory talk (20 min) about bcolz at EuroPython

  1. Slides here.

License

Please see BCOLZ.txt in LICENSES/ directory.

Share your experience

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data!

Download files

Source Distribution

  • bcolz_zipline-1.2.12.tar.gz (2.9 MB)

Built Distributions

  • bcolz_zipline-1.2.12-cp312-cp312-win_amd64.whl (491.1 kB): CPython 3.12, Windows x86-64
  • bcolz_zipline-1.2.12-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.5 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • bcolz_zipline-1.2.12-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB): CPython 3.12, macOS 11.0+ ARM64
  • bcolz_zipline-1.2.12-cp312-cp312-macosx_10_15_x86_64.whl (886.8 kB): CPython 3.12, macOS 10.15+ x86-64
  • bcolz_zipline-1.2.12-cp311-cp311-win_amd64.whl (499.4 kB): CPython 3.11, Windows x86-64
  • bcolz_zipline-1.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.5 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • bcolz_zipline-1.2.12-cp311-cp311-macosx_11_0_arm64.whl (1.1 MB): CPython 3.11, macOS 11.0+ ARM64
  • bcolz_zipline-1.2.12-cp311-cp311-macosx_10_15_x86_64.whl (894.4 kB): CPython 3.11, macOS 10.15+ x86-64
  • bcolz_zipline-1.2.12-cp310-cp310-win_amd64.whl (498.8 kB): CPython 3.10, Windows x86-64
  • bcolz_zipline-1.2.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • bcolz_zipline-1.2.12-cp310-cp310-macosx_11_0_arm64.whl (1.1 MB): CPython 3.10, macOS 11.0+ ARM64
  • bcolz_zipline-1.2.12-cp310-cp310-macosx_10_15_x86_64.whl (893.2 kB): CPython 3.10, macOS 10.15+ x86-64
  • bcolz_zipline-1.2.12-cp39-cp39-win_amd64.whl (498.6 kB): CPython 3.9, Windows x86-64
  • bcolz_zipline-1.2.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • bcolz_zipline-1.2.12-cp39-cp39-macosx_11_0_arm64.whl (1.1 MB): CPython 3.9, macOS 11.0+ ARM64
  • bcolz_zipline-1.2.12-cp39-cp39-macosx_10_15_x86_64.whl (893.4 kB): CPython 3.9, macOS 10.15+ x86-64

File details

Details for the file bcolz_zipline-1.2.12.tar.gz.

File metadata

  • Download URL: bcolz_zipline-1.2.12.tar.gz
  • Size: 2.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for bcolz_zipline-1.2.12.tar.gz
Algorithm Hash digest
SHA256 196b38c0d810f422374fc84f00139a0c6bc03bd3bdc9def678c09c460d602456
MD5 b65ae747d798f6a5b30be41e4a5c9841
BLAKE2b-256 a1d3e60b96c1aaa7787e8bb1950edc9ec8a5708f7210a6ea57f006e7216ecdb2

See more details on using hashes here.
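To check a downloaded file against the SHA256 digests listed on this page, Python's hashlib is enough. This is a minimal sketch; sha256_of and verify are hypothetical helper names. pip can also enforce digests itself through its hash-checking mode (--hash=sha256:... entries in a requirements file).

```python
import hashlib


def sha256_of(path, bufsize=1 << 20):
    """Hex SHA-256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in blocks so large archives never have to fit in memory.
        for block in iter(lambda: fh.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify("bcolz_zipline-1.2.12.tar.gz", "196b38c0d810f422374fc84f00139a0c6bc03bd3bdc9def678c09c460d602456") should return True for an intact download of the source distribution.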

File details

Details for the file bcolz_zipline-1.2.12-cp312-cp312-win_amd64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 d549ca3ed21f09388b49c1e150731ac1f44b5599cd36bc9a4e308103481b6566
MD5 e2934bf9016c51dcdfbbd383525a713e
BLAKE2b-256 d72e75733272e7e972e435c7f2cfce87af6af5896ae1a896be2d1475c0d91113

File details

Details for the file bcolz_zipline-1.2.12-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2d027b7ee68d1d33f3f22469bdb97f2a56ac1d0209ed27a99369d4c1db7426a5
MD5 4f2660d556c0f01841ae4eccf3dc43ad
BLAKE2b-256 c7fa805219137a4b49d27058a7acfb074c1f8f1edd0ef81231c03873b08a9ae6

File details

Details for the file bcolz_zipline-1.2.12-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8035ba73ae05e69c321924168ee79f2d468b50c327c3b14fbc97f7e4bd3c4f77
MD5 2a581be8368b15811b159d825e9b8fd6
BLAKE2b-256 755bbbca324344243d9473a9a874c109734292dab6b958b3bfcc13b00f6f75fe

File details

Details for the file bcolz_zipline-1.2.12-cp312-cp312-macosx_10_15_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp312-cp312-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 b4d0b509360741ae042b628409f90a50f2ca0e988a18e3c146998246b8a853c0
MD5 16b654873768c45d0ffc20c415678795
BLAKE2b-256 aa29e29a2b9a5e88f20db5293142bf6ca6094576cec12440620f02a33d31961e

File details

Details for the file bcolz_zipline-1.2.12-cp311-cp311-win_amd64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a254d6206b84ddaabc7f8d9a6b8c852d13172ab238b858bd88f9bf7191b40b78
MD5 f4aa82e60a62da485e6abfedc93fb2c5
BLAKE2b-256 ff2026f11f5392e93d6ae7abe5daeb7d38eec1e1e4d47945b11b4eb0548c402d

File details

Details for the file bcolz_zipline-1.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1d8d65440eca3c96b908175c0929072089fe140dbaf5a32cd35b3254814fe22a
MD5 f79e46613a38fb8c038b8897b715d73c
BLAKE2b-256 541e8b0ec42691579f0b06e392b21d2ca167226aaa6a1af9f29b248bd374f276

File details

Details for the file bcolz_zipline-1.2.12-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 75a11c6bd9b368267528f109071d73385ca977d796954298d30a429f8377f24e
MD5 33d1a4e754835ed42a364a59819b968c
BLAKE2b-256 90f4458ec1f470fac52babb98777ff76a0ef7eda4a53a144e8be4772d4fd1999

File details

Details for the file bcolz_zipline-1.2.12-cp311-cp311-macosx_10_15_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp311-cp311-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 f1a7702f99259425832e0862d1065c675b00b08124d581e6d4a3f38b162a4194
MD5 b14050ff1fd944d1b20b0ebe95881ae6
BLAKE2b-256 4d8368043158465d5f1d13a267e6f70ff37a6f6e8eb8f5610c911527919b7465

File details

Details for the file bcolz_zipline-1.2.12-cp310-cp310-win_amd64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 f91b0178c5b0130480019a7001380db40cd92cd5cd3a8a3b64f8a5f353515581
MD5 f0f2c7e5faab260a2e0bbde07d56b4d1
BLAKE2b-256 f3ff68cf0e170723d3c8453161b2573076d17e249479d0a75665f13131f80d36

File details

Details for the file bcolz_zipline-1.2.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2db9410fb3f8777ac41c7a334d5f698fe4f1ebe39dcc807c442dda32a39dd832
MD5 74906055ea049bebf19ecdec0a556995
BLAKE2b-256 cfb65bb0ac5ff51b3dddc1c1fa5ddf8a3d95d43dc90d779ad55204301095925c

File details

Details for the file bcolz_zipline-1.2.12-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 51fc85e380afa43dcec6d0af69c71289130db768655b42b62609552c9054a614
MD5 db59fcd13b1799c0b6d2ba68928ccf5f
BLAKE2b-256 35188f211ea5ff10f558afd693c7b1ff4048a05fa89a7ae171646ea1bb8a7f6c

File details

Details for the file bcolz_zipline-1.2.12-cp310-cp310-macosx_10_15_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp310-cp310-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 ab276c9ff5bd340e6ca754338af10516aebfa8649626771ea3a88ca031f794aa
MD5 f80662d11b2dcbe9026e71b6f841c587
BLAKE2b-256 502462c4f17949262d14837b267b535c849788d4c8c370f566a4b598dbec7e02

File details

Details for the file bcolz_zipline-1.2.12-cp39-cp39-win_amd64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 564c1ae32b89f57f80fd304a44422d57f634e968168b101bf6379e981cfe0934
MD5 dad9496db4bc0ff67e5bd67f9692bcda
BLAKE2b-256 36be1cb659114674509b1cc9daabf5d4659092d98f43f74150644e4aa2bcab15

File details

Details for the file bcolz_zipline-1.2.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1ae054f418e2b17b92efac31a1f92372211b43c3c78a8926ebb597684eb50b6b
MD5 379d8dd6dbe401a761f90dbf61f08c3b
BLAKE2b-256 43461ee3195a1a49741d03012674bb3c134bddffb26923428bda157dd3df48c5

File details

Details for the file bcolz_zipline-1.2.12-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 43ae60cb6867f874432059cb6d1978987624e7cc0f825975e585d02ce223176c
MD5 c915ae7febb033ae0499701327dc9317
BLAKE2b-256 e93c6efad82338e49b6cb16e3ce9f9bafbfb30693880b546e679b3aab42ae9db

File details

Details for the file bcolz_zipline-1.2.12-cp39-cp39-macosx_10_15_x86_64.whl.

File hashes

Hashes for bcolz_zipline-1.2.12-cp39-cp39-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 2c0f4a56fc04d289705dde208ad00ecc9a3a1354bfb2503063087f44e6a23400
MD5 55d46eaac0c4a615e397beb7b544d139
BLAKE2b-256 368d83e4ad5a4416d4fdabc6a9134faeff22c3e17594a839115142b43bf24bae
