Columnar and compressed data containers.

bcolz: columnar and compressed data containers

bcolz provides columnar, chunked data containers that can be compressed either in-memory or on-disk. Column storage allows for efficient table queries, as well as for cheap column addition and removal. It is based on NumPy, and uses it as the standard data container to communicate with bcolz objects, but it also comes with import/export facilities to/from HDF5/PyTables tables and pandas dataframes.

bcolz objects are compressed by default not only for reducing memory/disk storage, but also to improve I/O speed. The compression process is carried out internally by Blosc, a high-performance, multithreaded meta-compressor that is optimized for binary data (although it works with text data just fine too).

bcolz can also use numexpr internally (it does so by default if it detects that numexpr is installed) or dask to accelerate many vector and query operations (although it can also do them with pure NumPy). numexpr and dask can optimize memory usage and run the computations on multiple threads, which makes these operations fast. This, in combination with the disk-based, compressed carray/ctable containers, can be used for performing out-of-core computations efficiently and, most importantly, transparently.
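To make the idea of a chunked, compressed container more concrete, here is a minimal sketch in pure Python. It is not bcolz code and does not reflect bcolz's internals: the class name ToyChunkedArray and its methods are hypothetical, and zlib from the standard library stands in for Blosc. It only illustrates the principle that data is stored as compressed chunks and that a computation can walk over it while decompressing a single chunk at a time.

```python
import zlib
from array import array


class ToyChunkedArray:
    """Toy chunked, compressed 1-D container (illustrative only, not bcolz code)."""

    def __init__(self, typecode="q", chunklen=4096):
        self.typecode = typecode
        self.chunklen = chunklen
        self._chunks = []             # full chunks, each one zlib-compressed
        self._tail = array(typecode)  # trailing partial chunk, kept uncompressed

    def append(self, values):
        """Add values, compressing every chunk as soon as it fills up."""
        self._tail.extend(values)
        while len(self._tail) >= self.chunklen:
            full = self._tail[:self.chunklen]
            self._tail = self._tail[self.chunklen:]
            self._chunks.append(zlib.compress(full.tobytes()))

    def iterchunks(self):
        """Yield one decompressed chunk at a time."""
        for blob in self._chunks:
            chunk = array(self.typecode)
            chunk.frombytes(zlib.decompress(blob))
            yield chunk
        if self._tail:
            yield self._tail

    def __len__(self):
        return len(self._chunks) * self.chunklen + len(self._tail)

    @property
    def nbytes(self):
        """Size the data would take uncompressed."""
        return len(self) * self._tail.itemsize

    @property
    def cbytes(self):
        """Size actually held: compressed chunks plus the uncompressed tail."""
        compressed = sum(len(blob) for blob in self._chunks)
        return compressed + len(self._tail) * self._tail.itemsize


data = ToyChunkedArray(typecode="q", chunklen=4096)
data.append(range(10_000))

# Only one chunk is decompressed at any moment while summing.
total = sum(sum(chunk) for chunk in data.iterchunks())
print(len(data), total, data.nbytes, data.cbytes)
```

Because each chunk is independent, the same loop would work unchanged if the compressed chunks were read from disk instead of held in a list, which is the essence of an out-of-core computation.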

Just to whet your appetite, here is an example with real data, where bcolz is already fulfilling the promise of accelerating memory I/O by using compression.

Rationale

By using compression, you can deal with more data using the same amount of memory, which is very good in itself. But in case you are wondering about the price to pay in terms of performance, you should know that nowadays memory access is the most common bottleneck in many computational scenarios, and that CPUs spend most of their time waiting for data. Hence, having data compressed in memory can reduce the stress on the memory subsystem as well.

Furthermore, columnar means that tabular datasets are stored in column-wise order, and this turns out to offer better opportunities to improve the compression ratio. This is because data tends to expose more similarity among elements that sit in the same column than among those in the same row, so compressors generally do a much better job when data is laid out column-wise. In addition, when you have to deal with tables with a large number of columns and your operations only involve some of them, columnar storage tends to be much more effective because it minimizes the amount of data that travels to the CPU caches.
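The effect of layout on compression ratio can be observed with a small, self-contained experiment. The sketch below is an illustration under stated assumptions rather than a benchmark: it uses zlib from the standard library instead of Blosc, and the table is synthetic, with three columns that each repeat their own short pattern (periods 101, 103 and 107 are arbitrary choices). The same values are serialized row by row and column by column, and both byte strings are compressed.

```python
import random
import struct
import zlib


def make_columns(n_rows, seed=0):
    """Build three columns of 32-bit values, each repeating its own pattern."""
    rng = random.Random(seed)
    columns = []
    for period in (101, 103, 107):
        pattern = [rng.getrandbits(32) for _ in range(period)]
        columns.append([pattern[i % period] for i in range(n_rows)])
    return columns


def pack_row_wise(columns):
    """Serialize as row 0, row 1, ... (values of different columns interleaved)."""
    return b"".join(struct.pack("<3I", *row) for row in zip(*columns))


def pack_column_wise(columns):
    """Serialize as all of column 0, then all of column 1, and so on."""
    return b"".join(struct.pack("<%dI" % len(col), *col) for col in columns)


columns = make_columns(20_000)
row_bytes = pack_row_wise(columns)
col_bytes = pack_column_wise(columns)

row_size = len(zlib.compress(row_bytes))
col_size = len(zlib.compress(col_bytes))

print("raw size:          ", len(row_bytes))
print("row-wise, zlib:    ", row_size)
print("column-wise, zlib: ", col_size)
```

Both serializations contain exactly the same values, so any difference in compressed size comes from the layout alone. In the column-wise form each column's repetition is contiguous and easy for the compressor to find, while interleaving the columns breaks those repetitions apart. Real tables are less regular than this one, so the size of the gain will vary.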

So, the ultimate goal for bcolz is not only to reduce the memory needs of large arrays/tables, but also to make bcolz operations run faster than those on a traditional data container like the ones in NumPy or pandas. That is actually already the case in some real-life scenarios (see the notebook above), but it should become much more noticeable in combination with forthcoming, faster CPUs integrating more cores and wider vector units.

Requisites

  • Python >= 3.9
  • NumPy >= 1.16.5
  • Cython >= 0.22 (Cython >= 3 on Python 3.12; only needed for compiling the beast)
  • C-Blosc >= 1.8.0 (optional, as the internal Blosc will be used by default)

Optional:

  • numexpr >= 2.5.2
  • dask >= 0.9.0
  • pandas
  • tables (pytables)
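To see which of the optional packages above are present in the current environment, a short check with the standard library can help. This is a sketch, not part of bcolz: the helper name available_modules is hypothetical, and it only tests whether each module can be found, not whether its version meets the minimums listed above.

```python
from importlib.util import find_spec

# Import names of the optional dependencies listed above.
OPTIONAL_MODULES = ("numexpr", "dask", "pandas", "tables")


def available_modules(names=OPTIONAL_MODULES):
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_modules().items():
        print(f"{name:10s} {'installed' if found else 'missing'}")
```

find_spec only locates a module without importing it, so the check stays fast and has no side effects.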

Installing as wheel

There are wheels for Linux, macOS and Windows that you can install with

pip install bcolz-zipline

Then also install NumPy with

pip install numpy

and test your installation with

python -c 'import bcolz;bcolz.test()'

Building

There are different ways to compile bcolz, depending on whether you want to link with an already installed Blosc library or not.

Compiling with an installed Blosc library (recommended)

Python and Blosc-powered extensions have a difficult relationship when compiled using GCC, which is why using an external C-Blosc library is recommended for maximum performance (for details, see https://github.com/Blosc/python-blosc/issues/110).

Go to https://github.com/Blosc/c-blosc/releases, then download and install the C-Blosc library. After that, you can tell bcolz where the C-Blosc library is in a couple of ways:

Using an environment variable:

$ BLOSC_DIR=/usr/local     (or "set BLOSC_DIR=\blosc" on Win)
$ export BLOSC_DIR         (not needed on Win)
$ python setup.py build_ext --inplace

Using a flag:

$ python setup.py build_ext --inplace --blosc=/usr/local

Compiling without an installed Blosc library

bcolz also ships with the Blosc sources so, assuming that you have a C++ compiler installed, do:

$ python setup.py build_ext --inplace

That's all. You can now proceed to the Testing section.

Note: The C++ compiler is required only for the Snappy dependency. The other components of Blosc are pure C (including the LZ4 and Zlib libraries).

Testing

After compiling, you can quickly check that the package is sane by running:

$ PYTHONPATH=.   (or "set PYTHONPATH=." on Windows)
$ export PYTHONPATH    (not needed on Windows)
$ python -c "import bcolz; bcolz.test()"  # add `heavy=True` if desired

Installing

Install it as a typical Python package:

$ pip install -U .

Optionally, install the additional dependencies:

$ pip install .[optional]

Documentation

You can find the online manual at:

http://bcolz.blosc.org

but of course, you can always access docstrings from the console (e.g. help(bcolz.ctable)).

Also, you may want to look at the bench/ directory for some examples of use.

Resources

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Home of Blosc compressor: http://blosc.org

User's mail list: http://groups.google.com/group/bcolz (bcolz@googlegroups.com)

An introductory talk (20 min) about bcolz at EuroPython

  1. Slides here.

License

Please see BCOLZ.txt in LICENSES/ directory.

Share your experience

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data!
