A C++ implementation with Python bindings of StreamVByte.

libstreamvbyte


Table of Contents
  1. About The Project
  2. Getting Started
  3. Benchmark
  4. Roadmap
  5. Contributing
  6. License
  7. Reference
  8. Contact

About The Project

libstreamvbyte is a C++ implementation of StreamVByte, with Python bindings using pybind11.

StreamVByte is an integer compression technique that uses SIMD instructions (vectorization) to improve performance. The library is optimized for CPUs with the SSSE3 instruction set, which most x86_64 processors support. It can also be used on ARM processors and on 32-bit architectures, where it falls back to a scalar implementation.
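
To make the idea concrete, here is a minimal scalar sketch of the general StreamVByte layout in pure Python: each 32-bit integer is stored in 1 to 4 data bytes, and a separate stream of control bytes records each length in 2 bits. The names `svb_encode_sketch` and `svb_decode_sketch` are hypothetical, and the exact byte layout produced by libstreamvbyte may differ. The sketch only illustrates the technique and why a decoder of this kind needs to be told how many integers to expect.

```python
def svb_encode_sketch(values):
    """Split uint32 values into (control_bytes, data_bytes).

    Illustrative only: not the library's implementation or wire format.
    """
    control = bytearray((len(values) + 3) // 4)  # 4 length codes per control byte
    data = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v < 2**32:
            raise ValueError("value does not fit in 32 bits")
        nbytes = max(1, (v.bit_length() + 7) // 8)  # 1..4 bytes per value
        control[i // 4] |= (nbytes - 1) << (2 * (i % 4))
        data += v.to_bytes(nbytes, "little")
    return bytes(control), bytes(data)


def svb_decode_sketch(control, data, count):
    """Rebuild `count` integers from the two streams."""
    out = []
    pos = 0
    for i in range(count):
        nbytes = ((control[i // 4] >> (2 * (i % 4))) & 0b11) + 1
        out.append(int.from_bytes(data[pos:pos + nbytes], "little"))
        pos += nbytes
    return out


values = [1, 300, 70000, 2**32 - 1, 0]
control, data = svb_encode_sketch(values)
print(len(control) + len(data), "bytes instead of", 4 * len(values))
print(svb_decode_sketch(control, data, len(values)) == values)
```

Because the last control byte may be only partly used, the element count has to travel alongside the compressed bytes. Keeping the lengths in their own stream is what lets a vectorized decoder process several values per step.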

With libstreamvbyte, you can quickly and efficiently compress integer sequences, reducing the amount of storage space and network bandwidth required. The library is easy to use and integrates seamlessly with Python via pybind11 bindings. Whether you're working with large datasets or building a distributed computing system, libstreamvbyte can help you improve performance and reduce the resources needed to handle your data.

Currently supports Python 3.10+ on Windows, Linux (manylinux_2_17, musllinux_1_1) and macOS (universal2).

Getting Started

Installation

For Python

Install from PyPI using pip.

pip install libstreamvbyte

Or install from a .whl file.

pip install "path/to/your/downloaded/whl"

To find the appropriate .whl file, please visit the releases page.

For C++

You must have CMake installed on your system.

# clone the repo
git clone https://github.com/wst24365888/libstreamvbyte
cd libstreamvbyte

# build and install
cmake .
make
sudo make install

Usage

For Python

Import libstreamvbyte first.

import libstreamvbyte as svb

The module exposes the following functions.

# Encode an array of unsigned integers into a byte array.
encode(arg0: numpy.ndarray[numpy.uint32]) -> numpy.ndarray[numpy.uint8]

# Decode a byte array into an array of unsigned integers; arg1 is the number of integers that were encoded.
decode(arg0: numpy.ndarray[numpy.uint8], arg1: int) -> numpy.ndarray[numpy.uint32]

# Encode an array of signed integers into an array of unsigned integers.
zigzag_encode(arg0: numpy.ndarray[numpy.int32]) -> numpy.ndarray[numpy.uint32]

# Decode an array of unsigned integers into an array of signed integers.
zigzag_decode(arg0: numpy.ndarray[numpy.uint32]) -> numpy.ndarray[numpy.int32]

# Check if the current wheel is a vectorized version.
is_vectorized_version() -> bool
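
Zigzag encoding maps signed integers to unsigned ones so that values of small magnitude, whether positive or negative, become small unsigned values that need fewer bytes. The sketch below shows the standard 32-bit zigzag mapping in pure Python. `zigzag32` and `unzigzag32` are hypothetical helper names, not part of the library, whose own functions operate on NumPy arrays.

```python
MASK32 = 0xFFFFFFFF


def zigzag32(n):
    """Map a signed 32-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    if not -2**31 <= n < 2**31:
        raise ValueError("value does not fit in a signed 32-bit integer")
    # Python's >> on negative ints is an arithmetic shift, so n >> 31 is 0 or -1.
    return ((n << 1) ^ (n >> 31)) & MASK32


def unzigzag32(u):
    """Inverse of zigzag32."""
    if not 0 <= u <= MASK32:
        raise ValueError("value does not fit in an unsigned 32-bit integer")
    return (u >> 1) ^ -(u & 1)


for n in (0, -1, 1, -2, 2):
    print(n, "->", zigzag32(n))
```

Without this step, a small negative number such as -1 has all 32 bits set in two's complement and would take the full 4 bytes when byte-compressed.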

For C++

Include streamvbyte.h first.

#include "streamvbyte.h"

For the APIs, please refer to include/streamvbyte.h.

Example

For Python

import numpy as np
import libstreamvbyte as svb

N = 2**20 + 2

# type(original_data) == np.ndarray
# original_data.dtype == np.int32
original_data = np.random.randint(-2**31, 2**31, N, dtype=np.int32)

# type(compressed_bytes) == np.ndarray
# compressed_bytes.dtype == np.uint8
compressed_bytes = svb.encode(svb.zigzag_encode(original_data))

# type(recovered_data) == np.ndarray
# recovered_data.dtype == np.int32
recovered_data = svb.zigzag_decode(svb.decode(compressed_bytes, N))

For C++

#include <cstdint>
#include <cstdlib>
#include <vector>

#include "streamvbyte.h"

int main() {
    std::size_t N = (1 << 20) + 2;

    std::vector<int32_t> original_data(N);
    for (std::size_t i = 0; i < N; ++i) {
        original_data[i] = rand() - rand();
    }

    std::vector<uint8_t> compressed_bytes = streamvbyte::encode(streamvbyte::zigzag_encode(original_data));
    std::vector<int32_t> recovered_data = streamvbyte::zigzag_decode(streamvbyte::decode(compressed_bytes, N));

    return 0;
}

Compile it and link against libstreamvbyte.

g++ -o example example.cpp -lstreamvbyte

Benchmark

OS: Linux 5.15.79.1-microsoft-standard-WSL2 x86_64
CPU: AMD Ryzen 5 3600 6-Core Processor (12) @ 3.600GHz

Run on (12 X 3593.26 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 512 KiB (x6)
  L3 Unified 16384 KiB (x1)
Load Average: 0.62, 0.90, 0.90
-------------------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations Throughput
-------------------------------------------------------------------------------------------------------------
BM_streamvbyte_encode/1000000/min_time:10.000               427609 ns       427609 ns        35964 9.35434G/s
BM_streamvbyte_encode/10000000/min_time:10.000             4387786 ns      4387691 ns         3262 9.11641G/s
BM_streamvbyte_encode/100000000/min_time:10.000           45285441 ns     45285378 ns          278 8.83287G/s
BM_streamvbyte_encode/1000000000/min_time:10.000         482895663 ns    482894996 ns           27 8.28337G/s
BM_streamvbyte_decode/1000000/min_time:10.000               176918 ns       176807 ns        81674 22.6235G/s
BM_streamvbyte_decode/10000000/min_time:10.000             3460414 ns      3460293 ns         4059 11.5597G/s
BM_streamvbyte_decode/100000000/min_time:10.000           35830694 ns     35830178 ns          399 11.1638G/s
BM_streamvbyte_decode/1000000000/min_time:10.000         395000967 ns    394998152 ns           29 10.1266G/s
BM_streamvbyte_zigzag_encode/1000000/min_time:10.000        198481 ns       198481 ns        71648 20.1531G/s
BM_streamvbyte_zigzag_encode/10000000/min_time:10.000      3905349 ns      3905318 ns         3699 10.2424G/s
BM_streamvbyte_zigzag_encode/100000000/min_time:10.000    38865616 ns     38865483 ns          367 10.2919G/s
BM_streamvbyte_zigzag_encode/1000000000/min_time:10.000  431700632 ns    431698141 ns           29 9.26573G/s
BM_streamvbyte_zigzag_decode/1000000/min_time:10.000        201529 ns       201529 ns        71350 19.8483G/s
BM_streamvbyte_zigzag_decode/10000000/min_time:10.000      3740073 ns      3739945 ns         3328 10.6953G/s
BM_streamvbyte_zigzag_decode/100000000/min_time:10.000    41444965 ns     41444779 ns          332  9.6514G/s
BM_streamvbyte_zigzag_decode/1000000000/min_time:10.000  416964668 ns    416963581 ns           32 9.59316G/s
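
The Throughput column appears to be the number of input bytes processed per second of CPU time, assuming 4 bytes per 32-bit integer and G meaning 10^9. This is an inference from the numbers, not something the benchmark output states, but it reproduces the reported figures for the rows checked here. A small sketch of the arithmetic, with `throughput_bytes_per_sec` as a hypothetical helper:

```python
def throughput_bytes_per_sec(num_integers, cpu_time_ns, bytes_per_integer=4):
    """Bytes of 32-bit integer data handled per second of CPU time."""
    return num_integers * bytes_per_integer / (cpu_time_ns * 1e-9)


# CPU times taken from the 1,000,000-integer rows of the table above.
encode_1m = throughput_bytes_per_sec(1_000_000, 427_609)  # table reports 9.35434G/s
decode_1m = throughput_bytes_per_sec(1_000_000, 176_807)  # table reports 22.6235G/s

print(f"encode: {encode_1m / 1e9:.4f} G/s")
print(f"decode: {decode_1m / 1e9:.4f} G/s")
```

Read this way, the 1,000,000-integer rows (about 4 MB of input) run noticeably faster for decode and zigzag than the larger inputs, which is consistent with the smaller working set fitting in the 16 MiB L3 cache.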

Build Benchmarks from Source

cmake . \
    -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=OFF \
    -DBUILD_PYBIND11=OFF \
    -DPRINT_BENCHMARK=OFF \
    -DBUILD_TESTS=ON \
    -DBUILD_BENCHMARKS=ON
make libstreamvbyte_benchmarks
./libstreamvbyte_benchmarks --benchmark_counters_tabular=true

Roadmap

  • Zigzag encoding/decoding.
  • Support ARM processors with NEON intrinsics.
  • Differential coding (delta encoding/decoding).

See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feat/amazing-feature)
  3. Commit your Changes with Conventional Commits
  4. Push to the Branch (git push origin feat/amazing-feature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Reference

Contact

Author

Project Link

