A C++ implementation of StreamVByte, with Python bindings.
About The Project
libstreamvbyte is a C++ implementation of StreamVByte, with Python bindings using pybind11.
StreamVByte is an integer compression technique that uses SIMD instructions (vectorization) to improve performance. The library is optimized for CPUs with the SSSE3 instruction set, which most x86_64 processors support. It can also be used on ARM processors and 32-bit architectures, although it falls back to a scalar implementation in those cases.
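To make the format concrete, here is a minimal pure-Python, scalar sketch of the Stream VByte layout described by Lemire, Kurz and Rupp (2018). Each group of four integers gets one control byte holding four 2-bit codes (byte length minus one), and each integer's significant bytes go, little-endian, into a separate data stream stored after all control bytes. The names svb_encode and svb_decode are hypothetical and not part of libstreamvbyte. The paper's SIMD decoder uses each control byte to look up a shuffle mask instead of looping per integer. We have not verified that libstreamvbyte's byte output matches this layout byte-for-byte.

```python
def svb_encode(values):
    """Scalar Stream VByte encoding sketch (layout assumed from Lemire et al., 2018).

    Output = all control bytes first, then all data bytes.
    Each control byte covers 4 integers, 2 bits each, storing (byte length - 1);
    the first integer of the group uses the lowest 2 bits.
    """
    n = len(values)
    control = bytearray((n + 3) // 4)
    data = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v <= 0xFFFFFFFF:
            raise ValueError("values must fit in uint32")
        # Bytes needed for v (1..4); zero still occupies one byte.
        length = max(1, (v.bit_length() + 7) // 8)
        control[i // 4] |= (length - 1) << (2 * (i % 4))
        data += v.to_bytes(length, "little")
    return bytes(control) + bytes(data)


def svb_decode(buf, n):
    """Inverse of svb_encode; the count n is not stored in the stream."""
    pos = (n + 3) // 4  # data bytes start right after the control bytes
    out = []
    for i in range(n):
        code = (buf[i // 4] >> (2 * (i % 4))) & 0b11
        length = code + 1
        out.append(int.from_bytes(buf[pos:pos + length], "little"))
        pos += length
    return out
```

For example, svb_encode([1, 256, 65536, 2**32 - 1]) produces 11 bytes: the control byte 0xE4 (codes 0, 1, 2, 3) followed by 1 + 2 + 3 + 4 data bytes. Because the count is not part of this layout, it has to be passed to the decoder separately, much as the Python decode function takes N.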
With libstreamvbyte, you can quickly and efficiently compress integer sequences, reducing the storage space and network bandwidth they require. The library is easy to use and is exposed to Python through pybind11 bindings. Whether you're working with large datasets or building a distributed computing system, libstreamvbyte can help you improve performance and reduce the resources needed to handle your data.
Currently supports Python 3.10+ on Windows, Linux (manylinux_2_17, musllinux_1_1) and macOS (universal2).
Getting Started
Installation
For Python
Install from PyPI using pip.
pip install libstreamvbyte
Or install from a .whl file.
pip install "path/to/your/downloaded/whl"
To find the appropriate .whl file, visit the releases page.
For C++
You must have CMake installed on your system.
# clone the repo
git clone https://github.com/wst24365888/libstreamvbyte
cd libstreamvbyte
# build and install
cmake .
make
sudo make install
Usage
For Python
Import libstreamvbyte first.
import libstreamvbyte as svb
The following APIs are available.
# Encode an array of unsigned integers into a byte array.
encode(arg0: numpy.ndarray[numpy.uint32]) -> numpy.ndarray[numpy.uint8]
# Decode a byte array into an array of arg1 unsigned integers (arg1 is the number of integers that were encoded).
decode(arg0: numpy.ndarray[numpy.uint8], arg1: int) -> numpy.ndarray[numpy.uint32]
# Encode an array of signed integers into an array of unsigned integers.
zigzag_encode(arg0: numpy.ndarray[numpy.int32]) -> numpy.ndarray[numpy.uint32]
# Decode an array of unsigned integers into an array of signed integers.
zigzag_decode(arg0: numpy.ndarray[numpy.uint32]) -> numpy.ndarray[numpy.int32]
# Check if the current wheel is a vectorized version.
is_vectorized_version() -> bool
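The zigzag functions matter because StreamVByte stores each uint32 in 1 to 4 bytes depending on its magnitude, and a small negative int32 such as -1, reinterpreted as uint32, is 0xFFFFFFFF and needs all 4 bytes. Zigzag encoding interleaves negative and positive values (0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4), so small magnitudes stay small. Below is a pure-Python sketch of the standard 32-bit zigzag mapping. The names zigzag_encode32 and zigzag_decode32 are hypothetical, and we assume, without having checked the library source, that libstreamvbyte's zigzag_encode and zigzag_decode use this same mapping.

```python
MASK32 = 0xFFFFFFFF


def zigzag_encode32(n):
    """Map a signed 32-bit integer to uint32: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    if not -2**31 <= n < 2**31:
        raise ValueError("n must fit in int32")
    # (n << 1) ^ (n >> 31); Python's >> is arithmetic, so n >> 31 is 0 or -1 here.
    return ((n << 1) ^ (n >> 31)) & MASK32


def zigzag_decode32(u):
    """Inverse mapping from uint32 back to int32."""
    if not 0 <= u <= MASK32:
        raise ValueError("u must fit in uint32")
    # Even codes are non-negative (u / 2); odd codes are negative (-(u + 1) / 2).
    return (u >> 1) ^ -(u & 1)
```

The extremes map to the top of the unsigned range: 2**31 - 1 becomes 2**32 - 2 and -2**31 becomes 2**32 - 1, so the mapping is a bijection between int32 and uint32.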
For C++
Include streamvbyte.h first.
#include "streamvbyte.h"
For the APIs, please refer to include/streamvbyte.h.
Example
For Python
import numpy as np
import libstreamvbyte as svb
N = 2**20 + 2
# type(original_data) == np.ndarray
# original_data.dtype == np.int32
original_data = np.random.randint(-2**31, 2**31, N, dtype=np.int32)
# type(compressed_bytes) == np.ndarray
# compressed_bytes.dtype == np.uint8
compressed_bytes = svb.encode(svb.zigzag_encode(original_data))
# type(recovered_data) == np.ndarray
# recovered_data.dtype == np.int32
recovered_data = svb.zigzag_decode(svb.decode(compressed_bytes, N))
# The round trip should reproduce the input exactly.
assert np.array_equal(recovered_data, original_data)
For C++
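#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>
#include "streamvbyte.h"
int main() {
    std::size_t N = (1 << 20) + 2;
    // Random signed 32-bit values, both positive and negative.
    std::vector<int32_t> original_data(N);
    for (std::size_t i = 0; i < N; ++i) {
        original_data[i] = std::rand() - std::rand();
    }
    // Zigzag-encode (int32 -> uint32), then compress into bytes.
    std::vector<uint8_t> compressed_bytes = streamvbyte::encode(streamvbyte::zigzag_encode(original_data));
    // Decompress N values, then undo the zigzag mapping (uint32 -> int32).
    std::vector<int32_t> recovered_data = streamvbyte::zigzag_decode(streamvbyte::decode(compressed_bytes, N));
    // Exit with 0 only if the round trip reproduced the input.
    return recovered_data == original_data ? 0 : 1;
}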
Compile it and link against libstreamvbyte.
g++ -o example example.cpp -lstreamvbyte
Benchmark
OS: Linux 5.15.79.1-microsoft-standard-WSL2 x86_64
CPU: AMD Ryzen 5 3600 6-Core Processor (12) @ 3.600GHz
Run on (12 X 3593.26 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x6)
L1 Instruction 32 KiB (x6)
L2 Unified 512 KiB (x6)
L3 Unified 16384 KiB (x1)
Load Average: 0.62, 0.90, 0.90
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations Throughput
-------------------------------------------------------------------------------------------------------------
BM_streamvbyte_encode/1000000/min_time:10.000 427609 ns 427609 ns 35964 9.35434G/s
BM_streamvbyte_encode/10000000/min_time:10.000 4387786 ns 4387691 ns 3262 9.11641G/s
BM_streamvbyte_encode/100000000/min_time:10.000 45285441 ns 45285378 ns 278 8.83287G/s
BM_streamvbyte_encode/1000000000/min_time:10.000 482895663 ns 482894996 ns 27 8.28337G/s
BM_streamvbyte_decode/1000000/min_time:10.000 176918 ns 176807 ns 81674 22.6235G/s
BM_streamvbyte_decode/10000000/min_time:10.000 3460414 ns 3460293 ns 4059 11.5597G/s
BM_streamvbyte_decode/100000000/min_time:10.000 35830694 ns 35830178 ns 399 11.1638G/s
BM_streamvbyte_decode/1000000000/min_time:10.000 395000967 ns 394998152 ns 29 10.1266G/s
BM_streamvbyte_zigzag_encode/1000000/min_time:10.000 198481 ns 198481 ns 71648 20.1531G/s
BM_streamvbyte_zigzag_encode/10000000/min_time:10.000 3905349 ns 3905318 ns 3699 10.2424G/s
BM_streamvbyte_zigzag_encode/100000000/min_time:10.000 38865616 ns 38865483 ns 367 10.2919G/s
BM_streamvbyte_zigzag_encode/1000000000/min_time:10.000 431700632 ns 431698141 ns 29 9.26573G/s
BM_streamvbyte_zigzag_decode/1000000/min_time:10.000 201529 ns 201529 ns 71350 19.8483G/s
BM_streamvbyte_zigzag_decode/10000000/min_time:10.000 3740073 ns 3739945 ns 3328 10.6953G/s
BM_streamvbyte_zigzag_decode/100000000/min_time:10.000 41444965 ns 41444779 ns 332 9.6514G/s
BM_streamvbyte_zigzag_decode/1000000000/min_time:10.000 416964668 ns 416963581 ns 32 9.59316G/s
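The Throughput column carries no explicit unit. Its values are consistent with 4 bytes (one 32-bit integer) per element divided by the CPU time, i.e. throughput = 4 * N / CPU time, so "G/s" here appears to mean roughly 10^9 bytes of 32-bit integers processed per second. This is inferred from the table rather than documented by the benchmark code. The short check below recomputes three rows under that assumption; throughput_g_per_s is a hypothetical helper, not part of the project.

```python
# Assumption: the counter counts 4 bytes per uint32 element processed.
BYTES_PER_INT = 4


def throughput_g_per_s(n, cpu_ns):
    # Bytes per nanosecond equals units of 1e9 bytes per second.
    return n * BYTES_PER_INT / cpu_ns


# (element count N, CPU time in ns, reported Throughput) copied from the table above
rows = {
    "encode/1000000": (1_000_000, 427_609, 9.35434),
    "decode/1000000": (1_000_000, 176_807, 22.6235),
    "encode/1000000000": (1_000_000_000, 482_894_996, 8.28337),
}

for name, (n, cpu_ns, reported) in rows.items():
    computed = throughput_g_per_s(n, cpu_ns)
    print(f"{name}: computed {computed:.5f} G/s, reported {reported} G/s")
```

The computed and reported values agree to within rounding, which supports reading the column as bytes per second rather than integers per second.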
Build Benchmarks from Source
cmake . \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_PYBIND11=OFF \
-DPRINT_BENCHMARK=OFF \
-DBUILD_TESTS=ON \
-DBUILD_BENCHMARKS=ON
make libstreamvbyte_benchmarks
./libstreamvbyte_benchmarks --benchmark_counters_tabular=true
Roadmap
- Zigzag encoding/decoding.
- Support ARM processors with NEON intrinsics.
- Differential coding (delta encoding/decoding).
See the open issues for a full list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feat/amazing-feature)
- Commit your Changes with Conventional Commits
- Push to the Branch (git push origin feat/amazing-feature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Reference
- Daniel Lemire, Nathan Kurz, Christoph Rupp, Stream VByte: Faster Byte-Oriented Integer Compression, Information Processing Letters 130, 2018.
Contact
Author
- HSING-HAN, WU (Xyphuz)
- Mail me: xyphuzwu@gmail.com
- About me: https://www.xyphuz.com
- GitHub: https://github.com/wst24365888
Project Link: https://github.com/wst24365888/libstreamvbyte