Fast and lightweight set for unsigned 32 bits integers.
Project description
An efficient and light-weight ordered set of 32 bits integers. This is a Python wrapper for the C library CRoaring.
The wrapping used to be done with Ctypes. We recently switched to Cython for the following reasons:
Much better performances for short function calls (e.g. addition/deletion of a single element).
Easier installation, no need to install manually the C library, it is now distributed with PyRoaring.
Extensibility, it will be easier to write efficient codes with Cython (e.g. some data structures based on roaring bitmaps).
If for some reason you wish to keep using the old version, based on Ctypes, use PyRoaring 0.0.7.
Requirements
Environment like Linux and MacOS
Python 2.7, or Python 3.3 or better
A recent C compiler like GCC
The Python package hypothesis (optional, for testing)
The Python package Cython (optional, for compiling pyroaring from the sources)
Installation
To install pyroaring and the CRoaring library on your local account, use the following two lines:
pip install pyroaring --user # installs PyRoaringBitMap
To install them system-wide, use the following lines :
pip install pyroaring
Naturally, the latter may require superuser rights (consider prefixing the commands by sudo).
(If you want to use Python 3 and your system defaults on Python 2.7, you may need to adjust the above commands, e.g., replace pip by pip3 and python by python3.)
Manual compilation / installation
If you want to compile (and install) pyroaring by yourself, for instance to modify the Cython sources or because you do not have pip, follow these steps. Note that the Python package Cython is required.
Clone this repository.
git clone https://github.com/Ezibenroc/PyRoaringBitMap.git
cd PyRoaringBitMap
Build pyroaring locally, e.g. to test a new feature you made.
python setup.py build_ext -i
python test.py # run the tests, optionnal but recommended
Install pyroaring (use this if you do not have pip).
python setup.py install # may require superuser rights, add option --user if you wish to install it on your local account
Package pyroaring.
python setup.py sdist
pip install dist/pyroaring-0.1.?.tar.gz # optionnal, to install the package
Utilization
First, you can run the tests to make sure everything is ok:
pip install hypothesis --user
python test.py
You can use a bitmap nearly as the classical Python set in your code:
from pyroaring import BitMap
bm1 = BitMap()
bm1.add(3)
bm1.add(18)
bm2 = BitMap([3, 27, 42])
print("bm1 = %s" % bm1)
print("bm2 = %s" % bm2)
print("bm1 & bm2 = %s" % (bm1&bm2))
print("bm1 | bm2 = %s" % (bm1|bm2))
Output:
bm1 = BitMap([3, 18]) bm2 = BitMap([3, 27, 42]) bm1 & bm2 = BitMap([3]) bm1 | bm2 = BitMap([3, 18, 27, 42])
Benchmark
The built-in set is compared with pyroaring, a Cython implementation of Roaring bitmaps called roaringbitmap, and a Python implemenntation of ordered sets called sortedcontainers.
Quick benchmarks for common operations
The script quick_bench.py measures the time of different set operations. It uses randomly generated sets of size 1e6 and density 0.125. For each operation, the average time (in seconds) of 30 tests is reported.
The results have been obtained with:
CPython version 3.5.2
pyroaring commit 2ccc0bf6e428d771061499440bcde0ca0fa2a946
roaringbitmap commit 95ca18f902389464d15732d35de7be868d660ed2
sortedcontainers version 1.5.7
operation |
pyroaring |
roaringbitmap |
set |
sortedcontainers |
---|---|---|---|---|
range constructor |
1.39e-04 |
1.35e-04 |
5.41e-02 |
1.85e-01 |
ordered list constructor |
3.27e-02 |
1.61e-01 |
1.64e-01 |
5.09e-01 |
list constructor |
1.23e-01 |
1.55e-01 |
9.76e-02 |
4.64e-01 |
ordered array constructor |
3.69e-03 |
2.81e-01 |
8.10e-02 |
2.83e-01 |
array constructor |
1.17e-01 |
4.95e-01 |
1.47e-01 |
4.97e-01 |
element addition |
1.79e-07 |
1.77e-07 |
1.39e-07 |
9.39e-07 |
element removal |
1.81e-07 |
1.73e-07 |
1.27e-07 |
3.09e-07 |
membership test |
8.99e-08 |
1.15e-07 |
9.50e-08 |
4.23e-07 |
union |
1.80e-04 |
1.61e-04 |
1.47e-01 |
9.15e-01 |
intersection |
8.99e-04 |
9.12e-04 |
4.61e-02 |
1.45e-01 |
difference |
1.92e-04 |
1.60e-04 |
1.26e-01 |
4.24e-01 |
symmetric diference |
1.81e-04 |
1.60e-04 |
1.81e-01 |
7.71e-01 |
equality test |
7.99e-05 |
6.40e-05 |
1.93e-02 |
1.86e-02 |
subset test |
8.22e-05 |
8.26e-05 |
1.84e-02 |
1.83e-02 |
conversion to list |
4.39e-02 |
4.19e-02 |
5.40e-02 |
5.05e-02 |
pickle dump & load |
6.51e-04 |
6.23e-04 |
2.42e-01 |
4.47e-01 |
“naive” conversion to array |
4.83e-02 |
4.65e-02 |
1.06e-01 |
1.06e-01 |
“optimized” conversion to array |
1.37e-03 |
nan |
nan |
nan |
selection |
9.29e-07 |
5.77e-07 |
nan |
1.34e-05 |
slice |
5.34e-02 |
1.42e-01 |
nan |
7.63e-01 |
Complete benchmark for the union
The performances of the union operation have been measured more carefully. Full results can be found here.
To sum up, both Roaring bitmap implementations are several orders of magnitude faster than the built-in set, regardless of the density of the data.
For sparse data, pyroaring is faster than cyroaring, for very dense data cyroaring is faster. Otherwise, they are similar.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.