Roaring Bitmap
Project description
A roaring bitmap is an efficient compressed datastructure to store a set of integers. A Roaring bitmap stores a set of 32-bit integers in a series of arrays and bitmaps, whichever takes the least space (which is always 2 ** 16 bits or less).
This datastructure is useful for storing a large number of integers, e.g., for an inverted index used in search indexes and databases. In particular, it is possible to quickly compute the intersection of a series of sets, which can be used to implement a query as the conjunction of subqueries.
This implementation is mostly a translation from the Java implementation at https://github.com/lemire/RoaringBitmap
An additional feature of this implementation is that it uses arrays not only when a block contains less than 2 ** 12 elements, but also when it contains more than 2 ** 32 - 2 ** 12 elements; i.e., blocks that are mostly full are stored just as compactly as blocks that are mostly empty. Other blocks are encoded as bitmaps of fixed size. This trick is based on the implementation in Lucene, cf. https://issues.apache.org/jira/browse/LUCENE-5983
Requirements
Python 2.7+/3 http://www.python.org (headers required, e.g. python-dev package)
Cython 0.20+ http://www.cython.org
Installation
$ make
Usage
A RoaringBitmap() can be used as a replacement for a normal (mutable) Python set containing (unsigned) 32-bit integers:
>>> from roaringbitmap import RoaringBitmap >>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15)) RoaringBitmap({5, 6, 7, 8, 9})
Benchmarks
Output of $ python benchmarks.py:
sparse set 100 runs with sets of 200 random elements n s.t. 0 <= n < 40000 set() RoaringBitmap() ratio init 0.00217 0.00941 0.231 and 0.00116 0.000166 6.97 or 0.00189 0.000255 7.42 xor 0.00171 0.000231 7.4 sub 0.00104 0.000166 6.26 eq 0.000513 0.000487 1.05 neq 9.06e-06 3.7e-05 0.245 dense set / high load factor 100 runs with sets of 39800 random elements n s.t. 0 <= n < 40000 set() RoaringBitmap() ratio init 0.294 1.16 0.252 and 0.217 0.000246 883 or 0.427 0.000262 1628 xor 0.391 0.00024 1629 sub 0.16 0.000234 682 eq 0.0569 0.00741 7.67 neq 8.82e-06 4.51e-05 0.196 medium load factor 100 runs with sets of 59392 random elements n s.t. 0 <= n < 118784 set() RoaringBitmap() ratio init 0.481 1.96 0.246 and 0.6 0.000478 1255 or 0.964 0.000478 2015 xor 0.862 0.000487 1769 sub 0.341 0.000485 703 eq 0.116 0.017 6.83 neq 1.22e-05 4.98e-05 0.244
References
Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin (2014), Better bitmap performance with Roaring bitmaps, http://arxiv.org/abs/1402.6407
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file roaringbitmap-0.1.tar.gz
.
File metadata
- Download URL: roaringbitmap-0.1.tar.gz
- Upload date:
- Size: 22.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c6f0686761ca93c5078e0c1e775b1501e91d766c26be0c7860b6f386937ede51 |
|
MD5 | d91f34a59235834c5e26701c57211298 |
|
BLAKE2b-256 | 8390aa8c6918c69bfa0e74c337e20d968eb4a82583a9ecdf3de1af6fa2f7f7b3 |