Skip to main content

Roaring Bitmap

Project description

A roaring bitmap is an efficient compressed datastructure to store a set of integers. A Roaring bitmap stores a set of 32-bit integers in a series of arrays and bitmaps, whichever takes the least space (which is always 2 ** 16 bits or less).

This datastructure is useful for storing a large number of integers, e.g., for an inverted index used in search indexes and databases. In particular, it is possible to quickly compute the intersection of a series of sets, which can be used to implement a query as the conjunction of subqueries.

This implementation is mostly a translation from the Java implementation at https://github.com/lemire/RoaringBitmap

An additional feature of this implementation is that it uses arrays not only when a block contains less than 2 ** 12 elements, but also when it contains more than 2 ** 32 - 2 ** 12 elements; i.e., blocks that are mostly full are stored just as compactly as blocks that are mostly empty. Other blocks are encoded as bitmaps of fixed size. This trick is based on the implementation in Lucene, cf. https://issues.apache.org/jira/browse/LUCENE-5983

Requirements

Installation

$ make

Usage

A RoaringBitmap() can be used as a replacement for a normal (mutable) Python set containing (unsigned) 32-bit integers:

>>> from roaringbitmap import RoaringBitmap
>>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15))
RoaringBitmap({5, 6, 7, 8, 9})

Benchmarks

Output of $ python benchmarks.py:

sparse set
100 runs with sets of 200 random elements n s.t. 0 <= n < 40000
            set()  RoaringBitmap()    ratio
init      0.00217          0.00941    0.231
and       0.00116         0.000166     6.97
or        0.00189         0.000255     7.42
xor       0.00171         0.000231      7.4
sub       0.00104         0.000166     6.26
eq       0.000513         0.000487     1.05
neq      9.06e-06          3.7e-05    0.245

dense set / high load factor
100 runs with sets of 39800 random elements n s.t. 0 <= n < 40000
            set()  RoaringBitmap()    ratio
init        0.294             1.16    0.252
and         0.217         0.000246      883
or          0.427         0.000262     1628
xor         0.391          0.00024     1629
sub          0.16         0.000234      682
eq         0.0569          0.00741     7.67
neq      8.82e-06         4.51e-05    0.196

medium load factor
100 runs with sets of 59392 random elements n s.t. 0 <= n < 118784
            set()  RoaringBitmap()    ratio
init        0.481             1.96    0.246
and           0.6         0.000478     1255
or          0.964         0.000478     2015
xor         0.862         0.000487     1769
sub         0.341         0.000485      703
eq          0.116            0.017     6.83
neq      1.22e-05         4.98e-05    0.244

References

Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin (2014), Better bitmap performance with Roaring bitmaps, http://arxiv.org/abs/1402.6407

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

roaringbitmap-0.3.0.tar.gz (117.7 kB view details)

Uploaded Source

File details

Details for the file roaringbitmap-0.3.0.tar.gz.

File metadata

File hashes

Hashes for roaringbitmap-0.3.0.tar.gz
Algorithm Hash digest
SHA256 aac342239dacc0fa5661e63a68ecd3619cb3bf6f47257f68a975baa1555009c0
MD5 060edc8b8996284adb0a5cc2ca3c3342
BLAKE2b-256 5782e3cfc44fd7f9ad0f448db289e199bcbeb29a05cb9933ae3489441c9874b1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page