Skip to main content

A Cython fast number set based on bitfields

Project description

sparsebitfield

This is a fork of https://github.com/stestagg/bitfield which has been adapted to be efficient with sparse bitfields and large numbers. The API is the same but support for Python 2 has been dropped.

WARNING : The serialisation mechanism isn't portable at the moment.

Installation

$ sudo pip3 install sparsebitfield

Usage

>>> import sparsebitfield
>>> field = sparsebitfield.SparseBitfield()
>>> field.add(100)
>>> print(list(field))
[100]
>>> second = sparsebitfield.SparseBitfield([2, 100])
>>> list(field | second)
[2, 100]

>>> second.add(10000)
>>> second.pickle()
b'BZ:x\x9c\xed\xce\xc1\t\x00 \x0c\x04\xb0+8@\xf7\x9f\xd6\x87\x0f7P(\xc9\x04I\x8eZ\xb9:\x00\x93\xd4\xef\x00\x00\x00\x00\x00\x00\x00<\xb3\x01\xda\x86\x00\x17'

>>> import random
>>> large=sparsebitfield.SparseBitfield(random.sample(range(1000000), 500000)) # 500,000 items, randomly distributed
>>> len(large)
500000
>>> len(large.pickle())
125269  # 122KB

>>> large=sparsebitfield.SparseBitfield(range(1000000)) # 1 million items, all sequential
>>> len(large)
1000000
>>> len(large.pickle())
69 # <100 bytes

Sparse bitfields support most of the same operations/usage as regular sets, see the tests for examples.

Design

Sparsebitfield was designed to efficiently handle tracking large sets of items.

The main design goals were:

  • Space-efficient serialisation format
  • Fast membership tests and set differences
  • Space-efficent handling of large sparse bitfields
  • Support for large integers (>2**64)

Internally, sparsebitfield achieves this by using a 1-d bitmap split into pages. These pages are organised as a sorted list.

Within a page, a number is recorded as being present in the set by setting the n-th bit to 1. I.e. the set([1]) is recorded as ...00000010b, while set([1,4]) would be ...00010010b.

If a particular page is empty (no set members in that range) or full, then the bitfield is discarded, and represented by an EMPTY or FULL flag. Pages which haven not been written to don't take up any memory at all. Also empty pages are not included in the pickled data.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparsebitfield-0.2.5.tar.gz (101.0 kB view details)

Uploaded Source

File details

Details for the file sparsebitfield-0.2.5.tar.gz.

File metadata

  • Download URL: sparsebitfield-0.2.5.tar.gz
  • Upload date:
  • Size: 101.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.28.1 setuptools/49.1.3 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.16

File hashes

Hashes for sparsebitfield-0.2.5.tar.gz
Algorithm Hash digest
SHA256 305f91cf146f94320331e6364f1f66c9d0e13fee0c97ecd377f3af5963d28a7f
MD5 df4d1cff6e5b7518ed86fab13e69d385
BLAKE2b-256 1abf8db337391419a2ca0fbeabfd6f7138b6c26dccbd1409816e9de5c93fa48d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page