Skip to main content

Provides a django wrapper for postgresql-hll library by CitusData

Project description

django-pg-hll

Provides a django wrapper for postgresql-hll library by CitusData

Requirements

  • Python Python 3.5+
  • django >= 1.9 (tested 2.0+)
  • pytz
  • six
  • typing
  • psycopg2
  • PostgreSQL 9.4+ (tested 9.5+)

Installation

Install via pip:
pip install django-pg-hll
or via setup.py:
python setup.py install

Usage

Prerequisites

Install postgresql-hll extension

Creating table with hll field

  • Add HllField to your model:
    from django.db import models
    from django_pg_hll import HllField
    
    class MyModel(models.Model):
        hll = HllField()
    
  • Call makemigrations to create a migration
  • Call migrate to apply migration.

Hll values

In order to create and update Hll this library introduces a set of functions (corresponding to postgres-hll hash functions), to hash values:

from django_pg_hll import HllField

# Empty hll
HllEmpty()

# Empty hll with custom configuration parameters
# hll_empty([log2m[, regwidth[, expthresh[, sparseon]]]])
HllEmpty(13, 2, 1, 0)

# Hash from boolean
HllBoolean(True)

# Hash from integer with different ranges
HllSmallInt(1)
HllInteger(65540)
HllBigint(2147483650)

# Hash from bytes sequence
HllByteA(b'test')

# Hash from text
HllText('test')

# Auto detection of type by postgres-hll
HllAny('some data')

To save a value to HllField, you can pass any of these functions as a value:

from django_pg_hll import HllInteger

instance = MyModel.objects.create(hll=HllInteger(123))
instance.hll |= HllInteger(456)
instance.save()

Chaining hll values

Hll values can be chained with each other and functions like django.db.models.F using | operator.
The chaining result will be django_pg_hll.values.HllSet instance, which can be also saved to database.
You can also chain simple values and iterables. In this case, library will try to detect appropriate hashing function, based on value.
Important: Native django functions can't be used as chain start, as | operator is redeclared for HllValue instances.
Example:

from django_pg_hll import HllInteger
from django.db.models import F

instance = MyModel.objects.create(hll=HllInteger(123))

# This works
instance.hll |= HllInteger(456)
instance.hll = HllInteger(456) | F('hll')
instance.hll |= 789  # HllSmallInt will be used
instance.hll |= 100500  # HllInteger will be used
instance.hll |= True  # HllBoolean will be used
instance.hll |= {1, 2, 3, 4, 5}  # set. HllSmallInt will be used.

# This throws exception, as F function doesn't support bitor operator
instance.hll = F('hll') | HllInteger(456)

Hashing seed

You can pass hash_seed optional argument to any HllValue, expecting data.
Look here for more details about hashing.

Filtering QuerySet

HllField realizes several lookups (returning float value) in order to make filtering easier:

# Equality
MyModel.objects.filter(hll=HllInteger(1)).count()
MyModel.objects.exclude(hll=HllInteger(2)).count()

# Cardinality
MyModel.objects.filter(hll__cardinality=3).count()

# Configuration lookups
MyModel.objects.filter(hll__schema_version=1).count()
MyModel.objects.filter(hll__type=1).count()
MyModel.objects.filter(hll__log2m=11).count()
MyModel.objects.filter(hll__regwidth=2).count()
MyModel.objects.filter(hll__sparseon=1).count()

Aggregate functions

In order to count aggregations and annotations, library provides aggregate functions:

  • django_pg_hll.aggregate.Cardinality Counts cardinality of hll field
  • django_pg_hll.aggregate.UnionAgg Aggregates multiple hll fields to one hll.
  • django_pg_hll.aggregate.UnionAggCardinality Counts cardinality of hll, combined by UnionAgg function. In fact, it does Cardinality(UnionAgg(hll)).
    P. s. django doesn't give ability to use function inside function.
  • django_pg_hll.aggregate.CardinalitySum Counts sum of multiple rows hll cardinalities. In fact, it does Sum(Cardinality(hll)).
    P. s. django doesn't give ability to use function inside function.
from django.db import models
from django_pg_hll.aggregate import Cardinality, UnionAggCardinality, CardinalitySum
from django_pg_hll.fields import HllField
from django_pg_hll.values import HllInteger


class ForeignModel(models.Model):
    pass


class MyModel(models.Model):
    hll = HllField()
    fk = models.ForeignKey(ForeignModel)

MyModel.objects.bulk_create([
   MyModel(fk=1, hll=HllInteger(1)),
   MyModel(fk=2, hll=HllInteger(2) | HllInteger(3) | HllInteger(4)),
   MyModel(fk=3, hll=HllInteger(4))
])

MyModel.objects.annotate(card=Cardinality('hll_field')).values_list('id', 'card')
# outputs (1, 1), (2, 3), (3, 1)

# Count cardinality for hll, build by union of all rows
# 4 element exists in rows with fk=2 and fk=3. After union it gives single result 
ForeignModel.objects.annotate(card=UnionAggCardinality('testmodel__hll_field')).values_list('card', flat=True)
# outputs [4]

# Count sum of cardinalities for each row
ForeignModel.objects.annotate(card=CardinalitySum('testmodel__hll_field')).values_list('card', flat=True)
# outputs [5]

Configuration aggregate functions

In order to get hll field creation parameters, library provides aggregate functions:

  • django_pg_hll.aggregate.HllSchemaVersion Returns the schema version value (integer) of the hll

  • django_pg_hll.aggregate.HllType Returns the schema version-specific type value (integer) of the hll. See the storage specification (v1.0.0) for more details.

  • django_pg_hll.aggregate.HllRegWidth Returns the register bit-width (integer) of the hll

  • django_pg_hll.aggregate.HllLog2M Returns the log-base-2 of the number of registers of the hll. If the hll is not of type FULL or SPARSE it returns the log2m value which would be used if the hll were promoted.

  • django_pg_hll.aggregate.HllExpThreshold Returns an array with 2 elements of the specified and effective EXPLICIT promotion cutoffs for the hll. The specified cutoff and the effective cutoff will be the same unless expthresh has been set to 'auto' (-1). In that case the specified value will be -1 and the effective value will be the implementation-dependent number of explicit values that will be stored before an EXPLICIT hll is promoted.

  • django_pg_hll.aggregate.HllSParseOn Returns 1 if the SPARSE representation is enabled for the hll, and 0 otherwise

from django.db import models
from django_pg_hll.aggregate import HllLog2M
from django_pg_hll.fields import HllField
from django_pg_hll.values import HllEmpty, HllInteger


class MyModel(models.Model):
    default_hll = HllField()
    configured_hll = HllField(log2m=13, regwidth=2, expthresh=1, sparseon=0)

MyModel.objects.create(fk=1, hll=HllInteger(1), configured_hll=HllEmpty(13, 2, 1, 0))

MyModel.objects.annotate(log2m=HllLog2M('default_hll'), log2m_conf=HllLog2M('configured_hll')). \
    values_list('log2m', 'log2m_conf')
# outputs (11, 13)

django-pg-bulk-update integration

This library provides a hll_concat set function, allowing to use hll in bulk_update and bulk_update_or_create queries.

MyModel.objects.bulk_update_or_create([
    {'id': 100501, 'hll_field': HllInteger(1)},
    {'id': 100502, 'hll_field': HllInteger(2) | HllInteger(3)}
    ], set_functions={'hll_field': 'hll_concat'}
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for django-pg-hll, version 2.0.0
Filename, size File type Python version Upload date Hashes
Filename, size django_pg_hll-2.0.0-py2.py3-none-any.whl (11.4 kB) File type Wheel Python version py2.py3 Upload date Hashes View
Filename, size django-pg-hll-2.0.0.tar.gz (11.7 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page