Provides a Django wrapper for the postgresql-hll library by CitusData
django-pg-hll
Requirements
- Python 2.7 or Python 3.4+
- django >= 1.9
- pytz
- six
- typing
- psycopg2
- PostgreSQL 9.4+
Installation
Install via pip:
pip install django-pg-hll
or via setup.py:
python setup.py install
Usage
Prerequisites
Install postgresql-hll extension
Creating table with hll field
- Add HllField to your model:
from django.db import models
from django_pg_hll import HllField


class MyModel(models.Model):
    hll = HllField()
- Call makemigrations to create a migration
- Call migrate to apply migration.
Hll values
To create and update hll values, this library provides a set of functions (corresponding to the postgresql-hll hash functions) that hash values:
from django_pg_hll import HllEmpty, HllBoolean, HllSmallInt, HllInteger, HllBigint, HllByteA, HllText, HllAny
# Empty hll
HllEmpty()
# Hash from boolean
HllBoolean(True)
# Hash from integer with different ranges
HllSmallInt(1)
HllInteger(65540)
HllBigint(2147483650)
# Hash from bytes sequence
HllByteA(b'test')
# Hash from text
HllText('test')
# Auto detection of type by postgres-hll
HllAny('some data')
To save a value to HllField, you can pass any of these functions as a value:
from django_pg_hll import HllInteger
instance = MyModel.objects.create(hll=HllInteger(123))
instance.hll |= HllInteger(456)
instance.save()
Chaining hll values
Hll values can be chained with each other and with functions like django.db.models.F using the | operator.
The chaining result is a django_pg_hll.values.HllSet instance, which can also be saved to the database.
You can also chain simple values and iterables.
In this case, the library will try to detect the appropriate hashing function based on the value.
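The detection can be pictured with a small standalone sketch. It is not the library's code: the function name guess_hll_function is hypothetical, and the rules (PostgreSQL integer ranges, bool checked before int) are assumptions chosen to match the hash functions listed above.

```python
# Hypothetical sketch: the returned names mirror the hash functions listed
# above, but the selection rules are assumptions, not the library's code.
SMALLINT_RANGE = (-2**15, 2**15 - 1)   # PostgreSQL smallint
INTEGER_RANGE = (-2**31, 2**31 - 1)    # PostgreSQL integer
BIGINT_RANGE = (-2**63, 2**63 - 1)     # PostgreSQL bigint


def guess_hll_function(value):
    """Return the name of the hash function that fits `value`."""
    # bool must be tested before int, because bool is a subclass of int
    if isinstance(value, bool):
        return 'HllBoolean'
    if isinstance(value, int):
        if SMALLINT_RANGE[0] <= value <= SMALLINT_RANGE[1]:
            return 'HllSmallInt'
        if INTEGER_RANGE[0] <= value <= INTEGER_RANGE[1]:
            return 'HllInteger'
        if BIGINT_RANGE[0] <= value <= BIGINT_RANGE[1]:
            return 'HllBigint'
        raise ValueError('integer does not fit into 64 bits')
    if isinstance(value, bytes):
        return 'HllByteA'
    if isinstance(value, str):
        return 'HllText'
    # Anything else is left to type detection on the database side
    return 'HllAny'


print(guess_hll_function(789))      # HllSmallInt
print(guess_hll_function(100500))   # HllInteger
print(guess_hll_function(True))     # HllBoolean
```

For an iterable, the same check would be applied to each element.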
Important: Native Django functions can't be used as the start of a chain, because the | operator is redefined only for HllValue instances.
Example:
from django_pg_hll import HllInteger
from django.db.models import F
instance = MyModel.objects.create(hll=HllInteger(123))
# This works
instance.hll |= HllInteger(456)
instance.hll = HllInteger(456) | F('hll')
instance.hll |= 789 # HllSmallInt will be used
instance.hll |= 100500 # HllInteger will be used
instance.hll |= True # HllBoolean will be used
instance.hll |= {1, 2, 3, 4, 5} # set. HllSmallInt will be used.
# This raises an exception, because F does not support the | operator
instance.hll = F('hll') | HllInteger(456)
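The failure in the last line follows from how Python dispatches the | operator: the left operand's __or__ is called first, and the right operand's __ror__ is consulted only if the left one returns NotImplemented. Django's F raises NotImplementedError instead, so the hll value never gets a chance to handle the operation. The sketch below reproduces this with two hypothetical stand-in classes, FakeF and FakeHllValue, so that it runs without Django or a database.

```python
class FakeF:
    """Stand-in for django.db.models.F: its | operator refuses to combine."""

    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        raise NotImplementedError(
            'Use .bitand() and .bitor() for bitwise logical operations.')


class FakeHllValue:
    """Stand-in for an hll value: collects chained parts in a list."""

    def __init__(self, *parts):
        self.parts = list(parts)

    def __or__(self, other):
        # Called when the hll value is the left operand
        return FakeHllValue(*self.parts, other)

    def __ror__(self, other):
        # Called only if the left operand's __or__ returns NotImplemented
        return FakeHllValue(other, *self.parts)


# Works: the hll value is on the left, so its __or__ handles the chain
chained = FakeHllValue(456) | FakeF('hll')

# Fails: FakeF.__or__ raises before FakeHllValue.__ror__ is consulted
failed = False
try:
    FakeF('hll') | FakeHllValue(456)
except NotImplementedError:
    failed = True

print(len(chained.parts), failed)
```

Plain Python values such as 789 behave differently: int.__or__ returns NotImplemented for an unknown right operand, which is what would let a __ror__ method take over.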
Hashing seed
You can pass the optional hash_seed argument to any HllValue that expects data.
See the postgresql-hll documentation for more details about hashing.
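The effect of a seed can be shown without a database. The sketch below is only an illustration: postgresql-hll hashes with MurmurHash3, while this example uses blake2b from the Python standard library as a stand-in, and the helper name seeded_hash is hypothetical. The point it makes is that the same value hashed with two different seeds produces two unrelated hashes, so it is safest to use one seed consistently for all values that go into the same hll.

```python
import hashlib


def seeded_hash(data, seed=0):
    """Return a 64-bit hash of `data` that depends on a non-negative `seed`."""
    salt = seed.to_bytes(8, 'little')
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, 'little')


same_a = seeded_hash(b'test', seed=0)
same_b = seeded_hash(b'test', seed=0)
other = seeded_hash(b'test', seed=1)

print(same_a == same_b)   # same seed: identical hash
print(same_a == other)    # different seed: a different hash
```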
Filtering QuerySet
HllField implements a cardinality lookup (returning an integer value) to make filtering easier:
MyModel.objects.filter(hll__cardinality=3).count()
Aggregate functions
To compute aggregations and annotations, the library provides 4 aggregate functions:
- django_pg_hll.aggregate.Cardinality
  Counts the cardinality of an hll field.
- django_pg_hll.aggregate.UnionAgg
  Aggregates multiple hll fields into one hll.
- django_pg_hll.aggregate.UnionAggCardinality
  Counts the cardinality of the hll combined by the UnionAgg function. In effect, it does Cardinality(UnionAgg(hll)).
  P.S. Django doesn't allow using a function inside a function.
- django_pg_hll.aggregate.CardinalitySum
  Counts the sum of the hll cardinalities of multiple rows. In effect, it does Sum(Cardinality(hll)).
  P.S. Django doesn't allow using a function inside a function.
from django.db import models
from django_pg_hll import HllField, HllInteger
from django_pg_hll.aggregate import Cardinality, UnionAggCardinality, CardinalitySum
class ForeignModel(models.Model):
    pass


class MyModel(models.Model):
    hll = HllField()
    fk = models.ForeignKey(ForeignModel, on_delete=models.CASCADE)


# All rows reference the same ForeignModel row, so the aggregates below combine them
parent = ForeignModel.objects.create()
MyModel.objects.bulk_create([
    MyModel(fk=parent, hll=HllInteger(1)),
    MyModel(fk=parent, hll=HllInteger(2) | HllInteger(3) | HllInteger(4)),
    MyModel(fk=parent, hll=HllInteger(4))
])

MyModel.objects.annotate(card=Cardinality('hll')).values_list('id', 'card')
# outputs (1, 1), (2, 3), (3, 1)

# Count the cardinality of the hll built by the union of all rows.
# Element 4 exists in two rows; after the union it is counted once.
ForeignModel.objects.annotate(card=UnionAggCardinality('mymodel__hll')).values_list('card', flat=True)
# outputs [4]

# Count the sum of the cardinalities of each row
ForeignModel.objects.annotate(card=CardinalitySum('mymodel__hll')).values_list('card', flat=True)
# outputs [5]
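The difference between the last two results can be checked with plain Python sets. This is only an analogy: a set is exact, while an hll is a probabilistic estimate, and the helper names cardinality and union_agg are hypothetical stand-ins for the aggregate functions.

```python
# Each set stands in for the hll stored in one row
rows = [{1}, {2, 3, 4}, {4}]


def cardinality(hll):
    """Number of distinct elements in one hll stand-in."""
    return len(hll)


def union_agg(hlls):
    """Merge several hll stand-ins into one."""
    merged = set()
    for hll in hlls:
        merged |= hll
    return merged


per_row = [cardinality(row) for row in rows]
union_cardinality = cardinality(union_agg(rows))
cardinality_sum = sum(per_row)

print(per_row)             # [1, 3, 1]
print(union_cardinality)   # 4, element 4 is counted once
print(cardinality_sum)     # 5, element 4 is counted in both rows
```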
django-pg-bulk-update integration
This library provides a hll_concat set function, which allows hll values to be used in bulk_update and bulk_update_or_create queries.
# Don't forget to import the function, or django_pg_bulk_update will not find it
from django_pg_hll import HllInteger
from django_pg_hll.bulk_update import HllConcatFunction

MyModel.objects.bulk_update_or_create([
    {'id': 100501, 'hll': HllInteger(1)},
    {'id': 100502, 'hll': HllInteger(2) | HllInteger(3)}
], set_functions={'hll': 'hll_concat'})
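As a rough mental model, and assuming that hll_concat merges the incoming value into the stored one as a union, the query above behaves like the following standalone sketch. The dictionary table and the helper merge_rows are hypothetical and only imitate the update-or-create behaviour with exact Python sets.

```python
def merge_rows(table, values):
    """Union each incoming set into the stored one, creating missing rows."""
    for row in values:
        # Assumption: a missing row starts from an empty hll
        stored = table.setdefault(row['id'], set())
        stored |= row['hll']
    return table


# Row 100501 already exists, row 100502 does not
table = {100501: {1}}
merge_rows(table, [
    {'id': 100501, 'hll': {1, 2}},
    {'id': 100502, 'hll': {2, 3}},
])

print(table[100501])
print(table[100502])
```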