Django ClickHouse database backend.
Django ClickHouse Database Backend
Django clickhouse backend is a Django database backend for the ClickHouse database. It lets you use the Django ORM to interact with ClickHouse.
It uses clickhouse driver as its DBAPI and clickhouse pool to provide a ClickHouse connection pool.
Features:
- Supports the ClickHouse native interface and connection pooling.
- Defines ClickHouse-specific schema features such as Engine and Index in the Django ORM.
- Supports table migrations.
- Supports creating test databases and tables, and works with Django TestCase and pytest-django.
- Supports most query types and data types; full feature coverage is still under development.
- Supports SETTINGS in SELECT queries.
Get started
Installation
pip install django-clickhouse-backend
or
git clone https://github.com/jayvynl/django-clickhouse-backend
cd django-clickhouse-backend
python setup.py install
Configuration
Only ENGINE is required; other options have default values.
- ENGINE: required, set to clickhouse_backend.backend.
- NAME: database name, default is default.
- HOST: database host, default is localhost.
- PORT: database port, default is 9000.
- USER: database user, default is default.
- PASSWORD: database password, default is empty.
DATABASES = {
    'default': {
        'ENGINE': 'clickhouse_backend.backend',
        'NAME': 'default',
        'HOST': 'localhost',
        'USER': 'DB_USER',
        'PASSWORD': 'DB_PASSWORD',
        'TEST': {
            'fake_transaction': True
        }
    }
}
DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'

DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField' is required for working with Django migrations.
More details are covered in the Primary key section.
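As an illustration of the defaults listed above, the following sketch builds a DATABASES entry with the documented default values made explicit and reads the credentials from environment variables. The helper name clickhouse_db_config and the variable names CLICKHOUSE_USER and CLICKHOUSE_PASSWORD are hypothetical and not part of the backend; the backend applies its own defaults, so this only spells them out.

```python
import os

# Defaults as documented above; ENGINE is the only required option.
DOCUMENTED_DEFAULTS = {
    "NAME": "default",
    "HOST": "localhost",
    "PORT": 9000,
    "USER": "default",
    "PASSWORD": "",
}


def clickhouse_db_config(**overrides):
    """Return a DATABASES entry with the documented defaults made explicit.

    Hypothetical helper, for illustration only.
    """
    config = {"ENGINE": "clickhouse_backend.backend", **DOCUMENTED_DEFAULTS}
    # Hypothetical environment variable names; use whatever your deployment provides.
    config["USER"] = os.environ.get("CLICKHOUSE_USER", config["USER"])
    config["PASSWORD"] = os.environ.get("CLICKHOUSE_PASSWORD", config["PASSWORD"])
    # Explicit keyword arguments win over both defaults and environment values.
    config.update(overrides)
    return config


DATABASES = {"default": clickhouse_db_config(HOST="localhost")}
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"
```

Keeping credentials out of the settings file is a deployment choice rather than a requirement of the backend.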
Model
from django.db import models
from django.utils import timezone

from clickhouse_backend import models as chm
from clickhouse_backend.models import indexes, engines


class Event(chm.ClickhouseModel):
    src_ip = chm.GenericIPAddressField(default='::')
    sport = chm.PositiveSmallIntegerField(default=0)
    dst_ip = chm.GenericIPAddressField(default='::')
    dport = chm.PositiveSmallIntegerField(default=0)
    transport = models.CharField(max_length=3, default='')
    protocol = models.TextField(default='')
    content = models.TextField(default='')
    timestamp = models.DateTimeField(default=timezone.now)
    created_at = models.DateTimeField(auto_now_add=True)
    length = chm.PositiveIntegerField(default=0)
    count = chm.PositiveIntegerField(default=1)

    class Meta:
        verbose_name = 'Network event'
        ordering = ['-id']
        db_table = 'event'
        engine = engines.ReplacingMergeTree(
            order_by=('dst_ip', 'timestamp'),
            partition_by=models.Func('timestamp', function='toYYYYMMDD')
        )
        indexes = [
            indexes.Index(
                fields=('src_ip', 'dst_ip'),
                type=indexes.Set(1000),
                granularity=4
            )
        ]
        constraints = (
            models.CheckConstraint(
                name='sport_range',
                check=models.Q(sport__gte=0, sport__lte=65535),
            ),
        )
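To make the partition_by expression in the model concrete: the ClickHouse function toYYYYMMDD turns a date or datetime into the integer YYYYMMDD, so all events from the same day share one partition key. The pure-Python function below only mimics that arithmetic for illustration; the name to_yyyymmdd is hypothetical, and the sketch ignores time zones, whereas ClickHouse evaluates the function in the time zone of the column or server.

```python
from datetime import datetime


def to_yyyymmdd(ts: datetime) -> int:
    """Mimic ClickHouse's toYYYYMMDD: encode the date part as the integer YYYYMMDD."""
    return ts.year * 10000 + ts.month * 100 + ts.day


# Two events on the same day map to the same partition key;
# an event on the next day maps to a different one.
morning = to_yyyymmdd(datetime(2023, 1, 5, 8, 30))
evening = to_yyyymmdd(datetime(2023, 1, 5, 23, 59))
next_day = to_yyyymmdd(datetime(2023, 1, 6, 0, 0))
print(morning, evening, next_day)  # 20230105 20230105 20230106
```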
Migration
python manage.py makemigrations
Testing
Writing test cases works the same as in a normal Django project. You can use Django TestCase or pytest-django.
Notice: ClickHouse uses mutations for deleting or updating. By default, data mutations are processed asynchronously, so you should change this default behavior in tests that delete or update data. There are 2 ways to do that:
- Configure the database as follows; this sets mutations_sync=1 at session scope.
DATABASES = {
    'default': {
        'ENGINE': 'clickhouse_backend.backend',
        'OPTIONS': {
            'settings': {
                'mutations_sync': 1,
            }
        }
    }
}
- Use SETTINGS on an individual query.
Event.objects.filter(transport='UDP').settings(mutations_sync=1).delete()
Sample test case:
from django.test import TestCase

# Event is the model from the Model section; import it from your own app.


class TestEvent(TestCase):
    def test_spam(self):
        assert Event.objects.count() == 0
Topics
Primary key
Django ORM depends heavily on a single-column primary key, which is the unique identifier of an ORM object. All get, save and delete actions depend on the primary key.
In ClickHouse, however, the primary key has a different meaning than in Django. ClickHouse does not require a unique primary key; you can insert multiple rows with the same primary key.
There is no unique constraint or auto-increment column in ClickHouse.
By default, Django adds a field named id as an auto-increment primary key.
- AutoField: mapped to the ClickHouse Int32 data type. You should generate this unique id yourself.
- BigAutoField: mapped to the ClickHouse Int64 data type. If the primary key is not specified when inserting data, clickhouse_backend.idworker.id_worker is used to generate the unique key.
  The default id_worker is an instance of clickhouse_backend.idworker.snowflake.SnowflakeIDWorker, which implements Twitter snowflake IDs. If data insertions happen on multiple datacenters, servers, processes or threads, you should ensure that the pair of environment variables (CLICKHOUSE_WORKER_ID, CLICKHOUSE_DATACENTER_ID) is unique for each of them. Because worker_id and datacenter_id are 5 bits each, they must be integers between 0 and 31. CLICKHOUSE_WORKER_ID defaults to 0; CLICKHOUSE_DATACENTER_ID is generated randomly if not provided.
  clickhouse_backend.idworker.snowflake.SnowflakeIDWorker is not thread safe. You can inherit from clickhouse_backend.idworker.base.BaseIDWorker, implement a thread-safe worker, and set CLICKHOUSE_ID_WORKER to the dotted import path of your IDWorker instance.
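The snowflake scheme and the thread-safety concern described above can be sketched in plain Python. This is not the backend's implementation: the class name ThreadSafeSnowflake, the method name get_id, the epoch and the exact bit layout (41-bit millisecond timestamp, 5-bit datacenter id, 5-bit worker id, 12-bit sequence, as in the classic Twitter design) are assumptions for illustration. The class is standalone and does not inherit from BaseIDWorker, whose interface is not shown here; check that base class before adapting the idea. Instead of waiting for the next millisecond when the sequence is exhausted, this variant advances its own logical timestamp.

```python
import threading
import time


class ThreadSafeSnowflake:
    """Illustrative snowflake-style ID generator guarded by a lock (hypothetical)."""

    EPOCH_MS = 1288834974657  # Twitter's original epoch; an assumption here

    def __init__(self, worker_id=0, datacenter_id=0):
        # Both ids are 5 bits wide, so they must be in 0..31.
        if not 0 <= worker_id <= 31 or not 0 <= datacenter_id <= 31:
            raise ValueError("worker_id and datacenter_id must be in 0..31")
        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def get_id(self):  # hypothetical method name
        with self._lock:
            # Never go backwards, even if the wall clock does.
            now = max(time.time_ns() // 1_000_000, self._last_ms)
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 ids used in this millisecond: move the logical clock on.
                    now += 1
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << 22)
                | (self.datacenter_id << 17)
                | (self.worker_id << 12)
                | self._sequence
            )


worker = ThreadSafeSnowflake(worker_id=3, datacenter_id=1)
ids = []


def produce(n):
    for _ in range(n):
        ids.append(worker.get_id())  # list.append is atomic in CPython


threads = [threading.Thread(target=produce, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ids), len(set(ids)))  # both counts are equal when no id repeats
```

The lock makes the read-modify-write of the timestamp and sequence atomic, which is the part a non-thread-safe worker lacks. Uniqueness across processes or machines still depends on giving each one a distinct (worker id, datacenter id) pair.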
Django uses a table named django_migrations to track migration files. Its id field should be a BigAutoField, so that the IDWorker can generate unique ids for you.
Django 3.2 introduced a new setting, DEFAULT_AUTO_FIELD, to control the field type of the default primary key.
So DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField' is required if you want to use migrations with django clickhouse backend.
Fields
Nullable
null=True produces a Nullable type in the ClickHouse database.
Note: Using Nullable almost always negatively affects performance; keep this in mind when designing your databases.
GenericIPAddressField
The ClickHouse backend has its own implementation in clickhouse_backend.models.fields.GenericIPAddressField.
If protocol='ipv4', an IPv4 column is generated; otherwise an IPv6 column is generated.
PositiveSmallIntegerField
PositiveIntegerField
PositiveBigIntegerField
- clickhouse_backend.models.fields.PositiveSmallIntegerField maps to UInt16.
- clickhouse_backend.models.fields.PositiveIntegerField maps to UInt32.
- clickhouse_backend.models.fields.PositiveBigIntegerField maps to UInt64.
ClickHouse has unsigned integer types, so these fields have validators for the correct integer range.
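For reference, the ranges implied by the unsigned types above can be computed directly. The sketch below is plain arithmetic rather than the backend's validator code, and the names UNSIGNED_BITS and unsigned_range are hypothetical. For comparison, Django documents its stock PositiveSmallIntegerField and PositiveIntegerField as safe only up to 32767 and 2147483647 respectively across its supported databases.

```python
# Bit widths of the ClickHouse types named above.
UNSIGNED_BITS = {
    "PositiveSmallIntegerField": 16,  # UInt16
    "PositiveIntegerField": 32,       # UInt32
    "PositiveBigIntegerField": 64,    # UInt64
}


def unsigned_range(bits):
    """Inclusive (min, max) of an unsigned integer with the given bit width."""
    return 0, 2**bits - 1


for field, bits in UNSIGNED_BITS.items():
    low, high = unsigned_range(bits)
    print(f"{field}: {low}..{high}")
```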
Engines
Located in clickhouse_backend.models.engines.
Indexes
Located in clickhouse_backend.models.indexes.
Test
To run the tests for this project:
git clone https://github.com/jayvynl/django-clickhouse-backend
cd django-clickhouse-backend
# docker and docker-compose are required.
docker-compose up -d
python tests/runtests.py
Note: This project is not fully tested yet and should be used with caution in production.
License
Django clickhouse backend is distributed under the MIT license.