Django ClickHouse Database Backend

Chinese documentation

Django ClickHouse backend is a Django database backend for the ClickHouse database. It lets you use the Django ORM to interact with ClickHouse.

It uses clickhouse-driver as its DBAPI implementation and clickhouse-pool for connection pooling.

Features:

  • Supports the ClickHouse native interface and connection pooling.
  • Lets you define ClickHouse-specific schema features, such as Engine and Index, in the Django ORM.
  • Supports table migrations.
  • Supports creating test databases and tables, and works with Django TestCase and pytest-django.
  • Supports most query types and data types; full coverage is still under development.
  • Supports SETTINGS in SELECT queries.

Get started

Installation

pip install django-clickhouse-backend

or

git clone https://github.com/jayvynl/django-clickhouse-backend
cd django-clickhouse-backend
python setup.py install

Configuration

Only ENGINE is required; the other options have default values.

  • ENGINE: required, set to clickhouse_backend.backend.

  • NAME: database name, default default.

  • HOST: database host, default localhost.

  • PORT: database port, default 9000.

  • USER: database user, default default.

  • PASSWORD: database password, default empty.

    DATABASES = {
        'default': {
            'ENGINE': 'clickhouse_backend.backend',
            'NAME': 'default',
            'HOST': 'localhost',
            'USER': 'DB_USER',
            'PASSWORD': 'DB_PASSWORD',
            'TEST': {
                'fake_transaction': True
            }
        }
    }
    DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'
    

DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField' is required for working with Django migrations. More details are covered in the Primary key section.
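As a minimal sketch of how the defaults listed above could be applied in a settings module, the helper below builds the database entry from environment variables. The helper name clickhouse_database and the CLICKHOUSE_DB, CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_USER and CLICKHOUSE_PASSWORD variable names are assumptions made for illustration; the backend itself does not read them.

```python
import os


def clickhouse_database(env=None):
    """Build a DATABASES entry, falling back to the documented defaults.

    The environment variable names are hypothetical; rename them to match
    your deployment.
    """
    env = os.environ if env is None else env
    return {
        "ENGINE": "clickhouse_backend.backend",  # the only required option
        "NAME": env.get("CLICKHOUSE_DB", "default"),
        "HOST": env.get("CLICKHOUSE_HOST", "localhost"),
        "PORT": int(env.get("CLICKHOUSE_PORT", 9000)),
        "USER": env.get("CLICKHOUSE_USER", "default"),
        "PASSWORD": env.get("CLICKHOUSE_PASSWORD", ""),
    }


DATABASES = {"default": clickhouse_database()}
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"
```

Passing a plain dict instead of os.environ makes the helper easy to check in isolation.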

Model

from django.db import models
from django.utils import timezone

from clickhouse_backend import models as chm
from clickhouse_backend.models import indexes, engines


class Event(chm.ClickhouseModel):
    src_ip = chm.GenericIPAddressField(default='::')
    sport = chm.PositiveSmallIntegerField(default=0)
    dst_ip = chm.GenericIPAddressField(default='::')
    dport = chm.PositiveSmallIntegerField(default=0)
    transport = models.CharField(max_length=3, default='')
    protocol = models.TextField(default='')
    content = models.TextField(default='')
    timestamp = models.DateTimeField(default=timezone.now)
    created_at = models.DateTimeField(auto_now_add=True)
    length = chm.PositiveIntegerField(default=0)
    count = chm.PositiveIntegerField(default=1)

    class Meta:
        verbose_name = 'Network event'
        ordering = ['-id']
        db_table = 'event'
        engine = engines.ReplacingMergeTree(
            order_by=('dst_ip', 'timestamp'),
            partition_by=models.Func('timestamp', function='toYYYYMMDD')
        )
        indexes = [
            indexes.Index(
                fields=('src_ip', 'dst_ip'),
                type=indexes.Set(1000),
                granularity=4
            )
        ]
        constraints = (
            models.CheckConstraint(
                name='sport_range',
                check=models.Q(sport__gte=0, sport__lte=65535),
            ),
        )
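The partition_by expression in the model above calls the ClickHouse function toYYYYMMDD, which turns a date or datetime into an integer such as 20230115, so that each day of data lands in its own partition. The pure-Python function below is only an illustration of that arithmetic (the name to_yyyymmdd is made up here); ClickHouse evaluates the real function on the server, using the column or server time zone.

```python
from datetime import datetime, timezone


def to_yyyymmdd(value: datetime) -> int:
    """Mimic the result of ClickHouse's toYYYYMMDD for a Python datetime."""
    return value.year * 10000 + value.month * 100 + value.day


ts = datetime(2023, 1, 15, 8, 30, tzinfo=timezone.utc)
print(to_yyyymmdd(ts))  # 20230115
```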

Migration

python manage.py makemigrations

Testing

Writing test cases works the same as in a normal Django project. You can use Django TestCase or pytest-django. Note that ClickHouse uses mutations for deleting and updating. By default, data mutations are processed asynchronously, so tests that delete or update data should change this default. There are two ways to do that:

  • Configure the database as follows; this sets mutations_sync=1 at session scope.
    DATABASES = {
        'default': {
            'ENGINE': 'clickhouse_backend.backend',
            'OPTIONS': {
                'settings': {
                    'mutations_sync': 1,
                }
            }
        }
    }
    
  • Pass SETTINGS on an individual query with settings().
    Event.objects.filter(transport='UDP').settings(mutations_sync=1).delete()
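If you prefer to keep mutations_sync out of the production configuration, one possible approach is to derive the test configuration from the normal one. The sketch below uses a hypothetical helper, with_sync_mutations, that copies a DATABASES dict and adds the session setting shown in the first option; it is plain dictionary manipulation and not part of the backend.

```python
import copy


def with_sync_mutations(databases, alias="default"):
    """Return a copy of DATABASES with mutations_sync=1 set for one alias."""
    patched = copy.deepcopy(databases)
    options = patched[alias].setdefault("OPTIONS", {})
    options.setdefault("settings", {})["mutations_sync"] = 1
    return patched


DATABASES = {"default": {"ENGINE": "clickhouse_backend.backend"}}
# In a separate test settings module you might assign this back to DATABASES.
TEST_DATABASES = with_sync_mutations(DATABASES)
```

Because the helper works on a deep copy, the original configuration is left untouched.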
    

Sample test case.

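from django.test import TestCase

from .models import Event  # assumes Event is defined in this app's models.py


class TestEvent(TestCase):
    def test_spam(self):
        # The test database starts out empty.
        self.assertEqual(Event.objects.count(), 0)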

Topics

Primary key

The Django ORM depends heavily on a single-column primary key, which is the unique identifier of an ORM object. All get, save and delete actions depend on the primary key.

In ClickHouse, however, a primary key means something different from a Django primary key. ClickHouse does not require the primary key to be unique, so you can insert multiple rows with the same primary key.

ClickHouse has no unique constraints and no auto-incrementing columns.

By default, Django adds a field named id as an auto-incrementing primary key.

  • AutoField

    Mapped to the ClickHouse Int32 data type. You must generate the unique id yourself.

  • BigAutoField

    Mapped to the ClickHouse Int64 data type. If no primary key is specified when inserting data, clickhouse_backend.idworker.id_worker is used to generate a unique key.

    The default id_worker is an instance of clickhouse_backend.idworker.snowflake.SnowflakeIDWorker, which implements Twitter's Snowflake id scheme. If data is inserted from multiple datacenters, servers, processes or threads, you should ensure that each (CLICKHOUSE_WORKER_ID, CLICKHOUSE_DATACENTER_ID) environment variable pair is unique. Because worker_id and datacenter_id are 5 bits each, both must be integers between 0 and 31. CLICKHOUSE_WORKER_ID defaults to 0; CLICKHOUSE_DATACENTER_ID is generated randomly if not provided.

    clickhouse_backend.idworker.snowflake.SnowflakeIDWorker is not thread safe. You can subclass clickhouse_backend.idworker.base.BaseIDWorker to implement your own, and set CLICKHOUSE_ID_WORKER to the dotted import path of your IDWorker instance.
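To make the id scheme described above more concrete, here is a rough sketch of a Snowflake-style generator guarded by a lock. It is not the library's SnowflakeIDWorker and does not subclass BaseIDWorker, whose interface is not shown in this document; the class name ThreadSafeSnowflake, the method name get_id, the epoch and the 41/5/5/12 bit layout (timestamp, datacenter, worker, sequence) are assumptions taken from the original Twitter design. Only the 5-bit width of the worker and datacenter ids comes from the text above.

```python
import threading
import time


class ThreadSafeSnowflake:
    """Illustrative Snowflake-style id generator (hypothetical, standalone)."""

    EPOCH_MS = 1288834974657  # Twitter's epoch; any fixed past instant works

    def __init__(self, worker_id=0, datacenter_id=0):
        for name, value in (("worker_id", worker_id),
                            ("datacenter_id", datacenter_id)):
            if not 0 <= value <= 31:  # 5 bits each
                raise ValueError(f"{name} must be between 0 and 31, got {value}")
        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def get_id(self):
        # The lock makes the read-modify-write of _last_ms/_sequence atomic.
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock moved backwards: keep using the last timestamp.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF  # 12 bits
                if self._sequence == 0:
                    # 4096 ids used in this millisecond; wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << 22)
                | (self.datacenter_id << 17)
                | (self.worker_id << 12)
                | self._sequence
            )


worker = ThreadSafeSnowflake(worker_id=3, datacenter_id=7)
print(worker.get_id())
```

The resulting value uses at most 63 bits, so it fits the Int64 column that BigAutoField maps to. The lock only protects threads within one process; separate processes and servers still need distinct worker and datacenter ids.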

Django uses a table named django_migrations to track migration files. Its id field should be a BigAutoField so that the IDWorker can generate unique ids for you. Django 3.2 introduced the DEFAULT_AUTO_FIELD setting to control the field type of the default primary key, so DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField' is required if you want to use migrations with Django ClickHouse backend.

Fields

Nullable

null=True produces a Nullable type in the ClickHouse database.

Note: Using Nullable almost always hurts performance; keep this in mind when designing your databases.

GenericIPAddressField

The ClickHouse backend has its own implementation in clickhouse_backend.models.fields.GenericIPAddressField. If protocol='ipv4', an IPv4 column is generated; otherwise an IPv6 column is generated.

PositiveSmallIntegerField

PositiveIntegerField

PositiveBigIntegerField

clickhouse_backend.models.fields.PositiveSmallIntegerField maps to UInt16, clickhouse_backend.models.fields.PositiveIntegerField maps to UInt32, and clickhouse_backend.models.fields.PositiveBigIntegerField maps to UInt64. Because ClickHouse has unsigned integer types, these fields use validators with the full unsigned range.
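The ranges implied by these mappings follow directly from the bit widths. The snippet below is a small sketch that computes them; the UNSIGNED_BITS table and the helper names are invented for illustration and are not part of the backend, whose validators may be implemented differently.

```python
# Bit width of the ClickHouse type each field maps to (from the text above).
UNSIGNED_BITS = {
    "PositiveSmallIntegerField": 16,  # UInt16
    "PositiveIntegerField": 32,       # UInt32
    "PositiveBigIntegerField": 64,    # UInt64
}


def unsigned_range(field_name):
    """Return the (min, max) values an unsigned column of that width can hold."""
    bits = UNSIGNED_BITS[field_name]
    return 0, 2 ** bits - 1


def in_range(field_name, value):
    """Check whether a value fits the unsigned range of the given field."""
    low, high = unsigned_range(field_name)
    return low <= value <= high


for name in UNSIGNED_BITS:
    print(name, unsigned_range(name))
```

For comparison, Django's stock PositiveSmallIntegerField only guarantees values from 0 to 32767 across databases, which would not cover the full port range of 0 to 65535.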

Engines

Located in clickhouse_backend.models.engines.

Indexes

Located in clickhouse_backend.models.indexes.

Test

To run the tests for this project:

git clone https://github.com/jayvynl/django-clickhouse-backend
cd django-clickhouse-backend
# docker and docker-compose are required.
docker-compose up -d
python tests/runtests.py

Note: This project is not fully tested yet and should be used with caution in production.

License

Django clickhouse backend is distributed under the MIT license.
