A Python library for working with the ClickHouse database
Project description
Overview
This project is simple ORM for working with the ClickHouse database. It allows you to define model classes whose instances can be written to the database and read from it.
Installation
To install infi.clickhouse_orm:
pip install infi.clickhouse_orm
Usage
Defining Models
Models are defined in a way reminiscent of Django’s ORM:
from infi.clickhouse_orm import models, fields, engines class Person(models.Model): first_name = fields.StringField() last_name = fields.StringField() birthday = fields.DateField() height = fields.Float32Field() engine = engines.MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
It is possible to provide a default value for a field, instead of its “natural” default (empty string for string fields, zero for numeric fields etc.).
See below for the supported field types and table engines.
Using Models
Once you have a model, you can create model instances:
>>> dan = Person(first_name='Dan', last_name='Schwartz') >>> suzy = Person(first_name='Suzy', last_name='Jones') >>> dan.first_name u'Dan'
When values are assigned to model fields, they are immediately converted to their Pythonic data type. In case the value is invalid, a ValueError is raised:
>>> suzy.birthday = '1980-01-17' >>> suzy.birthday datetime.date(1980, 1, 17) >>> suzy.birthday = 0.5 ValueError: Invalid value for DateField - 0.5 >>> suzy.birthday = '1922-05-31' ValueError: DateField out of range - 1922-05-31 is not between 1970-01-01 and 2038-01-19
Inserting to the Database
To write your instances to ClickHouse, you need a Database instance:
from infi.clickhouse_orm.database import Database db = Database('my_test_db')
This automatically connects to http://localhost:8123 and creates a database called my_test_db, unless it already exists. If necessary, you can specify a different database URL and optional credentials:
db = Database('my_test_db', db_url='http://192.168.1.1:8050', username='scott', password='tiger')
Using the Database instance you can create a table for your model, and insert instances to it:
db.create_table(Person) db.insert([dan, suzy])
The insert method can take any iterable of model instances, but they all must belong to the same model class.
Reading from the Database
Loading model instances from the database is simple:
for person in db.select("SELECT * FROM my_test_db.person", model_class=Person): print person.first_name, person.last_name
Do not include a FORMAT clause in the query, since the ORM automatically sets the format to TabSeparatedWithNamesAndTypes.
It is possible to select only a subset of the columns, and the rest will receive their default values:
for person in db.select("SELECT first_name FROM my_test_db.person WHERE last_name='Smith'", model_class=Person): print person.first_name
Ad-Hoc Models
Specifying a model class is not required. In case you do not provide a model class, an ad-hoc class will be defined based on the column names and types returned by the query:
for row in db.select("SELECT max(height) as max_height FROM my_test_db.person"): print row.max_height
This is a very convenient feature that saves you the need to define a model for each query, while still letting you work with Pythonic column values and an elegant syntax.
Counting
The Database class also supports counting records easily:
>>> db.count(Person) 117 >>> db.count(Person, conditions="height > 1.90") 6
Field Types
Currently the following field types are supported:
Class |
DB Type |
Pythonic Type |
Comments |
---|---|---|---|
StringField |
String |
unicode |
Encoded as UTF-8 when written to ClickHouse |
DateField |
Date |
datetime.date |
Range 1970-01-01 to 2038-01-19 |
DateTimeField |
DateTime |
datetime.datetime |
Minimal value is 1970-01-01 00:00:00; Always in UTC |
Int8Field |
Int8 |
int |
Range -128 to 127 |
Int16Field |
Int16 |
int |
Range -32768 to 32767 |
Int32Field |
Int32 |
int |
Range -2147483648 to 2147483647 |
Int64Field |
Int64 |
int/long |
Range -9223372036854775808 to 9223372036854775807 |
UInt8Field |
UInt8 |
int |
Range 0 to 255 |
UInt16Field |
UInt16 |
int |
Range 0 to 65535 |
UInt32Field |
UInt32 |
int |
Range 0 to 4294967295 |
UInt64Field |
UInt64 |
int/long |
Range 0 to 18446744073709551615 |
Float32Field |
Float32 |
float |
|
Float64Field |
Float64 |
float |
Table Engines
Each model must have an engine instance, used when creating the table in ClickHouse.
To define a MergeTree engine, supply the date column name and the names (or expressions) for the key columns:
engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'))
You may also provide a sampling expression:
engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), sampling_expr='intHash32(UserID)')
A CollapsingMergeTree engine is defined in a similar manner, but requires also a sign column:
engine = engines.CollapsingMergeTree('EventDate', ('CounterID', 'EventDate'), 'Sign')
For a SummingMergeTree you can optionally specify the summing columns:
engine = engines.SummingMergeTree('EventDate', ('OrderID', 'EventDate', 'BannerID'), summing_cols=('Shows', 'Clicks', 'Cost'))
Data Replication
Any of the above engines can be converted to a replicated engine (e.g. ReplicatedMergeTree) by adding two parameters, replica_table_path and replica_name:
engine = engines.MergeTree('EventDate', ('CounterID', 'EventDate'), replica_table_path='/clickhouse/tables/{layer}-{shard}/hits', replica_name='{replica}')
Development
After cloning the project, run the following commands:
easy_install -U infi.projector cd infi.clickhouse_orm projector devenv build
To run the tests, ensure that the ClickHouse server is running on http://localhost:8123/ (this is the default), and run:
bin/nosetests
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for infi.clickhouse_orm-0.5.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0499f8537d9b8dcaef9b46de58ab025fe20bc84f001918260132fa9e58c9fb8f |
|
MD5 | 205b7f47973b2600e6c06e352c8f12b2 |
|
BLAKE2b-256 | 48c408bd7b26bbe7a5871a024cfc01447cd6dbde7e288fd9e263ba4dbebf7936 |
Hashes for infi.clickhouse_orm-0.5.1-py2.7.egg
Algorithm | Hash digest | |
---|---|---|
SHA256 | dd0ae0e8f31247ae063e39524ae2dd708ad937ab6b15a88dd29fcd6fe7cbce2b |
|
MD5 | 000931c3ebd67dccbe8a18ab7c8b9dc2 |
|
BLAKE2b-256 | a29386eab61f304e93a41448489ea1480111367eef76104d031c64b928a5fc89 |