Skip to main content

ClickHouse core driver, SqlAlchemy, and Superset libraries

Project description

ClickHouse Connect

A suite of Python packages for connecting Python to ClickHouse, initially supporting Apache Superset using a minimal read only SQLAlchemy dialect. Uses the ClickHouse HTTP interface.

Installation

pip install clickhouse-connect

ClickHouse Connect requires Python 3.7 or higher. The cython package must be installed prior to installing clickhouse_connect to build and install the optional Cython/C extensions used for improving read and write performance using the ClickHouse Native format. After installing cython if desired, clone this repository and run python setup.py installfrom the project directory.

Getting Started

Basic query:

import clickhouse_connect
client = clickhouse_connect.get_client(host='play.clickhouse.com', port=443, username='play')
query_result = client.query('SELECT * FROM system.tables')
print (query_result.result_set) 

Simple 'command' that does not return a result set.

import clickhouse_connect
client = clickhouse_connect.get_client()
client.command ('CREATE TABLE test_table (key UInt16, value String) ENGINE Memory')

Bulk insert of a matrix of rows and columns.

data = [[100, 'value1'], [200, 'value2']]
client.insert('test_table', data)
print(client.query('SELECT * FROM test_table').result_set)
->  [(100, 'value1'), (200, 'value2')]

Minimal SQLAlchemy Support

On installation ClickHouse Connect registers the clickhousedb SQLAlchemy Dialect entry point. This dialect supports basic table reflection for table columns and datatypes, and command and query execution using DB API 2.0 cursors. Most ClickHouse datatypes have full query/cursor support.

ClickHouse Connect does not yet implement the full SQLAlchemy API for DDL (Data Definition Language) or ORM (Object Relational Mapping). These features are in development.

Superset Support

On installation ClickHouse Connect registers the clickhousedb Superset Database Engine Spec entry point. Using the clickhousedb SQLAlchemy dialect, the engine spec supports complete data exploration and Superset SQL Lab functionality with all standard ClickHouse data types. See Connecting Superset for complete instructions.

ClickHouse Enum, UUID, and IP Address datatypes are treated as strings. For compatibility with Superset Pandas dataframes, unsigned UInt64 data types are interpreted as signed Int64 values. ClickHouse CSV Upload via SuperSet is not yet implemented.

Optional Features

SQLAlchemy and Superset require the corresponding SQLAlchemy and Apache Superset packages to be included in your Python installation. ClickHouse connect also includes C/Cython extensions for improved performance reading String and FixedString datatypes. These extensions will be installed automatically by setup.py if a C compiler is available.

Query results can be returned as either a numpy array or a pandas DataFrame if the numpy and pandas libraries are available. Use the client methods query_np and query_df respectively.

Tests

The tests directory contains a standard pytest test suite. Integration tests require docker to be installed and run against the current clickhouse/clickhouse-server image. The test suite includes "fuzz" testing for reading/writing all supported datatypes with randomly generated data. To run the full test suite all libraries supporting optional features (Superset and SQLAlchemy) should be available. To install the C/Cython libaries required for testing in the project, run python setup.py build_ext --inplace.

Main Client Interface

Interaction with the ClickHouse server is done through a clickhouse_connect Client instance. At this point only an HTTP(s) based Client is supported.

HTTP Client constructor/initialization parameters

Create a ClickHouse client using the clickhouse_connect.driver.create_client(...) function or clickhouse_connect.get_client(...) wrapper. All parameters are optional:

  • interface:str https or http
    Defaults to https if a recognized secure port (443 or 8443), otherwise http
  • host:str ClickHouse server hostname or ip address
    Defaults to localhost
  • port:int ClickHouse server port number
    Defaults to 8123 or 8443 (if the interface is https)
  • username:str ClickHouse user
  • password:str ClickHouse password
  • database:str Default database to use for the client.
    Defaults to the default database for the ClickHouse user
  • compress:bool Accept compressed data from the ClickHouse server.
    Defaults to True
  • format:str native (ClickHouse Native) or rb (ClickHouse Row Binary)
    Native format is preferred for performance reasons
  • query_limit:int LIMIT value added to all queries.
    Defaults to 5,000 rows. Setting query_limit=0 will return unlimited results, at the risk of running out of memory
  • connect_timeout:int HTTP connection timeout in seconds. Default 10 seconds.
  • send_receive_timeout:int HTTP read timeout in seconds. Default 300 seconds.
  • client_name:str HTTP User-Agent header. Defaults to clickhouse-connect
  • verify:bool For HTTPS connections, validate the ClickHouse server TLS certificate, including matching hostname, expiration, and signed by a trusted Certificate Authority. Defaults to True.
  • ca_cert:str File path to private/Self-Signed Certificate Authority TLS certificate to validate ClickHouse server
  • client_cert:str File path to Client Certificate for mutual TLS authentication (including any intermediate certificates)
  • client_cert_key:str File path to Client Certificate private key for mutual TLS authentication. This parameter is not required if the client_cert is a .pem file containing both the certificate(s) and private key

Any remaining keyword parameters are interpreted as 'setting' parameters to send to the ClickHouse server with every query/request

Querying data

Use the client query method to retrieve a QueryResult from ClickHouse. Parameters:

  • query:str Required. ClickHouse query statement (either SELECT, DESCRIBE, or other system commands that result a result set.)
  • parameters:Dict[str, Any] Optional. Dictionary of str keys with values of any Python scalar datatype. These values will be replaced in the query string using Python %s type formatting
  • settings:Dict[str, str] Optional. Dictionary of str keys and str values for ClickHouse query settings
  • use_none:bool Optional, defaults to true. Use the Python None type for ClickHouse NULLS. Otherwise the QueryResult will include 0/empty values for ClickHouse nulls. This is useful for populating Numpy arrays or Pandas dataframes, which do not accept nulls

The query method results a QueryResult object with the following fields:

  • result_set:Sequence[Sequence] A sequence of rows of column values containing the Python native data types from the query.
  • column_names:list A list of column names for the rows in the result_set
  • column_types:list A list of ClickHouseType objects for each column
  • query_id:str The ClickHouse query id. Note that this can be specified for ClickHouse by using the settings parameter, e.g. client.query('SELECT * FROM system.tables', settings={'query_id': 'test_query_id'}) Otherwise ClickHouse will assign a random UUID as the query id.
  • summary:Dict[str, Any] The final contents of the X-ClickHouse-Summary header. Note that this is empty unless the setting send_progress_in_http_headers is enabled.

Numpy and Pandas queries

If Numpy is installed, the driver can return a complete Numpy array by using the client query_np method instead of query. The parameters for query and query_np are the same, except query_np does not need or accept the use_none argument as Numpy arrays do not support Python None.

If Pandas is installed as well as Numpy, a populated pandas DataFrame will be returned by the client query_df method instead of query. The three parameters (query, parameters, and settings) are identical to the parameters for query_np.

Datatype options for queries

There are some convenience methods in the clickhouse_connect.driver package that control the format of some ClickHouse datatypes. These are included in part to improve Superset compatibility.

  • fixed_string_format Format for FixedString datatypes. Options are bytes and string, defaults to Python byte objects. Set to string when the SuperSet packages are initialized for datasets that used FixedString objects as actual strings.
  • big_int_format Format for U/Int128 and U/Int256 datatypes. Options are int and string, defaults to int datatypes. Set to string when SupersetSet packages are initialized because SuperSet dataframes do not handle Python integers larger than 64 bits.
  • uint64_format Format for UInt64 ClickHouse types. Options are signed and unsigned, defaults to _unsigned. Set to signed when SuperSet packages are initialized because SuperSet dataframes do not handle unsigned 64 bit integers.
  • uuid_format Format for UUID ClickHouse types. Options are uuid and string, defaults to Python UUID datatypes. Set to string when SuperSet packages are initialized because SuperSet dataframes do not provide special handling for UUID types
  • ip_format Format for IPv4/IPv6 ClickHouse types. Options are ip and string, default to Python IP4Address/ IPv6Address types. Set to string for compatibility with SuperSet dataframes

Inserting data

Use the client insert method to insert data into a ClickHouse table. Parameters:

  • table:str Required. ClickHouse table name to insert data into. This can be either the full database.table or just the table name. In the latter case the database is determined by either the database parameter or the default database for the client/connection
  • data:Iterable:[Iterable[Any]] Required. The matrix of rows and columns of native Python datatypes to insert.
  • column_names:Union[str, Iterable[str]] Required. Either a single column name or list of columns. If *, the driver will retrieve the list of columns from the ClickHouse Server in position order, which is fragile and not recommended. Column names should be in the same order as columns in the data collection.
  • column_types:[Iterable[ClickHouseType]] Optional. List of driver ClickHouseType objects that match the column_names parameter. If not specified (and column_type_names is not specified), column types will be retrieved from the ClickHouse server using the DESCRIBE TABLE.
  • column_type_names:[Iterable[str]] Optional. List of column type names as required/returned by the ClickHouse server. These can be used to populate the column_types parameter without calling the ClickHouse Server.
  • column_oriented:bool Default False. If True the data parameter is processed as a sequence of equal length columns, instead of a list of rows. This eliminates the need to "pivot" the matrix when using the Native data format.
  • settings:Dict[str, str] Optional. Dictionary of ClickHouse settings to be applied to the insert query.

Notes on data inserts

The client insert_df can be used to insert a Pandas DataFrame, assuming the column names in the DataFrame match the ClickHouse table column names. Note that a Numpy array can be passed directly as the data parameter to the primary insert method so there is no separate insert_np method.

For column types that can be different native Python types (for example, UUIDs or IP Addresses), the driver will assume that the data type for the whole column matches the first non "None" value in the column and process insert data accordingly. So if the first data value for insert into a ClickHouse UUID column is a string, the driver will assume all data values in that insert column are strings.

DDL and other "simple" SQL statements

The client command method can be used for ClickHouse commands/queries that return a single result or row of results values. In this case the result is returned as a single row TabSeparated values and are cast to a single string, int, or list of string values. The command method parameters are:

  • cmd:str Required. ClickHouse SQL command/query.
  • parameters:Dict[str, Any] Optional. Dictionary of str keys with values of any Python scalar datatype. These values will be replaced in the query string using Python %s type formatting
  • use_database:bool Optional, defaults to True. Use the client default database (as set when the client is created or the user's default database) when sending the query. This is set to False in order to determine the users default database on connection.
  • settings:Dict[str, Any] Optional. Dictionary of ClickHouse settings to be applied to the insert query.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clickhouse-connect-0.2.8.tar.gz (91.5 kB view hashes)

Uploaded Source

Built Distributions

clickhouse_connect-0.2.8-pp39-pypy39_pp73-win_amd64.whl (90.2 kB view hashes)

Uploaded PyPy Windows x86-64

clickhouse_connect-0.2.8-pp39-pypy39_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (92.8 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (93.6 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-pp39-pypy39_pp73-macosx_10_9_x86_64.whl (87.7 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

clickhouse_connect-0.2.8-pp38-pypy38_pp73-win_amd64.whl (90.4 kB view hashes)

Uploaded PyPy Windows x86-64

clickhouse_connect-0.2.8-pp38-pypy38_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (92.8 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-pp38-pypy38_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (93.6 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-pp38-pypy38_pp73-macosx_10_9_x86_64.whl (87.7 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

clickhouse_connect-0.2.8-pp37-pypy37_pp73-win_amd64.whl (90.4 kB view hashes)

Uploaded PyPy Windows x86-64

clickhouse_connect-0.2.8-pp37-pypy37_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (92.8 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-pp37-pypy37_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (93.6 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-pp37-pypy37_pp73-macosx_10_9_x86_64.whl (87.7 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

clickhouse_connect-0.2.8-cp311-cp311-win_amd64.whl (92.1 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

clickhouse_connect-0.2.8-cp311-cp311-win32.whl (90.2 kB view hashes)

Uploaded CPython 3.11 Windows x86

clickhouse_connect-0.2.8-cp311-cp311-musllinux_1_1_x86_64.whl (187.1 kB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ x86-64

clickhouse_connect-0.2.8-cp311-cp311-musllinux_1_1_i686.whl (184.2 kB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ i686

clickhouse_connect-0.2.8-cp311-cp311-musllinux_1_1_aarch64.whl (183.8 kB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ ARM64

clickhouse_connect-0.2.8-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (183.3 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

clickhouse_connect-0.2.8-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (184.3 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (180.6 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-cp311-cp311-macosx_11_0_arm64.whl (90.1 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

clickhouse_connect-0.2.8-cp311-cp311-macosx_10_9_x86_64.whl (91.2 kB view hashes)

Uploaded CPython 3.11 macOS 10.9+ x86-64

clickhouse_connect-0.2.8-cp310-cp310-win_amd64.whl (92.5 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

clickhouse_connect-0.2.8-cp310-cp310-win32.whl (90.8 kB view hashes)

Uploaded CPython 3.10 Windows x86

clickhouse_connect-0.2.8-cp310-cp310-musllinux_1_1_x86_64.whl (193.3 kB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

clickhouse_connect-0.2.8-cp310-cp310-musllinux_1_1_i686.whl (195.2 kB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ i686

clickhouse_connect-0.2.8-cp310-cp310-musllinux_1_1_aarch64.whl (192.4 kB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ ARM64

clickhouse_connect-0.2.8-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (187.3 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

clickhouse_connect-0.2.8-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (189.2 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (192.2 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-cp310-cp310-macosx_11_0_arm64.whl (91.0 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

clickhouse_connect-0.2.8-cp310-cp310-macosx_10_9_x86_64.whl (91.7 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

clickhouse_connect-0.2.8-cp39-cp39-win_amd64.whl (93.1 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

clickhouse_connect-0.2.8-cp39-cp39-win32.whl (91.2 kB view hashes)

Uploaded CPython 3.9 Windows x86

clickhouse_connect-0.2.8-cp39-cp39-musllinux_1_1_x86_64.whl (198.6 kB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

clickhouse_connect-0.2.8-cp39-cp39-musllinux_1_1_i686.whl (198.4 kB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ i686

clickhouse_connect-0.2.8-cp39-cp39-musllinux_1_1_aarch64.whl (198.0 kB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ ARM64

clickhouse_connect-0.2.8-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (192.5 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

clickhouse_connect-0.2.8-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194.0 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (196.6 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-cp39-cp39-macosx_11_0_arm64.whl (91.6 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

clickhouse_connect-0.2.8-cp39-cp39-macosx_10_9_x86_64.whl (92.4 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

clickhouse_connect-0.2.8-cp38-cp38-win_amd64.whl (93.2 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

clickhouse_connect-0.2.8-cp38-cp38-win32.whl (91.2 kB view hashes)

Uploaded CPython 3.8 Windows x86

clickhouse_connect-0.2.8-cp38-cp38-musllinux_1_1_x86_64.whl (197.3 kB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ x86-64

clickhouse_connect-0.2.8-cp38-cp38-musllinux_1_1_i686.whl (198.5 kB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ i686

clickhouse_connect-0.2.8-cp38-cp38-musllinux_1_1_aarch64.whl (194.7 kB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ ARM64

clickhouse_connect-0.2.8-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (193.6 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

clickhouse_connect-0.2.8-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195.1 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (199.0 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-cp38-cp38-macosx_11_0_arm64.whl (91.7 kB view hashes)

Uploaded CPython 3.8 macOS 11.0+ ARM64

clickhouse_connect-0.2.8-cp38-cp38-macosx_10_9_x86_64.whl (92.5 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

clickhouse_connect-0.2.8-cp37-cp37m-win_amd64.whl (93.2 kB view hashes)

Uploaded CPython 3.7m Windows x86-64

clickhouse_connect-0.2.8-cp37-cp37m-win32.whl (91.1 kB view hashes)

Uploaded CPython 3.7m Windows x86

clickhouse_connect-0.2.8-cp37-cp37m-musllinux_1_1_x86_64.whl (186.1 kB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ x86-64

clickhouse_connect-0.2.8-cp37-cp37m-musllinux_1_1_i686.whl (189.7 kB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ i686

clickhouse_connect-0.2.8-cp37-cp37m-musllinux_1_1_aarch64.whl (183.0 kB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ ARM64

clickhouse_connect-0.2.8-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (180.1 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ ARM64

clickhouse_connect-0.2.8-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (183.1 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

clickhouse_connect-0.2.8-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (186.4 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

clickhouse_connect-0.2.8-cp37-cp37m-macosx_10_9_x86_64.whl (92.3 kB view hashes)

Uploaded CPython 3.7m macOS 10.9+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page