The StarRocks adapter plugin for dbt
dbt-starrocks
This project is under development.
The `dbt-starrocks` package contains all the code to enable dbt to work with StarRocks.
This is an experimental plugin:
- It has not been tested extensively.
- It requires StarRocks version 2.5.0 or higher.
  - Version 3.1.x is recommended.
  - StarRocks versions 2.4 and below are no longer supported.
Installation
This plugin can be installed via pip:
$ pip install dbt-starrocks
Supported features
StarRocks < 2.5 | StarRocks >= 2.5, < 3.1 | StarRocks >= 3.1 | Feature |
---|---|---|---|
✅ | ✅ | ✅ | Table materialization |
✅ | ✅ | ✅ | View materialization |
❌ | ❌ | ✅ | Materialized View materialization |
❌ | ✅ | ✅ | Incremental materialization |
❌ | ✅ | ✅ | Primary Key Model |
✅ | ✅ | ✅ | Sources |
✅ | ✅ | ✅ | Custom data tests |
✅ | ✅ | ✅ | Docs generate |
❌ | ❌ | ✅ | Expression Partition |
❌ | ❌ | ❌ | Kafka |
Notice
- When the StarRocks version is < 2.5, `CREATE TABLE AS` can only set engine='OLAP' and table_type='DUPLICATE'.
- When the StarRocks version is >= 2.5, `CREATE TABLE AS` supports table_type='PRIMARY'.
- When the StarRocks version is < 3.1, distributed_by is required.
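The version rules above can be expressed as a small pre-flight check. The following Python sketch is illustrative only: `check_table_config` is a hypothetical helper, not part of dbt-starrocks, and it assumes the version is given as a `(major, minor)` tuple and that options missing from the config are left to the adapter's defaults.

```python
from typing import Dict, List, Tuple


def check_table_config(version: Tuple[int, int], config: Dict[str, object]) -> List[str]:
    """Return the rules from the Notice section that a table config violates.

    Hypothetical helper: options that are absent from `config` are not
    checked, except distributed_by, which the Notice marks as required.
    """
    problems = []
    if version < (2, 5):
        # Before 2.5, CREATE TABLE AS only accepts OLAP / DUPLICATE.
        if config.get("engine", "OLAP") != "OLAP":
            problems.append("engine must be 'OLAP' before StarRocks 2.5")
        if config.get("table_type", "DUPLICATE") != "DUPLICATE":
            problems.append("table_type must be 'DUPLICATE' before StarRocks 2.5")
    if version < (3, 1) and not config.get("distributed_by"):
        # Before 3.1, distributed_by has to be set explicitly.
        problems.append("distributed_by is required before StarRocks 3.1")
    return problems


for problem in check_table_config((3, 0), {"table_type": "PRIMARY"}):
    print(problem)
```

Running such a check before `dbt run` would surface a configuration problem earlier than a failed `CREATE TABLE` statement on the server.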
Profile Configuration
Example entry for profiles.yml:
starrocks:
  target: dev
  outputs:
    dev:
      type: starrocks
      host: localhost
      port: 9030
      schema: analytics
      username: your_starrocks_username
      password: your_starrocks_password
Option | Description | Required? | Example |
---|---|---|---|
type | The specific adapter to use | Required | starrocks |
host | The hostname to connect to | Required | 192.168.100.28 |
port | The port to use | Required | 9030 |
schema | Specify the schema (database) to build models into | Required | analytics |
username | The username to use to connect to the server | Required | dbt_admin |
password | The password to use for authenticating to the server | Required | correct-horse-battery-staple |
version | The StarRocks version the plugin should target for compatibility | Optional | 3.1.0 |
use_pure | Set to "true" to use the pure Python implementation of the connector instead of the C extension | Optional | true |
For more details about `use_pure` and other connection arguments, see the MySQL Connector/Python documentation on connection arguments.
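A quick way to catch an incomplete profile before running dbt is to check it against the required options listed in the table. The Python sketch below is a minimal illustration: `missing_profile_options` is a hypothetical helper, not part of dbt or dbt-starrocks, and it takes one `outputs` entry that has already been loaded into a dictionary.

```python
# Option names are taken from the table above; the helper itself is
# hypothetical and not part of dbt-starrocks.
REQUIRED_OPTIONS = ("type", "host", "port", "schema", "username", "password")


def missing_profile_options(output: dict) -> list:
    """Return the required options that are absent from one profile output."""
    return [name for name in REQUIRED_OPTIONS if name not in output]


dev_output = {
    "type": "starrocks",
    "host": "localhost",
    "port": 9030,
    "schema": "analytics",
    "username": "your_starrocks_username",
    # "password" is deliberately left out to show the check firing.
}

print(missing_profile_options(dev_output))
```

The optional `version` and `use_pure` options are not checked here, since a profile is valid without them.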
Example
dbt seed properties (yml):
Complete configuration:
models:
  materialized: table                 # table, view, or materialized_view
  engine: 'OLAP'
  keys: ['id', 'name', 'some_date']
  table_type: 'PRIMARY'               # PRIMARY, DUPLICATE, or UNIQUE
  distributed_by: ['id']
  buckets: 3                          # leave empty for auto bucketing
  indexs: [{ 'columns': 'idx_column' }]
  partition_by: ['some_date']
  partition_by_init: ["PARTITION p1 VALUES [('1971-01-01 00:00:00'), ('1991-01-01 00:00:00')),PARTITION p1972 VALUES [('1991-01-01 00:00:00'), ('1999-01-01 00:00:00'))"]
  # RANGE, LIST, or Expr partition types should be used in conjunction with the partition_by configuration
  # The Expr partition type requires an expression (e.g., date_trunc) specified in partition_by
  order_by: ['some_column']           # only for the PRIMARY table_type
  partition_type: 'RANGE'             # RANGE, LIST, or Expr; must be combined with the partition_by configuration
  properties: [{"replication_num":"1", "in_memory": "true"}]
  refresh_method: 'async'             # only for materialized views; the default is manual
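Writing `partition_by_init` clauses by hand becomes error-prone when there are many ranges. The Python sketch below generates yearly RANGE clauses in the same textual form as the `partition_by_init` value in the configuration above. The helper name `yearly_range_partitions` and the `p<year>` naming scheme are assumptions made for illustration; neither is part of dbt-starrocks, and the resulting string still has to be placed into the model configuration manually.

```python
def yearly_range_partitions(first_year: int, last_year: int) -> str:
    """Build comma-separated RANGE partition clauses, one per calendar year.

    Each range is left-closed and right-open, matching the
    "VALUES [(lower), (upper))" form used in the example configuration.
    """
    clauses = []
    for year in range(first_year, last_year + 1):
        lower = f"{year}-01-01 00:00:00"
        upper = f"{year + 1}-01-01 00:00:00"
        clauses.append(f"PARTITION p{year} VALUES [('{lower}'), ('{upper}'))")
    return ",".join(clauses)


print(yearly_range_partitions(1971, 1972))
```

Generating the clauses keeps adjacent ranges contiguous, because each upper bound is computed from the same year value as the next lower bound.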
dbt run config:
Example configuration:
{{ config(materialized='view') }}
{{ config(materialized='table', engine='OLAP', buckets=32, distributed_by=['id']) }}
{{ config(materialized='table', indexs=[{ 'columns': 'idx_column' }]) }}
{{ config(materialized='table', partition_by=['date_trunc("day", first_order)'], partition_type='Expr') }}
{{ config(materialized='table', table_type='PRIMARY', keys=['customer_id'], order_by=['first_name', 'last_name']) }}
{{ config(materialized='incremental', table_type='PRIMARY', engine='OLAP', buckets=32, distributed_by=['id']) }}
{{ config(materialized='materialized_view') }}
{{ config(materialized='materialized_view', properties={"storage_medium":"SSD"}) }}
{{ config(materialized='materialized_view', refresh_method="ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)") }}
Materialized views support only the partition_by, buckets, distributed_by, properties, and refresh_method configurations.
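The restriction on materialized view options can be checked mechanically. The following Python sketch lists the options in a config that fall outside the supported set named above. `unsupported_mv_options` is a hypothetical helper, not part of dbt-starrocks, and it assumes that the `materialized` key itself is always allowed.

```python
# Supported option names are taken from the sentence above.
MV_SUPPORTED_OPTIONS = {
    "partition_by",
    "buckets",
    "distributed_by",
    "properties",
    "refresh_method",
}


def unsupported_mv_options(config: dict) -> list:
    """Return, in sorted order, the options a materialized view does not support."""
    return sorted(
        name
        for name in config
        if name != "materialized" and name not in MV_SUPPORTED_OPTIONS
    )


mv_config = {
    "materialized": "materialized_view",
    "refresh_method": "async",
    "order_by": ["first_name"],
    "keys": ["customer_id"],
}

print(unsupported_mv_options(mv_config))
```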
Read From Catalog
First, add the catalog to StarRocks. The following example creates a Hive catalog.
CREATE EXTERNAL CATALOG `hive_catalog`
PROPERTIES (
"hive.metastore.uris" = "thrift://127.0.0.1:8087",
"type"="hive"
);
For other catalog types, see the documentation: https://docs.starrocks.io/en-us/latest/data_source/catalog/catalog_overview
Then write the sources.yaml file:
sources:
  - name: external_example
    schema: hive_catalog.hive_db
    tables:
      - name: hive_table_name
Finally, reference the table with the following macro:
{{ source('external_example', 'hive_table_name') }}
Test Adapter
Run the following command:
python3 -m pytest tests/functional
For more information, consult the project.
Contributing
We welcome you to contribute to dbt-starrocks. Please see the Contributing Guide for more information.
File details
Details for the file dbt_starrocks-1.7.0.tar.gz.
File metadata
- Download URL: dbt_starrocks-1.7.0.tar.gz
- Upload date:
- Size: 19.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | c75cf89198b0697cce814b242d283e9e1dc49433d87dd015665fe69735a0a88c |
MD5 | cfcfb98b5ca5f610b9b081638cbf2f6c |
BLAKE2b-256 | 20aaacd33a85e3ad9fc27676016281cbde5521a06aeb5dc0d82c94c805df2758 |
File details
Details for the file dbt_starrocks-1.7.0-py3-none-any.whl.
File metadata
- Download URL: dbt_starrocks-1.7.0-py3-none-any.whl
- Upload date:
- Size: 31.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b740335273a7efa730c8189ec05ea03d30561213f7c23b2128381770b03e9d63 |
MD5 | 0e08b7e80db17c3e2ccdcc8616ea2009 |
BLAKE2b-256 | 583320828e76fde6b8629291b5524638de1029ea5b222dfd0b9787d23e788efe |