
The MaxCompute adapter plugin for dbt


dbt-maxcompute


Welcome to the dbt-maxcompute repository! This project extends dbt (data build tool) to Alibaba MaxCompute, a cloud data warehouse and big data processing platform.

What is dbt?

dbt empowers data analysts and engineers to transform their data using software engineering best practices. It serves as the T in the ELT (Extract, Load, Transform) process, allowing users to organize, cleanse, denormalize, filter, rename, and pre-aggregate raw data, making it analysis-ready.

About MaxCompute

MaxCompute is Alibaba Group's cloud data warehouse and big data processing platform, supporting massive data storage and computation, widely used for data analysis and business intelligence. With MaxCompute, users can efficiently manage and analyze large volumes of data and gain real-time business insights.

This repository contains the foundational code for the dbt-maxcompute adapter plugin. For guidance on developing the adapter, please refer to the official documentation.

Important Note

The README you are currently viewing will be updated with specific instructions and details on how to utilize the adapter as development progresses.

Adapter Versioning

This adapter plugin follows semantic versioning. The initial version is v1.8.0-a0, designed for compatibility with dbt Core v1.8.0. Because the plugin is in its early stages, the a0 suffix marks it as an alpha release. A stable version will follow, focusing on MaxCompute-specific functionality and aiming for backwards compatibility.

Getting Started

Install the plugin

# we use conda and python 3.10 for this example
conda create --name dbt-maxcompute-example python=3.10
conda activate dbt-maxcompute-example

pip install dbt-core
pip install dbt-maxcompute

Configure dbt profile:

  1. Create a file in the ~/.dbt/ directory named profiles.yml.
  2. Copy the following and paste into the new profiles.yml file. Make sure you update the values where noted.
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, e.g. dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: access_key
      access_key_id: XXX # Replace this with your accessId(ak)
      access_key_secret: XXX # Replace this with your accessKey(sk)

Currently we support the following parameters:

| Field | Description | Default Value |
| --- | --- | --- |
| type | The type of database connection. Must be set to "maxcompute" for MaxCompute connections. | "maxcompute" |
| project | The name of your MaxCompute project. | Required (no default) |
| endpoint | The endpoint URL used to connect to MaxCompute. | Required (no default) |
| schema | The namespace schema that the models will use in MaxCompute. | Required (no default) |
| auth_type | Authentication method for accessing MaxCompute. | "access_key" |
| access_key_id | Access ID used for authentication. | Required if using access key auth |
| access_key_secret | Access Key Secret used for authentication. | Required if using access key auth |
| timezone | The timezone used for MaxCompute. | "GMT" |
| tunnel_endpoint | The tunnel endpoint URL used to fetch results from MaxCompute. | Auto-detected from endpoint |
| Other auth options | Alternative authentication methods such as STS. See Authentication Configuration. | Varies by auth type |

Note: Fields marked with "Required" must be explicitly specified in your configuration.
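
To make the required/default split in the table concrete, here is a minimal sketch that applies the documented defaults to a profile target and reports missing required fields. The helper name resolve_target and its constants are hypothetical and are not part of dbt-maxcompute; the adapter performs its own validation, which may differ.

```python
# Hypothetical helper for illustration only; not part of dbt-maxcompute.
REQUIRED_FIELDS = ("project", "endpoint", "schema")
DEFAULTS = {"type": "maxcompute", "auth_type": "access_key", "timezone": "GMT"}
ACCESS_KEY_FIELDS = ("access_key_id", "access_key_secret")


def resolve_target(target: dict) -> dict:
    """Return a copy of `target` with the table's defaults applied.

    Raises ValueError when a field marked "Required" is missing or empty.
    """
    resolved = {**DEFAULTS, **target}
    missing = [name for name in REQUIRED_FIELDS if not resolved.get(name)]
    # The key pair is only required for the access_key auth type.
    if resolved["auth_type"] == "access_key":
        missing += [name for name in ACCESS_KEY_FIELDS if not resolved.get(name)]
    if missing:
        raise ValueError("missing profile fields: " + ", ".join(missing))
    return resolved


if __name__ == "__main__":
    dev = {
        "project": "dbt-example",
        "schema": "default",
        "endpoint": "http://service.cn-shanghai.maxcompute.aliyun.com/api",
        "access_key_id": "XXX",
        "access_key_secret": "XXX",
    }
    print(resolve_target(dev))
```

Running a check like this before dbt starts is optional; it only mirrors the table above and does not contact MaxCompute.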

Run your dbt models

If you are new to dbt, we have prepared a Tutorial document for your reference. You can also consult the official dbt documentation, although some additional adaptations may be required for MaxCompute.

Configure Your dbt Models

You can customize dbt materialization behavior through model configurations. For general dbt configuration reference, see the official documentation: dbt Model Configs.

While dbt Core provides native configurations such as materialized and sql_header, this section focuses on dbt-maxcompute-specific configurations that control table creation behavior during materialization.

dbt-maxcompute Specific Configurations

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| tblproperties | Map[String,String] | - | Additional table properties. Example: {'table.format.version'='2'} creates an Append2 table. |
| transactional | Boolean | false | Equivalent to tblproperties ('transactional' = 'true'). Indicates whether to create a transactional table. |
| delta | Boolean | false | Same as transactional, with additional primary key validation. |
| primary_keys | List[String] | - | List of primary key column names (e.g., ['c1']). Required when delta=true. |
| delta_table_bucket_num | Integer | 16 | Equivalent to tblproperties ('write.bucket.num' = 'xx'). Controls the bucket count for Delta tables. |
| partition_by | Map | - | Defines the partitioning strategy with two fields. fields: comma-separated partition columns. data_types: optional data types (default: string); specifying a time type (date, datetime, timestamp) creates an auto-partitioned table. Example: {"fields": "name,some_date", "data_types": "string,string"} |
| lifecycle | Integer | - | Table retention period in days (e.g., 30 for a 30-day lifecycle). |
| sql_hints | Map[String,String] | See below for defaults | SQL hints applied to all queries for optimization or compatibility. |
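
The partition_by and delta rules above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the adapter's code: the function names are hypothetical, and raising an error when the number of data types does not match the number of fields is an assumption of this sketch.

```python
# Hypothetical helpers for illustration only; not part of dbt-maxcompute.
TIME_TYPES = {"date", "datetime", "timestamp"}


def parse_partition_by(partition_by: dict) -> list[tuple[str, str]]:
    """Pair each partition column with its data type (default: string)."""
    fields = [f.strip() for f in partition_by["fields"].split(",") if f.strip()]
    raw_types = partition_by.get("data_types", "")
    types = [t.strip().lower() for t in raw_types.split(",") if t.strip()]
    if not types:
        types = ["string"] * len(fields)
    elif len(types) != len(fields):
        # Assumption of this sketch: a count mismatch is treated as an error.
        raise ValueError("data_types must list one type per partition field")
    return list(zip(fields, types))


def is_auto_partitioned(partition_by: dict) -> bool:
    """True when any partition column uses a time type, per the table above."""
    return any(t in TIME_TYPES for _, t in parse_partition_by(partition_by))


def check_delta_config(config: dict) -> None:
    """Enforce the documented rule that delta=true requires primary_keys."""
    if config.get("delta") and not config.get("primary_keys"):
        raise ValueError("primary_keys is required when delta=true")


if __name__ == "__main__":
    example = {"fields": "name,some_date", "data_types": "string,string"}
    print(parse_partition_by(example))
    print(is_auto_partitioned({"fields": "some_date", "data_types": "date"}))
    check_delta_config({"delta": True, "primary_keys": ["c1"]})
```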

Default SQL Hints

MaxCompute supports global SQL hints to control query behavior and optimize performance. The following are the default global hints used by our system:

odps.sql.type.system.odps2: "true"
odps.sql.decimal.odps2: "true"
odps.sql.allow.fullscan: "true"
odps.sql.select.output.format: "csv"
odps.sql.submit.mode: "script"
odps.sql.allow.cartesian: "true"
odps.sql.allow.schema.evolution: "true"
odps.table.append2.enable: "true"

You can override these defaults by specifying your own sql_hints in the model config. Your custom hints are merged with the defaults, so you only need to list the values you want to change.
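
The merge described above behaves like a dictionary update in which custom values win. The sketch below illustrates that semantics with the default hints listed in this section; merge_sql_hints is a hypothetical name, and the adapter's actual merge logic may be implemented differently.

```python
# Hypothetical helper for illustration only; not part of dbt-maxcompute.
DEFAULT_SQL_HINTS = {
    "odps.sql.type.system.odps2": "true",
    "odps.sql.decimal.odps2": "true",
    "odps.sql.allow.fullscan": "true",
    "odps.sql.select.output.format": "csv",
    "odps.sql.submit.mode": "script",
    "odps.sql.allow.cartesian": "true",
    "odps.sql.allow.schema.evolution": "true",
    "odps.table.append2.enable": "true",
}


def merge_sql_hints(custom=None) -> dict:
    """Return the defaults overlaid with `custom`; custom values take priority."""
    return {**DEFAULT_SQL_HINTS, **(custom or {})}


if __name__ == "__main__":
    # Override a single hint; the other defaults are kept as they are.
    merged = merge_sql_hints({"odps.sql.allow.fullscan": "false"})
    for key, value in merged.items():
        print(f"{key}: {value}")
```

Because the merge builds a new dictionary, the defaults themselves are never modified by a model's overrides.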

Compatible dbt Packages for MaxCompute

The following community-maintained dbt packages have been verified to work with dbt-maxcompute:

  1. dbt-date (MaxCompute Edition)
  2. dbt-utils (MaxCompute Edition)
  3. dbt-expectations (MaxCompute Edition)
  4. elementary (MaxCompute Edition)
  5. dbt-project-evaluator (MaxCompute Edition)

Known Limitations

Due to MaxCompute engine characteristics, the following limitations apply:

| Limitation | Description |
| --- | --- |
| No rowcount support | MaxCompute does not return the number of affected rows after DML operations. The rows_affected field in adapter responses will not be available. |
| No transaction support | MaxCompute does not support traditional database transactions. BEGIN, COMMIT, and ROLLBACK operations are no-ops. |

Developers Guide

If you want to contribute or develop the adapter, use the following command to set up your environment:

pip install -r dev-requirements.txt

Reporting Bugs and Contributing

Your feedback helps improve the project:

  • To report bugs or request features, please open a new issue on GitHub.

Code of Conduct

We are committed to fostering a welcoming and inclusive environment. All community members are expected to adhere to the dbt Code of Conduct.
