
A Substrait command line tool


Substrait Tools


Overview

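This repository provides command line tools for Substrait: generating TPC-H test data, producing Substrait plans from SQL queries or Ibis expressions, and consuming those plans.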

Setup

Create and activate a conda environment with Python 3.9 and OpenJDK, then install the package with pip:

conda create -y -n substrait_tools -c conda-forge python=3.9 openjdk
conda activate substrait_tools

pip install substrait-tools

Command Line Tools

Create TPC-H Data

prepare_tpch_data
Parquet data written to /Users/richardtia/substrait-tools/tpch_data
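To see which tables were generated, a short Python sketch can list the Parquet files in that directory. It assumes, without confirmation from this page, that prepare_tpch_data writes one file per table named <table>.parquet into a tpch_data directory under the current working directory. find_parquet_tables is a hypothetical helper, not part of substrait-tools. It prints each entry as table=path, the same shape that --table_file_pair expects.

```python
from pathlib import Path


def find_parquet_tables(data_dir):
    """Map table names to Parquet files found directly inside data_dir.

    Hypothetical helper: assumes one file per table, named <table>.parquet.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"no such directory: {data_dir}")
    # Sort for a stable order; the file stem (name without .parquet) is the table name.
    return {path.stem: path for path in sorted(data_dir.glob("*.parquet"))}


if __name__ == "__main__":
    try:
        for table, path in find_parquet_tables("tpch_data").items():
            print(f"{table}={path}")
    except FileNotFoundError as err:
        print(err)
```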

Generate Substrait Plans

Example usage with SQL:

produce_substrait --producer IsthmusProducer --schema ./schema.sql --query "select * from lineitem" --validate --validator_overrides 1002
Validating plan: True
Validator override levels: [1002]
substrait plan written to: Isthmus_substrait.json
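When scripting several runs, it can help to build the produce_substrait command as an argument list in Python. This minimal sketch uses only the flags shown in the examples on this page; build_produce_cmd is a hypothetical helper, not part of substrait-tools. Passing a list rather than one shell string keeps queries that contain spaces or * intact.

```python
import shlex


def build_produce_cmd(producer, schema, query, validate=False, validator_overrides=()):
    """Assemble a produce_substrait argument list (hypothetical helper)."""
    cmd = ["produce_substrait", "--producer", producer, "--schema", schema, "--query", query]
    if validate:
        cmd.append("--validate")
    if validator_overrides:
        # Each override code becomes a separate argument, as with space separation on a shell.
        cmd += ["--validator_overrides", *(str(code) for code in validator_overrides)]
    return cmd


if __name__ == "__main__":
    cmd = build_produce_cmd(
        "IsthmusProducer", "./schema.sql", "select * from lineitem",
        validate=True, validator_overrides=[1002],
    )
    # Show the equivalent, correctly quoted shell command line.
    print(shlex.join(cmd))
```

To actually run it once the package is installed, pass the list to subprocess.run(cmd, check=True).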

Example usage with Ibis:

produce_substrait --producer IbisProducer --schema ./tables.py --query_type ibis --ibis_expr expr=./expr.py --validate --validator_overrides 1002 2001 1
Validating plan: True
Validator override levels: [1002, 2001, 1]
substrait plan written to: Ibis_substrait.json

Arguments:
--producer: Which Substrait producer to generate the plan with (for example, IsthmusProducer or IbisProducer).
--schema: Schema file. For SQL, a file of CREATE TABLE commands, each on its own line; for Ibis, a Python file that defines the tables.
--query: SQL query.
--ibis_expr: Ibis expression, passed as a KEY=PATH pair where KEY is the name of the expression function and PATH is the Python file that defines it (for example, --ibis_expr test_expr=./expression.py).
--validate: Defaults to false. If set, the plan is checked with the Substrait validator.
--validator_overrides: Validator error codes to bypass, given as one or more integers separated by spaces.
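The argument formats above map naturally onto Python's argparse. The parser below is a hypothetical mirror of the documented flags, not the tool's real source, and is only meant to show how the values are interpreted: --validator_overrides collects one or more space-separated integers, and --ibis_expr splits a KEY=PATH pair on the first equals sign. --query_type is included because the Ibis example uses it; values other than ibis are not documented here.

```python
import argparse


def parse_key_value(text):
    """Split 'name=./file.py' into ('name', './file.py') on the first '='."""
    key, sep, value = text.partition("=")
    if not sep or not key or not value:
        raise argparse.ArgumentTypeError(f"expected KEY=PATH, got {text!r}")
    return key, value


def make_parser():
    # Hypothetical parser mirroring the documented flags; not the tool's actual code.
    parser = argparse.ArgumentParser(prog="produce_substrait")
    parser.add_argument("--producer")
    parser.add_argument("--schema")
    parser.add_argument("--query")
    parser.add_argument("--query_type")
    parser.add_argument("--ibis_expr", type=parse_key_value)
    parser.add_argument("--validate", action="store_true")  # False unless the flag is given
    parser.add_argument("--validator_overrides", type=int, nargs="+", default=[])
    return parser
```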

Sample schemas:

SQL: schema.sql

CREATE TABLE lineitem(l_orderkey INTEGER NOT NULL, l_partkey INTEGER NOT NULL, l_suppkey INTEGER NOT NULL, l_linenumber INTEGER NOT NULL, l_quantity INTEGER NOT NULL, l_extendedprice DECIMAL(15,2) NOT NULL, l_discount DECIMAL(15,2) NOT NULL, l_tax DECIMAL(15,2) NOT NULL, l_returnflag VARCHAR NOT NULL, l_linestatus VARCHAR NOT NULL, l_shipdate DATE NOT NULL, l_commitdate DATE NOT NULL, l_receiptdate DATE NOT NULL, l_shipinstruct VARCHAR NOT NULL, l_shipmode VARCHAR NOT NULL, l_comment VARCHAR NOT NULL);

Note: Each CREATE TABLE command should be on its own line.
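If a schema is kept in a more readable multi-line form, a small Python helper can collapse it into the one-statement-per-line layout before it is passed to --schema. This is a naive sketch, and one_statement_per_line is a hypothetical name: it splits on semicolons, so it assumes there are no semicolons inside string literals or comments.

```python
import re


def _single_line(statement):
    statement = re.sub(r"\s+", " ", statement)    # collapse newlines and indentation
    statement = re.sub(r"\(\s+", "(", statement)  # drop whitespace after '('
    statement = re.sub(r"\s+\)", ")", statement)  # drop whitespace before ')'
    return statement.strip()


def one_statement_per_line(sql_text):
    """Return sql_text with each ';'-terminated statement on its own line.

    Naive: assumes no ';' appears inside string literals or comments.
    """
    statements = (_single_line(s) for s in sql_text.split(";"))
    return "\n".join(f"{s};" for s in statements if s)
```

For example, Path("schema.sql").write_text(one_statement_per_line(ddl) + "\n") would write the result, where ddl holds the multi-line statements.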

Ibis: ibis_schema.py

import ibis
import ibis.expr.datatypes as dt

lineitem = ibis.table(
    [
        ("l_orderkey", dt.int64),
        ("l_partkey", dt.int64),
        ("l_suppkey", dt.int64),
        ("l_linenumber", dt.int64),
        ("l_quantity", dt.Decimal(15, 2)),
        ("l_extendedprice", dt.Decimal(15, 2)),
        ("l_discount", dt.Decimal(15, 2)),
        ("l_tax", dt.Decimal(15, 2)),
        ("l_returnflag", dt.string),
        ("l_linestatus", dt.string),
        ("l_shipdate", dt.date),
        ("l_commitdate", dt.date),
        ("l_receiptdate", dt.date),
        ("l_shipinstruct", dt.string),
        ("l_shipmode", dt.string),
        ("l_comment", dt.string),
    ],
    name="lineitem",
)

--schema ./ibis_schema.py

Sample Ibis expression file: expression.py

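def expr(lineitem):
    # Compute the arc cosine of l_tax as a new column named ACOS_TAX
    new_col = lineitem.l_tax.acos().name("ACOS_TAX")
    # Project only the new column from the lineitem table
    return lineitem[new_col]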

--query_type ibis --ibis_expr expr=./expression.py
Note: The key name (expr) must match the name of the function defined in the file.
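One way to see why the key must match: a loader can import the file and then look up the function by the key's name. The sketch below is a hypothetical illustration using importlib; load_ibis_expr is not a substrait-tools function, and the tool's actual loading code may differ. With a mismatched key, the lookup simply finds no function of that name.

```python
import importlib.util
from pathlib import Path


def load_ibis_expr(pair):
    """Resolve 'expr=./expression.py' to the function named 'expr' in that file."""
    name, sep, path = pair.partition("=")
    if not sep or not name or not path:
        raise ValueError(f"expected KEY=PATH, got {pair!r}")
    # Import the file as a module, using its file stem as the module name.
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # The key selects the function by name, so it must match the def in the file.
    try:
        return getattr(module, name)
    except AttributeError:
        raise ValueError(f"{path} defines no function named {name!r}") from None
```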

Consume Substrait Plans

Example usage

consume_substrait --consumer DuckDBConsumer --substrait_plan ./Isthmus_substrait.json --table_file_pair lineitem=./lineitem.parquet region=./region.parquet

Arguments:
--consumer: Which Substrait consumer to run the plan with (for example, DuckDBConsumer).
--substrait_plan: Path to a JSON-formatted Substrait plan.
--table_file_pair: One or more table name and file pairs, separated by spaces. Within each pair, the table name and file are separated by an equals sign (=).
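To decide which --table_file_pair arguments a plan needs, a Python sketch can walk the JSON plan and collect the tables it reads. It assumes the plan uses the standard protobuf JSON field names for Substrait read relations (namedTable with a names list) and that each table's data lives in <table>.parquet inside data_dir; both helpers are hypothetical. Depending on the producer, table names in the plan may differ in case from your file names, so check the output.

```python
import json
from pathlib import Path


def named_tables(node):
    """Collect table names from every 'namedTable' entry in a JSON plan."""
    found = []
    if isinstance(node, dict):
        names = node.get("namedTable", {}).get("names", [])
        if names:
            found.append(names[-1])  # last part of a possibly qualified name
        for value in node.values():
            found.extend(named_tables(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(named_tables(item))
    return found


def table_file_pairs(plan_path, data_dir="."):
    """Build NAME=FILE strings, assuming (hypothetically) <name>.parquet files."""
    plan = json.loads(Path(plan_path).read_text())
    tables = dict.fromkeys(named_tables(plan))  # de-duplicate, keep first-seen order
    return [f"{t}={Path(data_dir) / (t + '.parquet')}" for t in tables]
```

For example, table_file_pairs("Isthmus_substrait.json", "./tpch_data") returns strings that can be joined with spaces and passed after --table_file_pair.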



Download files


Source Distributions

No source distribution files available for this release.

Built Distribution

substrait_tools-0.0.3-py3-none-any.whl (43.8 MB, Python 3)
