A Substrait command line tool
Substrait Tools
Overview
This repository provides command line tools for producing and consuming Substrait plans.
Setup
Create and activate a conda environment with Python 3.9:
conda create -y -n substrait_tools -c conda-forge python=3.9 openjdk
conda activate substrait_tools
pip install substrait-tools
Command Line Tools
Create TPC-H Data
prepare_tpch_data
Parquet data written to /Users/richardtia/substrait-tools/tpch_data
Generate Substrait Plans
Example usage with SQL:
produce_substrait --producer IsthmusProducer --schema ./schema.sql --query "select * from lineitem" --validate --validator_overrides 1002
Validating plan: True
Validator override levels: [1002]
substrait plan written to: Isthmus_substrait.json
Example usage with Ibis:
produce_substrait --producer IbisProducer --schema ./tables.py --query_type ibis --ibis_expr expr=./expr.py --validate --validator_overrides 1002 2001 1
Validating plan: True
Validator override levels: [1002, 2001, 1]
substrait plan written to: Ibis_substrait.json
Arguments:
--producer: The Substrait producer used to generate the plan.
--schema: SQL Schema. Each create table command should be on its own line.
--query: SQL query.
--ibis_expr: Ibis expression. The argument should be passed as a key-value pair of the Ibis expression function name and the Python file containing the function definition (--ibis_expr test_expr=./expression.py).
--validate: Default is false. If set, the plan will be run against the Substrait validator.
--validator_overrides: Bypass error codes found by the Substrait validator. One or more integers separated by spaces.
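The flags above can also be assembled from a script. The following is a minimal sketch, not part of substrait-tools: build_produce_command is a hypothetical helper that only combines the flags documented in this section into an argument list. It does not run the tool.

```python
import shlex


def build_produce_command(producer, schema, query=None, ibis_expr=None,
                          validate=False, validator_overrides=()):
    """Assemble an argv list for produce_substrait (hypothetical helper)."""
    cmd = ["produce_substrait", "--producer", producer, "--schema", schema]
    if query is not None:
        cmd += ["--query", query]
    if ibis_expr is not None:
        # --query_type ibis is taken from the Ibis example above.
        cmd += ["--query_type", "ibis", "--ibis_expr", ibis_expr]
    if validate:
        cmd.append("--validate")
    if validator_overrides:
        # One or more integers, passed as separate arguments.
        cmd.append("--validator_overrides")
        cmd += [str(code) for code in validator_overrides]
    return cmd


cmd = build_produce_command(
    "IsthmusProducer",
    "./schema.sql",
    query="select * from lineitem",
    validate=True,
    validator_overrides=[1002],
)
# shlex.join quotes the SQL query so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would invoke the tool, assuming produce_substrait is installed and on the PATH.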
Sample schemas:
SQL: schema.sql
CREATE TABLE lineitem(l_orderkey INTEGER NOT NULL, l_partkey INTEGER NOT NULL, l_suppkey INTEGER NOT NULL, l_linenumber INTEGER NOT NULL, l_quantity INTEGER NOT NULL, l_extendedprice DECIMAL(15,2) NOT NULL, l_discount DECIMAL(15,2) NOT NULL, l_tax DECIMAL(15,2) NOT NULL, l_returnflag VARCHAR NOT NULL, l_linestatus VARCHAR NOT NULL, l_shipdate DATE NOT NULL, l_commitdate DATE NOT NULL, l_receiptdate DATE NOT NULL, l_shipinstruct VARCHAR NOT NULL, l_shipmode VARCHAR NOT NULL, l_comment VARCHAR NOT NULL);
Note: Each CREATE TABLE command should be on its own line.
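A schema file can be checked against the one-statement-per-line rule before it is passed to --schema. The sketch below is an illustration only: check_schema_lines is a hypothetical helper, and it assumes that blank lines are allowed and that every statement ends with a semicolon, as in the sample above. The tool itself may accept more or less than this.

```python
def check_schema_lines(schema_text):
    """Return (line_number, line) pairs that break the one-statement-per-line rule."""
    problems = []
    for number, line in enumerate(schema_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines are harmless
        upper = stripped.upper()
        starts_ok = upper.startswith("CREATE TABLE")
        one_statement = upper.count("CREATE TABLE") == 1
        ends_ok = stripped.endswith(";")
        if not (starts_ok and one_statement and ends_ok):
            problems.append((number, stripped))
    return problems


good = (
    "CREATE TABLE region(r_regionkey INTEGER NOT NULL);\n"
    "CREATE TABLE nation(n_nationkey INTEGER NOT NULL);\n"
)
# A statement wrapped over two lines is reported on both lines.
wrapped = "CREATE TABLE region(\n    r_regionkey INTEGER NOT NULL);\n"

print(check_schema_lines(good))
print(check_schema_lines(wrapped))
```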
Ibis: ibis_schema.py
import ibis
import ibis.expr.datatypes as dt

# Unbound Ibis table describing the TPC-H lineitem schema.
lineitem = ibis.table(
    [
        ("l_orderkey", dt.int64),
        ("l_partkey", dt.int64),
        ("l_suppkey", dt.int64),
        ("l_linenumber", dt.int64),
        ("l_quantity", dt.Decimal(15, 2)),
        ("l_extendedprice", dt.Decimal(15, 2)),
        ("l_discount", dt.Decimal(15, 2)),
        ("l_tax", dt.Decimal(15, 2)),
        ("l_returnflag", dt.string),
        ("l_linestatus", dt.string),
        ("l_shipdate", dt.date),
        ("l_commitdate", dt.date),
        ("l_receiptdate", dt.date),
        ("l_shipinstruct", dt.string),
        ("l_shipmode", dt.string),
        ("l_comment", dt.string),
    ],
    name="lineitem",
)
--schema ./ibis_schema.py
Sample ibis expression file:
expression.py
def expr(lineitem):
    new_col = lineitem.l_tax.acos().name("ACOS_TAX")
    return lineitem[new_col]
--query_type ibis --ibis_expr expr=./expression.py
Note: The key name (expr) should be the same as the function name.
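To see why the key has to match the function name, consider how a key=path pair could be resolved. This is a sketch of one plausible approach using importlib, not the tool's actual implementation. load_expr_function and the module name user_expr_module are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_expr_function(pair):
    """Split 'name=path' and return the function called `name` defined in `path`."""
    name, sep, path = pair.partition("=")
    if not sep or not name or not path:
        raise ValueError(f"expected NAME=PATH, got {pair!r}")
    # Import the file as a throwaway module (hypothetical module name).
    spec = importlib.util.spec_from_file_location("user_expr_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    try:
        return getattr(module, name)
    except AttributeError:
        raise ValueError(f"{path} does not define a function named {name!r}") from None


# Demonstration with a stand-in expression file that needs no Ibis install.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "expression.py")
    with open(path, "w") as handle:
        handle.write("def expr(table):\n    return table\n")
    fn = load_expr_function(f"expr={path}")
    loaded_name = fn.__name__
```

With this approach, a key such as other=./expression.py fails because the file defines no function called other.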
Consume Substrait Plans
Example usage:
consume_substrait --consumer DuckDBConsumer --substrait_plan ./Isthmus_substrait.json --table_file_pair lineitem=./lineitem.parquet region=./region.parquet
Arguments:
--consumer: The Substrait consumer used to consume the plan.
--substrait_plan: JSON-formatted Substrait plan.
--table_file_pair: One or more table name and file pairings. Each table name and file should be separated by an equals sign (=).
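The table=file pairings map naturally onto a dictionary. The following sketch shows one way such arguments could be parsed. parse_table_file_pairs is a hypothetical helper, and rejecting duplicate table names is an assumption made here, not documented behavior of the tool.

```python
def parse_table_file_pairs(pairs):
    """Turn ['table=file', ...] into a {table: file} dictionary."""
    tables = {}
    for pair in pairs:
        table, sep, path = pair.partition("=")
        if not sep or not table or not path:
            raise ValueError(f"expected TABLE=FILE, got {pair!r}")
        if table in tables:
            # Assumption: a table name may only be given once.
            raise ValueError(f"table {table!r} given more than once")
        tables[table] = path
    return tables


tables = parse_table_file_pairs(
    ["lineitem=./lineitem.parquet", "region=./region.parquet"]
)
print(tables)
```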