A Substrait command line tool
Substrait Tools
Overview
This repository provides command line tools for generating TPC-H test data and for producing and consuming Substrait plans.
Setup
Create and activate a conda environment with Python 3.9:
conda create -y -n substrait_tools -c conda-forge python=3.9 openjdk
conda activate substrait_tools
pip install substrait-tools
Command Line Tools
Create TPC-H Data
prepare_tpch_data
Parquet data written to /Users/richardtia/substrait-tools/tpch_data
Generate Substrait Plans
Example usage with SQL:
produce_substrait --producer IsthmusProducer --schema ./schema.sql --query "select * from lineitem" --validate --validator_overrides 1002
Validating plan: True
Validator override levels: [1002]
substrait plan written to: Isthmus_substrait.json
Example usage with Ibis:
produce_substrait --producer IbisProducer --schema ./tables.py --query_type ibis --ibis_expr expr=./expr.py --validate --validator_overrides 1002 2001 1
Validating plan: True
Validator override levels: [1002, 2001, 1]
substrait plan written to: Ibis_substrait.json
Arguments:
--producer: Which substrait producer to generate the plan with.
--schema: SQL Schema. Each create table command should be on its own line.
--query: SQL query.
--ibis_expr: Ibis expression. The argument should be passed as a key-value pair consisting of the Ibis expression function name and the Python file that contains the function definition (--ibis_expr test_expr=./expression.py).
--validate: Default is false. If set, the plan is run against the Substrait validator.
--validator_overrides: Bypass error codes found by the Substrait validator. One or more integers separated by spaces.
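The arguments above can also be assembled programmatically, for example when driving the tool from a test script. The following is a minimal sketch that uses only the flags shown in this README; the helper name build_produce_command is hypothetical and not part of substrait-tools.

```python
import shlex
from typing import List, Optional, Sequence


def build_produce_command(
    producer: str,
    schema: str,
    query: Optional[str] = None,
    ibis_expr: Optional[str] = None,
    validate: bool = False,
    validator_overrides: Sequence[int] = (),
) -> List[str]:
    """Assemble an argv list for produce_substrait (hypothetical helper)."""
    argv = ["produce_substrait", "--producer", producer, "--schema", schema]
    if query is not None:
        argv += ["--query", query]
    if ibis_expr is not None:
        # The Ibis example in this README pairs --query_type ibis with --ibis_expr.
        argv += ["--query_type", "ibis", "--ibis_expr", ibis_expr]
    if validate:
        argv.append("--validate")
    if validator_overrides:
        # One or more integers, passed as separate arguments.
        argv.append("--validator_overrides")
        argv += [str(code) for code in validator_overrides]
    return argv


cmd = build_produce_command(
    "IsthmusProducer",
    "./schema.sql",
    query="select * from lineitem",
    validate=True,
    validator_overrides=[1002],
)
# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once substrait-tools is installed in the active environment.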
Sample schemas:
SQL: schema.sql
CREATE TABLE lineitem(l_orderkey INTEGER NOT NULL, l_partkey INTEGER NOT NULL, l_suppkey INTEGER NOT NULL, l_linenumber INTEGER NOT NULL, l_quantity INTEGER NOT NULL, l_extendedprice DECIMAL(15,2) NOT NULL, l_discount DECIMAL(15,2) NOT NULL, l_tax DECIMAL(15,2) NOT NULL, l_returnflag VARCHAR NOT NULL, l_linestatus VARCHAR NOT NULL, l_shipdate DATE NOT NULL, l_commitdate DATE NOT NULL, l_receiptdate DATE NOT NULL, l_shipinstruct VARCHAR NOT NULL, l_shipmode VARCHAR NOT NULL, l_comment VARCHAR NOT NULL);
Note: Each CREATE TABLE command should be on its own line.
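Because the one-statement-per-line rule is easy to break when a schema file is formatted by hand, a quick check before running produce_substrait can help. This is a sketch only; find_schema_problems is a hypothetical helper, and it assumes that every non-blank line in the schema file is expected to hold exactly one CREATE TABLE statement.

```python
import re
from typing import List

# Matches the CREATE TABLE keywords regardless of case or spacing.
CREATE_TABLE = re.compile(r"\bCREATE\s+TABLE\b", re.IGNORECASE)


def find_schema_problems(schema_text: str) -> List[str]:
    """Return a message for each non-blank line that is not one CREATE TABLE."""
    problems = []
    for number, line in enumerate(schema_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored by this check
        count = len(CREATE_TABLE.findall(stripped))
        if count > 1:
            problems.append(f"line {number}: {count} CREATE TABLE statements on one line")
        elif count == 0 or not CREATE_TABLE.match(stripped):
            # Either a continuation of a statement split across lines,
            # or a line that starts with something else.
            problems.append(f"line {number}: does not start a CREATE TABLE statement")
    return problems


sample = "CREATE TABLE region(r_regionkey INTEGER NOT NULL);\n"
print(find_schema_problems(sample))
```

To check a real file, pass its contents, for example find_schema_problems(open("schema.sql").read()).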
Ibis: ibis_schema.py
import ibis
import ibis.expr.datatypes as dt

# Schema for the TPC-H lineitem table.
lineitem = ibis.table(
    [
        ("l_orderkey", dt.int64),
        ("l_partkey", dt.int64),
        ("l_suppkey", dt.int64),
        ("l_linenumber", dt.int64),
        ("l_quantity", dt.Decimal(15, 2)),
        ("l_extendedprice", dt.Decimal(15, 2)),
        ("l_discount", dt.Decimal(15, 2)),
        ("l_tax", dt.Decimal(15, 2)),
        ("l_returnflag", dt.string),
        ("l_linestatus", dt.string),
        ("l_shipdate", dt.date),
        ("l_commitdate", dt.date),
        ("l_receiptdate", dt.date),
        ("l_shipinstruct", dt.string),
        ("l_shipmode", dt.string),
        ("l_comment", dt.string),
    ],
    name="lineitem",
)
--schema ./ibis_schema.py
Sample ibis expression file:
expression.py
def expr(lineitem):
    new_col = lineitem.l_tax.acos().name("ACOS_TAX")
    return lineitem[new_col]
--query_type ibis --ibis_expr expr=./expression.py
Note: The key name (expr) should be the same as the function name.
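The reason the key must match the function name becomes clearer with a sketch of how a key=path pair could be resolved. The code below is an assumption about the general approach, not the actual substrait-tools implementation; load_expr_function is a hypothetical name, and the lookup uses only the standard library importlib module.

```python
import importlib.util
from pathlib import Path
from typing import Callable


def load_expr_function(pair: str) -> Callable:
    """Resolve 'name=path/to/file.py' to the function called name in that file."""
    name, sep, path = pair.partition("=")
    if not sep or not name or not path:
        raise ValueError(f"expected NAME=FILE, got {pair!r}")

    # Import the Python file as a module without requiring it on sys.path.
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The key is used as the attribute name, so it has to match the function.
    func = getattr(module, name, None)
    if not callable(func):
        raise AttributeError(f"{path!r} does not define a function named {name!r}")
    return func
```

With the sample file above, load_expr_function("expr=./expression.py") would return the expr function, while a mismatched key such as "my_expr=./expression.py" would raise AttributeError.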
Consume Substrait Plans
Example usage:
consume_substrait --consumer DuckDBConsumer --substrait_plan ./Isthmus_substrait.json --table_file_pair lineitem=./lineitem.parquet region=./region.parquet
Arguments:
--consumer: Which substrait consumer to consume the plan with.
--substrait_plan: JSON-formatted Substrait plan.
--table_file_pair: One or more table name and file pairings. The table name and file should be separated by an equals sign (=).
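The table=file format described above maps naturally onto a dictionary. The following sketch shows one way such pairs could be parsed and validated; parse_table_file_pairs is a hypothetical helper, and rejecting duplicate table names is an assumption rather than documented behavior of consume_substrait.

```python
from typing import Dict, Iterable


def parse_table_file_pairs(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ['lineitem=./lineitem.parquet', ...] into {table name: file path}."""
    tables: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first '=' only, so file paths may themselves contain '='.
        name, sep, path = pair.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected TABLE=FILE, got {pair!r}")
        if name in tables:
            # Assumption: a table should be bound to one file only.
            raise ValueError(f"table {name!r} given more than once")
        tables[name] = path
    return tables


tables = parse_table_file_pairs(
    ["lineitem=./lineitem.parquet", "region=./region.parquet"]
)
print(tables)
```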