Jett
Just an Engine Template Tool that is easy for data engineers to use and extend.
This project supports ETL templates for multiple DataFrame engines such as
PySpark, DuckDB, and Polars.
Supported Features:
- Dynamic engine support via YAML templates
- JSON Schema validation support
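One way to picture dynamic engine support is a registry keyed by the template's `type` field. The sketch below is only an illustration under that assumption: `ENGINES`, `register`, `dispatch`, and the runner functions are hypothetical names and are not part of the jett API.

```python
from typing import Callable

# Hypothetical registry: maps a template's `type` value to a runner function.
ENGINES: dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that stores a runner under the given engine name."""
    def wrap(func: Callable[[dict], str]) -> Callable[[dict], str]:
        ENGINES[name] = func
        return func
    return wrap


@register("polars")
def run_polars(template: dict) -> str:
    # A real runner would build and execute a Polars pipeline here.
    return f"polars:{template['app_name']}"


@register("duckdb")
def run_duckdb(template: dict) -> str:
    # Placeholder for a DuckDB-backed runner.
    return f"duckdb:{template['app_name']}"


def dispatch(template: dict) -> str:
    """Pick the runner named by the template's `type` field."""
    engine = template.get("type")
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return ENGINES[engine](template)


print(dispatch({"type": "polars", "app_name": "load_csv_to_ggsheet"}))
```

With this shape, adding an engine means registering one more function, and the template itself decides which one runs.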
📦 Installation
uv pip install -U jett
Supported Engines:
| Name | Status | Description |
|---|---|---|
| PySpark | ✅ | PySpark and the spark-submit CLI for distributed workloads |
| DuckDB | ✅ | DuckDB and the DuckDB Spark API |
| Polars | ✅ | Polars for Python workloads |
| Arrow | ✅ | Arrow for columnar Python workloads |
| Daft | ❌ | Daft for distributed Python workloads |
| DBT | ❌ | DBT for SQL workloads |
| GX | ❌ | Great Expectations for data quality |
[!NOTE] Version Tracking:
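| Package | Version | Next Support |
|---|---|---|
| Python | 3.10.13 | >=3.11.0 |
| Spark | 3.4.2 | >=4.0.0 |
| Hadoop | 3 | 3 |
| Java | openjdk@11 | openjdk@17 |
| Pyspark | 3.4.1 | >=4.0.0 |
| Scala | 2.12.17 | 2.12.17 |
| DuckDB | 1.3.2 | |
| Polars | 1.32.0 | |
| Arrow | 21.0.0 | |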
📝 Usage
For example, create a file named etl.polars.tool (I use .tool as the file extension so that
the file can be validated against the JSON schema with the pattern *.tool) that describes the ETL stages:
type: polars
name: Load CSV to GGSheet
app_name: load_csv_to_ggsheet
master: local

# 1) 🚰 Load data from source
source:
  type: local
  file_format: csv
  path: ./assets/data/customer.csv

# 2) ⚙️ Transform this data.
transforms:
  - op: rename_to_snakecase
  - op: group
    transforms:
      - op: expr
        sql: "CAST(id AS string)"

# 3) 🎯 Sink result to target
sink:
  type: local
  file_type: google_sheet
  path: ./assets/landing/customer.gsheet

# 4) 📩 Metric that will send after execution.
metric:
  - type: console
    convertor: basic
  - type: restapi
    convertor: basic
    host: "localhost"
    port: 1234
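The `transforms` section nests: a `group` op appears to carry its own `transforms` list. As a rough sketch of how such a structure could be traversed, the snippet below walks the parsed form of that section (written out by hand as Python dicts, assuming the nesting shown in the template). The `walk` helper is hypothetical and does not reflect how jett processes templates internally.

```python
from typing import Iterator

# Hand-written parsed form of the `transforms` section of the template.
transforms = [
    {"op": "rename_to_snakecase"},
    {
        "op": "group",
        "transforms": [
            {"op": "expr", "sql": "CAST(id AS string)"},
        ],
    },
]


def walk(ops: list[dict], depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, op name) pairs in order, descending into nested groups."""
    for op in ops:
        yield depth, op["op"]
        # An op with a nested `transforms` list is visited recursively.
        yield from walk(op.get("transforms", []), depth + 1)


# Print an indented outline of the operations.
for depth, name in walk(transforms):
    print("  " * depth + name)
```

An outline like this can be handy for checking that a template is nested the way you intended before running it.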
Run it with the Python API:
from jett import Tool
tool = Tool(path="./etl.polars.tool")
tool.execute(allow_raise=True)
📖 Documents
This project follows the emoji conventions from Pipeline Emojis.
💬 Contribute
I do not expect this project to be widely adopted, because it serves a specific purpose and, as a long-term solution, you can write your own code without depending on it. For now, you can open a GitHub issue on this project 🙌 to report a bug or request a new feature.