All-in-one platform for data and AI/ML engineering
Seeknal
Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.
Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.
Quick Start
pip install seeknal
seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply
Explore your data interactively or search docs from the terminal:
seeknal repl # Interactive SQL on pipeline outputs
seeknal docs query # Search documentation from the CLI
SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;
Key Features
Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:
from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()
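To make the `inputs=[...]` idea concrete, here is a minimal sketch of how decorator-declared dependencies could be turned into an execution order. It is not Seeknal's implementation: the `node` decorator, the `REGISTRY` dict, the `top_customers` node, and the `transform.<name>` identifier form are all hypothetical names chosen for illustration, and only the standard library is used.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional

# Hypothetical registry: node id ("<kind>.<name>") -> ids of the nodes it reads.
REGISTRY: Dict[str, List[str]] = {}


def node(kind: str, name: str, inputs: Optional[List[str]] = None):
    """Record the decorated function's dependencies (illustrative stand-in only)."""
    def decorator(func):
        REGISTRY[f"{kind}.{name}"] = list(inputs or [])
        return func
    return decorator


@node("source", "orders")
def orders():
    pass


@node("transform", "order_metrics", inputs=["source.orders"])
def order_metrics():
    pass


@node("transform", "top_customers", inputs=["transform.order_metrics"])
def top_customers():
    pass


def execution_order() -> List[str]:
    """Return node ids so that every node appears after all of its inputs."""
    return list(TopologicalSorter(REGISTRY).static_order())


print(execution_order())
```

A topological sort is the natural fit here because a node can only run once everything it reads from has been produced.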
Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:
materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table
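The `upsert_by_key` mode with `unique_keys` reads like a standard insert-or-update keyed on the listed columns. The sketch below shows that behavior on in-memory rows. It is an assumption about what the mode means, not a description of how Seeknal writes to PostgreSQL, and the `upsert_by_key` function here is a hypothetical helper.

```python
def upsert_by_key(existing, incoming, unique_keys):
    """Merge incoming rows into existing rows.

    A row whose unique-key values match an existing row replaces it;
    any other row is appended. Rows are plain dicts.
    """
    merged = {tuple(row[k] for k in unique_keys): row for row in existing}
    for row in incoming:
        merged[tuple(row[k] for k in unique_keys)] = row
    return list(merged.values())


table = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
batch = [{"id": 2, "total": 25}, {"id": 3, "total": 30}]

result = upsert_by_key(table, batch, unique_keys=["id"])
print(result)
```

Under this reading, re-running the same batch leaves the table unchanged, which is what makes repeated pipeline runs safe.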
Environment Management — Isolated namespaces with per-environment profiles:
seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev
Feature Store — Point-in-time joins, automatic versioning, offline and online serving. Powered by DuckDB (single-node, <100M rows) or Apache Spark (distributed).
from datetime import datetime

from seeknal.featurestore.duckdbengine.feature_group import (
    FeatureGroupDuckDB,
    FeatureLookup,
    HistoricalFeaturesDuckDB,
    Materialization,
)
from seeknal.entity import Entity

# df is assumed to be an existing DataFrame with user_id, event_time, and feature columns
fg = FeatureGroupDuckDB(
    name="user_features",
    entity=Entity(name="user", join_keys=["user_id"]),
    materialization=Materialization(event_time_col="event_time"),
)
fg.set_dataframe(df).set_features()
fg.write(feature_start_time=datetime(2024, 1, 1))

# Point-in-time join (prevents data leakage)
hist = HistoricalFeaturesDuckDB(lookups=[FeatureLookup(source=fg)])
training_df = hist.to_dataframe(feature_start_time=datetime(2024, 1, 1))
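For readers new to the term, a point-in-time join attaches to each training row only the feature values that were already known at that row's event time. The sketch below shows the idea with the standard library alone. It does not use Seeknal, and the `point_in_time_join` function and the `value`, `label`, and `feature_value` field names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, features):
    """Attach to each label row the latest feature value observed at or
    before the label's event_time for the same user_id (None if none exists)."""
    # Build a per-user history of (times, values), sorted by event time.
    history = {}
    for row in sorted(features, key=lambda r: r["event_time"]):
        times, values = history.setdefault(row["user_id"], ([], []))
        times.append(row["event_time"])
        values.append(row["value"])

    joined = []
    for label in labels:
        times, values = history.get(label["user_id"], ([], []))
        # Count of feature rows with event_time <= the label's event_time.
        idx = bisect_right(times, label["event_time"])
        value = values[idx - 1] if idx > 0 else None
        joined.append({**label, "feature_value": value})
    return joined


features = [
    {"user_id": "u1", "event_time": datetime(2024, 1, 1), "value": 10},
    {"user_id": "u1", "event_time": datetime(2024, 1, 10), "value": 20},
]
labels = [
    {"user_id": "u1", "event_time": datetime(2024, 1, 5), "label": 1},
    {"user_id": "u1", "event_time": datetime(2024, 1, 15), "label": 0},
    {"user_id": "u2", "event_time": datetime(2024, 1, 5), "label": 1},
]

training_rows = point_in_time_join(labels, features)
for row in training_rows:
    print(row["user_id"], row["event_time"].date(), row["feature_value"])
```

The January 5 label sees only the January 1 feature value, even though a newer value exists in the table. That is the leakage the join prevents.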
Interactive SQL REPL — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.
Documentation
| Getting Started | Installation, configuration, first pipeline |
| CLI Reference | All commands and flags |
| YAML Schema | Pipeline YAML reference |
| CLI Docs Search | Search documentation from the terminal (seeknal docs) |
| Tutorials | YAML Pipelines · Python Pipelines · Mixed |
| Guides | Python Pipelines · Testing & Audits · Iceberg Materialization · Training to Serving |
| Concepts | Point-in-Time Joins · Virtual Environments · Glossary |
Changelog
v2.3.0 (March 2026)
Incremental Detection — Automatically skip unchanged data sources and process only new data:
# PostgreSQL watermark-based incremental detection
- kind: source
  name: events
  source: postgresql
  table: public.events
  freshness:
    time_column: created_at  # Tracks MAX(created_at) watermark
  params:
    connection: my_pg
- PostgreSQL Incremental: Watermark-based detection using MAX(time_column) comparison. Automatically generates WHERE time_col > 'watermark' OR time_col IS NULL for incremental reads.
- Iceberg Incremental: Snapshot-based detection comparing current snapshot ID. Supports partition pruning for time-partitioned tables.
- Skip Optimization: If fingerprint and watermark match, source execution is skipped entirely.
- Cascade Invalidation: Dependent nodes are automatically invalidated when source data changes.
- Full Refresh: Use the --full flag to ignore stored watermarks and reload all data.
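The watermark filter, skip check, and cascade invalidation described above can be sketched in a few lines. This is an illustration of the ideas under stated assumptions, not Seeknal's code: the function names, the state dict layout (`fingerprint`, `watermark`), and the `dependents` mapping are all hypothetical.

```python
from collections import deque


def incremental_query(table, time_col, watermark=None):
    """Build the read query. With no stored watermark (first run, or a full
    refresh) everything is read; otherwise only newer or NULL-timestamp rows.
    String interpolation keeps the sketch short; real code should use
    bound parameters and quoted identifiers."""
    if watermark is None:
        return f"SELECT * FROM {table}"
    return (
        f"SELECT * FROM {table} "
        f"WHERE {time_col} > '{watermark}' OR {time_col} IS NULL"
    )


def should_skip(stored_state, current_state):
    """Skip the source only when a previous state exists and both the
    fingerprint and the watermark are unchanged."""
    return stored_state is not None and stored_state == current_state


def invalidate_downstream(changed, dependents):
    """Return every node reachable from `changed` via the dependents map
    (breadth-first), i.e. the nodes that must be rebuilt."""
    seen = set()
    queue = deque(dependents.get(changed, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(dependents.get(current, []))
    return seen


stored = {"fingerprint": "abc", "watermark": "2024-01-01"}
unchanged = {"fingerprint": "abc", "watermark": "2024-01-01"}
advanced = {"fingerprint": "abc", "watermark": "2024-01-02"}
dependents = {
    "source.events": ["transform.daily"],
    "transform.daily": ["transform.report"],
}

print(incremental_query("public.events", "created_at", stored["watermark"]))
print(should_skip(stored, unchanged), should_skip(stored, advanced))
print(sorted(invalidate_downstream("source.events", dependents)))
```

Including `IS NULL` rows in the filter means records without a timestamp are re-read on every run instead of being silently dropped.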
Other Changes:
- Enhanced QA automation with multi-spec execution support
- Pipeline error logging with --verbose mode
- Security fix: Updated cryptography to 46.0.5 (CVE-2026-26007)
v2.2.2 (February 2026)
- Entity consolidation for per-entity feature views
- Multi-target materialization (PostgreSQL + Iceberg from single node)
- Environment-aware execution with namespace prefixing
Install from Source
For development or contributing:
git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"
Contributing
Contributions are welcome! See CONTRIBUTING.md for setup, code style, testing, and PR guidelines.
License
Seeknal is Apache 2.0 licensed.