All-in-one platform for data and AI/ML engineering
Seeknal
Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.
Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.
Quick Start
pip install seeknal
# Optional, only for distributed Spark execution:
# pip install "seeknal[spark]"
seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply
Explore your data interactively or search docs from the terminal:
seeknal repl # Interactive SQL on pipeline outputs
seeknal docs query # Search documentation from the CLI
SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;
Key Features
Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:
from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()
Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:
materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table
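As a rough illustration of what an upsert keyed on `unique_keys` usually means, the sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (both accept `INSERT ... ON CONFLICT ... DO UPDATE`). The helper `upsert_by_key` is a hypothetical name borrowed from the mode above, and this is an assumption about the semantics, not Seeknal's actual SQL.

```python
import sqlite3


def upsert_by_key(conn, table, rows, unique_keys):
    """Insert rows; when a key already exists, overwrite the non-key columns.

    Assumes `rows` is non-empty, has at least one non-key column, and that
    `table` and the column names come from trusted configuration.
    """
    columns = list(rows[0].keys())
    updates = [c for c in columns if c not in unique_keys]
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({', '.join('?' for _ in columns)}) "
        f"ON CONFLICT ({', '.join(unique_keys)}) DO UPDATE SET "
        + ", ".join(f"{c} = excluded.{c}" for c in updates)
    )
    conn.executemany(sql, [tuple(r[c] for c in columns) for r in rows])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, total REAL)")

# First run inserts ids 1 and 2; second run updates id 2 and inserts id 3.
upsert_by_key(conn, "my_table", [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.0}], ["id"])
upsert_by_key(conn, "my_table", [{"id": 2, "total": 8.0}, {"id": 3, "total": 1.0}], ["id"])

rows = conn.execute("SELECT id, total FROM my_table ORDER BY id").fetchall()
print(rows)
```

Re-running the same batch leaves the table unchanged, which is the usual reason to prefer a keyed upsert over a plain append for repeated pipeline runs.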
Environment Management — Isolated namespaces with per-environment profiles:
seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev
Feature Store — Define ML features in YAML or Python with entity keys, point-in-time joins, and automatic versioning. Supports offline (batch) and online (real-time) serving.
# seeknal/feature_groups/customer_features.yml
kind: feature_group
name: customer_features
entity:
  name: customer
  join_keys: ["customer_id"]
materialization:
  event_time_col: latest_order_date
  offline: { enabled: true, format: parquet }
  online: { enabled: false, ttl: 7d }
features:
  total_orders: { dtype: integer }
  total_spent: { dtype: float }
  avg_order_value: { dtype: float }
inputs:
  - ref: transform.customer_orders
# Or use Python decorators
@feature_group(name="customer_rfm", entity="customer")
def customer_rfm(ctx):
    df = ctx.ref("transform.clean_transactions")
    return ctx.duckdb.sql("""
        SELECT CustomerID, COUNT(DISTINCT InvoiceNo) as frequency,
               SUM(TotalAmount) as monetary_value
        FROM df GROUP BY CustomerID
    """).df()
seeknal entity list # Cross-feature-group consolidation
seeknal entity show customer # Inspect entity schema and feature groups
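For readers new to point-in-time joins, the following standalone sketch shows the general idea in plain Python: each label row receives the most recent feature value recorded at or before the label's timestamp, so values from the future never leak into training data. The function `point_in_time_join`, the `label_time` field, and the sample rows are illustrative assumptions; this is not Seeknal's join implementation.

```python
from bisect import bisect_right
from datetime import date


def point_in_time_join(labels, features):
    """Attach to each label the latest feature row with event_time <= label_time."""
    # Group feature rows by entity key, each group sorted by event time.
    by_entity = {}
    for row in sorted(features, key=lambda r: r["event_time"]):
        by_entity.setdefault(row["customer_id"], []).append(row)

    joined = []
    for label in labels:
        history = by_entity.get(label["customer_id"], [])
        times = [r["event_time"] for r in history]
        # Index of the first feature row strictly after the label time.
        idx = bisect_right(times, label["label_time"])
        match = history[idx - 1] if idx else None
        joined.append({**label, "total_orders": match["total_orders"] if match else None})
    return joined


features = [
    {"customer_id": "c1", "event_time": date(2024, 1, 1), "total_orders": 1},
    {"customer_id": "c1", "event_time": date(2024, 2, 1), "total_orders": 3},
]
labels = [
    {"customer_id": "c1", "label_time": date(2024, 1, 15)},  # sees the January value
    {"customer_id": "c1", "label_time": date(2024, 3, 1)},   # sees the February value
    {"customer_id": "c2", "label_time": date(2024, 3, 1)},   # no history for c2
]

joined = point_in_time_join(labels, features)
for row in joined:
    print(row["customer_id"], row["label_time"], row["total_orders"])
```

In the YAML above, `event_time_col` names the column that plays the role of `event_time` in this sketch.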
Interactive SQL REPL — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.
AI-Powered Thinking Partner — seeknal ask chat is your collaborative partner for data work. The agent uses 16 tools for fast data access and 11 built-in skills for multi-step workflows like report generation, pipeline building, and data profiling — all loaded on demand to keep responses fast:
seeknal ask chat # Start a brainstorm / build session
seeknal ask "What are the top 5 customers by revenue?" # Quick one-shot question
seeknal ask report "customer analysis" # Generate interactive HTML dashboard
seeknal ask chat --web # Enable web search for benchmarks
Ask it to build a pipeline from scratch, and it will draft a plan, walk you through the design, and wait for your go-ahead before generating code. Publish reports to a self-hosted Seeknal Report Server and share them with your team via a URL.
seeknal report-server start # Host published reports
seeknal gateway start # Expose ask as an API (WebSocket/SSE/REST)
Supports Google Gemini (default) and Ollama (local); v2.8.0 adds OpenAI and Anthropic providers. Use --provider ollama for fully local, private analysis.
Documentation
| Section | Description |
|---|---|
| Getting Started | Installation, configuration, first pipeline |
| CLI Reference | All commands and flags |
| YAML Schema | Pipeline YAML reference |
| CLI Docs Search | Search documentation from the terminal (seeknal docs) |
| Tutorials | YAML Pipelines · Python Pipelines · Mixed · Seeknal Ask Agent · Report Exposures |
| Guides | Python Pipelines · Testing & Audits · Iceberg Materialization · Training to Serving |
| Servers | Gateway Server · Report Server |
| Concepts | Point-in-Time Joins · Virtual Environments · Exposures · Glossary |
Changelog
v2.8.0 (April 2026)
OpenAI/Anthropic providers + SQL safety + context files — Adds two new LLM provider families, execution guards on execute_sql, a pre-execution preview_query tool, persistent context files, and durable preferences.
- OpenAI + Anthropic support: gpt-4o, claude-*, Azure OpenAI, Together, Groq, vLLM, LM Studio, and any OpenAI-compatible proxy via SEEKNAL_ASK_OPENAI_BASE_URL / SEEKNAL_ASK_ANTHROPIC_BASE_URL
- execute_sql guards: rows capped at 500, columns at 50, per-cell length at 200 chars, 50 KB markdown budget; every truncation emits an actionable notice with an accurate total row count
- preview_query tool: four pre-execution safety probes (row count, column count, JOIN fan-out, dry-run reachability); blocks queries returning ≥100k rows; pure aggregations auto-skip
- Context files: list_context_files and write_project_file tools scan/write {project}/context/ with path-traversal guards
- Durable preferences: save_preference appends to preferences.yml; preferences are injected into the system prompt on every session
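The result guards listed above can be pictured with a small sketch. It applies the stated row, column, and per-cell limits to an in-memory result and collects a notice for each kind of truncation. The function name `guard_result` and the notice wording are assumptions for illustration, the 50 KB markdown budget is left out, and none of this is taken from Seeknal's source.

```python
def guard_result(columns, rows, max_rows=500, max_cols=50, max_cell=200):
    """Truncate a query result and describe what was cut.

    Returns (kept_columns, kept_rows, notices). Cells are rendered as strings.
    """
    notices = []
    total_rows = len(rows)
    if len(columns) > max_cols:
        notices.append(f"Showing {max_cols} of {len(columns)} columns; select fewer columns.")
    if total_rows > max_rows:
        notices.append(f"Showing {max_rows} of {total_rows} rows; add a LIMIT or aggregate.")

    kept_rows = []
    cells_cut = 0
    for row in rows[:max_rows]:
        out = []
        for cell in row[:max_cols]:
            text = str(cell)
            if len(text) > max_cell:
                text = text[:max_cell] + "..."
                cells_cut += 1
            out.append(text)
        kept_rows.append(out)
    if cells_cut:
        notices.append(f"{cells_cut} cell(s) truncated to {max_cell} characters.")
    return list(columns[:max_cols]), kept_rows, notices


# 600 rows with one oversized text column.
columns = ["id", "note"]
rows = [(i, "x" * 300) for i in range(600)]
cols, kept, notices = guard_result(columns, rows)
for notice in notices:
    print(notice)
```

Reporting the untruncated row count in the notice is what lets a reader, or an agent, decide whether to narrow the query instead of trusting a partial result.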
v2.7.1 (April 2026)
Gateway pairing + execute_uv_script + pipeline runtime helpers — Additive batch combining gateway Telegram pairing, a new agent tool for running uv-managed scripts, and lightweight per-node runtime helpers.
- Gateway pairing: FilePairingStore, TelegramLinkStore, PublicSessionStore wired into lifespan; /pair Telegram command for admin-generated codes
- execute_uv_script tool: run arbitrary uv-managed Python scripts from the agent with full dependency isolation
- Pipeline runtime: ctx.llm (Ask-aligned text/JSON generation) and ctx.state (lightweight per-node persistent state) helpers available inside @transform functions
- Config discovery: find_agent_config_path() locates seeknal_agent.yml under project root or seeknal/ directory
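The config discovery item can be read as a two-location lookup. The sketch below shows one way such a lookup might work; `locate_agent_config` is a hypothetical stand-in rather than the real `find_agent_config_path()`, and the search order (project root first, then `seeknal/`) is an assumption.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "seeknal_agent.yml"


def locate_agent_config(project_root):
    """Return the first existing config path, or None if neither location has one."""
    root = Path(project_root)
    # Assumed order: project root first, then the seeknal/ subdirectory.
    for candidate in (root / CONFIG_NAME, root / "seeknal" / CONFIG_NAME):
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    missing = locate_agent_config(project)  # nothing created yet

    (project / "seeknal").mkdir()
    (project / "seeknal" / CONFIG_NAME).write_text("# placeholder contents\n")
    found = locate_agent_config(project)
    found_relative = found.relative_to(project).as_posix()

print(missing, found_relative)
```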
v2.6.0 (April 2026)
Skills-Powered Agent + Report Server — The ask agent now uses a thin-tools/fat-skills architecture: 16 lean tools for fast data access, 11 built-in skills for multi-step workflows (reports, pipelines, profiling, metrics, publishing). Skills load on demand via progressive disclosure, keeping the agent's context lean.
- Seeknal Report Server (seeknal report-server start): self-hosted server for publishing and sharing reports via unique URLs; publish from the chat TUI or the agent tool
- 11 built-in skills: report generation, pipeline building, data profiling, Python analysis, semantic model bootstrap, metric query/save, report exposure codification, Proof Editor publishing
- Chat enhancements: --style (concise/explanatory/formal/conversational), --budget (USD cap), --web (DuckDuckGo search), --session/--name (named session resume)
- Gateway improvements: cloud-only backend mode, standalone workers, Redis multi-replica, split topology
- Auto .env loading: --project <path> loads <path>/.env automatically
- Error UX: network errors classified with actionable hints; error logs saved to ~/.seeknal/logs/
Install from Source
For development or contributing:
git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"
Contributing
Contributions are welcome! See CONTRIBUTING.md for setup, code style, testing, and PR guidelines.
License
Seeknal is Apache 2.0 licensed.