Seeknal
Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.
Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.
Quick Start
pip install seeknal
# Optional, only for distributed Spark execution:
# pip install "seeknal[spark]"
seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply
Explore your data interactively or search docs from the terminal:
seeknal repl # Interactive SQL on pipeline outputs
seeknal docs query # Search documentation from the CLI
SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;
Key Features
Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:
from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()
Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:
materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table
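The upsert_by_key mode together with unique_keys suggests insert-or-update semantics keyed on the listed columns. The sketch below illustrates that idea only. It is not Seeknal's implementation, it uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the helper name upsert_by_key and the column total are hypothetical.

```python
import sqlite3

def upsert_by_key(conn, table, rows, unique_keys):
    """Insert rows; rows whose unique key already exists are updated in place.

    Table and column names are assumed to be trusted identifiers.
    """
    columns = list(rows[0].keys())
    non_keys = [c for c in columns if c not in unique_keys]
    placeholders = ", ".join("?" for _ in columns)
    if non_keys:
        updates = ", ".join(f"{c} = excluded.{c}" for c in non_keys)
        action = f"DO UPDATE SET {updates}"
    else:
        action = "DO NOTHING"  # nothing to update when every column is a key
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(unique_keys)}) {action}"
    )
    conn.executemany(sql, [tuple(r[c] for c in columns) for r in rows])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, total REAL)")

# First load inserts ids 1 and 2; second load updates id 2 and inserts id 3.
upsert_by_key(conn, "my_table", [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.0}], ["id"])
upsert_by_key(conn, "my_table", [{"id": 2, "total": 7.5}, {"id": 3, "total": 1.0}], ["id"])

result = conn.execute("SELECT id, total FROM my_table ORDER BY id").fetchall()
print(result)
```

Re-running the same load is idempotent under this scheme, which is the usual reason to prefer a keyed upsert over a plain append for a serving table.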
Environment Management — Isolated namespaces with per-environment profiles:
seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev
Feature Store — Define ML features in YAML or Python with entity keys, point-in-time joins, and automatic versioning. Supports offline (batch) and online (real-time) serving.
# seeknal/feature_groups/customer_features.yml
kind: feature_group
name: customer_features
entity:
  name: customer
  join_keys: ["customer_id"]
materialization:
  event_time_col: latest_order_date
  offline: { enabled: true, format: parquet }
  online: { enabled: false, ttl: 7d }
features:
  total_orders: { dtype: integer }
  total_spent: { dtype: float }
  avg_order_value: { dtype: float }
inputs:
  - ref: transform.customer_orders
# Or use Python decorators
@feature_group(name="customer_rfm", entity="customer")
def customer_rfm(ctx):
    df = ctx.ref("transform.clean_transactions")
    return ctx.duckdb.sql("""
        SELECT CustomerID, COUNT(DISTINCT InvoiceNo) as frequency,
               SUM(TotalAmount) as monetary_value
        FROM df GROUP BY CustomerID
    """).df()
seeknal entity list # Cross-feature-group consolidation
seeknal entity show customer # Inspect entity schema and feature groups
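For readers new to point-in-time joins, the idea is that each training row only sees the feature values that were known at or before its own timestamp, which avoids leaking future data. The following is a minimal standard-library sketch of that rule, not Seeknal's implementation. The function name point_in_time_join, the as_of label column, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def point_in_time_join(labels, features):
    """Attach to each label row the latest feature row whose event_time is
    at or before the label's as_of timestamp (never a later one)."""
    by_entity = defaultdict(list)
    for row in sorted(features, key=lambda r: r["event_time"]):
        by_entity[row["customer_id"]].append(row)

    joined = []
    for label in labels:
        history = by_entity.get(label["customer_id"], [])
        times = [r["event_time"] for r in history]
        idx = bisect_right(times, label["as_of"])  # count of rows with time <= as_of
        match = history[idx - 1] if idx > 0 else None
        joined.append({
            **label,
            "total_orders": match["total_orders"] if match else None,
        })
    return joined

# ISO-8601 date strings sort chronologically, so plain string comparison works here.
features = [
    {"customer_id": "c1", "event_time": "2024-01-01", "total_orders": 1},
    {"customer_id": "c1", "event_time": "2024-02-01", "total_orders": 4},
]
labels = [
    {"customer_id": "c1", "as_of": "2024-01-15"},  # sees the January snapshot
    {"customer_id": "c1", "as_of": "2024-03-01"},  # sees the February snapshot
    {"customer_id": "c2", "as_of": "2024-03-01"},  # no history for this entity
]
joined = point_in_time_join(labels, features)
print([row["total_orders"] for row in joined])
```

In the YAML above, event_time_col appears to play the role that event_time plays in this sketch.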
Interactive SQL REPL — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.
AI-Powered Thinking Partner — seeknal ask chat is your collaborative partner for data work. The agent uses thin tools for fast data access and fat skills for multi-step workflows like report generation, pipeline building, database analysis, and data profiling — all loaded on demand to keep responses fast:
seeknal ask chat # Start a brainstorm / build session (interactive TUI)
seeknal ask "What are the top 5 customers by revenue?" # Quick one-shot question
seeknal ask report "customer analysis" # Generate interactive HTML dashboard
seeknal ask test --project . --sql-only # Validate project prompt-to-SQL tests
seeknal ask chat --web # Enable web search for benchmarks
seeknal ask chat launches an interactive terminal UI (Bun + React + Ink) with streaming tokens, tool visualization, and an arrow-key ask_user picker for approval gates. The TUI is bundled inside the wheel, so end users do not need Bun or Node. The one-shot (seeknal ask "...") and report (seeknal ask report) commands use Python-only rendering with no TUI.
Ask it to answer questions against existing read-only databases with seeknal source connect, reuse project SQL examples from seeknal/sql_pairs/, and validate important questions with executable seeknal/tests/ QA oracles. Ask it to build a pipeline from scratch, and it will draft a plan, walk you through the design, and wait for your go-ahead before generating code. Publish reports to a self-hosted Seeknal Report Server and share them with your team via a URL.
For editable installs (pip install -e .), set SEEKNAL_TUI_BINARY_PATH to your local TUI build. See src/seeknal/ask/tui/README.md for the full TypeScript contributing guide and development workflow.
seeknal report-server start # Host published reports
seeknal gateway start # Expose ask as an API (WebSocket/SSE/REST)
seeknal gateway worker --gateway-url http://gateway:8000 --api-token "$SEEKNAL_API_TOKEN" # Token-routed Temporal worker
Supports Google Gemini (default), OpenAI-compatible providers, Anthropic-compatible providers, and Ollama (local). Use --provider ollama for fully local, private analysis.
Documentation
| Section | Contents |
|---|---|
| Getting Started | Installation, configuration, first pipeline |
| CLI Reference | All commands and flags |
| YAML Schema | Pipeline YAML reference |
| CLI Docs Search | Search documentation from the terminal (seeknal docs) |
| Tutorials | YAML Pipelines · Python Pipelines · Mixed · Seeknal Ask Agent · Report Exposures |
| Guides | Python Pipelines · Testing & Audits · Iceberg Materialization · Training to Serving |
| Servers | Gateway Server · Report Server |
| Concepts | Point-in-Time Joins · Virtual Environments · Exposures · Glossary |
Changelog
v2.9.1 (April 2026)
HTTP-only Ask worker mode — Adds a gateway-routed worker topology where workers only need outbound HTTP(S) to Seeknal Gateway or a compatible kc-service gateway.
- HTTP worker transport: `seeknal gateway worker --transport http` long-polls gateway work-stream endpoints, runs Ask locally near the data, and posts streaming events plus completion back over HTTP.
- Gateway broker mode: `seeknal gateway start --temporal --worker-transport http` and `seeknal gateway backend --worker-transport http` keep Temporal routing inside the gateway while external workers avoid Temporal credentials/network access.
- Token-routed runtime config: token records can advertise `worker_transport: http`; workers still bootstrap from `SEEKNAL_GATEWAY_URL` + `SEEKNAL_API_TOKEN`.
- Worker reliability fixes: project `.env` is loaded in worker mode, gateway polling retries transient connection failures, and Temporal activities heartbeat while waiting for HTTP workers.
- pydantic-deep compatibility: skips unsupported `stuck_loop_detection` passthrough on current runtime versions while preserving config compatibility in tests/mocks.
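The long-polling and retry behavior described in this release can be pictured with a small loop. This is a hedged sketch of the general pattern, not the Seeknal worker: poll_for_work, the fetch callable, the retry limits, and the job_id payload are all hypothetical, and the real endpoints and message shapes are not documented here.

```python
import time

def poll_for_work(fetch, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Call fetch() until it yields a work item, retrying transient failures.

    fetch is a hypothetical callable that long-polls a gateway and returns a
    work item, or None when the poll timed out with nothing to do.
    """
    failures = 0
    while True:
        try:
            item = fetch()
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up after too many consecutive failures
            sleep(base_delay * 2 ** (failures - 1))  # exponential backoff
            continue
        failures = 0  # a successful round trip resets the retry budget
        if item is not None:
            return item

# Simulated gateway: one transient failure, one empty poll, then a job.
responses = iter([ConnectionError("connection reset"), None, {"job_id": "demo"}])

def fake_fetch():
    value = next(responses)
    if isinstance(value, Exception):
        raise value
    return value

delays = []  # record backoff delays instead of actually sleeping
work = poll_for_work(fake_fetch, sleep=delays.append)
print(work, delays)
```

Because the worker only ever initiates outbound requests, a loop of this shape needs no inbound ports, which matches the outbound-HTTP(S)-only topology described above.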
v2.9.0 (April 2026)
Read-only Ask source harness + project SQL QA — Adds a TUI-first workflow for users who already have analytical tables in a database and want Seeknal Ask to answer business questions without building a pipeline.
- Connected-source registry: `seeknal source connect/status/inspect/sync/test` writes `seeknal_agent.yml`, generates `.seeknal/context/sources/` metadata, and verifies read-only database attachments.
- SQL pairs for context: `seeknal/sql_pairs/*.yml` stores prompt-to-SQL examples the Ask agent can discover with `list_sql_pairs` / `read_sql_pair`.
- Ask SQL tests: `seeknal ask test` runs project-local prompt-to-SQL QA cases from `seeknal/tests/`, including SQL-only oracle checks and agent-answer checks.
- TUI QA cockpit: Ask chat can list, read, run, and inspect Ask tests via thin tools over the same test engine.
- Structured grading: Ask tests support `assert.compare: dataframe` for markdown/JSON table comparison against expected SQL rows.
- Init guidance: `seeknal init` now scaffolds `AGENTS.md`, `CLAUDE.md`, `seeknal/sql_pairs/`, and `seeknal/tests/` for project-local agent conventions.
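To make the structured-grading item concrete, here is one way a markdown table in an agent answer could be compared against expected SQL rows. It is a sketch under assumptions, not the Ask test engine: parse_markdown_table and tables_match are hypothetical helpers, and comparing rows as unordered string tuples is a design choice of this example.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited markdown table into a list of row dicts."""
    def split(line):
        return [cell.strip() for cell in line.strip("|").split("|")]

    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = split(lines[0])
    rows = []
    for line in lines[1:]:
        cells = split(line)
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # skip the |---|---| separator row
        rows.append(dict(zip(header, cells)))
    return rows

def tables_match(answer_markdown, expected_rows):
    """Compare as unordered collections of rows, with values compared as strings."""
    def normalize(rows):
        return sorted(tuple(sorted((k, str(v)) for k, v in row.items())) for row in rows)

    return normalize(parse_markdown_table(answer_markdown)) == normalize(expected_rows)

answer = """
| customer_id | order_count |
|---|---|
| c2 | 1 |
| c1 | 3 |
"""
expected = [
    {"customer_id": "c1", "order_count": 3},
    {"customer_id": "c2", "order_count": 1},
]
matches = tables_match(answer, expected)
print(matches)
```

Ignoring row order keeps the check robust when the expected SQL has no ORDER BY; a stricter grader could compare ordered rows or numeric values with a tolerance.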
v2.8.0 (April 2026)
OpenAI/Anthropic providers + SQL safety + context files — Adds two new LLM provider families, execution guards on execute_sql, a pre-execution preview_query tool, persistent context files, and durable preferences.
- OpenAI + Anthropic support: `gpt-4o`, `claude-*`, Azure OpenAI, Together, Groq, vLLM, LM Studio, and any OpenAI-compatible proxy via `SEEKNAL_ASK_OPENAI_BASE_URL` / `SEEKNAL_ASK_ANTHROPIC_BASE_URL`
- `execute_sql` guards: rows capped at 500, columns at 50, per-cell length at 200 chars, 50 KB markdown budget. Every truncation emits an actionable notice with an accurate total row count
- `preview_query` tool: four pre-execution safety probes (row count, column count, JOIN fan-out, dry-run reachability). Blocks queries returning ≥100k rows; pure aggregations auto-skip
- Context files: `list_context_files` and `write_project_file` tools scan/write `{project}/context/` with path-traversal guards
- Durable preferences: `save_preference` appends to `preferences.yml`; preferences are injected into the system prompt on every session
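The execute_sql limits listed above (500 rows, 50 columns, 200 characters per cell) can be illustrated with a small truncation helper. This is only a sketch of the idea with a hypothetical guard_result function and made-up notice wording. It is not the tool's actual code, and it omits the 50 KB markdown budget.

```python
MAX_ROWS, MAX_COLS, MAX_CELL_CHARS = 500, 50, 200

def guard_result(columns, rows):
    """Truncate a query result and collect notices describing what was cut."""
    notices = []
    total_rows = len(rows)
    if len(columns) > MAX_COLS:
        notices.append(f"Showing {MAX_COLS} of {len(columns)} columns.")
        columns = columns[:MAX_COLS]
    if total_rows > MAX_ROWS:
        notices.append(
            f"Showing {MAX_ROWS} of {total_rows} rows; add a LIMIT or aggregate."
        )
    clipped = []
    cell_cut = False
    for row in rows[:MAX_ROWS]:
        out = []
        for value in row[:len(columns)]:
            text = str(value)
            if len(text) > MAX_CELL_CHARS:
                text = text[:MAX_CELL_CHARS]
                cell_cut = True
            out.append(text)
        clipped.append(out)
    if cell_cut:
        notices.append(f"Some cells were cut to {MAX_CELL_CHARS} characters.")
    return columns, clipped, notices

# 600 rows with an oversized text cell trigger the row and cell guards.
columns = ["id", "note"]
rows = [(i, "x" * 300) for i in range(600)]
cols, data, notices = guard_result(columns, rows)
print(len(data), len(data[0][1]), notices)
```

Reporting the true total alongside the truncated view lets the model (or user) decide whether to rerun with a filter or an aggregate rather than silently reasoning over partial data.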
Install from Source
For development or contributing:
git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"
Contributing
Contributions are welcome! See CONTRIBUTING.md for setup, code style, testing, and PR guidelines.
License
Seeknal is Apache 2.0 licensed.