
All-in-one platform for data and AI/ML engineering


Seeknal

Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.


Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.

Quick Start

pip install seeknal

seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply

Explore your data interactively or search docs from the terminal:

seeknal repl          # Interactive SQL on pipeline outputs
seeknal docs query    # Search documentation from the CLI

For example, inside the REPL:

SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;

Key Features

Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:

from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()
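To see what order_metrics returns without running Seeknal, the sketch below reproduces its aggregation. It swaps in Python's built-in sqlite3 for the DuckDB connection behind ctx.duckdb and uses a few made-up order rows. Only the column names customer_id and amount come from the example above.

```python
import sqlite3

# Made-up rows standing in for source.orders (data/orders.csv).
orders = [
    ("c1", 10.0),
    ("c2", 5.5),
    ("c1", 4.5),
]

# In-memory SQLite as a stand-in for the DuckDB connection used by ctx.duckdb.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (customer_id TEXT, amount REAL)")
conn.executemany("INSERT INTO df VALUES (?, ?)", orders)

# Same aggregation as order_metrics: one row per customer with the summed amount.
rows = conn.execute(
    "SELECT customer_id, SUM(amount) AS total FROM df "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
conn.close()

print(rows)  # [('c1', 14.5), ('c2', 5.5)]
```

The ORDER BY is added only to make the output order deterministic; the original query does not need it.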

Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:

materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table
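Judging by its name, upsert_by_key inserts new rows and updates existing rows that share the same unique_keys. The exact SQL Seeknal issues is not documented here; as a rough sketch, build_upsert_sql below is a hypothetical helper that produces the standard PostgreSQL INSERT ... ON CONFLICT statement such a mode would typically rely on.

```python
def build_upsert_sql(table: str, columns: list[str], unique_keys: list[str]) -> str:
    """Hypothetical helper: build a PostgreSQL upsert keyed on unique_keys."""
    if not unique_keys:
        raise ValueError("upsert_by_key needs at least one unique key")
    missing = [k for k in unique_keys if k not in columns]
    if missing:
        raise ValueError(f"unique keys not in columns: {missing}")

    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))  # driver-side parameters
    # Non-key columns are overwritten with the incoming row's values.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c not in unique_keys)
    conflict = f"ON CONFLICT ({', '.join(unique_keys)})"
    # If every column is a key, there is nothing to update on conflict.
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) {conflict} {action}"


print(build_upsert_sql("analytics.my_table", ["id", "total"], ["id"]))
```

PostgreSQL only accepts ON CONFLICT (id) if the target table has a unique index or constraint on those columns.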

Environment Management — Isolated namespaces with per-environment profiles:

seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev

Feature Store — Define ML features in YAML or Python with entity keys, point-in-time joins, and automatic versioning. Supports offline (batch) and online (real-time) serving.

# seeknal/feature_groups/customer_features.yml
kind: feature_group
name: customer_features
entity:
  name: customer
  join_keys: ["customer_id"]
materialization:
  event_time_col: latest_order_date
  offline: { enabled: true, format: parquet }
  online: { enabled: false, ttl: 7d }
features:
  total_orders: { dtype: integer }
  total_spent: { dtype: float }
  avg_order_value: { dtype: float }
inputs:
  - ref: transform.customer_orders

Or use Python decorators:

@feature_group(name="customer_rfm", entity="customer")
def customer_rfm(ctx):
    df = ctx.ref("transform.clean_transactions")
    return ctx.duckdb.sql("""
        SELECT CustomerID AS customer_id,  -- assumes the entity join key is customer_id
               COUNT(DISTINCT InvoiceNo) AS frequency,
               SUM(TotalAmount) AS monetary_value
        FROM df GROUP BY CustomerID
    """).df()

Inspect entities from the CLI:

seeknal entity list                           # List entities consolidated across feature groups
seeknal entity show customer                  # Inspect entity schema and feature groups
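A point-in-time join attaches to each training row the most recent feature values known at that row's timestamp, so no future information leaks into training. The sketch below shows the idea in plain Python with made-up data. How Seeknal implements it (for example, on top of event_time_col) is not shown here, and point_in_time_join is a hypothetical name.

```python
from bisect import bisect_right
from datetime import date

# Made-up feature snapshots: (customer_id, event_time, total_orders).
features = [
    ("c1", date(2026, 1, 1), 3),
    ("c1", date(2026, 2, 1), 5),
    ("c2", date(2026, 1, 15), 1),
]

# Made-up training rows: (customer_id, as-of date).
labels = [
    ("c1", date(2026, 1, 20)),
    ("c1", date(2026, 2, 10)),
    ("c2", date(2026, 1, 10)),
]


def point_in_time_join(features, labels):
    """For each label row, attach the latest feature value with event_time <= as_of."""
    # Group snapshots per key, sorted by event time, for binary search.
    by_key = {}
    for key, ts, value in sorted(features, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for key, as_of in labels:
        times, values = by_key.get(key, ([], []))
        i = bisect_right(times, as_of)  # number of snapshots at or before as_of
        joined.append((key, as_of, values[i - 1] if i else None))
    return joined


for row in point_in_time_join(features, labels):
    print(row)
```

The c2 row gets None because its only snapshot (2026-01-15) is after its as-of date; using it would leak future data.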

Interactive SQL REPL — Auto-registers Parquet files, PostgreSQL, and Iceberg sources at startup, so you can query pipeline outputs, explore data, and iterate on SQL without leaving the terminal.

AI-Powered Thinking Partner: seeknal ask chat is your collaborative partner for data work. The agent combines 16 tools for fast data access with 11 built-in skills for multi-step workflows such as report generation, pipeline building, and data profiling. Skills are loaded only on demand, which keeps the agent's context lean and responses fast:

seeknal ask chat                        # Start a brainstorm / build session
seeknal ask "What are the top 5 customers by revenue?"  # Quick one-shot question
seeknal ask report "customer analysis"  # Generate interactive HTML dashboard
seeknal ask chat --web                  # Enable web search for benchmarks

Ask it to build a pipeline from scratch, and it will draft a plan, walk you through the design, and wait for your go-ahead before generating code. Publish reports to a self-hosted Seeknal Report Server and share them with your team via a URL.

seeknal report-server start             # Host published reports
seeknal gateway start                   # Expose ask as an API (WebSocket/SSE/REST)

Supports Google Gemini (default) and Ollama (local). Use --provider ollama for fully local, private analysis.

Documentation

  • Getting Started: Installation, configuration, first pipeline
  • CLI Reference: All commands and flags
  • YAML Schema: Pipeline YAML reference
  • CLI Docs Search: Search documentation from the terminal (seeknal docs)
  • Tutorials: YAML Pipelines · Python Pipelines · Mixed · Seeknal Ask Agent · Report Exposures
  • Guides: Python Pipelines · Testing & Audits · Iceberg Materialization · Training to Serving
  • Servers: Gateway Server · Report Server
  • Concepts: Point-in-Time Joins · Virtual Environments · Exposures · Glossary

Changelog

v2.6.0 (April 2026)

Skills-Powered Agent + Report Server — The ask agent now uses a thin-tools/fat-skills architecture: 16 lean tools for fast data access, 11 built-in skills for multi-step workflows (reports, pipelines, profiling, metrics, publishing). Skills load on demand via progressive disclosure, keeping the agent's context lean.

  • Seeknal Report Server (seeknal report-server start): self-hosted server for publishing and sharing reports via unique URLs — publish from the chat TUI or the agent tool
  • 11 built-in skills: report generation, pipeline building, data profiling, Python analysis, semantic model bootstrap, metric query/save, report exposure codification, Proof Editor publishing
  • Chat enhancements: --style (concise/explanatory/formal/conversational), --budget (USD cap), --web (DuckDuckGo search), --session/--name (named session resume)
  • Gateway improvements: cloud-only backend mode, standalone workers, Redis multi-replica, split topology
  • Auto .env loading: --project <path> loads <path>/.env automatically
  • Error UX: network errors classified with actionable hints; error logs saved to ~/.seeknal/logs/
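The progressive-disclosure idea behind the skills can be sketched as a registry that exposes only skill names and one-line summaries up front and loads a skill's full instructions the first time it is used. SkillRegistry and its methods are hypothetical names for illustration, not Seeknal's internal API.

```python
from typing import Callable


class SkillRegistry:
    """Hypothetical registry: only names and one-line summaries stay in context."""

    def __init__(self) -> None:
        self.summaries: dict[str, str] = {}
        self._loaders: dict[str, Callable[[], str]] = {}
        self._loaded: dict[str, str] = {}

    def register(self, name: str, summary: str, loader: Callable[[], str]) -> None:
        self.summaries[name] = summary
        self._loaders[name] = loader

    def catalog(self) -> str:
        # The lean listing the agent sees up front.
        return "\n".join(f"{n}: {s}" for n, s in sorted(self.summaries.items()))

    def load(self, name: str) -> str:
        # Full instructions are fetched only on first use, then cached.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]


calls = []


def load_report_skill() -> str:
    calls.append("report")  # record each real load
    return "Full report-generation instructions"


registry = SkillRegistry()
registry.register("report", "Generate an HTML report", load_report_skill)
print(registry.catalog())   # report: Generate an HTML report
registry.load("report")
registry.load("report")
print(len(calls))           # 1
```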

v2.5.0 (April 2026)

Seeknal as Your Thinking Partner: seeknal ask chat is now a collaborative partner that brainstorms, builds pipelines, and trains models with you through conversation. It always asks for confirmation before acting, so you stay in control.

  • Interactive chat mode (seeknal ask chat): multi-turn brainstorm and build sessions with persistent history, streaming UI with Claude Code-inspired visual hierarchy
  • Confirmation-first workflow: the agent proposes plans and analysis directions, then waits for your go-ahead via interactive menus before executing
  • Pipeline and ML building: describe what you want to build in plain language — the agent drafts YAML pipelines, feature groups, or model training code and checks in before generating
  • Session management: create, resume, list, and delete sessions with full message persistence (seeknal session list/show/delete)
  • Iceberg REST catalog support: integrates with any Iceberg REST catalog provider (Lakekeeper, Tabular, Polaris, etc.)
  • Gateway server: WebSocket, SSE, and REST endpoints for web clients; optional Telegram bot integration
  • UI refresh: animated fox mascot, interactive arrow-key menus, real token/tool counters, subordinate reasoning display

v2.4.0 (March 2026)

Seeknal Ask — AI-Powered Data Agent — Natural language data analysis with 12 built-in tools:

seeknal ask "What are the top 5 customers by revenue?"
seeknal ask chat                                        # Interactive multi-turn session
seeknal ask report "customer segmentation"              # AI-guided HTML dashboard
seeknal ask report --exposure monthly_kpis              # Deterministic report exposure
seeknal ask report serve my-report                      # Live-preview with Evidence dev server

  • One-shot & chat modes: Ask questions or start multi-turn sessions with conversation memory
  • 12 agent tools: Data discovery, SQL execution, Python analysis (pandas/scipy/matplotlib), pipeline inspection, and report generation
  • Report exposures: Define repeatable reports in YAML with pinned SQL queries, chart types (BigValue, BarChart, LineChart, AreaChart, DataTable), and LLM-generated narratives
  • Deterministic reports: the sections key pins SQL queries and charts, so the LLM only writes the commentary
  • Dual output: Both interactive HTML dashboards and standalone Markdown reports
  • LLM providers: Google Gemini (default) and Ollama (local, no API key)
  • Subprocess sandbox: Python execution runs in an isolated subprocess with restricted imports
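The changelog does not say how imports are restricted. One common approach, sketched here under that assumption, is to parse the code with Python's ast module, reject any top-level module that is not on an allowlist, and then run the code in a separate interpreter via subprocess. ALLOWED, disallowed_imports, and run_sandboxed are hypothetical; the allowlist simply mirrors the libraries named for the agent's Python analysis tool.

```python
import ast
import subprocess
import sys

# Hypothetical allowlist; the list Seeknal actually enforces is not documented here.
ALLOWED = {"pandas", "scipy", "matplotlib", "math", "statistics", "json"}


def disallowed_imports(code: str) -> set[str]:
    """Return top-level module names imported by code that are not allowlisted."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - ALLOWED


def run_sandboxed(code: str, timeout: float = 10.0) -> str:
    bad = disallowed_imports(code)
    if bad:
        raise PermissionError(f"imports not allowed: {sorted(bad)}")
    # Separate interpreter (-I: isolated mode), so a crash or hang cannot take
    # down the calling process; the timeout bounds runaway code.
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


print(run_sandboxed("print(1 + 1)"))      # 2
print(disallowed_imports("import os"))    # {'os'}
```

A static import check is easy to bypass (for example with __import__("os")), so treat it as one layer of defense rather than the whole sandbox.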

v2.3.0 (March 2026)

Incremental Detection — Automatically skip unchanged data sources and process only new data:

# PostgreSQL watermark-based incremental detection
- kind: source
  name: events
  source: postgresql
  table: public.events
  freshness:
    time_column: created_at  # Tracks MAX(created_at) watermark
  params:
    connection: my_pg

  • PostgreSQL Incremental: Watermark-based detection using MAX(time_column) comparison. Automatically generates WHERE time_col > 'watermark' OR time_col IS NULL for incremental reads.
  • Iceberg Incremental: Snapshot-based detection comparing current snapshot ID. Supports partition pruning for time-partitioned tables.
  • Skip Optimization: If both the fingerprint and the watermark match their stored values, source execution is skipped entirely.
  • Cascade Invalidation: Dependent nodes are automatically invalidated when source data changes.
  • Full Refresh: Use the --full flag to ignore stored watermarks and reload all data.
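The skip and filter logic for PostgreSQL sources can be sketched as two small functions. The fingerprint is assumed here to be a hash of the source's definition, and both function names are hypothetical. The WHERE clause mirrors the one quoted in the PostgreSQL Incremental bullet, and a full refresh (--full) simply drops the filter.

```python
from datetime import datetime
from typing import Optional


def should_skip(stored_fp: Optional[str], current_fp: str,
                stored_wm: Optional[datetime], current_max: Optional[datetime]) -> bool:
    """Skip the source only if neither its definition nor its data has changed."""
    return stored_fp == current_fp and stored_wm == current_max


def incremental_where(time_column: str, watermark: Optional[datetime],
                      full_refresh: bool = False) -> str:
    """Build the filter for an incremental read; '' means read everything."""
    if full_refresh or watermark is None:
        return ""
    ts = watermark.isoformat(sep=" ")
    # Inlined literal for readability; real code should bind parameters instead.
    return f"WHERE {time_column} > '{ts}' OR {time_column} IS NULL"


wm = datetime(2026, 3, 1, 12, 0)
print(incremental_where("created_at", wm))
# WHERE created_at > '2026-03-01 12:00:00' OR created_at IS NULL
print(repr(incremental_where("created_at", wm, full_refresh=True)))  # ''
```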

Other Changes:

  • Enhanced QA automation with multi-spec execution support
  • Pipeline error logging with --verbose mode
  • Security fix: Updated cryptography to 46.0.5 (CVE-2026-26007)

v2.2.2 (February 2026)

  • Entity consolidation for per-entity feature views
  • Multi-target materialization (PostgreSQL + Iceberg from single node)
  • Environment-aware execution with namespace prefixing

Install from Source

For development or contributing:

git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"

Contributing

Contributions are welcome! See CONTRIBUTING.md for setup, code style, testing, and PR guidelines.

License

Seeknal is Apache 2.0 licensed.



Download files

Source Distribution

seeknal-2.7.1.tar.gz (735.5 kB)

Built Distribution

seeknal-2.7.1-py3-none-any.whl (893.6 kB)

File details

Details for the file seeknal-2.7.1.tar.gz.

File metadata

  • Download URL: seeknal-2.7.1.tar.gz
  • Size: 735.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seeknal-2.7.1.tar.gz

  • SHA256: 344d41e532b930a5a54c90a8dbf65966e70b5992b84edff3e4b5269c6715e6b4
  • MD5: 1fd341558add3afacbfed448c543e01e
  • BLAKE2b-256: 388cf2284a9154988bb5fe8659a8f15ab6f6715bf6d56674e03335885a89f9e4


Provenance

The following attestation bundles were made for seeknal-2.7.1.tar.gz:

Publisher: release.yml on mta-tech/seeknal

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file seeknal-2.7.1-py3-none-any.whl.

File metadata

  • Download URL: seeknal-2.7.1-py3-none-any.whl
  • Size: 893.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seeknal-2.7.1-py3-none-any.whl

  • SHA256: 49a265ca96125df7a6021c0e943693319106bb17af8887ae1ded7a96453f19e7
  • MD5: 275995db4361b9d98629429c6a18ad84
  • BLAKE2b-256: ea7abff80a9ade73794a76cd9a55333c32a2f8e7a788172446f78fef6ae4893a


Provenance

The following attestation bundles were made for seeknal-2.7.1-py3-none-any.whl:

Publisher: release.yml on mta-tech/seeknal

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
