Seeknal

Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.

Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.

Quick Start

pip install seeknal
# Optional, only for distributed Spark execution:
# pip install "seeknal[spark]"

seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply

Explore your data interactively or search docs from the terminal:

seeknal repl          # Interactive SQL on pipeline outputs
seeknal docs query    # Search documentation from the CLI

SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;

Key Features

Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:

from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()

Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:

materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table

Environment Management — Isolated namespaces with per-environment profiles:

seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev

Feature Store — Define ML features in YAML or Python with entity keys, point-in-time joins, and automatic versioning. Supports offline (batch) and online (real-time) serving.

# seeknal/feature_groups/customer_features.yml
kind: feature_group
name: customer_features
entity:
  name: customer
  join_keys: ["customer_id"]
materialization:
  event_time_col: latest_order_date
  offline: { enabled: true, format: parquet }
  online: { enabled: false, ttl: 7d }
features:
  total_orders: { dtype: integer }
  total_spent: { dtype: float }
  avg_order_value: { dtype: float }
inputs:
  - ref: transform.customer_orders

# Or use Python decorators
@feature_group(name="customer_rfm", entity="customer")
def customer_rfm(ctx):
    df = ctx.ref("transform.clean_transactions")
    return ctx.duckdb.sql("""
        SELECT CustomerID, COUNT(DISTINCT InvoiceNo) as frequency,
               SUM(TotalAmount) as monetary_value
        FROM df GROUP BY CustomerID
    """).df()

seeknal entity list                           # Cross-feature-group consolidation
seeknal entity show customer                  # Inspect entity schema and feature groups
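
To make the idea of a point-in-time join concrete, here is a minimal standard-library sketch: for each label row it picks the latest feature row whose event time is at or before the label's timestamp, so no future information leaks into training data. This shows the general technique, not Seeknal's internals; `point_in_time_join`, the `event_time` column, and the `churned` label are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def point_in_time_join(labels, features, key="customer_id", time_col="event_time"):
    """For each label row, attach the latest feature row at or before its timestamp."""
    # Group feature rows per entity, sorted by event time.
    by_entity = defaultdict(list)
    for row in sorted(features, key=lambda r: r[time_col]):
        by_entity[row[key]].append(row)

    joined = []
    for label in labels:
        history = by_entity.get(label[key], [])
        times = [r[time_col] for r in history]
        # Index of the last feature row with event_time <= label time (-1 if none).
        idx = bisect_right(times, label[time_col]) - 1
        match = history[idx] if idx >= 0 else {}
        feature_values = {k: v for k, v in match.items() if k not in (key, time_col)}
        joined.append({**label, **feature_values})
    return joined


# Hypothetical sample data.
features = [
    {"customer_id": 1, "event_time": date(2024, 1, 1), "total_orders": 1},
    {"customer_id": 1, "event_time": date(2024, 2, 1), "total_orders": 3},
]
labels = [
    {"customer_id": 1, "event_time": date(2024, 1, 15), "churned": 0},
    {"customer_id": 1, "event_time": date(2023, 12, 1), "churned": 1},
]
rows = point_in_time_join(labels, features)
```

The first label (15 January) receives `total_orders` of 1 rather than the later value 3, and the second label predates every feature row, so it gets no feature columns at all.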

Interactive SQL REPL — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.

AI-Powered Thinking Partner: seeknal ask chat is your collaborative partner for data work. The agent uses thin tools for fast data access and fat skills for multi-step workflows such as report generation, pipeline building, database analysis, and data profiling, all loaded on demand to keep responses fast:

seeknal ask chat                        # Start a brainstorm / build session (interactive TUI)
seeknal ask "What are the top 5 customers by revenue?"  # Quick one-shot question
seeknal ask report "customer analysis"  # Generate interactive HTML dashboard
seeknal ask test --project . --sql-only # Validate project prompt-to-SQL tests
seeknal ask chat --web                  # Enable web search for benchmarks

seeknal ask chat launches an interactive terminal UI (Bun + React + Ink) with streaming tokens, tool visualization, and an arrow-key ask_user picker for approval gates. The TUI is bundled inside the wheel, so end users do not need Bun or Node. The one-shot (seeknal ask "...") and report (seeknal ask report) commands use Python-only rendering with no TUI.

Ask it to answer questions against existing read-only databases with seeknal source connect, reuse project SQL examples from seeknal/sql_pairs/, and validate important questions with executable seeknal/tests/ QA oracles. Ask it to build a pipeline from scratch, and it will draft a plan, walk you through the design, and wait for your go-ahead before generating code. Publish reports to a self-hosted Seeknal Report Server and share them with your team via a URL.

For editable installs (pip install -e .), set SEEKNAL_TUI_BINARY_PATH to your local TUI build. See src/seeknal/ask/tui/README.md for the full TypeScript contributing guide and development workflow.

seeknal report-server start             # Host published reports
seeknal gateway start                   # Expose ask as an API (WebSocket/SSE/REST)
seeknal gateway worker --gateway-url http://gateway:8000 --api-token "$SEEKNAL_API_TOKEN"  # Token-routed Temporal worker

Supports Google Gemini (default), OpenAI-compatible providers, Anthropic-compatible providers, and Ollama (local). Use --provider ollama for fully local, private analysis.

Documentation

  • Getting Started: Installation, configuration, first pipeline
  • CLI Reference: All commands and flags
  • YAML Schema: Pipeline YAML reference
  • CLI Docs Search: Search documentation from the terminal (seeknal docs)
  • Tutorials: YAML Pipelines · Python Pipelines · Mixed · Seeknal Ask Agent · Report Exposures
  • Guides: Python Pipelines · Testing & Audits · Iceberg Materialization · Training to Serving
  • Servers: Gateway Server · Report Server
  • Concepts: Point-in-Time Joins · Virtual Environments · Exposures · Glossary

Changelog

v2.9.1 (April 2026)

HTTP-only Ask worker mode — Adds a gateway-routed worker topology where workers only need outbound HTTP(S) to Seeknal Gateway or a compatible kc-service gateway.

  • HTTP worker transport: seeknal gateway worker --transport http long-polls gateway work-stream endpoints, runs Ask locally near the data, and posts streaming events plus completion back over HTTP.
  • Gateway broker mode: seeknal gateway start --temporal --worker-transport http and seeknal gateway backend --worker-transport http keep Temporal routing inside the gateway while external workers avoid Temporal credentials/network access.
  • Token-routed runtime config: token records can advertise worker_transport: http; workers still bootstrap from SEEKNAL_GATEWAY_URL + SEEKNAL_API_TOKEN.
  • Worker reliability fixes: project .env is loaded in worker mode, gateway polling retries transient connection failures, and Temporal activities heartbeat while waiting for HTTP workers.
  • pydantic-deep compatibility: skips unsupported stuck_loop_detection passthrough on current runtime versions while preserving config compatibility in tests/mocks.
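
The polling-retry behaviour mentioned above follows a common pattern: retry a poll on transient connection failures with exponential backoff. The sketch below shows that pattern in isolation. It is not the Seeknal worker's code; `poll_with_retry`, `poll_once`, and the backoff values are assumptions, and `poll_once` stands in for a single long-poll HTTP request.

```python
import time
from typing import Callable, Optional


def poll_with_retry(
    poll_once: Callable[[], Optional[dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Run one poll, retrying transient connection errors with exponential backoff.

    poll_once is a hypothetical callable standing in for one long-poll request;
    it returns a work item, or None when the poll timed out with no work.
    """
    for attempt in range(max_attempts):
        try:
            return poll_once()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Give up after the final attempt.
            # Backoff doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None


# Simulated gateway that fails twice before handing out work.
calls = {"n": 0}


def flaky_poll() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("gateway unreachable")
    return {"task_id": "demo"}


delays: list[float] = []
work = poll_with_retry(flaky_poll, sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule easy to inspect; in this run the recorded delays are 0.5 and 1.0 seconds before the third attempt succeeds.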

v2.9.0 (April 2026)

Read-only Ask source harness + project SQL QA — Adds a TUI-first workflow for users who already have analytical tables in a database and want Seeknal Ask to answer business questions without building a pipeline.

  • Connected-source registry: seeknal source connect/status/inspect/sync/test writes seeknal_agent.yml, generates .seeknal/context/sources/ metadata, and verifies read-only database attachments.
  • SQL pairs for context: seeknal/sql_pairs/*.yml stores prompt-to-SQL examples the Ask agent can discover with list_sql_pairs / read_sql_pair.
  • Ask SQL tests: seeknal ask test runs project-local prompt-to-SQL QA cases from seeknal/tests/, including SQL-only oracle checks and agent-answer checks.
  • TUI QA cockpit: Ask chat can list, read, run, and inspect Ask tests via thin tools over the same test engine.
  • Structured grading: Ask tests support assert.compare: dataframe for markdown/JSON table comparison against expected SQL rows.
  • Init guidance: seeknal init now scaffolds AGENTS.md, CLAUDE.md, seeknal/sql_pairs/, and seeknal/tests/ for project-local agent conventions.
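
As a rough illustration of the structured grading idea (comparing a markdown table in an answer against expected SQL rows), the sketch below parses a simple pipe-delimited table and compares it to expected rows while ignoring row order. The helper names `parse_markdown_table` and `rows_match`, and the choice to compare cells as strings, are assumptions; the actual `assert.compare: dataframe` logic may differ.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited markdown table into a list of row dicts."""

    def split_cells(line: str) -> list[str]:
        return [cell.strip() for cell in line.strip("|").split("|")]

    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = split_cells(lines[0])
    # lines[1] is the |---|---| separator row; data rows follow it.
    return [dict(zip(header, split_cells(ln))) for ln in lines[2:]]


def rows_match(answer_table: str, expected_rows: list[dict]) -> bool:
    """Compare table cells to expected rows as strings, ignoring row order."""

    def normalize(rows: list[dict]) -> list[tuple]:
        return sorted(
            tuple(sorted((k, str(v)) for k, v in row.items())) for row in rows
        )

    return normalize(parse_markdown_table(answer_table)) == normalize(expected_rows)


# Hypothetical agent answer and expected SQL result.
answer = """
| customer_id | order_count |
|---|---|
| 7 | 3 |
| 2 | 5 |
"""
expected = [{"customer_id": 2, "order_count": 5}, {"customer_id": 7, "order_count": 3}]
matched = rows_match(answer, expected)
```

String comparison is the simplest choice for a sketch; a real grader would likely need type-aware and tolerance-aware comparison for floats and dates.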

v2.8.0 (April 2026)

OpenAI/Anthropic providers + SQL safety + context files — Adds two new LLM provider families, execution guards on execute_sql, a pre-execution preview_query tool, persistent context files, and durable preferences.

  • OpenAI + Anthropic support: gpt-4o, claude-*, Azure OpenAI, Together, Groq, vLLM, LM Studio, and any OpenAI-compatible proxy via SEEKNAL_ASK_OPENAI_BASE_URL / SEEKNAL_ASK_ANTHROPIC_BASE_URL
  • execute_sql guards: rows capped at 500, columns at 50, per-cell length at 200 chars, 50 KB markdown budget — every truncation emits an actionable notice with accurate total row count
  • preview_query tool: four pre-execution safety probes (row count, column count, JOIN fan-out, dry-run reachability) — blocks queries returning ≥100k rows; pure aggregations auto-skip
  • Context files: list_context_files and write_project_file tools scan/write {project}/context/ with path-traversal guards
  • Durable preferences: save_preference appends to preferences.yml; preferences are injected into the system prompt on every session
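
The `execute_sql` guards listed above can be pictured as a small truncation step applied to a result set before it is rendered. The sketch below uses the row, column, and cell limits quoted in the changelog, but the function name `truncate_result`, the notice wording, and the omission of the 50 KB markdown budget are all assumptions made for illustration.

```python
MAX_ROWS, MAX_COLS, MAX_CELL_CHARS = 500, 50, 200  # limits quoted in the changelog


def truncate_result(
    columns: list[str], rows: list[tuple]
) -> tuple[list[str], list[tuple], list[str]]:
    """Cap rows, columns, and cell length; return notices describing each truncation."""
    notices = []
    total_rows = len(rows)
    if total_rows > MAX_ROWS:
        # Report the true total so the reader knows how much was cut.
        notices.append(f"Showing {MAX_ROWS} of {total_rows} rows; add a LIMIT or aggregate.")
    if len(columns) > MAX_COLS:
        notices.append(f"Showing {MAX_COLS} of {len(columns)} columns; select fewer columns.")

    def clip(value) -> str:
        text = str(value)
        return text if len(text) <= MAX_CELL_CHARS else text[:MAX_CELL_CHARS] + "..."

    kept_cols = columns[:MAX_COLS]
    kept_rows = [tuple(clip(v) for v in row[:MAX_COLS]) for row in rows[:MAX_ROWS]]
    return kept_cols, kept_rows, notices


# Hypothetical oversized result: 600 rows with a 300-character text column.
cols = ["id", "note"]
data = [(i, "x" * 300) for i in range(600)]
kept_cols, kept_rows, notices = truncate_result(cols, data)
```

Note that this sketch converts every cell to a string, which suits markdown rendering but would not be appropriate if the caller needed typed values back.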

Install from Source

For development or contributing:

git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"

Contributing

Contributions are welcome! See CONTRIBUTING.md for setup, code style, testing, and PR guidelines.

License

Seeknal is Apache 2.0 licensed.
