Seeknal

Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.

Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.

Quick Start

pip install seeknal

seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply

Explore your data interactively or search docs from the terminal:

seeknal repl          # Interactive SQL on pipeline outputs
seeknal docs query    # Search documentation from the CLI

Inside the REPL, query pipeline outputs with SQL:

SELECT customer_id, COUNT(*) AS order_count
FROM target.my_transform
GROUP BY customer_id;

Key Features

Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:

from seeknal.pipeline import source, transform

@source(name="orders", source="csv", table="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", inputs=["source.orders"])
def order_metrics(ctx):
    df = ctx.ref("source.orders")
    return ctx.duckdb.sql(
        "SELECT customer_id, SUM(amount) as total FROM df GROUP BY customer_id"
    ).df()

Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:

materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table

Environment Management — Isolated namespaces with per-environment profiles:

seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev

Feature Store — Point-in-time joins, automatic versioning, offline and online serving. Powered by DuckDB (single-node, <100M rows) or Apache Spark (distributed).

from datetime import datetime

from seeknal.entity import Entity
from seeknal.featurestore.duckdbengine.feature_group import (
    FeatureGroupDuckDB,
    FeatureLookup,
    HistoricalFeaturesDuckDB,
    Materialization,
)

fg = FeatureGroupDuckDB(
    name="user_features",
    entity=Entity(name="user", join_keys=["user_id"]),
    materialization=Materialization(event_time_col="event_time"),
)
# df is assumed to be a DataFrame with user_id, event_time, and feature columns
fg.set_dataframe(df).set_features()
fg.write(feature_start_time=datetime(2024, 1, 1))

# Point-in-time join (prevents data leakage)
hist = HistoricalFeaturesDuckDB(lookups=[FeatureLookup(source=fg)])
training_df = hist.to_dataframe(feature_start_time=datetime(2024, 1, 1))
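
To show what a point-in-time join guards against, here is a minimal pure-Python sketch that is independent of Seeknal's API. For each label row it attaches the latest feature value observed at or before the label's timestamp, so values from the future never reach the training set. The function point_in_time_join and the columns spend_30d and churned are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, features):
    """For each label row, attach the latest feature value observed at or
    before the label's event_time for the same user_id (None if none exists)."""
    # Group feature rows per user, sorted by event_time.
    by_user = {}
    for row in sorted(features, key=lambda r: r["event_time"]):
        by_user.setdefault(row["user_id"], []).append(row)

    joined = []
    for label in labels:
        rows = by_user.get(label["user_id"], [])
        times = [r["event_time"] for r in rows]
        # Index of the last feature row with event_time <= label time.
        idx = bisect_right(times, label["event_time"]) - 1
        value = rows[idx]["spend_30d"] if idx >= 0 else None
        joined.append({**label, "spend_30d": value})
    return joined


features = [
    {"user_id": 1, "event_time": datetime(2024, 1, 1), "spend_30d": 100.0},
    {"user_id": 1, "event_time": datetime(2024, 2, 1), "spend_30d": 250.0},
]
labels = [
    {"user_id": 1, "event_time": datetime(2024, 1, 15), "churned": 0},
    {"user_id": 1, "event_time": datetime(2023, 12, 1), "churned": 1},
]
training_rows = point_in_time_join(labels, features)
```

The label dated 2024-01-15 picks up the January feature value rather than the February one, and the label dated before any feature was recorded gets None instead of a value from the future.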

Interactive SQL REPL — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.

Documentation

  • Getting Started: Installation, configuration, first pipeline
  • CLI Reference: All commands and flags
  • YAML Schema: Pipeline YAML reference
  • CLI Docs Search: Search documentation from the terminal (seeknal docs)
  • Tutorials: YAML Pipelines · Python Pipelines · Mixed
  • Guides: Python Pipelines · Testing & Audits · Iceberg Materialization · Training to Serving
  • Concepts: Point-in-Time Joins · Virtual Environments · Glossary

Changelog

v2.3.0 (March 2026)

Incremental Detection — Automatically skip unchanged data sources and process only new data:

# PostgreSQL watermark-based incremental detection
- kind: source
  name: events
  source: postgresql
  table: public.events
  freshness:
    time_column: created_at  # Tracks MAX(created_at) watermark
  params:
    connection: my_pg

  • PostgreSQL Incremental: Watermark-based detection using MAX(time_column) comparison. Automatically generates WHERE time_col > 'watermark' OR time_col IS NULL for incremental reads.
  • Iceberg Incremental: Snapshot-based detection comparing current snapshot ID. Supports partition pruning for time-partitioned tables.
  • Skip Optimization: If fingerprint and watermark match, source execution is skipped entirely.
  • Cascade Invalidation: Dependent nodes are automatically invalidated when source data changes.
  • Full Refresh: Use --full flag to ignore stored watermarks and reload all data.
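
The watermark and skip behaviour described in the list above can be sketched as follows. This is an illustration under stated assumptions, not Seeknal's code: the helpers incremental_query and should_skip, the stored_state layout, and the use of an in-memory SQLite table in place of PostgreSQL are all hypothetical.

```python
import sqlite3
from typing import Optional


def incremental_query(table: str, time_column: str, watermark: Optional[str]):
    """Hypothetical helper: build a read that returns only rows newer than the watermark."""
    # Identifiers are interpolated directly, so they must come from trusted config.
    if watermark is None:
        # No stored watermark (first run or full refresh): read everything.
        return f"SELECT * FROM {table}", ()
    sql = f"SELECT * FROM {table} WHERE {time_column} > ? OR {time_column} IS NULL"
    return sql, (watermark,)


def should_skip(stored: dict, current_fingerprint: str, current_watermark: str) -> bool:
    """Skip the source only when both fingerprint and watermark are unchanged."""
    return (
        stored.get("fingerprint") == current_fingerprint
        and stored.get("watermark") == current_watermark
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2026-03-01"), (2, "2026-03-02"), (3, "2026-03-03"), (4, None)],
)

# State recorded by a previous run (hypothetical layout).
stored_state = {"fingerprint": "abc", "watermark": "2026-03-02"}
current_watermark = conn.execute("SELECT MAX(created_at) FROM events").fetchone()[0]

sql, params = incremental_query("events", "created_at", stored_state["watermark"])
new_ids = sorted(row[0] for row in conn.execute(sql, params))
skip = should_skip(stored_state, "abc", current_watermark)
```

Only the row newer than the stored watermark and the row with a NULL timestamp are read, and the source is not skipped because MAX(created_at) has advanced. Passing None as the watermark mirrors a full refresh by dropping the filter.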

Other Changes:

  • Enhanced QA automation with multi-spec execution support
  • Pipeline error logging with --verbose mode
  • Security fix: Updated cryptography to 46.0.5 (CVE-2026-26007)

v2.2.2 (February 2026)

  • Entity consolidation for per-entity feature views
  • Multi-target materialization (PostgreSQL + Iceberg from single node)
  • Environment-aware execution with namespace prefixing

Install from Source

For development or contributing:

git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"

Contributing

Contributions are welcome! See CONTRIBUTING.md for setup, code style, testing, and PR guidelines.

License

Seeknal is Apache 2.0 licensed.
