Seeknal

Transform data with SQL and Python. Build ML features with point-in-time joins. Materialize to PostgreSQL and Iceberg — all from one CLI.

Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run them through a safe draft → dry-run → apply workflow, and materialize outputs to PostgreSQL and Apache Iceberg simultaneously. Python 3.11+ required.

Quick Start

pip install seeknal

seeknal init --name my_project
seeknal draft --name my_pipeline --type transform
seeknal dry-run
seeknal apply

Explore your data interactively:

seeknal repl
SELECT customer_id, COUNT(*) as order_count
FROM target.my_transform
GROUP BY customer_id;

Key Features

Dual Pipeline Authoring — Write pipelines in YAML, Python decorators, or both:

from seeknal.pipeline.decorators import source, transform, materialize

@source(name="orders", type="csv", path="data/orders.csv")
def orders():
    pass

@transform(name="order_metrics", depends_on=["source.orders"])
def order_metrics(ctx):
    return ctx.ref("source.orders").sql(
        "SELECT customer_id, SUM(amount) as total FROM orders GROUP BY customer_id"
    )
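
To make the `depends_on` idea concrete, here is a minimal, standalone sketch of how decorator-registered nodes can be ordered and executed. It does not use Seeknal and is not how Seeknal is implemented; the `node` decorator, `REGISTRY`, and `run_all` are hypothetical names invented for illustration, and the ordering relies on the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: node id -> (function, list of upstream node ids).
# This is NOT Seeknal's internal data structure, only an illustration.
REGISTRY = {}

def node(kind, name, depends_on=()):
    """Register a function under '<kind>.<name>' together with its upstream ids."""
    def wrap(fn):
        REGISTRY[f"{kind}.{name}"] = (fn, list(depends_on))
        return fn
    return wrap

@node("source", "orders")
def orders():
    # Stand-in for reading data/orders.csv
    return [
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": 1, "amount": 5.0},
        {"customer_id": 2, "amount": 7.5},
    ]

@node("transform", "order_metrics", depends_on=["source.orders"])
def order_metrics(inputs):
    # Sum amount per customer, mirroring the SQL in the example above
    totals = {}
    for row in inputs["source.orders"]:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals

def run_all():
    """Execute every registered node after all of its dependencies."""
    graph = {name: deps for name, (_, deps) in REGISTRY.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = REGISTRY[name]
        results[name] = fn({d: results[d] for d in deps}) if deps else fn()
    return results

results = run_all()
```

Declaring dependencies by name rather than by call order lets the runner decide the execution order, which is also what makes a dry-run step possible before anything is written.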

Multi-Target Materialization — Write to PostgreSQL and Iceberg from a single node:

materializations:
  - type: postgresql
    connection: local_pg
    table: analytics.my_table
    mode: upsert_by_key
    unique_keys: [id]
  - type: iceberg
    table: atlas.namespace.my_table
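
The `upsert_by_key` mode is not spelled out here. Assuming it means "update the row when the key already exists, insert it otherwise", the behavior can be sketched with the standard library's `sqlite3` as a stand-in for PostgreSQL. The table layout and the `upsert_by_key` helper below are hypothetical and only illustrate the semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, total REAL)")

def upsert_by_key(rows):
    """Insert rows; when the id already exists, overwrite the non-key column."""
    conn.executemany(
        "INSERT INTO my_table (id, total) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET total = excluded.total",
        rows,
    )
    conn.commit()

upsert_by_key([(1, 10.0), (2, 20.0)])  # first run: two inserts
upsert_by_key([(2, 25.0), (3, 30.0)])  # second run: one update, one insert

rows = conn.execute("SELECT id, total FROM my_table ORDER BY id").fetchall()
```

Under this reading, re-running a pipeline is idempotent for unchanged keys: the table ends up with one row per `id` regardless of how many times the same batch is applied.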

Environment Management — Isolated namespaces with per-environment profiles:

seeknal env plan dev --profile profiles-dev.yml
seeknal env apply dev
seeknal run --env dev

Feature Store — Point-in-time joins, automatic versioning, offline and online serving. Powered by DuckDB (single-node, <100M rows) or Apache Spark (distributed).

from datetime import datetime

from seeknal.entity import Entity
from seeknal.featurestore.duckdbengine.feature_group import (
    FeatureGroupDuckDB,
    FeatureLookup,
    HistoricalFeaturesDuckDB,
    Materialization,
)

# df is assumed to be an existing DataFrame with user_id, event_time,
# and the feature columns.
fg = FeatureGroupDuckDB(
    name="user_features",
    entity=Entity(name="user", join_keys=["user_id"]),
    materialization=Materialization(event_time_col="event_time"),
)
fg.set_dataframe(df).set_features()
fg.write(feature_start_time=datetime(2024, 1, 1))

# Point-in-time join (prevents data leakage)
hist = HistoricalFeaturesDuckDB(lookups=[FeatureLookup(source=fg)])
training_df = hist.to_dataframe(feature_start_time=datetime(2024, 1, 1))
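
For readers new to the term, a point-in-time join attaches to each training row the most recent feature value observed at or before that row's timestamp, so no future information leaks into training data. The sketch below shows the idea in plain Python. It is independent of Seeknal's engine, and `feature_history`, `point_in_time_lookup`, and the sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history: user_id -> list of (event_time, value), sorted by time.
feature_history = {
    "u1": [
        (datetime(2024, 1, 1), 10),
        (datetime(2024, 2, 1), 20),
        (datetime(2024, 3, 1), 30),
    ],
    "u2": [(datetime(2024, 2, 15), 5)],
}

def point_in_time_lookup(user_id, as_of):
    """Return the latest feature value observed at or before `as_of`, else None."""
    history = feature_history.get(user_id, [])
    times = [t for t, _ in history]
    idx = bisect_right(times, as_of)  # count of rows with event_time <= as_of
    return history[idx - 1][1] if idx else None

# Label rows: (user_id, timestamp at which the prediction would have been made)
labels = [
    ("u1", datetime(2024, 2, 10)),
    ("u1", datetime(2023, 12, 31)),
    ("u2", datetime(2024, 3, 1)),
]
training_rows = [(uid, ts, point_in_time_lookup(uid, ts)) for uid, ts in labels]
```

The first label picks up the February value rather than the later March one, and the second label gets no value at all because nothing had been observed yet. A naive join on `user_id` alone would have matched all three history rows for `u1`.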

Interactive SQL REPL — Auto-registers parquets, PostgreSQL, and Iceberg sources at startup. Query pipeline outputs, explore data, iterate on SQL — all without leaving the terminal.

Documentation

Getting Started: Installation, configuration, first pipeline
CLI Reference: All commands and flags
YAML Schema: Pipeline YAML reference
Tutorials: YAML Pipelines · Python Pipelines · Mixed
Guides: Testing & Audits · Iceberg Materialization · Training to Serving
Concepts: Point-in-Time Joins · Virtual Environments · Glossary

Install from Source

For development or contributing:

git clone https://github.com/mta-tech/seeknal.git
cd seeknal
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[all]"

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

License

Seeknal is Apache 2.0 licensed.
