Allow SKLearn predictions to run on database systems in pure SQL.

OrbitalML

Convert SKLearn pipelines into SQL queries for execution in a database without the need for a Python environment.
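This works because a fitted linear model is just arithmetic over columns, and any SQL engine can evaluate that. The sketch below illustrates the idea with a hypothetical helper, `linear_model_to_sql` (not part of OrbitalML), made-up model parameters, and Python's built-in `sqlite3` module standing in for a real database. It is not how OrbitalML builds its queries.

```python
import sqlite3


def linear_model_to_sql(table, columns, means, coefs, intercept):
    """Hypothetical helper: render a centered linear model as a single SQL SELECT."""
    # One "(column - mean) * coefficient" term per feature, then the intercept.
    terms = [
        f'("{col}" - {mean!r}) * {coef!r}'
        for col, mean, coef in zip(columns, means, coefs)
    ]
    expression = " + ".join(terms + [repr(intercept)])
    return f'SELECT {expression} AS "prediction" FROM "{table}"'


# Made-up model parameters for two features.
columns = ["x1", "x2"]
means = [1.0, 2.0]
coefs = [0.5, -0.25]
intercept = 3.0

sql = linear_model_to_sql("DATA_TABLE", columns, means, coefs, intercept)

# An in-memory SQLite database stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "DATA_TABLE" ("x1" REAL, "x2" REAL)')
rows = [(2.0, 4.0), (0.0, 2.0)]
conn.executemany('INSERT INTO "DATA_TABLE" VALUES (?, ?)', rows)

sql_predictions = [value for (value,) in conn.execute(sql)]

# The same computation in Python, for comparison.
python_predictions = [
    sum((x - m) * c for x, m, c in zip(row, means, coefs)) + intercept
    for row in rows
]
print(sql)
print(sql_predictions, python_predictions)
```

Once the model is reduced to such an expression, no Python is needed at prediction time; the database does all the work.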

See the examples directory for example pipelines, and the Documentation for more details.

Warning:

This is a work in progress.
You might encounter bugs or missing features.

Note:

Not all transformations and models can be represented as SQL queries,
so OrbitalML might not be able to implement the specific pipeline you are using.

Getting Started

Install OrbitalML:

$ pip install orbitalml

Prepare some data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

COLUMNS = ["sepal.length", "sepal.width", "petal.length", "petal.width"]

iris = load_iris(as_frame=True)
iris_x = iris.data.set_axis(COLUMNS, axis=1)

# Dots in column names cause problems in SQL and in OrbitalML, so replace them with underscores
iris_x.columns = COLUMNS = [cname.replace(".", "_") for cname in COLUMNS]

X_train, X_test, y_train, y_test = train_test_split(
    iris_x, iris.target, test_size=0.2, random_state=42
)
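Why the renaming matters: in SQL, an unquoted dotted name such as `sepal.length` is read as the column `length` of a table called `sepal`. The sketch below demonstrates this with Python's built-in `sqlite3`, used only as a convenient stand-in database; other engines behave similarly, though their error messages differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A dotted column name can only be created (and referenced) when quoted.
conn.execute('CREATE TABLE iris ("sepal.length" REAL)')
conn.execute("INSERT INTO iris VALUES (5.1)")

try:
    # Unquoted, this is parsed as table "sepal", column "length".
    conn.execute("SELECT sepal.length FROM iris")
    unquoted_ok = True
except sqlite3.OperationalError as exc:
    unquoted_ok = False
    print("Unquoted reference failed:", exc)

# Quoting works, but every query touching the column would need it.
quoted_value = conn.execute('SELECT "sepal.length" FROM iris').fetchone()[0]

# Renaming sidesteps the problem, as in the snippet above.
renamed = [name.replace(".", "_") for name in ["sepal.length", "sepal.width"]]
print(quoted_value, renamed)
```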

Define a Scikit-Learn pipeline and train it:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(
    [
        (
            "preprocess",
            ColumnTransformer(
                [("scaler", StandardScaler(with_std=False), COLUMNS)],
                remainder="passthrough",
            ),
        ),
        ("linear_regression", LinearRegression()),
    ]
)
pipeline.fit(X_train, y_train)
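Since `StandardScaler(with_std=False)` only subtracts each feature's training mean (it does not divide by the standard deviation), and all four columns go through the scaler, the fitted pipeline reduces to one linear formula per row. Here x_i are the four features, mu_i the means learned by the scaler, and w_i and b the `coef_` and `intercept_` of the fitted `LinearRegression`:

```latex
\hat{y} = b + \sum_{i=1}^{4} w_i \,(x_i - \mu_i)
```

Each (x_i - mu_i) * w_i term corresponds to one line of the SQL query generated later in this guide.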

Convert the pipeline to OrbitalML:

import orbitalml
import orbitalml.types

orbitalml_pipeline = orbitalml.parse_pipeline(pipeline, features={
    "sepal_length": orbitalml.types.DoubleColumnType(),
    "sepal_width": orbitalml.types.DoubleColumnType(),
    "petal_length": orbitalml.types.DoubleColumnType(),
    "petal_width": orbitalml.types.DoubleColumnType(),
})

You can print the pipeline to see the result:

>>> print(orbitalml_pipeline)

ParsedPipeline(
    features={
        sepal_length: DoubleColumnType()
        sepal_width: DoubleColumnType()
        petal_length: DoubleColumnType()
        petal_width: DoubleColumnType()
    },
    steps=[
        merged_columns=Concat(
            inputs: sepal_length, sepal_width, petal_length, petal_width,
            attributes: 
             axis=1
        )
        variable1=Sub(
            inputs: merged_columns, Su_Subcst=[5.809166666666666, 3.0616666666666665, 3.7266666666666666, 1.18333333...,
            attributes: 
        )
        multiplied=MatMul(
            inputs: variable1, coef=[-0.11633479416518255, -0.05977785171980231, 0.25491374699772246, 0.5475959...,
            attributes: 
        )
        resh=Add(
            inputs: multiplied, intercept=[0.9916666666666668],
            attributes: 
        )
        variable=Reshape(
            inputs: resh, shape_tensor=[-1, 1],
            attributes: 
        )
    ],
)
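To read the steps above: each one is an ONNX-style operation applied to all rows at once. The sketch below re-evaluates the chain in plain Python for two illustrative rows, using the untruncated constants from the generated SQL shown further down. The helper names `run_steps` and `as_one_expression` are hypothetical; this is not how OrbitalML evaluates anything. It only shows why the chain collapses into one arithmetic expression per row.

```python
import math

# Constants copied from the generated SQL further down (the printed graph truncates some).
MEANS = [5.809166666666666, 3.0616666666666665, 3.7266666666666666, 1.1833333333333333]
COEFS = [-0.11633479416518255, -0.05977785171980231, 0.25491374699772246, 0.5475959809777828]
INTERCEPT = 0.9916666666666668


def run_steps(rows):
    """Hypothetical helper: apply the Concat -> Sub -> MatMul -> Add -> Reshape chain."""
    # Concat: each input row already holds the four feature values side by side.
    merged_columns = [list(row) for row in rows]
    # Sub: subtract the per-feature training mean.
    variable1 = [[x - m for x, m in zip(row, MEANS)] for row in merged_columns]
    # MatMul: dot product of each centered row with the coefficient vector.
    multiplied = [sum(x * c for x, c in zip(row, COEFS)) for row in variable1]
    # Add: add the intercept.
    resh = [value + INTERCEPT for value in multiplied]
    # Reshape with shape [-1, 1]: one single-element row per input row.
    return [[value] for value in resh]


def as_one_expression(row):
    """The same arithmetic written as a single expression, like the SQL SELECT."""
    return sum((x - m) * c for x, m, c in zip(row, MEANS, COEFS)) + INTERCEPT


sample_rows = [(5.1, 3.5, 1.4, 0.2), (6.3, 3.3, 6.0, 2.5)]
graph_out = run_steps(sample_rows)
expr_out = [as_one_expression(row) for row in sample_rows]
matches = all(math.isclose(g[0], e) for g, e in zip(graph_out, expr_out))
print(graph_out, expr_out, matches)
```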

Now we can generate the SQL from the pipeline:

sql = orbitalml.export_sql("DATA_TABLE", orbitalml_pipeline, dialect="duckdb")

And check the resulting query:

>>> print(sql)

SELECT ("t0"."sepal_length" - 5.809166666666666) * -0.11633479416518255 + 0.9916666666666668 +  
       ("t0"."sepal_width" - 3.0616666666666665) * -0.05977785171980231 + 
       ("t0"."petal_length" - 3.7266666666666666) * 0.25491374699772246 + 
       ("t0"."petal_width" - 1.1833333333333333) * 0.5475959809777828 
AS "variable" FROM "DATA_TABLE" AS "t0"

Once the SQL is generated, you can use it to run the pipeline on a database. From here on, the SQL can be exported and reused elsewhere:

>>> import duckdb
>>> print("\nPrediction with SQL")
>>> duckdb.register("DATA_TABLE", X_test)
>>> print(duckdb.sql(sql).df()["variable"][:5].to_numpy())

Prediction with SQL
[ 1.23071715 -0.04010441  2.21970287  1.34966889  1.28429336]
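The query is plain SQL, so it is not tied to the Python session that produced it. As a sketch, it can be run unchanged with Python's built-in `sqlite3` module on a couple of illustrative rows. This works here because the query uses only quoted identifiers and arithmetic; a query generated for one dialect is not guaranteed to run on another engine, so where possible generate it with the `dialect` of the engine you actually target.

```python
import math
import sqlite3

# The query exactly as printed above (it was generated with dialect="duckdb").
SQL = """
SELECT ("t0"."sepal_length" - 5.809166666666666) * -0.11633479416518255 + 0.9916666666666668 +
       ("t0"."sepal_width" - 3.0616666666666665) * -0.05977785171980231 +
       ("t0"."petal_length" - 3.7266666666666666) * 0.25491374699772246 +
       ("t0"."petal_width" - 1.1833333333333333) * 0.5475959809777828
AS "variable" FROM "DATA_TABLE" AS "t0"
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "DATA_TABLE" ('
    '"sepal_length" REAL, "sepal_width" REAL, "petal_length" REAL, "petal_width" REAL)'
)
# Illustrative rows; in practice DATA_TABLE would already hold your data.
rows = [(5.1, 3.5, 1.4, 0.2), (6.3, 3.3, 6.0, 2.5)]
conn.executemany('INSERT INTO "DATA_TABLE" VALUES (?, ?, ?, ?)', rows)
sql_predictions = [value for (value,) in conn.execute(SQL)]

# Cross-check: the same formula evaluated in Python with the constants from the query.
means = [5.809166666666666, 3.0616666666666665, 3.7266666666666666, 1.1833333333333333]
coefs = [-0.11633479416518255, -0.05977785171980231, 0.25491374699772246, 0.5475959809777828]
intercept = 0.9916666666666668
expected = [
    sum((x - m) * c for x, m, c in zip(row, means, coefs)) + intercept for row in rows
]
# Sort both sides, since a SELECT without ORDER BY does not promise a row order.
matches = all(
    math.isclose(a, b) for a, b in zip(sorted(sql_predictions), sorted(expected))
)
print(sql_predictions, matches)
```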

We can verify that the predictions match those of scikit-learn by running the original pipeline on the same data:

>>> print("\nPrediction with SciKit-Learn")
>>> print(pipeline.predict(X_test)[:5])

Prediction with SciKit-Learn
[ 1.23071715 -0.04010441  2.21970287  1.34966889  1.28429336]
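Comparing a few printed values by eye is a quick sanity check. In a test it is safer to compare every row with a tolerance, since the database may add the terms in a different order than NumPy (the SQL above adds the intercept after the first term) and produce tiny floating-point differences. A minimal standard-library sketch; the helper name `predictions_match` is hypothetical, and with NumPy available `numpy.testing.assert_allclose` does the same job:

```python
import math

# The first five predictions as printed above.
sql_predictions = [1.23071715, -0.04010441, 2.21970287, 1.34966889, 1.28429336]
sklearn_predictions = [1.23071715, -0.04010441, 2.21970287, 1.34966889, 1.28429336]


def predictions_match(left, right, rel_tol=1e-7, abs_tol=1e-9):
    """Hypothetical helper: same length and element-wise equal within tolerance."""
    if len(left) != len(right):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(left, right)
    )


print(predictions_match(sql_predictions, sklearn_predictions))
```

In a real check you would pass the full result columns, for example `duckdb.sql(sql).df()["variable"].to_numpy()` and `pipeline.predict(X_test)`, rather than the rounded printed values.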

Supported Models

OrbitalML currently supports the following models:

Linear Regression
Logistic Regression
Lasso Regression
Elastic Net
Decision Tree Regressor
Decision Tree Classifier
Random Forest Classifier
Gradient Boosting Regressor
Gradient Boosting Classifier
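Tree-based models also map naturally to SQL, because every split is a comparison, so a tree can be written as nested CASE WHEN expressions (ensembles such as random forests and gradient boosting combine many such trees). The sketch below converts a tiny hand-written tree in a hypothetical dict format (not scikit-learn's `tree_` structure, and not OrbitalML's actual output) and runs it with `sqlite3`:

```python
import sqlite3

# Hypothetical tree: internal nodes split on feature <= threshold, leaves hold a value.
TREE = {
    "feature": "petal_length",
    "threshold": 2.45,
    "left": {"value": 0},
    "right": {
        "feature": "petal_width",
        "threshold": 1.75,
        "left": {"value": 1},
        "right": {"value": 2},
    },
}


def tree_to_sql(node):
    """Recursively render a tree node as a nested CASE WHEN expression."""
    if "value" in node:
        return repr(node["value"])
    return (
        f'CASE WHEN "{node["feature"]}" <= {node["threshold"]!r} '
        f'THEN {tree_to_sql(node["left"])} '
        f'ELSE {tree_to_sql(node["right"])} END'
    )


expr = tree_to_sql(TREE)
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "DATA_TABLE" ("petal_length" REAL, "petal_width" REAL)')
conn.executemany(
    'INSERT INTO "DATA_TABLE" VALUES (?, ?)',
    [(1.4, 0.2), (4.5, 1.5), (6.0, 2.5)],
)
labels = [label for (label,) in conn.execute(f'SELECT {expr} AS "label" FROM "DATA_TABLE"')]
print(expr)
print(labels)
```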

Testing

Set up the testing environment:

$ uv sync --no-dev --extra test

Run Tests:

$ uv run pytest -v

Try Examples:

$ uv run examples/pipeline_lineareg.py

Development

Set up a development environment:

$ uv sync

Release history

0.2.2 (Jun 18, 2025)
0.2.1 (Jun 11, 2025)
0.2.0 (May 20, 2025), this version
0.1.0 (May 7, 2025)

Download files

Source Distribution

orbitalml-0.2.0.tar.gz (41.4 kB)

Uploaded May 20, 2025 Source

Built Distribution

orbitalml-0.2.0-py3-none-any.whl (53.3 kB)

Uploaded May 20, 2025 Python 3

File details

Details for the file orbitalml-0.2.0.tar.gz.

File metadata

Download URL: orbitalml-0.2.0.tar.gz
Upload date: May 20, 2025
Size: 41.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for orbitalml-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`7aee54d61c94af31e457999ff48c71b3c3698d6eae26aa6a73e12bb094c6f167`
MD5	`33bac50111ee38d7c2c08ed190832409`
BLAKE2b-256	`33f21537100708ddaf0cbd2c3672184a464e03470fd64b5d22e6f2dea12e763b`

File details

Details for the file orbitalml-0.2.0-py3-none-any.whl.

File metadata

Download URL: orbitalml-0.2.0-py3-none-any.whl
Upload date: May 20, 2025
Size: 53.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for orbitalml-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b25b0a079204a5445e187d18da1a3be6c265d48d53385ee6fa26a6f2fdc7f1ab`
MD5	`b0fc50d4a6c9a41758dacf8a9cf167bd`
BLAKE2b-256	`09f88d41527c1783a1223227b0e832c71f4dc2ad6a613eae8c32be274227bb5b`
