
Polars Least Squares Extension

Project description

Polars OLS

Least squares extension in Polars

Supports linear model estimation in Polars.

This package provides efficient rust implementations of common linear regression variants (OLS, WLS, Ridge, Elastic Net, Non-negative least squares, Recursive least squares) and exposes them as simple polars expressions which can easily be integrated into your workflow.

Why?

  1. High Performance: implementations are written in rust and make use of optimized rust linear-algebra crates & LAPACK routines. See benchmark section.
  2. Polars Integration: avoids unnecessary conversions from lazy to eager mode and to external libraries (e.g. numpy, sklearn) to do simple linear regressions. Chain least squares formulae like any other expression in polars.
  3. Efficient Implementations:
    • Numerically stable algorithms are chosen where appropriate (e.g. QR, Cholesky).
    • Flexible model specification allows arbitrary combination of sample weighting, L1/L2 regularization, & non-negativity constraints on parameters.
    • Efficient rank-1 update algorithms used for moving window regressions.
  4. Easy Parallelism: computing OLS predictions in parallel across groups could not be easier: call .over() or group_by just like any other polars expression and benefit from full Rust parallelism.
  5. Formula API: supports building models via patsy syntax: y ~ x1 + x2 + x3:x4 -1 (like statsmodels) which automatically converts to equivalent polars expressions.
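The rank-1 updates mentioned in point 3 can be sketched in a few lines of NumPy. This is an illustration of the recursive least squares recursion (a Sherman–Morrison update of the inverse Gram matrix), not the package's actual Rust internals; all variable names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=200)

# P tracks (X'X)^-1; a large initial P acts as a weak prior.
P = np.eye(3) * 1e6
beta = np.zeros(3)
for x_t, y_t in zip(X, y):
    Px = P @ x_t
    k = Px / (1.0 + x_t @ Px)           # gain vector
    beta = beta + k * (y_t - x_t @ beta)  # correct by the prediction error
    P = P - np.outer(k, Px)             # Sherman-Morrison rank-1 downdate

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta, beta_ols, atol=1e-3))
```

Each step costs O(k^2) in the number of features instead of refitting from scratch, and with a weak prior the recursive estimate converges to the plain OLS solution after seeing all samples.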

Installation

First, you need to install Polars. Then run the below to install the polars-ols extension:

pip install polars-ols

API & Examples

Importing polars_ols will register the namespace least_squares provided by this package. You can build models either by specifying polars expressions (e.g. pl.col(...)) for your targets and features, or by using the formula API (patsy syntax). All models support the following general (optional) arguments:

  • mode - a literal which determines the type of output produced by the model
  • null_policy - a literal which determines how to deal with missing data
  • add_intercept - a boolean specifying if an intercept feature should be added to the features
  • sample_weights - a column or expression providing non-negative weights applied to the samples

Remaining parameters are model specific, for example the alpha penalty parameter used by the regularized least squares models.
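As an aside on sample_weights: weighted least squares is mathematically equivalent to scaling each row of the design matrix and target by the square root of its weight and running plain OLS. A minimal NumPy sketch of that equivalence (illustrative only, not the package's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([0.5, -1.0]) + 0.1 * rng.normal(size=100)
w = rng.uniform(0.1, 1.0, size=100)   # non-negative sample weights

# WLS via the weighted normal equations: (X'WX) beta = X'Wy
beta_normal = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Equivalent: scale rows by sqrt(w) and run plain OLS
sw = np.sqrt(w)
beta_scaled = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]

print(np.allclose(beta_normal, beta_scaled))
```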

See below for basic usage examples. Please refer to the tests or demo notebook for detailed examples.

import polars as pl
import polars_ols as pls  # registers 'least_squares' namespace

df = pl.DataFrame({"y": [1.16, -2.16, -1.57, 0.21, 0.22, 1.6, -2.11, -2.92, -0.86, 0.47],
                   "x1": [0.72, -2.43, -0.63, 0.05, -0.07, 0.65, -0.02, -1.64, -0.92, -0.27],
                   "x2": [0.24, 0.18, -0.95, 0.23, 0.44, 1.01, -2.08, -1.36, 0.01, 0.75],
                   "group": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   "weights": [0.34, 0.97, 0.39, 0.8, 0.57, 0.41, 0.19, 0.87, 0.06, 0.34],
                   })

lasso_expr = pl.col("y").least_squares.lasso(pl.col("x1"), pl.col("x2"), alpha=0.0001, add_intercept=True).over("group")
wls_expr = pls.least_squares_from_formula("y ~ x1 + x2 -1", sample_weights=pl.col("weights"))

predictions = df.with_columns(lasso_expr.round(2).alias("predictions_lasso"),
                              wls_expr.round(2).alias("predictions_wls"))

print(predictions.head(5))
shape: (5, 7)
┌───────┬───────┬───────┬───────┬─────────┬───────────────────┬─────────────────┐
│ y     ┆ x1    ┆ x2    ┆ group ┆ weights ┆ predictions_lasso ┆ predictions_wls │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---     ┆ ---               ┆ ---             │
│ f64   ┆ f64   ┆ f64   ┆ i64   ┆ f64     ┆ f32               ┆ f32             │
╞═══════╪═══════╪═══════╪═══════╪═════════╪═══════════════════╪═════════════════╡
│ 1.16  ┆ 0.72  ┆ 0.24  ┆ 1     ┆ 0.34    ┆ 0.97              ┆ 0.93            │
│ -2.16 ┆ -2.43 ┆ 0.18  ┆ 1     ┆ 0.97    ┆ -2.23             ┆ -2.18           │
│ -1.57 ┆ -0.63 ┆ -0.95 ┆ 1     ┆ 0.39    ┆ -1.54             ┆ -1.54           │
│ 0.21  ┆ 0.05  ┆ 0.23  ┆ 1     ┆ 0.8     ┆ 0.29              ┆ 0.27            │
│ 0.22  ┆ -0.07 ┆ 0.44  ┆ 1     ┆ 0.57    ┆ 0.37              ┆ 0.36            │
└───────┴───────┴───────┴───────┴─────────┴───────────────────┴─────────────────┘

The mode parameter sets the type of output returned by all methods ("predictions", "residuals", or "coefficients"). It defaults to returning predictions matching the input's length.

If "coefficients" is set, the output is a polars struct with coefficients as values and feature names as fields. Its output shape 'broadcasts' depending on context; see below:

coefficients = df.select(pl.col("y").least_squares.from_formula("x1 + x2", mode="coefficients")
                         .alias("coefficients"))

coefficients_group = df.select("group", pl.col("y").least_squares.from_formula("x1 + x2", mode="coefficients").over("group")
                        .alias("coefficients_group")).unique(maintain_order=True)

print(coefficients)
print(coefficients_group)
shape: (1, 1)
┌──────────────────────────────┐
│ coefficients                 │
│ ---                          │
│ struct[3]                    │
╞══════════════════════════════╡
│ {0.977375,0.987413,0.000757} │  # <--- coef for x1, x2, and intercept added by formula API
└──────────────────────────────┘
shape: (2, 2)
┌───────┬───────────────────────────────┐
│ group ┆ coefficients_group            │
│ ---   ┆ ---                           │
│ i64   ┆ struct[3]                     │
╞═══════╪═══════════════════════════════╡
│ 1     ┆ {0.995157,0.977495,0.014344}  │
│ 2     ┆ {0.939217,0.997441,-0.017599} │  # <--- (unique) coefficients per group
└───────┴───────────────────────────────┘

For dynamic models (like rolling_ols), or when used in a .over, .group_by, or .with_columns context, the coefficients take the shape of the data they are applied to. For example:

coefficients = df.with_columns(pl.col("y").least_squares.rls(pl.col("x1"), pl.col("x2"), mode="coefficients")
                         .over("group").alias("coefficients"))

print(coefficients.head())
shape: (5, 6)
┌───────┬───────┬───────┬───────┬─────────┬─────────────────────┐
│ y     ┆ x1    ┆ x2    ┆ group ┆ weights ┆ coefficients        │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---     ┆ ---                 │
│ f64   ┆ f64   ┆ f64   ┆ i64   ┆ f64     ┆ struct[2]           │
╞═══════╪═══════╪═══════╪═══════╪═════════╪═════════════════════╡
│ 1.16  ┆ 0.72  ┆ 0.24  ┆ 1     ┆ 0.34    ┆ {1.235503,0.411834} │
│ -2.16 ┆ -2.43 ┆ 0.18  ┆ 1     ┆ 0.97    ┆ {0.963515,0.760769} │
│ -1.57 ┆ -0.63 ┆ -0.95 ┆ 1     ┆ 0.39    ┆ {0.975484,0.966029} │
│ 0.21  ┆ 0.05  ┆ 0.23  ┆ 1     ┆ 0.8     ┆ {0.975657,0.953735} │
│ 0.22  ┆ -0.07 ┆ 0.44  ┆ 1     ┆ 0.57    ┆ {0.97898,0.909793}  │
└───────┴───────┴───────┴───────┴─────────┴─────────────────────┘

Finally, for convenience, you can compute out-of-sample predictions with least_squares.{predict, predict_from_formula}. This saves you the effort of un-nesting the coefficients and doing the dot product in Python: it is instead done in Rust, as an expression. Usage is as follows:

df_test.select(pl.col("coefficients_train").least_squares.predict(pl.col("x1"), pl.col("x2")).alias("predictions_test"))
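Conceptually, predict is just that un-nesting plus a dot product of the trained coefficients with the test features. A plain NumPy sketch of the equivalent computation (the coefficient values and column names here are made up for illustration):

```python
import numpy as np

# Coefficients fitted on a training set (values illustrative)
coef = {"x1": 0.98, "x2": 0.99}

# Out-of-sample features
x1 = np.array([0.3, -1.2, 0.8])
x2 = np.array([1.1, 0.4, -0.6])

# The prediction is the per-row dot product of coefficients and features
predictions_test = coef["x1"] * x1 + coef["x2"] * x2
print(predictions_test)
```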

Supported Models

Currently, this extension package supports the following variants:

  • Ordinary Least Squares: least_squares.ols
  • Weighted Least Squares: least_squares.wls
  • Regularized Least Squares (Lasso / Ridge / Elastic Net) least_squares.{lasso, ridge, elastic_net}
  • Non-negative Least Squares: least_squares.nnls

As well as efficient implementations of moving window models:

  • Recursive Least Squares: least_squares.rls
  • Rolling / Expanding Window OLS: least_squares.{rolling_ols, expanding_ols}

An arbitrary combination of sample_weights, L1/L2 penalties, and non-negativity constraints can be specified with the least_squares.from_formula and least_squares.least_squares entry points.
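One reason these combinations compose cleanly is that each modification keeps the problem a least-squares solve. For instance, an L2 (Ridge) penalty is equivalent to plain OLS on a data-augmented system, as this NumPy sketch shows (illustrative only; the package itself uses optimized LAPACK/faer routines):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=50)
alpha = 0.5

# Ridge closed form: (X'X + alpha*I)^-1 X'y
beta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(4), X.T @ y)

# Same solution via plain least squares on an augmented system:
# append sqrt(alpha)*I rows to X and zeros to y
X_aug = np.vstack([X, np.sqrt(alpha) * np.eye(4)])
y_aug = np.concatenate([y, np.zeros(4)])
beta_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]

print(np.allclose(beta_closed, beta_aug))
```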

Benchmark

The usual caveats of benchmarks apply here, but the below should still be indicative of the type of performance improvements to expect when using this package.

This benchmark was run on randomly generated data with pyperf on my Apple M2 Max MacBook (32GB RAM, macOS Sonoma 14.2.1). See benchmark.py for the implementation.

n_samples=2_000, n_features=5

Model                   | polars_ols        | Python Benchmark  | Benchmark Type | Speed-up vs Python Benchmark
Least Squares           | 283 ± 4 us        | 509 ± 313 us      | Numpy          | 1.8x
Ridge                   | 262 ± 3 us        | 369 ± 231 us      | Numpy          | 1.4x
Weighted Least Squares  | 493 ± 7 us        | 2.13 ms ± 0.22 ms | Statsmodels    | 4.3x
Elastic Net             | 326 ± 3 us        | 87.3 ms ± 9.0 ms  | Sklearn        | 268.2x
Recursive Least Squares | 1.39 ms ± 0.01 ms | 18.7 ms ± 1.4 ms  | Statsmodels    | 13.5x
Rolling Least Squares   | 2.72 ms ± 0.03 ms | 22.3 ms ± 0.2 ms  | Statsmodels    | 8.2x

n_samples=10_000, n_features=100

Model                   | polars_ols        | Python Benchmark    | Benchmark Type | Speed-up vs Python Benchmark
Least Squares           | 15.6 ms ± 0.2 ms  | 29.9 ms ± 8.6 ms    | Numpy          | 1.9x
Ridge                   | 5.81 ms ± 0.05 ms | 5.21 ms ± 0.94 ms   | Numpy          | 0.9x
Weighted Least Squares  | 16.8 ms ± 0.2 ms  | 82.4 ms ± 9.1 ms    | Statsmodels    | 4.9x
Elastic Net             | 20.9 ms ± 0.3 ms  | 134 ms ± 21 ms      | Sklearn        | 6.4x
Recursive Least Squares | 163 ms ± 28 ms    | 65.7 sec ± 28.2 sec | Statsmodels    | 403.1x
Rolling Least Squares   | 390 ms ± 10 ms    | 3.99 sec ± 0.54 sec | Statsmodels    | 10.2x

Numpy's lstsq is already a highly optimized call into LAPACK, so the scope for speed-up is limited. However, substantial speed-ups are achievable for the more complex models by working entirely in Rust and avoiding the overhead of round trips into Python.

Expect an additional relative order-of-magnitude speed-up in your workflow if it involves repeated re-estimation of models in (Python) loops.

Credits & Related Projects

  • Rust linear algebra libraries faer and ndarray support the implementations provided by this extension package
  • This package was templated around the very helpful: polars-plugin-tutorial
  • The python package patsy is used for (optionally) building models from formulae
  • Please check out the extension package polars-ds for general data-science functionality in polars

Future Work / TODOs

  • Support generic types in the Rust implementations so that both f32 and f64 inputs are handled natively. Right now data is cast to f32 prior to estimation
  • Handle nulls more clearly
  • Add more detailed documentation on supported models, signatures, and API

