
Polars OLS

A least squares extension for Polars, supporting linear model estimation.

This package provides efficient Rust implementations of common linear regression variants (OLS, WLS, ridge, elastic net, non-negative least squares, recursive least squares) and exposes them as simple Polars expressions which can easily be integrated into your workflow.

Why?

  1. High Performance: implementations are written in Rust and make use of optimized Rust linear-algebra crates & LAPACK routines. See the benchmark section below.
  2. Polars Integration: avoids unnecessary conversions from lazy to eager mode and round-trips to external libraries (e.g. numpy, sklearn) for simple linear regressions. Chain least squares formulae like any other expression in Polars.
  3. Efficient Implementations:
    • Numerically stable algorithms are chosen where appropriate (e.g. QR, Cholesky).
    • Flexible model specification allows arbitrary combinations of sample weighting, L1/L2 regularization, & non-negativity constraints on parameters.
    • Efficient rank-1 update algorithms are used for moving window regressions.
  4. Easy Parallelism: computing OLS predictions in parallel across groups could not be easier: call .over() or group_by just like any other Polars expression and benefit from full Rust parallelism.
  5. Formula API: supports building models via patsy syntax: y ~ x1 + x2 + x3:x4 -1 (as in statsmodels), which is automatically converted to equivalent Polars expressions.
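
The rank-1 update idea behind the moving-window models can be illustrated outside the package. The sketch below (plain NumPy, not this package's internals) maintains the normal equations X'X and X'y for the current window, adding the incoming row and subtracting the outgoing one instead of refitting from scratch:

```python
import numpy as np

def rolling_ols(X, y, window):
    """Rolling-window OLS via rank-1 updates of the normal equations.

    Rather than refitting on each window, keep G = X'X and b = X'y for
    the current window: add the incoming sample, subtract the outgoing one.
    """
    n, k = X.shape
    coefs = np.full((n, k), np.nan)
    G = np.zeros((k, k))
    b = np.zeros(k)
    for i in range(n):
        G += np.outer(X[i], X[i])          # rank-1 update: add new sample
        b += X[i] * y[i]
        if i >= window:                    # rank-1 downdate: drop old sample
            G -= np.outer(X[i - window], X[i - window])
            b -= X[i - window] * y[i - window]
        if i >= window - 1:                # solve only once a full window exists
            coefs[i] = np.linalg.solve(G, b)
    return coefs
```

Each step costs O(k²) for the updates plus one small solve, instead of an O(window·k²) refit.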

Installation

First, you need to install Polars. Then run the following to install the polars-ols extension:

pip install polars-ols

API & Examples

Importing polars_ols registers the namespace least_squares provided by this package. You can build models either by specifying polars expressions (e.g. pl.col(...)) for your targets and features, or by using the formula API (patsy syntax). All models support the following general (optional) arguments:

  • mode - a literal which determines the type of output produced by the model
  • add_intercept - a boolean specifying if an intercept feature should be added to the features
  • sample_weights - a column or expression providing non-negative weights applied to the samples

Remaining parameters are model specific, for example the alpha penalty parameter used by regularized least squares models.
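
For intuition, sample_weights corresponds to weighted least squares, which is equivalent to ordinary least squares after scaling each row by the square root of its weight. A minimal NumPy sketch, independent of this package:

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i @ beta)^2.

    Equivalent to OLS on rows of X and y scaled by sqrt(w_i).
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```

With uniform weights this reduces to plain OLS; a very large weight on one sample forces the fit to pass (nearly) through it.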

See below for basic usage examples. Please refer to the tests or demo notebook for detailed examples.

import polars as pl
import polars_ols as pls  # registers 'least_squares' namespace

df = pl.DataFrame({"y": [1.16, -2.16, -1.57, 0.21, 0.22, 1.6, -2.11, -2.92, -0.86, 0.47],
                   "x1": [0.72, -2.43, -0.63, 0.05, -0.07, 0.65, -0.02, -1.64, -0.92, -0.27],
                   "x2": [0.24, 0.18, -0.95, 0.23, 0.44, 1.01, -2.08, -1.36, 0.01, 0.75],
                   "group": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   "weights": [0.34, 0.97, 0.39, 0.8, 0.57, 0.41, 0.19, 0.87, 0.06, 0.34],
                   })

lasso_expr = pl.col("y").least_squares.lasso(pl.col("x1"), pl.col("x2"), alpha=0.0001, add_intercept=True).over("group")
wls_expr = pls.least_squares_from_formula("y ~ x1 + x2 -1", sample_weights=pl.col("weights"))

predictions = df.with_columns(lasso_expr.round(2).alias("predictions_lasso"),
                              wls_expr.round(2).alias("predictions_wls"))

print(predictions.head(5))
shape: (5, 7)
┌───────┬───────┬───────┬───────┬─────────┬───────────────────┬─────────────────┐
│ y     ┆ x1    ┆ x2    ┆ group ┆ weights ┆ predictions_lasso ┆ predictions_wls │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---     ┆ ---               ┆ ---             │
│ f64   ┆ f64   ┆ f64   ┆ i64   ┆ f64     ┆ f32               ┆ f32             │
╞═══════╪═══════╪═══════╪═══════╪═════════╪═══════════════════╪═════════════════╡
│ 1.16  ┆ 0.72  ┆ 0.24  ┆ 1     ┆ 0.34    ┆ 0.97              ┆ 0.93            │
│ -2.16 ┆ -2.43 ┆ 0.18  ┆ 1     ┆ 0.97    ┆ -2.23             ┆ -2.18           │
│ -1.57 ┆ -0.63 ┆ -0.95 ┆ 1     ┆ 0.39    ┆ -1.54             ┆ -1.54           │
│ 0.21  ┆ 0.05  ┆ 0.23  ┆ 1     ┆ 0.8     ┆ 0.29              ┆ 0.27            │
│ 0.22  ┆ -0.07 ┆ 0.44  ┆ 1     ┆ 0.57    ┆ 0.37              ┆ 0.36            │
└───────┴───────┴───────┴───────┴─────────┴───────────────────┴─────────────────┘

The mode parameter sets the type of output returned by all methods ("predictions", "residuals", or "coefficients"). It defaults to returning predictions matching the input's length.

If "coefficients" is set, the output is a polars struct with coefficients as values and feature names as fields. Its output shape 'broadcasts' depending on context; see below:

coefficients = df.select(pl.col("y").least_squares.from_formula("x1 + x2", mode="coefficients")
                         .alias("coefficients"))

coefficients_group = df.select("group", pl.col("y").least_squares.from_formula("x1 + x2", mode="coefficients").over("group")
                        .alias("coefficients_group")).unique(maintain_order=True)

print(coefficients)
print(coefficients_group)
shape: (1, 1)
┌──────────────────────────────┐
│ coefficients                 │
│ ---                          │
│ struct[3]                    │
╞══════════════════════════════╡
│ {0.977375,0.987413,0.000757} │  # <--- coef for x1, x2, and intercept added by formula API
└──────────────────────────────┘
shape: (2, 2)
┌───────┬───────────────────────────────┐
│ group ┆ coefficients_group            │
│ ---   ┆ ---                           │
│ i64   ┆ struct[3]                     │
╞═══════╪═══════════════════════════════╡
│ 1     ┆ {0.995157,0.977495,0.014344}  │
│ 2     ┆ {0.939217,0.997441,-0.017599} │  # <--- (unique) coefficients per group
└───────┴───────────────────────────────┘
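
What .over("group") computes above is one independent regression per group. A plain-NumPy sketch of the same logic (the helper name is hypothetical, not this package's API):

```python
import numpy as np

def ols_per_group(X, y, groups):
    """Fit an independent OLS per group; returns {group: coefficients}."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        out[g], *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return out
```

The package runs these per-group fits in parallel in Rust, whereas a Python loop like this pays interpreter overhead on every group.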

For dynamic models (like rolling_ols), or within a .over, .group_by, or .with_columns context, the coefficients take the shape of the data they are applied on. For example:

coefficients = df.with_columns(pl.col("y").least_squares.rls(pl.col("x1"), pl.col("x2"), mode="coefficients")
                         .over("group").alias("coefficients"))

print(coefficients.head())
shape: (5, 6)
┌───────┬───────┬───────┬───────┬─────────┬─────────────────────┐
│ y     ┆ x1    ┆ x2    ┆ group ┆ weights ┆ coefficients        │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---     ┆ ---                 │
│ f64   ┆ f64   ┆ f64   ┆ i64   ┆ f64     ┆ struct[2]           │
╞═══════╪═══════╪═══════╪═══════╪═════════╪═════════════════════╡
│ 1.16  ┆ 0.72  ┆ 0.24  ┆ 1     ┆ 0.34    ┆ {1.235503,0.411834} │
│ -2.16 ┆ -2.43 ┆ 0.18  ┆ 1     ┆ 0.97    ┆ {0.963515,0.760769} │
│ -1.57 ┆ -0.63 ┆ -0.95 ┆ 1     ┆ 0.39    ┆ {0.975484,0.966029} │
│ 0.21  ┆ 0.05  ┆ 0.23  ┆ 1     ┆ 0.8     ┆ {0.975657,0.953735} │
│ 0.22  ┆ -0.07 ┆ 0.44  ┆ 1     ┆ 0.57    ┆ {0.97898,0.909793}  │
└───────┴───────┴───────┴───────┴─────────┴─────────────────────┘
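
The row-by-row drift of the coefficients above is characteristic of recursive least squares, which folds in one observation at a time. Below is a textbook RLS sketch in NumPy (forgetting factor of 1 and a large initial covariance, so the final estimate approaches batch OLS); it illustrates the algorithm, not this package's internals:

```python
import numpy as np

def rls(X, y, p0=1e6):
    """Recursive least squares: update the coefficient estimate per observation.

    P tracks the (scaled) inverse Gram matrix; k is the Kalman-style gain.
    """
    n, m = X.shape
    beta = np.zeros(m)
    P = np.eye(m) * p0
    history = np.empty((n, m))
    for i in range(n):
        x = X[i]
        k = P @ x / (1.0 + x @ P @ x)        # gain vector
        beta = beta + k * (y[i] - x @ beta)  # correct by the prediction error
        P = P - np.outer(k, x @ P)           # Sherman-Morrison downdate of P
        history[i] = beta
    return history
```

Each update is O(m²), so streaming n samples costs O(n·m²) versus O(n·m²) per refit for a naive re-estimation loop.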

Finally, for convenience, you can compute out-of-sample predictions with least_squares.{predict, predict_from_formula}. This saves you the effort of un-nesting the coefficients and doing the dot product in Python: it is done in Rust, as an expression. Usage is as follows:

df_test.select(pl.col("coefficients_train").least_squares.predict(pl.col("x1"), pl.col("x2")).alias("predictions_test"))
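
Under the hood, out-of-sample prediction is just the dot product of the held-out features with the trained coefficients. In NumPy terms (the coefficient values below are made up for illustration):

```python
import numpy as np

# Coefficients estimated on a training set (hypothetical values).
beta_train = np.array([0.98, 0.99, 0.01])   # x1, x2, intercept

# Held-out features, with a column of ones matching the intercept term.
X_test = np.array([[0.5, -0.2, 1.0],
                   [1.1,  0.3, 1.0]])

predictions_test = X_test @ beta_train
```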

Supported Models

Currently, this extension package supports the following variants:

  • Ordinary Least Squares: least_squares.ols
  • Weighted Least Squares: least_squares.wls
  • Regularized Least Squares (Lasso / Ridge / Elastic Net) least_squares.{lasso, ridge, elastic_net}
  • Non-negative Least Squares: least_squares.nnls

As well as efficient implementations of moving window models:

  • Recursive Least Squares: least_squares.rls
  • Rolling / Expanding Window OLS: least_squares.{rolling_ols, expanding_ols}

An arbitrary combination of sample_weights, L1/L2 penalties, and non-negativity constraints can be specified with the least_squares.from_formula and least_squares.least_squares entry-points.
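
For reference, the ridge variant (the alpha parameter) has a simple closed form: add alpha to the diagonal of the Gram matrix before solving the normal equations. A minimal NumPy sketch; note that libraries differ on whether alpha is scaled by the sample count, so treat the exact scaling here as an assumption:

```python
import numpy as np

def ridge(X, y, alpha):
    """Solve the regularized normal equations (X'X + alpha * I) beta = X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ y)
```

With alpha = 0 this reduces to OLS; increasing alpha shrinks the coefficients toward zero.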

Benchmark

The usual caveats of benchmarks apply here, but the below should still be indicative of the type of performance improvements to expect when using this package.

This benchmark was run on randomly generated data with pyperf on my Apple M2 Max MacBook (32GB RAM, macOS Sonoma 14.2.1). See benchmark.py for implementation.

n_samples=2_000, n_features=5

| Model                   | polars_ols        | Python Benchmark    | Benchmark Type | Speed-up vs Python Benchmark |
|-------------------------|-------------------|---------------------|----------------|------------------------------|
| Least Squares           | 283 ± 4 us        | 509 ± 313 us        | Numpy          | 1.8x                         |
| Ridge                   | 262 ± 3 us        | 369 ± 231 us        | Numpy          | 1.4x                         |
| Weighted Least Squares  | 493 ± 7 us        | 2.13 ms ± 0.22 ms   | Statsmodels    | 4.3x                         |
| Elastic Net             | 326 ± 3 us        | 87.3 ms ± 9.0 ms    | Sklearn        | 268.2x                       |
| Recursive Least Squares | 1.39 ms ± 0.01 ms | 22.3 ms ± 0.2 ms    | Statsmodels    | 16.0x                        |
| Rolling Least Squares   | 2.72 ms ± 0.03 ms | 22.3 ms ± 0.2 ms    | Statsmodels    | 8.2x                         |

n_samples=10_000, n_features=100

| Model                   | polars_ols        | Python Benchmark    | Benchmark Type | Speed-up vs Python Benchmark |
|-------------------------|-------------------|---------------------|----------------|------------------------------|
| Least Squares           | 15.6 ms ± 0.2 ms  | 29.9 ms ± 8.6 ms    | Numpy          | 1.9x                         |
| Ridge                   | 5.81 ms ± 0.05 ms | 5.21 ms ± 0.94 ms   | Numpy          | 0.9x                         |
| Weighted Least Squares  | 16.8 ms ± 0.2 ms  | 82.4 ms ± 9.1 ms    | Statsmodels    | 4.9x                         |
| Elastic Net             | 20.9 ms ± 0.3 ms  | 134 ms ± 21 ms      | Sklearn        | 6.4x                         |
| Recursive Least Squares | 163 ms ± 28 ms    | 3.99 sec ± 0.54 sec | Statsmodels    | 24.5x                        |
| Rolling Least Squares   | 390 ms ± 10 ms    | 3.99 sec ± 0.54 sec | Statsmodels    | 10.2x                        |

Numpy's lstsq is already a highly optimized call into LAPACK, so the scope for speed-up is limited. However, substantial speed-ups are achievable for the more complex models by working entirely in Rust and avoiding the overhead of round-trips into Python.

Expect an additional order-of-magnitude speed-up in your workflow if it involves repeated re-estimation of models in (Python) loops.

Credits & Related Projects

  • The Rust linear-algebra libraries faer and ndarray underpin the implementations provided by this extension package
  • This package was templated on the very helpful polars-plugin-tutorial
  • The python package patsy is used for (optionally) building models from formulae
  • Please check out the extension package polars-ds for general data-science functionality in polars

Future Work / TODOs

  • Support generic types in the Rust implementations so that both f32 and f64 inputs are handled; right now data is cast to f32 prior to estimation
  • Handle nulls more clearly
  • Add more detailed documentation on supported models, signatures, and API
