
Polars OLS

Least squares extension in Polars

Supports linear model estimation in Polars.

This package provides efficient Rust implementations of common linear regression variants (OLS, WLS, Ridge, Elastic Net, Non-negative least squares, Recursive least squares) and exposes them as simple Polars expressions which can easily be integrated into your workflow.

Why?

  1. High Performance: implementations are written in Rust and make use of optimized Rust linear-algebra crates & LAPACK routines. See the benchmark section below.
  2. Polars Integration: avoids unnecessary conversions from lazy to eager mode and to external libraries (e.g. numpy, sklearn) for simple linear regressions. Chain least squares formulae like any other expression in Polars.
  3. Efficient Implementations:
    • Numerically stable algorithms are chosen where appropriate (e.g. QR, Cholesky).
    • Flexible model specification allows arbitrary combination of sample weighting, L1/L2 regularization, & non-negativity constraints on parameters.
    • Efficient rank-1 update algorithms used for moving window regressions.
  4. Easy Parallelism: Computing OLS predictions in parallel across groups could not be easier: call .over() or group_by just like any other Polars expression and benefit from full Rust parallelism (see the group_by sketch after the example below).
  5. Formula API: supports building models via patsy syntax: y ~ x1 + x2 + x3:x4 -1 (like statsmodels), which is automatically converted to equivalent polars expressions.

Installation

First, you need to install Polars. Then run the command below to install the polars-ols extension:

pip install polars-ols
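
To confirm the extension is picked up, here is a quick sanity check that simply verifies the expression namespace described in the next section has been registered:

import polars as pl
import polars_ols  # noqa: F401  (importing registers the 'least_squares' namespace)

# The namespace should now be available on any Polars expression.
assert hasattr(pl.col("y"), "least_squares")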

API & Examples

Importing polars_ols will register the namespace least_squares provided by this package. You can build models either by specifying polars expressions (e.g. pl.col(...)) for your targets and features, or by using the formula API (patsy syntax). All models support the following general (optional) arguments:

  • mode - a literal which determines the type of output produced by the model
  • add_intercept - a boolean specifying if an intercept feature should be added to the features
  • sample_weights - a column or expression providing non-negative weights applied to the samples

Remaining parameters are model-specific, for example the alpha penalty parameter used by the regularized least squares models.
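
For illustration, a minimal sketch combining these general arguments on the ols method (with polars imported as pl and polars_ols imported, as above; column names are placeholders):

ols_expr = (
    pl.col("y")
    .least_squares.ols(
        pl.col("x1"), pl.col("x2"),
        mode="predictions",              # "predictions" (default), "residuals" or "coefficients"
        add_intercept=True,              # append a constant feature
        sample_weights=pl.col("w"),      # non-negative per-sample weights
    )
    .alias("ols_predictions")
)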

See below for basic usage examples. Please refer to the tests or demo notebook for detailed examples.

import polars as pl
import polars_ols as pls  # registers 'least_squares' namespace

df = pl.DataFrame({"y": [1.16, -2.16, -1.57, 0.21, 0.22, 1.6, -2.11, -2.92, -0.86, 0.47],
                   "x1": [0.72, -2.43, -0.63, 0.05, -0.07, 0.65, -0.02, -1.64, -0.92, -0.27],
                   "x2": [0.24, 0.18, -0.95, 0.23, 0.44, 1.01, -2.08, -1.36, 0.01, 0.75],
                   "group": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   "weights": [0.34, 0.97, 0.39, 0.8, 0.57, 0.41, 0.19, 0.87, 0.06, 0.34],
                   })

lasso_expr = pl.col("y").least_squares.lasso(pl.col("x1"), pl.col("x2"), alpha=0.0001, add_intercept=True).over("group")
wls_expr = pls.least_squares_from_formula("y ~ x1 + x2 -1", sample_weights=pl.col("weights"))

predictions = df.with_columns(lasso_expr.round(2).alias("predictions_lasso"),
                              wls_expr.round(2).alias("predictions_wls"))

print(predictions.head(5))
shape: (5, 7)
┌───────┬───────┬───────┬───────┬─────────┬───────────────────┬─────────────────┐
│ y     ┆ x1    ┆ x2    ┆ group ┆ weights ┆ predictions_lasso ┆ predictions_wls │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---     ┆ ---               ┆ ---             │
│ f64   ┆ f64   ┆ f64   ┆ i64   ┆ f64     ┆ f32               ┆ f32             │
╞═══════╪═══════╪═══════╪═══════╪═════════╪═══════════════════╪═════════════════╡
│ 1.16  ┆ 0.72  ┆ 0.24  ┆ 1     ┆ 0.34    ┆ 0.97              ┆ 0.93            │
│ -2.16 ┆ -2.43 ┆ 0.18  ┆ 1     ┆ 0.97    ┆ -2.23             ┆ -2.18           │
│ -1.57 ┆ -0.63 ┆ -0.95 ┆ 1     ┆ 0.39    ┆ -1.54             ┆ -1.54           │
│ 0.21  ┆ 0.05  ┆ 0.23  ┆ 1     ┆ 0.8     ┆ 0.29              ┆ 0.27            │
│ 0.22  ┆ -0.07 ┆ 0.44  ┆ 1     ┆ 0.57    ┆ 0.37              ┆ 0.36            │
└───────┴───────┴───────┴───────┴─────────┴───────────────────┴─────────────────┘
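
As noted in item 4 of the list above, the same expressions also work inside an ordinary group_by aggregation. A short sketch reusing the df above to compute coefficients per group (the exact layout of the aggregated output, e.g. a list column per group, isn't documented here):

per_group_coefficients = df.group_by("group").agg(
    pl.col("y")
    .least_squares.ols(pl.col("x1"), pl.col("x2"), mode="coefficients", add_intercept=True)
    .alias("coefficients")
)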

The mode parameter is used to set the type of the output returned by all methods ("predictions", "residuals", "coefficients"). It defaults to returning predictions matching the input's length.

If "coefficients" is set, the output's length equals the number of features specified; see below:

coefficients = df.select(pl.col("y").least_squares.from_formula("x1 + x2", mode="coefficients")
                         .alias("coefficients").round(2))
print(coefficients)
shape: (3, 1)
┌──────────────┐
│ coefficients │
│ ---          │
│ f32          │
╞══════════════╡
│ 0.98         │ <-- x1
│ 0.99         │ <-- x2
│ 0.0          │ <-- intercept added by formula api
└──────────────┘
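
The remaining mode, "residuals", works analogously and returns the regression residuals aligned with the input rows. A brief sketch reusing df from the first example:

residuals = df.with_columns(
    pl.col("y")
    .least_squares.ols(pl.col("x1"), pl.col("x2"), mode="residuals", add_intercept=True)
    .alias("ols_residuals")
)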

Supported Models

Currently, this extension package supports the following variants:

  • Ordinary Least Squares: least_squares.ols
  • Weighted Least Squares: least_squares.wls
  • Regularized Least Squares (Lasso / Ridge / Elastic Net): least_squares.{lasso, ridge, elastic_net}
  • Non-negative Least Squares: least_squares.nnls

As well as efficient implementations of moving window models:

  • Recursive Least Squares: least_squares.rls
  • Rolling / Expanding Window OLS: least_squares.{rolling_ols, expanding_ols}
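
A hedged sketch of a rolling-window regression per group; the window_size keyword is an assumption for illustration only and is not documented on this page, so check the package for the exact signature:

rolling_expr = (
    pl.col("y")
    .least_squares.rolling_ols(pl.col("x1"), pl.col("x2"), window_size=252)  # window_size: hypothetical keyword
    .over("group")
    .alias("rolling_ols_predictions")
)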

An arbitrary combination of sample_weights, L1/L2 penalties, and non-negativity constraints can be specified with the least_squares.from_formula and least_squares.least_squares entry-points.
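
For example, a hedged sketch combining sample weighting with an L1 penalty and an intercept via the lasso method shown earlier (the generic least_squares.least_squares entry-point presumably accepts similar keyword arguments, but its exact signature is not documented here):

weighted_lasso = (
    pl.col("y")
    .least_squares.lasso(
        pl.col("x1"), pl.col("x2"),
        alpha=0.01,                           # L1 penalty strength
        sample_weights=pl.col("weights"),     # non-negative per-sample weights
        add_intercept=True,
    )
    .alias("weighted_lasso_predictions")
)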

Benchmark

The usual caveats of benchmarks apply here, but the results below should still be indicative of the type of performance improvement to expect when using this package.

This benchmark was run on randomly generated data with pyperf on my Apple M2 Max MacBook (32GB RAM, macOS Sonoma 14.2.1). See benchmark.py for the implementation.

n_samples=2_000, n_features=5

Model                    | polars_ols          | Python Benchmark     | Benchmark Type | Speed-up vs Python Benchmark
Least Squares            | 283 ± 4 us          | 509 ± 313 us         | Numpy          | 1.8x
Ridge                    | 262 ± 3 us          | 369 ± 231 us         | Numpy          | 1.4x
Weighted Least Squares   | 493 ± 7 us          | 2.13 ms ± 0.22 ms    | Statsmodels    | 4.3x
Elastic Net              | 326 ± 3 us          | 87.3 ms ± 9.0 ms     | Sklearn        | 268.2x
Recursive Least Squares  | 1.39 ms ± 0.01 ms   | 22.3 ms ± 0.2 ms     | Statsmodels    | 16.0x
Rolling Least Squares    | 2.72 ms ± 0.03 ms   | 22.3 ms ± 0.2 ms     | Statsmodels    | 8.2x

n_samples=10_000, n_features=100

Model                    | polars_ols          | Python Benchmark     | Benchmark Type | Speed-up vs Python Benchmark
Least Squares            | 15.6 ms ± 0.2 ms    | 29.9 ms ± 8.6 ms     | Numpy          | 1.9x
Ridge                    | 5.81 ms ± 0.05 ms   | 5.21 ms ± 0.94 ms    | Numpy          | 0.9x
Weighted Least Squares   | 16.8 ms ± 0.2 ms    | 82.4 ms ± 9.1 ms     | Statsmodels    | 4.9x
Elastic Net              | 20.9 ms ± 0.3 ms    | 134 ms ± 21 ms       | Sklearn        | 6.4x
Recursive Least Squares  | 163 ms ± 28 ms      | 3.99 sec ± 0.54 sec  | Statsmodels    | 24.5x
Rolling Least Squares    | 390 ms ± 10 ms      | 3.99 sec ± 0.54 sec  | Statsmodels    | 10.2x

Numpy's lstsq is already a highly optimized call into LAPACK, so the scope for speed-up is limited. However, substantial speed-ups can be achieved for the more complex models by working entirely in Rust and avoiding the overhead of round-trips between Python and external libraries.

