An implementation of Wilkinson formulas.

Formulaic

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

It provides:

  • high-performance dataframe to model-matrix conversions.
  • support for reusing the encoding choices made during conversion of one dataset on other datasets.
  • extensible formula parsing.
  • extensible data input/output plugins, with implementations for:
    • input:
      • pandas.DataFrame
      • Any dataframe representation supported by narwhals including
        • pyarrow.Table
        • polars.DataFrame
        • ...
    • output:
      • pandas.DataFrame
      • numpy.ndarray
      • scipy.sparse.csc_matrix
      • narwhals dataframe passthrough when using narwhals dataframes.
  • support for symbolic differentiation of formulas (and hence model matrices).
  • and much more.

Example code

import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)

y =

   y
0  0
1  1
2  2

X =

   Intercept  x[T.B]  x[T.C]    z
0        1.0       0       0  0.3
1        1.0       1       0  0.1
2        1.0       0       1  0.2

Note that the above can be shortened to:

from formulaic import model_matrix
model_matrix('y ~ x + z', df)

Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms patsy (the existing implementation for Python) for dense matrices (patsy does not support sparse model matrix output).

For more details, see here.

Related projects and prior art

  • Patsy: a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
  • StatsModels.jl @formula: The implementation of Wilkinson formulas for Julia.
  • R Formulas: The implementation of Wilkinson formulas for R, which is thoroughly introduced here. [R itself is an implementation of S, in which formulas were first made popular].
  • The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. Journal of the Royal Statistical Society, Series C (Applied Statistics) 22(3), pp. 392–399, 1973.

Used by

Below are some of the projects that use Formulaic:

  • Glum: High-performance Python GLMs with all the features.
  • Lifelines: Survival analysis in Python.
  • Linearmodels: Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
  • Pyfixest: Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.
  • Tabmat: Efficient matrix representations for working with tabular data.
  • Add your project here!
