Python bindings for fiasto - A language-agnostic modern Wilkinson's formula parser and lexer

fiasto-py


Pronounced like fiasco, but with a t instead of a c


(F)ormulas (I)n (AST) (O)ut

🎯 Features

  • Parse Wilkinson's Formulas: Convert formula strings into structured JSON metadata
  • Tokenize Formulas: Break down formulas into individual tokens with detailed information
  • Python Dictionaries: Returns native Python dictionaries for easy integration

🎯 Simple API

  • parse_formula() - Takes a Wilkinson's formula string and returns a Python dictionary
  • lex_formula() - Tokenizes a formula string and returns a Python dictionary

🚀 Quick Start

Installation

Install from PyPI (recommended):

pip install fiasto-py

Usage

Usage: Parse Formula

import fiasto_py
from pprint import pprint
# Parse a formula into structured metadata
print("="*30)
print("Parse Formula")
print("="*30)
result = fiasto_py.parse_formula("y ~ x1 + x2 + (1|group)")
pprint(result, compact = True)

Output:

==============================
Parse Formula
==============================
{'all_generated_columns': ['y', 'x1', 'x2', 'group'],
 'columns': {'group': {'generated_columns': ['group'],
                       'id': 4,
                       'interactions': [],
                       'random_effects': [{'correlated': True,
                                           'grouping_variable': 'group',
                                           'has_intercept': True,
                                           'includes_interactions': [],
                                           'kind': 'grouping',
                                           'variables': []}],
                       'roles': ['GroupingVariable'],
                       'transformations': []},
             'x1': {'generated_columns': ['x1'],
                    'id': 2,
                    'interactions': [],
                    'random_effects': [],
                    'roles': ['FixedEffect'],
                    'transformations': []},
             'x2': {'generated_columns': ['x2'],
                    'id': 3,
                    'interactions': [],
                    'random_effects': [],
                    'roles': ['FixedEffect'],
                    'transformations': []},
             'y': {'generated_columns': ['y'],
                   'id': 1,
                   'interactions': [],
                   'random_effects': [],
                   'roles': ['Response'],
                   'transformations': []}},
 'formula': 'y ~ x1 + x2 + (1|group)',
 'metadata': {'family': None,
              'has_intercept': True,
              'has_uncorrelated_slopes_and_intercepts': False,
              'is_random_effects_model': True}}
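The parsed result is an ordinary dictionary, so it can be queried with plain Python. The sketch below is illustrative only: `columns_with_role` is a hypothetical helper (not part of fiasto-py), and `parsed` is a trimmed, hand-copied subset of the output above so the snippet runs without the package installed. The role names are taken from that output; the OLS example later in this README shows 'Identity' for plain predictors, so check what your installed version returns.

```python
# Trimmed copy of the parse_formula() output shown above (only the keys used here).
parsed = {
    "columns": {
        "group": {"id": 4, "roles": ["GroupingVariable"]},
        "x1": {"id": 2, "roles": ["FixedEffect"]},
        "x2": {"id": 3, "roles": ["FixedEffect"]},
        "y": {"id": 1, "roles": ["Response"]},
    },
    "metadata": {"has_intercept": True, "is_random_effects_model": True},
}


def columns_with_role(parsed, role):
    """Hypothetical helper: column names carrying `role`, ordered by their `id`."""
    matches = [
        (details["id"], name)
        for name, details in parsed["columns"].items()
        if role in details["roles"]
    ]
    return [name for _, name in sorted(matches)]


response = columns_with_role(parsed, "Response")
fixed = columns_with_role(parsed, "FixedEffect")
grouping = columns_with_role(parsed, "GroupingVariable")

print(response, fixed, grouping)
```

Sorting by `id` is a design choice for this sketch: it keeps the columns in the order the ids were assigned in the output above rather than relying on dictionary iteration order.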

Usage: Lex Formula

import fiasto_py
from pprint import pprint
# Tokenize a formula into its individual tokens
print("="*30)
print("Lex Formula")
print("="*30)
tokens = fiasto_py.lex_formula("y ~ x1 + x2 + (1|group)")
pprint(tokens, compact = True)

Output:

==============================
Lex Formula
==============================
[{'lexeme': 'y', 'token': 'ColumnName'},
 {'lexeme': '~', 'token': 'Tilde'},
 {'lexeme': 'x1', 'token': 'ColumnName'},
 {'lexeme': '+', 'token': 'Plus'},
 {'lexeme': 'x2', 'token': 'ColumnName'},
 {'lexeme': '+', 'token': 'Plus'},
 {'lexeme': '(', 'token': 'FunctionStart'},
 {'lexeme': '1', 'token': 'One'},
 {'lexeme': '|', 'token': 'Pipe'},
 {'lexeme': 'group', 'token': 'ColumnName'},
 {'lexeme': ')', 'token': 'FunctionEnd'}]
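Each token in the output above is a small dictionary with a `lexeme` and a `token` type, so the stream is easy to post-process. A minimal sketch, using a hand-copied version of that output so it runs without fiasto-py installed; the variable names are illustrative and not part of the library:

```python
from collections import Counter

# Hand-copied from the lex_formula() output shown above.
tokens = [
    {"lexeme": "y", "token": "ColumnName"},
    {"lexeme": "~", "token": "Tilde"},
    {"lexeme": "x1", "token": "ColumnName"},
    {"lexeme": "+", "token": "Plus"},
    {"lexeme": "x2", "token": "ColumnName"},
    {"lexeme": "+", "token": "Plus"},
    {"lexeme": "(", "token": "FunctionStart"},
    {"lexeme": "1", "token": "One"},
    {"lexeme": "|", "token": "Pipe"},
    {"lexeme": "group", "token": "ColumnName"},
    {"lexeme": ")", "token": "FunctionEnd"},
]

# Column names in order of first appearance, without duplicates.
column_names = list(
    dict.fromkeys(t["lexeme"] for t in tokens if t["token"] == "ColumnName")
)

# How often each token type occurs.
token_counts = Counter(t["token"] for t in tokens)

# Joining the lexemes gives the formula back with whitespace removed.
compact_formula = "".join(t["lexeme"] for t in tokens)

print(column_names)
print(token_counts["Plus"])
print(compact_formula)
```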

Simple OLS Regression

import fiasto_py
import polars as pl
import numpy as np
from pprint import pprint

# Load data
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
df = pl.read_csv(mtcars_path)

# Parse formula
formula = "mpg ~ wt + cyl"
result = fiasto_py.parse_formula(formula)

pprint(result)

# Find the response column(s)
response_cols = [
    col for col, details in result["columns"].items()
    if "Response" in details["roles"]
]

# Find non-response columns
preds = [
    col for col, details in result["columns"].items()
    if "Response" not in details["roles"]
]

# Has intercept
has_intercept = result["metadata"]["has_intercept"]

# Prepare data matrices
X = df.select(preds).to_numpy()
y = df.select(response_cols).to_numpy().ravel()

# Add intercept if metadata says so
if has_intercept:
    X_with_intercept = np.column_stack([np.ones(X.shape[0]), X])
else:
    X_with_intercept = X

# Solve normal equations: (X'X)^-1 X'y
XTX = X_with_intercept.T @ X_with_intercept
XTy = X_with_intercept.T @ y
coefficients = np.linalg.solve(XTX, XTy)

# Extract intercept and slopes
if has_intercept:
    intercept = coefficients[0]
    slopes = coefficients[1:]
else:
    intercept = 0.0
    slopes = coefficients

# Calculate R2
y_pred = X_with_intercept @ coefficients
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - (ss_res / ss_tot)

# Prep Output
# Combine intercept and slopes into one dict
coef_dict = {"intercept": intercept} | dict(zip(preds, slopes))

# Create a tidy DataFrame
coef_df = pl.DataFrame(
    {
        "term": list(coef_dict.keys()),
        "estimate": list(coef_dict.values())
    }
)

# Print results
print(f"Formula: {formula}")
print(f"R² Score: {r_squared:.3f}")
print(coef_df)

Output:

{'all_generated_columns': ['mpg', 'intercept', 'wt', 'cyl'],
 'all_generated_columns_formula_order': {'1': 'mpg',
                                         '2': 'intercept',
                                         '3': 'wt',
                                         '4': 'cyl'},
 'columns': {'cyl': {'generated_columns': ['cyl'],
                     'id': 3,
                     'interactions': [],
                     'random_effects': [],
                     'roles': ['Identity'],
                     'transformations': []},
             'mpg': {'generated_columns': ['mpg'],
                     'id': 1,
                     'interactions': [],
                     'random_effects': [],
                     'roles': ['Response'],
                     'transformations': []},
             'wt': {'generated_columns': ['wt'],
                    'id': 2,
                    'interactions': [],
                    'random_effects': [],
                    'roles': ['Identity'],
                    'transformations': []}},
 'formula': 'mpg ~ wt + cyl',
 'metadata': {'family': None,
              'has_intercept': True,
              'has_uncorrelated_slopes_and_intercepts': False,
              'is_random_effects_model': False,
              'response_variable_count': 1}}
Formula: mpg ~ wt + cyl
R² Score: 0.830
shape: (3, 2)
┌───────────┬───────────┐
│ term      ┆ estimate  │
│ ---       ┆ ---       │
│ str       ┆ f64       │
╞═══════════╪═══════════╡
│ intercept ┆ 39.686261 │
│ cyl       ┆ -1.507795 │
│ wt        ┆ -3.190972 │
└───────────┴───────────┘
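The coefficient table lists cyl before wt even though the formula is mpg ~ wt + cyl, so iterating over result["columns"] does not follow formula order in this example. A minimal sketch of restoring formula order, assuming the `all_generated_columns_formula_order` key is present as in the output above (it may not exist in every version). `predictors_in_formula_order` is a hypothetical helper, and `result` is a trimmed, hand-copied subset of that output:

```python
# Trimmed copy of the parse_formula("mpg ~ wt + cyl") output shown above.
result = {
    "all_generated_columns_formula_order": {
        "1": "mpg",
        "2": "intercept",
        "3": "wt",
        "4": "cyl",
    },
    "columns": {
        "cyl": {"roles": ["Identity"]},
        "mpg": {"roles": ["Response"]},
        "wt": {"roles": ["Identity"]},
    },
}


def predictors_in_formula_order(result):
    """Hypothetical helper: non-response columns, ordered as written in the formula."""
    order = result["all_generated_columns_formula_order"]
    # The keys are strings ("1", "2", ...), so sort them numerically.
    ordered_names = [order[key] for key in sorted(order, key=int)]
    columns = result["columns"]
    # "intercept" is a generated column, not a data column, so it is skipped here.
    return [
        name
        for name in ordered_names
        if name in columns and "Response" not in columns[name]["roles"]
    ]


preds = predictors_in_formula_order(result)
print(preds)
```

Using this list in place of the `preds` comprehension in the OLS example would put the slopes in the same order as the terms in the formula; the fitted values themselves do not depend on column order.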

📋 Supported Formula Syntax

fiasto supports comprehensive Wilkinson's notation including:

  • Basic formulas: y ~ x1 + x2
  • Interactions: y ~ x1 * x2
  • Smooth terms: y ~ s(z)
  • Random effects: y ~ x + (1|group)
  • Complex random effects: y ~ x + (1+x|group)
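In conventional Wilkinson notation (as used in R), the crossing operator in y ~ x1 * x2 is shorthand for the main effects plus their interaction, x1 + x2 + x1:x2. The sketch below illustrates only that convention in plain Python. `expand_crossing` is a hypothetical helper; it does not call fiasto and says nothing about how fiasto reports interactions in its output, which this README does not show.

```python
from itertools import combinations


def expand_crossing(terms):
    """Expand a crossing such as x1 * x2 into main effects plus interactions."""
    expanded = []
    # Size 1 gives the main effects, size 2 the two-way interactions, and so on.
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            expanded.append(":".join(combo))
    return expanded


two_way = expand_crossing(["x1", "x2"])
three_way = expand_crossing(["a", "b", "c"])

print(two_way)
print(three_way)
```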

Supported Formulas (Coming Soon)

  • Multivariate models: mvbind(y1, y2) ~ x + (1|g)
  • Non-linear models: y ~ a1 - a2^x, a1 ~ 1, a2 ~ x + (x|g), nl = TRUE

For the complete reference, see the fiasto documentation.

📦 PyPI Package

The package is available on PyPI and can be installed with:

pip install fiasto-py

📚 API Reference

parse_formula(formula: str) -> dict

Parse a Wilkinson's formula string and return structured metadata as a Python dictionary.

Parameters:

  • formula (str): The formula string to parse

Returns:

  • dict: Structured metadata describing the formula

Raises:

  • ValueError: If the formula is invalid or parsing fails

lex_formula(formula: str) -> dict

Tokenize a formula string and return information describing each token.

Parameters:

  • formula (str): The formula string to tokenize

Returns:

  • dict: Token information for each element in the formula (in the Lex Formula example output above, this is a list with one dictionary per token)

Raises:

  • ValueError: If the formula is invalid or lexing fails
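Since both functions are documented as raising ValueError on bad input, callers that handle user-supplied formulas may want to catch it. A minimal sketch, not part of fiasto-py: `parse_or_none` is a hypothetical wrapper that takes the parsing function as an argument, and `stub_parse` is a stand-in used only so the snippet runs without the package installed. The stub's check is made up for the demo and is not fiasto's validation logic.

```python
def parse_or_none(formula, parse):
    """Return (result, None) on success, or (None, message) if `parse` raises ValueError."""
    try:
        return parse(formula), None
    except ValueError as exc:
        return None, str(exc)


def stub_parse(formula):
    """Stand-in parser for this demo only; NOT fiasto's real validation logic."""
    if "~" not in formula:
        raise ValueError(f"missing '~' in formula: {formula!r}")
    return {"formula": formula}


ok_result, ok_error = parse_or_none("y ~ x", stub_parse)
bad_result, bad_error = parse_or_none("y x", stub_parse)

print(ok_result, ok_error)
print(bad_result, bad_error)
```

With the package installed, fiasto_py.parse_formula or fiasto_py.lex_formula can be passed as the `parse` argument in place of the stub.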

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

๐Ÿ™ Acknowledgments

  • fiasto - The underlying Rust library
  • PyO3 - Python-Rust bindings
  • maturin - Build system for Python extensions
  • PyPI - Python Package Index for distribution

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
