Skip to main content

Symbolic likelihood models for statistical inference

Project description

symlik

Symbolic Likelihood Models for Statistical Inference

symlik lets you define statistical models symbolically and automatically derives everything needed for inference: score functions, Hessians, Fisher information, standard errors, and maximum likelihood estimates.

Documentation Python 3.8+ License: MIT

Why symlik?

Traditional statistical computing requires manually deriving score functions and information matrices, or relying on numerical approximations. symlik takes a different approach: write the log-likelihood symbolically, and let the computer handle the calculus.

from symlik.distributions import exponential

model = exponential()
data = {'x': [1.2, 0.8, 2.1, 1.5]}

mle, _ = model.mle(data=data, init={'lambda': 1.0})
se = model.se(mle, data)

print(f"Rate: {mle['lambda']:.3f} ± {se['lambda']:.3f}")
# Rate: 0.714 ± 0.357

Installation

pip install symlik

Requirements: Python 3.8+, NumPy, and rerum (symbolic rewriting engine).

Key Features

  • Symbolic differentiation - Automatic score functions and Hessians
  • Heterogeneous data - Handle censoring, masking, and mixed observation types
  • Series systems - Reliability analysis for multi-component systems
  • DataFrame support - Works with pandas, polars, or plain dicts
  • Extensible - Custom distributions, operators, and hazard functions

Quick Start

Pre-built Distributions

from symlik.distributions import normal, poisson, exponential

# Normal distribution
model = normal()
mle, _ = model.mle(
    data={'x': [4.2, 5.1, 4.8, 5.3, 4.9]},
    init={'mu': 0, 'sigma2': 1},
    bounds={'sigma2': (0.01, None)}
)

Custom Models

Define log-likelihoods using s-expressions:

from symlik import LikelihoodModel

# Exponential: ℓ(λ) = Σ[log(λ) - λxᵢ]
log_lik = ['sum', 'i', ['len', 'x'],
           ['+', ['log', 'lambda'],
            ['*', -1, ['*', 'lambda', ['@', 'x', 'i']]]]]

model = LikelihoodModel(log_lik, params=['lambda'])

# Symbolic derivatives available
score = model.score()       # Gradient
hess = model.hessian()      # Hessian matrix
info = model.information()  # Fisher information

Heterogeneous Data (Censoring)

Handle mixed observation types with ContributionModel:

from symlik import ContributionModel
from symlik.contributions import complete_exponential, right_censored_exponential

model = ContributionModel(
    params=["lambda"],
    type_column="status",
    contributions={
        "observed": complete_exponential(),
        "censored": right_censored_exponential(),
    }
)

data = {
    "status": ["observed", "censored", "observed", "observed", "censored"],
    "t": [1.2, 3.0, 0.8, 2.1, 4.5],
}

mle, _ = model.mle(data=data, init={"lambda": 1.0})

Series System Reliability

Model multi-component systems with known cause, masked cause, or censored observations:

from symlik import ContributionModel
from symlik.series import build_exponential_series_contributions

# 3-component series system
contribs = build_exponential_series_contributions(m=3)

model = ContributionModel(
    params=["lambda1", "lambda2", "lambda3"],
    type_column="obs_type",
    contributions=contribs,
)

# Handles: known_1, known_2, known_3, masked_12, masked_13,
#          masked_23, masked_123, right_censored

Masked cause observations satisfy C1, C2, C3 conditions:

  • C1: True cause always in candidate set
  • C2: Candidate set probability independent of which component failed
  • C3: Masking probabilities independent of parameters

DataFrame Support

Works seamlessly with pandas and polars:

import pandas as pd

df = pd.DataFrame({
    "status": ["observed", "censored", "observed"],
    "t": [1.2, 3.0, 0.8],
})

mle, _ = model.mle(data=df, init={"lambda": 1.0})  # Just works!

S-Expression Syntax

symlik uses s-expressions to represent mathematical formulas:

Expression Meaning
['+', 'x', 'y'] x + y
['*', 2, 'x'] 2x
['^', 'x', 2]
['/', 'x', 'y'] x / y
['log', 'x'] ln(x)
['exp', 'x']
['sum', 'i', 'n', body] Σᵢ₌₁ⁿ body
['@', 'x', 'i'] xᵢ (1-based indexing)
['len', 'x'] length of x

Available Distributions

Distribution Parameters Use Case
exponential() λ (rate) Waiting times, lifetimes
normal() μ, σ² Continuous measurements
poisson() λ Count data
bernoulli() p Binary outcomes
binomial() p Proportion estimation
gamma() α, β Positive continuous data
weibull() k, λ Reliability, survival
beta() α, β Proportions, probabilities

Contribution Types

Function Formula Use Case
complete_exponential() log(λ) - λt Observed failure
right_censored_exponential() -λt Survived past t
left_censored_exponential() log(1 - e⁻λᵗ) Failed before t
complete_weibull() Weibull density Observed failure
series_exponential_known_cause() Known component Reliability
series_exponential_masked_cause() Candidate set Masked cause

Series System Components

Build custom reliability models:

from symlik.series import (
    exponential_component,
    weibull_component,
    exponential_ph_component,  # Proportional hazards
    custom_component,          # Any hazard function
    series_known_cause,
    series_masked_cause,
)

# Mixed system: exponential + Weibull
components = [
    exponential_component("lambda"),
    weibull_component("k", "theta"),
]

# Covariate-dependent hazards
comp = exponential_ph_component(
    baseline_rate="lambda0",
    coefficients=["beta"],
    covariates=["temperature"],
)

Direct Calculus Operations

Use symlik's calculus module for standalone symbolic math:

from symlik import diff, gradient, hessian, simplify

# Differentiate x³ + 2x
expr = ['+', ['^', 'x', 3], ['*', 2, 'x']]
deriv = diff(expr, 'x')  # 3x² + 2

# Gradient of f(x,y) = x² + xy
expr = ['+', ['^', 'x', 2], ['*', 'x', 'y']]
grad = gradient(expr, ['x', 'y'])  # [2x + y, x]

Documentation

Full documentation: queelius.github.io/symlik

License

MIT License. See LICENSE for details.

Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

symlik-0.2.0.tar.gz (48.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

symlik-0.2.0-py3-none-any.whl (27.5 kB view details)

Uploaded Python 3

File details

Details for the file symlik-0.2.0.tar.gz.

File metadata

  • Download URL: symlik-0.2.0.tar.gz
  • Upload date:
  • Size: 48.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for symlik-0.2.0.tar.gz
Algorithm Hash digest
SHA256 dc9a101e98a043c00411385342e1ae0e4f4ca51ef4c9ed81604f7931991ddf3a
MD5 8140b932c5984dca81876968bae77bd4
BLAKE2b-256 d5d487bf6df006d68f20cb79c8c0647b488b20a80c2dd7bb93b27599175d15bb

See more details on using hashes here.

File details

Details for the file symlik-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: symlik-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 27.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for symlik-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 79070609a0808e387d7bccf454840851e44051f94cffc5a97c0b4443fcba5b99
MD5 ead7227957af211d8ba670c89671039d
BLAKE2b-256 abc95d1d14052127b7f08b9cd6a558ac9417faf976e331cbe789f2a1fbe0e3c6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page