Bayesian causal inference for zero-inflated outcomes — GPU-accelerated joint hurdle BCF with SBC calibration

pytyche

GPU-accelerated Bayesian causal forests for zero-inflated outcomes — and an adaptive, round-based experiment loop built on top of them.

pytyche does two things. First, it ships some of the fastest Bayesian Causal Forest (BCF) estimators available: continuous and binary effects run on the GPU via bartz, and hurdle outcomes (revenue, spend, and other "mostly-zero, sometimes-positive" metrics) run on pytyche's own GPU kernel. Any of these can be used standalone for a single fit — give it data, get back a calibrated posterior over heterogeneous treatment effects. Second, it wraps those estimators in a round-based adaptive experiment loop that allocates the next round's traffic toward the segments that respond, while keeping controls everywhere so measurement stays honest. The whole loop runs on a single GPU.
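A hurdle outcome splits the per-visitor expectation into "does the visitor convert" times "how much they spend if they do." Here is a minimal sketch of that decomposition. It assumes a log-normal order value given conversion; the quick-start generator's aov_mu / aov_sigma parameters suggest this, but the reading is an assumption. expected_rpv and hurdle_effect are hypothetical helpers for illustration, not pytyche API.

```python
import math


def expected_rpv(conv_prob: float, log_mu: float, log_sigma: float) -> float:
    """Expected revenue per visitor under a hurdle model (illustrative helper).

    E[Y] = P(convert) * E[Y | converted], and for a log-normal order value
    E[Y | converted] = exp(mu + sigma^2 / 2).
    """
    return conv_prob * math.exp(log_mu + 0.5 * log_sigma ** 2)


def hurdle_effect(p0: float, mu0: float, p1: float, mu1: float, sigma: float) -> float:
    """Effect on revenue per visitor: treated hurdle mean minus control hurdle mean."""
    return expected_rpv(p1, mu1, sigma) - expected_rpv(p0, mu0, sigma)


# Illustrative numbers shaped like the quick-start "responders" segment, ASSUMING
# treatment_effect is an additive lift in conversion probability and
# treatment_aov_mu_shift is an additive shift of the log-scale mean.
effect = hurdle_effect(p0=0.08, mu0=3.5, p1=0.18, mu1=3.65, sigma=0.5)
```

Both channels contribute to the effect: a treatment can move conversion, order value, or both, which is why the hurdle BCF models them jointly rather than fitting a single forest to mostly-zero revenue.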

The speed is what makes the rest practical: BCF intervals at production scale need empirical recalibration (simulation-based calibration across realistic data), which means hundreds of full posterior fits. On a GPU that is an overnight job instead of a CPU-week, so calibration becomes something you do per-deployment rather than per-publication.
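The core of simulation-based calibration (SBC) can be shown generically on a toy conjugate-normal model whose exact posterior is known: draw a true parameter from the prior, simulate data from it, draw from the posterior, and record the rank of the truth among the draws. A calibrated posterior produces uniform ranks and nominal interval coverage; an overconfident one does not. This is the standard SBC recipe, not pytyche's calibration code, and the model and function names are illustrative.

```python
import numpy as np


def sbc_ranks(num_sims=1000, num_obs=20, num_draws=99, width_scale=1.0, seed=0):
    """Rank of the true parameter among posterior draws, over many simulations.

    Toy model: theta ~ N(0, 1), y_i | theta ~ N(theta, 1). The exact posterior is
    N(sum(y) / (n + 1), 1 / (n + 1)). width_scale < 1 mimics an overconfident
    (too narrow) posterior.
    """
    rng = np.random.default_rng(seed)
    ranks = np.empty(num_sims, dtype=int)
    for s in range(num_sims):
        theta = rng.normal(0.0, 1.0)                       # truth drawn from the prior
        y = rng.normal(theta, 1.0, size=num_obs)           # data simulated given the truth
        post_mean = y.sum() / (num_obs + 1.0)              # conjugate posterior mean
        post_sd = np.sqrt(1.0 / (num_obs + 1.0)) * width_scale
        draws = rng.normal(post_mean, post_sd, size=num_draws)
        ranks[s] = np.sum(draws < theta)                   # rank in {0, ..., num_draws}
    return ranks


def coverage_from_ranks(ranks, num_draws, level=0.9):
    """Fraction of simulations whose truth falls inside the central `level` band of the draws."""
    tail = int(round((1.0 - level) / 2.0 * (num_draws + 1)))
    inside = (ranks >= tail) & (ranks <= num_draws - tail)
    return inside.mean()


ranks = sbc_ranks()
calibrated = coverage_from_ranks(ranks, num_draws=99)
overconfident = coverage_from_ranks(sbc_ranks(width_scale=0.5), num_draws=99)
print(f"90% interval coverage: calibrated={calibrated:.3f}, overconfident={overconfident:.3f}")
```

The overconfident run covers the truth well under 90% of the time. That is the kind of gap per-deployment recalibration is meant to catch, and every simulation is a full posterior fit, which is why fit speed matters.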

Install

```shell
# Recommended: GPU JAX (CUDA 12, Linux)
uv add 'pytyche[gpu]'      # or: pip install 'pytyche[gpu]'

# CPU-only (fully functional; the first fit warns once if no GPU is found)
uv add pytyche             # or: pip install pytyche
```

Check the runtime with python -c "import pytyche as pt; pt.check_setup()".

Quick start

Fit the canonical hurdle model on an 800-visitor synthetic dataset in about 20 seconds on JAX-CPU:

```python
import os
os.environ["JAX_PLATFORMS"] = "cpu"  # omit for GPU; set it before pytyche (and JAX) is imported
import pytyche as pt

bundle = pt.generate(n_visitors=800, segments={
    "responders":     {"pct": 0.4, "base_conv": 0.08, "treatment_effect": 0.10,
                       "aov_mu": 3.5, "aov_sigma": 0.5, "treatment_aov_mu_shift": 0.15},
    "non_responders": {"pct": 0.6, "base_conv": 0.06, "treatment_effect": 0.0,
                       "aov_mu": 3.3, "aov_sigma": 0.5, "treatment_aov_mu_shift": 0.0},
}, metric="revenue_per_visitor", seed=0)

result = pt.fit(bundle.observed, num_burnin=40, num_mcmc=80, num_trees_mu=30,
                num_trees_tau=15, max_depth=4, num_gfr_sweeps=2,
                diagnostic_interval=20, seed=0)
result.analyze()  # treatment comparisons, discovered segments, recommendation
# result.rpv_cate_samples → (n_visitors, 80) posterior draws of the per-visitor effect
```
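Once you have the draws, summarizing them takes only a few lines of NumPy. The sketch below assumes result.rpv_cate_samples converts to an array of shape (n_visitors, n_draws), as the comment above states. A random stand-in array takes its place so the snippet runs on its own, and summarize_cate is a hypothetical helper, not part of pytyche.

```python
import numpy as np


def summarize_cate(cate_samples, level=0.9):
    """Per-visitor posterior summaries plus posterior draws of the sample-average effect."""
    cate = np.asarray(cate_samples)                     # shape (n_visitors, n_draws)
    alpha = (1.0 - level) / 2.0
    per_visitor = {
        "mean": cate.mean(axis=1),                      # posterior mean effect per visitor
        "lower": np.quantile(cate, alpha, axis=1),      # equal-tailed credible interval
        "upper": np.quantile(cate, 1.0 - alpha, axis=1),
        "prob_positive": (cate > 0).mean(axis=1),       # posterior P(effect > 0)
    }
    ate_draws = cate.mean(axis=0)                       # one sample-average effect per draw
    return per_visitor, ate_draws


# Stand-in for result.rpv_cate_samples (800 visitors, 80 draws); swap in the real array.
rng = np.random.default_rng(0)
stand_in = rng.normal(loc=0.5, scale=1.0, size=(800, 80))
per_visitor, ate_draws = summarize_cate(stand_in)
```

Averaging over visitors within each draw, rather than averaging the per-visitor intervals, keeps the correlation between visitors' effects and so gives a proper posterior for the average effect.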

To run a full multi-round experiment instead of a single fit, pt.sequential_experiment(...) drives the adaptive loop end to end. A realistically sized run (350,000 visitors) takes about fifteen minutes on a consumer GPU.
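The allocation step of such a loop can be illustrated with a generic Thompson-style rule for one segment. Each arm's traffic share is its posterior probability of being best, and control is never allowed to fall below a floor. This is a minimal sketch, assuming you already have posterior draws of each arm's mean outcome with arm 0 as control. The control_floor parameter and the rescaling rule are hypothetical choices, not pytyche's implementation.

```python
import numpy as np


def thompson_allocation(arm_draws, control_floor=0.2):
    """Next-round traffic shares for one segment (illustrative sketch).

    arm_draws: array of shape (n_arms, n_draws) holding posterior draws of each
    arm's mean outcome; arm 0 is control. control_floor is a hypothetical
    minimum control share that keeps measurement possible in every segment.
    """
    draws = np.asarray(arm_draws)
    n_arms, n_draws = draws.shape
    best = np.argmax(draws, axis=0)                       # winning arm in each draw
    alloc = np.bincount(best, minlength=n_arms) / n_draws  # P(arm is best)
    if alloc[0] < control_floor:
        # Shrink the treatment arms proportionally so control gets exactly the floor.
        alloc[1:] *= (1.0 - control_floor) / alloc[1:].sum()
        alloc[0] = control_floor
    return alloc


rng = np.random.default_rng(0)
control = rng.normal(1.0, 0.1, size=1000)
treatment = rng.normal(2.0, 0.1, size=1000)
alloc_treat_wins = thompson_allocation(np.stack([control, treatment]))
alloc_control_wins = thompson_allocation(np.stack([treatment, control]))
```

Even when the treatment clearly wins, control keeps its floor share, so the next round still has a concurrent comparison group in that segment.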

Highlights

  • GPU hurdle BCF. Two coupled forests (probit conversion and log-severity) share a single tree topology, following Linero et al.'s shared Bayesian forests, so the structure carries information across both channels and stabilizes per-segment effects at the low conversion rates real online experiments run at. Hurdle fits are roughly 4.5–8.6× faster than the StochTree CPU backend at n=750k; single-channel continuous/binary fits run 17–63× faster across n=250k to n=2M (see the benchmark grid).
  • Calibrated intervals. BCF posteriors are narrow by construction; pytyche recalibrates them against simulation-based ground truth so the credible intervals you report are honest at your operating scale.
  • Adaptive experiment loop. pt.sequential_experiment runs Thompson allocation with guaranteed control retention and built-in power simulation.
  • Interpretable segments. Each round compresses the effect posterior into a shallow policy tree — a reviewable decision surface, not just a model.
  • Synthetic data generators. A small typed grammar (pytyche.generators.scenarios) parameterizes the data-generating process for calibration sweeps and power analysis.
  • Honest-uncertainty contracts. pytyche.contracts separates observed data from ground truth at the type level, so analysis code cannot accidentally peek at what it shouldn't see.
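To illustrate what compressing an effect posterior into a policy tree involves, here is a depth-1 version: search every (feature, threshold) split and treat a leaf only if the summed posterior-mean effect in it is positive. This is an assumption-laden sketch fit to posterior means with an exhaustive search, not pytyche's policy-tree algorithm; depth1_policy_tree is a hypothetical name.

```python
import numpy as np


def depth1_policy_tree(X, effect_mean):
    """Best single split for a treat / don't-treat policy (illustrative sketch).

    X: (n, p) covariates; effect_mean: (n,) posterior mean effect per unit.
    "value" is the total effect gained versus treating nobody.
    """
    X = np.asarray(X, dtype=float)
    tau = np.asarray(effect_mean, dtype=float)
    total = tau.sum()
    # Baseline: no split, one leaf that is treated only if its total effect is positive.
    best = {"feature": None, "threshold": None, "value": max(total, 0.0),
            "treat_left": total > 0, "treat_right": total > 0}
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:                # thresholds that leave both sides non-empty
            left = X[:, j] <= t
            gain_left, gain_right = tau[left].sum(), tau[~left].sum()
            value = max(gain_left, 0.0) + max(gain_right, 0.0)
            if value > best["value"]:
                best = {"feature": j, "threshold": float(t), "value": value,
                        "treat_left": gain_left > 0, "treat_right": gain_right > 0}
    return best


# Units 50..99 respond (+1), units 0..49 are hurt (-1).
X = np.arange(100).reshape(-1, 1)
tau = np.where(X[:, 0] >= 50, 1.0, -1.0)
tree = depth1_policy_tree(X, tau)
```

The exhaustive search is quadratic in n per feature, which is fine for a sketch; the payoff is a rule ("treat if feature 0 > 49") that a reviewer can read directly.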

When to use it

pytyche is built for designed experiments: round-based online tests with a handful of treatments, where assignment rules are explicit and propensities are recorded exactly. It also supports observational causal inference. BCF is purpose-built for confounded settings: it feeds the propensity score into the prognostic forest as an extra covariate, which strengthens point estimation under confounding. Two honest caveats apply there. First, pytyche expects propensity scores as an input; it has no built-in nuisance/propensity estimation or double-ML cross-fitting, which is the reason to reach for econml or DoubleML instead. Second, the library is shaped and validated around designed experiments, so observational use is supported but less tested. In all cases, treat intervals as needing calibration at your scale before you rely on them.

Out of scope: marketplaces and anything with cross-visitor interference (SUTVA violations), regulated contexts needing preregistration-grade governance, large-catalog per-item recommendation, and real-time / streaming adaptation. The full scope discussion is in the overview.

Contributing

Contributions are welcome — see CONTRIBUTING.md for the development setup, branching model, and testing tiers.

License

MIT, see LICENSE. Built on bartz (MIT) by Giacomo Petrillo: the continuous and binary paths run on bartz's GPU BART kernels. The hurdle GPU kernel, the shared-tree extensions, and the calibration / targeting / generator stack are pytyche's own.

Source: https://gitlab.com/tradcliffe2/tyche · PyPI: https://pypi.org/project/pytyche/ (the package is pytyche; the GitLab repo is tyche for URL brevity).

Citation

Methodology paper in preparation. Cite as pytyche, v0.2.1, https://gitlab.com/tradcliffe2/tyche until a citable DOI is up.
