Bayesian decision engine for A/B testing

argonx

argonx is a Bayesian decision engine for A/B experiments.

It handles inference, multi-metric risk management, hierarchical segment-aware analysis, and sequential stopping. Feed it your data, tell it what matters, and it surfaces everything you need to make the right call.

Install

pip install argonx

# development install
git clone https://github.com/souro26/bayesian-a-b-testing.git
cd bayesian-a-b-testing
pip install -e .

Quick Start

from argonx import Experiment

# df: one row per user, with 'variant', 'revenue' and 'page_load_ms' columns
experiment = Experiment(
    data=df,
    variant_col='variant',
    primary_metric='revenue',
    guardrails=['page_load_ms'],
    lower_is_better={'page_load_ms': True},
    model='lognormal',
    guardrail_models={'page_load_ms': 'gaussian'},
    control='control',
)

result = experiment.run()
result.summary()
result.plot()

Ratio metrics via callable, no extra classes needed:

experiment = Experiment(
    data=df,
    variant_col='variant',
    primary_metric=lambda df: df['clicks'] / df['impressions'],
    model='lognormal',
    control='control',
)

Segment-aware hierarchical inference, one extra argument:

experiment = Experiment(
    data=df,
    variant_col='variant',
    segment_col='device_type',
    primary_metric='revenue',
    model='lognormal',
    control='control',
)

result = experiment.run()
result.summary()          # aggregate, population-level
result.segment_summary()  # per-segment decisions, cross-segment conflict detection

What It Computes

Most testing frameworks answer "Is there an effect?" argonx answers "What should you do about it, and how much do you lose if you get it wrong?"

A p-value tells you whether the observed difference is unlikely under the null. It does not tell you which variant to ship. argonx computes the quantities that actually drive that decision:

Metric	What it answers
P(variant is best)	Posterior probability of being the true winner, computed via simultaneous argmax across all N variants. Not pairwise.
Expected loss	Expected amount you give up by shipping this variant instead of the true best, integrated over the full posterior. Not a point estimate.
CVaR	Conditional value at risk: the average loss in the worst-case tail of the posterior (e.g., the worst 5% of draws). Catches cases where average loss looks fine but tail outcomes are catastrophic.
ROPE	Region of practical equivalence: is the effect large enough to matter in practice? A statistically real effect can still be business-irrelevant. ROPE separates these.
HDI	Highest density interval: the narrowest range that contains the lift with 95% posterior probability.
Joint probability	P(all business conditions satisfied simultaneously), not independent per-metric checks that miss correlations.
Composite score	Weighted multi-metric business impact, computed draw-by-draw from posteriors, not from means.
Guardrail conflict	When the primary metric improves and a guardrail degrades, the framework surfaces the conflict and stops there. No arbitrary resolution.
Sequential stopping	Stop when expected loss drops below your threshold, not when a fixed sample size is reached.
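The sketch below makes the table concrete. It shows how P(best), expected loss, CVaR, an HDI and a ROPE check can all be read off posterior draws. It assumes a binary metric with Beta(1, 1) priors, so the posteriors are sampled directly with Python's `random.betavariate` rather than fitted by MCMC. The conversion counts, the 1% ROPE width and the helper names (`posterior_draws`, `prob_best_and_loss`, `cvar`, `hdi`) are hypothetical and are not argonx's API.

```python
import math
import random


def posterior_draws(successes, trials, n_draws, rng):
    """Sample a conversion-rate posterior: Beta(1 + successes, 1 + failures)."""
    a, b = 1 + successes, 1 + trials - successes
    return [rng.betavariate(a, b) for _ in range(n_draws)]


def prob_best_and_loss(draws):
    """Simultaneous argmax over all variants in each posterior draw (not pairwise)."""
    names = list(draws)
    n = len(draws[names[0]])
    wins = dict.fromkeys(names, 0)
    losses = {k: [] for k in names}
    for i in range(n):
        values = {k: draws[k][i] for k in names}
        best = max(values, key=values.get)
        wins[best] += 1
        for k in names:
            # Loss of shipping k in this draw: gap to the best variant (0 if k is best).
            losses[k].append(values[best] - values[k])
    return {k: wins[k] / n for k in names}, losses


def cvar(losses, alpha=0.95):
    """Mean loss over the worst (1 - alpha) fraction of draws."""
    tail = sorted(losses)[int(alpha * len(losses)):]
    return sum(tail) / len(tail)


def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the draws."""
    s = sorted(samples)
    k = math.ceil(mass * len(s))
    start = min(range(len(s) - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


rng = random.Random(42)
draws = {
    "control": posterior_draws(480, 10_000, 20_000, rng),    # hypothetical counts
    "variant_b": posterior_draws(560, 10_000, 20_000, rng),
}
p_best, losses = prob_best_and_loss(draws)
expected_loss = {k: sum(v) / len(v) for k, v in losses.items()}
lift = [b / c - 1 for b, c in zip(draws["variant_b"], draws["control"])]
lift_hdi = hdi(lift)
rope = 0.01  # hypothetical: relative lifts within +/-1% count as practically zero
p_practical = sum(abs(x) > rope for x in lift) / len(lift)

print(f"P(best): {p_best}")
print(f"Expected loss (variant_b): {expected_loss['variant_b']:.5f}")
print(f"CVaR 95% (variant_b): {cvar(losses['variant_b']):.5f}")
print(f"Lift 95% HDI: {lift_hdi[0]:+.1%} to {lift_hdi[1]:+.1%}")
print(f"P(practical effect): {p_practical:.3f}")
```

Every quantity is computed draw by draw. Adding a third variant therefore only adds a key to `draws`, and the argmax in `prob_best_and_loss` still runs over all variants at once.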

What `result.summary()` Looks Like

============================================================
EXPERIMENT RESULTS
============================================================

PRIMARY METRIC
----------------------------------------
Best Variant: variant_b
Expected lift:    +4.3% (95% HDI: +1.0% to +7.0%)
P(best) across all variants: 0.971

RISK
----------------------------------------
Expected loss if wrong:          0.0009
CVaR (95th percentile loss):     0.0021
Risk level:                      low

PRACTICAL SIGNIFICANCE (ROPE)
----------------------------------------
Effect is OUTSIDE ROPE -- practically meaningful.
P(practical effect): 0.941

GUARDRAILS
----------------------------------------
  page_load_ms    [FAIL]  variant=variant_b  P(degraded)=0.912  threshold=0.100

GUARDRAIL CONFLICTS DETECTED
----------------------------------------
Strong evidence for variant_b on primary metric.
Guardrail violation on page_load_ms with 91.2% probability.
Framework cannot resolve this tradeoff. Human review required.

============================================================
DECISION
----------------------------------------
State:          conflict
Recommendation: REVIEW REQUIRED
Confidence:     low

Reasoning:
  - P(best) exceeds strong threshold
  - Expected loss below configured maximum
  - Guardrail violation: page_load_ms cannot be resolved automatically
============================================================

The framework does not make the decision. It makes the right decision obvious.

Models

Model	Use case	Data type
`binary`	Conversion rate, click-through, churn	0/1 outcomes
`lognormal`	Revenue, order value, session duration	Right-skewed positive continuous
`gaussian`	Latency, load time, scores	Symmetric continuous
`studentt`	Same as gaussian, robust to outliers	Symmetric continuous with heavy tails
`poisson`	Events per user, purchases per session	Count data

Every model has a flat and a hierarchical variant. Flat is the default; hierarchical is selected automatically when segment_col is set. Partial pooling handles thin segments by borrowing strength from larger ones without erasing real differences between segments.
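Partial pooling can be illustrated with the closed-form normal-normal model, sketched below in plain Python. This is a simplification, not argonx's implementation. It assumes each segment's lift estimate has a known standard error and fixes the between-segment spread `tau`. A full hierarchical model fitted by MCMC would infer `tau` from the data instead. The segment names, lifts and standard errors are hypothetical.

```python
def partial_pool(estimates, std_errors, tau):
    """Shrink per-segment estimates toward a precision-weighted grand mean.

    Normal-normal model: estimate_j ~ N(theta_j, se_j^2), theta_j ~ N(mu, tau^2),
    with tau treated as known. Returns (grand_mean, pooled_estimates).
    """
    weights = [1.0 / (se**2 + tau**2) for se in std_errors]
    grand_mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled = []
    for y, se in zip(estimates, std_errors):
        # Shrinkage factor: near 1 for noisy (thin) segments, near 0 for precise ones.
        b = se**2 / (se**2 + tau**2)
        pooled.append(b * grand_mean + (1 - b) * y)
    return grand_mean, pooled


segments = ["desktop", "mobile", "tablet"]
raw_lift = [0.040, 0.035, 0.150]    # hypothetical observed lifts
std_err = [0.005, 0.006, 0.060]     # tablet is a thin, noisy segment
grand, pooled = partial_pool(raw_lift, std_err, tau=0.02)
for name, raw, post in zip(segments, raw_lift, pooled):
    print(f"{name:8s} raw {raw:+.3f} -> pooled {post:+.3f}")
```

The tablet lift is pulled most of the way toward the grand mean because its standard error is large. Desktop and mobile barely move. This is "borrowing strength" without forcing every segment to the same value.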

Guardrail metrics can use a different model than the primary:

experiment = Experiment(
    ...
    model='binary',
    guardrail_models={'page_load_ms': 'lognormal'},
)

Sequential Stopping

from argonx.sequential import StoppingChecker

checker = StoppingChecker(
    loss_threshold=0.01,
    prob_best_min=0.95,
    min_sample_size=1000,
)

# result comes from experiment.run(); n_counts holds the number of users seen per variant so far
status = checker.update(
    samples=result.samples,
    variant_names=['control', 'variant_b'],
    control='control',
    n_users_per_variant=n_counts,
)

print(status.safe_to_stop)
print(status.users_needed)  # estimated additional users needed if not safe

checker.plot_trajectory()   # P(best) and expected loss over time
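The stopping logic behind the `StoppingChecker` call above can be sketched without argonx. At each look, the sketch recomputes P(best) and the leader's expected loss from fresh posterior draws. It stops only when both clear their thresholds and the minimum sample size is met. This is a hypothetical stand-in, not `StoppingChecker`'s implementation. The following are all assumptions made for illustration:

- the function name `check_stop`
- combining the three conditions with a logical AND
- treating `min_sample_size` as a per-variant minimum
- the Beta-Binomial posteriors
- the simulated conversion rates

```python
import random


def check_stop(successes, trials, rng, loss_threshold=0.01, prob_best_min=0.95,
               min_sample_size=1000, n_draws=4000):
    """Hypothetical expected-loss stopping rule for binary metrics."""
    names = list(successes)
    draws = {
        k: [rng.betavariate(1 + successes[k], 1 + trials[k] - successes[k]) for _ in range(n_draws)]
        for k in names
    }
    wins = dict.fromkeys(names, 0)
    loss = dict.fromkeys(names, 0.0)
    for i in range(n_draws):
        row = {k: draws[k][i] for k in names}
        best = max(row, key=row.get)
        wins[best] += 1
        for k in names:
            loss[k] += (row[best] - row[k]) / n_draws  # running mean of the gap to the best
    leader = max(wins, key=wins.get)
    p_best = wins[leader] / n_draws
    safe = (
        min(trials.values()) >= min_sample_size  # assumed to be a per-variant minimum
        and p_best >= prob_best_min
        and loss[leader] < loss_threshold
    )
    return safe, leader, p_best, loss[leader]


rng = random.Random(7)
true_rate = {"control": 0.05, "variant_b": 0.06}  # simulated ground truth
successes = dict.fromkeys(true_rate, 0)
trials = dict.fromkeys(true_rate, 0)
history = []  # (users per arm, P(best) of leader, expected loss of leader) at each look
for look in range(1, 21):
    for k, p in true_rate.items():  # 2,000 new users per arm per look
        successes[k] += sum(rng.random() < p for _ in range(2000))
        trials[k] += 2000
    safe, leader, p_best, loss = check_stop(successes, trials, rng, loss_threshold=0.001)
    history.append((trials["control"], p_best, loss))
    if safe:
        print(f"Stop at look {look}: ship {leader} (P(best)={p_best:.3f}, loss={loss:.5f})")
        break
else:
    print("Not safe to stop after 20 looks")
```

`history` records P(best) and expected loss at every look. Plotting it gives the same kind of trajectory that `checker.plot_trajectory()` is described as drawing.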

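Peeking repeatedly at a fixed-horizon frequentist test inflates its false positive rate. Expected-loss stopping asks a different question at each look: how much would shipping the current leader cost if it turned out to be wrong? That quantity stays interpretable however often you check. So argonx stops as soon as the evidence is strong enough, and tells you how far you are from that threshold when it is not.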
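The frequentist half of that claim is easy to reproduce with an A/A simulation. Both arms share the same true conversion rate, so every "significant" result is a false positive. The sketch compares two analysts:

- one who runs a single two-sided z-test at the planned end
- a peeker who tests after every batch and declares a winner at the first |z| > 1.96

The batch size, rate and number of looks are arbitrary illustrative choices, and the code does not exercise argonx.

```python
import math
import random


def z_stat(s_a, n_a, s_b, n_b):
    """Two-proportion z statistic with a pooled standard error."""
    p = (s_a + s_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (s_b / n_b - s_a / n_a) / se


def aa_false_positive_rates(n_sims=500, looks=10, batch=100, rate=0.1, z_crit=1.96, seed=0):
    """Return (fixed-horizon rate, peek-every-batch rate) of false positives in A/A tests."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_sims):
        s = [0, 0]  # conversions in arm A and arm B (same true rate)
        n = 0       # users per arm so far
        crossed = False
        for _ in range(looks):
            for arm in (0, 1):
                s[arm] += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if abs(z_stat(s[0], n, s[1], n)) > z_crit:
                crossed = True  # a peeker would stop here and declare a winner
        peeking_hits += crossed
        # A disciplined analyst tests once, at the planned final sample size.
        fixed_hits += abs(z_stat(s[0], n, s[1], n)) > z_crit
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = aa_false_positive_rates()
print(f"False positive rate, single final test: {fixed_rate:.3f}")
print(f"False positive rate, peeking at 10 looks: {peeking_rate:.3f}")
```

The peeker also sees the final look, so their false positive rate can never be lower than the single-test rate. With ten looks it typically lands well above the nominal 5%.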

Examples

The examples/ directory contains five worked examples covering different industries and model types:

Notebook	Scenario	Key feature
`01_ecommerce_checkout.ipynb`	Checkout redesign	Guardrail conflict: conversion up, load time up
`02_saas_revenue_sequential.ipynb`	SaaS pricing page	Sequential stopping fires at week 2 of 4
`03_clinical_trial.ipynb`	Drug dosage protocol	StudentT vs Gaussian on data with outliers
`04_gaming_matchmaking.ipynb`	Matchmaking algorithm	3-way experiment, simultaneous argmax
`05_mobile_personalisation.ipynb`	Fintech personalisation	Hierarchical: segment conflict, thin-segment pooling

Running Tests

# unit tests only, no MCMC, fast
pytest tests/unit/

# statistical property verification, no MCMC
pytest tests/math/

# full suite including MCMC integration tests
pytest tests/

Three tiers matching the CI pipeline. Unit tests on every push. Math tests on every PR. Integration tests on merge to main.

Contributing

Open an issue before submitting anything beyond a bug fix. PRs are welcome.

Before opening a PR, run pytest tests/unit/ tests/math/ and confirm everything passes. For decision engine changes, add a test to tests/math/test_decision_sims.py that verifies the statistical property. For new model variants, add tests to tests/integration/test_models.py.
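The sketch below illustrates the kind of statistical-property test that tests/math/ asks for. It checks that a P(best) estimate behaves sensibly in A/A simulations: centred on 1/2 and rarely decisive. The helper `prob_b_beats_a`, the test name and the tolerances are hypothetical. The test avoids argonx internals, which are not documented here; a real contribution would call the decision engine instead.

```python
import random


def prob_b_beats_a(s_a, n_a, s_b, n_b, rng, n_draws=1000):
    """Monte Carlo P(rate_B > rate_A) under Beta(1, 1) priors."""
    wins = 0
    for _ in range(n_draws):
        a = rng.betavariate(1 + s_a, 1 + n_a - s_a)
        b = rng.betavariate(1 + s_b, 1 + n_b - s_b)
        wins += b > a
    return wins / n_draws


def test_prob_best_is_calibrated_under_aa():
    """With identical variants, P(best) should centre on 1/2 and rarely look decisive."""
    rng = random.Random(123)  # fixed seed keeps the simulation deterministic
    n, rate, n_sims = 500, 0.1, 200
    values = []
    for _ in range(n_sims):
        s_a = sum(rng.random() < rate for _ in range(n))
        s_b = sum(rng.random() < rate for _ in range(n))
        values.append(prob_b_beats_a(s_a, n, s_b, n, rng))
    mean = sum(values) / n_sims
    decisive = sum(v > 0.95 for v in values) / n_sims
    assert abs(mean - 0.5) < 0.1  # symmetric on average
    assert decisive < 0.2         # "strong evidence" should be rare when nothing differs
```

pytest collects the `test_` function automatically. The loose tolerances keep a simulation-based test like this from being flaky.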

License

MIT

Project details

Release history

0.1.5 (May 20, 2026)
0.1.4 (May 16, 2026)
0.1.3 (May 16, 2026)
0.1.2 (May 12, 2026)
0.1.1 (May 5, 2026), this version
0.1.0 (May 5, 2026)

Download files

Source Distribution

argonx-0.1.1.tar.gz (88.5 kB)

Uploaded May 5, 2026 Source

Built Distribution

argonx-0.1.1-py3-none-any.whl (70.4 kB)

Uploaded May 5, 2026 Python 3

File details

Details for the file argonx-0.1.1.tar.gz.

File metadata

Download URL: argonx-0.1.1.tar.gz
Upload date: May 5, 2026
Size: 88.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for argonx-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`092616378788d110bea13b41b02dd6e20746d17b8fffd5917f6cbfcb283a97cd`
MD5	`44cb3267d657f81073a9894e893bce90`
BLAKE2b-256	`44cac285a1e8005149bc022d8f89a024e424fda73202b99294638c1017ebe41c`

File details

Details for the file argonx-0.1.1-py3-none-any.whl.

File metadata

Download URL: argonx-0.1.1-py3-none-any.whl
Upload date: May 5, 2026
Size: 70.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for argonx-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54d7c5841f87ec873faa2d0b773da3a45b2f40b5756ccc0a9fc24eda73973154`
MD5	`bb34b28e3d6c8c51a5c3fcebf9c30697`
BLAKE2b-256	`66f16698d796eb77437c255dea96da0f335c38fd4e7bc7f64ee0c4cf33cedd1c`
