
Warehouse-native causal ML experimentation platform


Argenta


Warehouse-native causal ML for experimentation.

Most A/B testing tools answer one question: Did the experiment work?

Argenta answers: For whom did it work, why, and what should you do next?


What Argenta Is

Argenta is an analytics layer that connects to your existing data warehouse, runs a SQL pipeline inside it to construct an experiment dataset, and then applies causal ML methods to produce results your current tool doesn't: heterogeneous treatment effects (HTE), per-user CATE scores, automatic subgroup discovery, and targeted rollout recommendations.

It works with experiments you've already run — using whatever tool assigned variants (Statsig, LaunchDarkly, Optimizely, homegrown feature flags). Argenta only analyzes.

What Argenta Is NOT

  • Not an A/B testing platform. Argenta does not assign variants or run feature flags.
  • Not a real-time system. Analysis runs post-hoc, after experiments have collected data.
  • Not a data pipeline tool. Argenta does not move or replicate your data. SQL runs inside your warehouse; results are written back to a schema you control.

Quick Start

1. Install

# Pick your warehouse:
pip install "argenta-cml[snowflake]"
pip install "argenta-cml[bigquery]"
pip install "argenta-cml[redshift]"
pip install "argenta-cml[all]"   # all warehouses

2. Create a config file

# argenta.yaml
warehouse:
  warehouse_type: snowflake
  output_schema: argenta
  credentials:
    account: my_account
    user: argenta_svc
    password: "${SNOWFLAKE_PASSWORD}"
    database: ANALYTICS
    schema: PUBLIC
    warehouse: COMPUTE_WH

exposures:
  table: ANALYTICS.PUBLIC.STATSIG_EXPOSURES
  user_id_col: user_id
  experiment_id_col: experiment_name
  variant_col: group
  timestamp_col: exposure_time

outcomes:
  table: ANALYTICS.PUBLIC.EVENTS
  user_id_col: user_id
  event_name_col: event_type
  value_col: revenue
  timestamp_col: event_time
  target_events:
    - purchase
    - add_to_cart

user_features:
  table: ANALYTICS.PUBLIC.USER_DIM
  user_id_col: user_id
  feature_cols:
    - country
    - device_type
    - account_age_days
  covariate_col: pre_experiment_revenue

experiment:
  experiment_id: checkout_redesign_2024
  control_variant: control
  treatment_variant: treatment
  alpha: 0.05
  winsorize_percentile: 0.99
  use_cuped: true
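The `${SNOWFLAKE_PASSWORD}` placeholder suggests the password is read from an environment variable instead of being stored in the file. This README does not say exactly how `load_config` resolves such placeholders, so the following is only a minimal sketch of one common approach. It assumes the YAML has already been parsed into a dict, and `expand_env_vars` is a hypothetical helper, not part of Argenta's API.

```python
import os
from typing import Any


def expand_env_vars(node: Any) -> Any:
    """Hypothetical helper: recursively expand ${VAR} placeholders in string values."""
    if isinstance(node, dict):
        return {key: expand_env_vars(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_env_vars(item) for item in node]
    if isinstance(node, str):
        # os.path.expandvars leaves references to unset variables unchanged
        return os.path.expandvars(node)
    return node


# A fragment of the parsed argenta.yaml, as a plain dict
raw = {"warehouse": {"credentials": {"user": "argenta_svc", "password": "${SNOWFLAKE_PASSWORD}"}}}

os.environ["SNOWFLAKE_PASSWORD"] = "s3cret"  # set here only for the demo
resolved = expand_env_vars(raw)
print(resolved["warehouse"]["credentials"]["password"])  # s3cret
```

In practice the variable would be exported by your shell or secrets manager rather than set in code.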

3. Run

from argenta import PipelineRunner
from argenta.config.loader import load_config

config = load_config("argenta.yaml")
runner = PipelineRunner(config)
result = runner.run("checkout_redesign_2024")

print(result)

Results are also written back to your warehouse, in the experiment_results table of your output_schema (argenta.experiment_results with the config above).


Architecture

Your warehouse (Snowflake / BigQuery / Redshift)
         │
         │  read-only + write to argenta schema
         ▼
Argenta SQL pipeline (runs INSIDE your warehouse)
  ├── 1. Exposure deduplication   — first exposure per user wins
  ├── 2. Outcome join             — only events after first exposure
  └── 3. User feature join        — covariates for CUPED + future CATE
         │
         ▼
Argenta stats layer (Python)
  ├── ATE + Welch CI + p-value
  ├── Winsorization
  ├── SRM detection
  └── CUPED variance reduction
         │
         ▼
Results written back to your warehouse
  ├── argenta.experiment_results   — per-metric ATE, CI, p-value
  ├── argenta.user_cate_scores     — per-user CATE (Phase 2)
  └── argenta.segment_effects      — HTE by segment (Phase 2)
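Steps 1 and 2 of the SQL pipeline can be illustrated in plain Python. This is a sketch of the logic only, on toy in-memory rows; the real pipeline runs as SQL inside your warehouse. The function names, and the choice to sum `value` per user (with 0 for exposed users who have no qualifying events), are illustrative assumptions.

```python
from datetime import datetime

# Toy rows standing in for the exposures and outcomes tables (illustrative data)
exposures = [
    {"user_id": "u1", "experiment": "checkout_redesign_2024", "variant": "treatment", "ts": datetime(2024, 5, 1, 10)},
    {"user_id": "u1", "experiment": "checkout_redesign_2024", "variant": "control", "ts": datetime(2024, 5, 2, 9)},
    {"user_id": "u2", "experiment": "checkout_redesign_2024", "variant": "control", "ts": datetime(2024, 5, 1, 12)},
]
events = [
    {"user_id": "u1", "event": "purchase", "value": 30.0, "ts": datetime(2024, 4, 30, 8)},  # before exposure
    {"user_id": "u1", "event": "purchase", "value": 20.0, "ts": datetime(2024, 5, 3, 8)},
    {"user_id": "u2", "event": "add_to_cart", "value": 0.0, "ts": datetime(2024, 5, 1, 13)},
    {"user_id": "u3", "event": "purchase", "value": 99.0, "ts": datetime(2024, 5, 2, 8)},  # never exposed
]


def first_exposures(rows, experiment_id):
    """Step 1: keep each user's earliest exposure to the experiment (first exposure wins)."""
    first = {}
    for row in sorted(rows, key=lambda r: r["ts"]):
        if row["experiment"] == experiment_id and row["user_id"] not in first:
            first[row["user_id"]] = row
    return first


def post_exposure_totals(first, rows, target_events):
    """Step 2: sum target-event values that occur after each user's first exposure."""
    totals = {user_id: 0.0 for user_id in first}  # exposed users with no events get 0
    for row in rows:
        exposure = first.get(row["user_id"])
        if exposure and row["event"] in target_events and row["ts"] > exposure["ts"]:
            totals[row["user_id"]] += row["value"]
    return totals


first = first_exposures(exposures, "checkout_redesign_2024")
totals = post_exposure_totals(first, events, {"purchase", "add_to_cart"})
print({user: (first[user]["variant"], totals[user]) for user in first})
# {'u1': ('treatment', 20.0), 'u2': ('control', 0.0)}
```

User u1 is analyzed as treatment because that exposure came first, and the purchase made before exposure is excluded.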

Statistical Methods

Method                             Purpose
Welch's t-test                     ATE and p-value
Confidence intervals               (1 - alpha) CI on the ATE; 95% at alpha: 0.05
Winsorization                      Outlier handling for revenue metrics
SRM detection                      Sample ratio mismatch check
CUPED                              Variance reduction via a pre-experiment covariate
Causal Forest (CausalForestDML)    Non-parametric CATE estimation
X-Learner                          CATE estimation when treatment and control sizes are unbalanced
Uplift modeling                    Scoring users who were not in the experiment
Sequential testing (mSPRT)         Always-valid p-values
Multiple testing correction        Bonferroni / BH-FDR across metrics
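A minimal sketch of the first three rows of the table (Welch's t-test, the confidence interval, and winsorization), using only the Python standard library. It is not Argenta's implementation: the nearest-rank percentile, the single cap computed on pooled data, and the normal approximation for the CI and p-value are assumptions chosen to keep the example dependency-free. The demo uses `percentile=0.9` instead of the config's 0.99 only because the sample is tiny.

```python
import math
from statistics import NormalDist, mean, variance


def upper_cap(values, percentile=0.99):
    """Nearest-rank empirical percentile, used as the winsorization cap."""
    ordered = sorted(values)
    rank = math.ceil(percentile * len(ordered))  # 1-based nearest rank
    return ordered[min(max(rank, 1), len(ordered)) - 1]


def welch_ate(treatment, control, alpha=0.05):
    """Difference in means with the Welch (unequal-variance) standard error.

    The CI and p-value use a large-sample normal approximation; an exact
    Welch test would use a t distribution with `df` degrees of freedom.
    """
    n_t, n_c = len(treatment), len(control)
    v_t, v_c = variance(treatment) / n_t, variance(control) / n_c
    ate = mean(treatment) - mean(control)
    se = math.sqrt(v_t + v_c)
    # Welch-Satterthwaite degrees of freedom
    df = (v_t + v_c) ** 2 / (v_t**2 / (n_t - 1) + v_c**2 / (n_c - 1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(ate) / se))
    return {"ate": ate, "ci": (ate - z * se, ate + z * se), "p_value": p_value, "df": df}


# Illustrative revenue per user; 500.0 is an outlier
treatment = [12.0, 15.5, 9.0, 14.0, 500.0, 13.2, 11.8, 16.1]
control = [10.0, 11.5, 9.5, 12.0, 10.8, 13.0, 9.9, 11.1]

# One cap computed on the pooled data, so both arms are clipped identically
cap = upper_cap(treatment + control, percentile=0.9)
result = welch_ate([min(v, cap) for v in treatment], [min(v, cap) for v in control])
print(cap)  # 16.1
print(result)
```

For an exact Welch test, SciPy's `scipy.stats.ttest_ind(treatment, control, equal_var=False)` uses the t distribution with Welch-Satterthwaite degrees of freedom; the normal approximation above is close to it only when both arms are large.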

Input Tables Required

Argenta reads up to three tables from your warehouse (all column names are configurable):

Table                Required columns
Exposures            user_id, experiment_id, variant, timestamp
Outcomes / events    user_id, event_name, value, timestamp
User features        user_id, plus any feature columns (and covariate_col for CUPED)

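The user features table is required for CUPED variance reduction, which uses its covariate_col (for example, pre_experiment_revenue) as the pre-experiment covariate. Without it, set use_cuped: false.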


Output Tables

Argenta writes results back to {output_schema} in your warehouse (default: argenta):

Table                         Contents
argenta.experiment_results    ATE, CI, p-value, and SRM flag per metric
argenta.user_cate_scores      Per-user CATE score (Phase 2)
argenta.segment_effects       HTE by segment (Phase 2)
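The SRM flag in experiment_results comes from a sample ratio mismatch check. A common way to run one is a chi-square goodness-of-fit test of the observed arm sizes against the intended split. The sketch below assumes a 50/50 split and a 0.001 threshold, both illustrative choices rather than documented Argenta defaults, and `srm_check` is a hypothetical name.

```python
import math


def srm_check(n_control, n_treatment, expected_share=0.5, threshold=0.001):
    """Chi-square goodness-of-fit test (1 df) for sample ratio mismatch.

    expected_share is the intended fraction of users in treatment.
    Returns (p_value, srm_flag).
    """
    total = n_control + n_treatment
    expected = (total * (1 - expected_share), total * expected_share)
    observed = (n_control, n_treatment)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < threshold


print(srm_check(50_000, 50_000))  # (1.0, False)
print(srm_check(50_000, 48_500))  # p-value far below 0.001, so flagged
```

A flagged experiment usually points to an assignment or logging problem, so its ATE should be treated with caution.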

Supported Warehouses

Warehouse     Extra                      Status
Snowflake     argenta-cml[snowflake]     Supported
BigQuery      argenta-cml[bigquery]      Supported
Redshift      argenta-cml[redshift]      Supported
Databricks    argenta-cml[databricks]    Planned



Contributing

Contributions are welcome. See CONTRIBUTING.md for setup instructions, coding conventions, and the PR process.


License

Apache 2.0. See LICENSE.


Side note

The name "Argenta" is inspired by the name of a professor who introduced me to causal inference during my bachelor's degree.



Download files


Source Distribution

argenta_cml-0.1.0.tar.gz (65.0 kB)

Uploaded Source

Built Distribution


argenta_cml-0.1.0-py3-none-any.whl (63.3 kB)

Uploaded Python 3

File details

Details for the file argenta_cml-0.1.0.tar.gz.

File metadata

  • Download URL: argenta_cml-0.1.0.tar.gz
  • Size: 65.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for argenta_cml-0.1.0.tar.gz
Algorithm Hash digest
SHA256 48d8b74ade04894dcf3823b6d5f015c3a7f946038cd516e8f2b8dff6a6d1b61e
MD5 3035b4601548b29ab468f90e730a0fe6
BLAKE2b-256 ae3e0645790e53a01444357a038e3a5cc65b70b246d8b4b561e5d62c909fb20b


Provenance

The following attestation bundles were made for argenta_cml-0.1.0.tar.gz:

Publisher: publish.yml on athammad/argenta

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file argenta_cml-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: argenta_cml-0.1.0-py3-none-any.whl
  • Size: 63.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for argenta_cml-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e475a2899152c3523d14ced9b66d2e870730caf3b9168f71ff5730e7ed71252a
MD5 b0908db2a642e3308a35aa5cbf35958e
BLAKE2b-256 3a5c4b1b31bde2c28d99f1b5a3144ce97e02aa0abd3b722589d73db255ddd509


Provenance

The following attestation bundles were made for argenta_cml-0.1.0-py3-none-any.whl:

Publisher: publish.yml on athammad/argenta

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
