Three-tier symbolic formula search: discover interpretable classifiers that compete with ensembles

Theory Radar

Do you need a black box, or would a formula suffice?

Theory Radar discovers interpretable symbolic classifiers from tabular data and tells you whether they can replace an ensemble on your specific dataset.

On Pima Diabetes, a three-feature formula min(insulin, age) + glucose beats gradient boosting at 25σ significance under fair held-out evaluation across 1000 folds (200×5-fold repeated CV). On Breast Cancer, PCA-projected formulas narrow the gap to gradient boosting from 20.9σ to 6σ by accessing all 30 features through learned projections. On EEG (N≈15K), ensembles win decisively, and Theory Radar reports that honestly. The tool's value is in the comparison, not in always winning.

from symbolic_search import TheoryRadar

radar = TheoryRadar(X_train, y_train, projection="pca")
result = radar.search(mode="fast")
print(result.formula)  # "(f19 - pc0) + f5"
print(result.f1)       # 0.963

Or let it find the best configuration automatically:

radar, result = TheoryRadar.autotune(X_train, y_train, max_time=120)

Three-Tier Architecture

Theory Radar is a complete search stack, not a one-off script.

Tier 1: Core Engine

The foundation. Produces valid, fair results on its own.

Phased enumeration of all formula trees to a given depth
Exact optimal F1 via sort-and-sweep over all N thresholds, O(N log N)
Monotone Invariance Theorem: monotone transforms preserve F1 and AUROC (proved), enabling evaluation reuse
FormulaTrace: records the operation sequence so formulas can be replayed on test data
Fair CV evaluation: formula discovered on train, threshold tuned on train, scored on test—identical to sklearn baselines
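
The sort-and-sweep step can be sketched in plain NumPy. This is a minimal illustration, not the package's implementation: the function name `best_f1_threshold` is hypothetical, and it only handles the `scores >= threshold` direction (negate the scores to sweep the other direction). It also shows the monotone invariance property on a small example, since a strictly increasing transform leaves the sort order, and therefore the best F1, unchanged.

```python
import numpy as np

def best_f1_threshold(scores, y):
    """Return (best_f1, threshold) for the rule `scores >= threshold`."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    order = np.argsort(-scores, kind="stable")   # descending sort: O(N log N)
    s, t = scores[order], y[order]
    tp = np.cumsum(t)                    # true positives if the top-k are predicted positive
    k = np.arange(1, len(t) + 1)         # number of predicted positives
    n_pos = t.sum()                      # number of actual positives
    f1 = 2.0 * tp / (k + n_pos)          # F1 = 2TP / (2TP + FP + FN) = 2TP / (k + P)
    # A cut is only valid between distinct score values (tied samples move together).
    valid = np.append(s[:-1] != s[1:], True)
    f1 = np.where(valid, f1, -1.0)
    i = int(np.argmax(f1))
    return float(f1[i]), float(s[i])

scores = np.array([0.1, 0.4, 0.35, 0.8])
y = np.array([0, 0, 1, 1])
f1, thr = best_f1_threshold(scores, y)
print(f1, thr)                           # 0.8 0.35

# Monotone invariance: exp() is strictly increasing, so the best F1 is identical.
f1_exp, _ = best_f1_threshold(np.exp(scores), y)
print(f1_exp == f1)                      # True
```

Because the best F1 depends only on the ordering of the scores, formulas that differ by a monotone transform never need to be evaluated twice.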

Tier 2: Search Accelerators

Optional components that make the search faster and more thorough.

Beam search with cost-plus-heuristic ordering (h = 1 - F1 for alive nodes)
Subspace fuzzing: random feature subsets for broader exploration
Meta-learned pruning: the search discovers its own pruning rules from exhaustive fold-local micro-search. 88-99.6% of dead subtrees pruned with zero false negatives.
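
Subspace fuzzing can be pictured as drawing several random feature subsets and running the search on each. The sketch below assumes uniform sampling without replacement; the helper `sample_subspaces` is hypothetical, and its arguments merely mirror the `n_subspaces` and `subspace_k` options.

```python
import numpy as np

def sample_subspaces(d, n_subspaces, k, seed=0):
    """Draw n_subspaces random feature-index subsets, each of size k (no repeats within a subset)."""
    rng = np.random.default_rng(seed)
    k = min(k, d)                                   # cannot pick more features than exist
    return [np.sort(rng.choice(d, size=k, replace=False)) for _ in range(n_subspaces)]

subsets = sample_subspaces(d=30, n_subspaces=10, k=12)
X = np.random.default_rng(1).normal(size=(100, 30))  # synthetic data for illustration
X_sub = X[:, subsets[0]]                             # one search would run on this (100, 12) slice
print(len(subsets), X_sub.shape)                     # 10 (100, 12)
```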

Tier 3: Representation Augmenters

Projections that give depth-3 formulas implicit access to ALL features.

Projection	What it captures	When to use
`"pca"`	Linear variance directions	Default for d > 10. Closes gap on BreastCancer (.955 → .963)
`"pls"`	Discriminative directions (supervised)	When you want projections that target the class boundary
`"tucker"`	Pairwise feature interactions	Tested; PCA outperforms on current benchmarks
`"kernel"`	Nonlinear manifold structure	Complex nonlinear boundaries
`"neural"`	Learned nonlinear projection	Data-adaptive

# PCA projections (recommended default)
radar = TheoryRadar(X, y, projection="pca")

# PLS: supervised projections maximizing covariance with y
radar = TheoryRadar(X, y, projection="pls")

# Full stack: projections + fuzzing + meta-pruning
radar = TheoryRadar(X, y,
    projection=["pca", "pls"],
    n_subspaces=10,
    subspace_k=12,
    meta_prune=True,
)

Results

Full pipeline (Tier 1 + 2 + 3), 200×5-fold repeated CV, fair held-out evaluation:

Dataset	N	d	Test F1	Formula	vs GB	vs RF	vs LR
Diabetes	768	8	.668	`min(v5,v7) + v1`	**25σ A>**	**27σ A>**	**21σ A>**
Wine	178	13	.953	`min(w6,w0) + w12`	**16σ A>**	18σ RF>	23σ LR>
Banknote	1372	4	.986	`(v0+v1) + v2`	26σ GB>	22σ RF>	**27σ A>**
BreastCancer	569	30	.963	`(f19 - pc0) + f5`	6σ GB>	10σ RF>	37σ LR>
EEG	14980	14	.655	`max(v13-v5, v6)`	246σ GB>	417σ RF>	**250σ A>**

Bold = formula wins. The pattern: when the boundary can be captured by 2-3 features, formulas match or beat ensembles. When it requires many features, ensembles win. Theory Radar quantifies this tradeoff on your specific data.
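
This README does not define how the σ values are computed. One common way to express such a gap, assumed here purely for illustration, is the mean per-fold F1 difference divided by its standard error. The function name `sigma_gap` and the synthetic fold scores are hypothetical and are not the benchmark numbers above.

```python
import numpy as np

def sigma_gap(f1_a, f1_b):
    """Mean paired per-fold difference divided by its standard error."""
    diff = np.asarray(f1_a, dtype=float) - np.asarray(f1_b, dtype=float)
    se = diff.std(ddof=1) / np.sqrt(len(diff))
    return float(diff.mean() / se)

# Synthetic per-fold scores for two models over 1000 folds (illustration only).
rng = np.random.default_rng(0)
formula_f1 = 0.66 + 0.03 * rng.normal(size=1000)
baseline_f1 = 0.64 + 0.03 * rng.normal(size=1000)
print(f"{sigma_gap(formula_f1, baseline_f1):.1f} sigma")
```

Folds in repeated CV share training data, so this simple ratio tends to overstate significance compared with a test on independent samples.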

The full 17-dataset benchmark, which adds Heart, Sonar, Spambase, German Credit, Australian Credit, Adult, HIGGS, Electricity, MiniBooNE, Magic, Ionosphere, and Covertype to the five datasets above, is in progress.

Projection Comparison (BreastCancer)

Projection	Test F1	Gap	vs GB σ	Notes
raw (no projection)	.955	.017	20.9σ GB>	Baseline
PCA (8 components)	.963	.013	6.0σ GB>	Best so far
Tucker (HOSVD)	.955	.017	20.9σ GB>	No improvement over raw
PLS / combined	—	—	—	Shootout running

PCA projections give formulas implicit access to all features through linear combinations. Tucker decomposition (feature interaction tensor) was tested but only matched the raw baseline (.955), so it did not improve on PCA on current benchmarks.
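
To see why a projected column carries information from every feature, here is a rough sketch that appends principal-component columns to the raw matrix. It uses a plain NumPy SVD as a stand-in for a library PCA, skips any feature scaling, and the helper names `fit_pca` and `augment` are hypothetical; the package's own projection code may differ.

```python
import numpy as np

def fit_pca(X_train, n_components=8):
    """Fit on training data only: return the feature means and top principal directions."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def augment(X, mean, components):
    """Append projected columns pc0..pc{k-1} to the raw features."""
    return np.hstack([X, (X - mean) @ components.T])

rng = np.random.default_rng(0)                       # synthetic data for illustration
X_train, X_test = rng.normal(size=(200, 30)), rng.normal(size=(50, 30))
mean, comps = fit_pca(X_train, n_components=8)
X_train_aug = augment(X_train, mean, comps)          # shape (200, 38)
X_test_aug = augment(X_test, mean, comps)            # shape (50, 38), reuses train-fitted statistics
print(X_train_aug.shape, X_test_aug.shape)           # (200, 38) (50, 38)
```

Each `pc` column is a weighted sum of all 30 raw features, so a depth-3 formula such as `(f19 - pc0) + f5` depends on every input even though it names only three terms.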

Installation

pip install theory-radar

For GPU acceleration:

pip install theory-radar[gpu]   # adds cupy-cuda12x

For all projections:

pip install theory-radar[all]   # adds scikit-learn, tensorly

Quick Start

Basic search

from symbolic_search import TheoryRadar

radar = TheoryRadar(X_train, y_train)
result = radar.search(mode="fast", max_depth=3)
print(f"Formula: {result.formula}")
print(f"F1: {result.f1:.4f}")

Fair evaluation on test data

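from sklearn.metrics import f1_score
# find_optimal_threshold is assumed to come from the package; its import path is not shown in this README.

# The formula is recorded as a FormulaTrace.
# Replay it on test data with a threshold tuned on train.
# X_train_aug: training features with the same augmentation applied (how it is obtained is not shown here).
X_test_aug = radar.transform_test(X_test)
train_values = result.trace.evaluate(X_train_aug)
test_values = result.trace.evaluate(X_test_aug)
threshold, direction, _ = find_optimal_threshold(train_values, y_train)
predictions = (direction * test_values >= threshold).astype(int)
test_f1 = f1_score(y_test, predictions)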
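
The idea of recording an operation sequence and replaying it on new data can be sketched with a tiny stand-in class. `MiniTrace` and its methods are hypothetical and are not the package's `FormulaTrace`; the sketch only illustrates why a recorded trace makes test-set evaluation reproducible.

```python
import numpy as np

# Hypothetical stand-in for FormulaTrace; the library's real class may differ.
BINARY = {"add": np.add, "sub": np.subtract, "min": np.minimum, "max": np.maximum}

class MiniTrace:
    def __init__(self, start_col):
        self.start_col = start_col
        self.steps = []                      # list of (op_name, column_index)

    def then(self, op, col):
        """Record one more operation against a feature column."""
        self.steps.append((op, col))
        return self

    def evaluate(self, X):
        """Replay the recorded operations on any matrix with the same columns."""
        values = X[:, self.start_col].astype(float)
        for op, col in self.steps:
            values = BINARY[op](values, X[:, col])
        return values

# Record min(x0, x1) + x2 once, then replay it on train and test.
trace = MiniTrace(0).then("min", 1).then("add", 2)
X_train = np.array([[1.0, 5.0, 2.0], [4.0, 3.0, 1.0]])
X_test = np.array([[2.0, 0.0, 7.0]])
print(trace.evaluate(X_train))               # [3. 4.]
print(trace.evaluate(X_test))                # [7.]
```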

Autotune (find best configuration)

radar, result = TheoryRadar.autotune(X, y, max_time=120)
# Automatically searches projections, depths, subspace sizes

Compare with baselines

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

gb = GradientBoostingClassifier(n_estimators=100)
gb.fit(X_train, y_train)
gb_f1 = f1_score(y_test, gb.predict(X_test))

print(f"Formula: {test_f1:.4f}")
print(f"GB:      {gb_f1:.4f}")
print(f"Gap:     {gb_f1 - test_f1:+.4f}")

API Reference

`TheoryRadar(X, y, **options)`

Parameter	Type	Default	Description
`X`	ndarray	required	Feature matrix (N, d)
`y`	ndarray	required	Binary labels (N,)
`feature_names`	list	auto	Names for features
`projection`	str/list	None	`"pca"`, `"pls"`, `"tucker"`, `"kernel"`, `"neural"`, or a list of these
`n_projection_components`	int	8	Components per projection
`n_subspaces`	int	1	Random subspace trials
`subspace_k`	int	d	Features per subspace
`meta_prune`	bool	False	Enable meta-learned pruning
`ensemble_k`	int	1	Top-k formula ensemble
`validation_fraction`	float	0.0	Holdout for beam selection
`binary_ops`	dict	10 ops	Custom binary operations
`unary_ops`	dict	8 ops	Custom unary operations

`radar.search(mode, **options)`

Parameter	Type	Default	Description
`mode`	str	`"auto"`	`"strict"`, `"fast"`, or `"auto"`
`f1_target`	float	0.0	Target F1 (0 = find best)
`max_depth`	int	3	Maximum formula depth
`max_expansions`	int	50000	Node budget
`auroc_threshold`	float	0.55	AUROC pruning threshold (fast mode)
`timeout`	float	300	Seconds before fallback (auto mode)

`TheoryRadar.autotune(X, y, max_time=300)`

Static method. Searches over configurations and returns the best (radar, result) tuple.

How It Works

Enumerate all formula trees: x1, (x1 + x2), min(x1, x2) + x3, ...
Evaluate each by sorting all N samples and finding the threshold that maximizes F1
Keep the top B formulas at each depth (beam search)
Prune dead subtrees using meta-learned criteria (zero false negatives)
Project features via PCA/Tucker/kernel to access all d features in depth-3 formulas
Record the winning formula's operations for fair test-set replay
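
The enumerate, evaluate, and keep-top-B steps above can be sketched as a toy beam search. This is a simplified illustration under stated assumptions: it ranks candidates by plain training F1, grows formulas only by combining the current expression with one raw feature, and omits projections and meta-learned pruning. The names `best_f1` and `beam_search` are hypothetical.

```python
import itertools
import numpy as np

OPS = {"+": np.add, "-": np.subtract, "min": np.minimum, "max": np.maximum}

def best_f1(scores, y):
    """Best F1 over all thresholds and both directions (sort-and-sweep)."""
    best = 0.0
    for s in (scores, -scores):
        order = np.argsort(-s, kind="stable")
        ss, t = s[order], y[order]
        f1 = 2.0 * np.cumsum(t) / (np.arange(1, len(t) + 1) + t.sum())
        valid = np.append(ss[:-1] != ss[1:], True)   # cut only between distinct scores
        best = max(best, float(f1[valid].max()))
    return best

def beam_search(X, y, max_depth=3, beam_width=5):
    d = X.shape[1]
    # Depth 1: every single feature is a candidate formula.
    beam = [(best_f1(X[:, j], y), f"x{j}", X[:, j]) for j in range(d)]
    beam = sorted(beam, key=lambda c: -c[0])[:beam_width]
    best = beam[0]
    for _ in range(max_depth - 1):
        candidates = []
        # Extend each surviving formula with one operation and one raw feature.
        for (_, name, vals), (op, fn), j in itertools.product(beam, OPS.items(), range(d)):
            new_vals = fn(vals, X[:, j])
            candidates.append((best_f1(new_vals, y), f"{op}({name}, x{j})", new_vals))
        beam = sorted(candidates, key=lambda c: -c[0])[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best[0], best[1]

# Synthetic data whose true boundary is x0 - x1 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
f1, formula = beam_search(X, y, max_depth=2, beam_width=5)
print(formula, f1)
```

On this synthetic problem the search recovers a difference of `x0` and `x1` with a training F1 of 1.0, since the label was generated from exactly that expression.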

Citation

@article{bond2026theoryradar,
  title={Theory Radar: Learning Safe Pruning Rules for Symbolic Formula
         Search from Exhaustive Micro-Search},
  author={Bond, Andrew H.},
  journal={IEEE Transactions on Artificial Intelligence},
  year={2026},
  note={Under review}
}

Architecture

See ARCHITECTURE.md for the full three-tier design document, including theoretical results, what was abandoned (and why), and the experiment plan.

Related Work

batch-probe (PyPI): GPU batch size finder + Kalman-filtered thermal CPU management. Used for running Theory Radar experiments with ThermalJobManager.
Tensor rank and dynamical tractability: The Tucker/HOSVD methods in Tier 3 originate from research on tensor decomposition for the gravitational 3-body problem.
PySR / Cranmer 2023: Genetic symbolic regression for physics. Theory Radar differs in targeting classification (F1), providing fair evaluation (FormulaTrace), and learning its own pruning rules.
InterpretML / EBM: Microsoft's Explainable Boosting Machines. Interpretable but still complex models with many parameters. Theory Radar formulas are one line of math.

License

MIT

This version

0.4.0

Mar 30, 2026

