Anomaly-Driven Correction Discovery: Physics-Constrained Symbolic Regression for Evolutionary Scientific Discovery
Project description
ADCD — Anomaly-Driven Correction Discovery
Physics-Constrained Symbolic Regression for Evolutionary Scientific Discovery
ADCD is a symbolic regression framework that discovers physical correction terms rather than learning equations from scratch. Given a known classical law and anomalous observations, ADCD recovers the dimensionless correction Δ that reconciles theory with experiment — mirroring how physics actually evolves.
82.8% (±7.7%) mean structural recovery across 5 random seeds, with peak 94.4% at the reference seed.
4/4 real-world structural class matches (Mercury, Lamb Shift, Muon g-2, Blackbody).
77 automated unit tests passing on Python 3.10 and 3.11.
Key Features
- Correction-first paradigm — starts from a known classical law, not a blank slate; designed for anomaly-driven theory refinement where the baseline is structurally correct
- Physics-gated search cascade — AST complexity, dimensional homogeneity + transcendental guardrails, and asymptotic consistency (ARC) gates screen unphysical candidates before optimization
- JAX-traced L-BFGS-B optimizer — parameter-scaled differentiable fitting with multi-restart log-uniform initialization
- BIC reranking — selects the most parsimonious correction over purely numerical fits
- Residual feature intelligence — statistical priors (monotonicity, curvature, oscillation, decay rate, symmetry) bias the template sampler toward the correct mathematical family
- Coarse empirical evaluation — data-driven pre-filter ranks gate survivors before full JAX optimization
- Noise-robust — 93.3% mean at 0% noise, 91.1% at 1%, 71.1% at 5%, 68.9% at 10%
Quick Start
Installation
pip install adcd
Or install from source:
git clone https://github.com/apiprdt/PhysicsPaper.git
cd PhysicsPaper
pip install -e ".[dev]"
Usage
Running ADCD is extremely simple using the high-level scientific API:
import adcd
# 1. Load a pre-defined benchmark scenario
scenarios = adcd.get_all_scenarios()
scenario = scenarios[0] # Relativistic Kinetic Energy
# 2. Run discovery in a single line!
result = adcd.discover_correction(scenario, max_iterations=5, proposer="mock")
print(f"Discovered correction: {result.best_expr}")
print(f"Residual NMSE: {result.best_nmse_residual:.2e}")
print(f"Parameters: {result.best_theta}")
# 3. Export LaTeX or plot residuals
print(result.export_latex())
result.plot_residuals()
For custom experimental data, use adcd.fit(...):
import numpy as np
import adcd
x = np.linspace(1.0, 5.0, 100)
X = {"x": x}
y_classical = 2.0 * x
y_observed = 2.0 * x + 0.5 * x**2 # hidden x² correction
result = adcd.fit(
X=X,
y_obs=y_observed,
y_classical=y_classical,
limit_variable="x",
limit_direction="0",
correction_mode="additive"
)
result.summary()
Benchmark Results
Standard Benchmark (seed=42, Mock Proposer)
Results from run_correction_discovery.py --proposer mock (reference seed=42, 4 iterations per scenario).
| Scenario | Tier | 0% Noise | 1% Noise | 5% Noise | 10% Noise |
|---|---|---|---|---|---|
| Relativistic KE | Textbook | ✓ | ✓ | ✓ | ✓ |
| Yukawa Gravity | Textbook | ✓ | ✓ | ✓ | ✓ |
| Anharmonic Spring | Textbook | ✓ | ✓ | ✓ | ✓ |
| Screened Coulomb | Cross-Domain | ✓ | ✓ | ✗ | ✗ |
| Net Radiation | Cross-Domain | ✓ | ✓ | ✓ | ✓ |
| Nonlinear Drag | Cross-Domain | ✓ | ✓ | ✓ | ✓ |
| Mystery-A (tanh²) | Synthetic | ✓ | ✓ | ✓ | ✓ |
| Mystery-B (sinc) | Synthetic | ✓ | ✓ | ✓ | ✓ |
| Mystery-C (log-quotient) | Synthetic | ✓ | ✓ | ✓ | ✓ |
| Overall | 100% | 100% | 88.9% | 88.9% |
Note: Screened Coulomb fails at ≥5% noise because exponential decay ($e^{-r/\lambda}$) and rational saturation ($r/(r+\lambda)$) are numerically indistinguishable at the tested SNR with limited dynamic range — an information-theoretic limit, not a framework deficiency.
Multi-Seed Reproducibility
All results are reported across 5 independent random seeds (0, 7, 21, 42, 99):
| Seed | Class Match Rate |
|---|---|
| 0 | 86.1% (31/36) |
| 7 | 75.0% (27/36) |
| 21 | 77.8% (28/36) |
| 42 | 94.4% (34/36) |
| 99 | 80.6% (29/36) |
| Mean | 82.8% ± 7.7% |
Performance variation reflects stochastic template sampling in the MockProposer. Physics gates ensure that when the correct functional family is sampled, it consistently survives filtering and is selected by BIC reranking.
Real-World Physical Constants Benchmark
Synthetic-real hybrid data using experimentally validated constants from JPL DE440, NIST, and CODATA:
| Physical Scenario | Discovered Correction | Converged | Class Match | NMSE |
|---|---|---|---|---|
| Mercury Perihelion (GR) | θ₀·vc² |
— | ✓ polynomial | 1.11e-05 |
| Hydrogen Lamb Shift (QED) | θ₀(n/θ₁)^(-θ₂) |
✓ | ✓ power_law | 1.82e-18 |
| Muon g-2 (Schwinger) | θ₀(α/π)^θ₁ |
✓ | ✓ polynomial | 7.94e-07 |
| Blackbody (Planck) | -1 + e^(-f/θ₁) |
— | ✓ exponential | 2.59e-02 |
All 4 scenarios achieve correct structural class identification. 2 scenarios (Lamb Shift, Muon g-2) achieve full convergence with NMSE < 10⁻⁶. Mercury and Blackbody achieve correct structural identification but quantitative convergence is limited by parametrization sensitivity and dynamic range, respectively.
PySR Comparison (fair profile: 100 iterations, maxsize 30, 60s timeout)
| Method | 0% Noise | 1% Noise | 5% Noise | 10% Noise |
|---|---|---|---|---|
| ADCD (ours, seed=42) | 9/9 (100%) | 9/9 (100%) | 8/9 (88.9%) | 8/9 (88.9%) |
| PySR fair | 4/9 (44.4%) | 5/9 (55.6%) | 1/9 (11.1%) | 5/9 (55.6%) |
ADCD outperforms PySR fair by 77.8 percentage points at 5% noise (88.9% vs 11.1%). A legacy fast profile (wall-clock matched) is retained in pysr_baseline_results.json for historical comparison only.
Project Structure
PhysicsPaper/
├── src/adcd/ # Installable package
│ ├── __init__.py # Public API (adcd.fit, adcd.discover_correction)
│ ├── anomaly_scenarios.py # 9 standard + 3 blind benchmark scenarios
│ ├── arc_scorer.py # Asymptotic consistency gate (ARC)
│ ├── coarse_evaluator.py # Coarse numerical pre-filter
│ ├── correction_orchestrator.py # Main multi-iteration discovery loop
│ ├── dimensional_checker.py # Dimensional homogeneity + transcendental guardrail
│ ├── jax_optimizer.py # JAX L-BFGS-B optimizer (parameter-scaled)
│ ├── llm_proposer.py # Mock + Gemini + OpenAI-compatible proposers
│ ├── metrics.py # NMSE, BIC, structural classification
│ ├── pipeline.py # Stage 1 filter cascade
│ ├── real_data_loader.py # Real-world data loading (JPL, NIST, CODATA)
│ ├── real_scenarios.py # Real-world validation scenarios
│ ├── residual_analyzer.py # Statistical residual feature extraction
│ └── result.py # CorrectionResult: summary, LaTeX, plot
├── tests/ # 58 unit + integration tests
├── paper/ # LaTeX source (main.tex) + figures
├── run_correction_discovery.py # Standard 9-scenario benchmark runner
├── run_real_data_benchmark.py # Real-world physical constants benchmark
├── run_reproducibility.py # Multi-seed reproducibility study (5 seeds)
├── run_ablation.py # Gate ablation study
├── run_pysr_baseline.py # PySR comparison baseline
├── run_mlp_baseline.py # MLP comparison baseline
├── run_misspecification_benchmark.py # Baseline misspecification fail-safe test
├── generate_figures.py # Paper figure generator
├── .github/workflows/ # CI (test + lint + LaTeX) and PyPI publish
├── pyproject.toml # PEP 517/518 build configuration
└── README.md # This file
Running Tests
pip install -e ".[dev]"
pytest --cov=adcd
All 77 tests pass on Python 3.10 and 3.11 (Ubuntu and Windows).
Submission & Release
Paper submission guide (GitHub Release → Zenodo → arXiv): docs/SUBMISSION_CHECKLIST_v2.1.2.md
Current release tag: v2.1.2 | Package version: 2.1.2
Reproducing Paper Results
Verify claims before citing numbers:
python scripts/verify_paper_claims.py # expect [ALL OK]
One-command reproduction (Windows):
.\reproduce_all.ps1
Or step-by-step:
python run_correction_discovery.py --proposer mock # Main benchmark + gate telemetry
python run_real_data_benchmark.py # Real-world (5 scenarios)
python run_pysr_baseline.py --profile fair # Fair PySR comparison
python run_ablation.py # Gate ablation study
python run_oracle_ablation.py # Oracle ground-truth injection test
python run_correction_scaling.py # Correction magnitude sweep
python scripts/generate_experiment_report.py # Sync experiment_results.md
python scripts/generate_efficiency_table.py # ADCD vs PySR efficiency table
python scripts/validate_results.py # Consistency checks
python generate_figures.py # All paper figures
Proposer regimes: Mock Proposer = template-assisted recovery; Hybrid/Gemini = zero-shot discovery. Report both separately (see paper Section 4).
# LLM benchmark (requires GEMINI_API_KEY) — writes results/llm_benchmark.json
python run_llm_benchmark.py --proposer hybrid
Citing This Work
If you use ADCD in your research, please cite:
@software{erdita2026adcd,
author = {Erdita, Muhammad Afif},
title = {{Anomaly-Driven Correction Discovery (ADCD): Physics-Constrained
Symbolic Regression for Evolutionary Scientific Discovery}},
year = {2026},
publisher = {Zenodo},
version = {2.1.2},
doi = {10.5281/zenodo.20534940},
url = {https://doi.org/10.5281/zenodo.20534940}
}
AI Disclosure
This project was developed with assistance from Google DeepMind's Antigravity AI assistant. AI was used as a pair-programming and writing tool. All scientific content, experimental design decisions, and intellectual contributions are the author's own.
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file adcd-2.1.2.tar.gz.
File metadata
- Download URL: adcd-2.1.2.tar.gz
- Upload date:
- Size: 82.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
eecd8845c73954c514fac1a60815814ddd215e1eedacf49cecd4d7b7c5a3cb50
|
|
| MD5 |
53514631a280ec3377005ce4786a9858
|
|
| BLAKE2b-256 |
25f1db835f408a1bdcc22ba5acdec357c51fe23462af95a2cc379283c36542dd
|
Provenance
The following attestation bundles were made for adcd-2.1.2.tar.gz:
Publisher:
publish.yml on apiprdt/PhysicsPaper
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
adcd-2.1.2.tar.gz -
Subject digest:
eecd8845c73954c514fac1a60815814ddd215e1eedacf49cecd4d7b7c5a3cb50 - Sigstore transparency entry: 1786194029
- Sigstore integration time:
-
Permalink:
apiprdt/PhysicsPaper@3db06f701f32b193445df57a52ccbd3262bc0181 -
Branch / Tag:
refs/tags/v2.1.2 - Owner: https://github.com/apiprdt
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@3db06f701f32b193445df57a52ccbd3262bc0181 -
Trigger Event:
release
-
Statement type:
File details
Details for the file adcd-2.1.2-py3-none-any.whl.
File metadata
- Download URL: adcd-2.1.2-py3-none-any.whl
- Upload date:
- Size: 69.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
deae22ae6e7667809b80802b0fab42ea192fb4ee9708a523cfe52bfaf458712e
|
|
| MD5 |
a015ff55abc408db0befbaec5037b156
|
|
| BLAKE2b-256 |
272a3f16bb84283ebb6a815dfdcc56d4da7659bc6f00a630fd2b99d2e4892bf1
|
Provenance
The following attestation bundles were made for adcd-2.1.2-py3-none-any.whl:
Publisher:
publish.yml on apiprdt/PhysicsPaper
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
adcd-2.1.2-py3-none-any.whl -
Subject digest:
deae22ae6e7667809b80802b0fab42ea192fb4ee9708a523cfe52bfaf458712e - Sigstore transparency entry: 1786194099
- Sigstore integration time:
-
Permalink:
apiprdt/PhysicsPaper@3db06f701f32b193445df57a52ccbd3262bc0181 -
Branch / Tag:
refs/tags/v2.1.2 - Owner: https://github.com/apiprdt
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@3db06f701f32b193445df57a52ccbd3262bc0181 -
Trigger Event:
release
-
Statement type: