Symbolic Form Discovery - Find interpretable mathematical formulas in data
PPF - Symbolic Form Discovery
Discover interpretable mathematical formulas from data, then deploy them anywhere.
PPF (Promising Partial Form) is a law-aware symbolic regression library that finds compact mathematical expressions describing your data. Unlike black-box neural networks, PPF produces human-readable formulas grounded in physical laws, compact enough to deploy to edge devices in under 100 bytes.
Core Philosophy
A Promising Partial Form is a mathematical expression that explains a meaningful portion of a signal while staying simple: high explanatory power, low complexity. Complex signals are modeled as layered sums:
y(t) ≈ f₁(t) + f₂(t) + ... + fₖ(t) + ε(t)
where each fᵢ(t) is a discovered form (oscillation, decay, trend, etc.) and ε(t) is a noise-like residual.
PPF repeatedly discovers a form, subtracts it, and analyzes the residual, revealing the layered mathematical structure hidden in the data. The approach applies to any 1D time-series or signal-analysis task, from real-time sensor monitoring to scientific data exploration to serving as an interpretable front end for downstream AI systems.
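The discover → subtract → analyze loop can be illustrated with a minimal NumPy sketch. It is not PPF's genetic-programming search: the two layer forms (a linear trend, then a single sinusoid picked from the FFT peak) are fixed by assumption, and `layered_decompose`, `fit_trend` and `fit_sinusoid` are hypothetical names.

```python
import numpy as np

def fit_trend(t, y):
    """Layer form assumed here: a linear trend c0 + c1*t."""
    c1, c0 = np.polyfit(t, y, 1)
    return c0 + c1 * t, {"slope": c1, "intercept": c0}

def fit_sinusoid(t, y):
    """Layer form assumed here: one sinusoid at the strongest FFT frequency."""
    dt = t[1] - t[0]
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), dt)
    omega = 2 * np.pi * freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    # A*sin + B*cos is linear in A and B, so ordinary least squares suffices
    X = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, {"omega": omega, "amplitude": np.hypot(*coef)}

def layered_decompose(t, y, fitters):
    """Discover a form, subtract it, and pass the residual to the next layer."""
    residual = np.asarray(y, dtype=float).copy()
    layers = []
    for fit in fitters:
        component, params = fit(t, residual)  # discover
        residual = residual - component        # subtract
        layers.append(params)
    return layers, residual                    # what is left should look like noise

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000, endpoint=False)
y = 1.0 + 0.5 * t + 2.0 * np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)

layers, residual = layered_decompose(t, y, [fit_trend, fit_sinusoid])
for params in layers:
    print({name: round(float(v), 3) for name, v in params.items()})
print("residual std:", round(float(residual.std()), 3))
```

The fitted slope comes out slightly below 0.5 because the trend layer absorbs a little of the sinusoid, which shows why the order of layers and the choice of forms matter.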
Why PPF?
| Traditional ML | PPF Approach |
|---|---|
| Train neural network | Discover formula |
| Deploy 10-100KB model | Deploy 50-byte expression |
| 1000s of MACs/inference | 10 FLOPs/evaluation |
| Black box | Interpretable |
| Requires TensorFlow Lite | Runs on any microcontroller |
Key insight: Many real-world signals (sensor data, biological rhythms, physical systems) follow simple mathematical forms. PPF attempts to discover those forms automatically.
Information-theoretic view: Where neural networks compress data into opaque weight matrices, PPF compresses data into explicit equations. This enables a layered architecture: Raw Signal → PPF (laws) → ML (meaning) → Decisions, where PPF converts high-entropy measurements into low-entropy symbolic representations.
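As a rough illustration of the compression view, the sketch below compares storing 200 raw float32 samples with storing the four float32 constants of a damped-sine formula. It assumes the constants have already been found (it reuses the generating values rather than running PPF), and the reconstruction is lossy: the noise is not stored.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0, 10, 200)
y = 2.5 * np.exp(-0.3 * t) * np.sin(4.0 * t + 0.5) + rng.normal(0, 0.1, t.size)

raw = y.astype(np.float32)                                  # 200 raw samples
params = np.array([2.5, 0.3, 4.0, 0.5], dtype=np.float32)   # a, k, omega, phi

# "Decompress" by evaluating the formula at the same time points
a, k, w, phi = params
y_hat = a * np.exp(-k * t) * np.sin(w * t + phi)
rms_error = np.sqrt(np.mean((y - y_hat) ** 2))

print(raw.nbytes, params.nbytes)    # 800 16
print(raw.nbytes / params.nbytes)   # 50.0
print(f"RMS reconstruction error: {rms_error:.3f}")  # near the 0.1 noise level
```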
Quick Start
```bash
pip install timeseries-formula-finder
```
```python
import numpy as np
from ppf import SymbolicRegressor, export_python, export_c

# Generate example data: a noisy damped sinusoid
t = np.linspace(0, 10, 200)
y = 2.5 * np.exp(-0.3 * t) * np.sin(4.0 * t + 0.5) + np.random.randn(200) * 0.1

# Discover the underlying formula
regressor = SymbolicRegressor(generations=30)
result = regressor.discover(t, y, verbose=True)
print(f"Discovered: {result.best_tradeoff.expression_string}")
print(f"R-squared: {result.best_tradeoff.r_squared:.4f}")
# Example output (exact values vary with the random noise and the search):
# Discovered: 2.498*exp(-0.301*x)*sin(4.002*x + 0.497)
# R-squared: 0.9847

# Export to standalone Python (no PPF dependency)
python_code = export_python(result.best_tradeoff.expression, fn_name="predict")
exec(python_code)     # defines predict() in the current module-level namespace
print(predict(1.5))   # works without importing ppf

# Export to C for embedded deployment
c_code = export_c(result.best_tradeoff.expression, use_float=True)
with open("model.c", "w") as f:
    f.write(c_code)
# Compile to an object file to link into your firmware: gcc -std=c99 -O2 -c model.c
# (When linking an executable, place -lm after the source files.)
```
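To sanity-check a reported fit independently of PPF, you can re-score the formula on fresh data. This sketch hand-transcribes the example output above into a NumPy function (an assumption, not code produced by `export_python`) and computes R² = 1 − SS_res / SS_tot on a new, seeded noisy sample from the same generating process.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def candidate(x):
    """Hand-transcribed from the Quick Start example output (assumed)."""
    return 2.498 * np.exp(-0.301 * x) * np.sin(4.002 * x + 0.497)

# Fresh data from the same generating process, seeded for repeatability
rng = np.random.default_rng(1)
t_new = np.linspace(0, 10, 200)
y_new = 2.5 * np.exp(-0.3 * t_new) * np.sin(4.0 * t_new + 0.5) + rng.normal(0, 0.1, 200)

print(f"held-out R-squared: {r_squared(y_new, candidate(t_new)):.4f}")
```

Because the noise is drawn again, the held-out value will differ slightly from the R² reported during discovery; a large drop would suggest overfitting.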
Features
Multi-Domain Discovery
PPF includes domain-specific "vocabularies" that guide the search:
```python
from ppf import SymbolicRegressor, DiscoveryMode

regressor = SymbolicRegressor()
# x, y: 1-D NumPy arrays of equal length (e.g. t and y from the Quick Start)

# Let PPF auto-detect the best domain
result = regressor.discover(x, y, mode=DiscoveryMode.AUTO)

# Or specify a domain if you know it
result = regressor.discover(x, y, mode=DiscoveryMode.OSCILLATOR)  # Vibrations, waves
result = regressor.discover(x, y, mode=DiscoveryMode.CIRCUIT)     # RC charging, decay
result = regressor.discover(x, y, mode=DiscoveryMode.GROWTH)      # Sigmoids, saturation
result = regressor.discover(x, y, mode=DiscoveryMode.RATIONAL)    # Ratios, feedback
result = regressor.discover(x, y, mode=DiscoveryMode.UNIVERSAL)   # Power laws, Gaussians
```
Macro Templates
PPF includes "macros" - pre-composed functional forms that capture common physics:
| Macro | Formula | Use Case |
|---|---|---|
| DAMPED_SIN | a·exp(-k·t)·sin(ω·t + φ) | Vibration decay |
| RC_CHARGE | a·(1 - exp(-k·t)) + c | Capacitor charging |
| GAUSSIAN | a·exp(-((x-μ)/σ)²) | Peaks, distributions |
| SIGMOID | a / (1 + exp(-k·(x-x₀))) | Saturation curves |
| POWER_LAW | a·x^b + c | Scaling phenomena |
| HILL | a·x^n / (k^n + x^n) | Enzyme kinetics |
These macros let PPF find complex forms (multiplicative compositions across function families) that pure genetic programming (GP) struggles with.
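To see why a template helps, here is one standard way to fit the DAMPED_SIN form directly: for fixed k and ω the model is linear in A = a·cos φ and B = a·sin φ, so a coarse grid over (k, ω) plus one linear least-squares solve per grid point recovers all four parameters. This separable least-squares approach is an illustration only, not necessarily how PPF fits its macros; `fit_damped_sin` and the grid ranges are assumptions.

```python
import numpy as np

def fit_damped_sin(t, y, k_grid, omega_grid):
    """Fit a*exp(-k*t)*sin(omega*t + phi) by grid search over (k, omega).

    a*sin(omega*t + phi) = A*sin(omega*t) + B*cos(omega*t) with
    A = a*cos(phi), B = a*sin(phi), so each grid point is a linear solve.
    """
    best = None
    for k in k_grid:
        envelope = np.exp(-k * t)
        for omega in omega_grid:
            X = np.column_stack([envelope * np.sin(omega * t),
                                 envelope * np.cos(omega * t)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            sse = np.sum((y - X @ coef) ** 2)
            if best is None or sse < best[0]:
                best = (sse, k, omega, coef)
    _, k, omega, (A, B) = best
    return {"a": np.hypot(A, B), "k": k, "omega": omega, "phi": np.arctan2(B, A)}

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
y = 2.5 * np.exp(-0.3 * t) * np.sin(4.0 * t + 0.5) + rng.normal(0, 0.1, t.size)

params = fit_damped_sin(t, y,
                        k_grid=np.linspace(0, 1, 51),        # step 0.02
                        omega_grid=np.linspace(1, 8, 141))   # step 0.05
print({name: round(float(v), 3) for name, v in params.items()})
```

Only two of the four parameters need a search; the other two come from a closed-form solve, which is what makes a pre-composed template much easier to fit than assembling the same product from primitives.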
Export to Edge Devices
Discovered formulas export to production-ready code:
```python
from ppf import export_python, export_c, export_json

expr = result.best_tradeoff.expression  # the discovered expression tree

# Standalone Python evaluator
code = export_python(expr, fn_name="predict_temp", safe=True)
# Includes safety wrappers for division by zero, log(0), and exp overflow

# C99 for microcontrollers
code = export_c(expr, use_float=True, safe=True)
# Compiles on ESP32, STM32, Arduino, etc.

# JSON bundle for storage/transmission
bundle = export_json(result, variables=["t"])
# Send via MQTT, store in a database, keep as an audit trail
```
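The kind of guarding that `safe=True` describes can be sketched in plain Python. The generated code itself is not shown here, so the names and policies below (return 0.0 for a near-zero denominator, clamp log inputs to a small epsilon, cap the exponent at 700) are assumptions chosen for illustration.

```python
import math

EPS = 1e-12        # assumed threshold for "effectively zero"
EXP_LIMIT = 700.0  # math.exp overflows a double just above ~709.78

def safe_div(num, den):
    """Return 0.0 instead of raising when the denominator is ~0 (assumed policy)."""
    return num / den if abs(den) > EPS else 0.0

def safe_log(x):
    """Clamp non-positive inputs to EPS so log never raises (assumed policy)."""
    return math.log(max(x, EPS))

def safe_exp(x):
    """Cap the exponent so the result stays finite (assumed policy)."""
    return math.exp(min(x, EXP_LIMIT))

print(safe_div(1.0, 0.0))   # 0.0
print(safe_log(0.0))        # about -27.63, i.e. log(1e-12)
print(safe_exp(1000.0))     # about 1.01e+304, finite
```

Returning a bounded value keeps a deployed evaluator from crashing, at the cost of silently producing a value that may be meaningless; logging or flagging such cases is worth considering.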
Feature Extraction for Downstream ML
Use discovered forms as features for classification/regression:
```python
from ppf import extract_features, feature_vector

# Extract interpretable features
features = extract_features(result)
print(features["dominant_family"])  # "oscillation"
print(features["damping_k"])        # 0.301
print(features["omega"])            # 4.002

# Convert to ML-ready vector
vec, names = feature_vector(features, schema="ppf.features.v1.full")
# Feed to sklearn, XGBoost, etc.
```
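A fixed schema matters because downstream models expect every vector to have the same columns in the same order. This sketch shows the idea with a hypothetical four-field schema (the layout of `ppf.features.v1.full` is not reproduced here): absent numeric features become NaN, and keys outside the schema, such as the categorical `dominant_family`, are dropped.

```python
import math

# Hypothetical schema; not the actual "ppf.features.v1.full" layout
SCHEMA = ["damping_k", "omega", "amplitude", "r_squared"]

def to_vector(features, schema=SCHEMA, missing=math.nan):
    """Order numeric features by a fixed schema; absent keys become `missing`."""
    vec = [float(features.get(name, missing)) for name in schema]
    return vec, list(schema)

features = {"dominant_family": "oscillation", "damping_k": 0.301, "omega": 4.002}
vec, names = to_vector(features)
print(names)  # ['damping_k', 'omega', 'amplitude', 'r_squared']
print(vec)    # [0.301, 4.002, nan, nan]
```

NaN is a deliberate choice for missing values: gradient-boosting libraries such as XGBoost handle it natively, whereas most scikit-learn estimators need an imputation step first.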
Installation
From PyPI (recommended)
```bash
pip install timeseries-formula-finder
```
From source
```bash
git clone https://github.com/pcoz/timeseries-formula-finder.git
cd timeseries-formula-finder
pip install -e .
```
Optional dependencies
```bash
# For hybrid decomposition (EMD/SSA)
pip install "timeseries-formula-finder[hybrid]"
# For development (tests)
pip install "timeseries-formula-finder[dev]"
```
(The quotes stop shells such as zsh from treating the square brackets as a glob pattern.)
Documentation
| Document | Description |
|---|---|
| USER_GUIDE.md | Complete usage guide with examples |
| docs/PPF_Paper.md | Core PPF concepts and architecture |
| docs/PPF_Information_Theory_Paper.md | Information-theoretic foundations |
| COMPARISON.md | How PPF compares to PySR, gplearn, Eureqa, AI Feynman |
| USE_CASES.md | Edge AI, ECG analysis, predictive maintenance |
| TESTING.md | Test suite details and datasets used |
| docs/PPF_EXPORT_LAYER_TSD.md | Export layer technical specification |
| docs/IOT_SENSOR_ANALYSIS.md | Real IoT sensor case study |
| docs/ECG_ANALYSIS.md | ECG waveform analysis case study |
| DOCUMENTATION.md | Comprehensive API documentation |
Use Cases
Edge AI / IoT Sensors
Deploy a temperature-prediction model to an ESP32 in about 50 bytes instead of a TensorFlow Lite model:
```python
# Discover daily temperature cycle from sensor data
result = regressor.discover(hours, temp, mode=DiscoveryMode.OSCILLATOR)
# Found: T(t) = 36 + 5·cos(2π·t/24 - 2)  [R² = 0.58]

# Export to C for microcontroller
c_code = export_c(result.best_tradeoff.expression, use_float=True)
# 3 parameters, 10 FLOPs, runs at 100kHz on ESP32
```
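Reading the discovered formula directly answers practical questions. This sketch evaluates the example model T(t) = 36 + 5·cos(2π·t/24 − 2) by hand (in whatever units the sensor reports) and locates its daily peak and trough analytically: the cosine peaks when its argument is zero, i.e. at t = 24/π hours. With 2π/24 folded into one constant, each evaluation is a multiply, a subtract, a cosine, a multiply and an add.

```python
import math

def temp_model(t_hours):
    """Daily cycle from the example: T(t) = 36 + 5*cos(2*pi*t/24 - 2)."""
    return 36.0 + 5.0 * math.cos(2.0 * math.pi * t_hours / 24.0 - 2.0)

# Peak where 2*pi*t/24 - 2 = 0, i.e. t = 24/pi; trough half a period (12 h) later
t_peak = 24.0 / math.pi
print(f"peak at {t_peak:.2f} h, T = {temp_model(t_peak):.1f}")              # peak at 7.64 h, T = 41.0
print(f"trough at {t_peak + 12:.2f} h, T = {temp_model(t_peak + 12):.1f}")  # trough at 19.64 h, T = 31.0
```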
Biomedical Signal Analysis
Extract interpretable features from ECG waveforms:
```python
# Analyze T-wave morphology
result = regressor.discover(t_wave_time, t_wave_amplitude, mode=DiscoveryMode.UNIVERSAL)
# Found: Damped cosine (R² = 0.96) - indicates repolarization dynamics

# Use parameters as cardiac health features
features = extract_features(result)
# damping_k, omega → feed to arrhythmia classifier
```
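For intuition about what `damping_k` and `omega` measure, the classical logarithmic-decrement method recovers both from the peaks of a damped oscillation: successive maxima are one period T = 2π/ω apart, and their heights shrink by a factor exp(−k·T). This sketch runs on a noise-free synthetic wave; `damping_from_peaks` is a hypothetical helper, not part of PPF, and real ECG segments would need filtering first.

```python
import numpy as np

def damping_from_peaks(t, y):
    """Estimate (k, omega) of y ≈ A*exp(-k*t)*cos(omega*t + phi) from its local maxima."""
    i = np.where((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:]))[0] + 1  # interior maxima
    T = np.diff(t[i]).mean()                                     # mean period
    decrement = np.mean(np.log(y[i][:-1] / y[i][1:]))           # equals k*T
    return decrement / T, 2 * np.pi / T

t = np.linspace(0, 5, 50001)                  # dense, noise-free synthetic wave
y = 1.2 * np.exp(-0.5 * t) * np.cos(6.0 * t)
k, omega = damping_from_peaks(t, y)
print(f"damping_k ≈ {k:.3f}, omega ≈ {omega:.3f}")
```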
Predictive Maintenance
Classify machine health from vibration signatures:
```python
# Healthy machine: clean sinusoid
# Bearing fault: damped oscillations (impact response)
# Imbalance: dominant 1x RPM
# The discovered FORM is the diagnosis
result = regressor.discover(t, vibration, mode=DiscoveryMode.OSCILLATOR)
if "exp(-" in result.best_tradeoff.expression_string:
    print("Bearing fault detected - damped oscillation signature")
```
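The string check above depends on how the expression is printed. A sketch of the same three rules applied to fitted parameters instead might look like this; `diagnose`, the parameter keys, and both thresholds are illustrative assumptions that would need tuning on real machine data.

```python
import math

def diagnose(params, rpm, k_threshold=0.5, freq_tol=0.05):
    """Map fitted oscillation parameters to a coarse health label.

    params: dict with "omega" (rad/s) and optionally "damping_k" (1/s).
    Rules follow the list above; thresholds are illustrative assumptions.
    """
    shaft_omega = 2 * math.pi * rpm / 60.0   # 1x running speed in rad/s
    if params.get("damping_k", 0.0) > k_threshold:
        return "bearing fault (damped impact response)"
    if abs(params["omega"] - shaft_omega) <= freq_tol * shaft_omega:
        return "imbalance (dominant 1x RPM)"
    return "healthy"

rpm = 1800  # 30 Hz shaft speed, about 188.5 rad/s
print(diagnose({"omega": 188.4, "damping_k": 0.0}, rpm))   # imbalance (dominant 1x RPM)
print(diagnose({"omega": 900.0, "damping_k": 40.0}, rpm))  # bearing fault (damped impact response)
print(diagnose({"omega": 400.0, "damping_k": 0.0}, rpm))   # healthy
```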
Architecture
```
ppf/
├── symbolic_types.py      # Expression trees, macros, primitives
├── symbolic.py            # GP engine, symbolic regressor
├── symbolic_utils.py      # Printing, simplification
├── detector.py            # Fixed-form fitting (legacy)
├── residual_layer.py      # Multi-layer decomposition
├── hierarchical.py        # Windowed parameter evolution
├── hybrid.py              # EMD/SSA + PPF interpretation
├── export/
│   ├── python_export.py   # export_python()
│   ├── c_export.py        # export_c()
│   ├── json_export.py     # export_json()
│   └── load.py            # load_json()
└── features/
    ├── extract.py         # extract_features()
    └── vectorize.py       # feature_vector()
```
Benchmarks
Performance on standard symbolic regression benchmarks:
| Benchmark | PPF R² | Complexity | Notes |
|---|---|---|---|
| Kepler's 3rd Law | 0.9999 | 3 | T² = a³ |
| Damped Oscillator | 0.9847 | 8 | Macro template |
| Logistic Growth | 0.9923 | 5 | Sigmoid macro |
| Nguyen-1 (x³+x²+x) | 0.9998 | 7 | Polynomial mode |
See BENCHMARK_COMPARISON.md for detailed results.
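The Kepler benchmark is easy to reproduce by hand for a pure power law (the POWER_LAW macro with c = 0): T = c·a^b becomes a straight line in log-log space, so an ordinary linear fit recovers the exponent. The sketch uses standard values for the semi-major axis (AU) and orbital period (years) of six planets; it is a direct log-log regression, not PPF's search.

```python
import numpy as np

# Semi-major axis a (AU) and orbital period T (years): Mercury ... Saturn
a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])
T = np.array([0.241, 0.615, 1.000, 1.881, 11.86, 29.45])

# log T = log c + b * log a  ->  a degree-1 fit gives slope b and intercept log c
b, log_c = np.polyfit(np.log(a), np.log(T), 1)
print(f"T ≈ {np.exp(log_c):.3f} * a^{b:.3f}")  # exponent close to 1.5, i.e. T² ∝ a³
```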
License
MIT License - see LICENSE for details.
Copyright (c) 2026 Edward Chalk (fleetingswallow.com)
Citation
If you use PPF in research, please cite:
```bibtex
@software{timeseries-formula-finder,
  author = {Chalk, Edward},
  title  = {Timeseries Formula Finder: Discover Mathematical Forms in Time-Series Data},
  year   = {2026},
  url    = {https://github.com/pcoz/timeseries-formula-finder}
}
```
Contact
- Website: fleetingswallow.com
- Email: edward@fleetingswallow.com
Download files
File details
Details for the file timeseries_formula_finder-0.1.1.tar.gz.
File metadata
- Download URL: timeseries_formula_finder-0.1.1.tar.gz
- Upload date:
- Size: 147.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20ca7519badd50290ccd074134edbb7086bdd6fe1717de2f2d86c748bd11b1dc |
| MD5 | 4875a4a03d09d2764452c6536f044b61 |
| BLAKE2b-256 | 7c50804a48256f93080a835d05fc316c1f2d0c9f162c9294d4aa36bdd2edf5fb |
File details
Details for the file timeseries_formula_finder-0.1.1-py3-none-any.whl.
File metadata
- Download URL: timeseries_formula_finder-0.1.1-py3-none-any.whl
- Upload date:
- Size: 64.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19d361f708becef2dce4cb7ac644114e704ebad02a6169a37a377c4374e20931 |
| MD5 | fc55ebf0c823596b368a24425e3e2a97 |
| BLAKE2b-256 | 123a940d8605db658f578e0495521111c857531424cd88ef60201b255eb567d0 |