Skip to main content

Symbolic Form Discovery - Find interpretable mathematical formulas in data

Project description

PPF - Symbolic Form Discovery

PyPI version License: MIT Python 3.8+

Discover interpretable mathematical formulas from data, then deploy them anywhere.

PPF (Promising Partial Form) is a law-aware symbolic regression library that finds compact mathematical expressions describing your data. Unlike black-box neural networks, PPF produces human-readable formulas grounded in physical reality that can be deployed to edge devices in under 100 bytes.

Core Philosophy

A Promising Partial Form is a mathematical expression that explains a meaningful portion of a signal with high explanatory power and low complexity. Complex signals are modeled as layered sums:

y(t) ≈ f₁(t) + f₂(t) + ... + fₖ(t) + ε(t)

where each fᵢ(t) is a discovered form (oscillation, decay, trend, etc.) and ε(t) is noise-like residual.

PPF repeatedly discovers → subtracts → analyzes residuals to reveal the layered mathematical structure hidden in data. This approach applies to any 1D time-series or signal analysis task - from real-time sensor monitoring to scientific data exploration to serving as an interpretable front-end to downstream AI systems.

Why PPF?

Traditional ML PPF Approach
Train neural network Discover formula
Deploy 10-100KB model Deploy 50-byte expression
1000s of MACs/inference 10 FLOPs/evaluation
Black box Interpretable
Requires TensorFlow Lite Runs on any microcontroller

Key insight: Many real-world signals (sensor data, biological rhythms, physical systems) follow simple mathematical forms. PPF attempts to discover those forms automatically.

Information-theoretic view: Where neural networks compress data into opaque weight matrices, PPF compresses data into explicit equations. This enables a layered architecture: Raw Signal → PPF (laws) → ML (meaning) → Decisions, where PPF converts high-entropy measurements into low-entropy symbolic representations.

Quick Start

pip install timeseries-formula-finder
import numpy as np
from ppf import SymbolicRegressor, export_python, export_c

# Generate example data
t = np.linspace(0, 10, 200)
y = 2.5 * np.exp(-0.3 * t) * np.sin(4.0 * t + 0.5) + np.random.randn(200) * 0.1

# Discover the underlying formula
regressor = SymbolicRegressor(generations=30)
result = regressor.discover(t, y, verbose=True)

print(f"Discovered: {result.best_tradeoff.expression_string}")
print(f"R-squared:  {result.best_tradeoff.r_squared:.4f}")
# Output: Discovered: 2.498*exp(-0.301*x)*sin(4.002*x + 0.497)
# Output: R-squared:  0.9847

# Export to standalone Python (no PPF dependency)
python_code = export_python(result.best_tradeoff.expression, fn_name="predict")
exec(python_code)
print(predict(1.5))  # Works without importing ppf

# Export to C for embedded deployment
c_code = export_c(result.best_tradeoff.expression, use_float=True)
# Compile with: gcc -std=c99 -O2 -lm model.c

Command-Line Interface

PPF includes a powerful CLI for quick analysis without writing code:

# Discover formulas from CSV data
ppf discover data.csv -x time -y signal

# Verbose mode with specific discovery mode
ppf discover data.csv --mode oscillator -v -g 100

# Detect mathematical forms in windows
ppf detect sensor.csv --min-r-squared 0.8

# Extract forms iteratively until residuals are noise
ppf stack data.csv --entropy-method spectral

# Analyze with SSA decomposition (no extra dependencies)
ppf hybrid data.csv --method ssa

# Export discovered formula to Python code
ppf --json discover data.csv | ppf export python -f predict > model.py

# Export to C for embedded systems
ppf --json discover data.csv | ppf export c -f sensor_model --float > model.h

# Show available discovery modes
ppf info modes

Key Commands:

Command Purpose
discover Symbolic regression to find formulas
detect Detect forms in data windows
stack Extract forms layer-by-layer
hierarchy Find nested multi-timescale patterns
hybrid EMD/SSA decomposition + interpretation
export Export to Python/C/JSON
features Extract ML-ready features
info Show modes, forms, macros

Use ppf --help or ppf <command> --help for detailed documentation.

See docs/CLI.md for the complete CLI reference.

Features

Multi-Domain Discovery

PPF includes domain-specific "vocabularies" that guide the search:

from ppf import SymbolicRegressor, DiscoveryMode

regressor = SymbolicRegressor()

# Let PPF auto-detect the best domain
result = regressor.discover(x, y, mode=DiscoveryMode.AUTO)

# Or specify a domain if you know it
result = regressor.discover(x, y, mode=DiscoveryMode.OSCILLATOR)   # Vibrations, waves
result = regressor.discover(x, y, mode=DiscoveryMode.CIRCUIT)      # RC charging, decay
result = regressor.discover(x, y, mode=DiscoveryMode.GROWTH)       # Sigmoids, saturation
result = regressor.discover(x, y, mode=DiscoveryMode.RATIONAL)     # Ratios, feedback
result = regressor.discover(x, y, mode=DiscoveryMode.UNIVERSAL)    # Power laws, Gaussians

Macro Templates

PPF includes "macros" - pre-composed functional forms that capture common physics:

Macro Formula Use Case
DAMPED_SIN a·exp(-k·t)·sin(ω·t + φ) Vibration decay
RC_CHARGE a·(1 - exp(-k·t)) + c Capacitor charging
GAUSSIAN a·exp(-((x-μ)/σ)²) Peaks, distributions
SIGMOID a / (1 + exp(-k·(x-x₀))) Saturation curves
POWER_LAW a·x^b + c Scaling phenomena
HILL a·x^n / (k^n + x^n) Enzyme kinetics

These let PPF find complex forms (multiplicative compositions across function families) that pure GP struggles with.

Export to Edge Devices

Discovered formulas export to production-ready code:

from ppf import export_python, export_c, export_json

# Standalone Python evaluator
code = export_python(expr, fn_name="predict_temp", safe=True)
# Includes safety wrappers for div-by-zero, log(0), exp overflow

# C99 for microcontrollers
code = export_c(expr, use_float=True, safe=True)
# Compiles on ESP32, STM32, Arduino, etc.

# JSON bundle for storage/transmission
bundle = export_json(result, variables=["t"])
# Send via MQTT, store in database, audit trail

Feature Extraction for Downstream ML

Use discovered forms as features for classification/regression:

from ppf import extract_features, feature_vector

# Extract interpretable features
features = extract_features(result)
print(features["dominant_family"])  # "oscillation"
print(features["damping_k"])        # 0.301
print(features["omega"])            # 4.002

# Convert to ML-ready vector
vec, names = feature_vector(features, schema="ppf.features.v1.full")
# Feed to sklearn, XGBoost, etc.

Installation

From PyPI (recommended)

pip install timeseries-formula-finder

From source

git clone https://github.com/pcoz/timeseries-formula-finder.git
cd timeseries-formula-finder
pip install -e .

Optional dependencies

# For hybrid decomposition (EMD/SSA)
pip install timeseries-formula-finder[hybrid]

# For development (tests)
pip install timeseries-formula-finder[dev]

Documentation

Document Description
USER_GUIDE.md Complete usage guide with examples
docs/CLI.md Command-line interface reference
docs/PPF_Paper.md Core PPF concepts and architecture
docs/PPF_Information_Theory_Paper.md Information-theoretic foundations
COMPARISON.md How PPF compares to PySR, gplearn, Eureqa, AI Feynman
USE_CASES.md Edge AI, ECG analysis, predictive maintenance
TESTING.md Test suite details and datasets used
docs/PPF_EXPORT_LAYER_TSD.md Export layer technical specification
docs/IOT_SENSOR_ANALYSIS.md Real IoT sensor case study
docs/ECG_ANALYSIS.md ECG waveform analysis case study
DOCUMENTATION.md Comprehensive API documentation

Use Cases

Edge AI / IoT Sensors

Deploy temperature prediction models to ESP32 in 50 bytes instead of TensorFlow Lite:

# Discover daily temperature cycle from sensor data
result = regressor.discover(hours, temp, mode=DiscoveryMode.OSCILLATOR)
# Found: T(t) = 36 + 5·cos(2π·t/24 - 2)  [R² = 0.58]

# Export to C for microcontroller
c_code = export_c(result.best_tradeoff.expression, use_float=True)
# 3 parameters, 10 FLOPs, runs at 100kHz on ESP32

Biomedical Signal Analysis

Extract interpretable features from ECG waveforms:

# Analyze T-wave morphology
result = regressor.discover(t_wave_time, t_wave_amplitude, mode=DiscoveryMode.UNIVERSAL)
# Found: Damped cosine (R² = 0.96) - indicates repolarization dynamics

# Use parameters as cardiac health features
features = extract_features(result)
# damping_k, omega → feed to arrhythmia classifier

Predictive Maintenance

Classify machine health from vibration signatures:

# Healthy machine: clean sinusoid
# Bearing fault: damped oscillations (impact response)
# Imbalance: dominant 1x RPM

# The discovered FORM is the diagnosis
result = regressor.discover(t, vibration, mode=DiscoveryMode.OSCILLATOR)
if "exp(-" in result.best_tradeoff.expression_string:
    print("Bearing fault detected - damped oscillation signature")

Architecture

ppf/
├── __main__.py           # Entry point for python -m ppf
├── cli/                  # Command-line interface
│   ├── main.py           # Main dispatcher, argparse setup
│   ├── utils.py          # Data loading, output formatting
│   └── commands/         # Individual command modules
│       ├── discover.py   # ppf discover
│       ├── detect.py     # ppf detect
│       ├── stack.py      # ppf stack
│       ├── hierarchy.py  # ppf hierarchy
│       ├── hybrid.py     # ppf hybrid
│       ├── export.py     # ppf export (python/c/json)
│       ├── features.py   # ppf features
│       └── info.py       # ppf info
├── symbolic_types.py     # Expression trees, macros, primitives
├── symbolic.py           # GP engine, symbolic regressor
├── symbolic_utils.py     # Printing, simplification
├── detector.py           # Fixed-form fitting (legacy)
├── residual_layer.py     # Multi-layer decomposition
├── hierarchical.py       # Windowed parameter evolution
├── hybrid.py             # EMD/SSA + PPF interpretation
├── export/
│   ├── python_export.py  # export_python()
│   ├── c_export.py       # export_c()
│   ├── json_export.py    # export_json()
│   └── load.py           # load_json()
└── features/
    ├── extract.py        # extract_features()
    └── vectorize.py      # feature_vector()

Benchmarks

Performance on standard symbolic regression benchmarks:

Benchmark PPF R² Complexity Notes
Kepler's 3rd Law 0.9999 3 T² = a³
Damped Oscillator 0.9847 8 Macro template
Logistic Growth 0.9923 5 Sigmoid macro
Nguyen-1 (x³+x²+x) 0.9998 7 Polynomial mode

See BENCHMARK_COMPARISON.md for detailed results.

License

MIT License - see LICENSE for details.

Copyright (c) 2026 Edward Chalk (fleetingswallow.com)

Citation

If you use PPF in research, please cite:

@software{timeseries-formula-finder,
  author = {Chalk, Edward},
  title = {Timeseries Formula Finder: Discover Mathematical Forms in Time-Series Data},
  year = {2026},
  url = {https://github.com/pcoz/timeseries-formula-finder}
}

Contact

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

timeseries_formula_finder-0.2.0.tar.gz (173.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

timeseries_formula_finder-0.2.0-py3-none-any.whl (98.2 kB view details)

Uploaded Python 3

File details

Details for the file timeseries_formula_finder-0.2.0.tar.gz.

File metadata

File hashes

Hashes for timeseries_formula_finder-0.2.0.tar.gz
Algorithm Hash digest
SHA256 fc163ed062825616cd894da1a11dd11cfcd1588092b54dcb38a416291c2032dc
MD5 419e49b3872e5ab903deef8bd07630f2
BLAKE2b-256 a4ffd3df8df9faa75466fe33fd937a97366f1b8a1cf64a0a86845d194195ba46

See more details on using hashes here.

File details

Details for the file timeseries_formula_finder-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for timeseries_formula_finder-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9b43e1d07d659214e71945683ec6bfcefc6807daf20cf706378988d4a114000c
MD5 9ad1a79c67692e8a67bb1b1a83f22e77
BLAKE2b-256 2b0b36d4f1b770e35917a735c65e604a47e6d5ae0deb4eea08e4e675eda2cd42

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page