Symbolic Form Discovery - Find interpretable mathematical formulas in data

PPF - Symbolic Form Discovery

Discover interpretable mathematical formulas from data, then deploy them anywhere.

PPF (Promising Partial Form) is a law-aware symbolic regression library that finds compact mathematical expressions describing your data. Unlike black-box neural networks, PPF produces human-readable formulas grounded in physical reality that can be deployed to edge devices in under 100 bytes.

Core Philosophy

A Promising Partial Form is a mathematical expression that explains a meaningful portion of a signal with high explanatory power and low complexity. Complex signals are modeled as layered sums:

y(t) ≈ f₁(t) + f₂(t) + ... + fₖ(t) + ε(t)

where each fᵢ(t) is a discovered form (oscillation, decay, trend, etc.) and ε(t) is noise-like residual.

PPF repeatedly discovers → subtracts → analyzes residuals to reveal the layered mathematical structure hidden in data. This approach applies to any 1D time-series or signal analysis task - from real-time sensor monitoring to scientific data exploration to serving as an interpretable front-end to downstream AI systems.
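
The discover, subtract, analyze loop can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not PPF's implementation: the two fitters (`fit_trend`, `fit_sinusoid`) and the fixed order in which they are tried are hypothetical stand-ins for PPF's search.

```python
import numpy as np

def fit_trend(t, y):
    """Least-squares straight line; returns (fitted values, (slope, intercept))."""
    slope, intercept = np.polyfit(t, y, 1)
    return slope * t + intercept, (slope, intercept)

def fit_sinusoid(t, y):
    """Pick the dominant FFT frequency, then solve for the two amplitudes linearly."""
    dt = t[1] - t[0]
    freqs = np.fft.rfftfreq(len(t), d=dt)
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    f0 = freqs[1:][np.argmax(spectrum[1:])]  # skip the DC bin
    # y ~ A*sin(2*pi*f0*t) + B*cos(2*pi*f0*t) is linear in A and B
    design = np.column_stack([np.sin(2 * np.pi * f0 * t),
                              np.cos(2 * np.pi * f0 * t)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coeffs, (f0, coeffs[0], coeffs[1])

# Synthetic signal: linear trend + 1 Hz oscillation + noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500, endpoint=False)
y = 0.5 * t + 2.0 * np.sin(2 * np.pi * 1.0 * t) + rng.normal(0.0, 0.1, t.size)

residual = y.copy()
layers = []
for name, fitter in [("trend", fit_trend), ("oscillation", fit_sinusoid)]:
    fitted, params = fitter(t, residual)   # discover a form
    layers.append((name, params))
    residual = residual - fitted           # subtract it
    print(name, "-> residual std:", round(float(residual.std()), 3))  # analyze
```

Each pass leaves a smaller residual. Because the trend is fitted before the oscillation is removed, its slope comes out slightly biased, which is one reason a real system would refine or jointly refit the layers.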

Why PPF?

Traditional ML	PPF Approach
Train neural network	Discover formula
Deploy 10-100KB model	Deploy 50-byte expression
1000s of MACs/inference	10 FLOPs/evaluation
Black box	Interpretable
Requires TensorFlow Lite	Runs on any microcontroller

Key insight: Many real-world signals (sensor data, biological rhythms, physical systems) follow simple mathematical forms. PPF attempts to discover those forms automatically.

Information-theoretic view: Where neural networks compress data into opaque weight matrices, PPF compresses data into explicit equations. This enables a layered architecture: Raw Signal → PPF (laws) → ML (meaning) → Decisions, where PPF converts high-entropy measurements into low-entropy symbolic representations.
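
To make the compression claim concrete, here is a rough size comparison. The byte layout is a hypothetical encoding chosen for illustration (one byte naming the form, then float32 parameters); it is not PPF's export format. The parameter values are the ones shown in the Quick Start output.

```python
import struct

# Hypothetical wire format (not PPF's): one byte identifying the form,
# followed by its parameters as little-endian float32 values.
FORM_DAMPED_SIN = 1
params = (2.498, 0.301, 4.002, 0.497)  # a, k, omega, phi

payload = struct.pack("<B4f", FORM_DAMPED_SIN, *params)
raw_samples = 200 * 4  # 200 samples stored as float32

print(len(payload))   # 17
print(raw_samples)    # 800
```

Under these assumptions the symbolic description is 17 bytes against 800 bytes of raw samples, and it can also be evaluated at times that were never measured.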

Quick Start

pip install timeseries-formula-finder

import numpy as np
from ppf import SymbolicRegressor, export_python, export_c

# Generate example data
t = np.linspace(0, 10, 200)
y = 2.5 * np.exp(-0.3 * t) * np.sin(4.0 * t + 0.5) + np.random.randn(200) * 0.1

# Discover the underlying formula
regressor = SymbolicRegressor(generations=30)
result = regressor.discover(t, y, verbose=True)

print(f"Discovered: {result.best_tradeoff.expression_string}")
print(f"R-squared:  {result.best_tradeoff.r_squared:.4f}")
# Example output (exact values vary with the random noise and the search):
#   Discovered: 2.498*exp(-0.301*x)*sin(4.002*x + 0.497)
#   R-squared:  0.9847

# Export to standalone Python (no PPF dependency)
python_code = export_python(result.best_tradeoff.expression, fn_name="predict")
exec(python_code)
print(predict(1.5))  # Works without importing ppf

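# Export to C for embedded deployment
c_code = export_c(result.best_tradeoff.expression, use_float=True)
with open("model.c", "w") as f:
    f.write(c_code)
# Compile with: gcc -std=c99 -O2 model.c -lm
# (-lm goes after the source file so the linker resolves the math functions)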

Features

Multi-Domain Discovery

PPF includes domain-specific "vocabularies" that guide the search:

from ppf import SymbolicRegressor, DiscoveryMode

regressor = SymbolicRegressor()  # x and y below are 1-D NumPy arrays

# Let PPF auto-detect the best domain
result = regressor.discover(x, y, mode=DiscoveryMode.AUTO)

# Or specify a domain if you know it
result = regressor.discover(x, y, mode=DiscoveryMode.OSCILLATOR)   # Vibrations, waves
result = regressor.discover(x, y, mode=DiscoveryMode.CIRCUIT)      # RC charging, decay
result = regressor.discover(x, y, mode=DiscoveryMode.GROWTH)       # Sigmoids, saturation
result = regressor.discover(x, y, mode=DiscoveryMode.RATIONAL)     # Ratios, feedback
result = regressor.discover(x, y, mode=DiscoveryMode.UNIVERSAL)    # Power laws, Gaussians

Macro Templates

PPF includes "macros" - pre-composed functional forms that capture common physics:

Macro	Formula	Use Case
`DAMPED_SIN`	`a·exp(-k·t)·sin(ω·t + φ)`	Vibration decay
`RC_CHARGE`	`a·(1 - exp(-k·t)) + c`	Capacitor charging
`GAUSSIAN`	`a·exp(-((x-μ)/σ)²)`	Peaks, distributions
`SIGMOID`	`a / (1 + exp(-k·(x-x₀)))`	Saturation curves
`POWER_LAW`	`a·x^b + c`	Scaling phenomena
`HILL`	`a·x^n / (k^n + x^n)`	Enzyme kinetics

These let PPF find complex forms (multiplicative compositions across function families) that pure genetic programming (GP) struggles to assemble from primitives alone.
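
For reference, the formulas in the table translate directly into NumPy. The function names and argument order below are illustrative choices, not PPF's internal macro definitions.

```python
import numpy as np

def damped_sin(t, a, k, omega, phi):
    return a * np.exp(-k * t) * np.sin(omega * t + phi)

def rc_charge(t, a, k, c):
    return a * (1.0 - np.exp(-k * t)) + c

def gaussian(x, a, mu, sigma):
    return a * np.exp(-(((x - mu) / sigma) ** 2))

def sigmoid(x, a, k, x0):
    return a / (1.0 + np.exp(-k * (x - x0)))

def power_law(x, a, b, c):
    return a * np.power(x, b) + c

def hill(x, a, k, n):
    return a * x**n / (k**n + x**n)

# Sanity checks at characteristic points
print(sigmoid(3.0, a=2.0, k=1.5, x0=3.0))   # 1.0  (half of a at x = x0)
print(hill(2.0, a=4.0, k=2.0, n=3.0))       # 2.0  (half of a at x = k)
print(rc_charge(0.0, a=5.0, k=0.7, c=1.0))  # 1.0  (starts at the offset c)
```

Note that `damped_sin` is a product of an exponential and a sinusoid, which is the kind of cross-family composition the macros are meant to provide in a single step.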

Export to Edge Devices

Discovered formulas export to production-ready code:

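from ppf import export_python, export_c, export_json

# `result` comes from regressor.discover(...), as in the Quick Start
expr = result.best_tradeoff.expression

# Standalone Python evaluator
code = export_python(expr, fn_name="predict_temp", safe=True)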
# Includes safety wrappers for div-by-zero, log(0), exp overflow

# C99 for microcontrollers
code = export_c(expr, use_float=True, safe=True)
# Compiles on ESP32, STM32, Arduino, etc.

# JSON bundle for storage/transmission
bundle = export_json(result, variables=["t"])
# Send via MQTT, store in database, audit trail
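
To illustrate what "safety wrappers" can mean in a standalone evaluator, here is a hand-written sketch. It is not the code that `export_python` generates; the helper names, the guard thresholds, and the choice to clamp rather than raise are all assumptions.

```python
import math

_EPS = 1e-12       # assumed guard threshold for near-zero values
_EXP_MAX = 700.0   # math.exp overflows float64 a little above 709

def safe_div(num, den):
    """Divide, replacing a near-zero denominator with a tiny signed value."""
    if abs(den) < _EPS:
        den = math.copysign(_EPS, den)
    return num / den

def safe_log(x):
    """Log with the argument clamped away from zero and negatives."""
    return math.log(max(x, _EPS))

def safe_exp(x):
    """Exp with the argument clamped to avoid OverflowError."""
    return math.exp(min(x, _EXP_MAX))

def predict(t):
    # Damped sinusoid using the Quick Start's example parameters
    return 2.498 * safe_exp(-0.301 * t) * math.sin(4.002 * t + 0.497)

print(math.isfinite(safe_div(1.0, 0.0)))  # True
print(math.isfinite(safe_exp(1e6)))       # True
```

Clamping keeps the evaluator total (it always returns a finite number), at the cost of silently masking out-of-range inputs; whether that trade-off is acceptable depends on the deployment.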

Feature Extraction for Downstream ML

Use discovered forms as features for classification/regression:

from ppf import extract_features, feature_vector

# Extract interpretable features
features = extract_features(result)
print(features["dominant_family"])  # "oscillation"
print(features["damping_k"])        # 0.301
print(features["omega"])            # 4.002

# Convert to ML-ready vector
vec, names = feature_vector(features, schema="ppf.features.v1.full")
# Feed to sklearn, XGBoost, etc.
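
The idea behind a feature schema can be sketched without PPF: a fixed, ordered list of names turns a dictionary of discovered parameters into a vector whose columns mean the same thing for every signal. The schema contents, the family labels, and the `to_vector` helper below are hypothetical and do not describe the actual `ppf.features.v1.full` schema.

```python
import numpy as np

# Hypothetical schema: ordered numeric features plus a one-hot family code
SCHEMA = ["amplitude", "damping_k", "omega", "phase"]
FAMILIES = ["oscillation", "decay", "trend"]

def to_vector(features, schema=SCHEMA, families=FAMILIES):
    """Map a feature dict onto a fixed-length vector; missing values become 0.0."""
    numeric = [float(features.get(name, 0.0)) for name in schema]
    one_hot = [1.0 if features.get("dominant_family") == fam else 0.0
               for fam in families]
    names = list(schema) + [f"family={fam}" for fam in families]
    return np.array(numeric + one_hot), names

features = {"dominant_family": "oscillation", "damping_k": 0.301, "omega": 4.002}
vec, names = to_vector(features)
print(dict(zip(names, vec)))
```

Filling missing entries with 0.0 is a simplification; a downstream model may be better served by an explicit missing-value indicator.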

Installation

From PyPI (recommended)

pip install timeseries-formula-finder

From source

git clone https://github.com/pcoz/timeseries-formula-finder.git
cd timeseries-formula-finder
pip install -e .

Optional dependencies

# For hybrid decomposition (EMD/SSA)
pip install timeseries-formula-finder[hybrid]

# For development (tests)
pip install timeseries-formula-finder[dev]

Documentation

Document	Description
USER_GUIDE.md	Complete usage guide with examples
docs/PPF_Paper.md	Core PPF concepts and architecture
docs/PPF_Information_Theory_Paper.md	Information-theoretic foundations
COMPARISON.md	How PPF compares to PySR, gplearn, Eureqa, AI Feynman
USE_CASES.md	Edge AI, ECG analysis, predictive maintenance
TESTING.md	Test suite details and datasets used
docs/PPF_EXPORT_LAYER_TSD.md	Export layer technical specification
docs/IOT_SENSOR_ANALYSIS.md	Real IoT sensor case study
docs/ECG_ANALYSIS.md	ECG waveform analysis case study
DOCUMENTATION.md	Comprehensive API documentation

Use Cases

Edge AI / IoT Sensors

Deploy a temperature-prediction model to an ESP32 as a roughly 50-byte expression instead of a TensorFlow Lite model:

# Discover daily temperature cycle from sensor data
result = regressor.discover(hours, temp, mode=DiscoveryMode.OSCILLATOR)
# Found: T(t) = 36 + 5·cos(2π·t/24 - 2)  [R² = 0.58]

# Export to C for microcontroller
c_code = export_c(result.best_tradeoff.expression, use_float=True)
# 3 parameters, 10 FLOPs, runs at 100kHz on ESP32
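
The discovered expression is small enough to evaluate and reason about by hand. The sketch below hard-codes the formula quoted above (it does not call PPF) and reads off where the daily cycle peaks: the cosine is largest when its argument is zero, so 2π·t/24 - 2 = 0 gives t = 24/π.

```python
import math

def temperature(hour):
    """T(t) = 36 + 5*cos(2*pi*t/24 - 2), with t in hours."""
    return 36.0 + 5.0 * math.cos(2.0 * math.pi * hour / 24.0 - 2.0)

peak_hour = 24.0 / math.pi       # argument of the cosine is zero here
trough_hour = peak_hour + 12.0   # half a period later

print(round(peak_hour, 2))                 # 7.64
print(round(temperature(peak_hour), 2))    # 41.0
print(round(temperature(trough_hour), 2))  # 31.0
```

The three fitted numbers map directly onto physical quantities: a mean of 36, a swing of ±5 around it, and a phase that places the peak about 7.64 hours into the cycle.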

Biomedical Signal Analysis

Extract interpretable features from ECG waveforms:

# Analyze T-wave morphology
result = regressor.discover(t_wave_time, t_wave_amplitude, mode=DiscoveryMode.UNIVERSAL)
# Found: Damped cosine (R² = 0.96) - indicates repolarization dynamics

# Use parameters as cardiac health features
features = extract_features(result)
# damping_k, omega → feed to arrhythmia classifier

Predictive Maintenance

Classify machine health from vibration signatures:

# Healthy machine: clean sinusoid
# Bearing fault: damped oscillations (impact response)
# Imbalance: dominant 1x RPM

# The discovered FORM is the diagnosis
result = regressor.discover(t, vibration, mode=DiscoveryMode.OSCILLATOR)
if "exp(-" in result.best_tradeoff.expression_string:
    print("Bearing fault detected - damped oscillation signature")
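
A plain substring test fires on any decaying exponential, however weak. One possible refinement is to pull the decay rate out of the expression string and compare it with a threshold. The sketch below assumes the string format shown in the Quick Start output (`exp(-0.301*x)`); the regular expression, the helper names, and the `min_rate` value are illustrative, not calibrated or part of PPF.

```python
import re

# Matches exp(-<number>* ... and captures the number
_DECAY = re.compile(r"exp\(\s*-\s*(\d+(?:\.\d+)?)\s*\*")

def damping_rates(expression_string):
    """Return every decay rate found in terms of the form exp(-k*...)."""
    return [float(m) for m in _DECAY.findall(expression_string)]

def looks_like_bearing_fault(expression_string, min_rate=0.05):
    """min_rate is an illustrative threshold, not a calibrated value."""
    return any(rate >= min_rate for rate in damping_rates(expression_string))

print(damping_rates("2.498*exp(-0.301*x)*sin(4.002*x + 0.497)"))  # [0.301]
print(looks_like_bearing_fault("1.2*sin(3.0*x)"))                 # False
```

Where structured features are available, reading the damping parameter from `extract_features(result)` would avoid string parsing altogether.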

Architecture

ppf/
├── symbolic_types.py     # Expression trees, macros, primitives
├── symbolic.py           # GP engine, symbolic regressor
├── symbolic_utils.py     # Printing, simplification
├── detector.py           # Fixed-form fitting (legacy)
├── residual_layer.py     # Multi-layer decomposition
├── hierarchical.py       # Windowed parameter evolution
├── hybrid.py             # EMD/SSA + PPF interpretation
├── export/
│   ├── python_export.py  # export_python()
│   ├── c_export.py       # export_c()
│   ├── json_export.py    # export_json()
│   └── load.py           # load_json()
└── features/
    ├── extract.py        # extract_features()
    └── vectorize.py      # feature_vector()

Benchmarks

Performance on standard symbolic regression benchmarks:

Benchmark	PPF R²	Complexity	Notes
Kepler's 3rd Law	0.9999	3	`T² = a³`
Damped Oscillator	0.9847	8	Macro template
Logistic Growth	0.9923	5	Sigmoid macro
Nguyen-1 (x³+x²+x)	0.9998	7	Polynomial mode

See BENCHMARK_COMPARISON.md for detailed results.
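
As a quick plausibility check on the Kepler row, the form `T² = a³` can be tested against rounded textbook values for six planets (semi-major axis in AU, period in years). This is an independent NumPy check of the formula, not a reproduction of PPF's benchmark run or its dataset.

```python
import numpy as np

# Mercury, Venus, Earth, Mars, Jupiter, Saturn (rounded textbook values)
a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])    # AU
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])  # years

# A power law T = c * a^b is a straight line in log-log space
slope, intercept = np.polyfit(np.log(a), np.log(T), 1)

# R-squared of the parameter-free form T = a^(3/2)
pred = a ** 1.5
ss_res = np.sum((T - pred) ** 2)
ss_tot = np.sum((T - T.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(round(float(slope), 2))  # 1.5
print(r2 > 0.9999)             # True
```

The fitted exponent of about 1.5 is the same statement as `T² = a³`, since squaring both sides of T = a^(3/2) recovers it.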

License

MIT License - see LICENSE for details.

Citation

If you use PPF in research, please cite:

@software{timeseries-formula-finder,
  author = {Chalk, Edward},
  title = {Timeseries Formula Finder: Discover Mathematical Forms in Time-Series Data},
  year = {2026},
  url = {https://github.com/pcoz/timeseries-formula-finder}
}

Contact

Website: fleetingswallow.com
Email: edward@fleetingswallow.com

Release history

0.2.0 - Jan 18, 2026
0.1.1 - Jan 13, 2026 (this version)
0.1.0 - Jan 13, 2026

Download files

Source Distribution

timeseries_formula_finder-0.1.1.tar.gz (147.5 kB)

Uploaded Jan 13, 2026 Source

Built Distribution

timeseries_formula_finder-0.1.1-py3-none-any.whl (64.9 kB)

Uploaded Jan 13, 2026 Python 3

File details

Details for the file timeseries_formula_finder-0.1.1.tar.gz.

File metadata

Download URL: timeseries_formula_finder-0.1.1.tar.gz
Upload date: Jan 13, 2026
Size: 147.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for timeseries_formula_finder-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`20ca7519badd50290ccd074134edbb7086bdd6fe1717de2f2d86c748bd11b1dc`
MD5	`4875a4a03d09d2764452c6536f044b61`
BLAKE2b-256	`7c50804a48256f93080a835d05fc316c1f2d0c9f162c9294d4aa36bdd2edf5fb`

File details

Details for the file timeseries_formula_finder-0.1.1-py3-none-any.whl.

File metadata

Download URL: timeseries_formula_finder-0.1.1-py3-none-any.whl
Upload date: Jan 13, 2026
Size: 64.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for timeseries_formula_finder-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`19d361f708becef2dce4cb7ac644114e704ebad02a6169a37a377c4374e20931`
MD5	`fc55ebf0c823596b368a24425e3e2a97`
BLAKE2b-256	`123a940d8605db658f578e0495521111c857531424cd88ef60201b255eb567d0`
