
Automatic parametric modeling with symbolic regression



An API to automate parametric modeling with symbolic regression, originally developed for data analysis in the experimental high-energy physics community, but also applicable beyond.

SymbolFit takes binned data, optionally with measurement/systematic uncertainties, as input. It uses PySR to run a machine search for batches of functional forms that model the data, parameterizes these functions, and then uses LMFIT to re-optimize them and provide uncertainty estimates, all in one go. It is designed to maximize automation with minimal human input. Each run produces a batch of functions with uncertainty estimates, which are evaluated, saved, and plotted automatically into readable output files, ready for downstream tasks.

In short, symbolfit = pysr (symbolic regression to generate functional forms) + lmfit (re-optimization & uncertainty modeling) + auto-evaluation tools (parameter correlation, uncertainty variation and coverage, statistical tests, etc.).

Installation

Installation via PyPI

With python>=3.10 and pip:

pip install symbolfit

Installation via conda

(The conda package has not yet been updated to v0.2.0 or later; please use pip for now.)

conda create --name symbolfit_env python=3.10
conda activate symbolfit_env
conda install -c conda-forge symbolfit

Julia (backend for PySR) will be automatically installed at first import of PySR:

import pysr

Getting Started

To run an example fit, get the example datasets by cloning this repo:

git clone https://github.com/hftsoi/symbolfit.git
cd symbolfit

Then, within a Python session (or simply run python fit_example.py):

import importlib

from symbolfit.symbolfit import *

# Load the example dataset and the PySR configuration shipped with the repo.
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_gauss').pysr_config

# Configure the fit; y_up and y_down carry the uncertainties on y.
model = SymbolFit(
    x = dataset.x,
    y = dataset.y,
    y_up = dataset.y_up,
    y_down = dataset.y_down,
    pysr_config = pysr_config,
    max_complexity = 60,
    input_rescale = True,
    scale_y_by = 'mean',
    max_stderr = 20,
    fit_y_unc = True,
    random_seed = None,
    loss_weights = None
)

model.fit()

After the fit, save results to csv files:

model.save_to_csv(output_dir = 'output_dir/')

and plot results to pdf files:

model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = dataset.bin_widths_1d,
    plot_logy = False,
    plot_logx = False,
    sampling_95quantile = False,
    #bin_edges_2d = dataset.bin_edges_2d,
    #plot_logx0 = False,
    #plot_logx1 = False,
    #cbar_min = None,
    #cbar_max = None,
    #cmap = None,
    #contour = None,
    # ^ additional options for 2D plotting
)

Candidate functions, with all parameter values substituted in, can be printed directly:

model.print_candidate(candidate_number = 20)

When preparing for your own data, a graphical illustration of the input data format can be found here.
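
As a rough illustration of preparing your own input, the sketch below builds the attributes used in the example above (x, y, y_up, y_down, bin_widths_1d) from histogram bin edges and counts. It rests on several assumptions that are not confirmed by this page: flat Python lists, bin centers as x, and symmetric Poisson uncertainties of sqrt(N). The helper name make_binned_inputs is hypothetical and not part of SymbolFit; check the linked illustration for the exact shapes the package expects.

```python
import math


def make_binned_inputs(bin_edges, counts):
    """Build 1D binned inputs from histogram edges and counts.

    Hypothetical helper for illustration; not part of SymbolFit.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than counts")

    lows, highs = bin_edges[:-1], bin_edges[1:]
    x = [0.5 * (lo + hi) for lo, hi in zip(lows, highs)]       # bin centers
    bin_widths_1d = [hi - lo for lo, hi in zip(lows, highs)]   # bin widths
    y = [float(n) for n in counts]

    # Assumption: symmetric Poisson uncertainties, sqrt(N) per bin.
    y_up = [math.sqrt(n) for n in counts]
    y_down = list(y_up)

    return {
        "x": x,
        "y": y,
        "y_up": y_up,
        "y_down": y_down,
        "bin_widths_1d": bin_widths_1d,
    }


inputs = make_binned_inputs([0.0, 1.0, 2.0, 4.0], [16, 9, 4])
```

The resulting values could then be passed on as x, y, y_up, and y_down in the SymbolFit call, and bin_widths_1d in the plotting call, after converting them to whatever array shape the package requires.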

Each fit will produce a batch of candidate functions and will automatically save all results to six output files:

  1. candidates.csv: saves all candidate functions and evaluations in a csv table.
  2. candidates_compact.csv: saves a reduced version for essential information without intermediate results.
  3. candidates.pdf: plots all candidate functions (1D/2D only for now) with associated uncertainties one by one for fit quality evaluation.
  4. candidates_sampling.pdf: plots all candidate functions (1D only for now) with total uncertainty coverage generated by sampling parameters.
  5. candidates_gof.pdf: plots the goodness-of-fit scores.
  6. candidates_correlation.pdf: plots the correlation matrices for the parameters of the candidate functions.
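
To give a feel for what a goodness-of-fit score of the kind plotted in candidates_gof.pdf can look like, here is a minimal sketch of a chi-square per degree of freedom for binned data with asymmetric uncertainties. This is an illustration under stated assumptions, not SymbolFit's actual implementation: the function name chi2_per_ndf is hypothetical, and the rule for choosing between y_up and y_down (upper uncertainty when the model lies above the data, lower otherwise) is one common convention that may differ from what the package does.

```python
def chi2_per_ndf(y, y_model, y_up, y_down, n_params):
    """Chi-square per degree of freedom for binned data.

    Hypothetical helper for illustration; not part of SymbolFit.
    """
    if not (len(y) == len(y_model) == len(y_up) == len(y_down)):
        raise ValueError("all inputs must have the same length")

    ndf = len(y) - n_params
    if ndf <= 0:
        raise ValueError("need more bins than fitted parameters")

    chi2 = 0.0
    for obs, pred, up, down in zip(y, y_model, y_up, y_down):
        # Assumption: use the upper uncertainty when the model is above the
        # data point, and the lower uncertainty otherwise.
        sigma = up if pred > obs else down
        if sigma <= 0:
            raise ValueError("uncertainties must be positive")
        chi2 += ((obs - pred) / sigma) ** 2

    return chi2 / ndf


score = chi2_per_ndf(
    y=[10.0, 20.0, 30.0],
    y_model=[12.0, 18.0, 30.0],
    y_up=[2.0, 2.0, 2.0],
    y_down=[1.0, 2.0, 2.0],
    n_params=1,
)
```

Values near 1 are usually read as a reasonable fit, while values far above 1 suggest the function does not describe the data within the quoted uncertainties.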

Output files from an example fit can be found and downloaded here for illustration.
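
For downstream tasks, the CSV outputs can be inspected with the Python standard library alone. The sketch below reads a CSV table and returns its column names and rows without assuming any particular column layout, since the columns of candidates.csv are not listed here. The function name load_candidates is hypothetical, and the path in the usage comment simply reuses the output_dir from the example above.

```python
import csv


def load_candidates(csv_path):
    """Read a CSV table and return (column_names, rows).

    Hypothetical helper for illustration; not part of SymbolFit.
    Each row is a dict mapping column name to the raw string value.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


# Example usage (assumes a fit has already written its outputs):
# columns, rows = load_candidates('output_dir/candidates.csv')
# print(columns)
# print(len(rows), 'candidate functions')
```

Printing the column names first is a simple way to discover which evaluations were saved before writing any selection logic on top of them.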

Note: The function space is usually huge, even when constrained by the PySR config. If you are not satisfied with the results of a fit, you can rerun it with exactly the same config and obtain a completely different set of candidate functions; the only difference is the random seed that initializes the seeding functions. You can therefore rerun the fit as many times as needed. If you use model = SymbolFit(..., random_seed = None, ...), nothing needs to be changed: just rerun the fit. If you set a specific random_seed, change its value before rerunning. If you are still not satisfied after many trials, the config itself may be the issue; in that case, try a different config or tune the current one, then start new runs.
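
The rerun strategy described in the note can be sketched as a small loop over explicit seeds, with each run saved to its own directory so that results are not overwritten. The helper rerun_with_seeds and the make_model callable are hypothetical names introduced for illustration; only fit(), save_to_csv(output_dir=...), and the random_seed argument come from the example above. Whether save_to_csv creates missing directories is not stated here, so you may need to create them beforehand.

```python
def rerun_with_seeds(make_model, seeds, output_root="output_dir"):
    """Run one fit per seed and save each run to a separate directory.

    Hypothetical helper for illustration; not part of SymbolFit.
    `make_model` is any callable that takes a seed and returns an object
    with fit() and save_to_csv(output_dir=...) methods.
    """
    output_dirs = []
    for seed in seeds:
        model = make_model(seed)
        model.fit()
        run_dir = f"{output_root}/seed_{seed}/"
        model.save_to_csv(output_dir=run_dir)
        output_dirs.append(run_dir)
    return output_dirs


# Example usage with SymbolFit (remaining arguments as in the example above):
# def make_model(seed):
#     return SymbolFit(
#         x = dataset.x,
#         y = dataset.y,
#         y_up = dataset.y_up,
#         y_down = dataset.y_down,
#         pysr_config = pysr_config,
#         random_seed = seed,
#     )
#
# rerun_with_seeds(make_model, seeds=[1, 2, 3])
```

Using explicit seeds rather than random_seed = None makes it possible to reproduce a run whose candidate functions you liked.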

For detailed instructions and more demonstrations, please check out the Colab notebook and the documentation.

Documentation

The documentation can be found here for more information and demonstrations.

Citation

If you find this useful in your research, please consider citing both SymbolFit and PySR:

@article{Tsoi:2024pbn,
    author = "Tsoi, Ho Fung and Rankin, Dylan and Caillol, Cecile and Cranmer, Miles and Dasu, Sridhara and Duarte, Javier and Harris, Philip and Lipeles, Elliot and Loncar, Vladimir",
    title = "{SymbolFit: Automatic Parametric Modeling with Symbolic Regression}",
    eprint = "2411.09851",
    archivePrefix = "arXiv",
    primaryClass = "hep-ex",
    doi = "10.1007/s41781-025-00140-9",
    journal = "Comput. Softw. Big Sci.",
    volume = "9",
    pages = "12",
    year = "2025"
}

@misc{cranmerInterpretableMachineLearning2023,
      title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
      url = {http://arxiv.org/abs/2305.01582},
      doi = {10.48550/arXiv.2305.01582},
      urldate = {2023-07-17},
      publisher = {arXiv},
      author = {Cranmer, Miles},
      month = may,
      year = {2023},
      note = {arXiv:2305.01582 [astro-ph, physics:physics]},
      keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
