Automatic parametric modeling with symbolic regression

An API to automate parametric modeling with symbolic regression, originally developed for data analysis in the experimental high-energy physics community, but also applicable beyond.

Symbolfit takes binned data with measurement and systematic uncertainties as input, uses PySR to search for batches of functional forms that model the data, parameterizes these functions, and uses LMFIT to re-optimize them and estimate their uncertainties, all in a single run. It is designed to maximize automation with minimal human input. Each run produces a batch of functions with uncertainty estimates, which are evaluated, saved, and plotted automatically into readable output files, ready for downstream tasks.
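To make the "parameterizes these functions" step concrete, here is a minimal sketch of the idea: each numeric constant in a candidate expression is swapped for a named parameter whose initial value is the original constant, so that a fitter can re-optimize it later. The function name `parameterize`, the `a1, a2, ...` naming scheme, and the regular expression are illustrative assumptions, not Symbolfit's actual implementation. Note that this simple version would also parameterize integer exponents.

```python
import re

# Matches unsigned numeric literals such as 2.5, 3, or 1e-4, but not digits
# that are part of an identifier (e.g. the "0" in "x0").
_NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d*)?(?:[eE][+-]?\d+)?(?![\w.])")


def parameterize(expression):
    """Replace each numeric constant with a named parameter a1, a2, ...

    Hypothetical helper: returns the parameterized expression and a dict
    mapping each parameter name to its initial value.
    """
    initial_values = {}

    def _swap(match):
        name = f"a{len(initial_values) + 1}"
        initial_values[name] = float(match.group(0))
        return name

    return _NUMBER.sub(_swap, expression), initial_values


template, params = parameterize("2.5*exp(-0.3*x0) + 1.2")
print(template)  # a1*exp(-a2*x0) + a3
print(params)    # {'a1': 2.5, 'a2': 0.3, 'a3': 1.2}
```

The initial values give the re-optimization step a starting point close to the form found by the search.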

Installation

Prerequisite

Install Julia (backend for PySR)

curl -fsSL https://install.julialang.org | sh

Then check that it was installed properly:

julia --version
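The same check can be scripted from Python, for instance in a setup script. This is a small sketch using only the standard library; the helper name `julia_version` is hypothetical and not part of Symbolfit or PySR.

```python
import shutil
import subprocess


def julia_version(executable="julia"):
    """Return the output of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = julia_version()
print(version if version else "Julia not found on PATH; install it before running PySR.")
```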

Installation via PyPI

With Python>=3.9

pip install symbolfit

After the first installation, run:

python3 -m pysr install

Getting Started

To run an example fit, use the following code (or run python fit_example.py):

import importlib

from symbolfit.symbolfit import *

# Load the example dataset and the PySR search configuration as modules.
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_1')

model = SymbolFit(
    x = dataset.x,
    y = dataset.y,
    y_up = dataset.y_up,
    y_down = dataset.y_down,
    pysr_config = pysr_config,
    max_complexity = 60,
    input_rescale = True,
    scale_y_by = 'mean',
    max_stderr = 40,
    fit_y_unc = True,
    random_seed = None,
    loss_weights = None
)

model.fit()
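The example above reads x, y, y_up, and y_down from a prepared dataset module. As a rough sketch of how such inputs might be built from a 1D histogram, the snippet below uses bin centers for x and a symmetric Poisson uncertainty sqrt(N) for each bin. Both choices, the helper name `histogram_to_inputs`, and the use of plain lists are assumptions for illustration; check the documentation for the exact array shapes Symbolfit expects.

```python
import math


def histogram_to_inputs(bin_edges, counts):
    """Turn a 1D histogram into x, y, y_up, y_down, and bin widths."""
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than bin counts")
    pairs = list(zip(bin_edges[:-1], bin_edges[1:]))
    x = [(lo + hi) / 2 for lo, hi in pairs]   # bin centers
    widths = [hi - lo for lo, hi in pairs]    # bin widths
    y = list(counts)
    # Assumption: symmetric Poisson uncertainty sqrt(N) on each bin count.
    y_up = [math.sqrt(n) for n in counts]
    y_down = [math.sqrt(n) for n in counts]
    return {"x": x, "y": y, "y_up": y_up, "y_down": y_down, "bin_widths_1d": widths}


inputs = histogram_to_inputs(bin_edges=[0.0, 1.0, 2.0, 4.0], counts=[100, 64, 25])
print(inputs["x"])              # [0.5, 1.5, 3.0]
print(inputs["y_up"])           # [10.0, 8.0, 5.0]
print(inputs["bin_widths_1d"])  # [1.0, 1.0, 2.0]
```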

After the fit, save results to csv:

model.save_to_csv(output_dir = 'output_dir/')

and plot results to pdf:

model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = dataset.bin_widths_1d,
    #bin_edges_2d = dataset.bin_edges_2d,
    plot_logy = False,
    plot_logx = False
)

Candidate functions, with all parameter values substituted in, can be printed to the terminal:

model.print_candidate(candidate_number = 10)

Each run will produce a batch of candidate functions and will automatically save all results to five output files:

  1. candidates.csv: saves all candidate functions and evaluations in a csv table.
  2. candidates_reduced.csv: saves a reduced version with only the essential information, without intermediate results.
  3. candidates.pdf: plots all candidate functions with associated uncertainties one by one for fit quality evaluation.
  4. candidates_gof.pdf: plots the goodness-of-fit scores.
  5. candidates_correlation.pdf: plots the correlation matrices for the parameters of each candidate function.
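For downstream tasks, the CSV outputs can be read back with the standard library. The sketch below makes no assumption about column names and simply reports whatever header the file contains. It does assume that the file sits directly under the output_dir passed to save_to_csv, which may not match the actual layout. The helper name `load_candidates` is hypothetical.

```python
import csv
from pathlib import Path


def load_candidates(csv_path):
    """Read a candidates CSV into a list of dicts keyed by the header row."""
    with Path(csv_path).open(newline="") as handle:
        return list(csv.DictReader(handle))


# Assumed location; adjust if the output files are written elsewhere.
csv_path = Path("output_dir") / "candidates_reduced.csv"
if csv_path.exists():
    rows = load_candidates(csv_path)
    print(f"{len(rows)} candidates")
    if rows:
        print("columns:", list(rows[0].keys()))
else:
    print(f"{csv_path} not found; run the fit and save_to_csv first.")
```

All values come back as strings, so numeric columns need an explicit float() conversion before use.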

Documentation

The documentation can be found here for more info and demonstrations.

Citation

If you find this useful in your research, please consider citing Symbolfit:

Coming soon!

and PySR:

@misc{cranmerInterpretableMachineLearning2023,
    title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
    url = {http://arxiv.org/abs/2305.01582},
    doi = {10.48550/arXiv.2305.01582},
    urldate = {2023-07-17},
    publisher = {arXiv},
    author = {Cranmer, Miles},
    month = may,
    year = {2023},
    note = {arXiv:2305.01582 [astro-ph, physics:physics]},
    keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
