Automatic parametric modeling with symbolic regression

An API to automate parametric modeling with symbolic regression, originally developed for data analysis in the experimental high-energy physics community, but also applicable beyond.

Symbolfit takes binned data with measurement and systematic uncertainties as input, uses PySR to search for batches of functional forms that model the data, parameterizes these functions, and uses LMFIT to re-optimize them and estimate uncertainties, all in one go. It is designed to maximize automation with minimal human input. Each run produces a batch of functions with uncertainty estimates, which are evaluated, saved, and plotted automatically into readable output files, ready for downstream tasks.
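To make the "parameterizes these functions" step concrete, here is a minimal sketch of the idea: numeric constants in a found expression are replaced by named parameters so that a fitter can re-optimize them. This is an illustration only and not Symbolfit's actual implementation; the function name `parameterize` and the parameter naming scheme `a1, a2, ...` are assumptions made for this example.

```python
import re

# Matches unsigned decimal constants such as 2, 0.37 or 1.5e-3, but not digits
# that belong to an identifier such as x0.
_NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d*)?(?:[eE][+-]?\d+)?(?![\w.])")


def parameterize(expression):
    """Replace each numeric constant with a named parameter a1, a2, ...

    Hypothetical helper for illustration. Returns the rewritten expression
    and a dict mapping parameter names to their initial values.
    """
    initial_values = {}

    def _substitute(match):
        name = f"a{len(initial_values) + 1}"
        initial_values[name] = float(match.group(0))
        return name

    return _NUMBER.sub(_substitute, expression), initial_values


template, params = parameterize("2.5 * exp(-0.37 * x0) + 12")
print(template)  # a1 * exp(-a2 * x0) + a3
print(params)    # {'a1': 2.5, 'a2': 0.37, 'a3': 12.0}
```

The initial values give the subsequent re-optimization a starting point close to what the search found. Note that this naive sketch would also turn integer exponents into parameters, which a real implementation may want to treat differently.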

Installation

Prerequisite

Install Julia (backend for PySR):

curl -fsSL https://install.julialang.org | sh

Then check that it is installed properly:

julia --version
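The same check can be scripted from Python, for example at the top of a fit script. The sketch below uses only the standard library; the helper name `julia_version` is an assumption for this example and is not part of Symbolfit or PySR.

```python
import shutil
import subprocess


def julia_version():
    """Return the output of `julia --version`, or None if julia is not on PATH."""
    executable = shutil.which("julia")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = julia_version()
print(version if version else "julia not found on PATH")
```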

Installation via PyPI

With python>=3.9 and pip:

pip install symbolfit

Installation via conda

conda create --name symbolfit_env python=3.9
conda activate symbolfit_env
conda install -c conda-forge symbolfit

Editable installation for developers

git clone https://github.com/hftsoi/symbolfit.git
cd symbolfit
pip install -e .

Getting Started

To run an example fit, get the example datasets by cloning this repo:

git clone https://github.com/hftsoi/symbolfit.git
cd symbolfit

Then, within a Python session (or simply run python fit_example.py):

import importlib

from symbolfit.symbolfit import *

# Load the example dataset and a PySR configuration shipped with the repo.
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_gauss').pysr_config

model = SymbolFit(
    x = dataset.x,
    y = dataset.y,
    y_up = dataset.y_up,
    y_down = dataset.y_down,
    pysr_config = pysr_config,
    max_complexity = 60,
    input_rescale = True,
    scale_y_by = 'mean',
    max_stderr = 20,
    fit_y_unc = True,
    random_seed = None,
    loss_weights = None
)

model.fit()
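The example above reads x, y, y_up, and y_down from a dataset module in the repo. For your own binned data, these quantities can be derived from bin edges and counts. The following is a minimal sketch under stated assumptions: the helper `binned_inputs` is hypothetical, the uncertainties are taken as symmetric sqrt(N) values (a Poisson-like assumption, with a floor of 1 for empty bins), and the exact container types and shapes that SymbolFit accepts should be checked against the documentation.

```python
import math


def binned_inputs(bin_edges, counts):
    """Derive bin centers, widths and symmetric sqrt(N) uncertainties.

    Assumption: Poisson-like counts, so the one-sigma uncertainty of each bin
    is taken as sqrt(count), with a floor of 1 for empty bins.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than counts")
    pairs = list(zip(bin_edges[:-1], bin_edges[1:]))
    x = [0.5 * (lo + hi) for lo, hi in pairs]      # bin centers
    bin_widths = [hi - lo for lo, hi in pairs]     # bin widths
    y_up = [max(math.sqrt(n), 1.0) for n in counts]
    y_down = list(y_up)                            # symmetric by assumption
    return x, list(counts), y_up, y_down, bin_widths


x, y, y_up, y_down, bin_widths_1d = binned_inputs([0, 10, 20, 40], [100, 64, 9])
print(x)              # [5.0, 15.0, 30.0]
print(y_up)           # [10.0, 8.0, 3.0]
print(bin_widths_1d)  # [10, 10, 20]
```

Asymmetric or systematic uncertainties would replace the sqrt(N) lines with whatever upward and downward values your measurement provides.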

After the fit, save results to csv files:

model.save_to_csv(output_dir = 'output_dir/')

and plot results to pdf files:

model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = dataset.bin_widths_1d,
    #bin_edges_2d = dataset.bin_edges_2d,
    plot_logy = False,
    plot_logx = False,
    sampling_95quantile = False
)

Candidate functions with full substitutions can be printed directly:

model.print_candidate(candidate_number = 10)

Each fit produces a batch of candidate functions, and the save and plot calls above write all results to six output files:

  1. candidates.csv: saves all candidate functions and evaluations in a csv table.
  2. candidates_reduced.csv: saves a reduced version for essential information without intermediate results.
  3. candidates.pdf: plots all candidate functions with associated uncertainties one by one for fit quality evaluation.
  4. candidates_sampling.pdf: plots all candidate functions with total uncertainty coverage generated by sampling parameters.
  5. candidates_gof.pdf: plots the goodness-of-fit scores.
  6. candidates_correlation.pdf: plots the correlation matrices for the parameters of the candidate functions.
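For downstream tasks, the csv outputs can be read back with standard tools. The sketch below uses Python's built-in csv module to report the column names and number of rows without assuming any particular column layout, since the columns are not listed here. The helper name `summarize_csv` is an assumption for this example, and the path assumes the output_dir used above.

```python
import csv
import os


def summarize_csv(path):
    """Return (column_names, number_of_rows) for a csv file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


# Only runs if a previous fit has already written its results.
path = os.path.join("output_dir", "candidates.csv")
if os.path.exists(path):
    columns, n_rows = summarize_csv(path)
    print(f"{n_rows} rows with columns: {columns}")
```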

More demonstrations can be found in the documentation.

Documentation

See the documentation for more information and demonstrations.

Citation

If you find this useful in your research, please consider citing Symbolfit:

Coming soon!

and PySR:

@misc{cranmerInterpretableMachineLearning2023,
    title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
    url = {http://arxiv.org/abs/2305.01582},
    doi = {10.48550/arXiv.2305.01582},
    urldate = {2023-07-17},
    publisher = {arXiv},
    author = {Cranmer, Miles},
    month = may,
    year = {2023},
    note = {arXiv:2305.01582 [astro-ph, physics:physics]},
    keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
