
Automatic parametric modeling with symbolic regression


An API to automate parametric modeling with symbolic regression, originally developed for data analysis in the experimental high-energy physics community, but also applicable beyond.

Symbolfit takes binned data with measurement/systematic uncertainties as input, uses PySR to search for batches of functional forms that model the data, parameterizes these functions, and uses LMFIT to re-optimize them and estimate their uncertainties, all in one go. It is designed to maximize automation with minimal human input. Each run produces a batch of functions with uncertainty estimates, which are evaluated, saved, and plotted automatically into readable output files, ready for downstream tasks.
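As an illustration of what "binned data with uncertainties" can look like, here is a minimal sketch that turns 1D histogram edges and counts into bin centers, bin widths, and up/down uncertainties. The helper name make_binned_inputs is hypothetical and not part of Symbolfit; the keys mirror the attribute names used in the example further down (x, y, y_up, y_down, bin_widths_1d). The square-root uncertainty, with a floor of 1 for empty bins, is only a simple assumption for counting data, and the exact array shapes Symbolfit expects should be checked against the documentation.

```python
import math


def make_binned_inputs(bin_edges, counts):
    """Build bin centers, widths, and symmetric uncertainties from a 1D histogram.

    Hypothetical helper for illustration only; not part of Symbolfit.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("expected len(bin_edges) == len(counts) + 1")

    lows, highs = bin_edges[:-1], bin_edges[1:]
    x = [0.5 * (lo + hi) for lo, hi in zip(lows, highs)]        # bin centers
    bin_widths_1d = [hi - lo for lo, hi in zip(lows, highs)]    # bin widths
    y = [float(n) for n in counts]                              # bin contents

    # Assumption: sqrt(N) uncertainty for counting data, floored at 1 for empty bins.
    y_up = [math.sqrt(n) if n > 0 else 1.0 for n in counts]
    y_down = list(y_up)  # symmetric here; asymmetric values would differ per bin

    return {
        "x": x,
        "y": y,
        "y_up": y_up,
        "y_down": y_down,
        "bin_widths_1d": bin_widths_1d,
    }


inputs = make_binned_inputs([0.0, 10.0, 20.0, 40.0], [100, 64, 0])
print(inputs["x"], inputs["y_up"])
```

Keeping y_up and y_down as separate lists leaves room for asymmetric uncertainties, which is why the two are not merged into a single value per bin.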

Installation

Prerequisite

Install Julia (backend for PySR)

curl -fsSL https://install.julialang.org | sh

then check if installed properly

julia --version

Installation via PyPI

# With Python>=3.9
pip install symbolfit

Installation via conda

conda create --name symbolfit_env python=3.9
conda activate symbolfit_env
conda install conda-forge::symbolfit

Getting Started

To run an example fit, use the following (or simply run python fit_example.py):

import importlib

from symbolfit.symbolfit import *

# Load the example dataset and PySR configuration modules.
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_1').pysr_config

model = SymbolFit(
    x = dataset.x,
    y = dataset.y,
    y_up = dataset.y_up,
    y_down = dataset.y_down,
    pysr_config = pysr_config,
    max_complexity = 60,
    input_rescale = True,
    scale_y_by = 'mean',
    max_stderr = 20,
    fit_y_unc = True,
    random_seed = None,
    loss_weights = None
)

# Run the fit.
model.fit()

After the fit, save results to csv:

model.save_to_csv(output_dir = 'output_dir/')

and plot results to pdf:

model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = dataset.bin_widths_1d,
    #bin_edges_2d = dataset.bin_edges_2d,
    plot_logy = False,
    plot_logx = False,
    sampling_95quantile = False
)

Candidate functions with full substitutions can be printed to the terminal:

model.print_candidate(candidate_number = 10)

Each run will produce a batch of candidate functions and will automatically save all results to six output files:

  1. candidates.csv: saves all candidate functions and evaluations in a csv table.
  2. candidates_reduced.csv: saves a reduced version for essential information without intermediate results.
  3. candidates.pdf: plots all candidate functions with associated uncertainties one by one for fit quality evaluation.
  4. candidates_sampling.pdf: plots all candidate functions with total uncertainty coverage generated by sampling parameters.
  5. candidates_gof.pdf: plots the goodness-of-fit scores.
  6. candidates_correlation.pdf: plots the correlation matrices for the parameters of each candidate function.
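For readers unfamiliar with goodness-of-fit scores, the sketch below computes one common example, the chi-square per number of degrees of freedom, for binned data with asymmetric uncertainties. It is an illustration only: the function name chi2_per_ndf is hypothetical, and both the choice of score and the rule for picking the up or down uncertainty are assumptions, not a description of what Symbolfit computes for candidates_gof.pdf.

```python
def chi2_per_ndf(y, y_pred, y_up, y_down, n_params):
    """Return (chi2, ndf, chi2/ndf) for binned data with asymmetric uncertainties.

    Hypothetical helper for illustration only; not part of Symbolfit.
    """
    if not (len(y) == len(y_pred) == len(y_up) == len(y_down)):
        raise ValueError("all inputs must have the same length")

    ndf = len(y) - n_params  # degrees of freedom: bins minus fitted parameters
    if ndf <= 0:
        raise ValueError("need more bins than fitted parameters")

    chi2 = 0.0
    for obs, pred, up, down in zip(y, y_pred, y_up, y_down):
        # Assumption: use the upward uncertainty when the prediction lies at or
        # above the data point, and the downward uncertainty otherwise.
        sigma = up if pred >= obs else down
        chi2 += ((obs - pred) / sigma) ** 2

    return chi2, ndf, chi2 / ndf


chi2, ndf, score = chi2_per_ndf(
    y=[10.0, 20.0, 30.0],
    y_pred=[12.0, 20.0, 27.0],
    y_up=[2.0, 2.0, 2.0],
    y_down=[3.0, 3.0, 3.0],
    n_params=1,
)
print(chi2, ndf, score)
```

A score near 1 is usually read as a fit that is compatible with the quoted uncertainties, while much larger values suggest that the function does not describe the data well.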

Documentation

The documentation can be found here for more info and demonstrations.

Citation

If you find this useful in your research, please consider citing Symbolfit:

Coming soon!

and PySR:

@misc{cranmerInterpretableMachineLearning2023,
    title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
    url = {http://arxiv.org/abs/2305.01582},
    doi = {10.48550/arXiv.2305.01582},
    urldate = {2023-07-17},
    publisher = {arXiv},
    author = {Cranmer, Miles},
    month = may,
    year = {2023},
    note = {arXiv:2305.01582 [astro-ph, physics:physics]},
    keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
