Automatic parametric modeling with symbolic regression

An API to automate parametric modeling with symbolic regression, originally developed for data analysis in the experimental high-energy physics community, but also applicable beyond.

SymbolFit takes binned data with measurement/systematic uncertainties as input, uses PySR to run a machine search for batches of functional forms that model the data, parameterizes these functions, and uses LMFIT to re-optimize them and estimate uncertainties, all in one go. It is designed to maximize automation with minimal human input. Each run produces a batch of functions with uncertainty estimates, which are evaluated, saved, and plotted automatically into readable output files, ready for downstream tasks.

Installation

Prerequisite

Install Julia (backend for PySR)

curl -fsSL https://install.julialang.org | sh

Then check that it installed properly:

julia --version
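
The same check can be scripted, for example before launching a long fit. The following is a minimal sketch using only the Python standard library; the helper name `julia_version` is hypothetical and is not part of SymbolFit or PySR.

```python
import shutil
import subprocess


def julia_version(executable="julia"):
    """Return the output of `<executable> --version`, or None if it is unavailable."""
    # Look the executable up on PATH first, so a missing install is reported cleanly.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = julia_version()
    print(version if version else "Julia not found on PATH")
```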

Installation via PyPI

Requires Python >= 3.9:

pip install symbolfit

Getting Started

To run an example fit, use the script below (or run python fit_example.py):

import importlib

from symbolfit.symbolfit import SymbolFit

# Load the example dataset and PySR configuration as modules
# (assumes the repository's examples/ directory is importable).
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_1')

# Configure the fit with the binned data and its up/down uncertainties.
model = SymbolFit(
    x = dataset.x,
    y = dataset.y,
    y_up = dataset.y_up,
    y_down = dataset.y_down,
    pysr_config = pysr_config,
    max_complexity = 60,
    input_rescale = True,
    scale_y_by = 'mean',
    max_stderr = 40,
    fit_y_unc = True,
    random_seed = None,
    loss_weights = None
)

model.fit()
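
The example above reads `x`, `y`, `y_up`, and `y_down` (and, later, `bin_widths_1d`) from a dataset module. As a rough sketch of what such a module might contain, the following builds a 1D binned dataset from hypothetical bin edges and counts. The one-row-per-bin list layout and the symmetric Poisson uncertainties are assumptions made for illustration; consult the documentation for the exact format SymbolFit expects.

```python
import math

# Hypothetical histogram: bin edges and observed counts (illustrative values only).
bin_edges = [0.0, 25.0, 50.0, 75.0, 100.0, 125.0]
counts = [120.0, 85.0, 52.0, 30.0, 14.0]

# Bin centers serve as the independent variable; widths are kept for plotting.
x = [[0.5 * (lo + hi)] for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
bin_widths_1d = [[hi - lo] for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]

# Dependent variable: one row per bin.
y = [[c] for c in counts]

# Assumed symmetric Poisson uncertainties, sqrt(N), in both directions.
y_up = [[math.sqrt(c)] for c in counts]
y_down = [[math.sqrt(c)] for c in counts]
```

Real measurements often have different upward and downward uncertainties, in which case `y_up` and `y_down` would simply hold different values per bin.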

After the fit, save results to csv:

model.save_to_csv(output_dir = 'output_dir/')

and plot results to pdf:

model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = dataset.bin_widths_1d,
    #bin_edges_2d = dataset.bin_edges_2d,
    plot_logy = False,
    plot_logx = False
)

A candidate function with all substitutions applied can be printed to the terminal:

model.print_candidate(candidate_number = 10)

Each run will produce a batch of candidate functions and will automatically save all results to five output files:

  1. candidates.csv: saves all candidate functions and evaluations in a csv table.
  2. candidates_reduced.csv: saves a reduced version for essential information without intermediate results.
  3. candidates.pdf: plots all candidate functions with associated uncertainties one by one for fit quality evaluation.
  4. candidates_gof.pdf: plots the goodness-of-fit scores.
  5. candidates_correlation.pdf: plots the correlation matrices for the parameters of each candidate function.
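
To make the idea of a goodness-of-fit score concrete, here is a minimal sketch of a chi-square per degree of freedom computed with asymmetric uncertainties. Which scores SymbolFit actually reports, and how it treats asymmetric errors, is not stated here; the function name `chi2_per_ndf` and the rule of using `y_up` when the prediction lies at or above the data point (and `y_down` otherwise) are assumptions for illustration.

```python
def chi2_per_ndf(y, y_pred, y_up, y_down, n_params):
    """Chi-square per degree of freedom with asymmetric uncertainties.

    Uses y_up when the prediction is at or above the data point, y_down otherwise
    (an assumed convention, not necessarily the one used by SymbolFit).
    """
    if not (len(y) == len(y_pred) == len(y_up) == len(y_down)):
        raise ValueError("all inputs must have the same length")
    ndf = len(y) - n_params
    if ndf <= 0:
        raise ValueError("need more bins than free parameters")
    chi2 = 0.0
    for obs, pred, up, down in zip(y, y_pred, y_up, y_down):
        sigma = up if pred >= obs else down
        chi2 += ((obs - pred) / sigma) ** 2
    return chi2 / ndf


# Toy values: four bins, a candidate function with two free parameters.
score = chi2_per_ndf(
    y=[10.0, 8.0, 5.0, 3.0],
    y_pred=[11.0, 8.0, 4.0, 3.0],
    y_up=[1.0, 1.0, 1.0, 1.0],
    y_down=[2.0, 2.0, 2.0, 2.0],
    n_params=2,
)
print(score)  # 0.625
```

A value near 1 is commonly read as a fit consistent with the quoted uncertainties, while much larger values suggest a poor description of the data.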

Documentation

See the project documentation for more information and demonstrations.

Citation

If you find this useful in your research, please consider citing SymbolFit:

Coming soon!

and PySR:

@misc{cranmerInterpretableMachineLearning2023,
    title = {Interpretable {Machine} {Learning} for {Science} with {PySR} and {SymbolicRegression}.jl},
    url = {http://arxiv.org/abs/2305.01582},
    doi = {10.48550/arXiv.2305.01582},
    urldate = {2023-07-17},
    publisher = {arXiv},
    author = {Cranmer, Miles},
    month = may,
    year = {2023},
    note = {arXiv:2305.01582 [astro-ph, physics:physics]},
    keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Physics - Data Analysis, Statistics and Probability},
}
