Simple and efficient symbolic regression


PySR.jl

Symbolic regression built on Julia, with a Python interface. Uses regularized evolution and simulated annealing.

Backstory: we used the original Eureqa in our paper to convert a graph neural network into an analytic equation describing dark matter overdensity. However, Eureqa is GUI-only, does not allow user-defined operators, has no distributed capabilities, and has become proprietary. The goal of this package is therefore an open-source symbolic regression tool as efficient as Eureqa, with a configurable Python interface.

The algorithms here implement regularized evolution, as in AutoML-Zero, with additional algorithmic changes such as simulated annealing and classical optimization of constants.
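To make these two ingredients concrete, here is a toy, self-contained sketch (not PySR's Julia implementation) on a one-dimensional problem where each "individual" is a single float instead of an equation tree. Regularized evolution keeps the population ordered by age: each step mutates the winner of a random tournament, and the child replaces the oldest member. The annealing rule (accept a worse child with probability exp(-Δ/(alpha·T)), with T decaying linearly from 1) is an assumption modeled on the alpha ("initial temperature") and annealing arguments documented below; the loss, mutation, and default values are made up for illustration.

```python
import math
import random
from collections import deque


def loss(x: float) -> float:
    # Toy objective standing in for an equation's fitting error.
    return (x - 3.0) ** 2


def mutate(x: float, rng: random.Random) -> float:
    # Toy mutation: perturb the individual slightly.
    return x + rng.gauss(0.0, 0.5)


def regularized_evolution(cycles: int = 2000, npop: int = 50, tournament: int = 10,
                          alpha: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Ordered by age: the leftmost element is the oldest individual.
    population = deque(rng.uniform(-10.0, 10.0) for _ in range(npop))
    best = min(population, key=loss)
    for i in range(cycles):
        # Temperature decays linearly from 1 towards 0 (never exactly 0 here).
        temperature = 1.0 - i / cycles
        # Tournament selection: the fittest of a random sample is the parent.
        parent = min(rng.sample(list(population), tournament), key=loss)
        child = mutate(parent, rng)
        # Annealing acceptance: improvements are always kept; a worse child
        # survives with probability exp(-delta / (alpha * temperature)).
        delta = loss(child) - loss(parent)
        if delta > 0 and rng.random() > math.exp(-delta / (alpha * temperature)):
            child = parent  # rejected: re-insert an unchanged copy of the parent
        # Aging: the newest individual replaces the oldest one.
        population.append(child)
        population.popleft()
        best = min(best, child, key=loss)
    return best
```

Calling regularized_evolution() returns a value close to the optimum 3.0. Replacing the oldest member, rather than the worst, is what distinguishes regularized evolution from plain tournament evolution.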

Installation

Install Julia. Then, at the command line, install the Optim package with: julia -e 'import Pkg; Pkg.add("Optim")'. On the Python side, you need Python 3 with numpy and pandas installed.

Running:

Quickstart

import numpy as np
from pysr import pysr

# Dataset
X = 2*np.random.randn(100, 5)
y = 2*np.cos(X[:, 3]) + X[:, 0]**2 - 2

# Learn equations
equations = pysr(X, y, niterations=5)

...

print(equations)

which gives:

   Complexity       MSE                                                Equation
0           5  1.947431                          plus(-1.7420927, mult(x0, x0))
1           8  0.486858           plus(-1.8710494, plus(cos(x3), mult(x0, x0)))
2          11  0.000000  plus(plus(mult(x0, x0), cos(x3)), plus(-2.0, cos(x3)))
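The equations are printed in prefix (function-call) notation using the operator names passed to pysr. The third row expands to x0*x0 + cos(x3) - 2.0 + cos(x3) = x0² + 2cos(x3) - 2, which is exactly the generating function, hence its zero MSE. The snippet below re-evaluates that row on freshly drawn data, using hypothetical Python stand-ins for plus and mult (the real operators live on the Julia side).

```python
import numpy as np


# Hypothetical Python stand-ins for the Julia operators in the printed equations.
def plus(a, b):
    return a + b


def mult(a, b):
    return a * b


cos = np.cos

# Same generating process as the Quickstart, with a fixed seed.
rng = np.random.default_rng(0)
X = 2 * rng.standard_normal((100, 5))
x0, x3 = X[:, 0], X[:, 3]
y = 2 * np.cos(x3) + x0**2 - 2

# Third row of the table, transcribed verbatim.
y_hat = plus(plus(mult(x0, x0), cos(x3)), plus(-2.0, cos(x3)))
mse = np.mean((y - y_hat) ** 2)
```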

API

What follows is the API reference for the numpy interface. You likely don't need to tune the hyperparameters yourself; if you want to, see hyperopt.py for an example. You should, however, adjust threads, niterations, binary_operators, unary_operators, and maxsize to suit your problem.

The program returns a pandas DataFrame containing the equations, their mean squared error, and their complexity. At the end of every iteration it also writes the equations to a CSV file (hall_of_fame.csv by default) and prints them to stdout.
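Because the equation strings themselves contain commas (e.g. plus(-1.7420927, mult(x0, x0))), the hall-of-fame file presumably uses | as its delimiter, per the equation_file description below. A minimal sketch for reloading it with pandas follows; the function name is hypothetical, and the exact column names in the file are not documented here, so check df.columns after loading.

```python
import pandas as pd


def load_hall_of_fame(path: str = "hall_of_fame.csv") -> pd.DataFrame:
    # The file is "|"-separated, so override pandas' default comma delimiter;
    # commas inside the equation strings are then read as plain text.
    return pd.read_csv(path, sep="|")
```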

You can add more operators in operators.jl, or use the default Julia ones. Make sure every operator is defined for scalar Float32 arguments. Then pass the operator names via binary_operators and unary_operators. To change the dataset, pass X and y as numpy arrays to pysr(...).

pysr(X=None, y=None, threads=4, niterations=20,
   ncyclesperiteration=int(default_ncyclesperiteration),
   binary_operators=["plus", "mult"], unary_operators=["cos", "exp", "sin"],
   alpha=default_alpha, annealing=True, fractionReplaced=default_fractionReplaced,
   fractionReplacedHof=default_fractionReplacedHof, npop=int(default_npop),
   parsimony=default_parsimony, migration=True, hofMigration=True,
   shouldOptimizeConstants=True, topn=int(default_topn),
   weightAddNode=default_weightAddNode, weightDeleteNode=default_weightDeleteNode,
   weightDoNothing=default_weightDoNothing,
   weightMutateConstant=default_weightMutateConstant,
   weightMutateOperator=default_weightMutateOperator,
   weightRandomize=default_weightRandomize, weightSimplify=default_weightSimplify,
   timeout=None, equation_file='hall_of_fame.csv', test='simple1', maxsize=20)

Run symbolic regression to fit f(X[i, :]) ~ y[i] for all i.

Arguments:

X: np.ndarray, 2D array. Rows are examples, columns are features.
y: np.ndarray, 1D array. Rows are examples.
threads: int, Number of threads (= number of populations running). You can use more threads than cores; this can actually make the search more efficient.
niterations: int, Number of iterations of the algorithm to run. At the end of each iteration, the best equations are printed and migrate between populations.
ncyclesperiteration: int, Number of total mutations to run, per 10 samples of the population, per iteration.
binary_operators: list, List of strings giving the binary operators, either from Julia's Base or defined in operators.jl.
unary_operators: list, Same, but for operators taking a single Float32.
alpha: float, Initial temperature.
annealing: bool, Whether to use simulated annealing. Recommended, and enabled by default.
fractionReplaced: float, Fraction of each population to replace with equations migrating from other populations.
fractionReplacedHof: float, Fraction of each population to replace with equations migrating from the hall of fame.
npop: int, Number of individuals in each population.
parsimony: float, Multiplicative factor for how much to punish complexity.
migration: bool, Whether to migrate.
hofMigration: bool, Whether to have the hall of fame migrate.
shouldOptimizeConstants: bool, Whether to numerically optimize constants (Nelder-Mead/Newton) at the end of each iteration.
topn: int, How many top individuals migrate from each population.
weightAddNode: float, Relative likelihood for mutation to add a node.
weightDeleteNode: float, Relative likelihood for mutation to delete a node.
weightDoNothing: float, Relative likelihood for mutation to leave the individual unchanged.
weightMutateConstant: float, Relative likelihood for mutation to change the constant slightly in a random direction.
weightMutateOperator: float, Relative likelihood for mutation to swap an operator.
weightRandomize: float, Relative likelihood for mutation to completely delete and then randomly regenerate the equation.
weightSimplify: float, Relative likelihood for mutation to simplify constant parts by evaluation.
timeout: float, Timeout for the search, in seconds.
equation_file: str, Path of the file where equations are saved (a .csv file using | as the delimiter).
test: str, Which built-in test to run if X and y are not passed.
maxsize: int, Maximum size of an equation.
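The weight* arguments are relative likelihoods, so each mutation's probability is its weight divided by the sum of all weights. The sketch below shows that normalization, sampling one mutation type, and a simple migration step in the spirit of fractionReplaced and fractionReplacedHof. The weight values are arbitrary placeholders (the real defaults, such as default_weightAddNode, are not given here), and migrate is a hypothetical helper, not PySR's code.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# Arbitrary placeholder weights, keyed by the pysr(...) argument names.
WEIGHTS: Dict[str, float] = {
    "weightAddNode": 1.0,
    "weightDeleteNode": 1.0,
    "weightDoNothing": 1.0,
    "weightMutateConstant": 2.0,
    "weightMutateOperator": 1.0,
    "weightRandomize": 1.0,
    "weightSimplify": 1.0,
}


def mutation_probabilities(weights: Dict[str, float]) -> Dict[str, float]:
    # Relative likelihoods become probabilities by dividing by their sum.
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_mutation(weights: Dict[str, float], rng: random.Random) -> str:
    # random.choices accepts unnormalized weights directly.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def migrate(population: List[T], donors: Sequence[T],
            fraction_replaced: float, rng: random.Random) -> List[T]:
    # Overwrite a random fraction of the population with donor individuals
    # (e.g. other populations' top-n members, or the hall of fame).
    new_population = list(population)
    n_replace = int(round(len(new_population) * fraction_replaced))
    for idx in rng.sample(range(len(new_population)), n_replace):
        new_population[idx] = rng.choice(donors)
    return new_population
```

With these placeholders the weights sum to 8, so weightMutateConstant=2.0 corresponds to a 25% chance per mutation.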

Returns:

pd.DataFrame, Results dataframe, giving complexity, MSE, and equations (as strings).
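As an illustration of working with the returned frame, the hypothetical helper below picks one equation by trading MSE against complexity with a linear penalty, MSE + parsimony × complexity. This scoring rule is an assumption chosen to mirror the parsimony argument's description, not necessarily the formula PySR uses internally; the column names and rows are taken from the Quickstart output.

```python
import pandas as pd


def pick_equation(equations: pd.DataFrame, parsimony: float) -> pd.Series:
    # Lower is better: accuracy (MSE) plus a penalty proportional to size.
    score = equations["MSE"] + parsimony * equations["Complexity"]
    return equations.loc[score.idxmin()]


# The three rows from the Quickstart output.
results = pd.DataFrame({
    "Complexity": [5, 8, 11],
    "MSE": [1.947431, 0.486858, 0.000000],
    "Equation": [
        "plus(-1.7420927, mult(x0, x0))",
        "plus(-1.8710494, plus(cos(x3), mult(x0, x0)))",
        "plus(plus(mult(x0, x0), cos(x3)), plus(-2.0, cos(x3)))",
    ],
})
```

With parsimony=0.01 this selects the exact 11-node equation (score 0.11); with a heavy penalty such as parsimony=0.5, the 5-node x0² approximation wins (score about 4.45 versus 5.5).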

TODO

Rename package to avoid trademark issues
- PySR?
Calculate feature importances of future mutations, by looking at correlation between residual of model, and the features.
- Store feature importances of future, and periodically update it.
Implement more parts of the original Eureqa algorithms: https://www.creativemachineslab.com/eureqa.html
Sympy printing
Consider adding mutation for constant<->variable
Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
Use an NN to generate weights over all probability distributions, conditional on the error and the existing equation, and train it on some randomly generated equations
Performance:
- Use an enum for functions instead of storing them?
- Current most expensive operations:
  - Calculating the loss function - there are duplicate calculations happening.
  - Declaration of the weights array every iteration
Make scaling of changes to constant a hyperparameter
Make deletion op join deleted subtree to parent
Update hall of fame every iteration?
- Seems to overfit early if we do this.
Consider adding mutation to pass an operator in through a new binary operator (e.g., exp(x3)->plus(exp(x3), ...))
- (Added full insertion operator)
Add a node at the top of a tree
Insert a node at the top of a subtree
Record very best individual in each population, and return at end.
Write our own tree copy operation; deepcopy() is the slowest operation by far.
Hyperparameter tune
Create a benchmark for accuracy
Add interface for either defining an operation to learn, or loading in arbitrary dataset.
- Could just write out the dataset in julia, or load it.
Create a Python interface
Explicit constant optimization on hall-of-fame
- Create method to find and return all constants, from left to right
- Create method to find and set all constants, in same order
- Pull up some optimization algorithm and add it. Keep the package small!
Create a benchmark for speed
Simplify subtrees with only constants beneath them. Or should I? Maybe randomly simplify sometimes?
Record hall of fame
Optionally (with hyperparameter) migrate the hall of fame, rather than current bests
Test performance of reduced precision integers
- No effect
Create struct to pass through all hyperparameters, instead of treating as constants
- Make sure doesn't affect performance
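The constant-handling items under "Explicit constant optimization on hall-of-fame" amount to a left-to-right (pre-order) traversal of the expression tree. A minimal Python sketch follows, using a hypothetical Node class rather than PySR's Julia tree type: get_constants flattens the constants into a list, and set_constants writes a list back in the same order. That flat list is what a numerical optimizer such as Nelder-Mead would adjust.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node: a constant leaf (val), a feature leaf (feature),
    # or an operator (op) applied to its children.
    op: Optional[str] = None
    val: Optional[float] = None
    feature: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def get_constants(node: Node) -> List[float]:
    # Pre-order traversal: constants are collected from left to right.
    if node.val is not None:
        return [node.val]
    constants: List[float] = []
    for child in node.children:
        constants.extend(get_constants(child))
    return constants


def set_constants(node: Node, values: List[float]) -> int:
    # Consume values in the same order as get_constants; return how many were used.
    if node.val is not None:
        node.val = values[0]
        return 1
    used = 0
    for child in node.children:
        used += set_constants(child, values[used:])
    return used


# plus(mult(2.0, x0), -1.5), built by hand for illustration.
tree = Node(op="plus", children=[
    Node(op="mult", children=[Node(val=2.0), Node(feature=0)]),
    Node(val=-1.5),
])
```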


Release history

2.0.0a1 pre-release

Oct 8, 2025

1.5.9

Jul 15, 2025

1.5.8

May 20, 2025

1.5.7

May 19, 2025

1.5.6

May 4, 2025

1.5.5

Apr 2, 2025

1.5.4

Apr 1, 2025

1.5.3

Mar 28, 2025

1.5.2

Mar 5, 2025

1.5.1

Mar 1, 2025

1.5.0

Feb 25, 2025

1.4.0

Feb 13, 2025

1.3.1

Dec 27, 2024

1.3.0

Dec 15, 2024

1.2.0

Dec 14, 2024

1.1.0

Dec 9, 2024

1.0.2

Dec 7, 2024

1.0.1

Dec 6, 2024

1.0.0

Dec 1, 2024

0.19.4

Aug 23, 2024

0.19.3

Jul 29, 2024

0.19.2

Jul 15, 2024

0.19.1

Jul 15, 2024

0.19.0

Jun 22, 2024

0.18.5

Jun 16, 2024

0.18.4

May 4, 2024

0.18.3

Apr 26, 2024

0.18.2

Apr 15, 2024

0.18.1

Mar 26, 2024

0.18.0

Mar 24, 2024

0.17.4

Mar 21, 2024

0.17.3

Mar 20, 2024

0.17.2

Mar 12, 2024

0.17.1

Feb 13, 2024

0.17.0

Feb 12, 2024

0.16.9

Jan 5, 2024

0.16.8

Dec 31, 2023

0.16.7

Dec 31, 2023

0.16.6

Dec 24, 2023

0.16.5

Dec 14, 2023

0.16.4

Dec 13, 2023

0.16.3

Aug 21, 2023

0.16.2

Aug 17, 2023

0.16.1

Aug 10, 2023

0.16.0

Aug 7, 2023

0.15.4

Aug 4, 2023

0.15.3

Aug 2, 2023

0.15.2

Aug 1, 2023

0.15.1

Jul 30, 2023

0.15.0

Jul 28, 2023

0.14.3

Jul 4, 2023

0.14.2

Jun 20, 2023

0.14.1

May 28, 2023

0.14.0

May 20, 2023

0.13.0

May 12, 2023

0.12.3

Apr 27, 2023

0.12.2

Apr 22, 2023

0.12.1

Mar 25, 2023

0.12.0

Mar 21, 2023

0.11.17

Mar 7, 2023

0.11.16

Mar 1, 2023

0.11.15

Feb 18, 2023

0.11.14

Feb 13, 2023

0.11.13

Feb 9, 2023

0.11.12

Jan 16, 2023

0.11.11

Nov 22, 2022

0.11.10

Nov 21, 2022

0.11.9

Nov 5, 2022

0.11.8

Nov 4, 2022

0.11.7

Nov 3, 2022

0.11.6

Oct 31, 2022

0.11.5

Oct 24, 2022

0.11.4

Oct 10, 2022

0.11.3

Oct 6, 2022

0.11.2

Sep 28, 2022

0.11.1.post1

Sep 27, 2022

0.11.1

Sep 26, 2022

0.11.0

Sep 11, 2022

0.10.4.post1

Sep 8, 2022

0.10.4

Sep 8, 2022

0.10.3

Sep 6, 2022

0.10.2

Sep 6, 2022

0.10.1

Aug 14, 2022

0.10.0

Aug 14, 2022

0.9.5

Jul 9, 2022

0.9.4

Jul 8, 2022

0.9.3

Jun 22, 2022

0.9.2

Jun 20, 2022

0.9.1

Jun 6, 2022

0.9.0.post2

Jun 6, 2022

0.9.0.post1

Jun 4, 2022

0.9.0

Jun 4, 2022

0.8.7

May 25, 2022

0.8.6

May 22, 2022

0.8.5

May 20, 2022

0.8.4

May 12, 2022

0.8.3

May 9, 2022

0.8.2

May 8, 2022

0.8.1

May 8, 2022

0.8.0

May 8, 2022

0.7.13

May 7, 2022

0.7.12

May 4, 2022

0.7.11

Apr 26, 2022

0.7.10

Apr 10, 2022

0.7.9

Feb 23, 2022

0.7.8

Feb 23, 2022

0.7.7

Feb 15, 2022

0.7.6

Feb 14, 2022

0.7.5

Feb 13, 2022

0.7.4

Feb 13, 2022

0.7.3

Feb 12, 2022

0.7.2

Feb 5, 2022

0.7.1

Feb 5, 2022

0.7.0.post6

Feb 4, 2022

0.7.0.post5

Feb 3, 2022

0.7.0.post4

Feb 3, 2022

0.7.0.post3

Feb 2, 2022

0.7.0.post2

Feb 1, 2022

0.7.0.post1

Feb 1, 2022

0.7.0

Jan 31, 2022

0.7.0a3 pre-release

Jan 25, 2022

0.7.0a2 pre-release

Jan 22, 2022

0.7.0a1 pre-release

Jan 20, 2022

0.6.14

Nov 19, 2021

0.6.13

Sep 14, 2021

0.6.12.post1

Aug 4, 2021

0.6.12

Aug 4, 2021

0.6.11

Jul 12, 2021

0.6.10

Jun 19, 2021

0.6.9

Jun 16, 2021

0.6.8.post3

Jun 15, 2021

0.6.8.post2

Jun 14, 2021

0.6.8.post1

Jun 14, 2021

0.6.8

Jun 13, 2021

0.6.7

Jun 12, 2021

0.6.6

Jun 9, 2021

0.6.5

Jun 8, 2021

0.6.4

Jun 6, 2021

0.6.3

Jun 6, 2021

0.6.2

Jun 4, 2021

0.6.1

Jun 3, 2021

0.6.0.post1

Jun 1, 2021

0.6.0

Jun 1, 2021

0.6.0rc2 pre-release

May 31, 2021

0.6.0rc1 pre-release

May 30, 2021

0.5.16.post2

Mar 17, 2021

0.5.16.post1

Mar 16, 2021

0.5.16

Mar 7, 2021

0.5.13.post2

Mar 4, 2021

0.5.13.post1

Mar 1, 2021

0.5.13

Feb 27, 2021

0.5.10

Feb 26, 2021

0.5.9

Feb 25, 2021

0.5.2

Feb 24, 2021

0.5.0.post1

Feb 23, 2021

0.5.0

Feb 12, 2021

0.4.9

Feb 9, 2021

0.4.7

Feb 6, 2021

0.4.6

Feb 4, 2021

0.4.4

Feb 4, 2021

0.4.2.post1

Feb 3, 2021

0.4.2

Feb 3, 2021

0.4.0

Feb 1, 2021

0.3.37

Jan 19, 2021

0.3.36

Jan 3, 2021

0.3.35

Dec 30, 2020

0.3.34

Dec 23, 2020

0.3.33

Dec 22, 2020

0.3.32

Dec 22, 2020

0.3.31

Nov 18, 2020

0.3.30

Nov 9, 2020

0.3.29

Oct 31, 2020

0.3.28

Oct 19, 2020

0.3.27

Oct 16, 2020

0.3.26

Oct 16, 2020

0.3.25

Oct 13, 2020

0.3.24

Oct 11, 2020

0.3.23

Oct 11, 2020

0.3.22

Oct 11, 2020

0.3.21

Oct 10, 2020

0.3.20

Oct 10, 2020

0.3.19

Oct 8, 2020

0.3.18

Oct 8, 2020

0.3.17

Oct 2, 2020

0.3.16

Oct 1, 2020

0.3.15

Sep 29, 2020

0.3.14

Sep 29, 2020

0.3.13

Sep 29, 2020

0.3.12

Sep 29, 2020

0.3.11

Sep 29, 2020

0.3.10

Sep 28, 2020

0.3.9

Sep 28, 2020

0.3.8

Sep 27, 2020

0.3.7

Sep 27, 2020

0.3.6

Sep 27, 2020

0.3.4

Sep 26, 2020

0.3.3

Sep 26, 2020

0.3.2

Sep 26, 2020

0.3.0

Sep 25, 2020

0.2.3

Sep 24, 2020

0.2.2

Sep 22, 2020

0.2.1

Sep 21, 2020

0.2.0

Sep 21, 2020

This version

0.1.0

Sep 19, 2020

0.0.1

Sep 19, 2020

Download files

Source Distribution

pysr-0.1.0.tar.gz (12.4 kB)

Uploaded Sep 19, 2020 Source

Built Distribution

pysr-0.1.0-py3-none-any.whl (17.6 kB)

Uploaded Sep 19, 2020 Python 3

File details

Details for the file pysr-0.1.0.tar.gz.

File metadata

Download URL: pysr-0.1.0.tar.gz
Upload date: Sep 19, 2020
Size: 12.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200323 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for pysr-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`29f4a4d5e64127067a812da72aade0da08de95e39e05d148d6e762b98a67a613`
MD5	`6a56185e1be830497777f92a69a1e69b`
BLAKE2b-256	`062c69f591c8c1d3dfd7315567d7187c3d4c91366b1e1deafe6029e7beb08648`

File details

Details for the file pysr-0.1.0-py3-none-any.whl.

File metadata

Download URL: pysr-0.1.0-py3-none-any.whl
Upload date: Sep 19, 2020
Size: 17.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200323 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for pysr-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f73148bbc3a0b049a939493f2511bccc105445a51f0052e566171fdb9ed9a94`
MD5	`890bbb348c3bf1c1f02bb8c19cbb7328`
BLAKE2b-256	`d8f0611f233f5db9cd2bc48b546859836c20a739277931bb08334e126587687a`
