
Symbolic regression by genetic programming (C++ engine, Python bindings)


eqhunt

Symbolic regression by genetic programming. C++ engine, Python bindings via nanobind.

Give it a table of (inputs, target) pairs; it returns a human-readable formula that approximates the relationship. No neural network, no black box — just an algebraic expression you can read, paste into a calculator, or hand-tune.

import eqhunt

X = [[1, 1], [2, 3], [4, 5], [7, 2], [9, 9]]
y = [2, 5, 9, 9, 18]

model = eqhunt.fit(X, y)
print(model.formula)        # e.g.  f(x,y) = (x+y)
print(model.predict([6, 7])) # -> 13.0 when the formula is (x+y)

Install

pip install eqhunt

Prebuilt wheels are published for Linux, macOS and Windows on common Python versions. If pip falls back to building from source, you'll need a C++17 compiler and CMake 3.15+.

Two ways to use it

Ultra-simple

import eqhunt

model = eqhunt.fit(X, y, gen=5000)
print(model.formula)
model.predict([1, 2])         # single row
model.predict([[1, 2], [3, 4]])  # batch
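predict() is documented to accept either a single row or a batch of rows. One way such a dispatch can work is to check whether the first element is itself a row. The sketch below does that for plain Python lists. predict_any and the stand-in formula are hypothetical, and the sketch assumes a batch returns one prediction per row, which the README implies but does not spell out.

```python
from typing import Callable, List, Sequence, Union

Row = Sequence[float]


def predict_any(f: Callable[[Row], float],
                data: Union[Row, Sequence[Row]]) -> Union[float, List[float]]:
    """A flat row gives one value; a list of rows gives one value per row."""
    if len(data) > 0 and isinstance(data[0], (list, tuple)):
        return [float(f(row)) for row in data]  # batch
    return float(f(data))                       # single row


def formula(r: Row) -> float:
    """Hypothetical stand-in for a fitted formula, here x + y."""
    return r[0] + r[1]


print(predict_any(formula, [1, 2]))            # 3.0
print(predict_any(formula, [[1, 2], [3, 4]]))  # [3.0, 7.0]
```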

fit() accepts any Config field as a keyword argument:

eqhunt.fit(X, y, pop=800, trig_penalty=2.0, bloat_penalty=0.3)
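To show what "any Config field as a keyword argument" can look like, here is a minimal sketch assuming a plain Python dataclass whose field names and defaults mirror part of the Config reference below. The Config dataclass and make_config helper are hypothetical stand-ins, not eqhunt's implementation. Checking names against the declared fields means a misspelled keyword fails immediately instead of being silently ignored.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class Config:
    """Hypothetical stand-in for a subset of eqhunt.Config (documented defaults)."""
    pop: int = 400
    gen: int = 15000
    tournament_size: int = 4
    bloat_penalty: float = 0.1
    trig_penalty: float = 0.5
    accepted_error: float = 0.5
    verbose: bool = False


def make_config(**overrides) -> Config:
    """Apply keyword overrides to the defaults, rejecting unknown field names."""
    known = {f.name for f in fields(Config)}
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise TypeError(f"unknown Config field(s): {unknown}")
    return replace(Config(), **overrides)


cfg = make_config(pop=800, trig_penalty=2.0, bloat_penalty=0.3)
print(cfg.pop, cfg.trig_penalty, cfg.gen)  # 800 2.0 15000
```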

Fully configurable

import eqhunt

cfg = eqhunt.Config()
cfg.pop               = 800
cfg.gen               = 50000
cfg.tournament_size   = 5
cfg.initial_depth     = 5
cfg.bloat_penalty     = 0.3
cfg.trig_penalty      = 1.5
cfg.accepted_error    = 0.01

# Re-weight individual operators (higher = more likely to appear)
cfg.op_weights.sin = 1.0      # boost sine
cfg.op_weights.cos = 1.0
cfg.op_weights.exp = 0.0      # disable exp entirely
cfg.pi_prob = 0.10            # 'pi' more frequent in terminals

model = eqhunt.Model(cfg).fit(X, y)
print(model.formula)
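The op_weights knobs behave like sampling weights. The sketch below uses Python's random.choices, where each operator is drawn with probability proportional to its weight and a weight of 0.0 removes it from the pool. The weight values in OP_WEIGHTS and the name pick_operator are illustrative assumptions. eqhunt's actual default weights and sampling code are not documented here.

```python
import random

# Hypothetical weights; only their relative sizes matter.
OP_WEIGHTS = {
    "+": 1.0, "-": 1.0, "*": 1.0, "/": 1.0,
    "sin": 1.0, "cos": 1.0,  # boosted, as in the example above
    "exp": 0.0,              # disabled: a zero weight is never drawn
    "log": 0.05,             # low weight: drawn rarely
}


def pick_operator(weights: dict, rng: random.Random) -> str:
    """Draw one operator with probability proportional to its weight."""
    ops = [op for op, w in weights.items() if w > 0]
    return rng.choices(ops, weights=[weights[op] for op in ops], k=1)[0]


rng = random.Random(0)
counts: dict = {}
for _ in range(10_000):
    op = pick_operator(OP_WEIGHTS, rng)
    counts[op] = counts.get(op, 0) + 1
print(counts)  # "exp" never appears and "log" is rare
```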

You can also train from a CSV file (one row per sample, last column = target, lines starting with # are comments):

eqhunt.Model().fit_csv("nivel_embase.csv")
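If you prepare or check such a file yourself, the layout (one sample per row, target in the last column, # lines ignored) can be read with the standard csv module. This sketch assumes comma-separated numeric values. load_xy is a hypothetical helper, and eqhunt's own parser (for example its delimiter handling) is not documented here.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_xy(stream: TextIO) -> Tuple[List[List[float]], List[float]]:
    """All columns but the last are inputs; the last column is the target."""
    X: List[List[float]] = []
    y: List[float] = []
    # Skip blank lines and lines whose first non-space character is '#'.
    lines = (ln for ln in stream if ln.strip() and not ln.lstrip().startswith("#"))
    for row in csv.reader(lines):
        values = [float(v) for v in row]
        X.append(values[:-1])
        y.append(values[-1])
    return X, y


sample = io.StringIO("# x, y, target\n1,1,2\n2,3,5\n\n4,5,9\n")
X, y = load_xy(sample)
print(X, y)  # [[1.0, 1.0], [2.0, 3.0], [4.0, 5.0]] [2.0, 5.0, 9.0]
```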

Operators available

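| Category | Operators |
|---|---|
| Arithmetic | +, -, *, /, -x (negation) |
| Powers | sqrt, ** |
| Conditional | if(cond, then, else), taking the then branch when cond > 0 |
| Trigonometric | sin, cos, tan |
| Exponential / logarithmic | exp, log |
| Constants | numeric literals, pi |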

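Trigonometric, log and exp nodes have low default weights, so they are generated rarely and tend to show up in results only after many generations of mutation. That keeps them available for cyclic or physical data without cluttering formulas for data that doesn't need them. Adjust the weights via Config.op_weights.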
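The operators in the table above combine into expression trees. Here is a sketch of such a tree and a recursive evaluator in Python, assuming nested tuples as the node representation (eqhunt's C++ engine uses its own structures, which are not documented here). It uses math.pow for ** and the cond > 0 rule for if. Invalid evaluations raise ValueError, which a caller can convert into a per-sample penalty.

```python
import math
from typing import Sequence, Union

# Assumed node representation: a float constant, the string "pi", a variable
# name "x0", "x1", ..., or a tuple (operator, child, ...).
Node = Union[float, str, tuple]

UNARY = {
    "neg": lambda a: -a,  # -x
    "sqrt": math.sqrt,    # ValueError for a < 0
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "exp": math.exp,      # OverflowError for very large a
    "log": math.log,      # ValueError for a <= 0
}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,  # ZeroDivisionError for b == 0
    "**": math.pow,           # ValueError e.g. for a negative base with a fractional power
}


def evaluate(node: Node, row: Sequence[float]) -> float:
    """Evaluate a tree on one input row; raise ValueError on invalid math."""
    if isinstance(node, (int, float)):
        return float(node)
    if node == "pi":
        return math.pi
    if isinstance(node, str):  # variable "x0", "x1", ...
        return float(row[int(node[1:])])
    op, *kids = node
    if op == "if":  # if(cond, then, else): cond > 0 selects the then branch
        cond, then_branch, else_branch = kids
        chosen = then_branch if evaluate(cond, row) > 0 else else_branch
        return evaluate(chosen, row)
    try:
        args = [evaluate(kid, row) for kid in kids]
        fn = UNARY[op] if len(args) == 1 else BINARY[op]
        return float(fn(*args))
    except (ZeroDivisionError, OverflowError) as exc:
        raise ValueError(f"invalid evaluation of {op!r}: {exc}") from exc


print(evaluate(("+", "x0", "x1"), [6, 7]))                    # 13.0
print(evaluate(("if", ("-", "x0", 5.0), "x0", 0.0), [3, 0]))  # 0.0, since 3 - 5 <= 0
```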

How error and validity are handled

  • Per-sample error is |prediction - target|; total error is the sum.
  • Invalid evaluations (division by zero, sqrt of a negative number, log of a non-positive number, exp overflow) incur a soft per-sample penalty instead of disqualifying the whole formula, so a single out-of-domain sample does not eliminate an otherwise good candidate. A formula is rejected only if more than 25% of samples fail.
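The scoring rule above can be sketched in a few lines. Absolute errors are summed over valid samples, each invalid sample adds a fixed soft penalty, and the candidate is rejected (scored as infinity) when more than 25% of samples fail. The penalty value INVALID_PENALTY = 10.0 and the helper name total_error are assumptions; the README does not state the actual penalty.

```python
import math
from typing import Callable, Sequence

INVALID_PENALTY = 10.0    # hypothetical per-sample penalty; not documented
MAX_FAIL_FRACTION = 0.25  # from the README: more than 25% failures means rejection


def total_error(predict: Callable[[Sequence[float]], float],
                X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Sum of |prediction - target|, with a soft penalty for invalid samples."""
    err, failures = 0.0, 0
    for row, target in zip(X, y):
        try:
            pred = predict(row)
            if not math.isfinite(pred):
                raise ValueError("non-finite result")
            err += abs(pred - target)
        except (ValueError, ZeroDivisionError, OverflowError):
            failures += 1
            err += INVALID_PENALTY
    if failures > MAX_FAIL_FRACTION * len(y):
        return math.inf  # too many out-of-domain samples: reject the formula
    return err


X = [[1, 1], [2, 3], [4, 5], [7, 2], [9, 9]]
y = [2, 5, 9, 9, 18]
print(total_error(lambda r: r[0] + r[1], X, y))                            # 0.0
print(total_error(lambda r: r[0] + r[1] + 0 * (1 / (r[0] - 1)), X, y))     # 10.0 (1 of 5 fails)
print(total_error(lambda r: r[0] + r[1] + 0 * math.sqrt(r[0] - 3), X, y))  # inf (2 of 5 fail)
```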

Stopping early

Config.accepted_error stops the search as soon as total error drops below the threshold. You can also call model.stop() from another thread (or a signal handler) to ask the loop to wrap up after the current generation.
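Both stop conditions fit a common pattern: a cooperative flag checked between generations. The sketch below shows that pattern with threading.Event. Search is a hypothetical toy class, not eqhunt's internal code, and run_one_generation stands in for one evolution step that returns the best total error so far.

```python
import threading
import time
from typing import Callable


class Search:
    """Toy evolution loop with early stopping (hypothetical; not eqhunt's code)."""

    def __init__(self, max_gen: int, accepted_error: float) -> None:
        self.max_gen = max_gen
        self.accepted_error = accepted_error
        self._stop_requested = threading.Event()

    def stop(self) -> None:
        """Request a stop; safe to call from another thread."""
        self._stop_requested.set()

    def run(self, run_one_generation: Callable[[int], float]) -> int:
        """Run until a stop condition holds; return the generations executed."""
        gen = 0
        while gen < self.max_gen:
            best_error = run_one_generation(gen)
            gen += 1
            if best_error < self.accepted_error:
                break  # accepted_error reached
            if self._stop_requested.is_set():
                break  # stop() was called: wrap up after this generation
        return gen


# Error shrinks as 10 / (gen + 1); the first value below 0.5 occurs at gen 20.
print(Search(15000, 0.5).run(lambda g: 10.0 / (g + 1)))  # 21


def slow_generation(gen: int) -> float:
    time.sleep(0.001)  # pretend each generation takes about 1 ms
    return 1.0         # never reaches accepted_error=0.0


search = Search(15000, 0.0)
threading.Timer(0.05, search.stop).start()  # request a stop from another thread
print(search.run(slow_generation))          # far fewer than 15000 generations
```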

Config reference

| Field | Default | Meaning |
|---|---|---|
| pop | 400 | Population size |
| gen | 15000 | Maximum number of generations |
| tournament_size | 4 | Number of individuals drawn per tournament selection |
| crossover_prob | 0.7 | Crossover probability per pair of parents |
| mutation_prob | 0.25 | Mutation probability per offspring |
| initial_depth | 4 | Tree depth used to seed the initial population |
| mutation_depth | 3 | Depth of subtrees generated by mutation |
| const_min / const_max | -9 / 9 | Range for random numeric terminals |
| pi_prob | 0.01 | Probability that a terminal is pi |
| bloat_penalty | 0.1 | Per-node penalty (favours smaller trees) |
| trig_penalty | 0.5 | Extra penalty per sin/cos/tan/log/exp node |
| immigrant_rate | 0.05 | Fraction of the population replaced by random individuals each generation |
| weak_parent_rate | 0.2 | Probability that the second parent is chosen at random instead of by tournament |
| accepted_error | 0.5 | Stop training once total error falls below this value |
| verbose | False* | Print the best-so-far formula each time it improves |
| simplify | True | Run algebraic simplification on the final tree |
| simplify_interval | 500 | How often to simplify the top simplify_top_n members during training |
| simplify_top_n | 10 | How many top members to simplify at each interval |

*C++ default is True; the Python fit() helper defaults to False.
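Several of these fields interact during selection. The sketch below shows one plausible reading. A candidate's score is its total error plus bloat_penalty per node plus trig_penalty per sin/cos/tan/log/exp node; this additive form is an assumption the README does not confirm. A parent is the lowest-scoring of tournament_size random picks, except that with probability weak_parent_rate the second parent is drawn uniformly at random. Candidate, score, tournament and pick_parents are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

TRANSCENDENTAL = {"sin", "cos", "tan", "log", "exp"}


@dataclass
class Candidate:
    error: float     # total absolute error on the training data
    ops: List[str]   # labels of every node in the tree (operators and terminals)


def score(c: Candidate, bloat_penalty: float = 0.1, trig_penalty: float = 0.5) -> float:
    """Lower is better; the additive penalty form is an assumption."""
    n_trig = sum(op in TRANSCENDENTAL for op in c.ops)
    return c.error + bloat_penalty * len(c.ops) + trig_penalty * n_trig


def tournament(pop: List[Candidate], k: int, rng: random.Random) -> Candidate:
    """Draw k distinct individuals and return the best-scoring one."""
    return min(rng.sample(pop, k), key=score)


def pick_parents(pop: List[Candidate], k: int, weak_parent_rate: float,
                 rng: random.Random) -> Tuple[Candidate, Candidate]:
    first = tournament(pop, k, rng)
    # With probability weak_parent_rate the second parent skips the tournament.
    if rng.random() < weak_parent_rate:
        second = rng.choice(pop)
    else:
        second = tournament(pop, k, rng)
    return first, second


exact = Candidate(error=0.0, ops=["+", "x", "y"])          # score 0.3
wiggly = Candidate(error=0.0, ops=["+", "x", "sin", "y"])  # score 0.4 + 0.5 = 0.9
poor = Candidate(error=5.0, ops=["x"])                     # score 5.1
pop = [exact, wiggly, poor]
print(tournament(pop, 3, random.Random(1)) is exact)  # True: k equals the population size
```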

Building from source

git clone https://github.com/uid0/eqhunt
cd eqhunt
pip install -e .
pytest

Requires Python 3.8+, a C++17 compiler, CMake 3.15+.

License

MIT.


Source distribution: eqhunt-0.0.1.tar.gz (17.6 kB)
