mini-sisso

mini-sisso is a lightweight and user-friendly Python implementation of the SISSO (Sure Independence Screening and Sparsifying Operator) symbolic regression algorithm. It offers full compatibility with the scikit-learn ecosystem for discovering interpretable mathematical models from data.

Inheriting the advanced exploration capabilities of the original C++/Fortran-based implementation, mini-sisso provides these features in a more modern and accessible package:

  • 🚀 Easy Adoption: Simple pip install. The default CPU version has minimal dependencies (NumPy/SciPy), ensuring a hassle-free setup.
  • 🧠 Memory Efficiency & Fast Exploration:
    • A "recipe-based" architecture dramatically reduces memory consumption during Feature Expansion.
    • The "Level-wise SIS" feature (toggleable) speeds up exploration by pruning unpromising features early.
  • 🤝 Full scikit-learn Compatibility: Seamlessly integrates with powerful tools like GridSearchCV and Pipeline, in addition to the standard fit()/predict() interface.
  • ⚡ Optional GPU Support: Achieve significant speedups with GPU acceleration by installing the optional PyTorch backend.
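To make the "recipe-based" idea concrete, here is a minimal sketch (not mini-sisso's actual internals, and the `evaluate` helper is hypothetical): a recipe stores *how* to build a derived feature as a small operator tree, and the feature values are materialized only when needed, instead of holding every expanded feature array in memory at once.

```python
import numpy as np

def evaluate(recipe, data):
    """Recursively evaluate a recipe tree against a dict of base feature arrays."""
    if isinstance(recipe, str):          # leaf: the name of a base feature
        return data[recipe]
    op, *args = recipe                   # node: (operator, child_recipe, ...)
    vals = [evaluate(a, data) for a in args]
    if op == "sin":
        return np.sin(vals[0])
    if op == "pow2":
        return vals[0] ** 2
    if op == "+":
        return vals[0] + vals[1]
    raise ValueError(f"unknown operator {op!r}")

data = {"feature_A": np.array([0.0, np.pi / 2]),
        "feature_B": np.array([1.0, 2.0])}

# The recipe below encodes sin(feature_A) + feature_B^2 without ever
# storing the intermediate feature columns.
recipe = ("+", ("sin", "feature_A"), ("pow2", "feature_B"))
print(evaluate(recipe, data))  # [1. 5.]
```

Only the handful of recipes that survive screening ever need to be evaluated in full, which is why this style of bookkeeping keeps memory use low during feature expansion.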

📥 Installation

CPU Version (Default, Recommended)

Installs the lightweight CPU version from PyPI, which depends only on NumPy/SciPy.

pip install mini-sisso

GPU Version (Optional)

To enable GPU acceleration with the PyTorch backend, install with the [gpu] option.

pip install "mini-sisso[gpu]"

🚀 Quick Start

Discover a mathematical model from your data in just a few lines of code.

import pandas as pd
import numpy as np
from mini_sisso.model import MiniSisso

# 1. Prepare Data
np.random.seed(42) # Set seed for reproducibility
# Create feature data (X)
X_df = pd.DataFrame(np.random.rand(100, 2), columns=["feature_A", "feature_B"])
# Create target data (y) from a true equation: y = 2*sin(feature_A) + feature_B^2 + noise
y_series = pd.Series(2 * np.sin(X_df["feature_A"]) + X_df["feature_B"]**2 + np.random.randn(100) * 0.1)

# 2. Instantiate the Model
# You can set all hyperparameters. Comment out or use defaults for those you don't need.
model = MiniSisso(
    # --- Control the search space ---
    n_expansion=2,          # Depth of feature expansion (higher value finds more complex equations but takes longer)
    operators=["+", "sin", "pow2"], # List of operators for feature expansion
    
    # --- Control model complexity ---
    n_term=2,               # Max number of terms in the equation (for 'exhaustive' method)
    
    # --- Select the search strategy ---
    so_method="exhaustive", # Model search strategy ('exhaustive' or 'lasso')
    # alpha=0.01,           # Regularization parameter for so_method='lasso'
    
    # --- Control computational efficiency ---
    use_levelwise_sis=True, # Use staged feature pruning for speed (strongly recommended)
    k_per_level=50,         # If use_levelwise_sis=True, number of promising features to keep at each level
    k=10,                   # Number of feature candidates for each term in the final model
    
    # --- Select the execution environment ---
    # device="cuda",          # Specify 'cuda' to use GPU (requires PyTorch)
)

# 3. Fit the Model
# Uses the same fit(X, y) interface as scikit-learn
model.fit(X_df, y_series)

# 4. Check the Results
# Access fitted attributes (ending with an underscore)
print(f"Discovered Equation: {model.equation_}")
print(f"Training RMSE: {model.rmse_:.4f}")
print(f"Training R2 Score: {model.r2_:.4f}")

# 5. Make Predictions
# Uses the same predict(X) interface as scikit-learn
X_test_df = pd.DataFrame([[0.5, 1.0], [1.0, 2.0]], columns=["feature_A", "feature_B"])  # illustrative values
predictions = model.predict(X_test_df)
print(f"\nPredictions for new data: {predictions}")

Example Output:

Using NumPy/SciPy backend for CPU execution.
*** Starting Level-wise Recipe Generation (Level-wise SIS: ON, k_per_level=50) ***
... (training logs) ...
Best Model Found (2 terms):
  RMSE: 0.092124
  R2:   0.998806
  Equation: +0.998492 * ^2(feature_B) +1.971237 * sin(feature_A) +0.030610

Discovered Equation: +0.998492 * ^2(feature_B) +1.971237 * sin(feature_A) +0.030610
Training RMSE: 0.0921
Training R2 Score: 0.9988

Predictions for new data: [2.0016012 5.6796584]

🛠️ Usage Guide

use_levelwise_sis: Toggling the Feature Generation Strategy

This parameter toggles the "Level-wise SIS" feature, which is key to the high performance of mini-sisso.

True (Default)

Performs feature expansion level by level, with a screening (SIS) step immediately after each level. Only promising features are used to generate the next level, significantly reducing computation time and memory usage. This is the recommended setting.

# k_per_level controls how many features are kept at each level
model_fast = MiniSisso(use_levelwise_sis=True, k_per_level=100)

False

Generates all possible features (recipes) for all expansion levels at once before proceeding to the final SIS/SO step.

  • Pros: Explores a wider feature space, potentially finding unexpected feature combinations.
  • Cons: Memory usage and computation time increase exponentially. There is a high risk of MemoryError for larger n_expansion or a greater number of base features.
# It is recommended to set n_expansion to a small value
model_full_search = MiniSisso(use_levelwise_sis=False, n_expansion=2)

so_method: Selecting the Model Search Strategy

exhaustive (Default)

An exhaustive search that tests every possible combination of candidate features. It's more likely to find the optimal, interpretable model but can be slow. The number of terms is specified with n_term.

# Exhaustively search for models up to 3 terms
model_exhaustive = MiniSisso(
    so_method="exhaustive", 
    n_term=3,
    operators=["+", "-", "*", "sqrt"]
)

lasso

Uses Lasso regression to quickly select important features from a large pool of candidates. It's extremely fast and effective for large search spaces. The regularization strength is controlled by alpha.

# Use Lasso to select features quickly
# A smaller alpha tends to select more features
model_lasso = MiniSisso(
    so_method="lasso",
    alpha=0.01,
    operators=["+", "-", "*", "/", "sin", "cos", "exp", "log", "pow2", "pow3"]
)

Available Operators

Specify as a list of strings in the operators argument.

  • '+': Addition (a + b)
  • '-': Subtraction (a - b)
  • '*': Multiplication (a * b)
  • '/': Division (a / b)
  • 'sin': Sine (sin(a))
  • 'cos': Cosine (cos(a))
  • 'exp': Exponential (e^a)
  • 'log': Natural Log (ln(a))
  • 'sqrt': Square Root (sqrt(a))
  • 'pow2': Square (a^2)
  • 'pow3': Cube (a^3)
  • 'inv': Inverse (1/a)
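To see how the operator list drives the size of the search space, here is an illustrative sketch of a single expansion level (again, not mini-sisso's internals): each unary operator is applied to every feature, and each binary operator to every feature pair.

```python
import numpy as np
from itertools import combinations

unary = {"sin": np.sin, "pow2": np.square}
binary = {"+": np.add, "*": np.multiply}

features = {"feature_A": np.array([0.5, 1.0]),
            "feature_B": np.array([2.0, 3.0])}

expanded = {}
for name, f in features.items():
    for op, fn in unary.items():
        expanded[f"{op}({name})"] = fn(f)          # e.g. sin(feature_A)
for (na, fa), (nb, fb) in combinations(features.items(), 2):
    for op, fn in binary.items():
        expanded[f"({na} {op} {nb})"] = fn(fa, fb)  # e.g. (feature_A + feature_B)

# 2 features x 2 unary ops + 1 pair x 2 binary ops = 6 new candidates
print(sorted(expanded))
```

Each extra operator multiplies the candidate count at every level, which is why trimming the operators list (or using Level-wise SIS) matters for larger n_expansion.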

🤝 scikit-learn Ecosystem Integration

As a fully compliant scikit-learn estimator, mini-sisso works seamlessly with the entire ecosystem.

Pipeline for Preprocessing

from sklearn.pipeline import Pipeline
from mini_sisso.model import MiniSisso

# Note: MiniSisso can be sensitive to feature scaling, and transforms such as
# StandardScaler obscure the units of the discovered equation, so they are
# often better left out.
pipeline = Pipeline([
    # ('scaler', StandardScaler()),
    ('sisso', MiniSisso(n_expansion=2, n_term=2, operators=["+", "sin", "pow2"]))
])

pipeline.fit(X_df, y_series)
predictions = pipeline.predict(X_df)

GridSearchCV for Hyperparameter Tuning

Automatically find the best hyperparameters like n_term, k (number of SIS candidates), or alpha.

from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_term': [1, 2, 3],   # example values
    'k': [10, 20, 50],     # example values
}

grid_search = GridSearchCV(
    MiniSisso(operators=["+", "sin", "pow2"]),
    param_grid, cv=3, scoring='neg_root_mean_squared_error', n_jobs=-1
)
grid_search.fit(X_df, y_series)

print(f"Best Hyperparameters: {grid_search.best_params_}")
print(f"Best Model Equation: {grid_search.best_estimator_.equation_}")

⚙️ API Reference

MiniSisso

class MiniSisso(BaseEstimator, RegressorMixin):
    def __init__(self, n_expansion: int = 2, n_term: int = 2, k: int = 10, 
                 k_per_level: int = 50, use_levelwise_sis: bool = True,
                 operators: list = None, so_method: str = "exhaustive", alpha: float = 0.01,
                 device: str = "cpu"):

Parameters

  • n_expansion (int, default=2): The maximum level of feature expansion.
  • n_term (int, default=2): The maximum number of terms in the final model (for exhaustive search).
  • k (int, default=10): The number of promising features to select in each iteration of the SIS step.
  • k_per_level (int, default=50): If use_levelwise_sis=True, this is the number of promising recipes to carry over to the next expansion level.
  • use_levelwise_sis (bool, default=True): Toggles the level-wise SIS feature.
  • device (str, default="cpu"): The computing device to use ('cpu' or 'cuda').
  • operators (list[str], default=None): A list of operator names to use for feature expansion (see Available Operators); a list must be supplied in practice.
  • so_method (str, default="exhaustive"): The model search strategy. Can be "exhaustive" or "lasso".
  • alpha (float, default=0.01): The regularization parameter used when so_method="lasso".

fit(X, y)

Fits the model to the training data.

Parameters

  • X (array-like or pd.DataFrame): The feature data, shape (n_samples, n_features).
  • y (array-like or pd.Series): The target variable data, shape (n_samples,).

Returns

  • self: The fitted MiniSisso instance.

predict(X)

Makes predictions using the fitted model.

Parameters

  • X (array-like or pd.DataFrame): The data to make predictions on.

Returns

  • np.ndarray: A NumPy array of the predictions.

score(X, y)

Returns the coefficient of determination (R² score) of the prediction.

Parameters

  • X (array-like or pd.DataFrame): The feature data.
  • y (array-like or pd.Series): The true target variable data.

Returns

  • float: The R² score.
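The value returned by score(X, y) is the standard coefficient of determination. As a minimal sketch with hypothetical numbers, it can be computed directly from the predictions:

```python
import numpy as np

# R^2 = 1 - sum((y - y_pred)^2) / sum((y - mean(y))^2)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

ss_res = np.sum((y_true - y_pred) ** 2)       # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.98
```

A perfect model scores 1.0; a model no better than predicting the mean scores 0.0 (and worse models can go negative), matching scikit-learn's convention.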

Fitted Attributes

After calling fit(), you can access the following attributes:

  • model.equation_ (str): The best mathematical model found.
  • model.rmse_ (float): The RMSE of the best model on the training data.
  • model.r2_ (float): The R2 score of the best model on the training data.
  • model.coef_ (np.ndarray): The coefficients for each term in the best model.
  • model.intercept_ (float): The intercept of the best model.

📜 License

This project is licensed under the MIT License.

🙏 Acknowledgements

This library was greatly inspired by the original SISSO algorithm paper and is built upon the fantastic open-source projects NumPy, SciPy, Pandas, scikit-learn, and PyTorch.
