High-performance SISSO implementation with Rust backend.

mini-sisso

mini-sisso is a lightweight and user-friendly Python implementation of the SISSO (Sure Independence Screening and Sparsifying Operator) symbolic regression algorithm. It offers full compatibility with the scikit-learn ecosystem for discovering interpretable mathematical models from data.

Inheriting the advanced exploration capabilities of the original C++/Fortran-based implementation, mini-sisso provides these features in a more modern and accessible package, powered by a blazing-fast Rust backend:

🚀 Easy Adoption: Simple pip install. The default CPU version has minimal dependencies (NumPy/SciPy), ensuring a hassle-free setup.
🦀 High-Performance Rust Backend:
- The computationally intensive exhaustive search is implemented in Rust, delivering performance far superior to pure Python implementations while remaining completely transparent to the user.
- Gram Matrix Pre-calculation: The Ordinary Least Squares (OLS) fit is reformulated around a precomputed Gram matrix, so the cost per candidate model inside the search loop no longer depends on the sample size $N$ ($O(N k^2) \to O(k^3)$ for a $k$-term model). The result is mathematically equivalent to a direct OLS fit, so the speedup comes without loss of accuracy.
🧠 Memory Efficiency & Fast Exploration:
- A "recipe-based" architecture dramatically reduces memory consumption during Feature Expansion.
- The "Level-wise SIS" feature (toggleable) speeds up exploration by pruning unpromising features early.
⚖️ Balanced Feature Selection:
- Implements a "Split Selection" strategy during SIS. This ensures that unary operators (like sin, exp) are not "crowded out" by the overwhelming number of binary combinations, preserving feature diversity.
🤝 Full scikit-learn Compatibility: Seamlessly integrates with powerful tools like GridSearchCV and Pipeline, in addition to the standard fit()/predict() interface.
⚡ Optional GPU Support: Achieve significant speedups with GPU acceleration by installing the optional PyTorch backend.
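
The Gram-matrix idea listed above can be illustrated with a short NumPy sketch. This is not mini-sisso's Rust code; the function names `precompute_gram` and `best_subset` are hypothetical and chosen only for this example. The quantities that depend on $N$ (the Gram matrix $A^\top A$, the vector $A^\top y$, and $y^\top y$) are computed once, and each candidate model then needs only a small linear solve.

```python
import itertools
import numpy as np

def precompute_gram(X, y):
    """Append an intercept column and compute the N-dependent quantities once."""
    A = np.column_stack([X, np.ones(len(y))])
    return A.T @ A, A.T @ y, float(y @ y)

def best_subset(X, y, n_term):
    """Score every n_term-feature linear model using only the precomputed Gram matrix."""
    G, b, yy = precompute_gram(X, y)
    n_samples, n_features = X.shape
    best_rmse, best_combo, best_w = np.inf, None, None
    for combo in itertools.combinations(range(n_features), n_term):
        idx = list(combo) + [n_features]        # chosen features plus the intercept
        G_s, b_s = G[np.ix_(idx, idx)], b[idx]  # small (k+1) x (k+1) system
        try:
            w = np.linalg.solve(G_s, b_s)       # normal equations: G_s w = b_s
        except np.linalg.LinAlgError:
            continue                            # skip singular (collinear) subsets
        sse = yy - b_s @ w                      # residual sum of squares at the OLS solution
        rmse = np.sqrt(max(sse, 0.0) / n_samples)
        if rmse < best_rmse:
            best_rmse, best_combo, best_w = rmse, combo, w
    return best_rmse, best_combo, best_w

# Synthetic, noise-free example: y depends on columns 1 and 4 only.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.5
rmse, combo, w = best_subset(X, y, n_term=2)
print(combo, np.round(w, 3))
```

The identity used for the residual, $\lVert y - Aw \rVert^2 = y^\top y - b^\top w$ when $Gw = b$, is what removes the sample dimension from the loop. Solving the normal equations directly can be less stable than a QR-based fit when features are nearly collinear, which is one reason the sketch skips singular subsets.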

📥 Installation

CPU Version (Default, Recommended)

Installs the lightweight CPU version from PyPI. It includes the optimized Rust backend.

pip install mini-sisso

GPU Version (Optional)

To enable GPU acceleration with the PyTorch backend, install with the [gpu] option.

pip install "mini-sisso[gpu]"

🚀 Quick Start

Discover a mathematical model from your data in just a few lines of code.

import pandas as pd
import numpy as np
from mini_sisso.model import MiniSisso

# 1. Prepare data
np.random.seed(42) # Set seed for reproducibility
X_df = pd.DataFrame(np.random.rand(100, 2) * [2, 3], columns=["feature_A", "feature_B"])
# True equation: y = 2*sin(feature_A) + feature_B^2 + noise
y_series = pd.Series(2 * np.sin(X_df["feature_A"]) + X_df["feature_B"]**2 + np.random.randn(100) * 0.1)

# 2. Instantiate the Model (Full Hyperparameter List)
# Uncomment the parameters you need to change.
model = MiniSisso(
    # --- Control the fundamental search space ---
    n_expansion=2,                      # Depth of feature expansion (deeper finds more complex equations)
    operators=["+", "sin", "pow2"],     # List of operators for feature expansion
    
    # --- Select the main search strategy ---
    so_method="exhaustive",             # Model search strategy ('exhaustive', 'lasso', 'lightgbm')
    
    # --- Detailed settings for each strategy (selection_params) ---
    selection_params={
        # -- Parameters for "exhaustive" method --
        'n_term': 2,                    # Maximum number of terms in the discovered equation
        'n_sis_features': 10,           # Number of SIS candidates for each term
        
        # -- Parameters for "lasso" method --
        # 'alpha': 0.01,                # Regularization strength for Lasso
        
        # -- Parameters for "lightgbm" method --
        # 'n_features_to_select': 20,   # Number of features to select with LightGBM
        # 'lightgbm_params': {'n_estimators': 100, 'random_state': 42}, # Parameters for the LightGBM model itself
        
        # -- Optional preprocessing filters for "lasso"/"lightgbm" --
        # 'n_global_sis_features': 200, # Number of candidates to pre-screen based on correlation with target
        # 'collinearity_filter': 'mi',  # Method to calculate correlation between candidates ('mi' or 'dcor')
        # 'collinearity_threshold': 0.9, # Correlation threshold for the above filter
    },
    
    # --- Control computational efficiency ---
    use_levelwise_sis=True,             # Use staged search for speed (strongly recommended)
    n_level_sis_features=50,            # Number of promising features to keep at each expansion level

    # --- Feature Selection Strategy ---
    use_split_selection=True,           # Balance selection between unary/binary operators (Recommended: True)
    
    # --- Select the execution environment ---
    # device="cuda",                      # Specify 'cuda' to use GPU
)

# 3. Fit the model
model.fit(X_df, y_series)

# 4. Check the results
print("\n--- Fit Results ---")
print(f"Discovered Equation: {model.equation_}")
print(f"Training RMSE: {model.rmse_:.4f}")
print(f"Training R2 Score: {model.r2_:.4f}")

# 5. Make predictions
print("\n--- Predictions ---")
X_test_df = pd.DataFrame(np.array([[0.5, 1.0], [1.0, 2.0]]), columns=["feature_A", "feature_B"])
predictions = model.predict(X_test_df)
print(f"Predictions for new data ([0.5, 1.0], [1.0, 2.0]): {predictions}")

Example Output:

Using NumPy/SciPy backend for CPU execution.
*** Starting Level-wise Recipe Generation (Level-wise SIS: ON, k_per_level=50) ***
Level 1: Generated 5, selected top 5. Total promising: 7. Time: 0.00s
Level 2: Generated 30, selected top 30. Total promising: 37. Time: 0.00s
***************** Starting SISSO Regressor (NumPy/SciPy Backend, Method: exhaustive) *****************

===== Searching for 1-term models =====
...
===== Searching for 2-term models =====
...
Best 2-term model: RMSE=0.092124, Eq: +0.998492 * ^2(feature_B) +1.971237 * sin(feature_A) +0.030610
Time: 0.01 seconds

==================================================
SISSO fitting finished. Total time: 0.02s
==================================================

Best Model Found (2 terms):
  RMSE: 0.092124
  R2:   0.998806
  Equation: +0.998492 * ^2(feature_B) +1.971237 * sin(feature_A) +0.030610

--- Fit Results ---
Discovered Equation: +0.998492 * ^2(feature_B) +1.971237 * sin(feature_A) +0.030610
Training RMSE: 0.0921
Training R2 Score: 0.9988

--- Predictions ---
Predictions for new data ([0.5, 1.0], [1.0, 2.0]): [2.0016012 5.6796584]

🛠️ Usage Guide: Controlling the Search with Hyperparameters

The mini-sisso search process follows this workflow, with each step controlled by hyperparameters.

Workflow Overview

1. Feature Expansion: Generates a large number of candidate features based on operators and n_expansion.
   - This process is made efficient by use_levelwise_sis=True and n_level_sis_features.
2. [Optional] Preprocessing Filters: A set of filters that prune candidate features when using lasso or lightgbm (configured in selection_params).
   - Global SIS: Removes features with low correlation to the target y.
   - Collinearity Filter: Removes features that are highly correlated with one another.
3. Model Search (Sparsifying Operator): The final model is discovered from the pruned candidates using the strategy specified by so_method.
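
To make the screening step concrete, here is a minimal sketch of correlation-based SIS. It assumes the score is the absolute Pearson correlation with the target, which is the usual choice for Sure Independence Screening but is an assumption about mini-sisso's internals. The function name `sis_screen` is hypothetical.

```python
import numpy as np

def sis_screen(features, y, n_keep):
    """Return the indices of the n_keep columns with the highest |correlation| to y."""
    Xc = features - features.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns have zero norm; score them 0 instead of dividing by zero.
    scores = np.divide(np.abs(Xc.T @ yc), denom,
                       out=np.zeros_like(denom), where=denom > 0)
    order = np.argsort(-scores, kind="stable")
    return order[:n_keep], scores

# Synthetic example: column 2 drives the target, column 4 is constant.
rng = np.random.default_rng(1)
F = rng.normal(size=(300, 5))
F[:, 4] = 1.0
y = 3.0 * F[:, 2] + 0.1 * rng.normal(size=300)
kept, scores = sis_screen(F, y, n_keep=2)
print(kept, np.round(scores, 3))
```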

Main Hyperparameters

`so_method`: The Three Model Search Strategies

The so_method parameter determines the core search approach.

1. `so_method="exhaustive"` (Default)

The classic SISSO approach. It uses iterative SIS and an exhaustive search powered by Rust to find the optimal model. Best for finding simple, interpretable models.

# Exhaustively search for models up to 3 terms
model = MiniSisso(
    so_method="exhaustive",
    selection_params={
        'n_term': 3,          # Max number of terms to search for
        'n_sis_features': 15  # Number of candidates to add to the pool at each SIS step
    }
)
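
For readers unfamiliar with the classic procedure, the sketch below shows how iterative SIS and exhaustive search fit together in plain NumPy. It follows the scheme described in the original SISSO paper (screen against the residual of the previous best model, grow the pool, then try every combination); whether mini-sisso matches it step for step is an assumption. The names `abs_corr` and `iterative_sisso` are hypothetical.

```python
import itertools
import numpy as np

def abs_corr(F, target):
    """Absolute Pearson correlation of every column of F with target."""
    Fc = F - F.mean(axis=0)
    tc = target - target.mean()
    denom = np.linalg.norm(Fc, axis=0) * np.linalg.norm(tc)
    return np.divide(np.abs(Fc.T @ tc), denom,
                     out=np.zeros_like(denom), where=denom > 0)

def iterative_sisso(F, y, n_term, n_sis_features):
    """Grow a candidate pool by SIS on the residual, then search the pool exhaustively."""
    pool, residual, best = [], y.copy(), None
    for dim in range(1, n_term + 1):
        scores = abs_corr(F, residual)
        if pool:
            scores[pool] = -1.0                  # never re-select pooled features
        pool.extend(int(j) for j in np.argsort(-scores)[:n_sis_features])
        best_rmse = np.inf
        for combo in itertools.combinations(pool, dim):
            A = np.column_stack([F[:, list(combo)], np.ones(len(y))])
            w, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ w
            rmse = float(np.sqrt(np.mean(resid ** 2)))
            if rmse < best_rmse:                 # keep the best model of this dimension
                best_rmse, best, residual = rmse, (combo, w), resid
    return best, best_rmse

# Synthetic example: the target depends on candidate features 3 and 6.
rng = np.random.default_rng(2)
F = rng.normal(size=(200, 8))
y = 2.0 * F[:, 3] - 1.5 * F[:, 6] + 0.3
(combo, w), rmse = iterative_sisso(F, y, n_term=2, n_sis_features=3)
print(combo, np.round(w, 3))
```

In this scheme the pool holds at most n_term × n_sis_features candidates, so the number of combinations tried grows quickly with both settings.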

2. `so_method="lasso"`

Uses Lasso regression as a feature selector to build a model quickly. Effective for large feature spaces.

# Select features using Lasso
model = MiniSisso(
    so_method="lasso",
    selection_params={
        'alpha': 0.01 # Regularization parameter for Lasso
    }
)

3. `so_method="lightgbm"`

Uses LightGBM as a feature selector. Excels at capturing non-linear relationships.

# Select top 20 features using LightGBM
model = MiniSisso(
    so_method="lightgbm",
    selection_params={
        'n_features_to_select': 20
    }
)

`selection_params`: Detailed Control for Each Strategy

The selection_params dictionary allows you to apply preprocessing filters and fine-tune each so_method.

Preprocessing Filters (for `lasso`/`lightgbm`)

n_global_sis_features: Pre-screens candidates by removing those with low correlation to the target y.
collinearity_filter: Removes highly correlated features to stabilize Lasso/LightGBM. Can be 'mi' (Mutual Information) or 'dcor' (Distance Correlation).

# Before running LightGBM, pre-screen to the top 200 features,
# then remove pairs with an MI score > 0.9
model = MiniSisso(
    so_method='lightgbm',
    selection_params={
        'n_global_sis_features': 200,
        'collinearity_filter': 'mi',
        'collinearity_threshold': 0.9,
        'n_features_to_select': 20
    }
)
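
The collinearity filter can be pictured as a greedy pass over the candidates, best first. The sketch below uses absolute Pearson correlation as a simple stand-in for the 'mi' and 'dcor' measures, so it illustrates the thresholding logic only and is not the library's implementation. The function name `collinearity_filter` is hypothetical.

```python
import numpy as np

def collinearity_filter(features, order, threshold=0.9):
    """Walk `order` (most relevant first) and drop any feature whose similarity
    to an already kept feature exceeds `threshold`."""
    sim = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in order:
        if all(sim[j, k] <= threshold for k in kept):
            kept.append(int(j))
    return kept

# Synthetic example: column 1 is a near-copy of column 0, column 2 is independent.
rng = np.random.default_rng(3)
a = rng.normal(size=500)
b = rng.normal(size=500)
F = np.column_stack([a, a + 0.01 * rng.normal(size=500), b])
kept = collinearity_filter(F, order=[0, 1, 2], threshold=0.9)
print(kept)
```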

Expert Settings (for `lightgbm`)

You can also directly specify the internal hyperparameters for lightgbm.

model = MiniSisso(
    so_method='lightgbm',
    selection_params={
        'n_features_to_select': 20,
        'lightgbm_params': {
            'n_estimators': 200,         # Number of trees
            'num_leaves': 31,            # Max number of leaves in one tree
            'learning_rate': 0.05,       # Learning rate
            'colsample_bytree': 0.8,     # Fraction of features to be considered for each tree
            'subsample': 0.8,            # Fraction of data to be used for each tree
            'reg_alpha': 0.1,            # L1 regularization
            'reg_lambda': 0.1,           # L2 regularization
            'random_state': 42,
            'n_jobs': -1,
            'verbosity': -1,
        }
    }
)

Other Key Parameters

use_levelwise_sis (bool, default=True): Strongly recommended. Speeds up feature generation and saves memory.
n_level_sis_features (int, default=50): Number of features to keep at each stage when use_levelwise_sis=True.
use_split_selection (bool, default=True): If True, ensures a balanced selection of unary and binary operators during SIS to prevent "crowding out." Applicable to all methods (exhaustive, lasso, lightgbm).
device (str, default="cpu"): Set to "cuda" to use the GPU backend.
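
The "crowding out" effect that use_split_selection addresses can be shown with a toy example. The sketch below assumes an even split of the budget between unary-derived and binary-derived candidates; the actual ratio used by mini-sisso is not stated here, and the function name `split_select` is hypothetical.

```python
import numpy as np

def split_select(scores, is_unary, n_keep):
    """Pick top-scoring candidates from the unary and binary pools separately."""
    scores = np.asarray(scores, dtype=float)
    is_unary = np.asarray(is_unary, dtype=bool)
    unary_idx = np.flatnonzero(is_unary)
    binary_idx = np.flatnonzero(~is_unary)
    n_unary = min(len(unary_idx), n_keep // 2)          # assumed 50/50 budget
    n_binary = min(len(binary_idx), n_keep - n_unary)
    top_unary = unary_idx[np.argsort(-scores[unary_idx], kind="stable")[:n_unary]]
    top_binary = binary_idx[np.argsort(-scores[binary_idx], kind="stable")[:n_binary]]
    return np.concatenate([top_unary, top_binary])

# Two unary candidates with modest scores, four binary candidates with high scores.
scores = [0.2, 0.3, 0.9, 0.8, 0.7, 0.6]
is_unary = [True, True, False, False, False, False]

plain_top = np.argsort(-np.asarray(scores), kind="stable")[:4]  # a single global ranking
selected = split_select(scores, is_unary, n_keep=4)
print(sorted(plain_top.tolist()), sorted(selected.tolist()))
```

With a single global ranking, all four slots go to binary candidates; with the split, both unary candidates survive to the next stage.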

Available Operators

Specify the operators argument as a list of strings.

Operator	Description
`'+'`	Addition (a + b)
`'-'`	Subtraction (a - b)
`'*'`	Multiplication (a * b)
`'/'`	Division (a / b)
`'sin'`	Sine (sin(a))
`'cos'`	Cosine (cos(a))
`'exp'`	Exponential (e^a)
`'log'`	Natural logarithm (ln(a))
`'sqrt'`	Square root (sqrt(a))
`'pow2'`	Square (a^2)
`'pow3'`	Cube (a^3)
`'inv'`	Reciprocal (1/a)
`'|-|'`	Absolute difference (|a - b|)
`'cbrt'`	Cube root (a^(1/3))
`'abs'`	Absolute value (|a|)
`'scd'`	Standard Cauchy Distribution (1 / (π * (1 + a^2)))
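
As a reading aid, the table can be expressed as NumPy callables. This mapping is an illustration of what each operator string means mathematically, not the library's internal dispatch table; the names `UNARY_OPS`, `BINARY_OPS`, and `apply_operator` are hypothetical. How mini-sisso treats invalid inputs (for example log of a negative number) is not covered.

```python
import numpy as np

UNARY_OPS = {
    "sin": np.sin, "cos": np.cos, "exp": np.exp, "log": np.log,
    "sqrt": np.sqrt, "cbrt": np.cbrt, "abs": np.abs,
    "pow2": lambda a: a ** 2,
    "pow3": lambda a: a ** 3,
    "inv": lambda a: 1.0 / a,
    "scd": lambda a: 1.0 / (np.pi * (1.0 + a ** 2)),  # standard Cauchy density
}

BINARY_OPS = {
    "+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide,
    "|-|": lambda a, b: np.abs(a - b),
}

def apply_operator(name, a, b=None):
    """Evaluate one operator from the table on NumPy arrays."""
    if name in UNARY_OPS:
        return UNARY_OPS[name](a)
    if name in BINARY_OPS:
        if b is None:
            raise ValueError(f"operator {name!r} needs two operands")
        return BINARY_OPS[name](a, b)
    raise KeyError(f"unknown operator {name!r}")

a = np.array([0.0, 8.0])
b = np.array([3.0, 5.0])
print(apply_operator("scd", a), apply_operator("|-|", a, b), apply_operator("cbrt", a))
```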

🤝 `scikit-learn` Ecosystem Integration

MiniSisso inherits from scikit-learn's BaseEstimator and RegressorMixin, so it works with scikit-learn tools such as Pipeline and GridSearchCV.

More detailed usage of `Pipeline`

Pipeline is a tool for connecting multiple processing steps and treating them as a single estimator.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from mini_sisso.model import MiniSisso

# Pipeline definition
# Note: MiniSisso is sensitive to the scale of the input features, and preprocessing
# such as StandardScaler changes the variables that appear in the discovered formula,
# which can impair its interpretability. Scaling is therefore generally not recommended.
# It is used here only to demonstrate how Pipeline works.
pipeline = Pipeline([
    # Step 1: standardization, registered under the name 'scaler'
    ('scaler', StandardScaler()),  # Usually unnecessary/not recommended for MiniSisso
    # Step 2: MiniSisso, registered under the name 'sisso'
    ('sisso', MiniSisso(n_expansion=2, selection_params={'n_term': 2}, operators=["+", "sin", "pow2"])),
])

# Train the entire pipeline: X -> scaler.fit_transform -> sisso.fit
pipeline.fit(X_df, y_series)

# Predict using the pipeline: X -> scaler.transform -> sisso.predict
predictions = pipeline.predict(X_df)

# You can also access and change parameters for each step of the pipeline.
# Example: Changing the number of SISSO terms after training
# pipeline.set_params(sisso__selection_params={'n_term': 3})
print(f"Number of terms in the SISSO step of the pipeline: {pipeline.named_steps['sisso'].selection_params['n_term']}")

Advanced `GridSearchCV` Usage

GridSearchCV can automatically find the best combination of hyperparameters, including the so_method itself. The __ (double underscore) syntax allows you to search nested parameters within selection_params.

from sklearn.model_selection import GridSearchCV

# Define a list of parameter grids to search over
param_grid = [
    # Case 1: Search patterns for exhaustive method
    {
        'so_method': ['exhaustive'],
        'selection_params': [
            {'n_term': 2, 'n_sis_features': 10},
            {'n_term': 3, 'n_sis_features': 15}
        ]
    },
    # Case 2: Search patterns for lasso method
    {
        'so_method': ['lasso'],
        'selection_params': [
            {'alpha': 0.01, 'collinearity_filter': 'mi'},
            {'alpha': 0.005}
        ]
    },
    # Case 3: Search patterns for lightgbm method
    {
        'so_method': ['lightgbm'],
        # Illustrative values, taken from the examples earlier in this document
        'selection_params__n_features_to_select': [20],
        'selection_params__lightgbm_params__n_estimators': [100, 200],
    }
]

grid_search = GridSearchCV(
    MiniSisso(n_expansion=2, operators=['+', 'sin', 'pow2']),
    param_grid, cv=3, scoring='neg_root_mean_squared_error', n_jobs=-1, verbose=1
)

print("Starting GridSearchCV to find the best method and parameters...")
grid_search.fit(X_df, y_series)

print(f"\nBest search method and params: {grid_search.best_params_}")
print(f"Equation from the best model: {grid_search.best_estimator_.equation_}")

⚙️ API Reference

`MiniSisso`

class MiniSisso(BaseEstimator, RegressorMixin):
    def __init__(self, n_expansion: int = 2, operators: list = None,
                 so_method: str = "exhaustive", selection_params: dict = None,
                 use_levelwise_sis: bool = True, n_level_sis_features: int = 50,
                 use_split_selection: bool = True,
                 device: str = "cpu"):
        ...

Parameters

n_expansion (int, default=2): Max level of feature expansion.
operators (list[str], required): List of operators for feature generation. The signature defaults to None, but a list must be supplied.
so_method (str, default="exhaustive"): Model search strategy ("exhaustive", "lasso", "lightgbm").
selection_params (dict, optional): Dictionary of detailed parameters for the selected so_method and preprocessing filters.
use_levelwise_sis (bool, default=True): Toggles the level-wise SIS feature.
n_level_sis_features (int, default=50): Number of features to keep at each level if use_levelwise_sis=True.
use_split_selection (bool, default=True): Toggles the "Split Selection" strategy for balanced operator selection.
device (str, default="cpu"): Computation device ("cpu" or "cuda").

`fit(X, y)`

Fits the model to the training data.

Parameters

X (array-like or pd.DataFrame): The feature data, shape (n_samples, n_features).
y (array-like or pd.Series): The target variable data, shape (n_samples,).

Returns

self: The fitted MiniSisso instance.

`predict(X)`

Makes predictions using the fitted model.

Parameters

X (array-like or pd.DataFrame): The data to make predictions on.

Returns

np.ndarray: A NumPy array of the predictions.

`score(X, y)`

Returns the coefficient of determination (R² score) of the prediction.

Parameters

X (array-like or pd.DataFrame): The feature data.
y (array-like or pd.Series): The true target variable data.

Returns

float: The R² score.

Fitted Attributes

After calling fit(), you can access the following attributes:

model.equation_ (str): The best mathematical model found.
model.rmse_ (float): The RMSE of the best model on the training data.
model.r2_ (float): The R² score of the best model on the training data.
model.coef_ (np.ndarray): The coefficients for each term in the best model.
model.intercept_ (float): The intercept of the best model.
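
For reference, the two training metrics follow the standard definitions below. This is a sketch of the textbook formulas (the same R² that scikit-learn's RegressorMixin.score reports), not code taken from mini-sisso; the function names `rmse` and `r2` are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = y_true.copy()
mean_only = np.full_like(y_true, y_true.mean())  # always predict the mean
print(rmse(y_true, perfect), r2(y_true, perfect))
print(rmse(y_true, mean_only), r2(y_true, mean_only))
```

A model that always predicts the mean of y scores R² = 0, and a perfect fit scores R² = 1.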

📜 License

This project is licensed under the MIT License.

🙏 Acknowledgements

This library was greatly inspired by the original SISSO algorithm paper and is built upon the fantastic open-source projects NumPy, SciPy, Pandas, scikit-learn, and PyTorch.

Release history

Version	Release date
1.6.0 (this version)	Feb 6, 2026
1.4.1	Dec 10, 2025
1.4.0	Dec 10, 2025
1.3.8	Dec 9, 2025
1.3.7	Dec 9, 2025
1.3.6	Nov 28, 2025
1.3.5	Nov 28, 2025
1.3.4	Nov 28, 2025
1.3.3	Nov 28, 2025
1.3.2	Nov 28, 2025
1.3.1	Nov 28, 2025
1.3.0	Nov 28, 2025
1.2.0	Nov 7, 2025
1.1.0	Nov 5, 2025
1.0.0	Nov 4, 2025
