
OIKAN: Neuro-Symbolic ML for Scientific Discovery


Overview

OIKAN is a neuro-symbolic machine learning framework inspired by the Kolmogorov-Arnold representation theorem. It combines the pattern-learning power of modern neural networks with techniques for extracting clear, interpretable symbolic formulas from data, with the goal of making machine learning models both accurate and interpretable.


Important Disclaimer: OIKAN is an experimental research project. It is not intended for production use or real-world applications. This framework is designed for research purposes, experimentation, and academic exploration of neuro-symbolic machine learning concepts.

Key Features

  • 🧠 Neuro-Symbolic ML: Combines neural network learning with symbolic mathematics
  • 📊 Automatic Formula Extraction: Generates human-readable mathematical expressions
  • 🎯 Scikit-learn Compatible: Familiar .fit() and .predict() interface
  • 🔬 Research-Focused: Designed for academic exploration and experimentation
  • 📈 Multi-Task: Supports both regression and classification problems

Scientific Foundation

OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation Theorem through a hybrid neural architecture:

  1. Theoretical Foundation: The Kolmogorov-Arnold theorem states that any continuous n-dimensional function can be decomposed into a combination of single-variable functions:

    f(x₁,...,xₙ) = ∑(j=0 to 2n){ φⱼ( ∑(i=1 to n) ψᵢⱼ(xᵢ) ) }
    

    where φⱼ and ψᵢⱼ are continuous univariate functions.

  2. Neural Implementation: OIKAN uses a specialized architecture combining:

    • Feature transformation layers with interpretable basis functions
    • Symbolic regression for formula extraction
    • Automatic pruning of insignificant terms

    class OIKANRegressor:
        def __init__(self, hidden_sizes=[64, 64], activation='relu',
                     polynomial_degree=2, alpha=0.1):
            self.polynomial_degree = polynomial_degree
            self.alpha = alpha
            # Neural network (TabularNet) is built in .fit(), once the
            # input size is known from the training data
            self.neural_net = None
            # Symbolic regression model for interpretable formulas,
            # extracted after neural training
            self.symbolic_model = None
    
  3. Basis Functions: Core set of interpretable transformations:

    SYMBOLIC_FUNCTIONS = {
        'linear': 'x',           # Direct relationships
        'quadratic': 'x^2',      # Non-linear patterns
    'interaction': 'x_i*x_j', # Feature interactions
        'higher_order': 'x^n'    # Polynomial terms
    }
    
  4. Formula Extraction Process:

    • Train neural network on raw data
    • Generate augmented samples for better coverage
    • Perform L1-regularized symbolic regression
    • Prune terms with coefficients below threshold
    • Export human-readable mathematical expressions

Quick Start

Installation

Method 1: Via PyPI (Recommended)

pip install -qU oikan

Method 2: Local Development

git clone https://github.com/silvermete0r/OIKAN.git
cd OIKAN
pip install -e .  # Install in development mode

Regression Example
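The example below assumes `X_train`, `y_train`, `X_test`, and `y_test` already exist. One minimal way to produce them; the synthetic dataset here is an illustration, not part of OIKAN:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for a real regression dataset
X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```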

from oikan.model import OIKANRegressor
from sklearn.metrics import mean_squared_error

# Initialize model
model = OIKANRegressor(
    hidden_sizes=[32, 32], # Hidden layer sizes
    activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
    augmentation_factor=5, # Augmentation factor for data generation
    polynomial_degree=2, # Degree of polynomial basis functions
    alpha=0.1, # L1 regularization strength
    sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
    epochs=100, # Number of training epochs
    lr=0.001, # Learning rate
    batch_size=32, # Batch size for training
    verbose=True # Verbose output during training
)

# Fit the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate performance
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Get symbolic formula
formula = model.get_formula()
print("Symbolic Formula:", formula)

# Get feature importances
importances = model.feature_importances()
print("Feature Importances:", importances)

# Save the model (optional)
model.save("outputs/model.json")

# Load the model (optional)
loaded_model = OIKANRegressor()
loaded_model.load("outputs/model.json")

Example of the saved symbolic formula (regression model): outputs/california_housing_model.json

Classification Example
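Likewise, the classification example assumes a pre-split dataset. Since the saved example below is an Iris model, a matching setup might be:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris: 150 samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```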

from oikan.model import OIKANClassifier
from sklearn.metrics import accuracy_score

# Initialize model
model = OIKANClassifier(
    hidden_sizes=[32, 32], # Hidden layer sizes
    activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
    augmentation_factor=10, # Augmentation factor for data generation
    polynomial_degree=2, # Degree of polynomial basis functions
    alpha=0.1, # L1 regularization strength
    sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
    epochs=100, # Number of training epochs
    lr=0.001, # Learning rate
    batch_size=32, # Batch size for training
    verbose=True # Verbose output during training
)

# Fit the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate performance
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)

# Get symbolic formulas for each class
formulas = model.get_formula()
for i, formula in enumerate(formulas):
    print(f"Class {i} Formula:", formula)
   
# Get feature importances
importances = model.feature_importances()
print("Feature Importances:", importances)

# Save the model (optional)
model.save("outputs/model.json")

# Load the model (optional)
loaded_model = OIKANClassifier()
loaded_model.load("outputs/model.json")

Example of the saved symbolic formula (classification model): outputs/iris_model.json

Architecture Diagram

OIKAN v0.0.3(1) Architecture

Contributing

We welcome contributions! Key areas of interest:

  • Model architecture improvements
  • Novel basis function implementations
  • Improved symbolic extraction algorithms
  • Real-world case studies and applications
  • Performance optimizations

Please see CONTRIBUTING.md for guidelines.

Citation

If you use OIKAN in your research, please cite:

@software{oikan2025,
  title = {OIKAN: Neuro-Symbolic ML for Scientific Discovery},
  author = {Zhalgasbayev, Arman},
  year = {2025},
  url = {https://github.com/silvermete0r/OIKAN}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files

Download the file for your platform.

Source Distribution

oikan-0.0.3.2.tar.gz (12.5 kB)

Uploaded Source

Built Distribution


oikan-0.0.3.2-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file oikan-0.0.3.2.tar.gz.

File metadata

  • Download URL: oikan-0.0.3.2.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for oikan-0.0.3.2.tar.gz:

  • SHA256: 0c8c54c63ad3e6fd19eb21f97c88398961bbb8fad0f8c060cbae128f0300611d
  • MD5: cfdad2178e25199f8df21cdefc8ff696
  • BLAKE2b-256: 969540e2f4445b55dc8b14bf3dc450d85e66b96fe03f07e9e553c280fdc84e6b

File details

Details for the file oikan-0.0.3.2-py3-none-any.whl.

File metadata

  • Download URL: oikan-0.0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for oikan-0.0.3.2-py3-none-any.whl:

  • SHA256: 8015285729d0cb84d3546bae89814ecffdcb98f7dd2118ae46baec6df15e83d3
  • MD5: 63122b5cb30a67b75a34bc35f8a5dee7
  • BLAKE2b-256: acc4fc43682621aa4bfae0bed015670da1d037040698415072f91f7b910b0167
