OIKAN: Neuro-Symbolic ML for Scientific Discovery

These details have not been verified by PyPI

Project description

OIKAN: Neuro-Symbolic ML for Scientific Discovery

Overview

OIKAN is a neuro-symbolic machine learning framework inspired by Kolmogorov-Arnold representation theorem. It combines the power of modern neural networks with techniques for extracting clear, interpretable symbolic formulas from data. OIKAN is designed to make machine learning models both accurate and Interpretable.

Important Disclaimer: OIKAN is an experimental research project. It is not intended for production use or real-world applications. This framework is designed for research purposes, experimentation, and academic exploration of neuro-symbolic machine learning concepts.

Key Features

🧠 Neuro-Symbolic ML: Combines neural network learning with symbolic mathematics
📊 Automatic Formula Extraction: Generates human-readable mathematical expressions
🎯 Scikit-learn Compatible: Familiar .fit() and .predict() interface
🔬 Research-Focused: Designed for academic exploration and experimentation
📈 Multi-Task: Supports both regression and classification problems

Scientific Foundation

OIKAN implements a modern interpretation of the Kolmogorov-Arnold Representation Theorem through a hybrid neural architecture:

Theoretical Foundation: The Kolmogorov-Arnold theorem states that any continuous n-dimensional function can be decomposed into a combination of single-variable functions:
```
f(x₁,...,xₙ) = ∑(j=0 to 2n){ φⱼ( ∑(i=1 to n) ψᵢⱼ(xᵢ) ) }
```
where φⱼ and ψᵢⱼ are continuous univariate functions.

Neural Implementation: OIKAN uses a specialized architecture combining:

Feature transformation layers with interpretable basis functions
Symbolic regression for formula extraction (ElasticNet-based)
Automatic pruning of insignificant terms

 class OIKAN:
     def __init__(self, hidden_sizes=[64, 64], activation='relu',
                 polynomial_degree=2, alpha=0.1):
         # Neural network for learning complex patterns
         self.neural_net = TabularNet(input_size, hidden_sizes, activation)
         # Data augmentation for better coverage
         self.augmented_data = self.augment_data(X, y, augmentation_factor=5)
         # Symbolic regression for interpretable formulas
         self.symbolic_regression = SymbolicRegression(alpha=alpha)

Basis Functions: Core set of interpretable transformations:

SYMBOLIC_FUNCTIONS = {
    'linear': 'x',           # Direct relationships
    'quadratic': 'x^2',      # Non-linear patterns
    'cubic': 'x^3',         # Higher-order relationships
    'interaction': 'x_i x_j', # Feature interactions
    'higher_order': 'x^n',    # Polynomial terms
    'trigonometric': 'sin(x)', # Trigonometric functions
    'exponential': 'exp(x)',  # Exponential growth
    'logarithmic': 'log(x)'  # Logarithmic relationships
}

Formula Extraction Process:
- Train neural network on raw data
- Generate augmented samples for better coverage
- Perform L1-regularized symbolic regression (alpha)
- Prune terms with coefficients below threshold
- Export human-readable mathematical expressions

Quick Start

Installation

Method 1: Via PyPI (Recommended)

pip install -qU oikan

Method 2: Local Development

git clone https://github.com/silvermete0r/OIKAN.git
cd OIKAN
pip install -e .  # Install in development mode

System Requirements

Requirement	Details
Python	Version 3.7 or higher
Operating System	Platform independent (Windows/macOS/Linux)
Memory	Recommended minimum 4GB RAM
Disk Space	~100MB for installation (including dependencies)
GPU	Optional (for faster training)
Dependencies	torch, numpy, scikit-learn, sympy, tqdm

Regression Example

from oikan import OIKANRegressor
from sklearn.metrics import mean_squared_error

# Initialize model
model = OIKANRegressor(
    hidden_sizes=[32, 32], # Hidden layer sizes
    activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
    augmentation_factor=5, # Augmentation factor for data generation
    alpha=0.1, # L1 regularization strength (Symbolic regression)
    sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
    top_k=5, # Number of top features to select (Symbolic regression)
    epochs=100, # Number of training epochs
    lr=0.001, # Learning rate
    batch_size=32, # Batch size for training
    verbose=True, # Verbose output during training
    evaluate_nn=True, # Validate neural network performance before full process
    random_state=42 # Random seed for reproducibility
)

# Fit the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate performance
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Get symbolic formula
formula = model.get_formula() # default: type='original' -> returns all formula without pruning | other options: 'sympy' -> simplified formula using sympy; 'latex' -> LaTeX format
print("Symbolic Formula:", formula)

# Get feature importances
importances = model.feature_importances()
print("Feature Importances:", importances)

# Save the model (optional)
model.save("outputs/model.json")

# Load the model (optional)
loaded_model = OIKANRegressor()
loaded_model.load("outputs/model.json")

Example of the saved symbolic formula (regression model): outputs/california_housing_model.json

Classification Example

from oikan import OIKANClassifier
from sklearn.metrics import accuracy_score

# Initialize model
model = OIKANClassifier(
    hidden_sizes=[32, 32], # Hidden layer sizes
    activation='relu', # Activation function (other options: 'tanh', 'leaky_relu', 'elu', 'swish', 'gelu')
    augmentation_factor=10, # Augmentation factor for data generation
    alpha=0.1, # L1 regularization strength (Symbolic regression)
    sigma=0.1, # Standard deviation of Gaussian noise for data augmentation
    top_k=5, # Number of top features to select (Symbolic regression)
    epochs=100, # # Number of training epochs
    lr=0.001, # Learning rate
    batch_size=32, # Batch size for training
    verbose=True, # Verbose output during training
    evaluate_nn=True, # Validate neural network performance before full process
    random_state=42 # Random seed for reproducibility
)

# Fit the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate performance
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)

# Get symbolic formulas for each class
formulas = model.get_formula() # default: type='original' -> returns all formula without pruning | other options: 'sympy' -> simplified formula using sympy; 'latex' -> LaTeX format
for i, formula in enumerate(formulas):
    print(f"Class {i} Formula:", formula)
   
# Get feature importances
importances = model.feature_importances()
print("Feature Importances:", importances)

# Save the model (optional)
model.save("outputs/model.json")

# Load the model (optional)
loaded_model = OIKANClassifier()
loaded_model.load("outputs/model.json")

Example of the saved symbolic formula (classification model): outputs/iris_model.json

Architecture Diagram

High-Level Architecture:

OIKAN v0.0.3 High-Level Architecture

UML Diagram:

OIKAN v0.0.3(2) Architecture

OIKAN Symbolic Model Compilers

OIKAN provides a set of symbolic model compilers to convert the symbolic formulas generated by the OIKAN model into different programming languages.

Currently, we support: Python, C++, C, JavaScript, Rust, and Go. This allows users to easily integrate the generated formulas into their applications or systems.

All compilers: model_compilers/

Example of Python Compiler

Regression Model:

import numpy as np
import json

def predict(X, symbolic_model):
    X = np.asarray(X)
    X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'], 
                                            symbolic_model['n_features'])
    return np.dot(X_transformed, symbolic_model['coefficients'])

if __name__ == "__main__":
    with open('outputs/california_housing_model.json', 'r') as f:
        symbolic_model = json.load(f)
    X = np.random.rand(10, symbolic_model['n_features'])
    y_pred = predict(X, symbolic_model)
    print(y_pred)

Classification Model:

import numpy as np
import json

def predict(X, symbolic_model):
    X = np.asarray(X)
    X_transformed = evaluate_basis_functions(X, symbolic_model['basis_functions'], 
                                            symbolic_model['n_features'])
    logits = np.dot(X_transformed, np.array(symbolic_model['coefficients_list']).T)
    probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
    return np.argmax(probabilities, axis=1)

if __name__ == "__main__":
    with open('outputs/iris_model.json', 'r') as f:
        symbolic_model = json.load(f)
    X = np.array([[5.1, 3.5, 1.4, 0.2],
                  [7.0, 3.2, 4.7, 1.4],
                  [6.3, 3.3, 6.0, 2.5]])
    y_pred = predict(X, symbolic_model)
    print(y_pred)

Contributing

We welcome contributions! Key areas of interest:

Model architecture improvements
Novel basis function implementations
Improved symbolic extraction algorithms
Real-world case studies and applications
Performance optimizations

Please see CONTRIBUTING.md for guidelines.

Citation

If you use OIKAN in your research, please cite:

@software{oikan2025,
  title = {OIKAN: Neuro-Symbolic ML for Scientific Discovery},
  author = {Zhalgasbayev, Arman},
  year = {2025},
  url = {https://github.com/silvermete0r/OIKAN}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

0.0.3.12

Oct 13, 2025

0.0.3.11

Sep 6, 2025

0.0.3.10

Aug 31, 2025

0.0.3.9

Jun 19, 2025

This version

0.0.3.8

Jun 8, 2025

0.0.3.7

Jun 5, 2025

0.0.3.6

May 31, 2025

0.0.3.5

May 11, 2025

0.0.3.4

May 10, 2025

0.0.3.3

May 9, 2025

0.0.3.2

May 9, 2025

0.0.3.1

May 7, 2025

0.0.2.5

Apr 13, 2025

0.0.2.4

Apr 12, 2025

0.0.2.3

Apr 10, 2025

0.0.2.2

Apr 10, 2025

0.0.2.1

Apr 10, 2025

0.0.1.11

Mar 10, 2025

0.0.1.10

Feb 25, 2025

0.0.1.9

Feb 25, 2025

0.0.1.8

Feb 21, 2025

0.0.1.7

Feb 18, 2025

0.0.1.6

Feb 17, 2025

0.0.1.5

Feb 17, 2025

0.0.1.4

Feb 17, 2025

0.0.1.3

Feb 17, 2025

0.0.1.2

Feb 12, 2025

0.0.1.1

Feb 12, 2025

0.0.1

Feb 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oikan-0.0.3.8.tar.gz (17.8 kB view details)

Uploaded Jun 8, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

oikan-0.0.3.8-py3-none-any.whl (15.5 kB view details)

Uploaded Jun 8, 2025 Python 3

File details

Details for the file oikan-0.0.3.8.tar.gz.

File metadata

Download URL: oikan-0.0.3.8.tar.gz
Upload date: Jun 8, 2025
Size: 17.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for oikan-0.0.3.8.tar.gz
Algorithm	Hash digest
SHA256	`8870b1740b4068bb39bb070b353989d779290a23df3812c5038acb6c32c9f240`
MD5	`a1585f003675a49ca14977bfb08d5d8c`
BLAKE2b-256	`258d7a33606782d8271537b894f6e0ebe579986b1b52047ed436e11239da26e6`

See more details on using hashes here.

File details

Details for the file oikan-0.0.3.8-py3-none-any.whl.

File metadata

Download URL: oikan-0.0.3.8-py3-none-any.whl
Upload date: Jun 8, 2025
Size: 15.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for oikan-0.0.3.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`178d1e7f8f5b33f97f7f6554a311c3695d5f8a253f85d5b81c8e9af9491f3af5`
MD5	`3f04293d0751a39df19670f018e90130`
BLAKE2b-256	`423d64067c5f3898a169c0be88080d71f645f9d955219d03f4314d0fe4a6c565`

See more details on using hashes here.

oikan 0.0.3.8

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

OIKAN: Neuro-Symbolic ML for Scientific Discovery

Overview

Key Features

Scientific Foundation

Quick Start

Installation

Method 1: Via PyPI (Recommended)

Method 2: Local Development

System Requirements

Regression Example

Classification Example

Architecture Diagram

High-Level Architecture:

UML Diagram:

OIKAN Symbolic Model Compilers

Example of Python Compiler

Contributing

Citation

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes