Soft Algebra Optimizer + O(N) Linear Attention for Long Context LLMs
Mobiu-Q v3.2.0
Soft Algebra for Optimization & Attention
Overview
Mobiu-Q is a framework built on Soft Algebra (an algebra with a nilpotent element ε, where ε² = 0) that provides:
- MobiuOptimizer - Stable optimization in noisy environments
- MobiuAttention 🧪 - O(N) linear attention for long sequences
Both share the same mathematical foundation but serve different purposes.
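As a rough illustration of the nilpotent rule ε² = 0 mentioned above, the sketch below implements arithmetic on numbers of the form a + bε, where any ε² term is dropped. The `SoftNumber` class is a hypothetical name used only for this example; it is not part of the mobiu_q API, and it does not show how Mobiu-Q applies the algebra internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftNumber:
    """Hypothetical illustration of a number a + b*eps with eps**2 == 0."""
    real: float
    soft: float = 0.0

    def __add__(self, other: "SoftNumber") -> "SoftNumber":
        return SoftNumber(self.real + other.real, self.soft + other.soft)

    def __mul__(self, other: "SoftNumber") -> "SoftNumber":
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because the eps**2 term is zero
        return SoftNumber(self.real * other.real,
                          self.real * other.soft + self.soft * other.real)


eps = SoftNumber(0.0, 1.0)
x = SoftNumber(3.0, 1.0)          # 3 + eps

print(eps * eps)                  # SoftNumber(real=0.0, soft=0.0)
print(x * x)                      # SoftNumber(real=9.0, soft=6.0)
```

In the second product the ε coefficient is 6, the derivative of x² at x = 3. This is the general property of such algebras: the ε part carries first-order information alongside the value.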
Installation
pip install mobiu-q
Quick Start
MobiuOptimizer (Stable API)
from mobiu_q import MobiuOptimizer
import torch
# Your license key (get one at https://app.mobiu.ai)
LICENSE_KEY = "your-license-key-here"
# Wrap any PyTorch optimizer
model = MyModel()
base_opt = torch.optim.Adam(model.parameters(), lr=0.0003)
opt = MobiuOptimizer(
    base_opt,
    license_key=LICENSE_KEY,
    method="adaptive",
    use_soft_algebra=True
)
for batch in dataloader:
    opt.zero_grad()  # Clear gradients from the previous step
    loss = criterion(model(batch))
    loss.backward()
    opt.step(loss.item())  # Pass loss for Soft Algebra
opt.end()  # Important: release resources
🆕 AUTO Mode - Don't Know Which Method to Use?
from mobiu_q import MobiuOptimizer
# Let Mobiu-Q choose automatically!
opt = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, method="auto")
for batch in dataloader:
    opt.zero_grad()  # Clear gradients from the previous step
    loss = criterion(model(batch))
    loss.backward()
    opt.step(loss.item())
print(f"Selected mode: {opt.current_mode}")  # boost, dampen, or off
opt.end()
Results: ~35% average improvement with no regressions (0 of 210 runs) across 7 domains.
License Key
MobiuOptimizer requires a license key to access the cloud API:
from mobiu_q import MobiuOptimizer
LICENSE_KEY = "your-license-key-here"
# PyTorch mode (pass optimizer)
opt = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, method="adaptive")
# Quantum/NumPy mode (pass params array)
opt = MobiuOptimizer(params, license_key=LICENSE_KEY, method="standard")
Get your key: https://app.mobiu.ai
| Tier | API Calls | Price |
|---|---|---|
| Free | 20/month | $0 |
| Pro | Unlimited | $19/month |
Note: MobiuAttention and AUTO mode run locally and do NOT require API calls for each step.
Methods
| Method | Use Case | Default LR |
|---|---|---|
| auto 🆕 | Don't know? Use this! Auto-selects best | 0.01 |
| auto-safe 🆕 | Safer auto with DecayBoost | 0.01 |
| standard | Smooth landscapes, chemistry, physics | 0.01 |
| deep | Deep circuits, noisy hardware, complex opt | 0.1 |
| adaptive | RL, LLM fine-tuning, high-variance problems | 0.0003 |
🆕 AUTO Mode Details
AUTO mode runs 3 virtual optimizers in parallel and picks the winner:
| Mode | Description | When It Wins |
|---|---|---|
| boost | Higher LR when making progress | Biased gradients (Federated, Imbalanced) |
| dampen | Lower LR when gradients noisy | High variance, oscillations |
| off | Standard Adam | Clean problems |
# Maximum improvement (~35% avg)
opt = MobiuOptimizer(base_opt, method="auto")
# Safer, more conservative (~20% avg, fewer hurts)
opt = MobiuOptimizer(base_opt, method="auto-safe")
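To make the "virtual optimizers" idea concrete, here is a minimal sketch under stated assumptions. It runs three copies of plain 1-D gradient descent on f(x) = x², one per mode, and returns the mode with the lowest final loss. The function names, the 1.5× and 0.5× factors, and the sign-flip noise signal are all hypothetical choices for illustration; they are not taken from the Mobiu-Q implementation, which may select modes continuously rather than once at the end.

```python
import random


def lr_for(mode: str, base_lr: float, improved: bool, noisy: bool) -> float:
    """Hypothetical LR rules loosely following the mode table."""
    if mode == "boost" and improved:
        return base_lr * 1.5      # assumed factor
    if mode == "dampen" and noisy:
        return base_lr * 0.5      # assumed factor
    return base_lr                # "off": plain base learning rate


def pick_mode(noise: float, steps: int = 50, base_lr: float = 0.1, seed: int = 0) -> str:
    """Run three virtual copies of gradient descent on f(x) = x**2; return the best mode."""
    final_loss = {}
    for mode in ("boost", "dampen", "off"):
        rng = random.Random(seed)         # same noise sequence for every mode
        x, prev_loss, prev_grad = 5.0, float("inf"), 0.0
        for _ in range(steps):
            loss = x * x
            grad = 2 * x + rng.gauss(0.0, noise)
            improved = loss < prev_loss
            noisy = grad * prev_grad < 0  # gradient sign flip as a crude noise signal
            x -= lr_for(mode, base_lr, improved, noisy) * grad
            prev_loss, prev_grad = loss, grad
        final_loss[mode] = x * x
    return min(final_loss, key=final_loss.get)


print(pick_mode(noise=0.0))   # boost
```

With no gradient noise the loss improves on every step, so the boosted copy converges fastest and wins. With noisy gradients the outcome depends on the noise level and seed.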
AUTO with Maximize (RL / Trading)
AUTO mode supports both minimization and maximization:
# Loss minimization (default) - VQE, supervised learning
opt = MobiuOptimizer(base_opt, method="auto")
opt.step(loss.item())
# Reward maximization - RL, trading
opt = MobiuOptimizer(base_opt, method="auto", maximize=True)
opt.step(episode_return)
How AUTO Differs from Other Methods
| Feature | standard/deep/adaptive | auto |
|---|---|---|
| Uses Frustration Engine | ✅ Yes | ❌ No |
| Uses Cloud API | ✅ Yes | ❌ No (local) |
| Adapts LR | Fixed strategy | Picks best strategy |
| Extra latency | ~1ms per sync | Zero |
Why no Frustration Engine in AUTO?
They solve the same problem differently:
- Frustration Engine: Reactive - detects stagnation, then boosts LR ×3
- AUTO: Proactive - continuously picks the best LR strategy
Using both together could cause conflicts (both trying to adjust LR).
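The reactive behaviour described for the Frustration Engine (detect stagnation, then boost the LR ×3) can be sketched as follows. This is an illustrative sketch, not the Mobiu-Q implementation: the `StagnationBooster` class, the `patience` window, and the `min_delta` threshold are hypothetical; only the ×3 factor comes from the description above.

```python
class StagnationBooster:
    """Hypothetical reactive rule: multiply the LR by 3 when the loss stops improving."""

    def __init__(self, base_lr: float, patience: int = 5, min_delta: float = 1e-4):
        self.base_lr = base_lr
        self.patience = patience      # assumed: steps without improvement before boosting
        self.min_delta = min_delta    # assumed: smallest change that counts as progress
        self.best = float("inf")
        self.stalled = 0

    def lr(self, loss: float) -> float:
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stalled = 0
        else:
            self.stalled += 1
        if self.stalled >= self.patience:
            self.stalled = 0          # boost once, then start counting again
            return self.base_lr * 3
        return self.base_lr


booster = StagnationBooster(base_lr=0.01, patience=3)
losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.7]
# The fifth call sees three stalled steps in a row and returns the boosted rate
print([booster.lr(loss) for loss in losses])
```

A rule like this only reacts after the plateau has been observed, which is the contrast drawn above with AUTO's continuous strategy selection.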
Benchmark (7 domains, 30 seeds each):
| Domain | AUTO | AUTO-Safe |
|---|---|---|
| Federated Learning | +40.8% | +22.4% |
| Noisy Labels | +22.6% | +14.3% |
| Sim-to-Real | +28.0% | +16.9% |
| Imbalanced Data | +27.8% | +17.0% |
| Meta-Learning | +42.7% | +22.8% |
| GAN Training | +42.6% | +23.0% |
| Clean | +43.9% | +23.2% |
| Hurts | 0/210 | 0/210 |
Benchmarks
Reinforcement Learning & Trading
| Domain | Improvement | Win Rate | p-value |
|---|---|---|---|
| Crypto Trading | +56% profit | 100% | <0.001 |
| LunarLander-v3 | +128% | 97% | <0.001 |
| MuJoCo InvertedPendulum | +111% | 100% | <0.001 |
Quantum Computing
| Domain | Improvement | Win Rate | p-value |
|---|---|---|---|
| VQE H₂ (FakeFez) | +52% | 100% | <0.001 |
| QAOA MaxCut | +45% | 95% | <0.001 |
Noisy & Distributed Learning
| Domain | Improvement | Win Rate | p-value | Bias Source |
|---|---|---|---|---|
| Federated Learning | +67% | 100% | <0.001 | Non-IID client data |
| Imbalanced Data | +52% | 100% | <0.001 | Majority class dominates |
| Sim-to-Real | +47% | 100% | <0.001 | Simulator ≠ reality |
| Noisy Labels | +40% | 100% | <0.001 | Systematic mislabeling |
Monitoring Training
opt = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, method="adaptive")
# ... training ...
# Track metrics
print(opt.lr_history) # Learning rates over time
print(opt.warp_history) # Gradient warp factors
print(opt.current_mode) # AUTO mode selection (if using auto)
Maximize vs Minimize
By default, Mobiu-Q assumes you're minimizing (loss, energy). For RL/Trading where you maximize (reward, profit), set maximize=True:
LICENSE_KEY = "your-license-key-here"
# Loss minimization (default) - for supervised learning, VQE
opt = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, method="adaptive")
opt.step(loss.item())
# Reward maximization - for RL, trading
opt = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, method="adaptive", maximize=True)
opt.step(episode_return)
A/B Testing
LICENSE_KEY = "your-license-key-here"
# Test with Soft Algebra
opt_on = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, use_soft_algebra=True)
# Test without (baseline)
opt_off = MobiuOptimizer(base_opt, license_key=LICENSE_KEY, use_soft_algebra=False)
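One plausible way to summarize such an A/B test is a paired win rate: train both variants from the same seeds and count how often the Soft Algebra run ends with the better final metric. The helper below is a hypothetical sketch, not part of mobiu_q, and the numbers in the usage lines are made-up placeholders rather than measured results.

```python
def win_rate(with_sa, baseline, lower_is_better=True):
    """Fraction of paired runs (same seed) where the Soft Algebra run beat the baseline."""
    if len(with_sa) != len(baseline):
        raise ValueError("runs must be paired by seed")
    if not with_sa:
        raise ValueError("need at least one pair of runs")
    wins = sum((a < b) if lower_is_better else (a > b)
               for a, b in zip(with_sa, baseline))
    return wins / len(with_sa)


# Placeholder final losses for four seeds (made-up numbers, for illustration only)
final_loss_on = [0.20, 0.35, 0.30, 0.25]
final_loss_off = [0.30, 0.30, 0.40, 0.50]
print(win_rate(final_loss_on, final_loss_off))   # 0.75
```

For maximization problems (reward, profit), pass lower_is_better=False. Give each arm its own freshly initialized model and base optimizer so the two runs do not share state.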
Examples by Domain
🆕 AUTO Mode (Recommended for New Users)
from mobiu_q import MobiuQCore
import numpy as np
LICENSE_KEY = "your-license-key-here"
# Don't know which method? Use AUTO!
params = np.random.randn(20)
opt = MobiuQCore(license_key=LICENSE_KEY, method="auto")
def energy_fn(p):
    return np.sum(p ** 2)  # Your loss function

for step in range(100):
    params = opt.step(params, energy_fn)
    if step % 20 == 0:
        print(f"Step {step}: energy={energy_fn(params):.4f}, mode={opt.current_mode}")
opt.end()
Federated Learning
import numpy as np
from mobiu_q import MobiuOptimizer
LICENSE_KEY = "your-license-key-here"
# Placeholder objective for illustration (a quadratic bowl); replace with your model
def compute_gradient(params):
    return 2 * params

def compute_global_loss(params):
    return float(np.sum(params ** 2))

# Simulate federated aggregation with non-IID clients
class FederatedTrainer:
    def __init__(self, dim, n_clients=10, non_iid_strength=0.5):
        self.n_clients = n_clients
        self.non_iid = non_iid_strength
        # Each client gets a fixed gradient bias to mimic non-IID data
        self.client_biases = [np.random.randn(dim) * non_iid_strength
                              for _ in range(n_clients)]

    def aggregate_gradients(self, params, sampled_clients):
        grads = []
        for c in sampled_clients:
            local_grad = compute_gradient(params) + self.client_biases[c]
            grads.append(local_grad)
        return np.mean(grads, axis=0)

params = np.random.randn(100)
trainer = FederatedTrainer(dim=len(params))
opt = MobiuOptimizer(
    params,
    license_key=LICENSE_KEY,
    method="standard",  # or "auto"!
    base_lr=0.01
)
for round_idx in range(100):
    # Sample 5 of the clients for this round
    clients = np.random.choice(trainer.n_clients, size=5, replace=False)
    gradient = trainer.aggregate_gradients(params, clients)
    loss = compute_global_loss(params)
    params = opt.step(params, gradient, loss)
opt.end()
Reinforcement Learning (REINFORCE)
import torch
import gymnasium as gym
from mobiu_q import MobiuOptimizer
LICENSE_KEY = "your-license-key-here"
policy = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 4)
)
base_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt = MobiuOptimizer(
    base_opt,
    license_key=LICENSE_KEY,
    method="adaptive",
    maximize=True,
    sync_interval=50,
    verbose=True
)
env = gym.make("LunarLander-v3")
for episode in range(1000):
    state, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.FloatTensor(state))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    # REINFORCE update
    returns = []
    G = 0
    for r in reversed(rewards):
        G = r + 0.99 * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = sum(-lp * G for lp, G in zip(log_probs, returns))
    opt.zero_grad()
    loss.backward()
    opt.step(sum(rewards))
opt.end()
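The return computation in the REINFORCE update above follows the recursion G_t = r_t + γ·G_{t+1}, evaluated backwards from the end of the episode. The standalone helper below isolates that step so it can be checked by hand; `discounted_returns` is a name introduced here for illustration, and it appends and reverses rather than inserting at the front, which avoids quadratic list work on long episodes.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```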
Quantum Chemistry (VQE with Qiskit)
import numpy as np
from qiskit.circuit.library import EfficientSU2
from qiskit.quantum_info import SparsePauliOp
from qiskit_aer import AerSimulator
from qiskit.primitives import BackendEstimatorV2
from mobiu_q import MobiuOptimizer
LICENSE_KEY = "your-license-key-here"
hamiltonian = SparsePauliOp.from_list([
    ("II", -0.4804), ("ZZ", 0.3435), ("ZI", -0.4347),
    ("IZ", 0.5716), ("XX", 0.0910), ("YY", 0.0910)
])
backend = AerSimulator()
estimator = BackendEstimatorV2(backend=backend)
estimator.options.default_shots = 4096
ansatz = EfficientSU2(2, reps=2, entanglement="linear")
params = np.random.uniform(-0.3, 0.3, ansatz.num_parameters)
opt = MobiuOptimizer(
    params,
    license_key=LICENSE_KEY,
    method="standard",  # or "auto"!
    mode="hardware",
    use_soft_algebra=True
)
for step in range(100):
    # Random +/-1 perturbation direction for the gradient estimate
    delta = np.random.choice([-1, 1], size=len(params))
    shift = 0.1
    job = estimator.run([
        (ansatz, hamiltonian, params),
        (ansatz, hamiltonian, params + shift * delta),
        (ansatz, hamiltonian, params - shift * delta)
    ])
    results = job.result()
    energy = float(results[0].data.evs)
    grad = (float(results[1].data.evs) - float(results[2].data.evs)) / (2 * shift) * delta
    params = opt.step(params, grad, energy)
opt.end()
print(f"Final energy: {energy:.4f}")
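The gradient in the VQE loop above is a simultaneous-perturbation (SPSA-style) estimate: every parameter is shifted at once along a random ±1 direction, so each step needs only two extra energy evaluations regardless of the number of parameters. The sketch below reproduces that estimator on a classical quadratic so it can be run without Qiskit; `spsa_gradient` and `quadratic` are names introduced here for illustration.

```python
import numpy as np


def spsa_gradient(f, params, shift=0.1, rng=None):
    """Estimate the gradient of f at params from two evaluations along a random +/-1 direction."""
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.choice([-1.0, 1.0], size=len(params))
    diff = f(params + shift * delta) - f(params - shift * delta)
    return diff / (2 * shift) * delta


def quadratic(p):
    return float(np.sum(p ** 2))   # stand-in for the measured energy


rng = np.random.default_rng(0)
params = np.array([1.0, -2.0, 0.5])
# A single estimate is noisy; averaging many approaches the true gradient 2 * params
estimate = np.mean([spsa_gradient(quadratic, params, rng=rng) for _ in range(5000)], axis=0)
print(np.round(estimate, 1))
```

Each individual estimate is noisy, which is one reason to pair it with an optimizer designed for noisy gradients.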
Base Optimizers
| Optimizer | Description | Best For |
|---|---|---|
| Adam | Adaptive moments | Default, most cases |
| AdamW | Adam with weight decay | LLM, Transformers |
| NAdam | Adam with Nesterov | Alternative to Adam |
| AMSGrad | Adam with stability | Drug discovery |
| SGD | Simple gradient descent | QAOA, convex |
| Momentum | SGD with momentum | RL, LLM fine-tuning |
| LAMB | Layer-wise adaptive | Large batch |
🛠️ Troubleshooting
1. Don't Know Which Method to Use?
# Just use AUTO!
opt = MobiuOptimizer(base_opt, method="auto")
2. Switch Base Optimizer
| Problem Type | Recommended |
|---|---|
| LoRA / LLM | Momentum or AdamW |
| VQE / Chemistry | Adam |
| QAOA | NAdam |
| RL / Trading | Momentum |
| Drug Discovery | AMSGrad |
3. Adjust Learning Rate
# Try lower LR if diverging
base_opt = torch.optim.Adam(model.parameters(), lr=0.0001)
# Try higher LR if stuck
base_opt = torch.optim.Adam(model.parameters(), lr=0.001)
MobiuAttention 🧪
Why?
Standard Transformer attention is O(N²). MobiuAttention is O(N).
| Seq Length | Transformer | MobiuAttention | Speedup |
|---|---|---|---|
| 2,048 | 21s | 9s | 2.3x |
| 4,096 | 39s | 10s | 3.9x |
| 8,192 | 42s | 7s | 6.0x |
| 16,384 | OOM 💥 | 5s ✅ | ∞ |
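For intuition on how attention can cost O(N) instead of O(N²), the sketch below shows generic kernelized linear attention: with a positive feature map φ, the product φ(Q)(φ(K)ᵀV) is evaluated right-to-left, so the N × N score matrix is never formed. This is a standard technique used here as an illustration; it is an assumption that MobiuAttention works along similar lines, and the `elu(x) + 1` feature map is just one common choice.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1 keeps features positive; an assumed choice for this sketch
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """O(N) in sequence length: never builds the N x N score matrix."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v                       # (d, d_v) summary of all keys and values
    z = q @ k.sum(axis=0)              # (N,) normalizer per query
    return (q @ kv) / z[:, None]


def quadratic_attention(q, k, v):
    """Same kernel, computed the O(N^2) way for comparison."""
    q, k = feature_map(q), feature_map(k)
    scores = q @ k.T                   # (N, N)
    return (scores / scores.sum(axis=1, keepdims=True)) @ v


rng = np.random.default_rng(0)
n, d = 256, 16
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
print(np.allclose(linear_attention(q, k, v), quadratic_attention(q, k, v)))  # True
```

Both functions return the same result; only the order of the matrix products differs, which is what removes the quadratic memory and compute cost.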
Usage
import torch
import torch.nn as nn
from mobiu_q.experimental import MobiuBlock

# No license key needed - runs locally!
class LongContextLM(nn.Module):
    def __init__(self, vocab, d=512, h=8, layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.blocks = nn.Sequential(*[MobiuBlock(d, h) for _ in range(layers)])
        self.head = nn.Linear(d, vocab)

    def forward(self, x):
        return self.head(self.blocks(self.embed(x)))

# Works with 16K+ tokens!
model = LongContextLM(50000)
x = torch.randint(0, 50000, (1, 16384))
out = model(x)  # No OOM!
Full Examples
| File | Domain | Description |
|---|---|---|
| test_lunarlander_hybrid.py | RL | LunarLander with REINFORCE |
| test_mujoco_maximize.py | RL | MuJoCo continuous control |
| ppo_mobiu_test.py | RL | PPO from scratch |
| crypto_trading_benchmark.py | Trading | Crypto with regime switching |
| test_fakefez_h2.py | VQE | H₂ molecule on FakeFez |
| test_fakefez_qaoa.py | QAOA | MaxCut optimization |
| test_federated_fair.py | FL | Federated learning |
| test_noisy_labels_fair.py | Noisy | Noisy labels |
| test_sim_to_real_fair.py | Robotics | Sim-to-real |
| test_imbalanced_fair.py | Classification | Imbalanced data |
License
| Tier | API Calls | Price | Get Started |
|---|---|---|---|
| Free | 20/month | $0 | Sign up |
| Pro | Unlimited | $19/month | Get one |
Note: MobiuAttention and AUTO mode run locally, no API calls required.
Citation
@software{mobiu_q,
  title={Mobiu-Q: Soft Algebra for Optimization and Attention},
  author={Mobiu Technologies},
  year={2026},
  url={https://mobiu.ai}
}