Efficient Multimodal Vision-Language Model with MoE Architecture
Project description
Nthuku-Fast
Efficient Multimodal Vision-Language Model with Mixture of Experts (MoE) Architecture
Features
✨ High Performance
- Flash Attention for 2-4x speedup
- Extended 8K context window (32x larger)
- Optimized MoE routing (20-30% faster)
💰 Cost Effective
- Prompt caching (10x cost reduction)
- ~8B active parameters (efficient)
- 90%+ cache hit rates
🧠 Advanced Capabilities
- Vision understanding
- Text generation
- Speculative decoding (2-3x faster)
- Thinking traces / chain-of-thought
Installation
From PyPI (once published)
pip install nthuku-fast
From source
git clone https://github.com/elijahnzeli1/Nthuku-fast_v2.git
cd Nthuku-fast_v2/nthuku-fast-package
pip install -e .
Local installation (development)
cd nthuku-fast-package
pip install -e .
Quick Start
from nthuku_fast import create_nthuku_fast_model
import torch
# Create model (all optimizations enabled by default)
model = create_nthuku_fast_model(
hidden_dim=512,
num_experts=8,
top_k_experts=2
)
# Or use presets for different sizes
model = create_nthuku_fast_model(preset="150M") # 150M parameters
# Generate text from image
pixel_values = torch.randn(1, 3, 224, 224)
text = model.generate_text(
pixel_values,
max_length=100,
use_cache=True, # Enable prompt caching
show_thinking=False # Show reasoning traces
)
Model Presets
# 50M parameters (default)
model = create_nthuku_fast_model(preset="50M")
# 150M parameters (recommended)
model = create_nthuku_fast_model(preset="150M")
# 500M parameters (high capacity)
model = create_nthuku_fast_model(preset="500M")
# 1B parameters (maximum)
model = create_nthuku_fast_model(preset="1B")
Advanced Features
Prompt Caching
# Get cache statistics
stats = model.get_cache_stats()
print(f"Cache hit rate: {stats['hit_rate']:.2%}")
Speculative Decoding
from nthuku_fast import SpeculativeDecoder
spec_decoder = SpeculativeDecoder(model, num_speculative_tokens=4)
generated, stats = spec_decoder.generate(
input_ids, vision_features,
max_new_tokens=100,
show_stats=True
)
Thinking Traces
# Enable visible reasoning
text = model.generate_text(
pixel_values,
show_thinking=True # Shows step-by-step reasoning
)
Training
from nthuku_fast import train_nthuku_fast, MultiDatasetManager
# Load datasets
dataset_manager = MultiDatasetManager()
data_sources = dataset_manager.load_all_datasets()
# Train
results = train_nthuku_fast(
model=model,
data_sources=data_sources,
batch_size=8,
num_epochs=10,
learning_rate=2e-4
)
Performance
| Feature | Improvement |
|---|---|
| Flash Attention | 2-4x faster |
| Extended Context | 32x longer (8K tokens) |
| Optimized MoE | 20-30% faster |
| Prompt Caching | 10x cost reduction |
| Speculative Decoding | 2-3x faster generation |
Combined: 5-7x faster, 81% cheaper!
Requirements
- Python ≥ 3.8
- PyTorch ≥ 2.0.0 (for Flash Attention)
- transformers ≥ 4.30.0
- Other dependencies (auto-installed)
License
MIT License
Citation
@software{nthuku_fast,
title={Nthuku-Fast: Efficient Multimodal Vision-Language Model},
author={Nthuku Team},
year={2025},
url={https://github.com/elijahnzeli1/Nthuku-fast_v2}
}
Links
- GitHub: https://github.com/elijahnzeli1/Nthuku-fast_v2
- Documentation: [Coming soon]
- HuggingFace: https://huggingface.co/Qybera/nthuku-fast-1.5
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
nthuku_fast-0.1.0.tar.gz
(39.1 kB
view details)
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nthuku_fast-0.1.0.tar.gz.
File metadata
- Download URL: nthuku_fast-0.1.0.tar.gz
- Upload date:
- Size: 39.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4c3b2f1bc43f078584480ca8b3e281d7700cfe1a4735e7a37fe29a0377b1d589
|
|
| MD5 |
d96e1416f9fa37ee8ca177ed196b1e99
|
|
| BLAKE2b-256 |
0683fb4a58eddc0766f3104965224b64080dc8ff7c666bf980f077a30c5d8c80
|
File details
Details for the file nthuku_fast-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nthuku_fast-0.1.0-py3-none-any.whl
- Upload date:
- Size: 39.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
42f76d0ecb68005b89f9d7c5048008e8949562337e8053524caac3ac42530b82
|
|
| MD5 |
9f6c442cc296400d45d2833a0cf75d66
|
|
| BLAKE2b-256 |
45ab6f1dadb7e6a78d775b37e0f82feada0f59cdbd2b657318d996fc9e3629a1
|