LlamaMlx-RS
High-performance MLX models in Rust for Apple Silicon
Overview
LlamaMlx-RS is a comprehensive Rust ecosystem for running MLX models on Apple Silicon devices. It provides efficient, type-safe Rust bindings to Apple's MLX framework along with high-level libraries for different ML tasks.
The ecosystem consists of the following components:
- Core Library:
  - llamamlx-rs - Rust bindings to MLX with tensor operations, device management, and model loading
- ML Libraries:
  - llama-textgen-rs - Text generation with LLMs
  - llama-embed-rs - Text embedding generation
  - llama-image-rs - Computer vision tasks (classification, detection, segmentation)
  - llama-vlm-rs - Vision-language models for multimodal processing
- Utility Libraries:
  - llama-shard-rs - Model sharding for distributed inference
  - llama-arxiv-rs - ArXiv paper downloading and processing
  - llama-moonlight-rs - Web scraping and CAPTCHA solving
- Integration Tools:
  - Server applications
  - CLI tools
  - Example applications
Features
- High Performance: Optimized for Apple Silicon M1/M2/M3 chips
- Easy Conversion: Utilities for converting models from PyTorch/ONNX to MLX
- Production-Ready: Comprehensive error handling, performance monitoring, and testing
- Distributed Inference: Shard large models across multiple devices
- API Compatibility: Drop-in replacement for popular APIs like OpenAI
- Flexible I/O: Load and save models, weights, and tensors in various formats
- Visualization: Rich tools for visualizing tensors, model outputs, and performance metrics
- Modular Design: Use only the components you need
Installation
Prerequisites
- macOS 13+ with Apple Silicon (M1/M2/M3)
- Rust 1.75+
- Xcode Command Line Tools
- Python 3.9+ (for model conversion)
Setting up the Ecosystem
# Clone the repository
git clone https://github.com/llamamlx-rs/llamamlx-rs.git
cd llamamlx-rs
# Run the setup script
./setup-ecosystem.sh
# Build all components
cargo build --release
Quick Start
Text Generation with Llama 3
use llamamlx_rs::device::Device;
use llama_textgen::{TextGenerator, GenerationOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a text generator with Llama 3
    let generator = TextGenerator::new_from_path(
        "models/Llama-3-8B-iq4",
        &Device::gpu(0)
    )?;

    // Generate text
    let options = GenerationOptions {
        temperature: 0.7,
        top_p: 0.9,
        max_tokens: 100,
        stop_sequences: vec!["\n\n".to_string()],
    };
    let result = generator.generate(
        "Explain quantum computing in simple terms:",
        &options
    )?;
    println!("{}", result.text);
    Ok(())
}
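The `temperature` and `top_p` options above control how the next token is sampled. The following is a minimal, std-only sketch of how these two parameters are conventionally applied to a vector of logits. It is not taken from LlamaMlx-RS: the function names `softmax_with_temperature` and `top_p_filter` are hypothetical, and the sketch assumes `temperature` is greater than zero.

```rust
/// Converts raw logits into probabilities after dividing by `temperature`.
/// Lower temperatures sharpen the distribution; higher ones flatten it.
/// Assumes `temperature > 0.0` (hypothetical helper, not a library API).
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Keeps the smallest set of highest-probability tokens whose cumulative
/// probability reaches `top_p`, zeroes the rest, and renormalizes.
fn top_p_filter(probs: &[f32], top_p: f32) -> Vec<f32> {
    // Token indices sorted by descending probability.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    let mut kept = vec![0.0f32; probs.len()];
    let mut cumulative = 0.0f32;
    for &i in &order {
        kept[i] = probs[i];
        cumulative += probs[i];
        if cumulative >= top_p {
            break;
        }
    }
    let total: f32 = kept.iter().sum();
    kept.iter().map(|&p| p / total).collect()
}

fn main() {
    let logits = [2.0, 1.0, 0.5, -1.0];
    let probs = softmax_with_temperature(&logits, 0.7);
    let filtered = top_p_filter(&probs, 0.9);
    println!("probabilities: {:?}", probs);
    println!("after top-p:   {:?}", filtered);
}
```

A sampler would then draw the next token from the filtered distribution; how the library itself implements sampling may differ.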
Image Classification
use llamamlx_rs::device::Device;
use llama_image::{
    image::Image,
    classification::ImageClassifier,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load an image
    let image = Image::from_file("examples/cat.jpg")?;

    // Create a classifier with MobileNet
    let classifier = ImageClassifier::new_from_path(
        "models/mobilenet-v2-mlx",
        Some("models/mobilenet-v2-mlx/labels.txt"),
        &Device::gpu(0)
    )?;

    // Classify the image
    let result = classifier.classify(&image)?;
    println!("Class: {} ({:.2}% confidence)",
        result.class_name,
        result.confidence * 100.0
    );
    Ok(())
}
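To illustrate how a `class_name` and `confidence` pair like the one printed above could be derived, here is a small std-only sketch that picks the most probable class and looks up its label. The `Top1` struct and `top1` function are hypothetical, and the sketch assumes the labels file contains one class name per line in index order, which the source does not state.

```rust
/// Hypothetical result type mirroring the `class_name` and `confidence`
/// fields used in the example above.
#[derive(Debug, PartialEq)]
struct Top1 {
    class_name: String,
    confidence: f32,
}

/// Picks the highest-probability class and looks up its label.
/// Assumes `labels_text` holds one class name per line, in index order.
/// Returns `None` for empty input or when no label exists for the index.
fn top1(probabilities: &[f32], labels_text: &str) -> Option<Top1> {
    let labels: Vec<&str> = labels_text.lines().map(str::trim).collect();
    let (index, &confidence) = probabilities
        .iter()
        .enumerate()
        // Panics on NaN; acceptable for a sketch with well-formed input.
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())?;
    let class_name = labels.get(index)?.to_string();
    Some(Top1 { class_name, confidence })
}

fn main() {
    let labels = "cat\ndog\nbird\n";
    let probabilities = [0.85, 0.12, 0.03];
    if let Some(result) = top1(&probabilities, labels) {
        println!(
            "Class: {} ({:.2}% confidence)",
            result.class_name,
            result.confidence * 100.0
        );
    }
}
```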
Visualization
use llamamlx_rs::{
    tensor::Array,
    visualization::{
        terminal::print_tensor_heatmap,
        file::save_classification_tsv,
    },
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a sample 2D tensor
    let data = vec![
        0.1, 0.2, 0.3, 0.4,
        0.5, 0.9, 0.8, 0.7,
        0.2, 0.3, 0.8, 0.5,
        0.4, 0.5, 0.6, 0.1,
    ];
    let tensor = Array::from_slice(&data, [4, 4]);

    // Display as a heatmap in the terminal
    print_tensor_heatmap(&tensor, Some("Sample Heatmap"), None, None)?;

    // Create classification results
    let categories = vec![
        ("Cat".to_string(), 0.85),
        ("Dog".to_string(), 0.12),
        ("Bird".to_string(), 0.03),
    ];

    // Save classification results to TSV file
    save_classification_tsv(&categories, "classification.tsv")?;
    Ok(())
}
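For readers curious what a terminal heatmap involves, the sketch below renders a row-major matrix as text using only the standard library. It is an illustration of the general idea, not the implementation behind `print_tensor_heatmap`: the `render_heatmap` function and the ASCII shading ramp are assumptions made for this example.

```rust
/// Renders a row-major matrix as text, one character per cell.
/// Values are scaled linearly between the minimum and maximum of `data`,
/// then mapped onto a ramp from light (' ') to dense ('@').
/// Hypothetical helper for illustration only.
fn render_heatmap(data: &[f32], rows: usize, cols: usize) -> String {
    assert!(cols > 0, "at least one column is required");
    assert_eq!(data.len(), rows * cols, "shape does not match data length");
    const RAMP: &[u8] = b" .:-=+*#%@";

    let min = data.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Avoid dividing by zero when every value is identical.
    let range = if max > min { max - min } else { 1.0 };

    let mut out = String::new();
    for row in data.chunks(cols) {
        for &value in row {
            let t = (value - min) / range; // normalized to 0.0..=1.0
            let index = (t * (RAMP.len() - 1) as f32).round() as usize;
            out.push(RAMP[index.min(RAMP.len() - 1)] as char);
        }
        out.push('\n');
    }
    out
}

fn main() {
    // Same 4x4 sample data as the example above.
    let data = [
        0.1, 0.2, 0.3, 0.4,
        0.5, 0.9, 0.8, 0.7,
        0.2, 0.3, 0.8, 0.5,
        0.4, 0.5, 0.6, 0.1,
    ];
    print!("{}", render_heatmap(&data, 4, 4));
}
```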
Using the CLI for Visualization
# Generate a heatmap visualization of a tensor from a CSV file
llamamlx visualize --input tensor_data.csv --viz-type heatmap --terminal
# Generate a classification visualization from a JSON file
llamamlx visualize --input classification.json --viz-type classification --terminal
# Create an HTML report from model results
llamamlx visualize --input results.json --viz-type report --html --output report.html
# Create a PNG plot with a specific color scheme
llamamlx visualize --input tensor_data.csv --viz-type heatmap --output heatmap.png --color-scheme viridis
Distributed Inference
use llamamlx_rs::device::Device;
use llama_shard::{
    config::ShardingConfig,
    coordinator::Coordinator,
    sharding::ShardStrategy,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create sharding configuration
    let config = ShardingConfig::new(
        "models/Llama-3-8B-iq4".into(),
        2, // number of shards
        ShardStrategy::LayerSharding,
    );

    // Create and start coordinator
    let coordinator = Coordinator::new(config)?;
    coordinator.start().await?;

    // In a production setup, you would run workers on different machines
    // For this example, we'll register local workers
    println!("Coordinator ready at localhost:50051");
    println!("Run worker instances with:");
    println!(" cargo run --bin llamashard -- worker --shard-id 0 --coordinator localhost:50051");
    println!(" cargo run --bin llamashard -- worker --shard-id 1 --coordinator localhost:50051");

    // Wait for Ctrl+C
    Ok(tokio::signal::ctrl_c().await.and(Ok(()))?).and(coordinator.shutdown().await.map_err(Into::into))
}
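One plausible reading of `ShardStrategy::LayerSharding` is that each shard owns a contiguous block of transformer layers. The sketch below shows that partitioning step in isolation. It is an assumption about the strategy, not the library's actual algorithm, and `layer_ranges` is a hypothetical helper; the layer count of 32 is used purely as an example input.

```rust
use std::ops::Range;

/// Splits `num_layers` into `num_shards` contiguous ranges whose sizes
/// differ by at most one; earlier shards receive the extra layers.
/// Hypothetical helper for illustration only.
fn layer_ranges(num_layers: usize, num_shards: usize) -> Vec<Range<usize>> {
    assert!(num_shards > 0, "at least one shard is required");
    let base = num_layers / num_shards;
    let extra = num_layers % num_shards;

    let mut ranges = Vec::with_capacity(num_shards);
    let mut start = 0;
    for shard in 0..num_shards {
        // The first `extra` shards each take one additional layer.
        let len = base + usize::from(shard < extra);
        ranges.push(start..start + len);
        start += len;
    }
    ranges
}

fn main() {
    // Example: 32 layers split across the 2 shards configured above.
    for (shard_id, range) in layer_ranges(32, 2).iter().enumerate() {
        println!("shard {shard_id}: layers {}..{}", range.start, range.end);
    }
}
```

Contiguous ranges keep the activations flowing in one direction, so each worker only needs to pass its output to the next shard.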
Available Models
Text Models
| Model | Size | Quantization | Performance (tokens/sec) |
|---|---|---|---|
| Llama 3 Instruct | 8B | Q4 | ~30 (M2 Pro) |
| Llama 3 Instruct | 8B | Q8 | ~25 (M2 Pro) |
| Llama 2 Chat | 7B | Q4 | ~35 (M2 Pro) |
| Mistral Instruct | 7B | Q4 | ~32 (M2 Pro) |
Vision Models
| Model | Task | Size | Performance (images/sec) |
|---|---|---|---|
| MobileNet V2 | Classification | 14MB | ~90 (M2 Pro) |
| YOLOv8n | Detection | 25MB | ~45 (M2 Pro) |
| SegFormer-B0 | Segmentation | 14MB | ~30 (M2 Pro) |
Multimodal Models
| Model | Tasks | Size | Performance |
|---|---|---|---|
| LLaVA 1.6 | VQA, Captioning | 8.5GB | ~5 img/sec (M2 Pro) |
| MobileVLM | VQA, Captioning | 1.5GB | ~12 img/sec (M2 Pro) |
Documentation
- API Reference
- [User Guide](https://llamasearch.ai)
- Examples
- Model Zoo
Architecture
The LlamaMlx-RS ecosystem is designed with a modular architecture:
┌───────────────────────────────────────────────────────────┐
│                       Applications                        │
│  ┌─────────────┐   ┌──────────────┐   ┌───────────────┐   │
│  │ REST Server │   │ CLI Tools    │   │ GUI Apps      │   │
│  └─────────────┘   └──────────────┘   └───────────────┘   │
└───────────────────────────────────────────────────────────┘
          ▼                 ▼                   ▼
┌───────────────────────────────────────────────────────────┐
│                       ML Libraries                        │
│ ┌──────────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ ┌──────────┐ │
│ │TextGen   │ │Embedding│ │Image    │ │VLM  │ │Sharding  │ │
│ └──────────┘ └─────────┘ └─────────┘ └─────┘ └──────────┘ │
└───────────────────────────────────────────────────────────┘
                              ▼
┌───────────────────────────────────────────────────────────┐
│                       Core Library                        │
│  ┌────────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐  │
│  │Tensor  │ │Device   │ │Model   │ │Graph   │ │Ops     │  │
│  └────────┘ └─────────┘ └────────┘ └────────┘ └────────┘  │
└───────────────────────────────────────────────────────────┘
                              ▼
┌───────────────────────────────────────────────────────────┐
│                    Apple MLX Framework                    │
└───────────────────────────────────────────────────────────┘
Performance
LlamaMlx-RS is designed to take full advantage of Apple Silicon. In the benchmarks below it runs roughly 6-9% faster than the equivalent Python-based MLX implementations:
| Model | Task | LlamaMlx-RS | Python MLX | LlamaMlx-RS vs Python |
|---|---|---|---|---|
| Llama 3 (8B) | Generation | 30 tok/s | 28 tok/s | 1.07x faster |
| MobileNet | Image | 90 img/s | 85 img/s | 1.06x faster |
| Embedding | Embed | 250 txt/s | 230 txt/s | 1.09x faster |
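The last column of the table is simply the ratio of the two throughput figures. As a quick check, the following sketch recomputes it from the numbers in the table; the `speedup` function is a hypothetical helper introduced for this example.

```rust
/// Throughput ratio: values above 1.0 mean the Rust figure is higher.
fn speedup(rust: f64, python: f64) -> f64 {
    rust / python
}

fn main() {
    // (model, LlamaMlx-RS throughput, Python MLX throughput) from the table.
    let rows = [
        ("Llama 3 (8B)", 30.0, 28.0),
        ("MobileNet", 90.0, 85.0),
        ("Embedding", 250.0, 230.0),
    ];
    for (model, rust, python) in rows {
        println!("{model}: {:.2}x", speedup(rust, python));
    }
}
```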
Contributing
Contributions are welcome! Please check out our contribution guidelines for details.
License
LlamaMlx-RS is licensed under the MIT License.
Acknowledgements
- Apple MLX Team - For creating the MLX framework
- Rust Community - For the amazing language and tools
- Hugging Face - For model weights and architectures
- All Contributors