LlamaMlx-RS

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License

High-performance MLX models in Rust for Apple Silicon

Overview

LlamaMlx-RS is a comprehensive Rust ecosystem for running MLX models on Apple Silicon devices. It provides efficient, type-safe Rust bindings to Apple's MLX framework along with high-level libraries for different ML tasks.

The ecosystem consists of the following components:

Core Library: llamamlx-rs - Rust bindings to MLX with tensor operations, device management, and model loading
ML Libraries:
- llama-textgen-rs - Text generation with LLMs
- llama-embed-rs - Text embedding generation
- llama-image-rs - Computer vision tasks (classification, detection, segmentation)
- llama-vlm-rs - Vision-language models for multimodal processing
Utility Libraries:
- llama-shard-rs - Model sharding for distributed inference
- llama-arxiv-rs - ArXiv paper downloading and processing
- llama-moonlight-rs - Web scraping and CAPTCHA solving
Integration Tools:
- Server applications
- CLI tools
- Example applications

Features

🚀 High Performance: Optimized for Apple Silicon M1/M2/M3 chips
🔄 Easy Conversion: Utilities for converting models from PyTorch/ONNX to MLX
📦 Production-Ready: Comprehensive error handling, performance monitoring, and testing
🌐 Distributed Inference: Shard large models across multiple devices
🔌 API Compatibility: Server endpoints intended as a drop-in replacement for popular APIs such as OpenAI's
📊 Flexible I/O: Load and save models, weights, and tensors in various formats
📈 Visualization: Rich tools for visualizing tensors, model outputs, and performance metrics
🧩 Modular Design: Use only the components you need

Installation

Prerequisites

macOS 13+ with Apple Silicon (M1/M2/M3)
Rust 1.75+
Xcode Command Line Tools
Python 3.9+ (for model conversion)
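
The platform prerequisite can be checked from Rust itself. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of LlamaMlx-RS. It checks the build target (macOS on aarch64) but not the macOS version.

```rust
/// Returns true when this code was compiled for macOS on 64-bit ARM,
/// which is the target used by Apple Silicon machines.
pub fn is_apple_silicon_target() -> bool {
    cfg!(all(target_os = "macos", target_arch = "aarch64"))
}

/// Builds a short message describing whether the build target matches
/// the platform prerequisite. The macOS version is not inspected here.
pub fn platform_report() -> String {
    if is_apple_silicon_target() {
        "target OK: macOS on aarch64".to_string()
    } else {
        format!(
            "unsupported target: {} on {}",
            std::env::consts::OS,
            std::env::consts::ARCH
        )
    }
}
```

Calling `platform_report()` early in a build script or in `main` gives a clearer failure than a linker error later on.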

Setting up the Ecosystem

# Clone the repository
git clone https://github.com/llamamlx-rs/llamamlx-rs.git
cd llamamlx-rs

# Run the setup script
./setup-ecosystem.sh

# Build all components
cargo build --release

Quick Start

Text Generation with Llama 3

use llamamlx_rs::device::Device;
use llama_textgen::{TextGenerator, GenerationOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a text generator with Llama 3
    let generator = TextGenerator::new_from_path(
        "models/Llama-3-8B-iq4", 
        &Device::gpu(0)
    )?;

    // Generate text
    let options = GenerationOptions {
        temperature: 0.7,
        top_p: 0.9,
        max_tokens: 100,
        stop_sequences: vec!["\n\n".to_string()],
    };
    
    let result = generator.generate(
        "Explain quantum computing in simple terms:", 
        &options
    )?;
    
    println!("{}", result.text);

    Ok(())
}
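
For readers unfamiliar with the `temperature` and `top_p` options, here is a sketch of what these parameters conventionally mean in LLM sampling. It is plain Rust with no dependencies and is not taken from the llama-textgen-rs source; the function names are hypothetical, and the library's actual sampler may differ.

```rust
/// Converts raw logits to probabilities after dividing by `temperature`.
/// Lower temperatures sharpen the distribution; `temperature` must be > 0.
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum logit for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Nucleus (top-p) filtering: keeps the most probable tokens until their
/// cumulative probability reaches `top_p`, then renormalises the survivors.
/// Returns `(token_index, probability)` pairs, most probable first.
pub fn top_p_filter(probs: &[f32], top_p: f32) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    // Sort by probability, descending.
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));

    let mut kept = Vec::new();
    let mut cumulative = 0.0f32;
    for (index, p) in indexed {
        kept.push((index, p));
        cumulative += p;
        if cumulative >= top_p {
            break;
        }
    }

    let total: f32 = kept.iter().map(|(_, p)| p).sum();
    kept.into_iter().map(|(i, p)| (i, p / total)).collect()
}
```

A sampler would then draw the next token at random from the filtered distribution. With `top_p: 0.9`, low-probability tail tokens are never drawn, which tends to reduce incoherent output.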

Image Classification

use llamamlx_rs::device::Device;
use llama_image::{
    image::Image,
    classification::ImageClassifier,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load an image
    let image = Image::from_file("examples/cat.jpg")?;
    
    // Create a classifier with MobileNet
    let classifier = ImageClassifier::new_from_path(
        "models/mobilenet-v2-mlx",
        Some("models/mobilenet-v2-mlx/labels.txt"),
        &Device::gpu(0)
    )?;
    
    // Classify the image
    let result = classifier.classify(&image)?;
    
    println!("Class: {} ({:.2}% confidence)", 
        result.class_name, 
        result.confidence * 100.0
    );

    Ok(())
}
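
The example above passes a labels file and reads back a class name and confidence. The sketch below shows one plausible way such a mapping works, assuming the labels file holds one label per line with the line index equal to the class index. That file layout and both function names are assumptions, not documented behaviour of llama-image-rs.

```rust
/// Parses a labels file body into a list of labels, one per non-empty line.
pub fn parse_labels(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Picks the class with the highest probability and pairs it with its label.
/// Returns `None` if there are no probabilities or no label for the winner.
pub fn top_class<'a>(probabilities: &[f32], labels: &'a [String]) -> Option<(&'a str, f32)> {
    let (index, &confidence) = probabilities
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))?;
    labels.get(index).map(|name| (name.as_str(), confidence))
}
```

Returning `Option` rather than panicking keeps a mismatched labels file (fewer labels than model outputs) from crashing the caller.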

Visualization

use llamamlx_rs::{
    tensor::Array,
    visualization::{
        terminal::print_tensor_heatmap,
        file::save_classification_tsv,
    },
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a sample 2D tensor
    let data = vec![
        0.1, 0.2, 0.3, 0.4,
        0.5, 0.9, 0.8, 0.7,
        0.2, 0.3, 0.8, 0.5,
        0.4, 0.5, 0.6, 0.1,
    ];
    let tensor = Array::from_slice(&data, [4, 4]);
    
    // Display as a heatmap in the terminal
    print_tensor_heatmap(&tensor, Some("Sample Heatmap"), None, None)?;
    
    // Create classification results
    let categories = vec![
        ("Cat".to_string(), 0.85),
        ("Dog".to_string(), 0.12), 
        ("Bird".to_string(), 0.03),
    ];
    
    // Save classification results to TSV file
    save_classification_tsv(&categories, "classification.tsv")?;
    
    Ok(())
}
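
To give a feel for what a terminal heatmap does, here is a dependency-free sketch that maps each value to a character by intensity. It is an illustration only: `render_heatmap` and the shade palette are hypothetical, and the real `print_tensor_heatmap` may render quite differently (for example with colour).

```rust
/// Characters ordered from lowest to highest intensity.
const SHADES: [char; 5] = [' ', '.', ':', '*', '#'];

/// Renders row-major `data` of shape `rows x cols` as text, one line per row.
/// Values are min-max normalised before being mapped to a shade.
/// Returns `None` if the shape does not match the data length.
pub fn render_heatmap(data: &[f64], rows: usize, cols: usize) -> Option<String> {
    if rows == 0 || cols == 0 || data.len() != rows * cols {
        return None;
    }
    let min = data.iter().copied().fold(f64::INFINITY, f64::min);
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;

    let mut out = String::new();
    for row in data.chunks(cols) {
        for &value in row {
            // Normalise to [0, 1]; a constant tensor maps to the lowest shade.
            let t = if range > 0.0 { (value - min) / range } else { 0.0 };
            let index = (t * (SHADES.len() - 1) as f64).round() as usize;
            out.push(SHADES[index]);
        }
        out.push('\n');
    }
    Some(out)
}
```

Printing `render_heatmap(&data, 4, 4)` for the 4x4 sample above would mark the 0.9 cell with `#` and the two 0.1 cells with a space.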

Using the CLI for Visualization

# Generate a heatmap visualization of a tensor from a CSV file
llamamlx visualize --input tensor_data.csv --viz-type heatmap --terminal

# Generate a classification visualization from a JSON file
llamamlx visualize --input classification.json --viz-type classification --terminal

# Create an HTML report from model results
llamamlx visualize --input results.json --viz-type report --html --output report.html

# Create a PNG plot with a specific color scheme
llamamlx visualize --input tensor_data.csv --viz-type heatmap --output heatmap.png --color-scheme viridis
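
The CSV layout expected by `--input` is not specified here. As a sketch, assuming a headerless file of comma-separated numeric rows of equal length, a tensor could be read like this (the function name is hypothetical and uses only the standard library):

```rust
/// Parses headerless CSV text into row-major data plus a `[rows, cols]` shape.
/// Blank lines are skipped; ragged rows and non-numeric cells are errors.
pub fn parse_csv_matrix(text: &str) -> Result<(Vec<f64>, [usize; 2]), String> {
    let mut data = Vec::new();
    let mut rows = 0usize;
    let mut cols = 0usize;

    for (line_no, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let values = line
            .split(',')
            .map(|cell| cell.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, _>>()
            .map_err(|e| format!("line {}: {}", line_no + 1, e))?;

        if rows == 0 {
            // The first data row fixes the column count.
            cols = values.len();
        } else if values.len() != cols {
            return Err(format!(
                "line {}: expected {} columns, found {}",
                line_no + 1,
                cols,
                values.len()
            ));
        }
        data.extend(values);
        rows += 1;
    }
    Ok((data, [rows, cols]))
}
```

The returned flat vector and shape match the arguments used with `Array::from_slice` in the Visualization example.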

Distributed Inference

use llamamlx_rs::device::Device;
use llama_shard::{
    config::ShardingConfig,
    coordinator::Coordinator,
    sharding::ShardStrategy,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create sharding configuration
    let config = ShardingConfig::new(
        "models/Llama-3-8B-iq4".into(),
        2,  // number of shards
        ShardStrategy::LayerSharding,
    );
    
    // Create and start coordinator
    let coordinator = Coordinator::new(config)?;
    coordinator.start().await?;
    
    // In a production setup, you would run workers on different machines.
    // For this example, start local workers with the commands printed below.
    
    println!("Coordinator ready at localhost:50051");
    println!("Run worker instances with:");
    println!("  cargo run --bin llamashard -- worker --shard-id 0 --coordinator localhost:50051");
    println!("  cargo run --bin llamashard -- worker --shard-id 1 --coordinator localhost:50051");
    
    // Wait for Ctrl+C
    tokio::signal::ctrl_c().await?;
    coordinator.shutdown().await?;

    Ok(())
}
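
What `ShardStrategy::LayerSharding` does internally is not described here. A common interpretation of layer sharding is to give each shard a contiguous block of transformer layers, balanced as evenly as possible. The sketch below illustrates that idea only; `layer_ranges` is a hypothetical helper, not part of llama-shard-rs.

```rust
use std::ops::Range;

/// Splits `num_layers` into `num_shards` contiguous ranges whose lengths
/// differ by at most one. Earlier shards receive the extra layers.
pub fn layer_ranges(num_layers: usize, num_shards: usize) -> Vec<Range<usize>> {
    if num_shards == 0 {
        return Vec::new();
    }
    let base = num_layers / num_shards;
    let extra = num_layers % num_shards;

    let mut start = 0;
    (0..num_shards)
        .map(|shard| {
            // The first `extra` shards each take one additional layer.
            let len = base + usize::from(shard < extra);
            let range = start..start + len;
            start += len;
            range
        })
        .collect()
}
```

For a 32-layer model split into 2 shards, this yields layers 0..16 for shard 0 and 16..32 for shard 1. Contiguous ranges mean each worker only needs to pass activations to the next worker once per forward pass.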

Available Models

Text Models

Model	Size	Quantization	Performance (tokens/sec)
Llama 3 Instruct	8B	Q4	~30 (M2 Pro)
Llama 3 Instruct	8B	Q8	~25 (M2 Pro)
Llama 2 Chat	7B	Q4	~35 (M2 Pro)
Mistral Instruct	7B	Q4	~32 (M2 Pro)
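
As a rough guide to reading this table, the arithmetic below estimates weight memory from parameter count and quantization width, and generation time from throughput. It is a back-of-the-envelope sketch: it ignores quantization metadata (scales, zero points), the KV cache, and activations, so real usage will be higher. The function names are illustrative.

```rust
/// Approximate bytes needed to store the weights alone.
pub fn approx_weight_bytes(num_params: u64, bits_per_weight: u64) -> u64 {
    num_params * bits_per_weight / 8
}

/// Approximate seconds to generate `num_tokens` at a steady throughput.
pub fn generation_seconds(num_tokens: u32, tokens_per_sec: f64) -> f64 {
    f64::from(num_tokens) / tokens_per_sec
}
```

Under these assumptions an 8B-parameter model needs roughly 4 GB of weights at 4 bits and 8 GB at 8 bits, and generating 100 tokens at ~30 tokens/sec takes about 3.3 seconds.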

Vision Models

Model	Task	Size	Performance (images/sec)
MobileNet V2	Classification	14MB	~90 (M2 Pro)
YOLOv8n	Detection	25MB	~45 (M2 Pro)
SegFormer-B0	Segmentation	14MB	~30 (M2 Pro)

Multimodal Models

Model	Tasks	Size	Performance
LLaVA 1.6	VQA, Captioning	8.5GB	~5 img/sec (M2 Pro)
MobileVLM	VQA, Captioning	1.5GB	~12 img/sec (M2 Pro)

Documentation

Architecture

The LlamaMlx-RS ecosystem is designed with a modular architecture:

┌───────────────────────────────────────────────────────────┐
│                      Applications                          │
│   ┌─────────────┐  ┌──────────────┐  ┌───────────────┐    │
│   │ REST Server │  │ CLI Tools    │  │ GUI Apps      │    │
│   └─────────────┘  └──────────────┘  └───────────────┘    │
└───────────────────────────────────────────────────────────┘
               │              │              │
┌───────────────────────────────────────────────────────────┐
│                     ML Libraries                           │
│ ┌──────────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ ┌──────────┐ │
│ │TextGen   │ │Embedding│ │Image    │ │VLM  │ │Sharding  │ │
│ └──────────┘ └─────────┘ └─────────┘ └─────┘ └──────────┘ │
└───────────────────────────────────────────────────────────┘
                           │
┌───────────────────────────────────────────────────────────┐
│                       Core Library                         │
│  ┌────────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐  │
│  │Tensor  │ │Device   │ │Model   │ │Graph   │ │Ops     │  │
│  └────────┘ └─────────┘ └────────┘ └────────┘ └────────┘  │
└───────────────────────────────────────────────────────────┘
                           │
┌───────────────────────────────────────────────────────────┐
│                     Apple MLX Framework                    │
└───────────────────────────────────────────────────────────┘
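
The layering in the diagram can be illustrated with a small trait-based sketch: a higher-level library depends only on an abstraction of the core layer, so either side can be swapped or tested alone. All names below (`Backend`, `MockBackend`, `Embedder`) are hypothetical and exist only for this illustration; they are not the actual LlamaMlx-RS types.

```rust
/// Hypothetical core-layer abstraction: anything that can run a forward pass.
pub trait Backend {
    fn forward(&self, input: &[f32]) -> Vec<f32>;
}

/// Stand-in for the core library. It doubles every value instead of
/// calling into MLX, which makes the upper layer testable on any machine.
pub struct MockBackend;

impl Backend for MockBackend {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * 2.0).collect()
    }
}

/// Hypothetical ML-library layer that is written only against `Backend`.
pub struct Embedder<B: Backend> {
    backend: B,
}

impl<B: Backend> Embedder<B> {
    pub fn new(backend: B) -> Self {
        Self { backend }
    }

    /// Runs the backend and L2-normalises the result.
    /// An all-zero output is returned unchanged to avoid dividing by zero.
    pub fn embed(&self, features: &[f32]) -> Vec<f32> {
        let raw = self.backend.forward(features);
        let norm = raw.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm == 0.0 {
            raw
        } else {
            raw.iter().map(|x| x / norm).collect()
        }
    }
}
```

An application at the top of the diagram would construct `Embedder::new(...)` with a real backend and never touch the core layer directly.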

Performance

LlamaMlx-RS is designed to make full use of Apple Silicon. In the figures below, its throughput is comparable to or slightly better than Python-based MLX implementations (about 6-9% higher):

Model	Task	LlamaMlx-RS	Python MLX	LlamaMlx-RS vs Python
Llama 3 (8B)	Generation	30 tok/s	28 tok/s	1.07x faster
MobileNet	Image	90 img/s	85 img/s	1.06x faster
Embedding	Embed	250 txt/s	230 txt/s	1.09x faster

Contributing

Contributions are welcome! Please check out our contribution guidelines for details.

License

LlamaMlx-RS is licensed under the MIT License.

Acknowledgements

Apple MLX Team - For creating the MLX framework
Rust Community - For the amazing language and tools
Hugging Face - For model weights and architectures
All Contributors

Release history

This version

0.1.0rc180 pre-release

Apr 4, 2025

Download files

Source Distribution

llamamlx_rs_llamasearch-0.1.0rc180.tar.gz (6.7 kB)

Uploaded Apr 4, 2025 Source

Built Distribution

llamamlx_rs_llamasearch-0.1.0rc180-py3-none-any.whl (6.4 kB)

Uploaded Apr 4, 2025 Python 3

File details

Details for the file llamamlx_rs_llamasearch-0.1.0rc180.tar.gz.

File metadata

Download URL: llamamlx_rs_llamasearch-0.1.0rc180.tar.gz
Upload date: Apr 4, 2025
Size: 6.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for llamamlx_rs_llamasearch-0.1.0rc180.tar.gz
Algorithm	Hash digest
SHA256	`c7a09b184a21c7d3a9e73a83bbde2b5b52c15c9f30ae30727b3f63c4d40fdef8`
MD5	`b652bbcf427ff7cbdda96c55292abdfd`
BLAKE2b-256	`f9a4b505dac9b9ca77040703f15b8c5a973539454bc964cda156cc135d78ba17`

File details

Details for the file llamamlx_rs_llamasearch-0.1.0rc180-py3-none-any.whl.

File metadata

Download URL: llamamlx_rs_llamasearch-0.1.0rc180-py3-none-any.whl
Upload date: Apr 4, 2025
Size: 6.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for llamamlx_rs_llamasearch-0.1.0rc180-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fa6c79339ab13744833dc07d184250fef8a0bf8c6fac919b788c3368e593792e`
MD5	`09ba71fa1c650cb13cabe89e5aada1ee`
BLAKE2b-256	`8249833c8e8e6895be23b16b8f1cd91d0e42823590a6078381c368e47c104667`
