SAPPHIRE: High-Performance Compute Acceleration Framework for Apple Silicon

🔥 SAPPHIRE: The NVIDIA CUDA Killer for Apple Silicon 🔥

PyPI version License: MIT Apple Silicon

SAPPHIRE is a CUDA replacement for Apple Silicon that reaches up to 1.56 TFLOPS (SGEMM) on the AMX accelerator. Train and run AI models on a Mac Mini that costs about 50x less and draws about 23x less power than an NVIDIA H100.

🚀 Performance

Operation        SAPPHIRE       NVIDIA H100*
SGEMM            1.56 TFLOPS    60 TFLOPS
Flash Attention  943 GFLOPS     ~20 TFLOPS
Conv2D           1.57 TFLOPS    ~30 TFLOPS
INT8 Quantize    6.3 B elem/s   ~50 B elem/s

H100 costs $30,000 and uses 700W. Mac Mini costs $599 and uses 30W.

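Price: the Mac Mini is about 50x cheaper. On SGEMM throughput per dollar, the advantage is about 1.3x (1.56 TFLOPS / $599 vs 60 TFLOPS / $30,000).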

📦 Installation

pip install sapphire-compute

🔥 Quick Start

import sapphire
import numpy as np

# Matrix multiplication at 1.6 TFLOPS
A = np.random.randn(4096, 4096).astype(np.float32)
B = np.random.randn(4096, 4096).astype(np.float32)
C = sapphire.matmul(A, B)  # Uses AMX!

# Flash Attention V5
Q = np.random.randn(2, 16, 512, 64).astype(np.float32)
K = np.random.randn(2, 16, 512, 64).astype(np.float32)
V = np.random.randn(2, 16, 512, 64).astype(np.float32)
out = sapphire.flash_attention(Q, K, V)

# CUDA compatibility (drop-in replacement!)
cuda = sapphire.cuda
cuda.is_available()  # True on Mac!
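
To check a throughput figure like the ones in the Performance table on your own machine, a small timing helper is enough. The sketch below counts 2·M·N·K floating-point operations per matrix product and times any matmul callable. `sgemm_flops` and `measure_gflops` are hypothetical helper names, not part of SAPPHIRE, and `np.matmul` stands in for `sapphire.matmul` so the snippet runs without the package installed.

```python
import time
import numpy as np

def sgemm_flops(m, n, k):
    """FLOPs in an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k

def measure_gflops(matmul, m=1024, n=1024, k=1024, repeats=5):
    """Time matmul(A, B) and return the best observed rate in GFLOPS (hypothetical helper)."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((m, k), dtype=np.float32)
    b = rng.standard_normal((k, n), dtype=np.float32)
    matmul(a, b)  # warm-up run, excluded from timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return sgemm_flops(m, n, k) / best / 1e9

if __name__ == "__main__":
    # np.matmul is a stand-in; pass sapphire.matmul instead if the package is installed.
    print(f"{measure_gflops(np.matmul):.1f} GFLOPS")
```

Taking the best of several runs rather than the mean reduces noise from other processes. Results depend on matrix size, so comparisons are only meaningful at matching shapes.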

🧠 LLM Inference

from sapphire.llm import LlamaInference

# Load and run Llama on Mac Mini
model = LlamaInference("meta-llama/Llama-2-7b")
output = model.generate("The future of AI is", max_tokens=100)
print(output)

🔗 S-Fabric Clustering

Connect multiple Mac Minis for distributed compute:

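import numpy as np
from sapphire.sfabric import Cluster

# Create cluster over Thunderbolt 5
cluster = Cluster(["mac1:9999", "mac2:9999", "mac3:9999"])
cluster.connect()

# Placeholder gradient buffer; in real training this comes from the backward pass
gradients = np.zeros(1024, dtype=np.float32)

# Distributed training
cluster.allreduce(gradients)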
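
For readers new to the term, an all-reduce combines one array per node and leaves every node holding the same combined result. The sketch below simulates that on a single machine with NumPy. It assumes a sum reduction and says nothing about how `cluster.allreduce` is implemented or whether it averages instead; `simulate_allreduce` is a hypothetical name.

```python
import numpy as np

def simulate_allreduce(per_node_arrays):
    """Return what each node would hold after a sum all-reduce (single-process simulation)."""
    total = np.sum(np.stack(per_node_arrays), axis=0)
    # Every node receives its own copy of the same reduced array.
    return [total.copy() for _ in per_node_arrays]

# Three hypothetical nodes, each with a local gradient vector.
local_grads = [np.array([1.0, 2.0]), np.array([10.0, 20.0]), np.array([100.0, 200.0])]
reduced = simulate_allreduce(local_grads)
print(reduced[0])  # [111. 222.]
```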

🏗️ Architecture

SAPPHIRE Stack
├── Python API (numpy-compatible)
├── Native Library (159 C functions)
│   ├── SGEMM (cblas → AMX)
│   ├── Flash Attention V5
│   ├── Conv2D (cuDNN replacement)
│   ├── Quantization (INT8/INT4)
│   └── cuSOLVER (LU, QR, SVD, Cholesky)
├── Lariat Transpiler (CUDA → Sapphire)
└── S-Fabric RDMA (Multi-Mac clustering)
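
As an illustration of what the Quantization (INT8/INT4) entry in the stack refers to, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization. This is one common scheme, chosen as an assumption; the README does not say which scheme SAPPHIRE uses, and `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover a float32 approximation of the original values."""
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)
# Rounding error is at most half a quantization step (plus float tolerance).
assert np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-6
```

A single scale per tensor keeps the example short. Per-channel scales are a common refinement when value ranges differ widely across channels.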

📊 Benchmarks

Run the full benchmark suite:

python -m sapphire.benchmark

🎯 Key Features

  • 159 Native Functions: Complete ML/AI operation coverage
  • Flash Attention V5: Memory-efficient attention at 943 GFLOPS
  • Zero-Copy UMA: Unified Memory Architecture exploitation
  • Lariat CUDA Transpiler: Run CUDA code unchanged
  • S-Fabric RDMA: Thunderbolt 5 multi-Mac clustering
  • INT8 Quantization: 6.3 billion elements/second
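
Memory-efficient attention in the FlashAttention style avoids materializing the full score matrix by processing keys and values in blocks and keeping a running softmax. The NumPy sketch below shows that idea for a single head and checks it against the naive formula. It illustrates the general technique only; it is not SAPPHIRE's kernel, and the function names and block size are assumptions.

```python
import numpy as np

def naive_attention(q, k, v):
    """softmax(q @ k.T / sqrt(d)) @ v, materializing the full score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

def blocked_attention(q, k, v, block=64):
    """Same result, visiting keys/values one block at a time with an online softmax."""
    n, d = q.shape
    out = np.zeros((n, v.shape[-1]), dtype=np.float64)
    row_max = np.full((n, 1), -np.inf)
    row_sum = np.zeros((n, 1))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = q @ kb.T / np.sqrt(d)
        new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescale earlier partial results
        p = np.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
assert np.allclose(naive_attention(q, k, v), blocked_attention(q, k, v, block=32))
```

The blocked version only ever holds an `n x block` slice of scores, which is where the memory saving comes from.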

🆚 NVIDIA Comparison

Metric     Mac Mini + Sapphire   NVIDIA H100
Cost       $599                  $30,000
Power      30W                   700W
TFLOPS/$   0.0026                0.002
TFLOPS/W   0.052                 0.086
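
The two ratio rows can be reproduced from the figures quoted in this README (1.56 vs 60 TFLOPS for SGEMM, $599 vs $30,000, 30 W vs 700 W). The short calculation below uses only those numbers; the variable names are illustrative.

```python
# Figures as quoted in this README (SGEMM throughput, list price, power draw).
systems = {
    "Mac Mini + Sapphire": {"tflops": 1.56, "cost_usd": 599, "watts": 30},
    "NVIDIA H100": {"tflops": 60.0, "cost_usd": 30_000, "watts": 700},
}

ratios = {
    name: {
        "tflops_per_usd": s["tflops"] / s["cost_usd"],
        "tflops_per_watt": s["tflops"] / s["watts"],
    }
    for name, s in systems.items()
}

for name, r in ratios.items():
    print(f"{name}: {r['tflops_per_usd']:.4f} TFLOPS/$, {r['tflops_per_watt']:.3f} TFLOPS/W")
# Mac Mini + Sapphire: 0.0026 TFLOPS/$, 0.052 TFLOPS/W
# NVIDIA H100: 0.0020 TFLOPS/$, 0.086 TFLOPS/W
```

On these figures the Mac Mini leads by about 1.3x per dollar, and the H100 leads by about 1.6x per watt.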

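Conclusion: On these figures, Sapphire on Mac Mini delivers about 1.3x the SGEMM throughput per dollar of an H100, making it the more cost-effective option per dollar; the H100 remains ahead on throughput per watt.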

📄 License

MIT License - Use freely, no NVIDIA required!

🙏 Credits

Built by Svector Corporation - Making AI accessible to everyone.


🔥 NVIDIA's monopoly is over. The future runs on Apple Silicon. 🔥
