CUDA-compatible GPU runtime for Apple Silicon that transparently maps CUDA kernels and memory operations to Metal, enabling Numba and Python GPU workloads with stream overlap and tier-aware memory management.

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
- Science/Research
Operating System
- MacOS
Programming Language
- Python :: 3
Topic
- Scientific/Engineering
- Software Development :: Libraries :: Python Modules

Project description

MetaXuda

MetaXuda is an experimental CUDA-compatible runtime shim for Apple Silicon, written in Rust, that allows Numba CUDA kernels to run unmodified by transparently mapping CUDA runtime calls to Apple Metal.

It is designed as a drop-in replacement for core CUDA runtime libraries, enabling GPU-accelerated Python workflows on macOS without requiring the NVIDIA CUDA Toolkit or NVIDIA hardware.

✨ Features

Drop-in replacement for libcudart.dylib and libcuda.dylib
Run Numba CUDA kernels (@cuda.jit) directly on Apple Metal
Metal-backed implementations of core CUDA APIs:
- cudaMalloc / cudaFree
- cudaMemcpy / cudaMemcpyAsync
- cudaLaunchKernel
Asynchronous execution with stream-style overlap (copy / compute / copy)
Tier-aware memory management (GPU-first execution)
Ships with:
- Stubbed libdevice.bc for Numba compatibility
- Precompiled Metal .metallib shaders for fused math operations
- cuda_pipeline.so, exposing a low-level execution API that allows Numba and other callers to bypass the CUDA runtime shim and dispatch operations directly
No CUDA Toolkit, NVIDIA drivers, or NVIDIA GPU required
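
The copy / compute / copy overlap listed above is normally expressed in Numba through streams. The following is a minimal sketch using only standard Numba calls (`cuda.stream`, `cuda.to_device`, `cuda.device_array_like`, `copy_to_host`). The helper names `chunk_bounds` and `scale_in_chunks` are hypothetical and not part of MetaXuda, and whether every call below is covered by the shim is an assumption, since not all CUDA APIs are implemented.

```python
def chunk_bounds(n, num_chunks):
    """Split range(n) into num_chunks contiguous (start, stop) pairs."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(n, num_chunks)
    bounds, start = [], 0
    for k in range(num_chunks):
        # The first `extra` chunks take one additional element
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def scale_in_chunks(host_in, factor, num_chunks=4, threads=128):
    """Multiply a 1D contiguous array by factor, one stream per chunk.

    Hypothetical helper: needs Numba and a CUDA-compatible runtime.
    """
    import numpy as np
    from numba import cuda  # lazy import so chunk_bounds works without a GPU

    @cuda.jit
    def scale(x, factor, out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = x[i] * factor

    host_out = np.empty_like(host_in)
    pending = []  # keep streams and device arrays alive until synchronized
    for start, stop in chunk_bounds(host_in.size, num_chunks):
        if stop == start:
            continue  # skip empty chunks
        s = cuda.stream()
        d_in = cuda.to_device(host_in[start:stop], stream=s)      # copy in
        d_out = cuda.device_array_like(d_in, stream=s)
        blocks = (stop - start + threads - 1) // threads
        scale[blocks, threads, s](d_in, factor, d_out)            # compute
        d_out.copy_to_host(host_out[start:stop], stream=s)        # copy out
        pending.append((s, d_in, d_out))
    for s, _, _ in pending:
        s.synchronize()
    return host_out
```

On NVIDIA hardware, truly asynchronous host transfers also require pinned host memory; how MetaXuda treats pageable buffers is not stated here, so the sketch should be read as a shape for the code rather than a performance recipe.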

⚠️ Project Status

Alpha / Research Prototype

MetaXuda is under active development and currently targets:

Numba CUDA kernels
Single-GPU execution on Apple Silicon

Not all CUDA APIs are implemented, and behavior may differ from NVIDIA CUDA in edge cases.

⚙️ Installation

Requirements

macOS 13+
Python >= 3.10
NumPy >= 1.23
Numba >= 0.59

Install (Editable / Dev)

# Clone the repository
git clone https://github.com/perinban/MetaXuda.git
cd MetaXuda

# Install in editable mode
pip install -e .

The installation places the required shim libraries (libcudart.dylib, libcuda.dylib, and libdevice.bc) inside the package so they can be discovered by Numba at runtime.
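
A quick way to confirm that the files landed where expected is to look for them on disk. The sketch below assumes they live in the `native/` subdirectory shown in the Package Layout section; the helper names `missing_native_files` and `find_native_dir` are hypothetical and not part of MetaXuda.

```python
import importlib.util
from pathlib import Path

# File names taken from the Package Layout listing in this README
EXPECTED_NATIVE_FILES = (
    "libcudart.dylib",
    "libcuda.dylib",
    "libnvvm.dylib",
    "libdevice.bc",
    "cuda_pipeline.so",
)


def missing_native_files(native_dir):
    """Return the expected file names that are absent from native_dir."""
    native_dir = Path(native_dir)
    return [name for name in EXPECTED_NATIVE_FILES
            if not (native_dir / name).is_file()]


def find_native_dir(package="metaxuda"):
    """Locate <package>/native without importing it; None if not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "native"


if __name__ == "__main__":
    native = find_native_dir()
    if native is None:
        print("metaxuda is not installed in this environment")
    else:
        print("missing:", missing_native_files(native) or "none")
```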

📂 Package Layout

MetaXuda ships demos and helper modules inside the Python package so they are available in editable and installed modes:

metaxuda/
├── buffers/        # GPU, managed, and tiered buffer abstractions
├── execution/      # Direct and pooled execution backends
├── streams/        # Stream and async execution helpers (Numba-compatible)
├── demos/          # End-to-end demos and debug examples
├── native/         # Native shims and pipelines
│   ├── libcudart.dylib
│   ├── libcuda.dylib
│   ├── libnvvm.dylib
│   ├── libdevice.bc
│   └── cuda_pipeline.so
├── env.py          # Environment detection and setup
├── patch.py        # Numba / runtime patching hooks
└── __init__.py

The demos/ directory contains runnable examples covering kernel execution, buffers, streams, disk tiering, and the direct math pipeline.

You can run them directly once the package is installed:

python -m metaxuda.demos.add
python -m metaxuda.demos.pipeline

🚀 Usage

Once installed, existing Numba CUDA code should run without modification:

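from numba import cuda
import numpy as np

@cuda.jit
def add(a, b, out):
    # Absolute index of this thread in the 1D grid
    i = cuda.grid(1)
    # Guard against threads that fall past the end of the array
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1024
a = np.arange(n, dtype=np.float32)
b = np.arange(n, dtype=np.float32)
out = np.zeros_like(a)

# 32 blocks x 32 threads per block = 1024 threads, one per element
add[32, 32](a, b, out)
print(out[:5])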

Execution is transparently dispatched to Metal via the MetaXuda runtime.
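
The launch `add[32, 32]` covers exactly 1024 elements because 32 blocks times 32 threads equals `n`. For other sizes the usual approach is ceiling division, relying on the `if i < out.size` guard in the kernel to ignore surplus threads. A small sketch, where `launch_config` is a hypothetical helper and not a MetaXuda or Numba function:

```python
def launch_config(n, threads_per_block=32):
    """Return (blocks, threads_per_block) covering at least n elements."""
    if n <= 0 or threads_per_block <= 0:
        raise ValueError("n and threads_per_block must be positive")
    # Ceiling division: round up so the last partial block is included
    blocks = (n + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


# Usage with the kernel above:
#   blocks, threads = launch_config(a.size)
#   add[blocks, threads](a, b, out)
```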

🗜️ Quantization, Compression, and Disk Tiering

MetaXuda supports quantized and compressed data storage for non-resident buffers and intermediate results. These behaviors are controlled via environment variables and handled by the runtime initialization logic in env.py.

This is primarily used for Tier‑3 (disk-backed) storage, allowing large workloads to exceed GPU memory limits while minimizing I/O and storage overhead.

Environment Configuration

The shim reads the following environment variables at startup:

- MX_ENABLE_DATASTORE_COMPRESSION (default: 1): Enable or disable compression for spilled data blocks.
- MX_DATASTORE_COMPRESSION_TYPE (default: lz4): Compression algorithm to use (e.g. lz4).
- MX_DATASTORE_COMPRESSION_LEVEL (default: 3): Compression level passed to the backend compressor.
- MX_DISK_PARALLELISM_LEVEL (default: auto): Controls parallel read/write behavior for disk operations.
- MX_DISK_SPILL_ENABLED (default: 0): Enable spilling GPU buffers to disk when memory pressure occurs.
- MX_TIER3_STRATEGY (default: prefer_external): Strategy for selecting Tier‑3 storage locations.
- MX_TIER3_INTERNAL_PATH (default: block_store): Directory used for internal Tier‑3 storage.
- MX_TIER3_EXTERNAL_DEVICES (format: id:path,id:path): Comma‑separated list of external devices or paths for Tier‑3 storage.
- MX_DEBUG (options: memory): Enable debug logging for specific subsystems.

These settings allow fine‑grained control over compression, quantization, disk spill behavior, and debugging without changing application code.
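
Because the variables are read at startup, they presumably need to be set before Numba or MetaXuda is imported. The sketch below shows one way to do that from Python and illustrates the `id:path,id:path` format. The helper names, the device id, and the example path are hypothetical, treating `"1"` as "enabled" is an assumption based on the 0/1 defaults listed above, and the parser is an illustration of the format rather than the shim's own parsing code.

```python
import os


def parse_external_devices(value):
    """Parse an 'id:path,id:path' string into a {id: path} dict."""
    devices = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so paths may contain colons
        dev_id, sep, path = entry.partition(":")
        if not sep or not dev_id or not path:
            raise ValueError(f"expected id:path, got {entry!r}")
        devices[dev_id] = path
    return devices


def configure_spill(external_devices, env=os.environ):
    """Enable disk spill with lz4 compression (values are assumptions)."""
    env["MX_DISK_SPILL_ENABLED"] = "1"
    env["MX_ENABLE_DATASTORE_COMPRESSION"] = "1"
    env["MX_DATASTORE_COMPRESSION_TYPE"] = "lz4"
    env["MX_TIER3_EXTERNAL_DEVICES"] = ",".join(
        f"{dev_id}:{path}" for dev_id, path in external_devices.items()
    )
    return env


# Usage: call before importing numba so the shim sees the values at startup.
#   configure_spill({"0": "/Volumes/scratch"})   # hypothetical device id and path
#   from numba import cuda
```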

🧮 Operation Coverage

MetaXuda includes a precompiled Metal math pipeline (cuda_pipeline.so) implementing a broad set of scalar and elementwise operations that can be invoked directly by Numba or higher-level tooling.

230+ operations covering:
- Arithmetic, comparison, and logical ops
- Trigonometric and hyperbolic functions
- Exponentials, logarithms, and powers
- Reductions and distance metrics
- Activation functions (ReLU, GELU, SiLU, Mish, etc.)
- Probability distributions and loss functions
- Signal, interpolation, and utility math
Each operation is mapped to a corresponding Metal expression
Selected ops support fast-math variants where numerically safe

This allows many Numba-generated kernels to execute without requiring full PTX → Metal translation, significantly reducing overhead.
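
When validating results from the pipeline, it can help to have CPU reference definitions of the named activation functions. The scalar versions below use the standard textbook formulas. Which GELU variant (exact erf or tanh approximation) the Metal pipeline implements is not stated in this README, so the exact form here is an assumption, and fast-math variants may differ slightly from these values.

```python
import math


def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)


def silu(x):
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def mish(x):
    """Mish: x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))."""
    return x * math.tanh(math.log1p(math.exp(x)))
```

These are intended for moderate inputs only; `math.exp` overflows for very large magnitudes, which a production reference would need to guard against.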

🧠 Architecture Overview

Rust-based CUDA shim implementing core CUDA runtime APIs
Metal compute pipelines for kernel execution
Stubbed NVVM / libdevice layer for Numba compilation compatibility
Python package acts as a loader and distribution mechanism for native libraries

License

MetaXuda is free for students and personal use. Commercial use requires a license.

🎓 Students: Free with valid educational email
👤 Personal: Free for non-commercial projects
🏢 Commercial: Contact p.perinban@gmail.com

See LICENSE for full terms.

🙏 Disclaimer

MetaXuda is not affiliated with NVIDIA. CUDA is a trademark of NVIDIA Corporation. This project is an independent compatibility layer intended for research and development purposes.

Release history

- 2.0.0 (this version): Dec 16, 2025
- 0.1.9: Oct 2, 2025
- 0.1.8: Oct 2, 2025
- 0.1.7: Sep 28, 2025
- 0.1.6: Sep 23, 2025
- 0.1.5: Sep 23, 2025
- 0.1.3: Sep 22, 2025
- 0.1.1: Sep 22, 2025
- 0.1.0: Sep 22, 2025

Download files

Source Distribution

metaxuda-2.0.0.tar.gz (2.8 MB)

Uploaded Dec 16, 2025 Source

Built Distribution

metaxuda-2.0.0-py3-none-any.whl (2.8 MB)

Uploaded Dec 16, 2025 Python 3

File details

Details for the file metaxuda-2.0.0.tar.gz.

File metadata

Download URL: metaxuda-2.0.0.tar.gz
Upload date: Dec 16, 2025
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for metaxuda-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`124fa7cdb87b10dc264ffb8e166c61e9b2d3dbfc986c27836173bb0847304b9c`
MD5	`c6f4434217d83a1fbbf7374ccac92f89`
BLAKE2b-256	`4fa2e7b58449f53a08036ec2044846b50096ddd74825489f3be2330cd6cf178c`

File details

Details for the file metaxuda-2.0.0-py3-none-any.whl.

File metadata

Download URL: metaxuda-2.0.0-py3-none-any.whl
Upload date: Dec 16, 2025
Size: 2.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for metaxuda-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f49ceeb5d25f7315d939c4615e927c19ac19b2c40994d4d92019a1766da5058d`
MD5	`35ab909cfae6331bb6274a0d091a135d`
BLAKE2b-256	`50b89a25311e902d8752f7ddf9abf3b683c992296a2cbc5b91e9661866aa5f21`
