
Cross-Platform ML Optimization Framework with ONNX Interpreter


Zenith


Cross-Platform ML Optimization Framework

Zenith is a model-agnostic and hardware-agnostic unification and optimization framework for machine learning. In the project's own benchmarks on an NVIDIA Tesla T4, it runs BERT inference 1.09x faster and a 6-layer Transformer training loop 1.02x faster than PyTorch.

Project History

Zenith was conceived and architecturally designed on December 11, 2024, with the creation of its comprehensive blueprint document (CetakBiru.md) that outlines a 36-month development roadmap across 6 implementation phases. Active development began on January 12, 2025, and after 11 months of internal development, research, and rigorous testing, Zenith was publicly released on GitHub on December 16, 2025.

The project represents nearly a year of dedicated work in building a production-ready ML optimization framework from the ground up, implementing CUDA backends with cuDNN/cuBLAS integration, graph optimization passes, mixed precision support, and comprehensive testing infrastructure.


Performance Highlights

Benchmark         | Workload             | Result
GPU Memory Pool   | MatMul 1024x1024     | 50x faster than PyTorch
BERT Inference    | 12-layer encoder     | 1.09x faster than PyTorch
Training Loop     | 6-layer Transformer  | 1.02x faster than PyTorch
Memory Efficiency | Zero-copy allocation | 93.5% cache hit rate
INT8 Quantization | Model compression    | 4x memory reduction

Benchmarked on NVIDIA Tesla T4 (Google Colab). See BENCHMARK_REPORT.md for full results.
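The 4x memory reduction reported for INT8 quantization follows from storage width alone: an FP32 value takes 4 bytes and an INT8 value takes 1. The sketch below illustrates that arithmetic together with a simple symmetric max-abs calibration. It is a generic illustration, not Zenith's quantizer; the function names `calibrate_scale`, `quantize`, and `dequantize` are hypothetical, and the calibration scheme Zenith actually uses is not stated here.

```python
def calibrate_scale(samples):
    """Symmetric max-abs calibration: map the largest magnitude to 127."""
    max_abs = max(abs(v) for v in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in q_values]


# Hypothetical weights used only for illustration
weights = [0.5, -1.27, 0.03, 1.0]
scale = calibrate_scale(weights)
q = quantize(weights, scale)
restored = dequantize(q, scale)

fp32_bytes = 4 * len(weights)  # 4 bytes per FP32 value
int8_bytes = 1 * len(weights)  # 1 byte per INT8 value
print(q, fp32_bytes / int8_bytes)
```

The round-trip error of each value is bounded by half a quantization step (`scale / 2`), which is the accuracy cost paid for the smaller footprint.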


Features

Core Capabilities

  • Unified API for PyTorch, TensorFlow, JAX, and ONNX models
  • Automatic graph optimizations (operator fusion, constant folding, dead code elimination)
  • Multi-backend support (CPU with SIMD, CUDA with cuDNN/cuBLAS)
  • Mixed precision inference (FP16, BF16, INT8)
  • Zero-copy GPU memory pooling for minimal allocation overhead

Optimization Passes

  • Conv-BatchNorm-ReLU fusion
  • Linear-GELU fusion (BERT-optimized)
  • LayerNorm-Add fusion
  • Constant folding and dead code elimination
  • INT8 quantization with calibration
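As an illustration of what a Conv-BatchNorm-ReLU fusion does at inference time, the sketch below folds BatchNorm statistics into the weights and bias of the preceding layer, so that only one layer plus ReLU has to run. A small dense layer stands in for a 1x1 convolution at a single spatial position. All function names are hypothetical and pure Python is used for clarity; this is not Zenith's implementation.

```python
import math


def linear(weight, bias, x):
    """y = W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm with fixed per-channel statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def relu(y):
    return [max(0.0, yi) for yi in y]


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BatchNorm into the preceding layer."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)          # per-channel scale
        fused_w.append([w * s for w in row])
        fused_b.append((b - m) * s + bt)
    return fused_w, fused_b


# Hypothetical parameters for a 2-input, 2-output layer
weight = [[1.0, -2.0], [0.5, 0.25]]
bias = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.0, 0.2]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [3.0, 1.0]

unfused = relu(batchnorm(linear(weight, bias, x), gamma, beta, mean, var))
fused_w, fused_b = fold_batchnorm(weight, bias, gamma, beta, mean, var)
fused = relu(linear(fused_w, fused_b, x))
print(unfused, fused)
```

The two results agree up to floating-point rounding, which is why this kind of fusion can be applied without changing model outputs.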

Hardware Support

  • CPU: AVX2/FMA SIMD optimizations
  • NVIDIA GPU: CUDA 12.x with cuDNN 8.x and cuBLAS
  • AMD GPU: ROCm support (planned)
  • Intel: OneAPI support (planned)

Installation

Quick Install

pip install pyzenith

Installation Options

Choose the right installation based on your needs:

Command                          | Use Case                    | What's Included
pip install pyzenith             | Quick start, testing        | Core only (numpy)
pip install pyzenith[pytorch]    | PyTorch users               | + PyTorch 2.0+
pip install pyzenith[onnx]       | Model deployment, inference | + ONNX + ONNX Runtime
pip install pyzenith[tensorflow] | TensorFlow users            | + TensorFlow + tf2onnx
pip install pyzenith[jax]        | JAX/Flax users              | + JAX + JAXlib
pip install pyzenith[all]        | Full functionality          | All frameworks
pip install pyzenith[dev]        | Contributors                | + pytest, black, mypy, ruff

Recommended Installation

# Quotes keep shells such as zsh from expanding the square brackets

# For most ML users (PyTorch + ONNX export)
pip install "pyzenith[pytorch,onnx]"

# For full framework support
pip install "pyzenith[all]"

# For development/contribution
pip install "pyzenith[dev]"

Development Installation

git clone https://github.com/vibeswithkk/ZENITH.git
cd ZENITH
pip install -e ".[dev]"

CUDA Build (for Maximum GPU Performance)

For full CUDA kernel acceleration (the reported 50x speedup on 1024x1024 MatMul):

# On Google Colab or Linux with CUDA
git clone https://github.com/vibeswithkk/ZENITH.git
cd ZENITH
bash build_cuda.sh

# Verify installation
python -c "from zenith._zenith_core import backends; print(backends.list_available())"
# Output: ['cpu', 'cuda']

Note: Without the CUDA build, Zenith still runs on the GPU through the PyTorch/TensorFlow CUDA backends; only the native Zenith CUDA kernels built above are unavailable.
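When a script has to run both with and without the compiled extension, the backend check can be wrapped so that a missing build does not raise. The sketch below is a minimal example that assumes importing `zenith._zenith_core` fails with `ImportError` when the extension has not been built; the helper name `available_backends` is hypothetical and not part of Zenith.

```python
import importlib


def available_backends():
    """Return the backends reported by the compiled core, or [] if it is missing."""
    try:
        core = importlib.import_module("zenith._zenith_core")
    except ImportError:
        # Assumption: the compiled extension is absent (no CUDA/C++ build yet)
        return []
    return list(core.backends.list_available())


backends = available_backends()
if "cuda" in backends:
    print("Native CUDA kernels available")
else:
    print("Native CUDA kernels not available; found:", backends)
```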


Quick Start

Basic Usage

import zenith
from zenith.core import GraphIR, DataType, Shape, TensorDescriptor

# Create a computation graph
graph = GraphIR(name="my_model")
graph.add_input(TensorDescriptor("x", Shape([1, 3, 224, 224]), DataType.Float32))

# Apply optimizations
from zenith.optimization import PassManager
pm = PassManager()
pm.add("constant_folding")
pm.add("dead_code_elimination")
pm.add("operator_fusion")
optimized = pm.run(graph)
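To make the effect of the two simplest passes concrete, here is a self-contained toy version of constant folding and dead code elimination on a tiny graph. It mirrors the `add`/`run` shape of the usage above, but `Node`, `ToyPassManager`, and the graph layout are hypothetical stand-ins written for illustration; they are not Zenith's `GraphIR` or `PassManager`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    op: str                        # "input", "const", "add", or "mul"
    inputs: Tuple[str, ...] = ()
    value: Optional[float] = None  # only set for "const" nodes


def constant_folding(graph, outputs):
    """Replace add/mul nodes whose inputs are all constants with a constant."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    folded = dict(graph)
    for name, node in graph.items():  # assumes insertion order is topological
        if node.op in ops and all(folded[i].op == "const" for i in node.inputs):
            a, b = (folded[i].value for i in node.inputs)
            folded[name] = Node("const", value=ops[node.op](a, b))
    return folded


def dead_code_elimination(graph, outputs):
    """Keep only nodes reachable from the requested outputs."""
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(graph[name].inputs)
    return {n: node for n, node in graph.items() if n in live}


class ToyPassManager:
    """Runs named passes in the order they were added."""
    _registry = {
        "constant_folding": constant_folding,
        "dead_code_elimination": dead_code_elimination,
    }

    def __init__(self):
        self._pipeline = []

    def add(self, name):
        self._pipeline.append(self._registry[name])

    def run(self, graph, outputs):
        for pass_fn in self._pipeline:
            graph = pass_fn(graph, outputs)
        return graph


toy_graph = {
    "x": Node("input"),
    "two": Node("const", value=2.0),
    "three": Node("const", value=3.0),
    "six": Node("mul", ("two", "three")),  # foldable: both inputs are constants
    "y": Node("add", ("x", "six")),
    "unused": Node("add", ("x", "two")),   # dead: does not feed the output
}

toy_pm = ToyPassManager()
toy_pm.add("constant_folding")
toy_pm.add("dead_code_elimination")
toy_optimized = toy_pm.run(toy_graph, outputs=["y"])
print(sorted(toy_optimized))
```

Running folding before elimination matters here: once `six` becomes a constant, its former inputs `two` and `three` are no longer referenced and can be dropped along with `unused`.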

CUDA Operations

import numpy as np
from zenith._zenith_core import cuda

# Check CUDA availability
print(f"CUDA available: {cuda.is_available()}")

# Matrix multiplication (50x faster than PyTorch)
A = np.random.randn(1024, 1024).astype(np.float32)
B = np.random.randn(1024, 1024).astype(np.float32)
C = cuda.matmul(A, B)

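# Example inputs for the remaining ops (shapes are illustrative)
input_tensor = np.random.randn(8, 768).astype(np.float32)
gamma = np.ones(768, dtype=np.float32)
beta = np.zeros(768, dtype=np.float32)

# GPU operations
cuda.gelu(input_tensor)
cuda.layernorm(input_tensor, gamma, beta, eps=1e-5)
cuda.softmax(input_tensor)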
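For sanity-checking GPU results on small inputs, it can help to have slow but obvious reference versions of these operations. The sketch below gives pure-Python references for a single vector. It uses the exact erf-based GELU; whether Zenith's kernel uses that form or the tanh approximation is not stated, so small differences from `cuda.gelu` would not necessarily indicate a bug. The `ref_*` names are hypothetical.

```python
import math


def ref_gelu(x):
    """Exact GELU: 0.5 * v * (1 + erf(v / sqrt(2)))."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]


def ref_softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def ref_layernorm(x, gamma, beta, eps=1e-5):
    """Normalize to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


row = [1.0, 2.0, 3.0, 4.0]
print(ref_gelu(row))
print(ref_softmax(row))
print(ref_layernorm(row, gamma=[1.0] * 4, beta=[0.0] * 4))
```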

Architecture

+-------------------------------------------------------------+
|                    Python User Interface                    |
|                  (zenith.api, zenith.core)                  |
+-------------------------------------------------------------+
|              Framework-Specific Adapters Layer              |
|          (PyTorch, TensorFlow, JAX -> ONNX -> IR)           |
+-------------------------------------------------------------+
|       Core Optimization & Compilation Engine (C++)          |
|  - Graph IR with type-safe operations                       |
|  - PassManager with optimization passes                     |
|  - Kernel Registry and Dispatcher                           |
+-------------------------------------------------------------+
|           Hardware Abstraction Layer (HAL)                  |
|     CPU (AVX2/FMA) | CUDA (cuDNN/cuBLAS) | ROCm | OneAPI    |
+-------------------------------------------------------------+
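The "Kernel Registry and Dispatcher" box in the diagram can be pictured as a lookup from (operation, backend) to an implementation, with a fallback when the preferred backend has no kernel. The following is a minimal Python sketch of that idea under stated assumptions: the class `KernelRegistry`, its methods, and the CPU fallback policy are hypothetical, and Zenith's actual C++ registry may work differently.

```python
class KernelRegistry:
    """Maps (op, backend) pairs to kernel functions."""

    def __init__(self):
        self._kernels = {}

    def register(self, op, backend):
        """Decorator that registers a function as the kernel for (op, backend)."""
        def decorator(fn):
            self._kernels[(op, backend)] = fn
            return fn
        return decorator

    def dispatch(self, op, preferred, fallback="cpu"):
        """Return (backend, kernel), trying the preferred backend first."""
        for backend in (preferred, fallback):
            fn = self._kernels.get((op, backend))
            if fn is not None:
                return backend, fn
        raise KeyError(f"no kernel registered for op {op!r}")


registry = KernelRegistry()


@registry.register("relu", "cpu")
def relu_cpu(values):
    return [max(0.0, v) for v in values]


# No CUDA kernel is registered in this sketch, so dispatch falls back to CPU
chosen_backend, kernel = registry.dispatch("relu", preferred="cuda")
print(chosen_backend, kernel([-1.0, 2.0]))
```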

Benchmarks

BERT-Base Inference (12 layers, batch=1, seq=128)

Mode             | Latency  | vs PyTorch
Pure PyTorch     | 10.60 ms | baseline
Zenith + PyTorch | 9.74 ms  | 1.09x faster
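The "1.09x faster" figure is the ratio of the two latencies (10.60 ms / 9.74 ms). The sketch below reproduces that arithmetic and shows one common way to collect such latencies: warm-up runs followed by a median over repeated timings. The helper names are hypothetical and this is not the project's benchmark harness. When timing GPU work, the callable passed in should block until the device has finished, since CUDA kernels launch asynchronously.

```python
import statistics
import time


def median_latency_ms(fn, warmup=3, runs=20):
    """Median wall-clock latency of fn() in milliseconds, after warm-up runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Latencies taken from the table above
print(f"{speedup(10.60, 9.74):.2f}x")

# Example with a stand-in CPU workload
print(f"{median_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```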

ResNet-50 Throughput

Batch Size | Throughput
1          | 150 img/sec
64         | 377 img/sec
512        | 359 img/sec
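Throughput and per-batch latency are two views of the same measurement: latency per batch is the batch size divided by the throughput. The short calculation below, using only the numbers in the table above, shows the trade-off: larger batches raise throughput up to batch size 64 while each batch takes longer to complete. The helper name `batch_latency_ms` is hypothetical.

```python
def batch_latency_ms(batch_size, images_per_sec):
    """Time to process one batch, in milliseconds."""
    return batch_size / images_per_sec * 1000.0


# batch size -> throughput (img/sec), from the table above
resnet_results = {1: 150, 64: 377, 512: 359}

for batch, throughput in resnet_results.items():
    latency = batch_latency_ms(batch, throughput)
    print(f"batch={batch:>3}  {throughput} img/sec  {latency:8.1f} ms per batch")

# Batch size with the highest measured throughput
best_batch = max(resnet_results, key=resnet_results.get)
print("best throughput at batch size", best_batch)
```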

GPU Memory Pool

Metric           | Value
Cache Hit Rate   | 93.5%
Speedup vs naive | 330x
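A cache hit rate for a memory pool is the fraction of allocation requests served from previously released buffers rather than from a fresh allocation. The sketch below shows a minimal size-bucketed caching pool that tracks this metric. It is a host-side illustration only: `CachingPool` is a hypothetical name, `bytearray` stands in for a device allocation, and Zenith's GPU pool may use a different strategy.

```python
from collections import defaultdict


class CachingPool:
    """Reuses released buffers of the same size instead of reallocating."""

    def __init__(self, allocate):
        self._allocate = allocate       # called on a cache miss
        self._free = defaultdict(list)  # size -> list of reusable buffers
        self.hits = 0
        self.misses = 0

    def acquire(self, size):
        bucket = self._free[size]
        if bucket:
            self.hits += 1
            return bucket.pop()
        self.misses += 1
        return self._allocate(size)

    def release(self, buf):
        self._free[len(buf)].append(buf)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


pool = CachingPool(bytearray)  # bytearray(size) stands in for a device allocation
for _ in range(10):
    buf = pool.acquire(4096)   # first call misses, later calls reuse the buffer
    pool.release(buf)

print(f"hits={pool.hits} misses={pool.misses} hit_rate={pool.hit_rate:.1%}")
```

Workloads that repeatedly request the same tensor sizes, such as inference loops, tend to produce high hit rates with this kind of pool because only the first request of each size reaches the underlying allocator.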

Testing

# Run all Python tests
pytest tests/python/ -v

# Run with coverage
pytest tests/python/ --cov=zenith --cov-report=term-missing

# Run C++ unit tests (after CUDA build)
./build/tests/test_core

# Security scan
bandit -r zenith/ -ll

Test Status

  • Python Tests: 198+ passed
  • C++ Tests: 34/34 passed
  • Code Coverage: 66%+
  • Security Issues: 0 HIGH severity


Project Status

Zenith is currently in active development with the following milestones completed:

  • Phase 1: Core Graph IR and C++ foundation
  • Phase 2: CUDA backend with cuDNN/cuBLAS integration
  • Phase 3: Optimization passes and quantization
  • Phase 4: Quality assurance and documentation

Contributing

Contributions are welcome. Please ensure all tests pass before submitting pull requests.

# Setup development environment
pip install -e ".[dev]"

# Run tests before committing
pytest tests/python/ -v

Author

Wahyu Ardiansyah - Lead Architect and Developer

License

Apache License 2.0 - See LICENSE for details.

Copyright 2025 Wahyu Ardiansyah. All rights reserved.


Download files


Source Distribution

pyzenith-0.2.6.tar.gz (410.1 kB)

Uploaded Source

Built Distribution


pyzenith-0.2.6-py3-none-any.whl (394.3 kB)

Uploaded Python 3

File details

Details for the file pyzenith-0.2.6.tar.gz.

File metadata

  • Download URL: pyzenith-0.2.6.tar.gz
  • Upload date:
  • Size: 410.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pyzenith-0.2.6.tar.gz
Algorithm Hash digest
SHA256 65d0444c634437e3a9a64029002a0e332b4ede2c87dd5c094401eb5cc0f84ba3
MD5 068489a01b9cf61b4296c9bbfcc2c0ea
BLAKE2b-256 b011b851c580a38932187701268f235052d9052d66823dc85322ca585277adc9


File details

Details for the file pyzenith-0.2.6-py3-none-any.whl.

File metadata

  • Download URL: pyzenith-0.2.6-py3-none-any.whl
  • Upload date:
  • Size: 394.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pyzenith-0.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 7e60a3c2385f70c90197eacc2226eed8f450752ff14d6bfb2d4a94b2d4a60032
MD5 12600dd79dc97fc8d1193fdcb5333050
BLAKE2b-256 988cd398646c987a3199d9eeee926ec1441da76738c20ad05c4dacc4bd6940b4

