High-Performance BLAS Library by OktoSeek - Tensor Core GEMM and Fused Attention
OktoBLAS
The Independent BLAS Engine Powering OktoEngine
What is OktoBLAS?
OktoBLAS is a proprietary, high-performance Basic Linear Algebra Subprograms (BLAS) engine developed by OktoSeek. It is the core computational backbone of OktoEngine, our native AI training and inference platform.
Unlike wrapper libraries, OktoBLAS is built entirely from scratch using Rust and hand-tuned CUDA PTX assembly — with zero dependency on NVIDIA cuBLAS.
🎯 Key Highlights
| Feature | Description |
|---|---|
| 100% Independent | No cuBLAS, no external BLAS dependencies |
| Hand-Tuned PTX | Every kernel optimized at assembly level |
| Tensor Core Native | Built for NVIDIA Tensor Cores (WMMA) |
| Production Ready | Powers OktoEngine in production |
| Python Available | Also released as standalone Python package |
🏆 Performance
All benchmarks were run on an NVIDIA RTX 4070 Laptop GPU and timed with CUDA Events.
FP16 GEMM (Tensor Cores)
| Matrix Size | OktoBLAS (TFLOPS) | PyTorch (TFLOPS) | vs PyTorch |
|---|---|---|---|
| 1024×1024 | 29.1 | 23.3 | 125% ✓ |
| 2048×2048 | 35.1 | 34.6 | 101% ✓ |
| 4096×4096 | 36.5 | 38.9 | 94% |
Fused Attention
| Config | OktoBLAS (TFLOPS) | PyTorch (TFLOPS) | Speedup |
|---|---|---|---|
| B4 S256 D64 | 0.96 | 0.28 | 3.4x |
| B4 S512 D64 | 1.22 | 0.93 | 1.3x |
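The TFLOPS figures above follow from the standard GEMM operation count: an N×N×N matrix multiplication performs 2N³ floating-point operations (one multiply and one add per inner-product term), so TFLOPS = 2N³ / (time × 10¹²). A minimal sketch of that arithmetic, using NumPy on CPU as a stand-in (the table's numbers come from GPU timing with CUDA Events, which this sketch does not reproduce):

```python
import time
import numpy as np

n = 512
a = np.random.randn(n, n).astype(np.float32)
b = np.random.randn(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# One multiply + one add for each of the n^3 inner-product terms
flops = 2 * n**3
tflops = flops / elapsed / 1e12
print(f"{tflops:.3f} TFLOPS")
```

The same formula applied to the 2048×2048 FP16 row (2 × 2048³ FLOPs in ~0.49 ms) yields the ~35 TFLOPS reported.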
📦 Installation
```bash
pip install oktoblas
```
📖 Quick Start
```python
import oktoblas as ob
import numpy as np

# FP16 matrix multiplication on Tensor Cores
A = np.random.randn(2048, 2048).astype(np.float16)
B = np.random.randn(2048, 2048).astype(np.float16)
C = ob.matmul_fp16(A, B)  # 35+ TFLOPS

# Fused attention (up to 3.4x faster than PyTorch)
Q = np.random.randn(4, 512, 64).astype(np.float32)
K = np.random.randn(4, 512, 64).astype(np.float32)
V = np.random.randn(4, 512, 64).astype(np.float32)
output = ob.attention(Q, K, V)

# Library info
ob.info()
```
Output:

```text
============================================================
OktoBLAS by OktoSeek
High-Performance BLAS Library
============================================================
Version: 1.0.2
License: Proprietary (c) 2025 OktoSeek AI
Backend: CUDA PTX (Tensor Cores)
Features:
  - FP16/FP32 GEMM with Tensor Cores
  - Fused Attention kernel
  - 100% Independent (no cuBLAS)
https://www.oktoseek.com
============================================================
```
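For reference, the fused attention kernel computes the familiar Q×K^T×V pattern. A plain NumPy sketch of that computation, assuming standard scaled dot-product attention with 1/√d scaling (the exact scaling OktoBLAS applies is not documented here, so treat this as an illustration of the operation, not a bit-exact reference):

```python
import numpy as np

def attention_reference(q, k, v):
    # Batched softmax(Q K^T / sqrt(d)) V over the leading batch axis.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = np.random.randn(4, 512, 64).astype(np.float32)
k = np.random.randn(4, 512, 64).astype(np.float32)
v = np.random.randn(4, 512, 64).astype(np.float32)
out = attention_reference(q, k, v)
```

A fused kernel avoids materializing the full S×S score matrix in global memory, which is where the speedups in the table above come from.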
🔥 API Reference
```python
# GEMM operations
ob.matmul(A, B)          # FP32 matrix multiplication
ob.matmul_fp16(A, B)     # FP16 with Tensor Cores

# Fused operations
ob.attention(Q, K, V)    # Fused Q×K^T×V attention

# Utilities
ob.info()                # Library information
ob.is_cuda_available()   # Check GPU availability
ob.benchmark(op, size)   # Run benchmarks
```
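The availability check composes naturally with a CPU fallback for machines without a CUDA GPU. A hedged sketch (the NumPy fallback path and the FP32-accumulation choice are my own, not part of the OktoBLAS API):

```python
import numpy as np

def matmul_fp16_or_cpu(a, b):
    # Prefer the OktoBLAS Tensor Core GEMM when the package and a GPU
    # are both present; otherwise fall back to NumPy on the CPU.
    try:
        import oktoblas as ob
        if ob.is_cuda_available():
            return ob.matmul_fp16(a, b)
    except ImportError:
        pass
    # CPU fallback: accumulate in FP32 to limit FP16 rounding error,
    # then cast back to the input dtype.
    return (a.astype(np.float32) @ b.astype(np.float32)).astype(a.dtype)

a = np.random.randn(64, 64).astype(np.float16)
b = np.random.randn(64, 64).astype(np.float16)
c = matmul_fp16_or_cpu(a, b)
```

Accumulating in FP32 on the fallback path mirrors what Tensor Core WMMA GEMMs typically do in hardware (FP16 inputs, FP32 accumulators).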
🧪 OktoScript Integration
Within OktoEngine, OktoBLAS is configured through OktoScript:
```
BLAS {
    backend: "oktoblas"
    precision: "fp16"
}

ACCELERATE {
    gemm: "oktoblas"
    attention: "oktoblas"
}

TENSOR_CORES {
    enabled: true
}
```
🌐 OktoSeek Ecosystem
OktoBLAS is a core component of OktoSeek AI:
| Component | Description |
|---|---|
| OktoScript | AI programming language |
| OktoEngine | Native AI training runtime |
| OktoBLAS | High-performance BLAS engine |
| OkTensor | GPU tensor library |
| OktoStudio | AI development IDE |
📜 License
Proprietary license, free for personal and commercial use.
Copyright © 2025 OktoSeek AI. All Rights Reserved.
🔗 Links
- Website: oktoseek.com
- GitHub: github.com/oktoseek
- PyPI: pypi.org/project/oktoblas
OktoBLAS — The BLAS engine built for AI
File details

Details for the file oktoblas-1.0.4-cp310-cp310-win_amd64.whl.

- Download URL: oktoblas-1.0.4-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 176.8 kB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a79e6e5143cdd30653d10a5cb12633ff6999a290e3d9641578357d4aa1e0582 |
| MD5 | 775f4576d7b09ce1421de492e46a3ee1 |
| BLAKE2b-256 | ef4e745ac5823bc382028594416229f1c5ec75409f68862977c41eaa7f1fb6ed |