
OktoBLAS

The Independent BLAS Engine Powering OktoEngine


What is OktoBLAS?

OktoBLAS is a proprietary, high-performance Basic Linear Algebra Subprograms (BLAS) engine developed by OktoSeek. It is the core computational backbone of OktoEngine, our native AI training and inference platform.

Unlike wrapper libraries, OktoBLAS is built entirely from scratch using Rust and hand-tuned CUDA PTX assembly — with zero dependency on NVIDIA cuBLAS.

🎯 Key Highlights

- 100% Independent: no cuBLAS, no external BLAS dependencies
- Hand-Tuned PTX: every kernel optimized at the assembly level
- Tensor Core Native: built for NVIDIA Tensor Cores (WMMA)
- Production Ready: powers OktoEngine in production
- Python Available: also released as a standalone Python package

🏆 Performance

All benchmarks were run on an NVIDIA RTX 4070 Laptop GPU and timed with CUDA events.

FP16 GEMM (Tensor Cores)

| Matrix Size | OktoBLAS | PyTorch | Relative to PyTorch |
|---|---|---|---|
| 1024×1024 | 29.1 TFLOPS | 23.3 TFLOPS | 125% |
| 2048×2048 | 35.1 TFLOPS | 34.6 TFLOPS | 101% |
| 4096×4096 | 36.5 TFLOPS | 38.9 TFLOPS | 94% |
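The TFLOPS figures follow from the standard GEMM operation count of 2·M·N·K floating-point operations. A quick sanity check of the 2048×2048 row (pure Python, no GPU needed; the helper name here is ours, not part of the OktoBLAS API):

```python
def gemm_tflops(m, n, k, seconds):
    """Effective TFLOPS for an (m x k) @ (k x n) GEMM that took `seconds`."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e12

# 35.1 TFLOPS on a 2048-cubed GEMM implies a runtime of about half a millisecond
runtime = 2 * 2048**3 / 35.1e12
print(f"{runtime * 1e3:.2f} ms")  # -> 0.49 ms
```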

Fused Attention

| Config | OktoBLAS | PyTorch | Speedup |
|---|---|---|---|
| B=4, S=256, D=64 | 0.96 TFLOPS | 0.28 TFLOPS | 3.4× |
| B=4, S=512, D=64 | 1.22 TFLOPS | 0.93 TFLOPS | 1.3× |
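For reference, a fused attention kernel computes scaled dot-product attention in a single pass instead of materializing the intermediate score matrix through separate GEMM and softmax launches. An unfused NumPy equivalent of the math (whether OktoBLAS applies the 1/√d scale is our assumption; verify against `ob.attention` on your own inputs):

```python
import numpy as np

def attention_reference(Q, K, V):
    """Unfused scaled dot-product attention: softmax(Q @ K^T / sqrt(d)) @ V.

    Q, K, V have shape (batch, seq, dim).
    """
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)   # (B, S, S) score matrix
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (B, S, dim)

Q = np.random.randn(4, 256, 64).astype(np.float32)
K = np.random.randn(4, 256, 64).astype(np.float32)
V = np.random.randn(4, 256, 64).astype(np.float32)
out = attention_reference(Q, K, V)
print(out.shape)  # -> (4, 256, 64)
```

The fused version avoids writing the (B, S, S) `scores` tensor to global memory, which is where most of the speedup at small sequence lengths comes from.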

📦 Installation

pip install oktoblas

📖 Quick Start

import oktoblas as ob
import numpy as np

# FP16 Matrix Multiplication (Tensor Cores)
A = np.random.randn(2048, 2048).astype(np.float16)
B = np.random.randn(2048, 2048).astype(np.float16)
C = ob.matmul_fp16(A, B)  # 35+ TFLOPS

# Fused Attention (up to 3.4x faster than PyTorch)
Q = np.random.randn(4, 512, 64).astype(np.float32)
K = np.random.randn(4, 512, 64).astype(np.float32)
V = np.random.randn(4, 512, 64).astype(np.float32)
output = ob.attention(Q, K, V)

# Library info
ob.info()

Output

============================================================
OktoBLAS by OktoSeek
High-Performance BLAS Library
============================================================
Version: 1.0.2
License: Proprietary (c) 2025 OktoSeek AI
Backend: CUDA PTX (Tensor Cores)

Features:
  - FP16/FP32 GEMM with Tensor Cores
  - Fused Attention kernel
  - 100% Independent (no cuBLAS)

https://www.oktoseek.com
============================================================

🔥 API Reference

# GEMM Operations
ob.matmul(A, B)           # FP32 matrix multiplication
ob.matmul_fp16(A, B)      # FP16 with Tensor Cores

# Fused Operations
ob.attention(Q, K, V)     # Fused Q×K^T×V attention

# Utilities
ob.info()                 # Library information
ob.is_cuda_available()    # Check GPU availability
ob.benchmark(op, size)    # Run benchmarks
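A note on `matmul_fp16` accuracy: Tensor Core WMMA operations can accumulate FP16 products in FP32 registers, which is typically what keeps half-precision GEMM usable. A NumPy sketch of why accumulator width matters (illustrative only; this is not the OktoBLAS code path, and FP32 accumulation in OktoBLAS is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256)).astype(np.float16)
B = rng.standard_normal((256, 256)).astype(np.float16)

ref = A.astype(np.float64) @ B.astype(np.float64)    # high-precision reference
wide = A.astype(np.float32) @ B.astype(np.float32)   # FP16 inputs, FP32 math
narrow = (A @ B).astype(np.float64)                  # result rounded to FP16

err_wide = np.abs(wide - ref).max()
err_narrow = np.abs(narrow - ref).max()
print(err_narrow > err_wide)  # -> True: the wider accumulator is more accurate
```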

🧪 OktoScript Integration

Within OktoEngine, OktoBLAS is configured through OktoScript:

BLAS {
    backend: "oktoblas"
    precision: "fp16"
}

ACCELERATE {
    gemm: "oktoblas"
    attention: "oktoblas"
}

TENSOR_CORES {
    enabled: true
}

🌐 OktoSeek Ecosystem

OktoBLAS is a core component of OktoSeek AI:

| Component | Description |
|---|---|
| OktoScript | AI programming language |
| OktoEngine | Native AI training runtime |
| OktoBLAS | High-performance BLAS engine |
| OkTensor | GPU tensor library |
| OktoStudio | AI development IDE |

📜 License

Proprietary License — Free for personal and commercial use.

Copyright © 2025 OktoSeek AI. All Rights Reserved.




OktoBLAS — The BLAS engine built for AI
