
High-Performance BLAS Library by OktoSeek - Tensor Core GEMM and Fused Attention


OktoBLAS

The Independent BLAS Engine Powering OktoEngine


What is OktoBLAS?

OktoBLAS is a proprietary, high-performance Basic Linear Algebra Subprograms (BLAS) engine developed by OktoSeek. It is the core computational backbone of OktoEngine, our native AI training and inference platform.

Unlike wrapper libraries, OktoBLAS is built entirely from scratch using Rust and hand-tuned CUDA PTX assembly — with zero dependency on NVIDIA cuBLAS.

🎯 Key Highlights

- 100% Independent: no cuBLAS, no external BLAS dependencies
- Hand-Tuned PTX: every kernel optimized at the assembly level
- Tensor Core Native: built for NVIDIA Tensor Cores (WMMA)
- Production Ready: powers OktoEngine in production
- Python Available: also released as a standalone Python package

🏆 Performance

All benchmarks were run on an NVIDIA RTX 4070 Laptop GPU and timed with CUDA events.

FP16 GEMM (Tensor Cores)

Matrix Size    OktoBLAS       PyTorch        OktoBLAS vs PyTorch
1024×1024      29.1 TFLOPS    23.3 TFLOPS    125%
2048×2048      35.1 TFLOPS    34.6 TFLOPS    101%
4096×4096      36.5 TFLOPS    38.9 TFLOPS    94%
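The TFLOPS figures above follow the standard GEMM flop count of 2·M·N·K operations divided by measured kernel time. A small, library-independent sketch of that arithmetic (the timing used here is illustrative, not taken from the table):

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """TFLOPS for an m x n x k GEMM: 2*m*n*k flops (one multiply plus
    one add per inner-product term) divided by elapsed time, in 1e12 units."""
    return 2 * m * n * k / seconds / 1e12

# A 2048x2048x2048 GEMM finishing in ~0.49 ms corresponds to ~35 TFLOPS,
# consistent with the 2048x2048 row above.
print(round(gemm_tflops(2048, 2048, 2048, 0.49e-3), 1))  # 35.1
```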

Fused Attention

Config (B = batch, S = sequence length, D = head dimension):

Config              OktoBLAS       PyTorch        Speedup
B=4, S=256, D=64    0.96 TFLOPS    0.28 TFLOPS    3.4×
B=4, S=512, D=64    1.22 TFLOPS    0.93 TFLOPS    1.3×
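For reference, a fused attention kernel over these shapes conventionally computes scaled dot-product attention, softmax(Q·Kᵀ/√D)·V. The README only says "Q×K^T×V", so the scaling and softmax here are an assumption; this is a pure-NumPy sketch of the conventional formulation, not the OktoBLAS kernel itself:

```python
import numpy as np

def reference_attention(Q, K, V):
    """Naive scaled dot-product attention over (batch, seq, dim) arrays.
    Assumes the usual softmax(Q @ K^T / sqrt(D)) @ V formulation."""
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)   # (B, S, S)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # (B, S, D)

# Shapes matching the B=4, S=256, D=64 benchmark row above
Q = np.random.randn(4, 256, 64).astype(np.float32)
K = np.random.randn(4, 256, 64).astype(np.float32)
V = np.random.randn(4, 256, 64).astype(np.float32)
out = reference_attention(Q, K, V)
print(out.shape)  # (4, 256, 64)
```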

📦 Installation

pip install oktoblas

📖 Quick Start

import oktoblas as ob
import numpy as np

# FP16 Matrix Multiplication (Tensor Cores)
A = np.random.randn(2048, 2048).astype(np.float16)
B = np.random.randn(2048, 2048).astype(np.float16)
C = ob.matmul_fp16(A, B)  # 35+ TFLOPS

# Fused Attention (up to 3.4x faster than PyTorch)
Q = np.random.randn(4, 512, 64).astype(np.float32)
K = np.random.randn(4, 512, 64).astype(np.float32)
V = np.random.randn(4, 512, 64).astype(np.float32)
output = ob.attention(Q, K, V)

# Library info
ob.info()

Output

============================================================
OktoBLAS by OktoSeek
High-Performance BLAS Library
============================================================
Version: 1.0.2
License: Proprietary (c) 2025 OktoSeek AI
Backend: CUDA PTX (Tensor Cores)

Features:
  - FP16/FP32 GEMM with Tensor Cores
  - Fused Attention kernel
  - 100% Independent (no cuBLAS)

https://www.oktoseek.com
============================================================

🔥 API Reference

# GEMM Operations
ob.matmul(A, B)           # FP32 matrix multiplication
ob.matmul_fp16(A, B)      # FP16 with Tensor Cores

# Fused Operations
ob.attention(Q, K, V)     # Fused Q×K^T×V attention

# Utilities
ob.info()                 # Library information
ob.is_cuda_available()    # Check GPU availability
ob.benchmark(op, size)    # Run benchmarks
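FP16 GEMM trades some precision for Tensor Core throughput, which is worth keeping in mind when validating results from calls such as matmul_fp16. A pure-NumPy sketch (independent of oktoblas) of the rounding error half precision introduces relative to a float64 reference on the same fp16 inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256)).astype(np.float16)
B = rng.standard_normal((256, 256)).astype(np.float16)

# Half-precision product vs. a float64 reference on identical fp16 inputs
C16 = (A @ B).astype(np.float64)
C64 = A.astype(np.float64) @ B.astype(np.float64)

rel_err = np.linalg.norm(C16 - C64) / np.linalg.norm(C64)
print(f"relative error: {rel_err:.1e}")  # small but nonzero
```

A relative error on this order is normal for half precision; comparisons against FP32 results should use a tolerance rather than exact equality.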

🧪 OktoScript Integration

Within OktoEngine, OktoBLAS is configured through OktoScript:

BLAS {
    backend: "oktoblas"
    precision: "fp16"
}

ACCELERATE {
    gemm: "oktoblas"
    attention: "oktoblas"
}

TENSOR_CORES {
    enabled: true
}

🌐 OktoSeek Ecosystem

OktoBLAS is a core component of OktoSeek AI:

- OktoScript: AI programming language
- OktoEngine: native AI training runtime
- OktoBLAS: high-performance BLAS engine
- OkTensor: GPU tensor library
- OktoStudio: AI development IDE

📜 License

Proprietary License — Free for personal and commercial use.

Copyright © 2025 OktoSeek AI. All Rights Reserved.


🔗 Links


OktoBLAS — The BLAS engine built for AI

Download files

Built Distribution

oktoblas-1.0.2-cp310-cp310-win_amd64.whl (176.8 kB)
Uploaded: CPython 3.10, Windows x86-64

No source distribution files are available for this release.

File details

Details for the file oktoblas-1.0.2-cp310-cp310-win_amd64.whl.

File hashes

Algorithm    Hash digest
SHA256       6a9c550ba6b24ff735094bd519fbc78323441a5397edf3f6415c2ecfb2f3c98b
MD5          a634549e4eb1de39b2c354179f914632
BLAKE2b-256  44172a904f90da857080e1c0bb15e056e83e9adae6fa346d79a8af49f0199eff
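Published hashes like these can be checked against a downloaded wheel before installation. A generic sketch using Python's standard hashlib (the file path is hypothetical):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: compare against the SHA256 listed above
# assert sha256_of("oktoblas-1.0.2-cp310-cp310-win_amd64.whl").startswith("6a9c550b")
```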
