High-Performance BLAS Library by OktoSeek - Tensor Core GEMM and Fused Attention

Project description

OktoBLAS

The Independent BLAS Engine Powering OktoEngine


What is OktoBLAS?

OktoBLAS is a proprietary, high-performance Basic Linear Algebra Subprograms (BLAS) engine developed by OktoSeek. It is the core computational backbone of OktoEngine, our native AI training and inference platform.

Unlike wrapper libraries, OktoBLAS is built entirely from scratch using Rust and hand-tuned CUDA PTX assembly — with zero dependency on NVIDIA cuBLAS.

🎯 Key Highlights

- **100% Independent**: no cuBLAS, no external BLAS dependencies
- **Hand-Tuned PTX**: every kernel optimized at the assembly level
- **Tensor Core Native**: built for NVIDIA Tensor Cores (WMMA)
- **Production Ready**: powers OktoEngine in production
- **Python Available**: also released as a standalone Python package

🏆 Performance

All benchmarks on NVIDIA RTX 4070 Laptop GPU using CUDA Events.

FP16 GEMM (Tensor Cores)

| Matrix Size | OktoBLAS | PyTorch | vs PyTorch |
|-------------|----------|---------|------------|
| 1024×1024   | 29.1 TF  | 23.3 TF | 125%       |
| 2048×2048   | 35.1 TF  | 34.6 TF | 101%       |
| 4096×4096   | 36.5 TF  | 38.9 TF | 94%        |
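As a sanity check on figures like these, GEMM throughput is conventionally computed as 2·M·N·K floating-point operations divided by elapsed time. A minimal sketch of that arithmetic (the 0.49 ms timing below is a back-of-the-envelope value implied by the table, not a measured number):

```python
def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float) -> float:
    """Throughput of an M x N x K matrix multiply in TFLOPS.

    A GEMM performs 2*M*N*K floating-point operations
    (one multiply and one add per inner-product term).
    """
    flops = 2 * m * n * k
    return flops / (elapsed_ms * 1e-3) / 1e12

# The 2048x2048 row above (~35.1 TF) implies roughly 0.49 ms per multiply
print(round(gemm_tflops(2048, 2048, 2048, 0.49), 1))  # → 35.1
```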

Fused Attention

| Config       | OktoBLAS | PyTorch | Speedup |
|--------------|----------|---------|---------|
| B4 S256 D64  | 0.96 TF  | 0.28 TF | 3.4x    |
| B4 S512 D64  | 1.22 TF  | 0.93 TF | 1.3x    |
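For reference, the computation a fused attention kernel replaces is standard scaled dot-product attention, softmax(QK^T/√d)·V, normally executed as several separate kernels. A NumPy sketch of that unfused baseline (a reference implementation for checking shapes and semantics, not the OktoBLAS kernel):

```python
import numpy as np

def attention_reference(Q, K, V):
    """Unfused scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V: (batch, seq, dim) arrays, matching the B/S/D configs above.
    """
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)   # (B, S, S)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # (B, S, D)

# Shapes from the B4 S256 D64 row above
Q = np.random.randn(4, 256, 64).astype(np.float32)
K = np.random.randn(4, 256, 64).astype(np.float32)
V = np.random.randn(4, 256, 64).astype(np.float32)
out = attention_reference(Q, K, V)
print(out.shape)  # (4, 256, 64)
```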

📦 Installation

```shell
pip install oktoblas
```

📖 Quick Start

```python
import oktoblas as ob
import numpy as np

# FP16 Matrix Multiplication (Tensor Cores)
A = np.random.randn(2048, 2048).astype(np.float16)
B = np.random.randn(2048, 2048).astype(np.float16)
C = ob.matmul_fp16(A, B)  # 35+ TFLOPS

# Fused Attention (3x faster)
Q = np.random.randn(4, 512, 64).astype(np.float32)
K = np.random.randn(4, 512, 64).astype(np.float32)
V = np.random.randn(4, 512, 64).astype(np.float32)
output = ob.attention(Q, K, V)

# Library info
ob.info()
```

Output

```
============================================================
OktoBLAS by OktoSeek
High-Performance BLAS Library
============================================================
Version: 1.0.2
License: Proprietary (c) 2025 OktoSeek AI
Backend: CUDA PTX (Tensor Cores)

Features:
  - FP16/FP32 GEMM with Tensor Cores
  - Fused Attention kernel
  - 100% Independent (no cuBLAS)

https://www.oktoseek.com
============================================================
```

🔥 API Reference

```python
# GEMM Operations
ob.matmul(A, B)           # FP32 matrix multiplication
ob.matmul_fp16(A, B)      # FP16 with Tensor Cores

# Fused Operations
ob.attention(Q, K, V)     # Fused Q×K^T×V attention

# Utilities
ob.info()                 # Library information
ob.is_cuda_available()    # Check GPU availability
ob.benchmark(op, size)    # Run benchmarks
```
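Tensor Core WMMA GEMMs typically multiply FP16 inputs while accumulating partial sums in FP32, which is why FP16 results can differ slightly from a pure-FP16 loop. A CPU reference with those (assumed — verify against `ob.matmul_fp16` on your hardware) semantics is handy for validating outputs:

```python
import numpy as np

def matmul_fp16_reference(A, B):
    """CPU reference for an FP16 GEMM with FP32 accumulation.

    Casting inputs up to FP32 before the matmul and casting the
    result back down mimics the usual Tensor Core accumulation
    behavior. This is an assumption about matmul_fp16's semantics,
    not documented OktoBLAS behavior.
    """
    C32 = A.astype(np.float32) @ B.astype(np.float32)
    return C32.astype(np.float16)

A = np.random.randn(128, 128).astype(np.float16)
B = np.random.randn(128, 128).astype(np.float16)
C = matmul_fp16_reference(A, B)
print(C.dtype, C.shape)  # float16 (128, 128)
```

To compare against GPU results, use a tolerance suited to half precision, e.g. `np.allclose(C_gpu, C, rtol=1e-2, atol=1e-2)`, rather than exact equality.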

🧪 OktoScript Integration

Within OktoEngine, OktoBLAS is configured through OktoScript:

```
BLAS {
    backend: "oktoblas"
    precision: "fp16"
}

ACCELERATE {
    gemm: "oktoblas"
    attention: "oktoblas"
}

TENSOR_CORES {
    enabled: true
}
```

🌐 OktoSeek Ecosystem

OktoBLAS is a core component of OktoSeek AI:

| Component  | Description                  |
|------------|------------------------------|
| OktoScript | AI programming language      |
| OktoEngine | Native AI training runtime   |
| OktoBLAS   | High-performance BLAS engine |
| OkTensor   | GPU tensor library           |
| OktoStudio | AI development IDE           |

📜 License

Proprietary License — Free for personal and commercial use.

Copyright © 2025 OktoSeek AI. All Rights Reserved.


OktoBLAS — The BLAS engine built for AI


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distribution

oktoblas-1.0.5-cp310-cp310-win_amd64.whl (176.8 kB)

Uploaded: CPython 3.10, Windows x86-64

File details

Details for the file oktoblas-1.0.5-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for oktoblas-1.0.5-cp310-cp310-win_amd64.whl:

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | b295fdd82b696c707f43456d65799930e3f9774b6e4c9135ee7edf707b4bde0c |
| MD5         | b4f79483676398ea4a237c42ca04488f |
| BLAKE2b-256 | 348de4746b950a1c22df194c79d7133f39bcff7bc0829973269c4e7d22d32c18 |
