High-Performance BLAS Library by OktoSeek - Tensor Core GEMM and Fused Attention
OktoBLAS
The Independent BLAS Engine Powering OktoEngine
What is OktoBLAS?
OktoBLAS is a proprietary, high-performance Basic Linear Algebra Subprograms (BLAS) engine developed by OktoSeek. It is the core computational backbone of OktoEngine, our native AI training and inference platform.
Unlike wrapper libraries, OktoBLAS is built entirely from scratch using Rust and hand-tuned CUDA PTX assembly — with zero dependency on NVIDIA cuBLAS.
🎯 Key Highlights
| Feature | Description |
|---|---|
| 100% Independent | No cuBLAS, no external BLAS dependencies |
| Hand-Tuned PTX | Every kernel optimized at assembly level |
| Tensor Core Native | Built for NVIDIA Tensor Cores (WMMA) |
| Production Ready | Powers OktoEngine in production |
| Python Available | Also released as standalone Python package |
🏆 Performance
All benchmarks were run on an NVIDIA RTX 4070 Laptop GPU and timed with CUDA Events.
FP16 GEMM (Tensor Cores)
| Matrix Size | OktoBLAS (TFLOPS) | PyTorch (TFLOPS) | vs PyTorch |
|---|---|---|---|
| 1024×1024 | 29.1 | 23.3 | 125% ✓ |
| 2048×2048 | 35.1 | 34.6 | 101% ✓ |
| 4096×4096 | 36.5 | 38.9 | 94% |
Fused Attention
| Config | OktoBLAS (TFLOPS) | PyTorch (TFLOPS) | Speedup |
|---|---|---|---|
| B4 S256 D64 | 0.96 | 0.28 | 3.4x |
| B4 S512 D64 | 1.22 | 0.93 | 1.3x |
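The TFLOPS figures above follow from the standard GEMM operation count: an N×N×N matrix multiplication performs 2N³ floating-point operations (one multiply and one add per inner-product term), so TFLOPS = 2N³ / (time × 10¹²). A minimal sketch of that arithmetic, using NumPy on CPU as a stand-in (the table's numbers come from GPU timing with CUDA Events, which this sketch does not reproduce):

```python
import time
import numpy as np

n = 512
a = np.random.randn(n, n).astype(np.float32)
b = np.random.randn(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# One multiply + one add for each of the n^3 inner-product terms
flops = 2 * n**3
tflops = flops / elapsed / 1e12
print(f"{tflops:.3f} TFLOPS")
```

The same formula applied to the 2048×2048 FP16 row (2 × 2048³ FLOPs in ~0.49 ms) yields the ~35 TFLOPS reported.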
📦 Installation
```bash
pip install oktoblas
```
📖 Quick Start
```python
import oktoblas as ob
import numpy as np

# FP16 matrix multiplication on Tensor Cores
A = np.random.randn(2048, 2048).astype(np.float16)
B = np.random.randn(2048, 2048).astype(np.float16)
C = ob.matmul_fp16(A, B)  # 35+ TFLOPS

# Fused attention (up to 3.4x faster than PyTorch)
Q = np.random.randn(4, 512, 64).astype(np.float32)
K = np.random.randn(4, 512, 64).astype(np.float32)
V = np.random.randn(4, 512, 64).astype(np.float32)
output = ob.attention(Q, K, V)

# Library info
ob.info()
```
Output:

```text
============================================================
OktoBLAS by OktoSeek
High-Performance BLAS Library
============================================================
Version: 1.0.2
License: Proprietary (c) 2025 OktoSeek AI
Backend: CUDA PTX (Tensor Cores)
Features:
  - FP16/FP32 GEMM with Tensor Cores
  - Fused Attention kernel
  - 100% Independent (no cuBLAS)
https://www.oktoseek.com
============================================================
```
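For reference, the fused attention kernel computes the familiar Q×K^T×V pattern. A plain NumPy sketch of that computation, assuming standard scaled dot-product attention with 1/√d scaling (the exact scaling OktoBLAS applies is not documented here, so treat this as an illustration of the operation, not a bit-exact reference):

```python
import numpy as np

def attention_reference(q, k, v):
    # Batched softmax(Q K^T / sqrt(d)) V over the leading batch axis.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = np.random.randn(4, 512, 64).astype(np.float32)
k = np.random.randn(4, 512, 64).astype(np.float32)
v = np.random.randn(4, 512, 64).astype(np.float32)
out = attention_reference(q, k, v)
```

A fused kernel avoids materializing the full S×S score matrix in global memory, which is where the speedups in the table above come from.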
🔥 API Reference
```python
# GEMM operations
ob.matmul(A, B)          # FP32 matrix multiplication
ob.matmul_fp16(A, B)     # FP16 with Tensor Cores

# Fused operations
ob.attention(Q, K, V)    # Fused Q×K^T×V attention

# Utilities
ob.info()                # Library information
ob.is_cuda_available()   # Check GPU availability
ob.benchmark(op, size)   # Run benchmarks
```
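The availability check composes naturally with a CPU fallback for machines without a CUDA GPU. A hedged sketch (the NumPy fallback path and the FP32-accumulation choice are my own, not part of the OktoBLAS API):

```python
import numpy as np

def matmul_fp16_or_cpu(a, b):
    # Prefer the OktoBLAS Tensor Core GEMM when the package and a GPU
    # are both present; otherwise fall back to NumPy on the CPU.
    try:
        import oktoblas as ob
        if ob.is_cuda_available():
            return ob.matmul_fp16(a, b)
    except ImportError:
        pass
    # CPU fallback: accumulate in FP32 to limit FP16 rounding error,
    # then cast back to the input dtype.
    return (a.astype(np.float32) @ b.astype(np.float32)).astype(a.dtype)

a = np.random.randn(64, 64).astype(np.float16)
b = np.random.randn(64, 64).astype(np.float16)
c = matmul_fp16_or_cpu(a, b)
```

Accumulating in FP32 on the fallback path mirrors what Tensor Core WMMA GEMMs typically do in hardware (FP16 inputs, FP32 accumulators).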
🧪 OktoScript Integration
Within OktoEngine, OktoBLAS is configured through OktoScript:
```
BLAS {
    backend: "oktoblas"
    precision: "fp16"
}

ACCELERATE {
    gemm: "oktoblas"
    attention: "oktoblas"
}

TENSOR_CORES {
    enabled: true
}
```
🌐 OktoSeek Ecosystem
OktoBLAS is a core component of OktoSeek AI:
| Component | Description |
|---|---|
| OktoScript | AI programming language |
| OktoEngine | Native AI training runtime |
| OktoBLAS | High-performance BLAS engine |
| OkTensor | GPU tensor library |
| OktoStudio | AI development IDE |
📜 License
Proprietary license, free for personal and commercial use.
Copyright © 2025 OktoSeek AI. All Rights Reserved.
🔗 Links
- Website: oktoseek.com
- GitHub: github.com/oktoseek
- PyPI: pypi.org/project/oktoblas
OktoBLAS — The BLAS engine built for AI
File details

Details for the file oktoblas-1.0.4-cp310-cp310-win_amd64.whl.

- Download URL: oktoblas-1.0.4-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 176.8 kB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a79e6e5143cdd30653d10a5cb12633ff6999a290e3d9641578357d4aa1e0582 |
| MD5 | 775f4576d7b09ce1421de492e46a3ee1 |
| BLAKE2b-256 | ef4e745ac5823bc382028594416229f1c5ec75409f68862977c41eaa7f1fb6ed |