

Project description

OktoBLAS by OktoSeek

🚀 High-Performance BLAS Library | ⚡ Tensor Core Acceleration | 🔥 100% Independent

OktoBLAS is a high-performance, fully independent BLAS library built from scratch in Rust + CUDA PTX, with no cuBLAS dependency.


🔧 Installation

pip install oktoblas

📖 Quick Start

import oktoblas as ob
import numpy as np

# Matrix multiplication
A = np.random.randn(2048, 2048).astype(np.float32)
B = np.random.randn(2048, 2048).astype(np.float32)
C = ob.matmul(A, B)

# FP16 with Tensor Cores
A16 = np.random.randn(2048, 2048).astype(np.float16)
B16 = np.random.randn(2048, 2048).astype(np.float16)
C16 = ob.matmul_fp16(A16, B16)

# Fused Attention
batch, seq_len, head_dim = 4, 512, 64
Q = np.random.randn(batch, seq_len, head_dim).astype(np.float32)
K = np.random.randn(batch, seq_len, head_dim).astype(np.float32)
V = np.random.randn(batch, seq_len, head_dim).astype(np.float32)
output = ob.attention(Q, K, V)

# Show info
ob.info()
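The fused kernel computes scaled dot-product attention, softmax(Q·Kᵀ/√d)·V, in a single launch. The exact scaling convention used by ob.attention is an assumption here, but a plain-NumPy reference like the following is handy for sanity-checking its output on small inputs:

```python
import numpy as np

def attention_reference(Q, K, V):
    """Plain-NumPy scaled dot-product attention:
    softmax(Q @ K^T / sqrt(d)) @ V, computed per batch."""
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V

batch, seq_len, head_dim = 2, 8, 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((batch, seq_len, head_dim)).astype(np.float32)
K = rng.standard_normal((batch, seq_len, head_dim)).astype(np.float32)
V = rng.standard_normal((batch, seq_len, head_dim)).astype(np.float32)
out = attention_reference(Q, K, V)
print(out.shape)  # (2, 8, 4)
```

Comparing this reference against the fused kernel's output (within FP32 tolerance) is a quick way to confirm the scaling convention matches.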

🔥 PyTorch Integration

import torch
import oktoblas as ob

# Use OktoBLAS with PyTorch tensors
A = torch.randn(2048, 2048, device='cuda', dtype=torch.float16)
B = torch.randn(2048, 2048, device='cuda', dtype=torch.float16)

# matmul_fp16 takes NumPy arrays, so move the tensors to host memory first
C = ob.matmul_fp16(A.cpu().numpy(), B.cpu().numpy())

🎯 Features

Feature            Description
FP16/FP32 GEMM     Tensor Core acceleration
Fused Attention    Single-kernel Q×K×V
100% Independent   No cuBLAS dependency
Hand-Tuned PTX     Optimized CUDA kernels
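Tensor Core MMA instructions take FP16 inputs but typically accumulate in FP32. A small NumPy sketch (an illustration of the numerics, not of OktoBLAS internals) shows why the accumulator width matters:

```python
import numpy as np

def matmul_fp16_accum(A, B):
    """Matrix product that rounds the running sum to float16 after
    every add, mimicking an FP16-only accumulator."""
    n, k = A.shape
    m = B.shape[1]
    C = np.empty((n, m), dtype=np.float16)
    for i in range(n):
        for j in range(m):
            s = np.float16(0.0)
            for p in range(k):
                s = np.float16(s + np.float16(A[i, p] * B[p, j]))
            C[i, j] = s
    return C

rng = np.random.default_rng(0)
n = 32
A = rng.standard_normal((n, n)).astype(np.float16)
B = rng.standard_normal((n, n)).astype(np.float16)

ref = A.astype(np.float64) @ B.astype(np.float64)    # high-precision reference
acc32 = A.astype(np.float32) @ B.astype(np.float32)  # FP16 inputs, FP32 accumulator
acc16 = matmul_fp16_accum(A, B)                      # FP16 accumulator throughout

print("fp32-accum max error:", np.abs(acc32 - ref).max())
print("fp16-accum max error:", np.abs(acc16.astype(np.float64) - ref).max())
```

The per-add rounding of the FP16 accumulator produces visibly larger error than accumulating in FP32, and the gap grows with the inner dimension.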

📊 Benchmark Results (RTX 4070 Laptop)

All benchmarks were timed with CUDA events.

FP16 GEMM (Tensor Cores)

Matrix Size   OktoBLAS   PyTorch   Ratio
1024×1024     29.1 TF    23.3 TF   125%
2048×2048     35.1 TF    34.6 TF   101%
4096×4096     36.5 TF    38.9 TF    94%

Fused Attention

Config        OktoBLAS   PyTorch   Ratio
B4 S256 D64   0.96 TF    0.28 TF   346%
B4 S512 D64   1.22 TF    0.93 TF   131%
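The TF columns are TFLOPS derived from the standard operation counts: 2·n³ FLOPs for an n×n×n GEMM, and roughly 4·B·S²·D for attention (one GEMM each for Q·Kᵀ and P·V). A short sketch of the arithmetic, using a hypothetical 0.49 ms timing chosen to match the 2048×2048 row:

```python
def gemm_tflops(n: int, elapsed_s: float) -> float:
    """Throughput of an n x n x n GEMM: 2*n^3 FLOPs
    (one multiply plus one add per multiply-accumulate)."""
    return 2 * n**3 / elapsed_s / 1e12

def attention_tflops(b: int, s: int, d: int, elapsed_s: float) -> float:
    """Approximate attention cost: 2*b*s*s*d FLOPs each
    for Q @ K^T and for P @ V."""
    return 4 * b * s * s * d / elapsed_s / 1e12

# A 2048^3 GEMM finishing in ~0.49 ms works out to ~35.1 TFLOPS,
# matching the 2048x2048 row above.
print(round(gemm_tflops(2048, 0.49e-3), 1))
```

Note that attention FLOP counts conventionally ignore the softmax itself, which is memory-bound rather than compute-bound.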

🚀 Roadmap

  • FP16/FP32 GEMM with Tensor Cores
  • Fused Attention kernel
  • PyPI package release
  • ROCm (AMD) support
  • Metal (Apple) support
  • Full PyTorch autograd integration

📚 Part of OktoSeek Ecosystem

OktoBLAS is part of the OktoSeek ecosystem:

Project      Description                  Link
OktoScript   AI programming language      GitHub
OktoEngine   Native ML inference engine   Coming soon
OktoStudio   AI Development IDE           Coming soon
OktoBLAS     High-performance BLAS        GitHub
OkTensor     GPU tensor library           Part of OktoEngine

📜 License

Proprietary License - Free for personal and commercial use.

Copyright (c) 2025 OktoSeek AI. All Rights Reserved.

See LICENSE.txt for details.


🙏 Credits

Built with ❤️ by OktoSeek AI.


Star us on GitHub!



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


oktoblas-1.0.1-cp310-cp310-win_amd64.whl (198.4 kB)

Uploaded: CPython 3.10, Windows x86-64


File hashes

Hashes for oktoblas-1.0.1-cp310-cp310-win_amd64.whl

Algorithm     Hash digest
SHA256        269799967bf1df3539c06d143d5749250d6a32f80949a63d1c94ca48f5838f0c
MD5           9a5968c8e621af04d6b6567830a5f260
BLAKE2b-256   b5bdb6eaf6bf22f5ba5b6fd4f29efd84fecb47cdac868a93588171b13d1df770

