KeyDNN is a lightweight deep learning framework with explicit CPU/CUDA execution and clean architectural boundaries.
KeyDNN
KeyDNN is a lightweight deep learning framework built from scratch in Python, with a strong focus on:
- clean architecture and explicit interfaces
- a practical CPU / CUDA execution stack
- correctness-first design validated by CPU ↔ CUDA parity tests
It is designed to be both:
- a learning-friendly implementation of modern DL abstractions (Tensor, autograd, modules), and
- a performance-oriented sandbox for building real backends (native CPU kernels, CUDA kernels, vendor libraries).
🚧 Status: v2.1.0 (beta).
The v2 public API is largely stable but still evolving toward the v2.1 release. Breaking changes are avoided when possible and documented when necessary.
📚 Documentation: https://keywind127.github.io/keydnn_v2/
💻 Source: https://github.com/keywind127/keydnn_v2
Platform support
- OS: Windows 10 / 11 (x64 only)
- Python: ≥ 3.10
- CUDA: Optional (NVIDIA GPU required for acceleration)
CUDA acceleration requires a compatible CUDA runtime. Some backends use vendor libraries such as cuBLAS / cuDNN when available.
If CUDA is unavailable, CPU execution remains supported.
Support snapshot
- Windows (CPU): ✅ supported
- Windows (CUDA): ✅ supported (requires NVIDIA GPU + CUDA runtime; cuBLAS/cuDNN optional)
- Linux/macOS: ❌ not yet supported in v2.x (v0 has CPU-focused Linux support)
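The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; `check_platform` is a hypothetical helper written for illustration and is not part of KeyDNN.

```python
import platform
import sys


def check_platform(system, machine, version):
    """Return a list of reasons the environment misses the stated requirements.

    Hypothetical helper, not a KeyDNN API. An empty list means the
    environment matches the support snapshot (Windows x64, Python >= 3.10).
    """
    problems = []
    if system != "Windows":
        problems.append(f"unsupported OS: {system}")
    # Windows reports 64-bit x86 as "AMD64"; accept "x86_64" as well.
    if machine.lower() not in ("amd64", "x86_64"):
        problems.append(f"unsupported architecture: {machine}")
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.10")
    return problems


if __name__ == "__main__":
    issues = check_platform(platform.system(), platform.machine(), sys.version_info)
    print("OK" if not issues else "\n".join(issues))
```

Passing the values in as arguments, rather than reading them inside the function, keeps the check easy to exercise on any machine.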
Highlights
- CUDA device-pointer–backed Tensor backend
- Explicit H2D / D2H / D2D memory boundaries (no implicit host materialization)
- Vendor-accelerated kernels:
  - cuBLAS GEMM for matmul
  - cuDNN acceleration for conv2d / conv2d_transpose (when enabled)
- CUDA implementations for core ops:
  - elementwise ops
  - reductions
  - pooling
  - in-place scalar ops (optimizer hot paths)
- Extensive CPU ↔ CUDA parity tests
- Standalone microbenchmarks under scripts/
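The idea behind a parity test is to run a reference implementation and an accelerated one on the same inputs and require the outputs to agree within a tolerance. The sketch below shows that pattern in plain Python. It is not KeyDNN's test harness: `assert_parity`, `scale_reference`, and `scale_candidate` are hypothetical names, and the "candidate" here is only a stand-in for a CUDA kernel.

```python
import math


def assert_parity(reference_fn, candidate_fn, inputs, rel_tol=1e-6, abs_tol=1e-9):
    """Run both implementations on each input and compare element by element."""
    for x in inputs:
        expected = reference_fn(x)
        actual = candidate_fn(x)
        assert len(expected) == len(actual), "output lengths differ"
        for i, (e, a) in enumerate(zip(expected, actual)):
            if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol):
                raise AssertionError(f"mismatch at index {i}: {e!r} vs {a!r}")


def scale_reference(xs):
    # Straightforward reference: multiply every element by 2.
    return [2.0 * v for v in xs]


def scale_candidate(xs):
    # Stand-in for an accelerated kernel: same math, computed differently.
    return [v + v for v in xs]


# Passes silently when the two implementations agree on every input.
assert_parity(scale_reference, scale_candidate, [[0.0, 1.5, -3.25], [1e-3, 1e6]])
```

Comparing with a tolerance rather than exact equality matters for real kernels, since a GPU reduction may sum in a different order than its CPU counterpart.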
Installation
pip install keydnn
Development install:
git clone https://github.com/keywind127/keydnn_v2.git
cd keydnn_v2
pip install -e .
Quickstart
from keydnn.tensors import Tensor, Device
x = Tensor(shape=(2, 3), device=Device("cpu"), requires_grad=True)
y = (x * 2.0).sum()
y.backward()
print(x.grad.to_numpy())
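Since y is the sum of 2 * x over all entries, every entry of the gradient dy/dx equals 2, whatever values x holds. Assuming the Quickstart runs as written, x.grad should therefore be a 2 x 3 array of 2.0. The sketch below checks that claim independently of KeyDNN with central differences on nested Python lists; `f` and `numerical_grad` are illustrative names only.

```python
def f(x):
    """y = sum(2 * x) over a nested list, mirroring (x * 2.0).sum()."""
    return sum(2.0 * v for row in x for v in row)


def numerical_grad(fn, x, eps=1e-6):
    """Central-difference estimate of d fn / d x for a 2-D nested list."""
    grad = [[0.0] * len(row) for row in x]
    for i, row in enumerate(x):
        for j in range(len(row)):
            orig = row[j]
            row[j] = orig + eps
            up = fn(x)
            row[j] = orig - eps
            down = fn(x)
            row[j] = orig  # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad


x = [[0.5, -1.0, 2.0], [3.0, 0.0, -4.5]]
grad = numerical_grad(f, x)
print(grad)  # each entry is close to 2.0, up to floating-point error
```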
CUDA example:
from keydnn.tensors import Tensor, Device
from keydnn.backend import cuda_available
device = Device("cuda:0") if cuda_available() else Device("cpu")
x = Tensor.rand((1024, 1024), device=device, requires_grad=True)
y = (x @ x.T).mean()
y.backward()
print("device:", device)
print("y:", y.item())
CUDA setup (Windows)
CUDA requires additional setup on Windows (CUDA runtime discovery and optional cuDNN). See the documentation for details.
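Before following the setup guide, it can help to see whether a CUDA toolkit and NVIDIA driver are visible at all. The sketch below is a generic check and does not reproduce KeyDNN's own discovery logic, which this README does not describe. It assumes two common conventions: the CUDA Toolkit installer on Windows sets the `CUDA_PATH` environment variable, and the NVIDIA driver usually puts `nvidia-smi` on the PATH. `cuda_hints` is a hypothetical helper.

```python
import os
import shutil


def cuda_hints(env=None, which=shutil.which):
    """Collect generic hints that a CUDA toolkit/driver is installed.

    Not KeyDNN's discovery logic; it only inspects common conventions.
    """
    env = os.environ if env is None else env
    return {
        # CUDA_PATH is set by the NVIDIA CUDA Toolkit installer on Windows.
        "cuda_path": env.get("CUDA_PATH"),
        # nvidia-smi ships with the NVIDIA driver; None if not on PATH.
        "nvidia_smi": which("nvidia-smi"),
    }


if __name__ == "__main__":
    for name, value in cuda_hints().items():
        print(f"{name}: {value or 'not found'}")
```

A missing hint does not prove CUDA is unusable, and a present one does not prove the runtime version is compatible; treat the output as a starting point only.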
Versioning note
KeyDNN v2 is a major rewrite and is not API-compatible with KeyDNN v0.
License
Licensed under the Apache License, Version 2.0.
File details
Details for the file keydnn-2.1.0b1.tar.gz.
File metadata
- Download URL: keydnn-2.1.0b1.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3078417db12a947c19ad1054f33e91373af4dec83b32cab433a423b339ea7071 |
| MD5 | 6d48571ba7fad3871ae41d0848716c75 |
| BLAKE2b-256 | e59c1a3bdf697fb6d3048316f589730717c3b213a60aede3b81f7b06641a2f51 |
File details
Details for the file keydnn-2.1.0b1-py3-none-any.whl.
File metadata
- Download URL: keydnn-2.1.0b1-py3-none-any.whl
- Upload date:
- Size: 1.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e802a2c128b90c699086b94e2644164557f30e050b32acab725e1fadd7226811 |
| MD5 | 9ec089ca9033cf4b1e6d53047afbcde7 |
| BLAKE2b-256 | 54ab9dd173598319f936a480f492ab740f4a255876f82a3ec5a8b7bbb60a96b9 |