26 projects
tokenspeed-smg
High-performance Rust-based inference gateway for large-scale LLM deployments
tokenspeed-smg-grpc-servicer
SMG gRPC servicer implementations for LLM inference engines (vLLM, SGLang, MLX, TokenSpeed)
tokenspeed-smg-grpc-proto
SMG gRPC proto definitions for SGLang, vLLM, TRT-LLM, and MLX
tokenspeed-kernel-nvidia
Placeholder package for TokenSpeed NVIDIA kernel distribution.
tokenspeed-kernel-amd
Placeholder package for TokenSpeed AMD kernel distribution.
tokenspeed-triton
A language and compiler for custom Deep Learning operations (vendor release for TokenSpeed)
tokenspeed-proton
A profiler for Triton (vendor release for TokenSpeed)
tokenspeed-mooncake
Python bindings for a Mooncake library, built with pybind11
tokenspeed-mla
Speed-of-light TokenSpeed MLA kernels for Blackwell SM100 and SM103.
tokenspeed-trie
A small harness for evaluating OpenAI-compatible inference endpoints with synthetic agentic workloads.
tokenspeed-iris
Triton-based framework for Remote Memory Access (RMA) operations with SHMEM-like APIs for multi-GPU programming.
tokenspeed-tritonblas
A lightweight Triton-based BLAS library
tokenspeed-triton-kernels
No description provided.
tokenspeed-fa4
Flash Attention CUTE (CUDA Template Engine) implementation
tokenspeed-trtllm-kernel
Standalone TensorRT-LLM CUDA kernels as PyTorch custom ops
tokenspeed-flashmla
No description provided.
tokenspeed-deepep
No description provided.
tokenspeed-deepgemm
No description provided.
smg
High-performance Rust-based inference gateway for large-scale LLM deployments
tokenspeed-fa3
FlashAttention-3
tokenspeed-fast-hadamard-transform
Fast Hadamard Transform in CUDA, with a PyTorch interface
tokenspeed-scheduler
Name reserved for the tokenspeed-scheduler project.
modelgt
Name reserved for the modelgt project.
tokenspeed-kernel
Name reserved for the tokenspeed-kernel project.
tokenspeed
Name reserved for the tokenspeed project.
torchspec
TorchSpec (placeholder package name reservation).