Naive implementation of speculative decoding
Project description
Naive Speculate
This repository implements the speculative decoding technique naively. I coded it primarily for understanding this technique better.
I originally intended to write a rather large project to serve as a framework of speculative decoding, but finally found that it would take more time than I originally expected. Therefore it ends up as a primitive, naive reproduction of the speculative decoding technique.
Currently, the supported model family is the Qwen3 series. In experiments so far, speedup appears only when there is a large scale gap between drafter and verifier models (for example, Qwen3-0.6B as drafter and Qwen3-8B as verifier). With smaller verifier models, speculative decoding is actually slower than autoregressive decoding. (Well, maybe there is a bug in my implementation.)
Getting Started
Installation
To run the code, first clone this repo:
git clone git@github.com:VioletsOleander/naive-speculate.git
Then install dependencies with one of the optional extras:
CPU:
uv sync --extra cpu
CUDA 12.8:
uv sync --extra cu128
After that, an executable named spec will be installed in the environment.
Run an Example
Specify configuration and input context in separate files. Example files are provided at the project root: config.example.toml and context.example.json.
Run:
spec config.example.toml context.example.json --rounds 10 --verbose
which will use the example config to run the code, and execute speculative decoding for 10 rounds.
On first run, models are downloaded automatically from the Hugging Face Hub. The example config uses Qwen3-0.6B and Qwen3-8B.
For CLI options:
spec --help
Configure Input Files
To customize configuration, see config.example.toml. If you use Even Better TOML, add:
#:schema config-schema.json
at the top of your TOML file to enable completion and hover hints based on config-schema.json.
To customize context, see context.example.json. The format is a list of dict objects, where each object defines role and content.
More Information
For more information, see the docs, which briefly describe the project structure.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file naive_speculate-0.1.0.tar.gz.
File metadata
- Download URL: naive_speculate-0.1.0.tar.gz
- Upload date:
- Size: 26.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
99e0e280d5526a5515ff82a4d02101f593adb979bfa895703268d1e4134862bb
|
|
| MD5 |
bfff855e8b92ba1a396bd772ae59bd18
|
|
| BLAKE2b-256 |
870b1da8d64d13265fc257ad67e36cadf5a445d0c27e80e363ab4ebc765e907f
|
File details
Details for the file naive_speculate-0.1.0-py3-none-any.whl.
File metadata
- Download URL: naive_speculate-0.1.0-py3-none-any.whl
- Upload date:
- Size: 48.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9f6e835a447608329b95344af9b29464edd4a20d0803b7133b55c1c95094114c
|
|
| MD5 |
5faf5b54c9403af6cd049ffebabd85c0
|
|
| BLAKE2b-256 |
d24ae5461c3f3186309d81ed5e079b7abb43cc2e848dd2942b4fbfc5ceb80c3d
|