Skip to main content

Naive implementation of speculative decoding

Project description

Naive Speculate

This repository implements the speculative decoding technique naively. I coded it primarily for understanding this technique better.

I originally intended to write a rather large project to serve as a framework of speculative decoding, but finally found that it would take more time than I originally expected. Therefore it ends up as a primitive, naive reproduction of the speculative decoding technique.

Currently, the supported model family is the Qwen3 series. In experiments so far, speedup appears only when there is a large scale gap between drafter and verifier models (for example, Qwen3-0.6B as drafter and Qwen3-8B as verifier). With smaller verifier models, speculative decoding is actually slower than autoregressive decoding. (Well, maybe there is a bug in my implementation.)

Getting Started

Installation

To run the code, first clone this repo:

git clone git@github.com:VioletsOleander/naive-speculate.git

Then install dependencies with one of the optional extras:

CPU:

uv sync --extra cpu

CUDA 12.8:

uv sync --extra cu128

After that, an executable named spec will be installed in the environment.

Run an Example

Specify configuration and input context in separate files. Example files are provided at the project root: config.example.toml and context.example.json.

Run:

spec config.example.toml context.example.json --rounds 10 --verbose

which will use the example config to run the code, and execute speculative decoding for 10 rounds.

On first run, models are downloaded automatically from the Hugging Face Hub. The example config uses Qwen3-0.6B and Qwen3-8B.

For CLI options:

spec --help

Configure Input Files

To customize configuration, see config.example.toml. If you use Even Better TOML, add:

#:schema config-schema.json

at the top of your TOML file to enable completion and hover hints based on config-schema.json.

To customize context, see context.example.json. The format is a list of dict objects, where each object defines role and content.

More Information

For more information, see the docs, which briefly describe the project structure.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

naive_speculate-0.1.0.tar.gz (26.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

naive_speculate-0.1.0-py3-none-any.whl (48.6 kB view details)

Uploaded Python 3

File details

Details for the file naive_speculate-0.1.0.tar.gz.

File metadata

  • Download URL: naive_speculate-0.1.0.tar.gz
  • Upload date:
  • Size: 26.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for naive_speculate-0.1.0.tar.gz
Algorithm Hash digest
SHA256 99e0e280d5526a5515ff82a4d02101f593adb979bfa895703268d1e4134862bb
MD5 bfff855e8b92ba1a396bd772ae59bd18
BLAKE2b-256 870b1da8d64d13265fc257ad67e36cadf5a445d0c27e80e363ab4ebc765e907f

See more details on using hashes here.

File details

Details for the file naive_speculate-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: naive_speculate-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 48.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for naive_speculate-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9f6e835a447608329b95344af9b29464edd4a20d0803b7133b55c1c95094114c
MD5 5faf5b54c9403af6cd049ffebabd85c0
BLAKE2b-256 d24ae5461c3f3186309d81ed5e079b7abb43cc2e848dd2942b4fbfc5ceb80c3d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page