Python bindings for OxiLLaMa — Pure Rust LLM inference engine

Project description

oxillama-py

Python bindings for OxiLLaMa — high-performance LLM inference from Python.

Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.

What It Provides

  • EngineConfig — configuration dataclass for thread count, context size, tokenizer path, and sampler defaults
  • Engine — load a GGUF model and generate text; releases the GIL during inference
  • AsyncEngine — async/await interface; streams tokens to Python coroutines without blocking the event loop
  • SamplerConfig — all ten sampler knobs with greedy() and mirostat_v2() static constructors
  • SpeculativeConfig / SpeculativeEngine — draft + target model pair for faster generation
  • Lora — load a LoRA adapter and hot-swap it onto an Engine
  • Tokenizer — first-class tokenizer object with encode, decode, encode_batch, apply_chat_template
  • CancellationToken — cooperative cancellation handle accepted by generate() and generate_streaming()
  • Structured exception hierarchy: OxiLlamaError, LoadError, GenerateError, TokenizerError, GrammarError, QuantError, KvCacheFullError
  • Full Python type annotations (.pyi stubs) and docstrings
  • Wheels built with maturin (ABI3, Python 3.8+)
  • Optional NumPy interop (embed_numpy(), embed_batch_numpy(), forward_logits_numpy()) via the numpy feature
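
To illustrate why a structured exception hierarchy is useful, here is a minimal sketch of catching errors from most specific to most general. The classes below are local stand-ins that only reuse the names listed above; the assumption that every error derives directly from OxiLlamaError is mine and is not confirmed by the package, and the recovery labels returned by classify() are purely illustrative.

```python
# Hypothetical stand-ins that mirror the exception names listed above.
# Assumption: every specific error is a direct subclass of OxiLlamaError.
class OxiLlamaError(Exception):
    """Assumed base class for all binding errors."""


class LoadError(OxiLlamaError):
    """Stand-in: the model file could not be loaded."""


class GenerateError(OxiLlamaError):
    """Stand-in: generation failed."""


class KvCacheFullError(OxiLlamaError):
    """Stand-in: the KV cache ran out of room."""


def classify(exc: Exception) -> str:
    """Map an exception to a coarse, illustrative recovery action."""
    try:
        raise exc
    except KvCacheFullError:
        return "shrink-prompt"      # most specific handlers come first
    except LoadError:
        return "check-model-path"
    except OxiLlamaError:
        return "report"             # any other binding error
    except Exception:
        return "unexpected"         # not a binding error at all
```

With the real package, the same ordering would apply to the imported exception classes, provided they share a common base as assumed here.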

Status

Version: 0.1.2 — Tests: 81 passing

Installation

pip install maturin
maturin develop --release          # in-place development install
# or
maturin build --release            # build a wheel
pip install target/wheels/oxillama_py-*.whl
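
After either install path, a quick check that the module is importable can save debugging time. The sketch below uses only the standard library; the module name oxillama_py is taken from the import shown under Usage, and the helper name is_installed is illustrative.

```python
import importlib.util


def is_installed(module_name: str = "oxillama_py") -> bool:
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("oxillama_py importable:", is_installed())
```

This only locates the module; it does not load a model or exercise the native extension.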

Usage

import oxillama_py as ox

# Load model
engine = ox.Engine("llama-3.2-3b.Q4_K_M.gguf")

# Basic generation (GIL is released during the Rust inference call)
output = engine.generate(
    prompt="Tell me about the Rust programming language.",
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.95,
)
print(output)

# Streaming generation with a callback
engine.generate_streaming(
    "Explain quantum computing.",
    max_tokens=256,
    callback=lambda tok: print(tok, end="", flush=True),
)

# Async engine (non-blocking, event-loop friendly)
import asyncio

async def run():
    aengine = ox.AsyncEngine("llama-3.2-3b.Q4_K_M.gguf")
    result = await aengine.generate("Hello async world", max_new_tokens=64)
    print(result)

asyncio.run(run())

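# Cooperative cancellation
import threading

token = ox.CancellationToken()
# Assuming generate_streaming() blocks until it finishes, cancel() must be
# scheduled on another thread before the call starts (here: after 2 seconds).
threading.Timer(2.0, token.cancel).start()
engine.generate_streaming("Tell me a story", max_tokens=1024,
                          callback=print, cancel_token=token)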

# Speculative decoding: 3-8x faster on large models
draft  = ox.Engine("llama-3.2-1b.Q4_K_M.gguf")
target = ox.Engine("llama-3.2-8b.Q4_K_M.gguf")
spec   = ox.SpeculativeEngine(draft=draft, target=target, gamma=4)
output = spec.generate("Once upon a time", max_new_tokens=512)
print(output)

# LoRA adapter
lora   = ox.Lora.load("my-adapter.gguf")
engine.apply_lora(lora)
output = engine.generate("Write a haiku.", max_new_tokens=64)
engine.remove_lora()

# Tokenizer
tokenizer = ox.Tokenizer.from_file("tokenizer.json")
ids = tokenizer.encode("Hello, world!")
text = tokenizer.decode(ids)

# HuggingFace Hub loader
engine = ox.Engine.from_hub("meta-llama/Llama-3.2-3B-GGUF")
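
The cooperative cancellation shown above can be understood through a small, library-independent sketch. The CancelFlag class and stream_tokens function below are hypothetical stand-ins, not the OxiLLaMa implementation: they only illustrate the general pattern in which the producer checks a shared flag between tokens and stops cleanly once it is set.

```python
import threading
from typing import Callable, Iterable, List


class CancelFlag:
    """Hypothetical stand-in for a cancellation token, built on threading.Event."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        # Safe to call from any thread.
        self._event.set()

    def is_cancelled(self) -> bool:
        return self._event.is_set()


def stream_tokens(tokens: Iterable[str],
                  callback: Callable[[str], None],
                  flag: CancelFlag) -> int:
    """Emit tokens one by one, checking the flag before each; return the count emitted."""
    emitted = 0
    for tok in tokens:
        if flag.is_cancelled():
            break  # cooperative: stop at a token boundary
        callback(tok)
        emitted += 1
    return emitted


# Usage: cancel from inside the callback once three tokens have arrived.
flag = CancelFlag()
seen: List[str] = []


def on_token(tok: str) -> None:
    seen.append(tok)
    if len(seen) == 3:
        flag.cancel()


count = stream_tokens(["a", "b", "c", "d", "e"], on_token, flag)
```

Cancelling from the callback keeps the example deterministic; because the flag wraps a threading.Event, a timer or worker thread could call cancel() in the same way.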

Feature Flags

Feature   Default   Description
numpy     no        NumPy interop for embed_numpy(), embed_batch_numpy(), forward_logits_numpy()

License

Apache-2.0 — COOLJAPAN OU (Team Kitasan)

Download files

Source Distribution

  • oxillama-0.1.3.tar.gz (983.3 kB)

Built Distributions

  • oxillama-0.1.3-cp38-abi3-win_amd64.whl (2.1 MB): CPython 3.8+, Windows x86-64
  • oxillama-0.1.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • oxillama-0.1.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.2 MB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • oxillama-0.1.3-cp38-abi3-macosx_11_0_arm64.whl (2.1 MB): CPython 3.8+, macOS 11.0+, ARM64
  • oxillama-0.1.3-cp38-abi3-macosx_10_12_x86_64.whl (2.2 MB): CPython 3.8+, macOS 10.12+, x86-64
