Python bindings for OxiLLaMa — Pure Rust LLM inference engine

oxillama-py

Python bindings for OxiLLaMa — high-performance LLM inference from Python.

Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.

What It Provides

  • EngineConfig — configuration dataclass for thread count, context size, tokenizer path, and sampler defaults
  • Engine — load a GGUF model and generate text; releases the GIL during inference
  • AsyncEngine — async/await interface; streams tokens to Python coroutines without blocking the event loop
  • SamplerConfig — all ten sampler knobs with greedy() and mirostat_v2() static constructors
  • SpeculativeConfig / SpeculativeEngine — draft + target model pair for faster generation
  • Lora — load a LoRA adapter and hot-swap it onto an Engine
  • Tokenizer — first-class tokenizer object with encode, decode, encode_batch, apply_chat_template
  • CancellationToken — cooperative cancellation handle accepted by generate() and generate_streaming()
  • Structured exception hierarchy: OxiLlamaError, LoadError, GenerateError, TokenizerError, GrammarError, QuantError, KvCacheFullError
  • Full Python type annotations (.pyi stubs) and docstrings
  • Wheels built with maturin (ABI3, Python 3.8+)
  • Optional numpy interop (embed_numpy(), embed_batch_numpy(), forward_logits_numpy()) via numpy feature
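To make the SpeculativeEngine bullet above more concrete, here is a minimal pure-Python sketch of the general idea behind greedy speculative decoding. It is not OxiLLaMa's implementation: the names speculative_generate, toy_target, toy_draft, and greedy_generate are hypothetical, the "models" are toy functions, and only the gamma parameter name is taken from the usage example. A cheap draft model proposes gamma tokens, the target model checks them, and the longest agreeing prefix is kept.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_generate(draft: NextToken, target: NextToken,
                         prompt: List[int], max_new_tokens: int,
                         gamma: int = 4) -> List[int]:
    """Greedy speculative decoding; output equals target-only greedy decoding."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model proposes gamma tokens, one after another.
        proposal: List[int] = []
        for _ in range(gamma):
            proposal.append(draft(tokens + proposal))
        # 2. The target verifies each proposed position. A real engine would
        #    score all positions in one batched forward pass.
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's token
            if expected != guess:
                break  # first mismatch: discard the rest of the proposal
        else:
            # Every guess matched, so the target adds one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:limit]


def toy_target(seq: List[int]) -> int:
    # Hypothetical stand-in for the large model.
    return (seq[-1] * 3 + 1) % 11


def toy_draft(seq: List[int]) -> int:
    # Hypothetical stand-in for the small model: wrong at every third position.
    nxt = (seq[-1] * 3 + 1) % 11
    return nxt if len(seq) % 3 else (nxt + 1) % 11


def greedy_generate(model: NextToken, prompt: List[int], n: int) -> List[int]:
    # Plain greedy decoding, used as the reference result.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out
```

In this greedy variant every emitted token is one the target model itself would have chosen, so the draft model affects speed only, not the result. Larger gamma values pay off only when the draft agrees with the target often.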

Status

Version: 0.1.2 — Tests: 81 passing

Installation

pip install maturin
maturin develop --release          # in-place development install
# or
maturin build --release            # build a wheel
pip install target/wheels/oxillama_py-*.whl
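After either install path, a quick sanity check is to confirm that the module is visible to the interpreter you intend to use. The sketch below uses only the standard library; the helper name is_installed is hypothetical, and the module name oxillama_py is taken from the import in the Usage section.

```python
import importlib.util


def is_installed(module_name: str = "oxillama_py") -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("oxillama_py importable:", is_installed())
```

If this prints False right after maturin develop, the usual cause is that the build ran in a different virtual environment from the one running the check.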

Usage

import oxillama_py as ox

# Load model
engine = ox.Engine("llama-3.2-3b.Q4_K_M.gguf")

# Basic generation (GIL is released during the Rust inference call)
output = engine.generate(
    prompt="Tell me about the Rust programming language.",
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.95,
)
print(output)

# Streaming generation with a callback
engine.generate_streaming(
    "Explain quantum computing.",
    max_tokens=256,
    callback=lambda tok: print(tok, end="", flush=True),
)

# Async engine (non-blocking, event-loop friendly)
import asyncio

async def run():
    aengine = ox.AsyncEngine("llama-3.2-3b.Q4_K_M.gguf")
    result = await aengine.generate("Hello async world", max_new_tokens=64)
    print(result)

asyncio.run(run())

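# Cooperative cancellation: generate_streaming() blocks until it finishes,
# so cancel() has to be called from another thread while it is running
import threading

token = ox.CancellationToken()
threading.Timer(2.0, token.cancel).start()  # request a stop after 2 seconds
engine.generate_streaming("Tell me a story", max_tokens=1024,
                          callback=print, cancel_token=token)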

# Speculative decoding: 3-8x faster on large models
draft  = ox.Engine("llama-3.2-1b.Q4_K_M.gguf")
target = ox.Engine("llama-3.2-8b.Q4_K_M.gguf")
spec   = ox.SpeculativeEngine(draft=draft, target=target, gamma=4)
output = spec.generate("Once upon a time", max_new_tokens=512)
print(output)

# LoRA adapter
lora   = ox.Lora.load("my-adapter.gguf")
engine.apply_lora(lora)
output = engine.generate("Write a haiku.", max_new_tokens=64)
engine.remove_lora()

# Tokenizer
tokenizer = ox.Tokenizer.from_file("tokenizer.json")
ids = tokenizer.encode("Hello, world!")
text = tokenizer.decode(ids)

# HuggingFace Hub loader
engine = ox.Engine.from_hub("meta-llama/Llama-3.2-3B-GGUF")
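The temperature and top_p arguments passed to generate() above are standard sampling controls. The following is a minimal pure-Python sketch of what they conventionally mean, assuming the textbook definitions of temperature scaling and nucleus (top-p) sampling. It is not taken from OxiLLaMa's sampler, and the function names softmax_with_temperature, top_p_filter, and sample are hypothetical.

```python
import math
import random
from typing import Dict, Optional


def softmax_with_temperature(logits: Dict[str, float],
                             temperature: float) -> Dict[str, float]:
    """Turn logits into probabilities; temperature must be > 0.

    Values below 1.0 sharpen the distribution, values above 1.0 flatten it.
    """
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize


def sample(logits: Dict[str, float], temperature: float = 0.8,
           top_p: float = 0.95, rng: Optional[random.Random] = None) -> str:
    """Draw one token after temperature scaling and nucleus filtering."""
    rng = rng or random.Random()
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

Under these definitions, lowering top_p shrinks the candidate set toward the single most likely token, while lowering temperature concentrates probability mass without removing candidates.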

Feature Flags

Feature   Default   Description
numpy     no        numpy interop for embed_numpy(), embed_batch_numpy(), forward_logits_numpy()

License

Apache-2.0 — COOLJAPAN OU (Team Kitasan)
