Python bindings for OxiLLaMa — Pure Rust LLM inference engine

oxillama-py

Python bindings for OxiLLaMa — high-performance LLM inference from Python.

Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.

What It Provides

  • EngineConfig — configuration dataclass for thread count, context size, tokenizer path, and sampler defaults
  • Engine — load a GGUF model and generate text; releases the GIL during inference
  • AsyncEngine — async/await interface; streams tokens to Python coroutines without blocking the event loop
  • SamplerConfig — all ten sampler knobs with greedy() and mirostat_v2() static constructors
  • SpeculativeConfig / SpeculativeEngine — draft + target model pair for faster generation
  • Lora — load a LoRA adapter and hot-swap it onto an Engine
  • Tokenizer — first-class tokenizer object with encode, decode, encode_batch, apply_chat_template
  • CancellationToken — cooperative cancellation handle accepted by generate() and generate_streaming()
  • Structured exception hierarchy rooted at OxiLlamaError: LoadError, GenerateError, TokenizerError, GrammarError, QuantError, KvCacheFullError
  • Full Python type annotations (.pyi stubs) and docstrings
  • Wheels built with maturin (ABI3, Python 3.8+)
  • Optional numpy interop (embed_numpy(), embed_batch_numpy(), forward_logits_numpy()) via numpy feature
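SamplerConfig's ten knobs are not listed here, so the sketch below only illustrates the two most common ones (temperature and top_p, as passed to generate() in the Usage example) plus the greedy special case that SamplerConfig.greedy() presumably corresponds to. It is the textbook temperature-plus-nucleus (top-p) procedure in plain Python, not OxiLLaMa's implementation, and the function name sample_top_p is hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_top_p(logits: List[float], temperature: float = 0.8,
                 top_p: float = 0.95, rng: Optional[random.Random] = None) -> int:
    """Draw a token id using temperature scaling and nucleus (top-p) truncation.

    A temperature of 0 (or below) is treated as greedy decoding (argmax).
    """
    if temperature <= 0.0:
        # Greedy: return the index of the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-likely tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample from the kept tokens, renormalised by their total mass.
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r < acc:
            return i
    return kept[-1]  # guard against floating-point round-off
```

Lower temperatures sharpen the distribution before truncation, and smaller top_p values discard more of the low-probability tail; together they trade diversity for predictability.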

Installation

pip install maturin
maturin develop --release          # build and install into the active virtualenv
# or
maturin build --release            # build a wheel into target/wheels/
pip install target/wheels/oxillama*.whl
# or install a prebuilt wheel from PyPI
pip install oxillama

Usage

import oxillama_py as ox

# Load model
engine = ox.Engine("llama-3.2-3b.Q4_K_M.gguf")

# Basic generation (GIL is released during the Rust inference call)
output = engine.generate(
    prompt="Tell me about the Rust programming language.",
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.95,
)
print(output)

# Streaming generation with a callback
engine.generate_streaming(
    "Explain quantum computing.",
    max_tokens=256,
    callback=lambda tok: print(tok, end="", flush=True),
)

# Async engine (non-blocking, event-loop friendly)
import asyncio

async def run():
    aengine = ox.AsyncEngine("llama-3.2-3b.Q4_K_M.gguf")
    result = await aengine.generate("Hello async world", max_new_tokens=64)
    print(result)

asyncio.run(run())

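# Cooperative cancellation: generate_streaming() blocks its caller (with the GIL
# released), so run it on a worker thread and cancel from the main thread
import threading
import time

token = ox.CancellationToken()
worker = threading.Thread(
    target=engine.generate_streaming,
    args=("Tell me a story",),
    kwargs={"max_tokens": 1024, "callback": print, "cancel_token": token},
)
worker.start()
time.sleep(2.0)   # let some tokens stream first
token.cancel()    # request a stop from another thread
worker.join()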

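# Speculative decoding: a small draft model proposes tokens that a larger target
# model verifies; the speed-up on large models depends on how often drafts are accepted
draft  = ox.Engine("llama-3.2-1b.Q4_K_M.gguf")
target = ox.Engine("llama-3.1-8b.Q4_K_M.gguf")   # shares the Llama 3 tokenizer with the 1B draft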
spec   = ox.SpeculativeEngine(draft=draft, target=target, gamma=4)
output = spec.generate("Once upon a time", max_new_tokens=512)
print(output)

# LoRA adapter
lora   = ox.Lora.load("my-adapter.gguf")
engine.apply_lora(lora)
output = engine.generate("Write a haiku.", max_new_tokens=64)
engine.remove_lora()

# Tokenizer
tokenizer = ox.Tokenizer.from_file("tokenizer.json")
ids = tokenizer.encode("Hello, world!")
text = tokenizer.decode(ids)

# HuggingFace Hub loader
engine = ox.Engine.from_hub("meta-llama/Llama-3.2-3B-GGUF")
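The README does not say how SpeculativeEngine verifies the draft, so the sketch below shows the standard greedy variant of speculative decoding as an assumption. Plain functions stand in for the two models (draft and target are hypothetical callables mapping a token prefix to that model's most likely next token), and gamma presumably plays the same role as the gamma=4 argument in the Usage example: the number of tokens drafted per round. Draft and target must share a tokenizer and vocabulary, which is why a small and a large model from the same family are paired.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(draft: NextToken, target: NextToken, prompt: List[int],
                       n: int, gamma: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (new_tokens, verification_rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        # 1. The cheap draft model proposes `gamma` tokens autoregressively.
        proposal: List[int] = []
        for _ in range(gamma):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position. A real engine scores all
        #    of them in one batched forward pass; here that counts as one round.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first disagreement: keep the target's token
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted: the same pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n], rounds
```

Because every appended token is either a draft token the target agreed with or the target's own choice, the output is identical to greedy decoding with the target alone. The gain comes from verifying up to gamma positions per target pass (gamma + 1 tokens per round when the draft is always right), so it shrinks as the acceptance rate drops. Sampling-based engines replace the exact-match check with a rejection-sampling acceptance rule.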

Feature Flags

  • numpy (off by default): NumPy interop for embed_numpy(), embed_batch_numpy(), and forward_logits_numpy(); enable it at build time, e.g. maturin develop --release --features numpy

License

Apache-2.0 — COOLJAPAN OU (Team Kitasan)

Download files

Source Distribution

oxillama-0.1.1.tar.gz (698.4 kB)

Uploaded Source

Built Distributions

oxillama-0.1.1-cp38-abi3-win_amd64.whl (2.0 MB)

Uploaded CPython 3.8+, Windows x86-64

oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ x86-64

oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.0 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ ARM64

oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl (1.9 MB)

Uploaded CPython 3.8+, macOS 11.0+ ARM64

oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl (2.0 MB)

Uploaded CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file oxillama-0.1.1.tar.gz.

File metadata

  • Download URL: oxillama-0.1.1.tar.gz
  • Upload date:
  • Size: 698.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1.tar.gz
Algorithm Hash digest
SHA256 29a6b1a67d65b6715584ea144dc5dd68f6981ea93089ff062c0336d621fff46c
MD5 65eac0de56e3dd3909a6045d672928a2
BLAKE2b-256 030d5660c8e4992da229deecaaf2eb4e130c9f46c4d3222d776bc09a599a704e
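To check a download against these digests, the file can be hashed locally with Python's standard hashlib module. A minimal sketch; the helper names sha256_of and verify are hypothetical, and SDIST_SHA256 is the tar.gz SHA256 copied from the hash table above.

```python
import hashlib

# SHA256 of oxillama-0.1.1.tar.gz, copied from the hash table above.
SDIST_SHA256 = "29a6b1a67d65b6715584ea144dc5dd68f6981ea93089ff062c0336d621fff46c"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """True if the file's SHA256 matches `expected` (hex, case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Example: verify("oxillama-0.1.1.tar.gz", SDIST_SHA256) is True for an intact download.
```

pip can enforce the same check at install time through its hash-checking mode: list the requirement as `oxillama==0.1.1 --hash=sha256:<digest>` in a requirements file (one `--hash` per sdist or wheel you may receive) and run `pip install --require-hashes -r requirements.txt`.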

Provenance

The following attestation bundles were made for oxillama-0.1.1.tar.gz:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oxillama-0.1.1-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: oxillama-0.1.1-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 2.0 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 bbc0d7b7a94e4da5504c7e5e2a2aa1ce3c1301ea9b287534a97ef6669a210db2
MD5 0488c3aa1ebd15f6892581cca912c872
BLAKE2b-256 941a8242dd497303b63b273b896d6433a85c8e9a503170dd892a9747b0a99618

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-win_amd64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

File details

Details for the file oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3c79d091ce75c0c1842e327ecc4ea79d684e6f67f452c3151556ff1f35c7f4d1
MD5 2819b294d67a6bdf61dd5677aed4f4a6
BLAKE2b-256 0c07ccca4cdcee6ca825cf46d3f7be54b0161d2c5be083af660a7a0ef2dcfdaa

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

File details

Details for the file oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 81a698e0dff05d19b0f8e257c3c3ccaf76d490b7aaa92fe21336f34c21781df5
MD5 c6ce4fc16ce4d9d3d78e08fc939703ee
BLAKE2b-256 ae28c784a14915c6af5d09a20113919f55db7ec2e30e3464a7c5103647e53fb0

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

File details

Details for the file oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c79c67f43989a3048ac20c4690493d017b6852eab4f3c287964d11c76c386f15
MD5 7f194d1be364fb0cac41468b0d0fd280
BLAKE2b-256 653cbce55819c03264fbdbb1284049ec17ec18e3b3e7711b3e6c37814835e7ce

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

File details

Details for the file oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 297ab2a2975ab2846a9d7138cc371ae460bc839dbd74417084028450dcb87738
MD5 1b715a05996899c171dd2501c56ecc4a
BLAKE2b-256 a1df5ccb99b4a2e675a70a1517437820cbffa873ecf56dbbcadc05f566a07a68

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama
