Python bindings for OxiLLaMa — Pure Rust LLM inference engine

These details have not been verified by PyPI

Project description

oxillama-py

Python bindings for OxiLLaMa — high-performance LLM inference from Python.

Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.

What It Provides

EngineConfig — configuration dataclass for thread count, context size, tokenizer path, and sampler defaults
Engine — load a GGUF model and generate text; releases the GIL during inference
AsyncEngine — async/await interface; streams tokens to Python coroutines without blocking the event loop
SamplerConfig — all ten sampler knobs with greedy() and mirostat_v2() static constructors
SpeculativeConfig / SpeculativeEngine — draft + target model pair for faster generation
Lora — load a LoRA adapter and hot-swap it onto an Engine
Tokenizer — first-class tokenizer object with encode, decode, encode_batch, apply_chat_template
CancellationToken — cooperative cancellation handle accepted by generate() and generate_streaming()
Structured exception hierarchy: OxiLlamaError → LoadError, GenerateError, TokenizerError, GrammarError, QuantError, KvCacheFullError
Full Python type annotations (.pyi stubs) and docstrings
Wheels built with maturin (ABI3, Python 3.8+)
Optional numpy interop (embed_numpy(), embed_batch_numpy(), forward_logits_numpy()) via the `numpy` feature flag
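Because every error derives from OxiLlamaError, a single except clause can serve as a catch-all while more specific handlers run first. The sketch below illustrates that pattern with stand-in classes defined locally. They only mirror the names listed above; the real classes come from oxillama_py, and their constructors are not documented here. The helper `classify` is hypothetical.

```python
# Stand-in classes mirroring the hierarchy named above.
# Assumption: local placeholders for illustration, NOT imports from oxillama_py.
class OxiLlamaError(Exception):
    """Base class: catching this catches every error below."""

class LoadError(OxiLlamaError): ...
class GenerateError(OxiLlamaError): ...
class KvCacheFullError(OxiLlamaError): ...


def classify(action):
    """Run `action` and report which handler caught its error (hypothetical helper)."""
    try:
        action()
    except KvCacheFullError:
        # Specific handler first, e.g. a place to retry with a shorter prompt.
        return "kv-cache-full"
    except OxiLlamaError as exc:
        # Catch-all for any other error in the hierarchy.
        return f"other:{type(exc).__name__}"
    return "ok"


def raise_kv():
    raise KvCacheFullError("context window exhausted")

def raise_load():
    raise LoadError("model file not found")

results = [classify(raise_kv), classify(raise_load), classify(lambda: None)]
print(results)
```

Ordering matters: if the OxiLlamaError clause came first, the KvCacheFullError handler would never run.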

Installation

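pip install oxillama               # prebuilt wheel from PyPI, where one exists for your platform

# or build from a source checkout
pip install maturin
maturin develop --release          # in-place development install
# or
maturin build --release            # build a wheel
pip install target/wheels/oxillama_py-*.whl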

Usage

import oxillama_py as ox

# Load model
engine = ox.Engine("llama-3.2-3b.Q4_K_M.gguf")

# Basic generation (GIL is released during the Rust inference call)
output = engine.generate(
    prompt="Tell me about the Rust programming language.",
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.95,
)
print(output)

# Streaming generation with a callback
engine.generate_streaming(
    "Explain quantum computing.",
    max_tokens=256,
    callback=lambda tok: print(tok, end="", flush=True),
)

# Async engine (non-blocking, event-loop friendly)
import asyncio

async def run():
    aengine = ox.AsyncEngine("llama-3.2-3b.Q4_K_M.gguf")
    result = await aengine.generate("Hello async world", max_new_tokens=64)
    print(result)

asyncio.run(run())

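# Cooperative cancellation: generate_streaming() blocks, so cancel from another thread
import threading

token = ox.CancellationToken()
threading.Timer(2.0, token.cancel).start()  # request a stop after about 2 seconds
engine.generate_streaming("Tell me a story", max_tokens=1024,
                          callback=print, cancel_token=token)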

# Speculative decoding: up to 3-8x faster on large models; the gain depends on
# how often the target model accepts the draft model's tokens
draft  = ox.Engine("llama-3.2-1b.Q4_K_M.gguf")
target = ox.Engine("llama-3.1-8b.Q4_K_M.gguf")
spec   = ox.SpeculativeEngine(draft=draft, target=target, gamma=4)
output = spec.generate("Once upon a time", max_new_tokens=512)
print(output)

# LoRA adapter
lora   = ox.Lora.load("my-adapter.gguf")
engine.apply_lora(lora)
output = engine.generate("Write a haiku.", max_new_tokens=64)
engine.remove_lora()

# Tokenizer
tokenizer = ox.Tokenizer.from_file("tokenizer.json")
ids = tokenizer.encode("Hello, world!")
text = tokenizer.decode(ids)

# HuggingFace Hub loader
engine = ox.Engine.from_hub("meta-llama/Llama-3.2-3B-GGUF")
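To make the draft/target pairing and the `gamma` parameter more concrete, here is a toy sketch of greedy speculative decoding. It does not use oxillama_py: the "models" are plain Python functions mapping a token list to the next token id, and all names (`speculative_generate`, `draft_model`, `target_model`, `greedy`) are hypothetical. The acceptance rule SpeculativeEngine actually uses is not described in this README; exact-match greedy verification is an assumption made for illustration.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # toy "model": context -> next token id


def speculative_generate(draft: NextToken, target: NextToken,
                         prompt: List[int], max_new_tokens: int,
                         gamma: int = 4) -> List[int]:
    """Greedy speculative decoding over toy next-token functions."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # 1. The draft model proposes `gamma` tokens autoregressively.
        proposal: List[int] = []
        for _ in range(gamma):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position. A real engine would score
        #    all positions in one batched pass; here they are checked one by one.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)   # fix the first mismatch, drop the rest
                break
            accepted.append(tok)            # draft token accepted as-is
        else:
            # Every draft token matched: the target contributes one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:limit]


def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10

def draft_model(ctx: List[int]) -> int:
    # Agrees with the target except after token 4, to force a rejection.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

def greedy(model: NextToken, prompt: List[int], n: int) -> List[int]:
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out

spec = speculative_generate(draft_model, target_model, [1], 12)
plain = greedy(target_model, [1], 12)
print(spec == plain)
```

With this acceptance rule the output is identical to decoding with the target alone; the draft only changes how many target steps can be verified together.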

Feature Flags

Feature	Default	Description
`numpy`	no	numpy interop for `embed_numpy()`, `embed_batch_numpy()`, `forward_logits_numpy()`

License

Apache-2.0 — COOLJAPAN OU (Team Kitasan)

Project details

Release history

Version	Released
0.1.3	May 5, 2026
0.1.2	Apr 25, 2026
0.1.1 (this version)	Apr 24, 2026
0.1.0	Apr 15, 2026

Download files

Download the file for your platform.

Source Distribution

oxillama-0.1.1.tar.gz (698.4 kB)
Uploaded Apr 24, 2026; Source

Built Distributions

oxillama-0.1.1-cp38-abi3-win_amd64.whl (2.0 MB)
Uploaded Apr 24, 2026; CPython 3.8+, Windows x86-64

oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Uploaded Apr 24, 2026; CPython 3.8+, manylinux: glibc 2.17+ x86-64

oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.0 MB)
Uploaded Apr 24, 2026; CPython 3.8+, manylinux: glibc 2.17+ ARM64

oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl (1.9 MB)
Uploaded Apr 24, 2026; CPython 3.8+, macOS 11.0+ ARM64

oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl (2.0 MB)
Uploaded Apr 24, 2026; CPython 3.8+, macOS 10.12+ x86-64
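The wheel names above follow the standard layout {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl, where each tag field may hold several dot-separated alternatives. The sketch below splits a name into those fields. `WheelTags` and `parse_wheel_name` are hypothetical helpers written for this example, and the optional build-tag field is deliberately not handled.

```python
from typing import NamedTuple, Tuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python: Tuple[str, ...]
    abi: Tuple[str, ...]
    platform: Tuple[str, ...]


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name into its fields (no build-tag support)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    dist, version, py, abi, plat = parts
    # Tag fields can list several alternatives separated by dots;
    # the version field is left untouched.
    return WheelTags(dist, version, tuple(py.split(".")),
                     tuple(abi.split(".")), tuple(plat.split(".")))


info = parse_wheel_name(
    "oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl")
print(info.python, info.abi, info.platform)
```

Here `cp38` with `abi3` marks a stable-ABI build usable on CPython 3.8 and later, which is why one wheel per platform is enough. For production use, `packaging.utils.parse_wheel_filename` is the more robust choice.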

File details

Details for the file oxillama-0.1.1.tar.gz.

File metadata

Download URL: oxillama-0.1.1.tar.gz
Upload date: Apr 24, 2026
Size: 698.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`29a6b1a67d65b6715584ea144dc5dd68f6981ea93089ff062c0336d621fff46c`
MD5	`65eac0de56e3dd3909a6045d672928a2`
BLAKE2b-256	`030d5660c8e4992da229deecaaf2eb4e130c9f46c4d3222d776bc09a599a704e`

See more details on using hashes here.
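A downloaded file can be checked against the SHA256 digest in the table using only the standard library. The sketch below is one way to do it; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local path is an assumed download location.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a hex digest copied from the table."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 published above for oxillama-0.1.1.tar.gz.
EXPECTED = "29a6b1a67d65b6715584ea144dc5dd68f6981ea93089ff062c0336d621fff46c"
sdist = Path("oxillama-0.1.1.tar.gz")  # assumption: file saved in the current directory
if sdist.exists():
    print("hash ok" if matches_published_hash(sdist, EXPECTED) else "HASH MISMATCH")
```

The same digest appears as the subject digest in the provenance statement, so one comparison covers both. pip can also enforce hashes at install time through `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.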

Provenance

The following attestation bundles were made for oxillama-0.1.1.tar.gz:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oxillama-0.1.1.tar.gz
- Subject digest: 29a6b1a67d65b6715584ea144dc5dd68f6981ea93089ff062c0336d621fff46c
- Sigstore transparency entry: 1368382684
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cool-japan/oxillama@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Branch / Tag: refs/heads/0.1.2
- Owner: https://github.com/cool-japan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Trigger Event: workflow_dispatch

File details

Details for the file oxillama-0.1.1-cp38-abi3-win_amd64.whl.

File metadata

Download URL: oxillama-0.1.1-cp38-abi3-win_amd64.whl
Upload date: Apr 24, 2026
Size: 2.0 MB
Tags: CPython 3.8+, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`bbc0d7b7a94e4da5504c7e5e2a2aa1ce3c1301ea9b287534a97ef6669a210db2`
MD5	`0488c3aa1ebd15f6892581cca912c872`
BLAKE2b-256	`941a8242dd497303b63b273b896d6433a85c8e9a503170dd892a9747b0a99618`

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-win_amd64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oxillama-0.1.1-cp38-abi3-win_amd64.whl
- Subject digest: bbc0d7b7a94e4da5504c7e5e2a2aa1ce3c1301ea9b287534a97ef6669a210db2
- Sigstore transparency entry: 1368382705
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cool-japan/oxillama@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Branch / Tag: refs/heads/0.1.2
- Owner: https://github.com/cool-japan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Trigger Event: workflow_dispatch

File details

Details for the file oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Apr 24, 2026
Size: 2.1 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`3c79d091ce75c0c1842e327ecc4ea79d684e6f67f452c3151556ff1f35c7f4d1`
MD5	`2819b294d67a6bdf61dd5677aed4f4a6`
BLAKE2b-256	`0c07ccca4cdcee6ca825cf46d3f7be54b0161d2c5be083af660a7a0ef2dcfdaa`

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oxillama-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 3c79d091ce75c0c1842e327ecc4ea79d684e6f67f452c3151556ff1f35c7f4d1
- Sigstore transparency entry: 1368382737
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cool-japan/oxillama@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Branch / Tag: refs/heads/0.1.2
- Owner: https://github.com/cool-japan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Trigger Event: workflow_dispatch

File details

Details for the file oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Apr 24, 2026
Size: 2.0 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`81a698e0dff05d19b0f8e257c3c3ccaf76d490b7aaa92fe21336f34c21781df5`
MD5	`c6ce4fc16ce4d9d3d78e08fc939703ee`
BLAKE2b-256	`ae28c784a14915c6af5d09a20113919f55db7ec2e30e3464a7c5103647e53fb0`

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oxillama-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 81a698e0dff05d19b0f8e257c3c3ccaf76d490b7aaa92fe21336f34c21781df5
- Sigstore transparency entry: 1368382780
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cool-japan/oxillama@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Branch / Tag: refs/heads/0.1.2
- Owner: https://github.com/cool-japan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Trigger Event: workflow_dispatch

File details

Details for the file oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl
Upload date: Apr 24, 2026
Size: 1.9 MB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`c79c67f43989a3048ac20c4690493d017b6852eab4f3c287964d11c76c386f15`
MD5	`7f194d1be364fb0cac41468b0d0fd280`
BLAKE2b-256	`653cbce55819c03264fbdbb1284049ec17ec18e3b3e7711b3e6c37814835e7ce`

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oxillama-0.1.1-cp38-abi3-macosx_11_0_arm64.whl
- Subject digest: c79c67f43989a3048ac20c4690493d017b6852eab4f3c287964d11c76c386f15
- Sigstore transparency entry: 1368382720
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cool-japan/oxillama@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Branch / Tag: refs/heads/0.1.2
- Owner: https://github.com/cool-japan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Trigger Event: workflow_dispatch

File details

Details for the file oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl
Upload date: Apr 24, 2026
Size: 2.0 MB
Tags: CPython 3.8+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`297ab2a2975ab2846a9d7138cc371ae460bc839dbd74417084028450dcb87738`
MD5	`1b715a05996899c171dd2501c56ecc4a`
BLAKE2b-256	`a1df5ccb99b4a2e675a70a1517437820cbffa873ecf56dbbcadc05f566a07a68`

Provenance

The following attestation bundles were made for oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl:

Publisher: pypi-publish.yml on cool-japan/oxillama

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: oxillama-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl
- Subject digest: 297ab2a2975ab2846a9d7138cc371ae460bc839dbd74417084028450dcb87738
- Sigstore transparency entry: 1368382758
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: cool-japan/oxillama@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Branch / Tag: refs/heads/0.1.2
- Owner: https://github.com/cool-japan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9c7e3549b279426813e6f03fc9fb3874cd4da2c2
- Trigger Event: workflow_dispatch

oxillama 0.1.1
