Batched multi-LoRA inference runtime for PyTorch models.

PolyLoRA

Minimal PyTorch runtime for batched LoRA inference where each row can use a different adapter.

PolyLoRA wraps an existing torch.nn.Module, replaces selected nn.Linear layers, and serves PEFT LoRA adapters from CPU, GPU, and optional disk caches.

Install

From a source checkout:

pip install .

With PEFT loading support:

pip install '.[peft]'

Usage

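from polylora import CustomLoraConfig, CustomPeftModel

# base_model is an existing torch.nn.Module. The nn.Linear layers named in
# target_modules are replaced so they can apply per-row LoRA weights.
model = CustomPeftModel(
    base_model,
    CustomLoraConfig(
        max_gpu_adapters=4,  # size of the GPU adapter cache
        max_rank=16,         # largest adapter rank that can be served
        target_modules=["query_proj", "key_proj", "value_proj", "dense"],
    ),
).eval()

# Register each PEFT adapter directory under an adapter id.
model.load_adapter_from_disk("legal", "./adapters/legal")
model.load_adapter_from_disk("finance", "./adapters/finance")

# One adapter id per batch row: row 0 uses "legal", row 1 uses "finance".
outputs = model(**batch, adapter_ids=["legal", "finance"])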

Omit adapter_ids to run the base model. Use __base__ for rows that should skip LoRA inside a mixed batch.
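As an illustration of the mixed-batch convention, the sketch below builds a per-row adapter_ids list and maps rows without an adapter to __base__. The helper build_adapter_ids and its arguments are hypothetical and not part of PolyLoRA; only the "__base__" id comes from the text above.

```python
BASE_ADAPTER_ID = "__base__"  # reserved id described above


def build_adapter_ids(requested, loaded):
    """Return one adapter id per batch row (hypothetical helper).

    requested: list with an adapter id per row, or None for "no adapter".
    loaded: set of adapter ids that have already been loaded.
    """
    ids = []
    for row, adapter_id in enumerate(requested):
        if adapter_id is None:
            # Rows without an adapter still need an entry in a mixed batch.
            ids.append(BASE_ADAPTER_ID)
        elif adapter_id in loaded:
            ids.append(adapter_id)
        else:
            raise KeyError(f"row {row}: adapter {adapter_id!r} is not loaded")
    return ids


adapter_ids = build_adapter_ids(["legal", None, "finance"], {"legal", "finance"})
```

The resulting list could then be passed as adapter_ids=adapter_ids for a three-row batch.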

Caches

PolyLoRA uses three adapter tiers:

GPU cache: fixed-size adapter slots for the active batch. Slot 0 is reserved for __base__, so non-adapter rows share the same execution path.
CPU cache: LRU store for loaded adapter weights. GPU evictions can reload from CPU without touching disk.
Disk cache: optional bounded PEFT adapter directory cache. CPU misses can reload adapters from this cold layer.

This makes small hot sets fast while still allowing a larger adapter catalog than GPU memory can hold.
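The tiering described above can be pictured with a toy lookup. This is a minimal sketch under stated assumptions, not PolyLoRA's implementation: the class TieredAdapterCache, its method names, and the choice of LRU eviction for GPU slots are all hypothetical, and plain Python objects stand in for adapter weights and adapter directories.

```python
from collections import OrderedDict


class TieredAdapterCache:
    """Toy GPU -> CPU -> disk lookup (hypothetical, for illustration only)."""

    def __init__(self, gpu_slots, cpu_capacity, disk):
        self.gpu = OrderedDict()  # adapter id -> slot index, oldest first
        self.free_slots = list(range(1, gpu_slots))  # slot 0 is kept for __base__
        self.cpu = OrderedDict()  # adapter id -> weights, oldest first
        self.cpu_capacity = cpu_capacity
        self.disk = disk  # adapter id -> weights, stands in for adapter directories
        self.disk_reads = 0

    def _ensure_on_cpu(self, adapter_id):
        if adapter_id in self.cpu:
            self.cpu.move_to_end(adapter_id)  # mark as recently used
            return
        self.cpu[adapter_id] = self.disk[adapter_id]  # cold load from disk
        self.disk_reads += 1
        if len(self.cpu) > self.cpu_capacity:
            self.cpu.popitem(last=False)  # drop the least recently used entry

    def slot_for(self, adapter_id):
        if adapter_id == "__base__":
            return 0
        if adapter_id in self.gpu:
            self.gpu.move_to_end(adapter_id)
            return self.gpu[adapter_id]
        self._ensure_on_cpu(adapter_id)  # a real runtime would now copy to the GPU slot
        if self.free_slots:
            slot = self.free_slots.pop(0)
        else:
            _, slot = self.gpu.popitem(last=False)  # evict; the CPU copy is kept
        self.gpu[adapter_id] = slot
        return slot


cache = TieredAdapterCache(gpu_slots=3, cpu_capacity=4, disk={"a": 1, "b": 2, "c": 3})
slots = [cache.slot_for(name) for name in ["__base__", "a", "b", "c", "a"]]
```

In this run the second request for "a" misses the GPU tier but is served from the CPU tier, so the disk read count stays at three. The sketch ignores the case where a single batch needs more adapters than there are slots.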

Kernels

On CUDA, PolyLoRA uses Triton SGMV (segmented gather matrix-vector multiplication) kernels for the LoRA A and B projections:

Mixed batches can contain different adapter ids, including __base__ rows.
Different adapters may use different ranks, up to max_rank.
Rank-0 rows skip adapter work, which is how base-only rows and missing layer weights are represented.
The B projection fuses scaling and add-back into the base linear output.
The implementation falls back to a PyTorch reference path on CPU or when Triton is disabled.
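To make the per-row behavior concrete, here is a pure-Python sketch of what a reference path might compute for one wrapped layer: each row adds scaling * B(A x) to the base linear output, and rank-0 rows are passed through untouched. The function names and the data layout (A as r x in, B as out x r, one scaling factor per adapter) are assumptions for illustration, not PolyLoRA's actual code or kernel interface.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lora_reference(x_rows, base_out_rows, adapter_ids, adapters):
    """Apply per-row LoRA deltas on top of precomputed base outputs.

    adapters: adapter id -> (A, B, scaling), with A of shape (r, in) and
    B of shape (out, r). An empty A means rank 0 for this layer.
    """
    outputs = []
    for x, base_out, adapter_id in zip(x_rows, base_out_rows, adapter_ids):
        if adapter_id == "__base__":
            outputs.append(list(base_out))  # base-only row: no adapter work
            continue
        A, B, scaling = adapters[adapter_id]
        if len(A) == 0:
            outputs.append(list(base_out))  # adapter has no weights for this layer
            continue
        hidden = matvec(A, x)      # A projection: in -> r
        delta = matvec(B, hidden)  # B projection: r -> out
        # Scaling and add-back onto the base linear output.
        outputs.append([b + scaling * d for b, d in zip(base_out, delta)])
    return outputs


adapters = {
    "legal": ([[1.0, 1.0]], [[2.0], [0.0]], 0.5),  # rank 1
    "finance": ([], [], 0.0),                      # rank 0 for this layer
}
result = lora_reference(
    x_rows=[[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],
    base_out_rows=[[10.0, 20.0], [10.0, 20.0], [10.0, 20.0]],
    adapter_ids=["legal", "__base__", "finance"],
    adapters=adapters,
)
```

Only the first row changes here; the other two rows keep the base output. A batched kernel would group rows by adapter rather than loop row by row, but the arithmetic per row is the same.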

Adapter Layouts

Adapters do not need to cover every wrapped layer. If a model is wrapped with a larger target_modules set and an adapter only contains LoRA weights for some of those layers, missing layers are treated as rank-0 no-ops for that adapter.

PolyLoRA rejects adapters with weights outside the configured module set, which keeps mixed adapters predictable when different adapters target different subsets of the model.
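The two layout rules can be sketched as a small validation step. The function plan_adapter_layout and its inputs are hypothetical names used for illustration; they are not taken from PolyLoRA's API.

```python
def plan_adapter_layout(adapter_ranks, wrapped_modules):
    """Return a LoRA rank for every wrapped module (hypothetical helper).

    adapter_ranks: module name -> rank found in the adapter.
    wrapped_modules: names of the layers that were wrapped.
    """
    extra = sorted(set(adapter_ranks) - set(wrapped_modules))
    if extra:
        # Weights outside the configured module set are rejected.
        raise ValueError(f"adapter has weights for unwrapped modules: {extra}")
    # Wrapped layers the adapter does not cover become rank-0 no-ops.
    return {name: adapter_ranks.get(name, 0) for name in wrapped_modules}


wrapped = ["query_proj", "key_proj", "value_proj", "dense"]
layout = plan_adapter_layout({"query_proj": 8, "value_proj": 8}, wrapped)
```

Here the adapter covers only two of the four wrapped layers, so key_proj and dense get rank 0.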

Notes

Supports standard PEFT LoRA adapters for inference.
Does not support LoRA dropout, DoRA, rsLoRA (rank-stabilized LoRA), or LoRA bias.
Attention masks must be right-padded when enforce_right_padding=True.
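For readers unsure what right padding means for an attention mask, the check below is one way to express it: in every row, all 1s come before all 0s. The function is_right_padded is a hypothetical helper written for this note and operates on plain lists of 0/1 values rather than tensors.

```python
def is_right_padded(attention_mask):
    """Return True if no real token (1) follows a padding position (0) in any row."""
    for row in attention_mask:
        seen_pad = False
        for value in row:
            if value == 0:
                seen_pad = True
            elif seen_pad:
                return False  # a token after padding means left or mixed padding
    return True


right = is_right_padded([[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]])
left = is_right_padded([[0, 1, 1, 1]])
```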

Development

pip install -e '.[dev]'
pytest tests

Release history

0.1.1 (Jun 15, 2026)
0.1.0 (Jun 1, 2026), this version

Download files

Source Distribution

polylora-0.1.0.tar.gz (18.1 kB)

Uploaded Jun 1, 2026 Source

Built Distribution

polylora-0.1.0-py3-none-any.whl (16.1 kB)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file polylora-0.1.0.tar.gz.

File metadata

Download URL: polylora-0.1.0.tar.gz
Upload date: Jun 1, 2026
Size: 18.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for polylora-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1f8f0a764575e93640b29c63f3d6e8734cad9aa63bbeadf455e2e3ec73f60327`
MD5	`d185d65287219165712f6c5e0f2dfa8c`
BLAKE2b-256	`5e525b8c5bbb7958912d9e5391605bb2eb9b36cd15ef5ade30081b28f5ec7c9d`

File details

Details for the file polylora-0.1.0-py3-none-any.whl.

File metadata

Download URL: polylora-0.1.0-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 16.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for polylora-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`05711141745ae2075c9935c00a5c66a43c98256200169bee0d8cd9c2f8b25834`
MD5	`a81cc453d1fafd3bf7a81f78539e2417`
BLAKE2b-256	`a7fdf306e8d85504c5a8dbdd6ce700b1663b66e3b7a9101221923481d4b8fe90`
