vlora-dev

Shared low-rank subspaces for efficient LoRA adapter management

These details have not been verified by PyPI

Project links

Project description

vLoRA

Shared low-rank subspaces for efficient LoRA adapter management.

Based on the Share paper: LoRA adapters across tasks share a common low-rank subspace. Instead of storing N separate adapters, maintain one shared basis and per-task coefficient vectors — achieving up to 122× compression at scale.

Install

pip install vlora-dev

Or from source:

git clone https://github.com/tveseli/vlora.git
cd vlora
pip install -e ".[dev]"

Quickstart

from vlora import SharedSubspace, load_adapter

# Step 1: Build shared subspace from existing adapters
adapters = [load_adapter(f"adapters/task_{i}") for i in range(5)]
subspace = SharedSubspace.from_adapters(adapters, num_components=16)

# Step 2: Project a new adapter (only stores small loadings vector)
new_adapter = load_adapter("adapters/new_task")
projection = subspace.project(new_adapter, task_id="new_task")
subspace.add_task(projection)

# Step 3: Absorb — recompute basis to include new adapter
subspace.absorb(load_adapter("adapters/another_task"), new_task_id="another")

# Reconstruct any task back to full LoRA weights
weights = subspace.reconstruct("new_task")

# Save / load
subspace.save("shared_subspace/")
subspace = SharedSubspace.load("shared_subspace/")

CLI

vlora ships with 9 commands for common workflows:

# Build a shared subspace from adapter directories
vlora compress adapters/task_0 adapters/task_1 adapters/task_2 -o shared_subspace/

# Inspect a subspace (--json for machine-readable output)
vlora info shared_subspace/

# Export a task back to PEFT format (vLLM/TGI compatible)
vlora export shared_subspace/ task_0 -o exported_adapter/ \
  --alpha 32 --base-model meta-llama/Llama-3-8B --target-modules q_proj,v_proj

# Add a new adapter to an existing subspace
vlora add shared_subspace/ adapters/new_task --task-id new_task --incremental

# Analyze adapter similarity and clustering
vlora analyze adapters/task_0 adapters/task_1 adapters/task_2

# Merge adapters using task arithmetic, TIES, or DARE
vlora merge adapters/task_0 adapters/task_1 adapters/task_2 \
  -o merged/ --method ties --density 0.5

# Health check a subspace (NaN, orthonormality, loadings consistency)
vlora validate shared_subspace/

# Compare two tasks within a subspace
vlora diff shared_subspace/ task_0 task_1

# Benchmark subspace operations
vlora benchmark shared_subspace/

Multi-Task Inference

Wrap any PyTorch model with VLoRAModel for on-the-fly adapter switching:

from vlora import VLoRAModel, SharedSubspace

subspace = SharedSubspace.load("shared_subspace/")
model = VLoRAModel(base_model, subspace, lora_alpha=32)  # or scaling=alpha/rank

# Switch adapters instantly — reconstructed from compressed loadings
model.set_task("task_0")
output = model(input_ids)

model.set_task("task_1")  # cached if same task
output = model(input_ids)

print(model.available_tasks)  # ["task_0", "task_1", ...]

Training in the Subspace

Train only the loadings vector (k params per layer) instead of full LoRA matrices — 100×+ parameter reduction:

from vlora import SharedSubspace, orthogonal_init, SubspaceTrainer

subspace = SharedSubspace.load("shared_subspace/")
orthogonal_init(subspace, "new_task")  # initialize near-zero

trainer = SubspaceTrainer(subspace, "new_task", lr=1e-3)
print(f"Trainable params: {trainer.num_trainable_params}")  # e.g. 192 vs 200K

for batch in dataloader:
    loss = compute_loss(model, batch)
    trainer.step(loss)

trainer.write_back()  # persist learned loadings
subspace.save("updated_subspace/")

Task Router

Automatically blend adapters per input using a lightweight router:

from vlora import TaskRouter, SharedSubspace

subspace = SharedSubspace.load("shared_subspace/")
router = TaskRouter.from_subspace(subspace, input_dim=4096)

# Router produces soft blend weights over tasks
x = get_input_embedding(batch)  # (B, 4096)
blended = router.blend_loadings(x, subspace)
subspace.tasks["__routed__"] = blended
recon = subspace.reconstruct("__routed__")

Adapter Analysis

Analyze relationships between adapters before compression:

from vlora import load_adapter, compute_similarity_matrix, find_clusters, adapter_diff

adapters = [load_adapter(f"adapters/task_{i}") for i in range(10)]

# Pairwise cosine similarity
sim_matrix = compute_similarity_matrix(adapters)

# Find redundant adapter groups
clusters = find_clusters(sim_matrix, threshold=0.9)

# Per-layer comparison of two adapters
diff = adapter_diff(adapters[0], adapters[1])

Adapter Merging

Merge multiple adapters into one using state-of-the-art techniques:

from vlora import load_adapter, task_arithmetic, ties_merge, dare_merge

adapters = [load_adapter(f"adapters/task_{i}") for i in range(3)]

# Simple weighted average
merged = task_arithmetic(adapters, weights=[0.5, 0.3, 0.2])

# TIES: trim small values, elect sign by majority, average (reduces interference)
merged = ties_merge(adapters, density=0.5)

# DARE: randomly drop & rescale before averaging (sparsification regularizer)
merged = dare_merge(adapters, drop_rate=0.5, seed=42)

Advanced Compression

# Adaptive k: different components per layer based on explained variance
subspace = SharedSubspace.from_adapters(adapters, adaptive_k=True, variance_threshold=0.9)

# Quantize components for smaller memory footprint
subspace.quantize(bits=8)  # or bits=4

# Check compression stats
stats = subspace.compression_stats()
print(f"Compression ratio: {stats['compression_ratio']:.1f}×")
print(f"Compressed: {stats['total_params_compressed']:,} params")
print(f"Original:   {stats['total_params_original']:,} params")

Incremental Updates

Scale to thousands of adapters without loading them all at once:

# Streaming: load adapters one at a time from disk
subspace = SharedSubspace.from_adapters_streaming(
    adapter_paths, num_components=8
)

# Incremental absorb: fast O(1) update without full SVD recompute
subspace.absorb_incremental(new_adapter, "new_task")

# Move to GPU / change precision
subspace.to(device="cuda", dtype=torch.float16)

The 3-Step Algorithm

Step	Method	What happens
1. Initialize	`SharedSubspace.from_adapters()`	SVD on stacked weight matrices → shared basis
2. Project	`subspace.project()`	New adapter → small loadings vector
3. Absorb	`subspace.absorb()`	Incorporate new adapter, recompute basis

API Reference

Core

SharedSubspace — Central state container. Holds per-layer basis and per-task loadings.
- .from_adapters(adapters, ...) — Build from existing adapters
- .from_adapters_streaming(paths, ...) — Build one adapter at a time from disk
- .project(adapter, task_id) → TaskProjection
- .add_task(projection) — Register a projected task
- .reconstruct(task_id) → LoRAWeights
- .absorb(adapter, task_id) — Incorporate + recompute (full SVD)
- .absorb_incremental(adapter, task_id) — Fast incremental update
- .get_trainable_params(task_id) — For training integration
- .quantize(bits=8) — Quantize components (int8/int4)
- .compression_stats() — Compression ratio and parameter counts
- .to(device, dtype) — Move tensors to device/dtype
- .save(path) / .load(path) — Serialization

Model Integration

VLoRAModel(base_model, subspace, lora_alpha=None) — Inference wrapper with forward hooks
- .set_task(task_id) — Switch adapter (cached)
- .clear_task() — Remove adapter
- .available_tasks — List task IDs
- .reconstruct_state_dict(task_id) — Get delta weight dict
- .compile() — torch.compile the base model for faster inference

Training

orthogonal_init(subspace, task_id) — Initialize new task with small loadings
SubspaceTrainer(subspace, task_id) — Optimizer wrapper for loadings-only training
- .step(loss) — Backprop + update
- .write_back() — Persist to subspace

Router

TaskRouter(input_dim, num_tasks) — Lightweight adapter routing MLP
- .from_subspace(subspace, input_dim) — Auto-create from subspace
- .blend_loadings(x, subspace) — Per-input adapter blending

Merging

task_arithmetic(adapters, weights=None) — Weighted average merge
ties_merge(adapters, density=0.5, weights=None) — Trim + elect sign + merge
dare_merge(adapters, drop_rate=0.5, weights=None, seed=None) — Drop and rescale merge

Analysis

compute_similarity_matrix(adapters) — Pairwise cosine similarity
find_clusters(sim_matrix, threshold) — Greedy clustering
adapter_diff(a, b) — Per-layer L2 distance + cosine similarity
subspace_coverage(subspace, adapter) — How well subspace represents an adapter
find_outliers(adapters, threshold) — Detect statistical outlier adapters

I/O

load_adapter(path) — Load PEFT adapter from disk (safetensors)
load_adapter_from_hub(repo_id) — Load from HuggingFace Hub
save_adapter(weights, path) — Save back to PEFT format

Pipeline (convenience)

init_subspace(paths, ...) — Load + build in one call
absorb_task(subspace, path, task_id) — Load + absorb
extract_adapter(subspace, task_id, path) — Reconstruct + save

Math ops

compute_svd, project_onto_subspace, reconstruct_from_subspace
gram_schmidt, explained_variance_ratio, select_num_components
incremental_svd_update

Benchmarks — Real-World Adapters

Tested with 8 Lots-of-LoRAs adapters (Mistral-7B, rank 16, 96 layers each):

Variance explained — the B matrices share structure much more strongly:

k	Variance (A)	Variance (B)
1	0.19	0.43
2	0.37	0.73
4	0.69	0.95
6	1.00	1.00

Reconstruction error (relative L2 norm):

k	Mean Error	Max Error
1	0.826	0.938
4	0.387	0.846
6	0.000002	0.000003

Compression at scale — shared basis is a one-time cost; each new adapter adds only k loadings per layer:

N adapters	Full (MB)	vLoRA (MB)	Ratio
8	288	288	1.0×
100	3,600	289	12.5×
1,000	36,000	293	122.8×

Run the benchmark yourself:

pip install vlora-dev[hub]
python examples/real_adapters.py

HuggingFace Trainer Integration

Train in the subspace directly with HuggingFace Trainer:

from vlora import SharedSubspace, orthogonal_init
from vlora.integrations.huggingface import VLoRACallback

subspace = SharedSubspace.load("shared_subspace/")
orthogonal_init(subspace, "new_task")

callback = VLoRACallback(subspace, "new_task", lr=1e-3)
trainer = Trainer(model=base_model, args=args, callbacks=[callback])
trainer.train()
subspace.save("updated_subspace/")

Documentation

Quickstart notebook — try vlora in Google Colab
Migration from PEFT — integrate into existing workflow
vLLM guide — serve with vLLM
TGI guide — serve with TGI
Ollama guide — local inference via GGUF

Dependencies

torch >= 2.0
safetensors >= 0.4
click >= 8.0
huggingface-hub >= 0.20 (optional, pip install vlora-dev[hub])
transformers >= 4.38 (optional, pip install vlora-dev[hf])

Citation

@article{share2025,
  title={Share: Shared Low-Rank Subspaces for Efficient LoRA Adapter Management},
  year={2025},
  eprint={2602.06043},
  archivePrefix={arXiv},
}

License

Apache 2.0

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.0

Apr 11, 2026

0.3.0

Apr 11, 2026

This version

0.2.1

Feb 10, 2026

0.2.0

Feb 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vlora_dev-0.2.1.tar.gz (473.0 kB view details)

Uploaded Feb 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

vlora_dev-0.2.1-py3-none-any.whl (41.1 kB view details)

Uploaded Feb 10, 2026 Python 3

File details

Details for the file vlora_dev-0.2.1.tar.gz.

File metadata

Download URL: vlora_dev-0.2.1.tar.gz
Upload date: Feb 10, 2026
Size: 473.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for vlora_dev-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`d838e8c21f4605eacc29275e1a2078901085498c74a3c6ba688bd0c5f82f3af0`
MD5	`12c29ade97dc99c8f5b76b1b5bf3123d`
BLAKE2b-256	`2cd8f5da3905b3213eaeb66f999699763f933407b5f7b37885a2a0b936d39cf6`

See more details on using hashes here.

File details

Details for the file vlora_dev-0.2.1-py3-none-any.whl.

File metadata

Download URL: vlora_dev-0.2.1-py3-none-any.whl
Upload date: Feb 10, 2026
Size: 41.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for vlora_dev-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c24e0042c144ca055066e63c6dc24d4e3154f24d9396841ac1616e1687ab3fb1`
MD5	`649e4c605580c1d75730b388704d9b93`
BLAKE2b-256	`4e1d8b7bef73b839e73fffb2cf12eced10f94a6131948c0ff396f842aa2ba00f`

See more details on using hashes here.

vlora-dev 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Install

Quickstart

CLI

Multi-Task Inference

Training in the Subspace

Task Router

Adapter Analysis

Adapter Merging

Advanced Compression

Incremental Updates

The 3-Step Algorithm

API Reference

Core

Model Integration

Training

Router

Merging

Analysis

I/O

Pipeline (convenience)

Math ops

Benchmarks — Real-World Adapters

HuggingFace Trainer Integration

Documentation

Dependencies

Citation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes