Production-grade PyTorch model compression: pruning, quantization, knowledge distillation, and LoRA/QLoRA fine-tuning.

peftml

Parameter-Efficient Fine-Tuning & Machine Learning Compression Toolkit

Prune, quantise, distil, and fine-tune deep learning models — CNNs, transformers, and LLMs — through a single, unified API.

Features

Technique	Use case	Key classes
Pruning (global / unstructured / structured)	Remove redundant weights from any architecture	`DynamicPruner`, `IterativePruningScheduler`
Quantization-Aware Training (LSQ + PACT)	Train INT8/INT4 models that deploy with little accuracy loss	`QConv2d`, `QLinear`, `PACTReLU`
Post-Training Quantization (SmoothQuant)	Quantise LLMs without retraining	`ActivationObserver`, `apply_smoothquant`
Sparse QAT	Simultaneous pruning + quantization for edge deployment	`SparseQATPipeline`
Knowledge Distillation	Train small students from large teachers (classification, segmentation, detection, LM)	`DynamicKDTrainer`
LoRA / QLoRA	Parameter-efficient LLM fine-tuning	`LoRALinear`, `QLoRAOrchestrator`
ONNX Export	Ship compressed models to production runtimes	`export_onnx`

Installation

pip install peftml

# With dev tools
pip install "peftml[dev]"

# With ONNX export support
pip install "peftml[onnx]"

Requirements: Python ≥ 3.9, PyTorch ≥ 2.0

Quick start

Every technique is accessible through the ModelCompressor facade:

from peftml import ModelCompressor

1. Pruning a CNN

import torch
import torchvision.models as models

model = models.resnet50(weights="DEFAULT")
comp = ModelCompressor(model)

# Global L1 pruning — remove 40% of weights network-wide
pruner = comp.prune(method="global", amount=0.4, ignore_layers=["fc"])
print(pruner.compute_sparsity())

# Commit before saving
pruner.commit()
torch.save(model.state_dict(), "resnet50_pruned.pt")
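
For readers who want to see what global L1 pruning does under the hood, here is a minimal sketch built on PyTorch's own `torch.nn.utils.prune` module. It is not peftml's implementation: the small stand-in network, the `sparsity` helper, and the assumption that `commit()` corresponds to `prune.remove` are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Small stand-in network (the example above uses ResNet-50).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),
)

# Prune only the conv weights; the final Linear layer is left out,
# in the same spirit as ignore_layers=["fc"].
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Conv2d)]

# Rank all selected weights together by |w| and zero the smallest 40%.
prune.global_unstructured(
    to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.4,
)


def sparsity(pairs):
    """Fraction of exactly-zero entries across the given parameters."""
    zeros = sum(int((getattr(m, name) == 0).sum()) for m, name in pairs)
    total = sum(getattr(m, name).numel() for m, name in pairs)
    return zeros / total


global_sparsity = sparsity(to_prune)

# Make the masks permanent so the state dict holds plain `weight` tensors
# (assumed here to be roughly what a "commit" step does).
for module, name in to_prune:
    prune.remove(module, name)

state_keys = list(model.state_dict().keys())
```

Because the threshold is global, individual layers can end up with different sparsity levels even though the network-wide figure is 40%.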

2. Quantization-Aware Training (CNN)

model = models.mobilenet_v2(weights="DEFAULT")
comp = ModelCompressor(model)

# Replace Conv2d → QConv2d (LSQ) and ReLU → PACTReLU
model = comp.quantize_for_qat(bits=8)

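# Train normally: gradients pass through the rounding step via a straight-through estimator
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
for batch in train_loader:  # train_loader: your own DataLoader
    images, labels = batch
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()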
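
To make "quantization is differentiable" concrete, the sketch below shows a bare-bones fake quantizer with a learned step size, in the style of LSQ. It is an illustration only: `FakeQuantLSQ` and `round_ste` are hypothetical names, the initial step size is arbitrary, and the gradient scaling used by the full LSQ method is omitted for brevity.

```python
import torch
import torch.nn as nn


def round_ste(x: torch.Tensor) -> torch.Tensor:
    """Round in the forward pass; act as identity in the backward pass."""
    return (x.round() - x).detach() + x


class FakeQuantLSQ(nn.Module):
    """Signed uniform fake quantizer with a trainable step size."""

    def __init__(self, bits: int = 8, init_step: float = 0.05):
        super().__init__()
        self.qn = -(2 ** (bits - 1))      # lowest integer level, e.g. -128
        self.qp = 2 ** (bits - 1) - 1     # highest integer level, e.g. 127
        self.step = nn.Parameter(torch.tensor(float(init_step)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale to the integer grid, clip, round, then scale back.
        q = torch.clamp(x / self.step, self.qn, self.qp)
        return round_ste(q) * self.step


torch.manual_seed(0)
fq = FakeQuantLSQ(bits=8)
w = torch.randn(64, requires_grad=True)

w_q = fq(w)
w_q.sum().backward()  # gradients reach both the weights and the step size

# Integer levels actually used by the quantized tensor.
levels = torch.unique(torch.round(w_q.detach() / fq.step.detach()))
```

The output stays in floating point but only takes values on the integer grid times `step`, which is what lets an ordinary optimizer train through it.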

3. Sparse QAT (Pruning + Quantization for Edge)

comp = ModelCompressor(model)
pipe = comp.sparse_qat(
    task_type="classification",
    bits=8,
    target_sparsity=0.5,
    pruning_steps=30,
)

for epoch in range(30):
    for batch in train_loader:
        loss = pipe.train_step(batch, optimizer, criterion, device="cuda")
    pipe.step_epoch()

final_model = pipe.export()
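
The README does not say how sparsity ramps up over the `pruning_steps`. One common choice for gradual pruning is a cubic schedule, sketched below as an assumption rather than as a description of `SparseQATPipeline`; `cubic_sparsity` is a hypothetical helper.

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Sparsity target at `step`, ramping from initial to final.

    Sparsity rises quickly at first, while the network still has plenty of
    redundancy, and flattens out as it approaches the final target.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# One target per epoch boundary for target_sparsity=0.5, pruning_steps=30.
schedule = [cubic_sparsity(s, 30, 0.5) for s in range(31)]
```

With this shape, most of the pruning happens in the early epochs, which leaves the later epochs for the quantized weights to recover accuracy.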

4. Knowledge Distillation

teacher = models.resnet50(weights="DEFAULT")
student = models.mobilenet_v3_small(weights="DEFAULT")

comp = ModelCompressor(student)
trainer = comp.distill(
    teacher=teacher,
    task_type="classification",
    temperature=4.0,
    alpha=0.5,
)

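# Training loop (the optimizer must track the student's parameters)
optimizer = torch.optim.SGD(student.parameters(), lr=1e-3)
for batch in train_loader:
    images, labels = batch
    optimizer.zero_grad()
    loss, student_out = trainer(images, labels, criterion)
    loss.backward()
    optimizer.step()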

trainer.teardown()  # clean up hooks
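
For reference, the classic Hinton-style distillation loss that `temperature` and `alpha` usually parameterise can be sketched as follows. This is not `DynamicKDTrainer`'s code: `kd_loss` is a hypothetical function, and treating `alpha` as the weight on the soft (teacher) term is an assumption.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """alpha * soft-target KL term + (1 - alpha) * hard-label cross-entropy."""
    t = temperature
    # KL(teacher || student) on temperature-softened distributions.
    # Multiplying by t^2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


torch.manual_seed(0)
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

loss = kd_loss(student_logits, teacher_logits, labels)
loss.backward()

# When the student already matches the teacher, only the hard-label term remains.
matched = kd_loss(teacher_logits, teacher_logits, labels, alpha=0.5)
hard_only = 0.5 * F.cross_entropy(teacher_logits, labels)
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of the wrong classes.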

5. QLoRA Fine-Tuning (LLMs)

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

comp = ModelCompressor(model)
model = comp.apply_qlora(
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    r=16,
    lora_alpha=32,
)

# Only a small fraction of params is trainable
# (about 0.25% for r=16 on these four projections of a 7B model)
# Train with your favourite loop / HF Trainer / etc.

# For deployment: merge adapters into base weights
comp.merge_lora()
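
The idea behind the adapter and the merge step can be sketched in a few lines. `LoRALinearSketch` below is a hypothetical stand-in, not peftml's `LoRALinear`; it uses a full-precision base layer, whereas a real QLoRA base is stored in 4-bit and would have to be dequantized before merging.

```python
import torch
import torch.nn as nn


class LoRALinearSketch(nn.Module):
    """Frozen linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 16, lora_alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # the base weights stay frozen
        self.scaling = lora_alpha / r
        # A starts small and random, B starts at zero, so the adapter
        # contributes nothing until training updates B.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + update * self.scaling

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold the low-rank update into the base weight and return it."""
        self.base.weight += (self.lora_b @ self.lora_a) * self.scaling
        return self.base


torch.manual_seed(0)
layer = LoRALinearSketch(nn.Linear(64, 64), r=16, lora_alpha=32)
with torch.no_grad():
    layer.lora_b.normal_(std=0.01)         # pretend some training happened

x = torch.randn(4, 64)
before = layer(x).detach()                 # adapter applied on the fly
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)

merged = layer.merge()                     # plain nn.Linear, no adapter cost
after = merged(x).detach()
```

Each adapted layer adds `r * (in_features + out_features)` trainable parameters, so a 4096×4096 projection with `r=16` adds 131,072.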

6. Post-Training Quantization (SmoothQuant)

model = comp.apply_smoothquant(
    dataloader=calibration_loader,
    alpha=0.5,
    calibration_batches=8,
)
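
The role of `alpha` is easier to see in a small numeric sketch of the SmoothQuant scaling rule: per input channel, activations are divided by a scale and the matching weight column is multiplied by it, so the layer output is unchanged while activation outliers shrink. The `smooth_scales` helper and the toy tensors are illustrative assumptions, not peftml's `apply_smoothquant`.

```python
import torch


def smooth_scales(act_absmax, weight, alpha=0.5, eps=1e-5):
    """Per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    act_absmax: (in_features,) activation maxima from calibration data.
    weight:     (out_features, in_features) weight of the linear layer.
    """
    w_absmax = weight.abs().amax(dim=0).clamp(min=eps)
    return act_absmax.clamp(min=eps) ** alpha / w_absmax ** (1.0 - alpha)


torch.manual_seed(0)
x = torch.randn(32, 16)
x[:, 3] *= 50.0                       # one outlier channel, as often seen in LLMs
w = torch.randn(8, 16)

act_absmax = x.abs().amax(dim=0)      # would come from the calibration batches
s = smooth_scales(act_absmax, w, alpha=0.5)

x_s = x / s                           # smoother activations
w_s = w * s                           # the scale is folded into the weights

y_ref = x @ w.T
y_s = x_s @ w_s.T                     # same output up to float rounding
```

With `alpha=0.5` the quantization difficulty is split evenly: after smoothing, each channel's activation maximum equals its weight maximum. Larger values of `alpha` move more of the difficulty onto the weights.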

7. ONNX Export

from peftml import export_onnx

export_onnx(
    model,
    dummy_input=torch.randn(1, 3, 224, 224),
    path="model.onnx",
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

Advanced: Direct API Access

For fine-grained control, use the submodules directly:

from peftml.quantization import replace_with_lsq, replace_with_pact
from peftml.pruning import DynamicPruner, IterativePruningScheduler
from peftml.distillation import DynamicKDTrainer, hinton_kd_loss
from peftml.lora import LoRALinear, QLoRAOrchestrator
from peftml.core import LoRAConfig, KDConfig, FeatureMapping

Configuration

All components accept dataclass configs for reproducibility:

from peftml import LoRAConfig, QLoRAConfig, KDConfig, SparseQATConfig, FeatureMapping, TaskType

lora_cfg = LoRAConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

kd_cfg = KDConfig(
    task_type=TaskType.SEGMENTATION,
    temperature=4.0,
    alpha=0.3,
    beta=50.0,
    feature_mappings=[
        FeatureMapping("backbone.layer3", "backbone.layer3", 256, 1024),
    ],
)
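
One reason dataclass configs help reproducibility is that they serialise cleanly alongside a checkpoint. The sketch below uses a hypothetical `LoRAConfigSketch` with the same three fields as the example above; whether peftml's own config classes are frozen or JSON-friendly is not stated in this README.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class LoRAConfigSketch:
    """Hypothetical stand-in for a LoRA config dataclass."""

    r: int = 16
    lora_alpha: int = 32
    target_modules: List[str] = field(default_factory=lambda: ["q_proj", "v_proj"])


cfg = LoRAConfigSketch(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

# Store the exact hyperparameters next to the checkpoint ...
payload = json.dumps(asdict(cfg), sort_keys=True)

# ... and rebuild an identical config later.
restored = LoRAConfigSketch(**json.loads(payload))
```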

Project Structure

peftml/
├── core/           # Configs, registry, utilities
├── quantization/   # LSQ, PACT, SmoothQuant, observers
├── pruning/        # DynamicPruner, iterative schedulers
├── distillation/   # KD losses, adapters, trainer
├── lora/           # LoRA layers, QLoRA orchestrator
├── pipelines/      # SparseQAT, ModelCompressor facade
└── export/         # ONNX export

Testing

pip install "peftml[dev]"
pytest tests/ -v

License

Apache 2.0

This version

2.0.0

Apr 19, 2026

Download files

Source Distribution

peftml-2.0.0.tar.gz (29.7 MB view details)

Uploaded Apr 19, 2026 Source

Built Distribution

peftml-2.0.0-py3-none-any.whl (37.8 kB view details)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file peftml-2.0.0.tar.gz.

File metadata

Download URL: peftml-2.0.0.tar.gz
Upload date: Apr 19, 2026
Size: 29.7 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for peftml-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`c45268a13df195c105fc5fcfae196e0f9d8a052305c31cc2bf6cf6206881cba1`
MD5	`e0e3950ef7f272e401b82d3dcc15b11b`
BLAKE2b-256	`b249347486e30f9d4dfd2fdac1974f0dcb8d62323a0696864e2cfc361099a105`
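
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard library. This is a minimal sketch: `sha256_of` and `matches_expected` are hypothetical helper names, and the local file path is whatever your download produced.

```python
import hashlib

# SHA256 digest listed above for peftml-2.0.0.tar.gz.
EXPECTED_SHA256 = "c45268a13df195c105fc5fcfae196e0f9d8a052305c31cc2bf6cf6206881cba1"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large archives fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected one (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Usage, once the archive is on disk:
#   matches_expected("peftml-2.0.0.tar.gz")
```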

Provenance

The following attestation bundles were made for peftml-2.0.0.tar.gz:

Publisher: publish.yml on Tanupvats/peftml_v2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: peftml-2.0.0.tar.gz
- Subject digest: c45268a13df195c105fc5fcfae196e0f9d8a052305c31cc2bf6cf6206881cba1
- Sigstore transparency entry: 1340605817
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: Tanupvats/peftml_v2@cb17167a87fc91a51c0f9f05fe581c5e0e965294
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/Tanupvats
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cb17167a87fc91a51c0f9f05fe581c5e0e965294
- Trigger Event: release

File details

Details for the file peftml-2.0.0-py3-none-any.whl.

File metadata

Download URL: peftml-2.0.0-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 37.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for peftml-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc93f1bcc4b5a4db35248204069aa9099e7dc6a86b97b12c28108522703622d6`
MD5	`4685db925ab8321df3a959d932417ef5`
BLAKE2b-256	`7e46199a468010c007a98a7488f5171320b78cbc59b821246fee8673e6bf416a`

Provenance

The following attestation bundles were made for peftml-2.0.0-py3-none-any.whl:

Publisher: publish.yml on Tanupvats/peftml_v2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: peftml-2.0.0-py3-none-any.whl
- Subject digest: bc93f1bcc4b5a4db35248204069aa9099e7dc6a86b97b12c28108522703622d6
- Sigstore transparency entry: 1340605821
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: Tanupvats/peftml_v2@cb17167a87fc91a51c0f9f05fe581c5e0e965294
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/Tanupvats
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cb17167a87fc91a51c0f9f05fe581c5e0e965294
- Trigger Event: release
