ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

ShadowPEFT is a parameter-efficient fine-tuning (PEFT) framework that augments a frozen large base model with a lightweight, centralized, and pretrainable Shadow network. The shadow network runs in parallel with the base model, injecting learned corrections into each decoder layer, which enables effective adaptation with a small fraction of the parameters. Once jointly trained with the base model, the shadow network can be detached and deployed independently, which makes it well suited to edge-computing scenarios.

How It Works

ShadowPEFT Framework

Input
  │
  ├──► Shadow Model (small, trainable) ──► shadow_hidden_states
  │
  └──► Base Model (frozen, large)
         │
         layer_0 ──────────────────────────────────────────────────► hidden_0
         layer_1 ◄── ShadowInjection(hidden_0, shadow[0]) ─────────► hidden_1
         layer_2 ◄── ShadowInjection(hidden_1, shadow[1]) ─────────► ...
         ...        [ShadowUpdate updates shadow state each step]

Three trainable components control the adaptation:

  • Shadow Model — a small copy of the base architecture with fewer/smaller layers
  • ShadowInjectionModel — projects the difference between base and shadow hidden states back into the base at each layer
  • ShadowUpdateModel — uses a gated update to evolve the shadow hidden states as the base model processes each layer
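The roles of the injection and update modules can be sketched in a few lines of PyTorch. This is an illustrative toy, not the library's actual implementation; the bottleneck adapter shape, the tanh/sigmoid activations, and the gating form are assumptions based on the config names used throughout this README:

```python
import torch
import torch.nn as nn

class ToyShadowInjection(nn.Module):
    """Bottleneck adapter that projects the (base - shadow) difference
    back into the base hidden states, scaled by alpha (sketch only)."""
    def __init__(self, hidden_size: int, bottleneck: int = 16, alpha: float = 0.1):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.alpha = alpha

    def forward(self, base_hidden, shadow_hidden):
        delta = self.up(torch.tanh(self.down(base_hidden - shadow_hidden)))
        return base_hidden + self.alpha * delta  # hidden' = hidden + alpha * delta

class ToyShadowUpdate(nn.Module):
    """Gated update that evolves the shadow state as the base model
    advances one layer (gating form is an assumption)."""
    def __init__(self, hidden_size: int, gate_hidden: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * hidden_size, gate_hidden),
            nn.Tanh(),
            nn.Linear(gate_hidden, hidden_size),
            nn.Sigmoid(),
        )

    def forward(self, shadow_hidden, base_hidden):
        g = self.gate(torch.cat([shadow_hidden, base_hidden], dim=-1))
        return g * base_hidden + (1 - g) * shadow_hidden
```

Both modules preserve the hidden-state shape, so they can be interleaved with the frozen base layers without touching the base weights.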

Supported Models

ShadowPEFT works with most Hugging Face decoder-only transformer models whose decoder layer stack is reachable via one of the following attribute paths:

Attribute path               Example architectures
model.model.layers           LLaMA, Mistral, Qwen, Gemma
model.transformer.h          GPT-2-style
model.model.decoder.layers   Some nested decoder layouts
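The lookup can be sketched as below. `find_decoder_layers` is a hypothetical helper written for this README, not part of the shadow_peft API; it simply probes the three attribute paths in order (relative to the model object itself):

```python
def find_decoder_layers(model):
    """Return the decoder layer stack of a Hugging Face model by probing
    the attribute paths ShadowPEFT supports (sketch of the lookup logic,
    not the library's actual helper)."""
    for path in ("model.layers", "transformer.h", "model.decoder.layers"):
        obj = model
        try:
            for attr in path.split("."):
                obj = getattr(obj, attr)
            return obj
        except AttributeError:
            continue
    raise ValueError(f"No supported decoder layer stack on {type(model).__name__}")
```

If your architecture stores its layers elsewhere, a check like this is a quick way to see whether it will work before wrapping it.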

Installation

uv pip install shadow-peft

or

git clone https://github.com/ShadowLLM/shadow-peft.git
cd shadow-peft
uv pip install -e .
# Optional: dev/test dependencies
uv pip install -e ".[dev]"

Requirements: Python ≥ 3.10, PyTorch ≥ 2.1, Transformers ≥ 5.0


Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from shadow_peft import get_shadow_model, ShadowConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Wrap the base model with a Shadow adapter (1-layer implicit shadow)
model = get_shadow_model(model, ShadowConfig(num_shadow_layers=1))
model.print_trainable_parameters()
# Trainable params: ~18M  ||  Total params: ~770M  ||  Trainable%: ~2.30%

# Only shadow-related parameters are trainable; base model is frozen.

Examples

See the examples/ folder for interactive playground notebooks covering common ShadowPEFT workflows.


Usage

1. Implicit Shadow Model

The simplest way to use ShadowPEFT. A shadow model is automatically constructed from the same architecture as the base model, with fewer layers and optionally reduced MLP/attention sizes.

from transformers import AutoModelForCausalLM
from shadow_peft import get_shadow_model, ShadowConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

shadow_config = ShadowConfig(
    num_shadow_layers=1,          # number of layers in the implicit shadow model
    injection_hidden_size=16,     # bottleneck dim for injection adapter
    gate_hidden_size=8,           # hidden dim for the update gate
    alpha=0.1,                    # scale factor for the injection delta
    dropout=0.1,

    # Optional: override implicit shadow model dimensions
    shadow_intermediate_size=None,       # MLP intermediate size (None = same as base)
    shadow_num_attention_heads=None,     # attention heads (None = same as base)
    shadow_num_key_value_heads=None,     # KV heads (None = same as base)
    shadow_head_dim=None,                # head dimension (None = same as base)
)

model = get_shadow_model(model, shadow_config)
model.print_trainable_parameters()

2. Explicit Shadow Model [Recommended]

Use a separately pre-trained shadow model — for example, a smaller model that has been pre-trained to align with a larger base model's hidden space via AutoModelForCausalLMWithHiddenProjection.

When the shadow model's hidden size differs from the base model's hidden size, ShadowPEFT automatically inserts a shadow_hidden_projection linear layer to bridge the gap.
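As a concrete sketch of that bridge, assuming hidden sizes of 1024 for Qwen3-0.6B and 4096 for Qwen3-8B (the layer name matches the README; the shapes are illustrative):

```python
import torch
import torch.nn as nn

# Assumed hidden sizes: 1024 (Qwen3-0.6B shadow) -> 4096 (Qwen3-8B base).
shadow_hidden_projection = nn.Linear(1024, 4096, bias=False)

shadow_hidden = torch.randn(1, 8, 1024)           # [batch, seq, shadow_hidden]
bridged = shadow_hidden_projection(shadow_hidden)  # [batch, seq, base_hidden]
print(bridged.shape)  # torch.Size([1, 8, 4096])
```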

from transformers import AutoModelForCausalLM
from shadow_peft import get_shadow_model, ShadowConfig, AutoModelForCausalLMWithHiddenProjection

# Large base model (frozen)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

# Pre-trained shadow model aligned to the 8B hidden space
shadow_model = AutoModelForCausalLMWithHiddenProjection.from_pretrained(
    "shadow-llm/Qwen3-0.6B-H8B"
)

shadow_config = ShadowConfig(
    injection_hidden_size=16,
    gate_hidden_size=8,
    alpha=0.1,
    dropout=0.1,
)

model = get_shadow_model(model, shadow_config, shadow_model=shadow_model)
model.print_trainable_parameters()

Tip: When shadow_model carries a shadow_hidden_projection Linear layer (as produced by AutoModelForCausalLMWithHiddenProjection), ShadowPEFT reuses its trained weights instead of randomly initializing the projection.

3. ShadowForCausalLM — generation & training

ShadowForCausalLM is a task wrapper that adds a language modeling head to the Shadow setup. It supports two inference modes:

Mode                      logits               shadow_logits
"base_shadow" (default)   Base model output    Shadow path output
"shadow_only"             Shadow path output   Shadow path output

from transformers import AutoModelForCausalLM, AutoTokenizer
from shadow_peft import ShadowConfig, ShadowForCausalLM, get_shadow_model, AutoModelForCausalLMWithHiddenProjection

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Pre-trained shadow model aligned to the 8B hidden space
shadow_model = AutoModelForCausalLMWithHiddenProjection.from_pretrained(
    "shadow-llm/Qwen3-0.6B-H8B"
)

shadow_config = ShadowConfig(
    injection_hidden_size=16,
    gate_hidden_size=8,
    alpha=0.1,
    dropout=0.1,
)

peft = get_shadow_model(base, shadow_config, shadow_model=shadow_model)
model = ShadowForCausalLM(peft, inference_mode="base_shadow")

inputs = tokenizer("Hello", return_tensors="pt")

# base_shadow: returns both base logits and shadow logits
out = model(**inputs)
print(out.logits.shape)         # [1, seq_len, vocab]
print(out.shadow_logits.shape)  # [1, seq_len, vocab]

# Switch to shadow-only inference (lightweight, no base model forward pass)
model.set_inference_mode("shadow_only")
out = model(**inputs)
print(out.logits.shape)         # shadow logits only

Training with labels:

When labels are provided, ShadowForCausalLM computes a combined loss:

loss = base_CE_loss + shadow_loss_weight * shadow_CE_loss

model = ShadowForCausalLM(peft, shadow_loss_weight=0.05)

inputs = tokenizer("Hello world", return_tensors="pt")
labels = inputs["input_ids"].clone()

out = model(**inputs, labels=labels)
print(out.loss)  # combined loss for backprop
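The combined loss can be sketched as follows. This uses the standard causal-LM one-token shift; the exact shift and reduction details inside ShadowForCausalLM are assumptions here:

```python
import torch
import torch.nn.functional as F

def combined_loss(base_logits, shadow_logits, labels, shadow_loss_weight=0.05):
    """Sketch of base_CE_loss + shadow_loss_weight * shadow_CE_loss."""
    def ce(logits):
        # Predict token t+1 from position t (standard causal-LM shift).
        shift_logits = logits[:, :-1, :].contiguous()
        shift_labels = labels[:, 1:].contiguous()
        return F.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
        )
    return ce(base_logits) + shadow_loss_weight * ce(shadow_logits)
```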

Text generation:

KV cache is disabled inside Shadow; always pass use_cache=False:

gen_ids = model.generate(**inputs, use_cache=False, max_new_tokens=32)
print(tokenizer.decode(gen_ids[0], skip_special_tokens=True))

Loading from a saved checkpoint:

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
shadow_model = AutoModelForCausalLMWithHiddenProjection.from_pretrained(
    "shadow-llm/Qwen3-0.6B-H8B"
)
model = ShadowForCausalLM.from_pretrained(
    base,
    "/path/to/shadow_checkpoint",
    is_trainable=False,
    inference_mode="base_shadow",
    shadow_model=shadow_model,  # explicitly set shadow model
)

4. ShadowForSequenceClassification

Drop-in equivalent of ShadowForCausalLM for classification tasks.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from shadow_peft import ShadowConfig, ShadowForSequenceClassification, get_shadow_model

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-0.6B",
    num_labels=2,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

peft = get_shadow_model(base, ShadowConfig(num_shadow_layers=1))
model = ShadowForSequenceClassification(peft, inference_mode="base_shadow")

inputs = tokenizer("This movie was great!", return_tensors="pt")

out = model(**inputs)
print(out.logits)         # base classifier logits [1, 2]
print(out.shadow_logits)  # shadow classifier logits [1, 2]

# Switch to shadow-only (no base forward pass)
model.set_inference_mode("shadow_only")
out = model(**inputs)
print(out.logits)  # shadow logits only

By default, both classifier_head and shadow_classifier_head are trainable. Use ShadowConfig.modules_to_save to control which heads are saved alongside the adapter:

shadow_config = ShadowConfig(
    num_shadow_layers=1,
    modules_to_save=["classifier_head", "shadow_classifier_head"],
)

Loading from a saved checkpoint:

base = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
model = ShadowForSequenceClassification.from_pretrained(
    base,
    "/path/to/shadow_checkpoint",
    is_trainable=False,
)

5. AutoModelForCausalLMWithHiddenProjection

A standalone HF-compatible model that wraps a small shadow backbone with:

  • A projection layer mapping shadow hidden size → base hidden size
  • A frozen lm_head from the larger base model

This is the canonical format for distributing pre-trained shadow models that target a larger base model's vocabulary space.

Loading a pre-trained projected shadow model:

from shadow_peft import AutoModelForCausalLMWithHiddenProjection

# Load directly from the Hub (or a local path)
shadow_model = AutoModelForCausalLMWithHiddenProjection.from_pretrained(
    "shadow-llm/Qwen3-0.6B-H8B",
    freeze_backbone=False,      # keep backbone trainable (default)
    freeze_embed_tokens=True,   # freeze input embeddings (default)
    freeze_lm_head=True,        # freeze lm_head (default)
)

Creating from scratch (wrapping existing models) via pseudo-inverse:

import torch.nn as nn
from transformers import AutoModelForCausalLM
from shadow_peft import AutoModelForCausalLMWithHiddenProjection

small = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
large = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

# Wrap: small backbone + projection (1024→4096) + large lm_head
wrapped = AutoModelForCausalLMWithHiddenProjection.wrap(
    shadow_model=small,
    shadow_hidden_projection=nn.Linear(1024, 4096, bias=False),
    lm_head=large.lm_head,
    # Optionally solve for the optimal initial projection via pseudoinverse:
    init_optimal_projection=True,
    reference_lm_head=small.lm_head,
)

wrapped.save_pretrained("/path/to/Qwen3-0.6B-H8B")

When init_optimal_projection=True, the projection is initialized to minimize ‖W_lm_large @ W_proj - W_lm_small‖, providing a better starting point for fine-tuning.
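The pseudoinverse solve can be sketched with toy dimensions (sizes chosen for illustration; the real vocabulary and hidden sizes are far larger, and the library's internal solver may differ):

```python
import torch

# Solve min_W || W_lm_large @ W - W_lm_small ||_F for the initial projection.
vocab, d_large, d_small = 64, 16, 8   # toy sizes standing in for ~150k, 4096, 1024
W_lm_large = torch.randn(vocab, d_large)   # large model's lm_head weight
W_lm_small = torch.randn(vocab, d_small)   # small model's lm_head weight

# Least-squares solution, equivalent to pinv(W_lm_large) @ W_lm_small.
W_proj = torch.linalg.lstsq(W_lm_large, W_lm_small).solution
print(W_proj.shape)  # torch.Size([16, 8]) -- maps shadow hidden -> base hidden
```

With this initialization, feeding a shadow hidden state through the projection and then the large lm_head approximates what the small lm_head would have produced, which is exactly the "better starting point" the README describes.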


Configuration Reference

from shadow_peft import ShadowConfig

ShadowConfig(
    # ── Shadow model architecture ──────────────────────────────────────────
    num_shadow_layers: int = 1,
    #   Number of transformer layers in the implicit shadow model.
    #   Ignored when an explicit shadow_model is provided.

    shadow_intermediate_size: int | None = None,
    #   Override the MLP intermediate size of the implicit shadow model.
    #   None = same as the base model.

    shadow_num_attention_heads: int | None = None,
    #   Override the number of attention heads. None = same as base.

    shadow_num_key_value_heads: int | None = None,
    #   Override the number of KV heads (GQA). None = same as base.

    shadow_head_dim: int | None = None,
    #   Override per-head dimension. None = same as base.

    # ── Adapter hyperparameters ────────────────────────────────────────────
    injection_hidden_size: int = 16,
    #   Bottleneck dimension of the ShadowInjectionModel.
    #   Larger = more expressive injection but more parameters.

    gate_hidden_size: int = 10,
    #   Hidden dimension of the ShadowUpdateModel gate.

    alpha: float = 0.1,
    #   Scale factor applied to the injection delta:
    #     hidden' = hidden + alpha * injection_delta

    dropout: float = 0.2,
    #   Dropout applied inside injection and update adapters.

    # ── Modules to save ────────────────────────────────────────────────────
    modules_to_save: list[str] = [],
    #   Extra modules to make trainable and persist in the checkpoint.
    #   CausalLM options:  ["lm_head", "shadow_lm_head"]
    #   SeqCls options:    ["classifier_head", "shadow_classifier_head"]
)

Saving and Loading

Save a checkpoint

Calling save_pretrained saves only the adapter weights (shadow model + injection/update modules), not the base model:

# Works for ShadowPeftModel, ShadowForCausalLM, and ShadowForSequenceClassification.
# Trainable task heads are saved too when modules_to_save is set.
model.save_pretrained("/path/to/checkpoint")

Saved files:

  • shadow_config.json — adapter configuration
  • shadow_adapter.safetensors — adapter weights (shadow model + injection + update)
  • shadow_modules.safetensors — task-specific heads (if modules_to_save is set)

Load a checkpoint

from transformers import AutoModelForCausalLM
from shadow_peft import ShadowPeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
# implicit or explicit shadow model
shadow_model = None

# Inference (frozen)
model = ShadowPeftModel.from_pretrained(base, "/path/to/checkpoint", is_trainable=False, shadow_model=shadow_model)

# Resume training
model = ShadowPeftModel.from_pretrained(base, "/path/to/checkpoint", is_trainable=True, shadow_model=shadow_model)

Or use the task wrappers directly:

from shadow_peft import ShadowForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
model = ShadowForCausalLM.from_pretrained(base, "/path/to/checkpoint", is_trainable=False, shadow_model=shadow_model)

Push to the Hugging Face Hub

# From ShadowPeftModel or ShadowForCausalLM / ShadowForSequenceClassification
model.push_to_hub(
    "your-org/my-shadow-adapter",
    commit_message="Add ShadowPEFT adapter for Qwen3-0.6B",
    private=True,
    token="hf_...",
)

Load from the Hub

from transformers import AutoModelForCausalLM
from shadow_peft import ShadowPeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
# implicit or explicit shadow model
shadow_model = None

# Supports repo_id or repo_id@revision
model = ShadowPeftModel.from_pretrained(base, "your-org/my-shadow-adapter", shadow_model=shadow_model)

Exporting the Shadow Model

After training, you can extract the shadow backbone as a fully self-contained HF model — useful for independent evaluation or shadow-only inference:

# Export a standalone HF model from the trained adapter
shadow_only = model.peft_model.export_shadow()
shadow_only.save_pretrained("/path/to/exported_shadow")

# Load and use it independently
import shadow_peft
from transformers import AutoModelForCausalLM
standalone = AutoModelForCausalLM.from_pretrained("/path/to/exported_shadow")

When the shadow and base have different hidden sizes, export_shadow returns an AutoModelForCausalLMWithHiddenProjection that bundles the backbone, the trained projection, and the base model's lm_head into a single loadable checkpoint.


Training with HF Trainer

ShadowForCausalLM and ShadowForSequenceClassification are compatible with transformers.Trainer. The adapter's state_dict returns only the trainable adapter weights, so Trainer's safetensors checkpointing works without any patching.

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from shadow_peft import ShadowConfig, ShadowForCausalLM, get_shadow_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
peft = get_shadow_model(base, ShadowConfig(num_shadow_layers=1))
model = ShadowForCausalLM(peft, shadow_loss_weight=0.05)

training_args = TrainingArguments(
    output_dir="./shadow-output",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    # Gradient checkpointing is forwarded to the base model automatically:
    gradient_checkpointing=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=...,
)
trainer.train()

# Save only the adapter
model.save_pretrained("./shadow-checkpoint")

Notes and Limitations

  • KV cache is disabled. Shadow requires full-sequence processing to compute injections at every layer. use_cache=False is enforced automatically in all forward passes and generation calls.
  • Generation requires use_cache=False. Some Transformers versions will still try to slice inputs when cache is active. Always pass it explicitly:
    outputs = model.generate(input_ids, use_cache=False, max_new_tokens=64)
    
  • Base model is always frozen. ShadowPeftModel sets requires_grad=False on all base model parameters during construction. If you need to fine-tune both base and shadow, manage requires_grad manually after wrapping.
  • Minimum 2 decoder layers required. Shadow injection starts at layer 1, so the base model must have at least 2 decoder layers.
  • Embedding sharing. For implicit shadow models, embed_tokens is removed from the shadow backbone and replaced by the base model's embeddings. This saves memory and keeps token representations consistent. Explicit shadow models keep their own embeddings by default; pass remove_embed_tokens=True to prepare_shadow_model to opt in to sharing.
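For the "fine-tune both base and shadow" case mentioned above, manual requires_grad management can be sketched as below. Which attribute holds the frozen base model after wrapping is an assumption; check your wrapped model's structure before using this:

```python
import torch.nn as nn

def unfreeze(module: nn.Module) -> None:
    """Re-enable gradients on every parameter of a (sub)module."""
    for p in module.parameters():
        p.requires_grad = True

# e.g. unfreeze(wrapped_model.base_model)  # attribute name is an assumption
```

Remember that this increases optimizer memory to full fine-tuning levels, which defeats most of the PEFT benefit; it is only worth doing when you explicitly want joint training.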

Contributors

Carbon-based:

Silicon-based:

Credits

ShadowPEFT's API and code structure are heavily inspired by PEFT (Hugging Face). Concepts such as get_shadow_model, ShadowPeftModel.from_pretrained / save_pretrained, and modules_to_save deliberately mirror PEFT's conventions to provide a familiar experience for users already accustomed to LoRA and similar adapters.
