A library for structured pruning and bias visualization of large language models

OptiPFair

A Python library for structured pruning of large language models, with a focus on GLU architectures.

Overview

OptiPFair prunes large language models efficiently while aiming to preserve their performance. It implements structured pruning methods, starting with MLP pruning for GLU architectures (as used in models such as LLaMA and Mistral).

Key features:

  • GLU architecture-aware pruning that preserves model structure
  • Multiple neuron importance calculation methods
  • Support for both pruning percentage and target expansion rate
  • Simple Python API and command-line interface (CLI)
  • Progress tracking and detailed statistics
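
To make "GLU architecture-aware" concrete, here is a minimal pure-Python sketch. It is not OptiPFair's implementation. It assumes the LLaMA-style layout in which an MLP holds gate and up projection matrices of shape (intermediate, hidden) and a down projection matrix of shape (hidden, intermediate); the helper name `prune_glu_neurons` is hypothetical. Removing intermediate neuron i means dropping row i from both the gate and up matrices and column i from the down matrix, so the three matrices stay consistent.

```python
def prune_glu_neurons(gate, up, down, keep):
    """Keep only the intermediate neurons whose indices are in `keep`.

    gate, up: lists of rows, shape (intermediate, hidden)
    down:     list of rows, shape (hidden, intermediate)
    """
    keep = sorted(keep)
    new_gate = [gate[i] for i in keep]                   # drop rows
    new_up = [up[i] for i in keep]                       # drop the same rows
    new_down = [[row[i] for i in keep] for row in down]  # drop matching columns
    return new_gate, new_up, new_down


# Toy layer: hidden size 2, intermediate size 4.
gate = [[1, 2], [3, 4], [5, 6], [7, 8]]
up = [[8, 7], [6, 5], [4, 3], [2, 1]]
down = [[1, 2, 3, 4], [5, 6, 7, 8]]

new_gate, new_up, new_down = prune_glu_neurons(gate, up, down, keep=[0, 2])
print(len(new_gate), len(new_up), len(new_down[0]))  # intermediate size is now 2
```

Pruning the gate and up matrices with different index sets would break the element-wise product inside the GLU, which is why the same `keep` list is applied to all three matrices.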

Installation

# From PyPI
pip install optipfair

# From source
git clone https://github.com/yourusername/optipfair.git
cd optipfair
pip install -e .

Quick Start

Python API

from transformers import AutoModelForCausalLM
from optipfair import prune_model

# Load a pre-trained model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Prune 20% of neurons from MLP layers using MAW method
pruned_model, stats = prune_model(
    model=model,
    pruning_type="MLP_GLU",
    neuron_selection_method="MAW",
    pruning_percentage=20,
    show_progress=True,
    return_stats=True
)

# Print pruning statistics
print(f"Original parameters: {stats['original_parameters']:,}")
print(f"Pruned parameters: {stats['pruned_parameters']:,}")
print(f"Reduction: {stats['reduction']:,} parameters ({stats['percentage_reduction']:.2f}%)")

# Save pruned model
pruned_model.save_pretrained("./pruned-llama-model")

Command-Line Interface

# Prune a model with default settings (10% pruning, MAW method)
optipfair prune --model-path meta-llama/Llama-3.2-1B --output-path ./pruned-model

# Prune with custom settings
optipfair prune \
  --model-path meta-llama/Llama-3.2-1B \
  --pruning-type MLP_GLU \
  --method MAW \
  --pruning-percentage 20 \
  --output-path ./pruned-model

# Use expansion rate instead of pruning percentage
optipfair prune \
  --model-path meta-llama/Llama-3.2-1B \
  --expansion-rate 140 \
  --output-path ./pruned-model

# Analyze a model's architecture
optipfair analyze --model-path meta-llama/Llama-3.2-1B

Neuron Selection Methods

OptiPFair supports three methods for calculating neuron importance:

  1. MAW (Maximum Absolute Weight) - Default method that identifies influential neurons based on the magnitude of their connections. Typically provides the best pruning results.

  2. VOW (Variance of Weights) - Identifies neurons based on the variance of their weights. May be useful for specific architectures.

  3. PON (Product of Norms) - Uses the product of L1 norms to identify important neurons. This method may be applicable in certain contexts.
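
The three criteria can be sketched for a single neuron as follows. This is an illustrative pure-Python reading of the descriptions above, not OptiPFair's code: the function names are hypothetical, each neuron is represented by its row in the gate and up projection matrices, and the way the two rows are combined (a sum for MAW and VOW, a product for PON) is an assumption.

```python
from statistics import pvariance


def maw_score(gate_row, up_row):
    """Maximum Absolute Weight: largest weight magnitude in each row, summed."""
    return max(abs(w) for w in gate_row) + max(abs(w) for w in up_row)


def vow_score(gate_row, up_row):
    """Variance of Weights: population variance of each row, summed."""
    return pvariance(gate_row) + pvariance(up_row)


def pon_score(gate_row, up_row):
    """Product of Norms: L1 norm of the gate row times L1 norm of the up row."""
    return sum(abs(w) for w in gate_row) * sum(abs(w) for w in up_row)


def neurons_to_keep(gate, up, pruning_percentage, score=maw_score):
    """Return the indices of the highest-scoring neurons, in original order."""
    n = len(gate)
    n_keep = n - int(n * pruning_percentage / 100)
    ranked = sorted(range(n), key=lambda i: score(gate[i], up[i]), reverse=True)
    return sorted(ranked[:n_keep])


# Toy layer with four neurons; prune half of them.
gate = [[0.1, -0.2], [2.0, -3.0], [0.5, 0.5], [-1.0, 1.0]]
up = [[0.3, 0.1], [1.0, -1.0], [0.2, -0.2], [0.5, 2.0]]
print(neurons_to_keep(gate, up, pruning_percentage=50))
```

On this toy input all three scores rank neurons 1 and 3 highest; on real weights the methods can disagree, which is why the choice of method matters.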

Documentation

Complete documentation is available at https://yourusername.github.io/optipfair/.

Supported Models

OptiPFair is designed to work with transformer-based language models that use GLU architecture in their MLP layers, including:

  • LLaMA family (LLaMA, LLaMA-2, LLaMA-3)
  • Mistral models
  • And other models with similar GLU architecture
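
A quick way to check whether a model fits this description is to look for the three GLU projection layers on one of its MLP modules. The sketch below is an assumption-laden illustration rather than part of OptiPFair: the helper `is_glu_mlp` is hypothetical, and the attribute names `gate_proj`, `up_proj`, and `down_proj` are the ones used by the Hugging Face LLaMA and Mistral implementations; other GLU models may name them differently.

```python
from types import SimpleNamespace

# Attribute names used by Hugging Face's LLaMA and Mistral MLP modules.
GLU_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")


def is_glu_mlp(module):
    """Return True if `module` exposes all three GLU projection layers."""
    return all(hasattr(module, name) for name in GLU_PROJECTIONS)


# Stand-in objects, so the sketch runs without downloading a model.
glu_style = SimpleNamespace(gate_proj=object(), up_proj=object(), down_proj=object())
plain_style = SimpleNamespace(fc1=object(), fc2=object())

print(is_glu_mlp(glu_style), is_glu_mlp(plain_style))
```

With a loaded LLaMA-style model, the same check could be applied to a real module, for example `is_glu_mlp(model.model.layers[0].mlp)`, assuming the model follows that module layout.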

Expansion Rate vs Pruning Percentage

OptiPFair supports two ways to specify the pruning target:

  1. Pruning Percentage - Directly specify what percentage of neurons to remove (e.g., 20%)

  2. Expansion Rate - Specify the target expansion rate (ratio of intermediate size to hidden size) as a percentage (e.g., 140% instead of the default 400%)

The expansion rate approach is often more intuitive when comparing across different model scales.
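
The two targets are related by simple arithmetic, sketched below. The function names are hypothetical and the rounding OptiPFair applies internally is not specified here, so treat the result as approximate. The example assumes a model with hidden size 2048 and intermediate size 8192, which corresponds to a 400% expansion rate.

```python
def expansion_rate(hidden_size, intermediate_size):
    """Expansion rate as a percentage: intermediate size relative to hidden size."""
    return 100 * intermediate_size / hidden_size


def equivalent_pruning_percentage(hidden_size, intermediate_size, target_rate):
    """Percentage of intermediate neurons to remove to reach `target_rate`."""
    current = expansion_rate(hidden_size, intermediate_size)
    if not 0 < target_rate <= current:
        raise ValueError("target rate must be positive and not exceed the current rate")
    return 100 * (1 - target_rate / current)


hidden, intermediate = 2048, 8192  # assumed sizes for illustration
print(expansion_rate(hidden, intermediate))
print(equivalent_pruning_percentage(hidden, intermediate, target_rate=140))
```

Going from 400% to 140% keeps 140/400 = 35% of the intermediate neurons, so it corresponds to pruning roughly 65% of them.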

Future Roadmap

  • Support for attention layer pruning
  • Whole block pruning
  • Quantization-aware pruning
  • Integrated evaluation benchmarks
  • Iterative pruning capabilities

Citation

If you use OptiPFair in your research, please cite:

@software{optipfair2025,
  author = {Pere Martra},
  title = {OptiPFair: A Library for Structured Pruning of Large Language Models},
  year = {2025},
  url = {https://github.com/yourusername/optipfair}
}

License

Apache 2.0

Contributing

Contributions are welcome! Please check out our contributing guidelines for details.

Download files

Source Distribution

optipfair-0.1.0.tar.gz (21.5 kB)

Uploaded Source

Built Distribution

optipfair-0.1.0-py3-none-any.whl (18.6 kB)

Uploaded Python 3

File details

Details for the file optipfair-0.1.0.tar.gz.

File metadata

  • Download URL: optipfair-0.1.0.tar.gz
  • Upload date:
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.13

File hashes

Hashes for optipfair-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       283d072b7017b1d5fea5540ac8c7c22ed4d87ca6f9b76258b3ccf374c3d11c66
MD5          f1aec78d453b2e55cf7ff6784435c1ae
BLAKE2b-256  988fc2eca3131d4efff4c14d3401141125a6b5c8860d688dae820df7fdeac63b
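
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper name `sha256_of` and the local file path in the usage note are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of("optipfair-0.1.0.tar.gz")` should equal the SHA256 value shown for that file if the download is intact.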

File details

Details for the file optipfair-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: optipfair-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.13

File hashes

Hashes for optipfair-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       5a5e37f9b9c398d1cae3fefefba8dba24a263dee17da82fbb1fadb7305c31d1c
MD5          1091a265e3b093a9c378300c4070a93a
BLAKE2b-256  11821fc80ecb12ab0bf52412b3f897dcd50d3192b8c16f55c1c20d0ef1a34cd9
