fmchisel is a library that aims to improve the effectiveness and efficiency of LLM training and inference from an algorithmic perspective, using quantization, pruning, efficient optimizers, and related techniques.
fmchisel – Efficient Foundation Model Algorithms
State-of-the-art compression & distillation recipes for Large Language Models
✨ Overview
fmchisel (Foundation Model Chisel) is an open-source research library that makes it simple to:
- Compress LLMs with cutting-edge pruning and quantization techniques.
- Distill knowledge from larger models to smaller ones.
- Accelerate inference on consumer hardware by combining sparse + low-bit weight formats.
- Train efficiently with advanced optimizers such as schedule-free AdamW.
- Prototype new compression ideas rapidly.
fmchisel is built on PyTorch and integrates seamlessly with 📚 🤗 Transformers.
📦 Installation
PyPI Package
pip install "fmchisel[all]"
Source
Installing from source requires Linux (enforced by the setup script); installation on macOS or Windows fails at setup time:
# Clone the repo
git clone https://github.com/linkedin/fmchisel.git
cd fmchisel
# Base install
pip install -e .
# Optional extras
# - inference: pruning/quantization via llmcompressor
# - train: distillation (Lightning, liger-kernel)
# - all: both of the above
pip install -e ".[inference]"
pip install -e ".[train]"
# or
pip install -e ".[all]"
🚀 Quick Start
Ready-to-run recipes in examples/:
- Distillation:
  bash examples/distillation/run.sh
- Unstructured or N:M pruning (ALPS, SparseGPT, Wanda):
  bash examples/pruning/run.sh
- Structured pruning (OSSCAR):
  bash examples/structured_pruning/run.sh
- Quantization (QuantEase via YAML recipes):
  bash examples/quantization/run_quantization.sh
Tweak the scripts or pass flags to adjust models, datasets, and hyper-parameters.
🗂️ Project Structure
fmchisel/
│
├─ data/ # Calibration & data utilities
├─ distillation/ # Knowledge-distillation components
├─ pruning/ # ALPS + OSSCAR implementations; SparseGPT/Wanda via llmcompressor
├─ quantization/ # QuantEase & helpers
├─ optimizers/ # AdamW schedule-free implementation
├─ utils/ # Callbacks, training helpers
└─ config.py # Global configuration
examples/ # End-to-end reproducible recipes
tests/ # PyTest suite
🧪 Research Components
| Area | Algorithm(s) | Implementation Module |
|---|---|---|
| Pruning | ALPS (unstructured, N:M) | fmchisel.pruning.alps |
| Structured | OSSCAR (MLP/attn-group drop) | fmchisel.pruning.osscar |
| Quantization | QuantEase (weight-only/group) | fmchisel.quantization.quantease |
| Distillation | Per-token KD (e.g., JSD) | fmchisel.distillation.losses |
| Optimization | AdamW Schedule-Free | fmchisel.optimizers.adamw_schedulefree |
Notes:
- SparseGPT and Wanda pruning are available through llmcompressor and wired up in examples/pruning/pruning_utils.py.
- Quantization uses llmcompressor pipelines with a QuantEase modifier and YAML recipes.
- To combine pruning and quantization, compose both modifiers in a single YAML recipe and pass it to llmcompressor.oneshot. See the llmcompressor documentation for composing modifiers. Example composite recipes are not included in this repo.
Minimal Python usage (grounded in the repo)
Pruning (ALPS or SparseGPT/Wanda) via oneshot and HFCalibrationDataLoader:
from llmcompressor import oneshot
from transformers import AutoTokenizer

from fmchisel.data.calibration_datautil import HFCalibrationDataLoader
from fmchisel.pruning.alps.base import ALPSModifier

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenized calibration set: 1024 samples from the English C4 train split.
dataset = HFCalibrationDataLoader(
    nsamples=1024,
    tokenizer=tokenizer,
    max_seq_length=tokenizer.model_max_length,
    dataset="allenai/c4",
    data_field="text",
    data_dir="en",
    data_split="train",
).get_tokenized_calibration()

# ALPS at 50% sparsity with a 2:4 mask.
recipe = ALPSModifier(sparsity=0.5, mask_structure="2:4", targets="__ALL_PRUNABLE__")

# One-shot pruning; the result is written to out/pruned.
oneshot(model=model_id, dataset=dataset, recipe=recipe, output_dir="out/pruned")
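To make the mask_structure="2:4" setting concrete: an N:M pattern keeps at most N non-zero weights in every group of M consecutive weights, so 2:4 corresponds to the sparsity=0.5 used above. The sketch below only illustrates the mask pattern. It picks weights by magnitude, which is an assumption made for illustration and is not how ALPS selects weights; the helper name nm_magnitude_mask is hypothetical and not part of fmchisel.

```python
def nm_magnitude_mask(row, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m.

    Hypothetical helper for illustration only; magnitude selection is a
    stand-in and is not the ALPS criterion.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest |w| in this group (ties broken by position).
        keep = sorted(range(m), key=lambda i: (-abs(group[i]), i))[:n]
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
mask = nm_magnitude_mask(weights)
pruned = [w * k for w, k in zip(weights, mask)]
sparsity = mask.count(0) / len(mask)
print(mask, sparsity)
```

Every group of four entries in the mask contains exactly two ones, which is the layout that hardware with 2:4 sparse support expects.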
Structured pruning (OSSCAR):
from llmcompressor import oneshot
from transformers import AutoTokenizer

from fmchisel.data.calibration_datautil import HFCalibrationDataLoader
from fmchisel.pruning.osscar.base import OSSCARModifier

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenized calibration set: 1024 samples from the English C4 train split.
dataset = HFCalibrationDataLoader(
    nsamples=1024,
    tokenizer=tokenizer,
    max_seq_length=tokenizer.model_max_length,
    dataset="allenai/c4",
    data_field="text",
    data_dir="en",
    data_split="train",
).get_tokenized_calibration()

# OSSCAR: drop MLP neurons and attention groups.
recipe = OSSCARModifier(num_drop_mlp_neuron=128, num_drop_attn_group=1)

# One-shot structured pruning; the result is written to out/structured.
oneshot(model=model_id, dataset=dataset, recipe=recipe, output_dir="out/structured")
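Structured pruning removes whole units, so the weight matrices shrink instead of just gaining zeros. A minimal sketch of what dropping MLP neurons means, assuming a two-layer MLP stored as nested lists: each dropped hidden neuron removes one row of the up-projection and the matching column of the down-projection. The norm-product score used to rank neurons is an illustrative stand-in and not the OSSCAR selection rule; drop_mlp_neurons is a hypothetical helper, not an fmchisel function.

```python
import math


def drop_mlp_neurons(w_up, w_down, num_drop):
    """Remove `num_drop` hidden neurons from a two-layer MLP.

    w_up:   hidden x d_in  (one row per hidden neuron)
    w_down: d_out x hidden (one column per hidden neuron)
    Neurons are ranked by a simple norm-product score; this is an
    illustrative stand-in, not the OSSCAR selection rule.
    """
    hidden = len(w_up)
    if not 0 <= num_drop < hidden:
        raise ValueError("num_drop must be in [0, hidden)")

    def score(j):
        up_norm = math.sqrt(sum(v * v for v in w_up[j]))
        down_norm = math.sqrt(sum(row[j] * row[j] for row in w_down))
        return up_norm * down_norm

    # Keep the highest-scoring neurons, preserving their original order.
    ranked = sorted(range(hidden), key=lambda j: (-score(j), j))
    keep = sorted(ranked[:hidden - num_drop])
    new_up = [w_up[j] for j in keep]
    new_down = [[row[j] for j in keep] for row in w_down]
    return new_up, new_down, keep


w_up = [[1.0, 0.0], [0.01, 0.02], [0.5, 0.5], [0.0, 0.1]]  # 4 hidden x 2 in
w_down = [[1.0, 0.1, 0.5, 0.2], [0.0, 0.1, 1.0, 0.1]]      # 2 out x 4 hidden
new_up, new_down, kept = drop_mlp_neurons(w_up, w_down, num_drop=2)
print(kept, len(new_up), len(new_down[0]))
```

Because both matrices lose the same hidden indices, the pruned MLP stays shape-consistent and runs as a smaller dense layer with no sparse kernels required.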
Quantization (QuantEase) is driven by YAML recipes (see examples/quantization/recipes/*):
bash examples/quantization/run_quantization.sh
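For readers new to group-wise weight-only quantization, the sketch below shows the storage format in its simplest form: each group of weights shares one scale, and every weight is stored as a small signed integer code. It uses plain symmetric round-to-nearest, which is a baseline chosen for illustration and not the QuantEase algorithm. The function name, group size, and bit width are all assumptions.

```python
def quantize_groupwise(weights, group_size=4, bits=4):
    """Symmetric round-to-nearest quantization with one scale per group.

    Returns (codes, scales, dequantized). Illustrative baseline only;
    this is not the QuantEase algorithm.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed codes
    codes, scales, dequant = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # An all-zero group gets a dummy scale to avoid division by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = max(-qmax, min(qmax, round(w / scale)))
            codes.append(q)
            dequant.append(q * scale)
    return codes, scales, dequant


weights = [0.7, -0.3, 0.12, 0.0, 0.014, -0.006, 0.002, 0.0]
codes, scales, dequant = quantize_groupwise(weights)
max_err = max(abs(w - d) for w, d in zip(weights, dequant))
print(codes, scales, max_err)
```

The second group holds much smaller weights than the first, yet it still uses the full code range because it has its own scale. That is the motivation for per-group scales over a single per-tensor scale.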
Distillation with JSD loss (Lightning + FSDP):
bash examples/distillation/run.sh
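As a reference for what a per-token JSD loss measures, here is a minimal, dependency-free sketch of the textbook Jensen-Shannon divergence between teacher and student next-token distributions, averaged over token positions. It assumes equal weighting of the two KL terms and no temperature scaling; the loss in fmchisel.distillation.losses may differ in weighting, temperature, or masking, so treat the function names and details here as assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def per_token_jsd(teacher_logits, student_logits):
    """Mean Jensen-Shannon divergence across token positions."""
    losses = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p, q = softmax(t_row), softmax(s_row)
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
        losses.append(0.5 * kl(p, m) + 0.5 * kl(q, m))
    return sum(losses) / len(losses)


teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]  # 2 tokens, vocabulary of 3
student = [[1.5, 0.7, -0.5], [0.0, 0.0, 1.0]]
loss = per_token_jsd(teacher, student)
print(loss)
```

Unlike plain KL, this divergence is symmetric in teacher and student and bounded above by log 2, which keeps the loss scale predictable when the two models disagree strongly.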
🛠️ Contributing
- Fork & clone the repository.
- Install dev deps: pip install -e ".[dev]" (note: a Linux system is required).
- Run linters/formatters: make checkstyle.
- Execute tests: make test.
- Open a pull request!
Note: Please open an issue first to discuss major changes.
🔒 License
See LICENSE for details.
📝 Citation
@software{behdin2025,
author = {Behdin, Kayhan and Fatahibaarzi, Ata and Yun, Dai and
Song, Qingquan and Kothapalli, Vignesh and Tang, Shao and
Sang, Hejian and Gupta, Aman and Wang, Zhipeng and
Dexter, Gregory and Zhu, Sirou and Zhu, Siyu},
title = {fmchisel},
year = {2025},
}
Additional references
This library implements compression methods from the following papers:
@article{meng2024alps,
title={Alps: Improved optimization for highly sparse one-shot pruning for large language models},
author={Meng, Xiang and Behdin, Kayhan and Wang, Haoyue and Mazumder, Rahul},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={37594--37625},
year={2024}
}
@inproceedings{mengosscar,
title={OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization},
author={Meng, Xiang and Ibrahim, Shibal and Behdin, Kayhan and Hazimeh, Hussein and Ponomareva, Natalia and Mazumder, Rahul},
booktitle={Forty-first International Conference on Machine Learning}
}
@article{behdin2023quantease,
title={QuantEase: Optimization-based quantization for language models},
author={Behdin, Kayhan and Acharya, Ayan and Gupta, Aman and Song, Qingquan and Zhu, Siyu and Keerthi, Sathiya and Mazumder, Rahul},
journal={arXiv preprint arXiv:2309.01885},
year={2023}
}