A configuration-driven LLM compression framework using low-rank factorization and layer removal
Goldcrest
Goldcrest is an LLM compression framework focused on low-rank factorization of weight matrices and tensors. It manages the whole experiment pipeline: compression, evaluation, profiling, and logging.
Features
| Category | Options |
|---|---|
| Compression Algorithms | SVD, Tucker, CP (CANDECOMP/PARAFAC), Tensor Train, weight pruning (layer removal) |
| Rank Selection Criteria | Activation-informed algorithms such as ASVD and SVD-LLM; entropy, Fisher Information, and stable-rank metrics; additional metrics from information_flow (external download required) |
| Algebra Backends | CoLA (Lanczos, LOBPCG), PyTorch, TensorLy |
| Target Layers | MLP, attention layers (Q/K/V/O), embedding layers |
| Evaluation | lm-eval-harness integration, user-defined language-task evaluation metrics |
| Profiling | Memory (RSS, VMS, GPU allocated/reserved, peak), latency (per-phase), subprocess isolation |
| Logging | CSV logging for experiments, including models, evaluations, and compression runs, with cross-experiment comparison |
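For orientation, the core idea behind the SVD-based options is to replace a dense weight matrix with two thin factors. The snippet below is not Goldcrest's API; it is a minimal, self-contained PyTorch sketch of rank-truncated SVD applied to a single linear layer, the kind of step the framework drives from a YAML config.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace one nn.Linear with two low-rank nn.Linear layers via truncated SVD."""
    W = layer.weight.data                                  # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]

    # y = W x + b  is approximated by  U_r (diag(S_r) Vh_r x) + b
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.diag(S_r) @ Vh_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(4096, 11008, bias=False)
compressed = factorize_linear(layer, rank=64)
print(sum(p.numel() for p in layer.parameters()),      # dense parameter count
      sum(p.numel() for p in compressed.parameters())) # low-rank parameter count
```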
Quick Start
1. Install
pip install goldcrest
With optional dependencies:
pip install goldcrest[eval] # lm-eval-harness integration
pip install goldcrest[cola] # CoLA algebra backend
2. Configure
Create a YAML config file specifying your compression pipeline. For example:
```yaml
model:
  name: Qwen/Qwen3-14B
  source: hf
  device: cuda

compression:
  svd:
    backend: cola
    cola:
      algorithm: lanczos

factorization:
  objects:
    - "model.layers[*].mlp.gate_proj"
    - "model.layers[*].mlp.up_proj"
    - "model.layers[*].self_attn.q_proj"
  func_name: svd
  rank: 64

evaluation:
  type: lm_eval  # lm-eval-harness
  tasks: [arc_easy]
```
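The `[*]` wildcard in `objects` matches every transformer block. As an illustration only (extending the pattern syntax above to the remaining attention projections listed under Features; the `k_proj`/`v_proj`/`o_proj` names follow the same HuggingFace naming convention as `q_proj`):

```yaml
factorization:
  objects:
    - "model.layers[*].self_attn.q_proj"
    - "model.layers[*].self_attn.k_proj"
    - "model.layers[*].self_attn.v_proj"
    - "model.layers[*].self_attn.o_proj"
  func_name: svd
  rank: 64
```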
More example configurations are in config/.
3. Run
python scripts/examples/gpu/h100_activation_svd.py --config config/h100_activation_svd.yaml
Configuration Options
| Section | Key | Description |
|---|---|---|
| `model` | `name` | HuggingFace model ID or local path |
| `model` | `device` | `cuda` or `cpu` |
| `analysis` | `type` | `activation_metrics`, `fisher_information`, `weight_metrics` |
| `compression.svd` | `backend` | `cola`, `torch`, `tensorly` |
| `factorization` | `objects` | Layer patterns to compress (supports wildcards) |
| `factorization` | `func_name` | `svd`, `tucker`, `cp`, `tensor_train` |
| `evaluation` | `type` | `lm_eval` for benchmark tasks |
More detailed clarification of the configuration settings for each compression algorithm, and of the layer types that can be compressed, is available in the documentation.
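The rank-selection criteria listed under Features (entropies, stable rank) are standard spectral statistics. The sketch below shows generic definitions of two of them using plain PyTorch; it is framework-independent, and Goldcrest's exact formulas and weighting may differ.

```python
import torch

def stable_rank(W: torch.Tensor) -> float:
    """||W||_F^2 / ||W||_2^2 -- a smooth proxy for the numerical rank."""
    s = torch.linalg.svdvals(W)
    return float((s.pow(2).sum() / s[0].pow(2)).item())

def spectral_entropy(W: torch.Tensor) -> float:
    """Shannon entropy of the normalized singular-value distribution."""
    s = torch.linalg.svdvals(W)
    p = s / s.sum()
    return float(-(p * torch.log(p + 1e-12)).sum().item())

W = torch.randn(4096, 4096)
print(stable_rank(W), spectral_entropy(W))
```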
Example Scripts
CPU Examples (scripts/examples/cpu/)
Lightweight scripts for development and testing.
| Script | Description |
|---|---|
| `svd_gemma3.py` | Single-pass SVD compression for Gemma-3 |
| `svd_qwen3.py` | Single-pass SVD compression for Qwen-3 |
| `loop_svd_gemma3.py` | Iterative SVD compression for Gemma-3 |
| `loop_svd_qwen3.py` | Iterative SVD compression for Qwen-3 |
| `asvd_svdllm_pipeline.py` | Combined ASVD and SVD-LLM pipeline |
| `benchmark_reproducibility.py` | Reproducibility benchmarks |
| `perplexity_evaluation.py` | Perplexity evaluation utilities |
| `memory_profiling.py` | Memory usage profiling |
| `profile_compressed.py` | Profile compressed model performance |
| `comparison_baseline_compressed.py` | Compare baseline vs compressed models |
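The `memory_profiling.py` and `profile_compressed.py` scripts report the kinds of metrics listed under Profiling. The sketch below is a rough, framework-independent illustration of measuring per-phase latency, process RSS, and GPU peak memory using psutil and torch directly; it is not Goldcrest's profiler.

```python
import time
import psutil
import torch

def profile_phase(fn):
    """Run fn() and report wall-clock latency, process RSS, and GPU peak memory."""
    proc = psutil.Process()
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    result = fn()
    stats = {
        "latency_s": time.perf_counter() - start,
        "rss_mb": proc.memory_info().rss / 2**20,
    }
    if torch.cuda.is_available():
        stats["gpu_peak_mb"] = torch.cuda.max_memory_allocated() / 2**20
    return result, stats

_, stats = profile_phase(lambda: torch.randn(2048, 2048) @ torch.randn(2048, 2048))
print(stats)
```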
GPU Examples (scripts/examples/gpu/)
GPU scripts for H100 and H200 workloads:
| Script | Description |
|---|---|
| `h100_activation_svd.py` | Activation-guided SVD compression (H100) |
| `h100_entropy_svd.py` | Entropy-based rank selection (H100) |
| `h100_hybrid_mi.py` | Mutual information hybrid compression (H100) |
| `h100_full_pipeline.py` | Complete analysis + compression + evaluation (H100) |
| `h100_tucker.py` | Tucker decomposition compression (H100) |
| `h100_cp.py` | CP (CANDECOMP/PARAFAC) decomposition (H100) |
| `h100_tensor_train.py` | Tensor-Train decomposition (H100) |
| `h100_weight_pruning.py` | Weight pruning compression (H100) |
| `h200_activation_svd.py` | Activation-guided SVD compression (H200) |
| `h200_entropy_svd.py` | Entropy-based rank selection (H200) |
| `h200_hybrid_mi.py` | Mutual information hybrid compression (H200) |
| `h200_full_pipeline.py` | Complete analysis + compression + evaluation (H200) |
| `h200_tucker.py` | Tucker decomposition compression (H200) |
| `h200_cp.py` | CP (CANDECOMP/PARAFAC) decomposition (H200) |
| `h200_tensor_train.py` | Tensor-Train decomposition (H200) |
| `h200_weight_pruning.py` | Weight pruning compression (H200) |
| `asvd_svdllm_pipeline.py` | Combined ASVD and SVD-LLM pipeline |
| `benchmark_reproducibility.py` | Reproducibility benchmarks |
| `perplexity_evaluation.py` | Perplexity evaluation utilities |
| `memory_profiling.py` | Memory usage profiling |
| `profile_compressed.py` | Profile compressed model performance |
| `comparison_baseline_compressed.py` | Compare baseline vs compressed models |
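The Tucker, CP, and Tensor-Train scripts above use the TensorLy backend listed under Features. As a hedged illustration of what a Tucker factorization of a (reshaped) weight tensor looks like with TensorLy directly, not through Goldcrest's wrapper, and with an arbitrary toy reshaping:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Toy stand-in: pretend a large weight matrix has been reshaped into a 4-way tensor.
# How Goldcrest maps layers to tensors is driven by its config, not hard-coded here.
weight = np.random.randn(64, 64, 64, 64).astype(np.float32)

# Tucker decomposition: a small core tensor plus one factor matrix per mode.
core, factors = tucker(tl.tensor(weight), rank=[16, 16, 16, 16])

original = weight.size
compressed = int(np.prod(core.shape)) + sum(int(np.prod(f.shape)) for f in factors)
print(f"parameters: {original} -> {compressed}")
```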
Third-Party Integrations (scripts/examples/third-party/)
| Script | Description |
|---|---|
| `cola_vs_torch_benchmark.py` | CoLA vs PyTorch SVD performance comparison |
| `lanczos_truncated_svd.py` | Lanczos algorithm for truncated SVD |
| `lobpcg_ill_conditioned.py` | LOBPCG for ill-conditioned matrices |
| `info_flow_metrics_loading.py` | Information flow metric computation |
| `layer_sensitivity_correlation.py` | Layer sensitivity analysis |
| `metric_guided_compression.py` | Metric-driven compression selection |
| `mi_hybrid_compression.py` | Mutual information hybrid approach |
| `third_party_full_pipeline.py` | Full pipeline with third-party backends |
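`cola_vs_torch_benchmark.py` compares iterative (Lanczos-style) truncated SVD against dense SVD. CoLA's own API is not reproduced here; as a rough stand-in for the same trade-off, the sketch below contrasts dense `torch.linalg.svd` with the randomized `torch.svd_lowrank` routine.

```python
import time
import torch

A = torch.randn(4096, 4096)
k = 64

# Dense SVD: computes the full spectrum, then truncates.
t0 = time.perf_counter()
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
dense_s = time.perf_counter() - t0

# Randomized low-rank SVD: only the top-k subspace is approximated.
t0 = time.perf_counter()
U_k, S_k, V_k = torch.svd_lowrank(A, q=k)
lowrank_s = time.perf_counter() - t0

# Note: a random Gaussian matrix is nearly full-rank, so the rank-64 error is large;
# trained weight matrices typically compress far better.
err = torch.linalg.norm(A - U_k @ torch.diag(S_k) @ V_k.T) / torch.linalg.norm(A)
print(f"dense: {dense_s:.2f}s  low-rank: {lowrank_s:.2f}s  rel. error: {err:.3f}")
```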
Project Structure
goldcrest/
├── config/ # Experiment configs plus base/ and profiles/
│ ├── base/
│ └── profiles/
├── docs/ # Documentation and changelogs
├── logs/ # Runtime logs
├── scripts/
│ ├── bash/ # Shell scripts (cpu/, gpu/, third-party/)
│ ├── examples/ # Python examples (cpu/, gpu/, third-party/)
│ ├── logs/ # Script execution logs
│ └── utils/ # Utility scripts
├── tests/ # Pytest regression and integration suite
└── goldcrest/
├── config/ # Config loader
├── framework/ # Core framework (layers, state, IO)
├── orchestration/ # Pipeline orchestration
└── plugins/
├── analysis/ # Metrics and layer selection
├── compression/ # SVD, Tucker, CP, pruning
├── evaluation/ # lm-eval integration
└── models/ # Model utilities
Documentation
- Architecture — Plugin-based design, EventBus, and workflow system
- Changelogs — Notes on recent fixes
- Successful Runs — H200 GPU benchmark results (7 compression strategies, third-party tests)
File details
Details for the file goldcrest-0.1.0.tar.gz.
File metadata
- Download URL: goldcrest-0.1.0.tar.gz
- Upload date:
- Size: 164.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9741272eb65e87bee13cbe3aed1d1f0def971c6f381c3286bdb9bdc1bd2a665b` |
| MD5 | `1bfe4811eb6dfb3e25dffe7a6c545410` |
| BLAKE2b-256 | `8d5eeb39e4e8ed53205f2cb9a0b736e3b9bf4478ed74126ff2e80b0caf05f6a1` |
File details
Details for the file goldcrest-0.1.0-py3-none-any.whl.
File metadata
- Download URL: goldcrest-0.1.0-py3-none-any.whl
- Upload date:
- Size: 193.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `1801ca784fad826d255657aa6dd3404e2530fadee09caccb2543e5c6c3f07fe0` |
| MD5 | `4c5aa4d02460c291cac31378fdfd6a16` |
| BLAKE2b-256 | `f35ae9c12c5c962c209ce42e48013b4b0871c6431906de217e6de9b1518a1394` |