
A configuration-driven LLM compression framework using low-rank factorization and layer removal


Goldcrest


An LLM compression framework focused on low-rank factorization (of matrices and tensors) that manages the whole experiment pipeline: compression, evaluation, profiling, and logging.

Features

| Category | Options |
| --- | --- |
| Compression algorithms | SVD, Tucker, CP (CANDECOMP/PARAFAC), Tensor Train, weight pruning (layer removal) |
| Rank selection criteria | Activation-informed algorithms such as ASVD and SVD-LLM, entropy measures, Fisher information, stable rank, and the metrics in `information_flow` (requires an external download) |
| Algebra backends | CoLA (Lanczos, LOBPCG), PyTorch, TensorLy |
| Target layers | MLP, attention layers (Q/K/V/O), embedding layers |
| Evaluation | lm-eval-harness integration, user-defined language-task evaluation metrics |
| Profiling | Memory (RSS, VMS, GPU allocated/reserved, peak), per-phase latency, subprocess isolation |
| Logging | CSV logs for experiments (models, evaluations, and compression runs) with cross-experiment comparison |

Quick Start

1. Install

pip install goldcrest

With optional dependencies:

pip install goldcrest[eval]    # lm-eval-harness integration
pip install goldcrest[cola]    # CoLA algebra backend

2. Configure

Create a YAML config file specifying your compression pipeline. For example:

model:
  name: Qwen/Qwen3-14B
  source: hf
  device: cuda

compression:
  svd:
    backend: cola
    cola:
      algorithm: lanczos

factorization:
  objects:
    - "model.layers[*].mlp.gate_proj"
    - "model.layers[*].mlp.up_proj"
    - "model.layers[*].self_attn.q_proj"
  func_name: svd
  rank: 64

evaluation:
  type: lm_eval         #   lm-eval-harness
  tasks: [arc_easy]

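The `objects` patterns use `[*]` as a wildcard over layer indices. As a rough sketch of how such patterns can be expanded against a model's module names (this is an illustrative guess at the semantics, not Goldcrest's matcher, and the module names below are hypothetical):

```python
import re

def expand_pattern(pattern, names):
    """Expand a config pattern like 'model.layers[*].mlp.gate_proj'
    against a list of module names, treating '[*]' as 'any index'.
    Illustrative only; Goldcrest's own matching logic may differ."""
    regex = re.escape(pattern).replace(re.escape("[*]"), r"\[\d+\]")
    rx = re.compile(regex + r"$")
    return [name for name in names if rx.match(name)]

# Hypothetical module names for a two-layer model.
names = [
    "model.layers[0].mlp.gate_proj",
    "model.layers[1].mlp.gate_proj",
    "model.layers[0].self_attn.q_proj",
    "model.layers[0].mlp.up_proj",
]

print(expand_pattern("model.layers[*].mlp.gate_proj", names))
# -> ['model.layers[0].mlp.gate_proj', 'model.layers[1].mlp.gate_proj']
```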
More example configurations are in config/.

3. Run

python scripts/examples/gpu/h100_activation_svd.py --config config/h100_activation_svd.yaml

Configuration Options

| Section | Key | Description |
| --- | --- | --- |
| model | name | HuggingFace model ID or local path |
| model | device | `cuda` or `cpu` |
| analysis | type | `activation_metrics`, `fisher_information`, `weight_metrics` |
| compression.svd | backend | `cola`, `torch`, `tensorly` |
| factorization | objects | Layer patterns to compress (supports wildcards) |
| factorization | func_name | `svd`, `tucker`, `cp`, `tensor_train` |
| evaluation | type | `lm_eval` for benchmark tasks |

More detailed documentation covers the configuration settings for each compression algorithm and the layer types that can be compressed.
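One of the rank-selection criteria listed above, stable rank, is cheap to state: `sr(W) = ||W||_F^2 / sigma_max(W)^2`, a robust lower bound on the matrix rank. The sketch below (illustrative only; Goldcrest's criteria may be computed differently) shows how it separates a near-low-rank matrix from a full-rank one:

```python
import numpy as np

def stable_rank(W):
    """Stable rank: squared Frobenius norm over squared spectral norm.
    Always between 1 and rank(W), so it never exceeds the true rank."""
    s = np.linalg.svd(W, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

rng = np.random.default_rng(0)
near_lowrank = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 128))
full = rng.standard_normal((128, 128))

print(f"rank-8 product:      {stable_rank(near_lowrank):.1f}")  # at most 8
print(f"full-rank Gaussian:  {stable_rank(full):.1f}")          # much larger
```

A layer with a low stable rank is a natural candidate for an aggressive factorization rank, while a layer whose stable rank is close to its dimension has little low-rank structure to exploit.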

Example Scripts

CPU Examples (scripts/examples/cpu/)

Lightweight scripts for development and testing.

| Script | Description |
| --- | --- |
| svd_gemma3.py | Single-pass SVD compression for Gemma-3 |
| svd_qwen3.py | Single-pass SVD compression for Qwen-3 |
| loop_svd_gemma3.py | Iterative SVD compression for Gemma-3 |
| loop_svd_qwen3.py | Iterative SVD compression for Qwen-3 |
| asvd_svdllm_pipeline.py | Combined ASVD and SVD-LLM pipeline |
| benchmark_reproducibility.py | Reproducibility benchmarks |
| perplexity_evaluation.py | Perplexity evaluation utilities |
| memory_profiling.py | Memory usage profiling |
| profile_compressed.py | Profile compressed model performance |
| comparison_baseline_compressed.py | Compare baseline vs compressed models |

GPU Examples (scripts/examples/gpu/)

GPU scripts for H100 and H200 workloads:

| Script | Description |
| --- | --- |
| h100_activation_svd.py | Activation-guided SVD compression (H100) |
| h100_entropy_svd.py | Entropy-based rank selection (H100) |
| h100_hybrid_mi.py | Mutual information hybrid compression (H100) |
| h100_full_pipeline.py | Complete analysis + compression + evaluation (H100) |
| h100_tucker.py | Tucker decomposition compression (H100) |
| h100_cp.py | CP (CANDECOMP/PARAFAC) decomposition (H100) |
| h100_tensor_train.py | Tensor-Train decomposition (H100) |
| h100_weight_pruning.py | Weight pruning compression (H100) |
| h200_activation_svd.py | Activation-guided SVD compression (H200) |
| h200_entropy_svd.py | Entropy-based rank selection (H200) |
| h200_hybrid_mi.py | Mutual information hybrid compression (H200) |
| h200_full_pipeline.py | Complete analysis + compression + evaluation (H200) |
| h200_tucker.py | Tucker decomposition compression (H200) |
| h200_cp.py | CP (CANDECOMP/PARAFAC) decomposition (H200) |
| h200_tensor_train.py | Tensor-Train decomposition (H200) |
| h200_weight_pruning.py | Weight pruning compression (H200) |
| asvd_svdllm_pipeline.py | Combined ASVD and SVD-LLM pipeline |
| benchmark_reproducibility.py | Reproducibility benchmarks |
| perplexity_evaluation.py | Perplexity evaluation utilities |
| memory_profiling.py | Memory usage profiling |
| profile_compressed.py | Profile compressed model performance |
| comparison_baseline_compressed.py | Compare baseline vs compressed models |

Third-Party Integrations (scripts/examples/third-party/)

| Script | Description |
| --- | --- |
| cola_vs_torch_benchmark.py | CoLA vs PyTorch SVD performance comparison |
| lanczos_truncated_svd.py | Lanczos algorithm for truncated SVD |
| lobpcg_ill_conditioned.py | LOBPCG for ill-conditioned matrices |
| info_flow_metrics_loading.py | Information flow metric computation |
| layer_sensitivity_correlation.py | Layer sensitivity analysis |
| metric_guided_compression.py | Metric-driven compression selection |
| mi_hybrid_compression.py | Mutual information hybrid approach |
| third_party_full_pipeline.py | Full pipeline with third-party backends |

Project Structure

goldcrest/
├── config/                # Experiment configs plus base/ and profiles/
│   ├── base/
│   └── profiles/
├── docs/                  # Documentation and changelogs
├── logs/                  # Runtime logs
├── scripts/
│   ├── bash/              # Shell scripts (cpu/, gpu/, third-party/)
│   ├── examples/          # Python examples (cpu/, gpu/, third-party/)
│   ├── logs/              # Script execution logs
│   └── utils/             # Utility scripts
├── tests/                 # Pytest regression and integration suite
└── goldcrest/
    ├── config/            # Config loader
    ├── framework/         # Core framework (layers, state, IO)
    ├── orchestration/     # Pipeline orchestration
    └── plugins/
        ├── analysis/      # Metrics and layer selection
        ├── compression/   # SVD, Tucker, CP, pruning
        ├── evaluation/    # lm-eval integration
        └── models/        # Model utilities

Documentation

  • Architecture — Plugin-based design, EventBus, and workflow system
  • Changelogs — Notes on recent fixes
  • Successful Runs — H200 GPU benchmark results (7 compression strategies, third-party tests)

Download files

Download the file for your platform.

Source Distribution

goldcrest-0.1.0.tar.gz (164.5 kB)


Built Distribution


goldcrest-0.1.0-py3-none-any.whl (193.8 kB)


File details

Details for the file goldcrest-0.1.0.tar.gz.

File metadata

  • Download URL: goldcrest-0.1.0.tar.gz
  • Upload date:
  • Size: 164.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for goldcrest-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `9741272eb65e87bee13cbe3aed1d1f0def971c6f381c3286bdb9bdc1bd2a665b` |
| MD5 | `1bfe4811eb6dfb3e25dffe7a6c545410` |
| BLAKE2b-256 | `8d5eeb39e4e8ed53205f2cb9a0b736e3b9bf4478ed74126ff2e80b0caf05f6a1` |
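To verify a downloaded archive against the SHA256 digest above, a small streaming helper with the standard library's `hashlib` suffices (the assertion on the real file is left commented out, since it requires downloading the sdist first):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, streaming in chunks
    so large archives never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Published digest for goldcrest-0.1.0.tar.gz (from the table above):
expected = "9741272eb65e87bee13cbe3aed1d1f0def971c6f381c3286bdb9bdc1bd2a665b"
# After downloading the sdist, verify it:
# assert sha256_of("goldcrest-0.1.0.tar.gz") == expected, "hash mismatch!"
```

Alternatively, pinning the hash in a requirements file (`goldcrest==0.1.0 --hash=sha256:9741...`) lets pip perform the same check automatically.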


File details

Details for the file goldcrest-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: goldcrest-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 193.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for goldcrest-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `1801ca784fad826d255657aa6dd3404e2530fadee09caccb2543e5c6c3f07fe0` |
| MD5 | `4c5aa4d02460c291cac31378fdfd6a16` |
| BLAKE2b-256 | `f35ae9c12c5c962c209ce42e48013b4b0871c6431906de217e6de9b1518a1394` |

