LLM Compression and Optimization Library - Build the smallest runnable models that preserve target accuracy

compressGPT

compressGPT is a flexible, modular training pipeline designed to bridge the gap between large foundation models and efficient edge-ready deployment.

It orchestrates the full lifecycle of Large Language Model (LLM) optimization — from supervised fine-tuning, through post-quantization recovery, to production-ready artifact generation — with a single, composable API.

Unlike rigid training scripts, compressGPT allows developers to define custom compression workflows by composing high-level stages such as ft, compress_8bit, and deploy. Whether you need a high-accuracy FP16 model for server inference or a compact GGUF Q8_0 model for CPU-only deployment, compressGPT automates tokenization, adapter training, memory-efficient evaluation, and artifact generation to deliver the smallest runnable model that preserves task-level accuracy.

🚀 Quick Start

To install:

pip install compressgpt-core

Below is a complete example that transforms a CSV dataset into a compressed, deployment-ready 8-bit (GGUF Q8_0) Llama 3.2 model.

from compressgpt import (
    CompressTrainer,
    DatasetBuilder,
    TrainingConfig,
    DeploymentConfig,
)

prompt_template = (
    'Classify this notification as "Important" or "Ignore".\n'
    'Important: Security alerts, direct messages, payment confirmations.\n'
    'Ignore: Marketing promos, news digests, social media likes.\n\n'
    'Notification: {text}\n'
    'Answer:'
)

MODEL_ID = "meta-llama/Llama-3.2-1B"

# Build dataset
builder = DatasetBuilder(
    data_path="notifications.csv",
    model_id=MODEL_ID,
    prompt_template=prompt_template,
    input_column_map={"text": "message_body"},
    label_column="label",
).build()

# Run compression pipeline
trainer = CompressTrainer(
    model_id=MODEL_ID,
    dataset_builder=builder,
    stages=["ft", "compress_8bit", "deploy"],
    training_config=TrainingConfig(
        num_train_epochs=1,
        eval_strategy="epoch",
        save_strategy="epoch",
    ),
    deployment_config=DeploymentConfig(
        save_merged_fp16=True,     # Canonical dense model
        save_gguf_q8_0=True,       # GGUF Q8_0 for llama.cpp/Ollama
    ),
)

results = trainer.run()

print("Training complete!")
print(results)
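
To try the example without your own data, the sketch below writes a tiny notifications.csv whose columns match the configuration above (message_body for the text, label for the class) and shows the prompt each row would turn into. The sample rows and the helper names write_sample_csv and render_prompts are illustrative, not part of compressGPT, and the assumption that input_column_map maps a template placeholder to a CSV column is inferred from the Quick Start code rather than from the library's documentation.

```python
import csv
from pathlib import Path

# Same template as in the Quick Start example.
PROMPT_TEMPLATE = (
    'Classify this notification as "Important" or "Ignore".\n'
    'Important: Security alerts, direct messages, payment confirmations.\n'
    'Ignore: Marketing promos, news digests, social media likes.\n\n'
    'Notification: {text}\n'
    'Answer:'
)

# Hypothetical sample rows; only the column names come from the Quick Start.
SAMPLE_ROWS = [
    {"message_body": "Your payment of $42.00 was received.", "label": "Important"},
    {"message_body": "50% off everything this weekend only!", "label": "Ignore"},
]


def write_sample_csv(path):
    """Write the sample rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["message_body", "label"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)


def render_prompts(path, column_map, label_column="label"):
    """Return (prompt, label) pairs; column_map maps placeholder -> CSV column.

    This mirrors what we assume DatasetBuilder does with input_column_map;
    it is not the library's actual implementation.
    """
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            fields = {name: row[column] for name, column in column_map.items()}
            pairs.append((PROMPT_TEMPLATE.format(**fields), row[label_column]))
    return pairs
```

Calling write_sample_csv(Path("notifications.csv")) and then render_prompts(Path("notifications.csv"), {"text": "message_body"}) is a quick way to check that the template and column names line up before starting a training run.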

📦 Deployment & Artifacts

Deployment Methods

The final stage of the pipeline, deploy, automatically converts your optimized model into production formats. Controlled by DeploymentConfig, it supports:

GGUF Q8_0 (save_gguf_q8_0): The recommended format for CPU/GPU inference. These files can be loaded directly into llama.cpp, Ollama, or llama-cpp-python. Conversion is bundled, so no external tools are required.
GGUF F16/BF16 (save_gguf_f16, save_gguf_bf16): Higher precision GGUF for maximum accuracy.
Merged FP16 (save_merged_fp16): The canonical high-precision model. Use this for vLLM / TGI servers or further research.

Note: GGUF conversion uses vendored code from llama.cpp (MIT License) and currently supports the F32, F16, BF16, and Q8_0 output types. For other quantization types (Q4_K, Q5_K, etc.), run the external llama-quantize tool on the output.
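
As a rough sketch of that last step, the helper below builds and runs a llama-quantize command from Python. It assumes llama.cpp's llama-quantize binary is on your PATH and that you exported an F16 GGUF via save_gguf_f16. The file names in the usage note are hypothetical, since only model-q8_0.gguf is documented here. Q4_K_M is one of llama.cpp's quantization type names; substitute whichever type you need.

```python
import shutil
import subprocess


def build_quantize_command(src_gguf, dst_gguf, quant_type="Q4_K_M",
                           binary="llama-quantize"):
    """Return the argv list: <binary> <input.gguf> <output.gguf> <type>."""
    return [binary, str(src_gguf), str(dst_gguf), quant_type]


def quantize(src_gguf, dst_gguf, quant_type="Q4_K_M"):
    """Run llama-quantize, failing early with a clear error if it is missing."""
    if shutil.which("llama-quantize") is None:
        raise FileNotFoundError(
            "llama-quantize not found on PATH; build or install llama.cpp first"
        )
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_quantize_command(src_gguf, dst_gguf, quant_type), check=True)
```

For example, quantize("runs/default/deploy/gguf/model-f16.gguf", "model-q4_k_m.gguf") would produce a 4-bit file, assuming the F16 export uses that name. Starting from the F16 file rather than the Q8_0 one avoids quantizing an already quantized model.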

Saving Models & Trade-offs

Every compressGPT stage saves its own model and metrics, so you can deploy different versions of the same model to different devices based on their constraints.

1. Default Outputs (runs/default/). Every stage you run automatically saves its result:

ft_adapter/: High-accuracy LoRA adapter (best for Cloud/GPU).
compress_8bit_merged/: Quantized & recovered model (best for accuracy/size balance).
metrics.json: Compare ft vs compress_8bit accuracy to make data-driven deployment decisions.
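
One way to turn that comparison into a decision is sketched below. The layout of metrics.json is not documented here, so the structure used (one entry per stage, each with an "accuracy" value) is an assumption; adjust the keys to whatever your run actually writes. The helper names and the 0.01 threshold are illustrative.

```python
import json
from pathlib import Path


def load_metrics(path="runs/default/metrics.json"):
    """Read the metrics file written by the pipeline."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def accuracy_drop(metrics, baseline="ft", candidate="compress_8bit", key="accuracy"):
    """Accuracy lost by the candidate stage relative to the baseline stage.

    Assumes metrics looks like {"ft": {"accuracy": ...}, "compress_8bit": {...}}.
    """
    return metrics[baseline][key] - metrics[candidate][key]


def pick_stage(metrics, max_drop=0.01):
    """Prefer the compressed model unless it loses more than max_drop accuracy."""
    return "compress_8bit" if accuracy_drop(metrics) <= max_drop else "ft"
```

With this, pick_stage(load_metrics()) returns the name of the stage to ship: the 8-bit model when the recovery step kept accuracy within the tolerance, and the fine-tuned adapter otherwise.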

2. Deploy Outputs (runs/default/deploy/). Production-ready artifacts are generated here only if enabled in DeploymentConfig:

runs/default/deploy/
├── merged_fp16/        # Universal format (vLLM, TGI)
└── gguf/
    └── model-q8_0.gguf # Optimized GGUF for llama.cpp/Ollama
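
The GGUF file can then be used for CPU inference. Below is a minimal sketch with llama-cpp-python, assuming that package is installed, that the model was trained with the Quick Start prompt, and that its completion begins with one of the two label strings. The helper names load_model and classify are illustrative, not part of compressGPT.

```python
# Same template as in the Quick Start example.
PROMPT_TEMPLATE = (
    'Classify this notification as "Important" or "Ignore".\n'
    'Important: Security alerts, direct messages, payment confirmations.\n'
    'Ignore: Marketing promos, news digests, social media likes.\n\n'
    'Notification: {text}\n'
    'Answer:'
)
LABELS = ("Important", "Ignore")


def load_model(path="runs/default/deploy/gguf/model-q8_0.gguf"):
    """Load the GGUF file with llama-cpp-python."""
    # Imported lazily so classify() can be exercised without the package installed.
    from llama_cpp import Llama
    return Llama(model_path=path, n_ctx=512, verbose=False)


def classify(llm, text):
    """Return the predicted label, or None if the completion matches neither label."""
    out = llm(PROMPT_TEMPLATE.format(text=text), max_tokens=4, temperature=0.0)
    completion = out["choices"][0]["text"].strip()
    for label in LABELS:
        if completion.startswith(label):
            return label
    return None
```

For example, classify(load_model(), "New login from an unknown device") should return one of the two labels when the assumptions above hold. A temperature of 0.0 keeps decoding greedy, which suits classification.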

⚠️ Current Support

Currently, compressGPT is optimized for Classification Tasks (e.g., Sentiment, Intent Detection, Spam Filtering). Support for Generation tasks (RAG, Chat) is coming soon.

Notes on Development

This project was built quickly and iteratively while converting an academic thesis into a working system. AI tools were used to accelerate implementation; all core ideas, abstractions, and evaluation logic come directly from my thesis and were reasoned about and validated manually.

Release history

0.3.2 (Feb 12, 2026)
0.3.1 (Feb 6, 2026, this version)
0.3.0 (Feb 3, 2026)
0.2.0 (Feb 1, 2026)
0.1.5 (Feb 1, 2026)
0.1.4 (Feb 1, 2026)
0.1.3 (Feb 1, 2026)
0.1.2 (Jan 31, 2026)
0.1.0 (Jan 29, 2026)

Download files

Source Distribution

compressgpt_core-0.3.1.tar.gz (247.2 kB)

Uploaded Feb 6, 2026 Source

Built Distribution

compressgpt_core-0.3.1-py3-none-any.whl (231.4 kB)

Uploaded Feb 6, 2026 Python 3

File details

Details for the file compressgpt_core-0.3.1.tar.gz.

File metadata

Download URL: compressgpt_core-0.3.1.tar.gz
Upload date: Feb 6, 2026
Size: 247.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for compressgpt_core-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`e7e049a488ecf460c87913e8e0f313c9037f12ba0298ba8d9c5e2b52e5bbca3e`
MD5	`4fca851ada7fba08c3fecd8882e114e1`
BLAKE2b-256	`ffe813e714e93580488e9eb75190d4f0cb771d1288f021011595ae1a4a4c7026`

File details

Details for the file compressgpt_core-0.3.1-py3-none-any.whl.

File metadata

Download URL: compressgpt_core-0.3.1-py3-none-any.whl
Upload date: Feb 6, 2026
Size: 231.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for compressgpt_core-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e050a9e72813a92db502749e128654816fccb6d13588dc16ba5c2ee7535bcd49`
MD5	`6086cd40abfda650db51355e470d252a`
BLAKE2b-256	`267d9f1f6e62b7c3331904c713646d710c085d43499850524bc4bc038e3c1b9e`

compressgpt-core 0.3.1
