Tree-based prompt compression library using cut-then-transform strategy for LLM applications
CUTIA: Quality-Aware Prompt Compressor
A prompt optimizer that cuts token usage while maintaining quality.
Features
- Tree-based Segmentation: Recursively splits prompts into segments for fine-grained optimization
- Cut-then-Rewrite Strategy: Attempts to remove redundant content, then rewrites if cutting fails
- Quality-Aware Compression: Maintains quality thresholds during compression
- Multi-Candidate Generation: Generates multiple compression variants and chooses the best
- DSPy Integration: First-class support for DSPy programs via the DSPy adapter
Installation
pip install cutia
Usage
DSPy Adapter
The DSPy adapter allows you to compress DSPy programs:
import dspy
from cutia.adapters.dspy_adapter import CUTIA

# Configure models
# prompt_model generates rewrite candidates
prompt_model = dspy.LM(
    model="openai/gpt-4o-mini",
    max_tokens=10000,
    temperature=1,
)

# task_model runs the task/program for scoring and validation
task_model = dspy.LM(
    model="openai/gpt-4.1-nano",
    max_tokens=2000,
    temperature=1,
)

# Define your metric
def your_metric(example, prediction, trace=None):
    return example.output == prediction.output

# Create optimizer
optimizer = CUTIA(
    prompt_model=prompt_model,
    task_model=task_model,
    metric=your_metric,
)

# Compile your program
compressed_program = optimizer.compile(
    student=your_program,
    trainset=train_examples,
    valset=val_examples,
)
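A strict equality metric can reject answers that differ only in whitespace or letter case. The sketch below shows one possible, more tolerant variant. It keeps the `(example, prediction, trace=None)` signature from the snippet above and assumes both objects expose an `output` field. The name `normalized_exact_match` and the `SimpleNamespace` stand-ins are illustrative and not part of CUTIA or DSPy.

```python
from types import SimpleNamespace


def normalized_exact_match(example, prediction, trace=None):
    """Return True when both outputs match after trimming and lowercasing.

    Assumes `example.output` and `prediction.output` exist, as in the
    usage snippet; `trace` is accepted but unused.
    """
    expected = str(example.output).strip().lower()
    actual = str(prediction.output).strip().lower()
    return expected == actual


# Stand-in objects so the sketch runs without DSPy installed.
example = SimpleNamespace(output="3")
prediction = SimpleNamespace(output=" 3\n")
print(normalized_exact_match(example, prediction))  # True
```

Whether normalization is appropriate depends on the task: for case-sensitive outputs, the plain equality check is the safer choice.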
Local AI
If you’re running CUTIA (or other prompt optimizers) against locally hosted LLMs, vLLM is a solid option for serving models: it supports high-throughput inference and handles concurrent requests efficiently.
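As a rough sketch of how a locally served model might be wired in: vLLM exposes an OpenAI-compatible HTTP API (by default on port 8000 under `/v1`), and `dspy.LM` accepts `api_base` and `api_key` arguments. The helper name `local_lm_kwargs`, the placeholder API key, and the model name below are assumptions chosen for illustration, so adjust them to your own setup.

```python
def local_lm_kwargs(model_name, host="localhost", port=8000, api_key="local"):
    """Build keyword arguments for dspy.LM that target a local
    OpenAI-compatible server such as one started with `vllm serve`.

    The "openai/" prefix selects the OpenAI-style client; the api_key is
    a placeholder for servers that do not check it.
    """
    return {
        "model": f"openai/{model_name}",
        "api_base": f"http://{host}:{port}/v1",
        "api_key": api_key,
    }


# Example model name only; use whichever model your server is hosting.
kwargs = local_lm_kwargs("Qwen/Qwen2.5-7B-Instruct")
print(kwargs["api_base"])

# With DSPy installed and the server running, this could be used as:
#   import dspy
#   task_model = dspy.LM(**kwargs, max_tokens=2000, temperature=1)
```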
If you’d like to use a prompt model that is separate from the task model, llmsnap can help: it enables fast model switching via vLLM’s sleep/wake mode, so you can swap models in seconds.
How It Works
- Tree Building: The prompt is recursively split into segments (left, chunk, right)
- Node Processing: For each node in the tree:
  - Attempt to cut the chunk entirely
  - If cutting fails the quality check, attempt to rewrite the chunk
  - Keep the original chunk if both fail
- Multi-Candidate: Generate multiple compression variants with different random seeds
- Selection: Evaluate candidates on validation set and select the best
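The steps above can be illustrated with a small, self-contained sketch. This is not CUTIA's implementation: the names `Node`, `build_tree`, `compress`, and `best_of` are hypothetical, the split point (the middle segment) is an assumption, and the scoring and rewrite functions are toy stand-ins for the task model and prompt model. "Best" is assumed here to mean the highest validation score, with ties broken by shorter length.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """One tree node: a left subtree, the chunk itself, a right subtree."""
    left: Optional["Node"]
    chunk: str
    right: Optional["Node"]


def build_tree(segments: List[str]) -> Optional[Node]:
    """Recursively split segments into (left, chunk, right) around the middle."""
    if not segments:
        return None
    mid = len(segments) // 2
    return Node(build_tree(segments[:mid]), segments[mid], build_tree(segments[mid + 1:]))


def render(node: Optional[Node]) -> str:
    """Reassemble the prompt in reading order, skipping chunks that were cut."""
    if node is None:
        return ""
    parts = [render(node.left), node.chunk, render(node.right)]
    return " ".join(p for p in parts if p)


def all_nodes(node: Optional[Node]) -> List[Node]:
    """Collect every node of the tree."""
    if node is None:
        return []
    return [node] + all_nodes(node.left) + all_nodes(node.right)


def compress(segments, score: Callable[[str], float],
             rewrite: Callable[[str], str], threshold: float, seed: int) -> str:
    """Cut-then-rewrite pass over all nodes, visited in a seed-dependent order."""
    root = build_tree(segments)
    nodes = all_nodes(root)
    random.Random(seed).shuffle(nodes)
    for node in nodes:
        original = node.chunk
        node.chunk = ""                      # 1. try cutting the chunk entirely
        if score(render(root)) >= threshold:
            continue
        node.chunk = rewrite(original)       # 2. cutting failed: try a rewrite
        if score(render(root)) >= threshold:
            continue
        node.chunk = original                # 3. both failed: keep the original
    return render(root)


def best_of(segments, score, rewrite, threshold, seeds, val_score) -> str:
    """Generate one candidate per seed, then pick by validation score and length."""
    candidates = [compress(segments, score, rewrite, threshold, s) for s in seeds]
    return max(candidates, key=lambda c: (val_score(c), -len(c)))


# Toy stand-ins: the "task" only needs the words "Count" and "number".
toy_score = lambda p: 1.0 if "Count" in p and "number" in p else 0.0
toy_rewrite = lambda c: {"Answer with a number.": "Give a number."}.get(c, c)
prompt_segments = ["Count the letters.", "Please be very careful.",
                   "Answer with a number.", "Thank you so much!"]
print(best_of(prompt_segments, toy_score, toy_rewrite, 1.0, [0, 1, 2], toy_score))
# Count the letters. Give a number.
```

In a real setting each call to `score` runs the task model over examples, so the number of nodes and candidates directly drives the cost of a compression run.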
Examples
Strawberry Problem (Letter Counting)
Demonstrates prompt compression on a character counting task using the CharBench dataset.
See src/cutia/examples/README.md for details.
Development
Development Installation
For development with testing and linting tools:
# Clone the repository
git clone https://github.com/napmany/cutia.git
cd cutia
# Install with development dependencies
uv sync --extra dev
Running Tests
# Install development dependencies (if not already installed)
uv sync --extra dev
# Run tests
make test
Code Quality
The project uses Ruff for linting and formatting, and Pyright for type checking:
# Run all checks (linting, formatting, and type checking)
make check
Dependencies
Core
- No required dependencies for the base library
Install optional dependencies:
# For testing
uv sync --extra test
# For development (includes test dependencies)
uv sync --extra dev
Future Plans
- Framework-agnostic core implementation (not tied to DSPy)
- Additional adapters for other frameworks and platforms (LangChain, MLflow, etc.)
- Standalone Python API for direct use
- Enhanced chunking strategies