Tree-based prompt compression library for LLM applications, using a cut-then-transform strategy

CUTIA: Quality-Aware Prompt Compressor

A prompt optimizer that cuts token usage while maintaining quality.

Features

  • Tree-based Segmentation: Recursively splits prompts into segments for fine-grained optimization
  • Cut-then-Rewrite Strategy: Tries to cut redundant content first, and rewrites it only when the cut fails the quality check
  • Quality-Aware Compression: Maintains quality thresholds during compression
  • Multi-Candidate Generation: Generates multiple compression variants and chooses the best
  • DSPy Integration: First-class support for DSPy programs via the DSPy adapter

Installation

pip install cutia

Usage

DSPy Adapter

The DSPy adapter allows you to compress DSPy programs:

import dspy
from cutia.adapters.dspy_adapter import CUTIA

# Configure models
# prompt_model generates rewrite candidates
prompt_model = dspy.LM(
    model="openai/gpt-4o-mini",
    max_tokens=10000,
    temperature=1,
)
# task_model runs the task/program for scoring and validation
task_model = dspy.LM(
    model="openai/gpt-4.1-nano",
    max_tokens=2000,
    temperature=1,
)

# Define your metric
def your_metric(example, prediction, trace=None):
    return example.output == prediction.output

# Create optimizer
optimizer = CUTIA(
    prompt_model=prompt_model,
    task_model=task_model,
    metric=your_metric,
)

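# Compile your program. your_program, train_examples, and val_examples are
# placeholders for your own DSPy module and datasets.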
compressed_program = optimizer.compile(
    student=your_program,
    trainset=train_examples,
    valset=val_examples,
)
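
The metric in the snippet above compares outputs with strict equality, so a stray space or a difference in letter case counts as a failure. As a minimal sketch, assuming your examples and predictions expose an `output` attribute as in that snippet, a more tolerant metric might look like the following. The name `normalized_match` and the `SimpleNamespace` stand-ins are illustrative only and are not part of CUTIA or DSPy.

```python
from types import SimpleNamespace


def normalized_match(example, prediction, trace=None):
    """Return True when both outputs match after trimming and lowercasing."""
    expected = str(example.output).strip().lower()
    actual = str(prediction.output).strip().lower()
    return expected == actual


# Stand-in objects used only to exercise the metric without a model call.
example = SimpleNamespace(output="3")
prediction = SimpleNamespace(output=" 3\n")
print(normalized_match(example, prediction))
```

Whether normalization is appropriate depends on the task: for letter counting it is probably harmless, while for case-sensitive outputs the strict comparison may be the better choice.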

Local AI

If you’re running CUTIA (or other prompt optimizers) against locally hosted LLMs, vLLM is a solid option for serving models: it supports high-throughput inference and handles concurrent requests efficiently.

If you want the prompt model to differ from the task model, llmsnap can help: it uses vLLM's sleep/wake mode to switch between models in seconds.
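
As a rough sketch of how a locally served model could be wired in, the helper below builds keyword arguments for `dspy.LM` that point at an OpenAI-compatible endpoint. It assumes a vLLM server is listening on its default port 8000 and exposing the `/v1` path. The helper name `local_lm_kwargs`, the model name, and the API key value are placeholders, not something CUTIA provides.

```python
def local_lm_kwargs(model: str, host: str = "localhost", port: int = 8000) -> dict:
    """Build dspy.LM keyword arguments for an OpenAI-compatible local server."""
    return {
        # The "openai/" prefix selects the OpenAI-compatible client route.
        "model": f"openai/{model}",
        "api_base": f"http://{host}:{port}/v1",
        # Placeholder value; set a real key if your server enforces one.
        "api_key": "local",
    }


# "my-org/my-model" is a placeholder for whatever model your server hosts.
kwargs = local_lm_kwargs("my-org/my-model")
print(kwargs["api_base"])
```

With DSPy installed, `task_model = dspy.LM(**kwargs, max_tokens=2000)` would then take the place of the hosted model in the usage example.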

How It Works

  1. Tree Building: The prompt is recursively split into segments (left, chunk, right)
  2. Node Processing: For each node in the tree:
    • Attempt to cut the chunk entirely
    • If cutting fails the quality check, attempt to rewrite the chunk
    • Keep original if both fail
  3. Multi-Candidate: Generate multiple compression variants with different random seeds
  4. Selection: Evaluate candidates on validation set and select the best
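
The steps above can be sketched in a few lines of plain Python. This is an illustration of the general cut-then-rewrite idea, not CUTIA's actual implementation. The function names, the midpoint split, the length check on rewrites, and the "shortest passing candidate" selection rule are all assumptions made for the example. Here `score` and `rewrite` stand in for the task-model evaluation and the prompt-model rewrite.

```python
from typing import Callable, List, Optional


def compress(
    segments: List[str],
    score: Callable[[str], float],
    rewrite: Callable[[str], str],
    threshold: float,
) -> str:
    """Visit segments in tree order: try a cut, then a rewrite, else keep."""
    current: List[Optional[str]] = list(segments)

    def render() -> str:
        return " ".join(s for s in current if s)

    def visit(lo: int, hi: int) -> None:
        if lo >= hi:
            return
        mid = (lo + hi) // 2  # chunk = middle; left = [lo, mid); right = (mid, hi)
        original = current[mid]
        current[mid] = None  # step 1: try cutting the chunk entirely
        if score(render()) < threshold:
            candidate = rewrite(original)  # step 2: try a rewrite
            current[mid] = candidate
            if len(candidate) >= len(original) or score(render()) < threshold:
                current[mid] = original  # step 3: keep the original
        visit(lo, mid)
        visit(mid + 1, hi)

    visit(0, len(current))
    return render()


def select_best(candidates, original, val_score, threshold):
    """Assumed rule: the shortest candidate that still meets the threshold."""
    passing = [c for c in candidates if val_score(c) >= threshold]
    return min(passing, key=len) if passing else original


# Toy stand-ins: the "task" succeeds only if both key phrases survive.
def toy_score(text: str) -> float:
    return 1.0 if "Count" in text and "number" in text else 0.0


def toy_rewrite(segment: str) -> str:
    return segment.replace(" in the word", "")


prompt = [
    "Count the letter r in the word.",
    "Please be very careful and thorough.",
    "Answer with a single number.",
]
result = compress(prompt, toy_score, toy_rewrite, threshold=1.0)
print(result)
```

In this toy run the middle sentence is cut outright, the first is shortened by the rewrite, and the last is kept because neither cutting nor rewriting it passes the check. A multi-candidate run would call `compress` several times with different seeds for the rewrite step and pass the results to `select_best`.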

Examples

Strawberry Problem (Letter Counting)

Demonstrates prompt compression on a character counting task using the CharBench dataset.

See src/cutia/examples/README.md for details.

Development

Development Installation

For development with testing and linting tools:

# Clone the repository
git clone https://github.com/napmany/cutia.git
cd cutia

# Install with development dependencies
uv sync --extra dev

Running Tests

# Install development dependencies (if not already installed)
uv sync --extra dev

# Run tests
make test

Code Quality

The project uses Ruff for linting and formatting, and Pyright for type checking:

# Run all checks (linting, formatting, and type checking)
make check

Dependencies

Core

  • No required dependencies for the base library

Install optional dependencies:

# For testing
uv sync --extra test

# For development (includes test dependencies)
uv sync --extra dev

Future Plans

  • Framework-agnostic core implementation (not tied to DSPy)
  • Additional adapters for other frameworks and platforms (LangChain, MLflow, etc.)
  • Standalone Python API for direct use
  • Enhanced chunking strategies
