Self-Adaptive Context Pruning for Coding Agents

Installation

Basic Installation

Install from PyPI:

pip install swe-pruner

Flash Attention Setup

SwePruner requires flash-attn, which must be installed separately because the correct build depends on your system configuration. Install it using one of the following methods:

  1. Pre-built wheel (recommended): Download the appropriate wheel file for your system from the flash-attention releases and install it:
pip install flash_attn-<version>-<platform>.whl
  2. From source: If no pre-built wheel is available, you can build from source:
pip install flash-attn --no-build-isolation

Note: Flash attention requires CUDA and specific PyTorch versions. Make sure your environment is compatible.
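
To confirm that the environment meets these constraints before loading the model, a quick check along the following lines may help. This is a minimal sketch and not part of SwePruner: the helper name check_flash_attn_environment is hypothetical, and it only reports whether the torch and flash_attn packages can be found and whether PyTorch sees a CUDA device. It does not verify version compatibility between them.

```python
import importlib.util


def check_flash_attn_environment() -> dict:
    """Report whether the pieces flash-attn depends on are present.

    Hypothetical helper for illustration; not part of SwePruner.
    """
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "cuda_available": False,
        "torch_version": None,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also runs without PyTorch
        except Exception:
            # A broken installation is treated the same as a missing one.
            report["torch_installed"] = False
        else:
            report["torch_version"] = str(torch.__version__)
            report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, value in check_flash_attn_environment().items():
        print(f"{name}: {value}")
```

If flash_attn_installed or cuda_available is False, the installation steps above likely need to be revisited before starting the server.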

Model Download

The pre-trained model files are not included in the PyPI package. You need to download them separately:

  1. From HuggingFace Hub (if available):
# Using huggingface-hub
huggingface-cli download <model-repo-id> --local-dir ./model
  2. Manual download: Download the model files and place them in a directory (e.g., ./model) with the following structure:
model/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── ... (other tokenizer files)
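
After downloading, the layout above can be verified with a short script. The sketch below is an illustration rather than part of the package: the function name missing_model_files is hypothetical, and it checks only the four files named explicitly in the tree, since the remaining tokenizer files are not listed.

```python
from pathlib import Path

# File names taken from the directory layout above. The "other tokenizer
# files" are not listed in the source, so they are not checked here.
REQUIRED_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./model")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```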

Usage

Command Line Interface

Start the FastAPI server using the CLI:

swe-pruner serve --model-path ./model --port 8000

Options:

  • --host / -h: Host to bind the server to (default: 0.0.0.0)
  • --port / -p: Port to run the server on (default: 8000)
  • --model-path / -m: Path to model directory (overrides SWEPRUNER_MODEL_PATH environment variable)

You can also set the model path using an environment variable:

export SWEPRUNER_MODEL_PATH=./model
swe-pruner serve
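
The precedence described above (the --model-path option overrides SWEPRUNER_MODEL_PATH) can be pictured with the following sketch. It is not the CLI's actual implementation: resolve_model_path is a hypothetical name, and raising an error when neither source is set is an assumption, since the source does not say what the CLI does in that case.

```python
import os

ENV_VAR = "SWEPRUNER_MODEL_PATH"


def resolve_model_path(cli_value=None, environ=None):
    """Pick the model path: explicit CLI value first, then the environment.

    Hypothetical illustration of the documented override rule.
    """
    env = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    env_value = env.get(ENV_VAR)
    if env_value:
        return env_value
    # Assumption: with no path from either source there is nothing to load.
    raise ValueError(f"No model path given; pass --model-path or set {ENV_VAR}")


if __name__ == "__main__":
    # The explicit value wins even when the environment variable is set.
    print(resolve_model_path("./cli-model", {ENV_VAR: "./env-model"}))
    print(resolve_model_path(None, {ENV_VAR: "./env-model"}))
```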

Python API

Basic Usage

from hf.prune_wrapper import SwePrunerForCodePruning, PruneRequest

# Load the model
model = SwePrunerForCodePruning.from_pretrained("./model")

# Create a prune request
request = PruneRequest(
    query="Find functions that handle user authentication",
    code="""
def login(username, password):
    # Authentication logic
    if verify_credentials(username, password):
        return create_session(username)
    return None

def logout(session_id):
    # Logout logic
    invalidate_session(session_id)
    """,
    threshold=0.5,
    always_keep_first_frags=False,
    chunk_overlap_tokens=50
)

# Prune the code
response = model.prune(request)

print(f"Relevance score: {response.score}")
print(f"Pruned code:\n{response.pruned_code}")
print(f"Token count: {response.origin_token_cnt} -> {response.left_token_cnt}")

API Response

The PruneResponse object contains:

  • score: Document-level relevance score (float)
  • pruned_code: Pruned code string with filtered sections marked
  • token_scores: List of [token, score] pairs
  • kept_frags: List of kept line numbers
  • origin_token_cnt: Original token count
  • left_token_cnt: Remaining token count after pruning
  • model_input_token_cnt: Total tokens sent to the model
  • error_msg: Error message if any (optional)
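
As one way to consume these fields, the sketch below turns a response into a one-line summary of how much was pruned. PruneResponseSketch is a stand-in that mirrors the documented field names so the example runs without the model; it is not the library's class, and its default values are assumptions. The summarize helper is likewise hypothetical and reads only attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PruneResponseSketch:
    """Stand-in mirroring the documented fields; not the library's class."""
    score: float
    pruned_code: str
    token_scores: list = field(default_factory=list)
    kept_frags: list = field(default_factory=list)
    origin_token_cnt: int = 0
    left_token_cnt: int = 0
    model_input_token_cnt: int = 0
    error_msg: Optional[str] = None


def summarize(response) -> str:
    """Describe a prune result using only the documented attributes."""
    if response.error_msg:
        return f"pruning failed: {response.error_msg}"
    if response.origin_token_cnt == 0:
        return "nothing to prune (0 tokens)"
    removed = response.origin_token_cnt - response.left_token_cnt
    pct = 100.0 * removed / response.origin_token_cnt
    return (
        f"kept {response.left_token_cnt}/{response.origin_token_cnt} tokens "
        f"({pct:.1f}% removed, score {response.score:.2f})"
    )


if __name__ == "__main__":
    demo = PruneResponseSketch(
        score=0.8, pruned_code="...", origin_token_cnt=200, left_token_cnt=50
    )
    print(summarize(demo))
```

Checking error_msg first keeps the token arithmetic from running on a response that may not carry meaningful counts.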

FastAPI Server

Once the server is running, you can interact with it via HTTP:

Health Check

curl http://localhost:8000/health

Prune Code

curl -X POST http://localhost:8000/prune \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find authentication functions",
    "code": "def login(): ...",
    "threshold": 0.5
  }'
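
The same request can be sent from Python using only the standard library. This is a sketch under two assumptions: the helper names build_prune_request and prune_via_http are hypothetical, and the server is assumed to reply with a JSON body (the source does not show the response format). The endpoint and payload keys are the ones used in the curl example.

```python
import json
import urllib.request


def build_prune_request(base_url, query, code, threshold=0.5):
    """Build the POST request for the /prune endpoint shown above."""
    payload = {"query": query, "code": code, "threshold": threshold}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/prune",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def prune_via_http(base_url, query, code, threshold=0.5, timeout=30.0):
    """Send the request and decode the reply.

    Assumption: the server answers with a JSON document.
    """
    request = build_prune_request(base_url, query, code, threshold)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server, e.g. started with `swe-pruner serve`.
    result = prune_via_http(
        "http://localhost:8000", "Find authentication functions", "def login(): ..."
    )
    print(result)
```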

Configuration

Model Parameters

SwePrunerConfig exposes the following model configuration options:

  • backbone_model_name_or_path: Backbone model identifier
  • bottleneck: Bottleneck dimension (default: 256)
  • dropout: Dropout rate (default: 0.4)
  • num_fusion_layers: Number of fusion layers (default: 1)
  • num_heads: Number of attention heads (default: 8)
  • use_multi_layer_fusion: Whether to use multi-layer fusion (default: True)
  • compression_head_type: Type of compression head ("ffn", "simple", or "crf")

Pruning Parameters

  • threshold: Score threshold for keeping tokens (default: 0.5)
  • always_keep_first_frags: Whether to always keep the first fragments of the code (default: False)
  • chunk_overlap_tokens: Overlap tokens between chunks for long code (default: 50)
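
To build intuition for threshold and chunk_overlap_tokens, the sketch below shows the general idea of overlapping chunks and score-based filtering on plain Python lists. It is not SwePruner's algorithm: both function names are hypothetical, the chunk size is an illustrative parameter that the source does not mention, and keeping tokens whose score is greater than or equal to the threshold (rather than strictly greater) is an assumption.

```python
def split_with_overlap(tokens, chunk_size, overlap):
    """Split tokens into windows that share `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def keep_above_threshold(token_scores, threshold=0.5):
    """Keep tokens whose score meets the threshold.

    `token_scores` uses the [token, score] pair layout of the response.
    Assumption: the comparison is inclusive (score >= threshold).
    """
    return [token for token, score in token_scores if score >= threshold]


if __name__ == "__main__":
    print(split_with_overlap(list(range(10)), chunk_size=4, overlap=2))
    print(keep_above_threshold([["def", 0.9], ["#", 0.1], ["return", 0.5]]))
```

A larger overlap gives each chunk more surrounding context at the cost of scoring more tokens in total, while a higher threshold prunes more aggressively.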

Requirements

  • Python >= 3.12
  • PyTorch >= 2.8.0
  • Transformers >= 4.57.1
  • CUDA (for GPU acceleration)
  • Flash Attention 2

See pyproject.toml for the complete list of dependencies.

License

MIT License
