Self-Adaptive Context Pruning for Coding Agents
Installation
Basic Installation
Install from PyPI:
pip install swe-pruner
Flash Attention Setup
SwePruner requires flash-attn, which must be installed separately because the right build depends on your system configuration. Install it using one of the following methods:
- Pre-built wheel (recommended): Download the appropriate wheel file for your system from the flash-attention releases and install it:
pip install flash_attn-<version>-<platform>.whl
- From source: If no pre-built wheel is available, you can build from source:
pip install flash-attn --no-build-isolation
Note: Flash attention requires CUDA and specific PyTorch versions. Make sure your environment is compatible.
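Before installing a wheel, it can help to probe the environment for the pieces the note above mentions. The following is a minimal sketch and not part of SwePruner: the helper name check_flash_attn_env is hypothetical, and it only reports what is importable; it does not verify that the installed versions are compatible with each other.

```python
import importlib.util


def check_flash_attn_env() -> dict:
    """Report whether torch, CUDA, and flash_attn are usable here.

    Hypothetical helper for illustration; not part of swe-pruner.
    """
    report = {"torch": False, "cuda": False, "flash_attn": False}

    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            report["torch"] = True
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch install is treated like a missing CUDA setup.
            pass

    report["flash_attn"] = importlib.util.find_spec("flash_attn") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_flash_attn_env().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If cuda is reported as missing, the source build of flash-attn is unlikely to succeed, so fixing the PyTorch and CUDA setup first is usually the better order.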
Model Download
The pre-trained model files are not included in the PyPI package. You need to download them separately:
- From HuggingFace Hub (if available):
# Using huggingface-hub
huggingface-cli download <model-repo-id> --local-dir ./model
- Manual download: Download the model files and place them in a directory (e.g., ./model) with the following structure:
model/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── ... (other tokenizer files)
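A quick check that the download produced the layout above can catch a missing file before the model is loaded. This is a sketch under the assumption that the four named files are the minimum required; the helper missing_model_files is hypothetical, and the additional tokenizer files implied by "..." are not checked.

```python
from pathlib import Path

# File names taken from the directory layout shown above.
EXPECTED_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_model_files(model_dir: str) -> list:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./model")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```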
Usage
Command Line Interface
Start the FastAPI server using the CLI:
swe-pruner serve --model-path ./model --port 8000
Options:
- --host/-h: Host to bind the server to (default: 0.0.0.0)
- --port/-p: Port to run the server on (default: 8000)
- --model-path/-m: Path to model directory (overrides the SWEPRUNER_MODEL_PATH environment variable)
You can also set the model path using an environment variable:
export SWEPRUNER_MODEL_PATH=./model
swe-pruner serve
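The precedence described above (the --model-path option wins over SWEPRUNER_MODEL_PATH) can be sketched in a few lines. This is an illustration only, not the CLI's actual code: resolve_model_path is a hypothetical helper, and raising an error when neither value is set is an assumption about the desired behavior.

```python
import os
from typing import Optional


def resolve_model_path(cli_value: Optional[str] = None) -> str:
    """Pick the model directory: CLI value first, then the environment.

    Hypothetical helper; mirrors the precedence described above.
    """
    if cli_value:
        return cli_value
    env_value = os.environ.get("SWEPRUNER_MODEL_PATH")
    if env_value:
        return env_value
    # Assumption: with no path from either source, fail loudly.
    raise ValueError("Set --model-path or SWEPRUNER_MODEL_PATH")
```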
Python API
Basic Usage
from hf.prune_wrapper import SwePrunerForCodePruning, PruneRequest

# Load the model
model = SwePrunerForCodePruning.from_pretrained("./model")

# Create a prune request
request = PruneRequest(
    query="Find functions that handle user authentication",
    code="""
def login(username, password):
    # Authentication logic
    if verify_credentials(username, password):
        return create_session(username)
    return None

def logout(session_id):
    # Logout logic
    invalidate_session(session_id)
""",
    threshold=0.5,
    always_keep_first_frags=False,
    chunk_overlap_tokens=50
)

# Prune the code
response = model.prune(request)
print(f"Relevance score: {response.score}")
print(f"Pruned code:\n{response.pruned_code}")
print(f"Token count: {response.origin_token_cnt} -> {response.left_token_cnt}")
API Response
The PruneResponse object contains:
- score: Document-level relevance score (float)
- pruned_code: Pruned code string with filtered sections marked
- token_scores: List of [token, score] pairs
- kept_frags: List of kept line numbers
- origin_token_cnt: Original token count
- left_token_cnt: Remaining token count after pruning
- model_input_token_cnt: Total tokens sent to the model
- error_msg: Error message if any (optional)
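The token counts in the response make it easy to report how much a prune call actually removed. The sketch below reads only the field names listed above; compression_summary is a hypothetical helper, and the SimpleNamespace object is a stand-in with made-up numbers so the example runs without loading the model.

```python
from types import SimpleNamespace


def compression_summary(response) -> dict:
    """Summarize how much a prune call removed.

    Reads only origin_token_cnt, left_token_cnt, and error_msg.
    """
    if getattr(response, "error_msg", None):
        raise RuntimeError(f"Pruning failed: {response.error_msg}")

    original = response.origin_token_cnt
    remaining = response.left_token_cnt
    removed = original - remaining
    # Guard against empty input to avoid dividing by zero.
    fraction = removed / original if original else 0.0
    return {"removed_tokens": removed, "removed_fraction": fraction}


# Stand-in object; a real PruneResponse would come from model.prune(request).
fake_response = SimpleNamespace(origin_token_cnt=200, left_token_cnt=50, error_msg=None)
print(compression_summary(fake_response))
```

Checking error_msg first avoids reporting a misleading ratio for a call that failed.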
FastAPI Server
Once the server is running, you can interact with it via HTTP:
Health Check
curl http://localhost:8000/health
Prune Code
curl -X POST http://localhost:8000/prune \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find authentication functions",
    "code": "def login(): ...",
    "threshold": 0.5
  }'
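The same request can be sent from Python with only the standard library. This is a sketch, not a client shipped with the package: build_prune_request and prune_over_http are hypothetical names, and the assumption that the server answers with a JSON body is not confirmed by the text above.

```python
import json
import urllib.request


def build_prune_request(base_url: str, query: str, code: str,
                        threshold: float = 0.5) -> urllib.request.Request:
    """Build a POST request for the /prune endpoint shown above."""
    payload = {"query": query, "code": code, "threshold": threshold}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/prune",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def prune_over_http(base_url: str, query: str, code: str,
                    threshold: float = 0.5) -> dict:
    """Send the request and decode the reply (needs a running server).

    Assumption: the reply body is JSON.
    """
    request = build_prune_request(base_url, query, code, threshold)
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server from the CLI section running, prune_over_http("http://localhost:8000", "Find authentication functions", "def login(): ...") sends the same payload as the curl command.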
Configuration
Model Parameters
The model supports various configuration options through SwePrunerConfig:
- backbone_model_name_or_path: Backbone model identifier
- bottleneck: Bottleneck dimension (default: 256)
- dropout: Dropout rate (default: 0.4)
- num_fusion_layers: Number of fusion layers (default: 1)
- num_heads: Number of attention heads (default: 8)
- use_multi_layer_fusion: Whether to use multi-layer fusion (default: True)
- compression_head_type: Type of compression head ("ffn", "simple", or "crf")
Pruning Parameters
- threshold: Score threshold for keeping tokens (default: 0.5)
- always_keep_first_frags: Always keep the first N fragments (default: False)
- chunk_overlap_tokens: Overlap tokens between chunks for long code (default: 50)
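To make threshold and chunk_overlap_tokens more concrete, the sketch below shows one plausible reading of each. It is not SwePruner's implementation: both function names and the chunk_size parameter are hypothetical, the choice of >= for the threshold comparison is an assumption, and the library may score and keep whole lines rather than individual tokens.

```python
def keep_above_threshold(token_scores, threshold=0.5):
    """Keep tokens whose score reaches the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [token for token, score in token_scores if score >= threshold]


def split_with_overlap(tokens, chunk_size, overlap=50):
    """Split tokens into windows of chunk_size sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk
        # made only of already-covered tokens is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy data: ten integer "tokens", windows of four, two shared per boundary.
for chunk in split_with_overlap(list(range(10)), chunk_size=4, overlap=2):
    print(chunk)
```

The overlap means code near a chunk boundary appears in two windows with context on both sides, at the cost of sending some tokens to the model twice.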
Requirements
- Python >= 3.12
- PyTorch >= 2.8.0
- Transformers >= 4.57.1
- CUDA (for GPU acceleration)
- Flash Attention 2
See pyproject.toml for the complete list of dependencies.
License
MIT License
File details
Details for the file swe_pruner-0.1.0.tar.gz.
File metadata
- Download URL: swe_pruner-0.1.0.tar.gz
- Upload date:
- Size: 54.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b9a941ca24fd71db4582db04619fe825aa2d3448d4b562199eab500a50def75 |
| MD5 | 2dea96618113ea5d5c5a56fb27b0b074 |
| BLAKE2b-256 | 22432033f29fe3180a705afc791ddc21d29f0f676974f101ea1e6a05433cfe97 |
File details
Details for the file swe_pruner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: swe_pruner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 078044bd116c33cc4590c7ca4fc5904d6eb7fba9cfdc8baa5f13bce205d9c47c |
| MD5 | 63be45c555fe093eb34f3e0fcdf90f80 |
| BLAKE2b-256 | a314cfbb32c24047ff51def970ad931c6d6b9433a3814eb06027a8000d9fbea8 |