Self-Adaptive Context Pruning for Coding Agents

Installation

Basic Installation

Install from PyPI:

pip install swe-pruner

Flash Attention Setup

SwePruner requires flash-attn, which must be installed separately because the correct build depends on your system configuration. Install it using one of the following methods:

  1. Pre-built wheel (recommended): Download the appropriate wheel file for your system from the flash-attention releases and install it:
pip install flash_attn-<version>-<platform>.whl
  2. From source: If no pre-built wheel is available, build from source:
pip install flash-attn --no-build-isolation

Note: Flash Attention requires CUDA and a compatible PyTorch version. Make sure your environment meets both requirements before installing.
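
Before installing flash-attn, it can help to see what the current interpreter already has. The following is a minimal sketch, not part of swe-pruner: the helper name check_environment is hypothetical, and it only reports installed versions and CUDA visibility without judging whether they are compatible.

```python
import importlib.metadata
import importlib.util
import sys


def check_environment() -> dict:
    """Report which prerequisites are visible to the current interpreter."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}

    # Distribution names as published on PyPI; None means "not installed".
    for dist in ("torch", "transformers", "flash-attn"):
        try:
            report[dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            report[dist] = None

    # Only ask about CUDA when PyTorch can actually be imported.
    report["cuda_available"] = None
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            report["cuda_available"] = torch.cuda.is_available()
        except ImportError:
            pass  # torch is present but broken; leave the value as None

    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value if value is not None else 'not found'}")
```

Compare the reported versions against the Requirements section and against the wheel file name you plan to install.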

Model Download

The pre-trained model files are not included in the PyPI package. You need to download them separately:

  1. From HuggingFace Hub (if available):
# Using huggingface-hub
huggingface-cli download ayanami-kitasan/code-pruner --local-dir ./model
  2. Manual download: Download the model files and place them in a directory (e.g., ./model) with the following structure:
model/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
└── ... (other tokenizer files)
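
A quick way to confirm the download landed in the expected layout is to check for the files named above. This is a small sketch using only the standard library; the helper name missing_model_files is hypothetical, and it checks only the four files listed explicitly, not the other tokenizer files.

```python
from pathlib import Path

# Files named explicitly in the directory layout above.
EXPECTED_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./model")
    print("Model directory looks complete" if not missing else f"Missing: {missing}")
```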

Usage

Command Line Interface

Start the FastAPI server using the CLI:

swe-pruner --model-path ./model --port 8000

Options:

  • --host / -h: Host to bind the server to (default: 0.0.0.0)
  • --port / -p: Port to run the server on (default: 8000)
  • --model-path / -m: Path to model directory (overrides SWEPRUNER_MODEL_PATH environment variable)

You can also set the model path using an environment variable:

export SWEPRUNER_MODEL_PATH=./model
swe-pruner
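
The precedence described above (the --model-path flag overrides SWEPRUNER_MODEL_PATH) can be sketched as follows. This illustrates the rule and is not the package's actual implementation; the function name resolve_model_path is hypothetical, and raising an error when neither value is set is an assumption, since the behaviour in that case is not documented here.

```python
import os
from typing import Mapping, Optional


def resolve_model_path(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the model path: the CLI flag wins, then the environment variable."""
    if cli_value:
        return cli_value
    env_value = env.get("SWEPRUNER_MODEL_PATH")
    if env_value:
        return env_value
    # Assumption: treat "nothing configured" as an error.
    raise ValueError("No model path: pass --model-path or set SWEPRUNER_MODEL_PATH")
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.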

Python API

Basic Usage

from swe_pruner.prune_wrapper import SwePrunerForCodePruning, PruneRequest

# Load the model
model = SwePrunerForCodePruning.from_pretrained("./model")

# Create a prune request
request = PruneRequest(
    query="Find functions that handle user authentication",
    code="""
def login(username, password):
    # Authentication logic
    if verify_credentials(username, password):
        return create_session(username)
    return None

def logout(session_id):
    # Logout logic
    invalidate_session(session_id)
    """,
    threshold=0.5,
    always_keep_first_frags=False,
    chunk_overlap_tokens=50
)

# Prune the code
response = model.prune(request)

print(f"Relevance score: {response.score}")
print(f"Pruned code:\n{response.pruned_code}")
print(f"Token count: {response.origin_token_cnt} -> {response.left_token_cnt}")

API Response

The PruneResponse object contains:

  • score: Document-level relevance score (float)
  • pruned_code: Pruned code string with filtered sections marked
  • token_scores: List of [token, score] pairs
  • kept_frags: List of kept line numbers
  • origin_token_cnt: Original token count
  • left_token_cnt: Remaining token count after pruning
  • model_input_token_cnt: Total tokens sent to the model
  • error_msg: Error message if any (optional)
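
The token counts and token_scores fields lend themselves to simple post-processing. The helpers below are a sketch that works on plain values rather than on a PruneResponse object; the function names are hypothetical, and treating a score equal to the threshold as "kept" is an assumption, not documented behaviour.

```python
def pruned_fraction(origin_token_cnt: int, left_token_cnt: int) -> float:
    """Fraction of tokens removed by pruning (0.0 means nothing was pruned)."""
    if origin_token_cnt <= 0:
        return 0.0
    return 1.0 - left_token_cnt / origin_token_cnt


def tokens_at_or_above(token_scores, threshold: float = 0.5) -> list:
    """Return the tokens whose score is at least the threshold.

    token_scores is a list of [token, score] pairs, as described above.
    """
    return [token for token, score in token_scores if score >= threshold]
```

With a response in hand, these could be called as pruned_fraction(response.origin_token_cnt, response.left_token_cnt) and tokens_at_or_above(response.token_scores).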

FastAPI Server

Once the server is running, you can interact with it via HTTP:

Health Check

curl http://localhost:8000/health

Prune Code

curl -X POST http://localhost:8000/prune \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find authentication functions",
    "code": "def login(): ...",
    "threshold": 0.5
  }'
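
The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above; the helper names are hypothetical, and the assumption that the server replies with a JSON object is based on the PruneResponse fields rather than on a documented HTTP schema.

```python
import json
import urllib.request


def build_prune_request(
    base_url: str, query: str, code: str, threshold: float = 0.5
) -> urllib.request.Request:
    """Build a POST request for the /prune endpoint."""
    payload = json.dumps(
        {"query": query, "code": code, "threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/prune",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def prune_over_http(
    base_url: str, query: str, code: str, threshold: float = 0.5, timeout: float = 60.0
) -> dict:
    """Send the request and decode the reply (assumed to be a JSON object)."""
    request = build_prune_request(base_url, query, code, threshold)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    result = prune_over_http(
        "http://localhost:8000", "Find authentication functions", "def login(): ..."
    )
    print(result)
```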

Configuration

Model Parameters

SwePrunerConfig exposes the following model configuration options:

  • backbone_model_name_or_path: Backbone model identifier
  • bottleneck: Bottleneck dimension (default: 256)
  • dropout: Dropout rate (default: 0.4)
  • num_fusion_layers: Number of fusion layers (default: 1)
  • num_heads: Number of attention heads (default: 8)
  • use_multi_layer_fusion: Whether to use multi-layer fusion (default: True)
  • compression_head_type: Type of compression head ("ffn", "simple", or "crf")

Pruning Parameters

  • threshold: Score threshold for keeping tokens (default: 0.5)
  • always_keep_first_frags: Whether to always keep the first fragments of the code (default: False)
  • chunk_overlap_tokens: Overlap tokens between chunks for long code (default: 50)
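
To make chunk_overlap_tokens concrete, the sketch below splits a token sequence into windows where each window repeats the last few tokens of the previous one. It illustrates overlapping chunks in general and is not the library's chunking code; the function name and the max_tokens window size are hypothetical.

```python
def chunk_with_overlap(tokens: list, max_tokens: int, overlap: int = 50) -> list[list]:
    """Split tokens into windows of up to max_tokens that share `overlap` tokens."""
    if overlap < 0 or max_tokens <= overlap:
        raise ValueError("max_tokens must be larger than a non-negative overlap")
    if not tokens:
        return []

    step = max_tokens - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window reached the end of the sequence
        start += step
    return chunks
```

The overlap gives tokens near a chunk boundary some surrounding context in at least one window, at the cost of processing the shared tokens twice.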

Requirements

  • Python >= 3.12
  • PyTorch >= 2.8.0
  • Transformers >= 4.57.1
  • CUDA (for GPU acceleration)
  • Flash Attention 2

See pyproject.toml for the complete list of dependencies.

License

MIT License
