AIP Proxy — Token compression proxy for LLM APIs. Reduce costs 15-40% on any AI IDE (Antigravity, Cursor, VS Code, etc.)
AIP Proxy
Token compression proxy for LLM APIs. Reduce your AI coding costs by 15-40% without losing quality.
AIP Proxy sits between your AI IDE (Antigravity, Cursor, VS Code, etc.) and the LLM API, compressing prompts on the fly before they reach the model.
How it works
Your IDE ──> AIP Proxy (localhost:8090) ──> OpenAI / Gemini / Claude API
│
├─ Whitespace normalization
├─ Code comment removal
├─ Block deduplication
└─ Pattern abbreviation
Four compression passes, enabled cumulatively by level:
| Level | Passes | Typical savings |
|---|---|---|
| 0 | None (passthrough) | 0% |
| 1 | Whitespace normalization | 5-10% |
| 2 | + Code compression + deduplication | 15-25% |
| 3 | + Pattern abbreviation | 25-40% |
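The cumulative levels in the table can be pictured as a list of passes gated by a minimum level. The following is only a minimal sketch of that idea, not AIP Proxy's actual implementation: the function names, the comment-stripping rule, and the phrase list are all hypothetical, and level 2 here covers comment removal only (deduplication is left out).

```python
import re

def normalize_whitespace(text: str) -> str:
    """Level 1: strip trailing spaces and collapse runs of blank lines."""
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    return re.sub(r"\n{3,}", "\n\n", text)

def strip_line_comments(text: str) -> str:
    """Level 2 (simplified): drop lines that contain only a '#' comment.

    Hypothetical rule for illustration; it would also drop Markdown headings.
    """
    kept = [line for line in text.split("\n") if not line.lstrip().startswith("#")]
    return "\n".join(kept)

def abbreviate_patterns(text: str) -> str:
    """Level 3 (simplified): shorten a few verbose phrases (hypothetical list)."""
    replacements = {"in order to": "to", "due to the fact that": "because"}
    for long_form, short_form in replacements.items():
        text = text.replace(long_form, short_form)
    return text

# Each pass runs when the requested level is at least its minimum level,
# so higher levels include everything the lower levels do.
PASSES_BY_LEVEL = [
    (1, normalize_whitespace),
    (2, strip_line_comments),
    (3, abbreviate_patterns),
]

def compress(text: str, level: int = 2) -> str:
    for min_level, compression_pass in PASSES_BY_LEVEL:
        if level >= min_level:
            text = compression_pass(text)
    return text

sample = "x = 1   \n\n\n\n# set y\ny = 2\nDo this in order to test."
print(compress(sample, level=0) == sample)  # level 0 is a passthrough
print(compress(sample, level=3))
```

Keeping the passes in an ordered list makes the "+" rows of the table explicit: raising the level only ever adds work on top of the cheaper passes.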
Install
pip install aip-proxy
Quick start
# Start proxy targeting OpenAI
aip-proxy start --target https://api.openai.com/v1 --port 8090
# Or targeting Google Gemini
aip-proxy start --target https://generativelanguage.googleapis.com --port 8090
# Or another provider's API, such as Anthropic
aip-proxy start --target https://api.anthropic.com --port 8090
Then change your IDE's API endpoint to http://localhost:8090/v1.
Usage with Antigravity
- Install: pip install aip-proxy
- Start: aip-proxy start --target https://generativelanguage.googleapis.com --port 8090
- In Antigravity settings, set API endpoint to http://localhost:8090
- Done. You'll see savings in the proxy stats.
Usage with Cursor / VS Code
- Install: pip install aip-proxy
- Start: aip-proxy start --target https://api.openai.com/v1 --port 8090
- In your IDE settings, change the API base URL to http://localhost:8090/v1
- Keep your API key as usual; the proxy forwards it transparently.
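For clients other than an IDE, the same change amounts to swapping the base URL while leaving the API key header untouched. The sketch below builds (but does not send) such a request with the standard library. It assumes the target is OpenAI's chat completions endpoint and that the proxy maps the request path through unchanged; the helper name and the model placeholder are illustrative only.

```python
import json
from urllib.request import Request

PROXY_BASE_URL = "http://localhost:8090/v1"  # proxy address from the steps above

def build_chat_request(api_key: str, prompt: str, model: str = "your-model-name") -> Request:
    """Build a chat completion request aimed at the local proxy.

    Assumption: the upstream target speaks the OpenAI chat completions format.
    """
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return Request(
        PROXY_BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # same key you would send upstream
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the proxy running, passing the returned object to `urllib.request.urlopen` would send it; only the host and port differ from a direct call to the provider.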
Options
aip-proxy start --help
Options:
--target, -t Target API URL (required)
--port, -p Port to listen on (default: 8090)
--host Host to bind (default: 127.0.0.1)
--level, -l Compression: 0=off, 1=light, 2=balanced, 3=aggressive (default: 2)
--no-cache Disable response caching
--cache-ttl Cache TTL in seconds (default: 300)
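The options above do not say how the response cache is keyed or stored, so the following is only a sketch of what a time-limited cache with a 300-second default could look like. The class name, the request-hash key, and the injectable clock are assumptions made for illustration, not the proxy's internals.

```python
import hashlib
import json
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._entries = {}    # key -> (expiry_time, value)

    @staticmethod
    def key_for(request_body: dict) -> str:
        # Serialize with sorted keys so equal requests hash identically.
        canonical = json.dumps(request_body, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key: str, value) -> None:
        self._entries[key] = (self._clock() + self.ttl, value)
```

Under this model, `--no-cache` would correspond to skipping `get` and `put` entirely, and `--cache-ttl` to the `ttl` argument.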
Endpoints
| Endpoint | Description |
|---|---|
| GET /health | Proxy status and basic stats |
| GET /stats | Detailed compression and cache statistics |
| * /{path} | Proxied to target API |
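The two status endpoints can be polled from a script. This sketch uses only the standard library and assumes the proxy runs on the default host and port and that both endpoints return JSON; the README does not document the response format, and the helper names are illustrative.

```python
import json
from urllib.request import urlopen

def endpoint_url(base: str, path: str) -> str:
    """Join the proxy base URL and an endpoint path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")

def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def show_proxy_status(base: str = "http://localhost:8090") -> None:
    """Print the payloads of /health and /stats for a running proxy."""
    for path in ("/health", "/stats"):
        print(path, fetch_json(endpoint_url(base, path)))
```

Calling `show_proxy_status()` while the proxy is running prints both payloads; it raises a connection error if nothing is listening on that port.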
Python API
from aip_proxy import TokenCompressor
tc = TokenCompressor(level=2)
messages = [
{"role": "user", "content": "your long prompt here..."}
]
compressed = tc.compress_messages(messages)
print(tc.get_savings())
# {'original_chars': 1500, 'compressed_chars': 1100, 'saved_chars': 400, 'savings_pct': 26.7, 'calls': 1}
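The figures in that sample output are related by simple arithmetic, reproduced below as a check. The function name is hypothetical and this is not the library's code. Note that the counters are in characters, so the percentage is a proxy for token savings rather than an exact token count.

```python
def summarize_savings(original_chars: int, compressed_chars: int, calls: int = 1) -> dict:
    """Recompute the fields shown in the sample output from two character counts."""
    saved = original_chars - compressed_chars
    # Guard against division by zero when nothing has been compressed yet.
    pct = round(100 * saved / original_chars, 1) if original_chars else 0.0
    return {
        "original_chars": original_chars,
        "compressed_chars": compressed_chars,
        "saved_chars": saved,
        "savings_pct": pct,
        "calls": calls,
    }

print(summarize_savings(1500, 1100))
# {'original_chars': 1500, 'compressed_chars': 1100, 'saved_chars': 400, 'savings_pct': 26.7, 'calls': 1}
```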
How does it save money?
LLM APIs charge per token. A typical coding session sends thousands of tokens of context, and much of that context is:
- Redundant whitespace and blank lines
- Comments in code blocks (the model doesn't need them)
- Repeated code blocks across messages
- Verbose filler phrases
AIP Proxy removes this noise while preserving the semantic content the model needs.
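Of the kinds of noise listed above, repeated code blocks are the least obvious to remove, so here is one way the idea could work. It is a minimal sketch, not AIP Proxy's algorithm: it assumes blocks are separated by blank lines, and the hashing scheme, the `min_chars` threshold, and the marker text are all invented for illustration.

```python
import hashlib

def dedupe_blocks(messages, min_chars=40):
    """Replace blocks already seen in the conversation with a short marker.

    Blocks are paragraphs separated by blank lines. Blocks shorter than
    `min_chars` are left alone, since a marker would not save anything.
    """
    seen = {}    # block digest -> index of the message where it first appeared
    result = []
    for index, message in enumerate(messages):
        kept = []
        for block in message["content"].split("\n\n"):
            digest = hashlib.sha256(block.strip().encode("utf-8")).hexdigest()
            if len(block) >= min_chars and digest in seen:
                kept.append(f"[same block as in message {seen[digest]}]")
            else:
                seen.setdefault(digest, index)
                kept.append(block)
        # Build a new dict so the caller's messages are not mutated.
        result.append({**message, "content": "\n\n".join(kept)})
    return result
```

A real implementation would need to decide whether the model can still resolve such a marker; the sketch only shows where the character savings come from.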
License
MIT