Proxy server to Argo API, OpenAI format compatible
Project description
argo-proxy
A universal API gateway for LLM services via ARGO. Translates between OpenAI, Anthropic, and Google GenAI API formats, routing requests to optimal upstream ARGO endpoints. Works with AI coding tools like Claude Code, Codex CLI, Aider, Gemini CLI, and more.
For detailed documentation, visit the argo-proxy ReadTheDocs page.
TL;DR
pip install argo-proxy # install
argo-proxy serve # start the proxy
A single proxy instance serves all 4 major LLM API formats:
| API Format | Endpoint | Example Client |
|---|---|---|
| OpenAI Chat Completions | /v1/chat/completions |
OpenAI SDK, Aider, OpenCode |
| OpenAI Responses | /v1/responses |
Codex CLI |
| Anthropic Messages | /v1/messages |
Claude Code, Kilo Code |
| Google GenAI | /v1beta/models/{model}:generateContent |
Gemini CLI |
NOTICE OF USAGE
The machine or server making API calls to Argo must be connected to the Argonne internal network or through a VPN on an Argonne-managed computer if you are working off-site. Your instance of the argo proxy should always be on-premise at an Argonne machine. The software is provided "as is," without any warranties. By using this software, you accept that the authors, contributors, and affiliated organizations will not be liable for any damages or issues arising from its use. You are solely responsible for ensuring the software meets your requirements.
Deployment
Prerequisites
-
Python 3.10+ is required.
Recommended: use conda, mamba, or pipx to manage an exclusive environment.
Conda/Mamba Download and install from: https://conda-forge.org/download/
pipx Download and install from: https://pipx.pypa.io/stable/installation/ -
Install:
PyPI current version:
pip install argo-proxy
To upgrade:
argo-proxy update check # check for updates (includes dependency status) argo-proxy update install # install latest stable argo-proxy update install --pre # install latest pre-release
Or, from source (at the repo root):
pip install .
Configuration
The application uses a YAML config file (v3 format). If you don't have one, First-Time Setup will create it interactively.
config_version: "3"
user: "your_username"
host: 0.0.0.0
port: 44497
verbose: true
argo_base_url: "https://apps.inside.anl.gov/argoapi"
Config file search order (first found is used):
./config.yaml(current directory)~/.config/argoproxy/config.yaml~/.argoproxy/config.yaml
Migrate from v1/v2 config:
argo-proxy config migrate /path/to/old/config.yaml
Running the Proxy
argo-proxy serve # default config search
argo-proxy serve /path/to/config.yaml # explicit config
argo-proxy serve --verbose --show # verbose mode, show config at startup
First-Time Setup
Create a new config interactively:
argo-proxy config init
This will:
- Prompt for your ANL username
- Select a random available port (can be overridden)
- Choose upstream environment (prod/dev/test)
- Validate connectivity to upstream URLs
- Write the config file to
~/.config/argoproxy/config.yaml
Configuration Options Reference
| Option | Description | Default |
|---|---|---|
config_version |
Config format version | "3" |
user |
Your ANL username | (required) |
host |
Host address to bind to | 0.0.0.0 |
port |
Port number | 44497 |
verbose |
Enable verbose logging | true |
argo_base_url |
Base URL for ARGO API | Dev URL |
native_openai_base_url |
Custom OpenAI endpoint (auto-derived if unset) | — |
native_anthropic_base_url |
Custom Anthropic endpoint (auto-derived if unset) | — |
anthropic_stream_mode |
Non-streaming Anthropic handling: force/retry/passthrough |
force |
force_conversion |
Always run full format conversion | false |
use_legacy_argo |
Use legacy ARGO gateway pipeline | false |
skip_url_validation |
Skip upstream URL checks on startup | false |
connection_test_timeout |
Seconds for URL validation | 5 |
resolve_overrides |
DNS overrides for SSH tunnels (host:port -> IP) | {} |
max_log_history |
Keep last N messages in verbose logs | 3 |
enable_payload_control |
Enable image payload size control | false |
max_payload_size |
Max image payload size in MB | 20 |
image_timeout |
Image download timeout in seconds | 30 |
concurrent_downloads |
Parallel image downloads | 10 |
CLI Reference
argo-proxy [-h] [--version] {serve,config,logs,update,models}
| Command | Description |
|---|---|
serve [config] |
Start the proxy server |
config edit |
Open config in default editor |
config validate |
Validate config and check connectivity |
config show |
Display resolved config |
config migrate |
Migrate v1/v2 config to v3 |
config init |
Interactive config setup |
config list |
List all found config files |
config env [prod|dev|test] |
Show or switch upstream environment |
logs collect [--type TYPE] |
Collect diagnostic logs |
update check |
Check for updates (argo-proxy + llm-rosetta) |
update install [--pre] |
Install latest version |
models [--json] |
List available models and aliases |
Key serve flags:
argo-proxy serve --verbose # verbose logging
argo-proxy serve --force-conversion # always convert via llm-rosetta
argo-proxy serve --username-passthrough # use API key as username
argo-proxy serve --anthropic-stream-mode retry # try non-streaming first
argo-proxy serve --legacy-argo # use legacy ARGO gateway pipeline
argo-proxy serve --dump-requests # dump request/response for debugging
Usage
Endpoints
API Format Endpoints
All four formats are served simultaneously from a single proxy instance:
| Endpoint | Format | Typical Client |
|---|---|---|
/v1/chat/completions |
OpenAI Chat Completions | OpenAI SDK, Aider, OpenCode |
/v1/responses |
OpenAI Responses | Codex CLI |
/v1/messages |
Anthropic Messages | Claude Code, Anthropic SDK |
/v1beta/models/{model}:generateContent |
Google GenAI | Gemini CLI |
/v1beta/models/{model}:streamGenerateContent |
Google GenAI (streaming) | Gemini CLI |
/v1/embeddings |
Embeddings | OpenAI SDK |
Utility Endpoints
| Endpoint | Description |
|---|---|
/v1/models |
List available models (OpenAI-compatible format) |
/refresh |
Reload model list from upstream (POST) |
/health |
Health check |
/version |
Version info with update status |
Timeout Override
You can override the default timeout with a timeout parameter in your request body. See Timeout Override Examples for details.
Models
Models are fetched dynamically from upstream at startup. Use argo-proxy models or GET /v1/models to list all available models and aliases. Refresh without restart via POST /refresh.
Model Naming
Model names are flexible and case-insensitive:
- OpenAI:
argo:gpt-4o,gpt-4o,argo:gpt-4.1-mini,argo:o3-mini - Claude:
argo:claude-4-opusorargo:claude-opus-4,argo:claude-4.6-sonnet - Gemini:
argo:gemini-2.5-pro,argo:gemini-2.5-flash - Embedding:
argo:text-embedding-ada-002,argo:text-embedding-3-small
The argo: prefix is optional -- bare model names like gpt-4o or claude-4-sonnet work too.
Tool Calls
Native function calling is supported for all three providers:
- OpenAI models: Full native function calling
- Anthropic models: Full native function calling
- Gemini models: Full native function calling
Available on /v1/chat/completions in both streaming and non-streaming modes. Cross-format tool call translation is handled automatically via llm-rosetta.
For usage details, refer to the OpenAI function calling guide and the tool calls documentation.
A lightweight tool management library is also available: ToolRegistry.
AI Coding Tools Integration
Argo-proxy works out of the box with popular AI coding tools:
| Tool | API Format | Base URL Env Var | Value |
|---|---|---|---|
| Claude Code | Anthropic | ANTHROPIC_BASE_URL |
http://localhost:<port> |
| Codex CLI | OpenAI Responses | OPENAI_BASE_URL |
http://localhost:<port>/v1 |
| Aider | OpenAI or Anthropic | OPENAI_API_BASE / ANTHROPIC_BASE_URL |
http://localhost:<port>/v1 |
| Gemini CLI | Google GenAI | GOOGLE_GEMINI_BASE_URL |
http://localhost:<port> |
| OpenCode | OpenAI | OPENAI_BASE_URL |
http://localhost:<port>/v1 |
| Kilo Code | Anthropic | (VS Code settings) | http://localhost:<port> |
All tools use your ANL username as the API key. For detailed setup instructions, see the CLI Tools Integration Guide.
Examples
OpenAI Format
SDK-based (openai.OpenAI):
- Chat Completions | Streaming
- Responses | Streaming
- Function Calling (Chat) | Function Calling (Responses)
- Image Chat | Image Base64
- Embedding
- Legacy Completions | Streaming
REST-based (httpx / requests):
Anthropic Format
SDK-based (anthropic.Anthropic):
REST-based:
Direct ARGO Access
Bug Reports and Contributions
This project is developed in my spare time. Bugs and issues may exist. If you encounter any or have suggestions for improvements, please open an issue or submit a pull request. Your contributions are highly appreciated!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file argo_proxy-3.0.2.tar.gz.
File metadata
- Download URL: argo_proxy-3.0.2.tar.gz
- Upload date:
- Size: 164.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c822773bcb288d613755d943f28b35f5d5b3706557319749d4e4f7537f99d1eb
|
|
| MD5 |
d64318020f9bd5c7126e191fb3e949d1
|
|
| BLAKE2b-256 |
69a414be0bfa948b219c3efaa0eea8182d560e81515288b08468d2b0d63b56e3
|
File details
Details for the file argo_proxy-3.0.2-py3-none-any.whl.
File metadata
- Download URL: argo_proxy-3.0.2-py3-none-any.whl
- Upload date:
- Size: 178.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e73f8deb68ef4e6db0f04e8c8c6b8a63949b02256e117e1370818d098b129c8f
|
|
| MD5 |
325d6d302fc9e671b652873331840306
|
|
| BLAKE2b-256 |
2f6655a24361b1530cc00b88d2d7a92e3b45d1d70250e9042d4002a744150b1d
|