ArgusLM — Open-Source LLM Monitoring & Benchmarking
Know exactly which LLM providers are up, which are fastest, and which are degrading — before your users notice.
The Problem
Modern AI architectures use dozens of LLM providers across services — OpenAI, Anthropic, Bedrock, Vertex, local Ollama, custom endpoints — each with different availability, latency, and throughput characteristics. When providers fail or slow down, you find out from support tickets, not monitoring dashboards. Existing tools are either SaaS-only (expensive, locked-in), infrastructure-focused (can't probe LLM APIs), or require complex instrumentation (changes your code).
Why ArgusLM?
| Aspect | Datadog / Langfuse | Prometheus | LLM Overwatch | ArgusLM |
|---|---|---|---|---|
| Deployment | SaaS-only | Self-hosted | SaaS-only | Self-hosted |
| Local Models | ❌ No | ❌ No | ❌ No | ✅ Ollama, LM Studio, local APIs |
| Probing vs Tracing | Tracing only | Infrastructure only | Probing only | Both (probing + tracing) |
| Metrics | Request-level | Node-level | Response time | TTFT, TPS, latency, uptime |
| Pricing | $$$$ | Free | $$$ | ✅ Free & Open-Source |
| Extensible | Limited | Limited | No | ✅ Full Python SDK + HTTP API |
What makes ArgusLM unique: The only open-source tool that actively probes any LLM provider (including local Ollama/LM Studio) for real uptime, Time to First Token (TTFT), Tokens per Second (TPS), and latency — with a unified Python SDK for custom automation.
Use Cases
ArgusLM is for you if:
- You're building production AI systems — Monitor uptime and performance of multiple LLM providers in real-time, detect degradations before users do.
- You run self-hosted LLM deployments — Track local Ollama/LM Studio availability and response metrics alongside cloud providers in one dashboard.
- You provide LLM-based services — Know exactly which provider to route traffic to based on real performance data, not assumptions or marketing claims.
- You need automated benchmarking — Run scheduled comparisons between models (GPT-4 vs Claude vs local Llama) to optimize costs and quality.
- You must keep data private — Self-hosted, no SaaS lock-in, full control over your observability data.
Quick Start
Deploy ArgusLM in under a minute:
```bash
git clone https://github.com/bluet/arguslm.git && cd arguslm
cp .env.example .env
# Generate secrets (requires cryptography package, or use the Docker one-liner in .env.example)
python3 scripts/generate-secrets.py >> .env
docker compose up -d
```
- Dashboard: http://localhost:3000
- API Documentation: http://localhost:8000/docs
Features
| Category | Capabilities |
|---|---|
| Monitoring | Automated uptime checks, real-time status tracking, and configurable availability intervals. |
| Benchmarking | Parallel multi-model testing with deep metrics for TTFT, TPS, and total latency. |
| Visualization | Live performance charts, historical trends, and side-by-side model comparisons. |
| Alerting | Proactive downtime detection and performance degradation notifications. |
| Integration | Native support for 100+ providers via LiteLLM abstraction. |
Architecture
ArgusLM is built for scale and reliability, leveraging a modern asynchronous stack.
```
┌─────────────────────────────────────────────────────────────────┐
│                            ArgusLM                              │
├─────────────────────────────────────────────────────────────────┤
│  Frontend (React + Vite)              Backend (FastAPI)         │
│  ┌─────────────────────┐           ┌──────────────────────┐     │
│  │ Dashboard           │◄─────────►│ REST API + WebSocket │     │
│  │ Benchmarks          │           │ Background Scheduler │     │
│  │ Monitoring          │           │ Alert Engine         │     │
│  │ Providers           │           └──────────┬───────────┘     │
│  └─────────────────────┘                      │                 │
│                                               ▼                 │
│                              ┌─────────────────────────────┐    │
│                              │  LiteLLM Abstraction Layer  │    │
│                              └─────────────┬───────────────┘    │
│                                            │                    │
└────────────────────────────────────────────┼────────────────────┘
                                             ▼
          ┌──────────────────────────────────────────────────┐
          │                  LLM Providers                   │
          │  OpenAI │ Anthropic │ Bedrock │ Vertex │ Azure   │
          │  Ollama │ LM Studio │ xAI │ DeepSeek │ 100+      │
          └──────────────────────────────────────────────────┘
```
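The probing model can be sketched in a few lines. This is an illustrative asyncio loop, not ArgusLM's actual scheduler code; the `probe` and `probe_all` names are hypothetical:

```python
import asyncio
import time

# Hypothetical sketch of what the background scheduler conceptually does:
# probe each provider on an interval and record status plus latency.
# Names here are illustrative, not ArgusLM internals.

async def probe(provider: str) -> dict:
    start = time.perf_counter()
    try:
        # A real probe would send a tiny completion request via LiteLLM
        # and time the first streamed token; here we simulate success.
        await asyncio.sleep(0.01)
        status = "up"
    except Exception:
        status = "down"
    latency_ms = (time.perf_counter() - start) * 1000
    return {"provider": provider, "status": status, "latency_ms": latency_ms}

async def probe_all(providers: list[str]) -> list[dict]:
    # Probe every provider concurrently, as the parallel benchmarks do.
    return await asyncio.gather(*(probe(p) for p in providers))

results = asyncio.run(probe_all(["openai", "anthropic", "ollama"]))
for r in results:
    print(f"{r['provider']}: {r['status']} ({r['latency_ms']:.0f} ms)")
```

Because the probes run concurrently, a slow or hung provider does not delay checks against the others.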
Usage Examples
Trigger Monitoring (HTTP API)
```bash
# Trigger a manual monitoring run
curl -X POST http://localhost:8000/api/v1/monitoring/run

# Get current monitoring configuration
curl http://localhost:8000/api/v1/monitoring/config

# Get uptime history for all providers (last 100 checks)
curl "http://localhost:8000/api/v1/monitoring/uptime?limit=100"
```
Run Benchmarks (HTTP API)
```bash
# Start benchmark for specific models
curl -X POST http://localhost:8000/api/v1/benchmarks \
  -H "Content-Type: application/json" \
  -d '{
    "model_ids": ["uuid-1", "uuid-2"],
    "prompt_pack": "health_check",
    "max_tokens": 100,
    "num_runs": 5
  }'

# List all benchmarks
curl http://localhost:8000/api/v1/benchmarks

# Get results for specific benchmark run
curl http://localhost:8000/api/v1/benchmarks/{run_id}/results
```
Python SDK
```bash
pip install arguslm
```

```python
from arguslm import ArgusLMClient
from arguslm.schemas import BenchmarkCreate

with ArgusLMClient(base_url="http://localhost:8000") as client:
    # Check provider uptime
    uptime = client.get_uptime_history(limit=10)
    for check in uptime.items:
        print(f"{check.model_name}: {check.status} ({check.ttft_ms}ms TTFT)")

    # Run a benchmark
    benchmark = client.start_benchmark(BenchmarkCreate(
        model_ids=["uuid-1", "uuid-2"],
        prompt_pack="shakespeare",
        num_runs=3,
    ))
    print(f"Benchmark started: {benchmark.id}")
```
Async support:

```python
from arguslm import AsyncArgusLMClient

async with AsyncArgusLMClient() as client:
    config = await client.get_monitoring_config()
    providers = await client.list_providers()
```
Key Metrics
ArgusLM tracks the metrics that define real-world LLM performance:
- Time to First Token (TTFT): Measure user-perceived responsiveness and cold-start latency.
- Tokens per Second (TPS): Evaluate sustained streaming throughput independent of initial latency.
- End-to-End Latency: Track total request duration for non-streaming workloads.
- Availability: Monitor uptime and reliability trends with granular failure analysis.
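For illustration, here is how TTFT and TPS fall out of a streamed response's timestamps. The helper names are hypothetical, not part of the SDK:

```python
# Illustrative helpers (not ArgusLM SDK functions) showing how streaming
# metrics are derived: TTFT is the gap between sending the request and
# receiving the first token; TPS is tokens emitted per second of streaming.

def ttft_ms(request_at: float, first_token_at: float) -> float:
    return (first_token_at - request_at) * 1000

def tokens_per_second(n_tokens: int, first_token_at: float, last_token_at: float) -> float:
    streamed = last_token_at - first_token_at
    return n_tokens / streamed if streamed > 0 else float("inf")

# Example: request at t=0.0s, first token at t=0.5s, 120 tokens, done at t=2.5s
print(ttft_ms(0.0, 0.5))                 # 500.0 (ms)
print(tokens_per_second(120, 0.5, 2.5))  # 60.0 (tokens/s)
```

Keeping TTFT separate from TPS matters: a provider can have excellent throughput once streaming starts yet still feel slow to users because of a long first-token delay.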
Dashboard Screenshots
- Real-time tracking of latency and throughput trends across all configured providers.
- Side-by-side performance comparison to identify the most efficient models for your workload.
- Configure granular monitoring intervals and thresholds for each provider.
- Execute standardized benchmark suites to validate provider performance under load.
Configuration
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | `postgresql+asyncpg://...` |
| `SECRET_KEY` | Session encryption key | *required* |
| `ENCRYPTION_KEY` | Credential encryption (Fernet) | *required* |
Detailed setup instructions are available in the Configuration Guide.
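`ENCRYPTION_KEY` must be a valid Fernet key, i.e. 32 random bytes encoded as url-safe base64. `scripts/generate-secrets.py` is the supported way to produce one; the stdlib-only sketch below shows the equivalent key format:

```python
import base64
import os

# Generate a Fernet-format key without the cryptography package:
# 32 cryptographically random bytes, url-safe base64-encoded
# (44 characters). Equivalent in format to Fernet.generate_key().

def make_fernet_key() -> str:
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")

key = make_fernet_key()
print(f"ENCRYPTION_KEY={key}")  # 44-character url-safe base64 string
```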
Local Development
Backend
```bash
pip install -e ".[server]"
alembic upgrade head
uvicorn arguslm.server.main:app --reload
```
Frontend
```bash
cd frontend
npm install
npm run dev
```
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI, Python 3.11+, SQLAlchemy, Alembic |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS, Recharts |
| Database | PostgreSQL (Production) / SQLite (Development) |
| Abstraction | LiteLLM |
Installation
```bash
# SDK only (lightweight — for querying an ArgusLM instance)
pip install arguslm

# Full server (for self-hosted deployment without Docker);
# quoted so the extras bracket survives shells like zsh
pip install "arguslm[server]"
```
Documentation
- Architecture Overview
- Python SDK Guide
- REST API Reference
- Configuration Guide
- Troubleshooting
- Comparison with Alternatives
- Interactive API Docs (Swagger UI, available when server is running)
Contributing
We welcome contributions from the community. Please review our Contributing Guidelines before submitting a Pull Request.
Author
Matthew (BlueT) Lien
License
ArgusLM is released under the Apache License 2.0.
Named after Argus Panoptes, the all-seeing giant of Greek mythology.
File details
Details for the file arguslm-0.2.0.tar.gz.
File metadata
- Download URL: arguslm-0.2.0.tar.gz
- Upload date:
- Size: 98.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `6f1657539fb03e3a3168b4fc5948269584f8a6d5f2b0754b06f9bbb30596d8c6` |
| MD5 | `fdc83aac876e40c0a7862374cc6fd7f3` |
| BLAKE2b-256 | `82889c68c9b53d7d0ca4f3953388382fe7827aabc0d39229e5896579b66ecfa6` |
File details
Details for the file arguslm-0.2.0-py3-none-any.whl.
File metadata
- Download URL: arguslm-0.2.0-py3-none-any.whl
- Upload date:
- Size: 81.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `cde30367422f5d4a04284a301fdaf02387a5394c224c2fa4921a819ce2e584c3` |
| MD5 | `37fc71c1e0571efaafce7a9d637c82c2` |
| BLAKE2b-256 | `ffd6b09ac99973a38a07b2c59f714fc9c5ee2dd1b4133b5907f9cf3bfcc2bcfd` |