
ArgusLM — Open-Source LLM Monitoring & Benchmarking

PyPI CI License GitHub Stars Python 3.11+ Docker

Know exactly which LLM providers are up, which are fastest, and which are degrading — before your users notice.

ArgusLM Dashboard Overview

The Problem

Modern AI architectures use dozens of LLM providers across services — OpenAI, Anthropic, Bedrock, Vertex, local Ollama, custom endpoints — each with different availability, latency, and throughput characteristics. When providers fail or slow down, you find out from support tickets, not monitoring dashboards. Existing tools are either SaaS-only (expensive, locked-in), infrastructure-focused (can't probe LLM APIs), or require complex instrumentation (changes your code).

Why ArgusLM?

| Aspect | Datadog / Langfuse | Prometheus | LLM Overwatch | ArgusLM |
|---|---|---|---|---|
| Deployment | SaaS-only | Self-hosted | SaaS-only | Self-hosted |
| Local Models | ❌ No | ❌ No | ❌ No | ✅ Ollama, LM Studio, local APIs |
| Probing vs Tracing | Tracing only | Infrastructure only | Probing only | Both (probing + tracing) |
| Metrics | Request-level | Node-level | Response time | TTFT, TPS, latency, uptime |
| Pricing | $$$$ | Free | $$$ | ✅ Free & Open-Source |
| Extensible | Limited | Limited | No | ✅ Full Python SDK + HTTP API |

What makes ArgusLM unique: The only open-source tool that actively probes any LLM provider (including local Ollama/LM Studio) for real uptime, Time to First Token (TTFT), Tokens per Second (TPS), and latency — with a unified Python SDK for custom automation.

Use Cases

ArgusLM is for you if:

  • You're building production AI systems — Monitor uptime and performance of multiple LLM providers in real-time, detect degradations before users do.
  • You run self-hosted LLM deployments — Track local Ollama/LM Studio availability and response metrics alongside cloud providers in one dashboard.
  • You provide LLM-based services — Know exactly which provider to route traffic to based on real performance data, not assumptions or marketing claims.
  • You need automated benchmarking — Run scheduled comparisons between models (GPT-4 vs Claude vs local Llama) to optimize costs and quality.
  • You must keep your data private — Self-hosted, no SaaS lock-in, full control over your observability data.

Quick Start

Deploy ArgusLM in under a minute:

git clone https://github.com/bluet/arguslm.git && cd arguslm
cp .env.example .env

# Generate secrets (requires cryptography package, or use the Docker one-liner in .env.example)
python3 scripts/generate-secrets.py >> .env

docker compose up -d

Dashboard: http://localhost:3000
API Documentation: http://localhost:8000/docs
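If you script the deployment, a small readiness poll avoids racing `docker compose up -d`. The fetch callable is injected so the loop is testable; the `/docs` URL matches the API documentation link above, but treat any dedicated health endpoint as an assumption:

```python
import time
from typing import Callable

def wait_until_ready(fetch: Callable[[], int], attempts: int = 30,
                     delay: float = 1.0) -> bool:
    """Poll `fetch` (returns an HTTP status code, may raise) until it yields 200."""
    for _ in range(attempts):
        try:
            if fetch() == 200:
                return True
        except OSError:
            pass  # server not accepting connections yet
        time.sleep(delay)
    return False

# Real use, against the default port from docker compose:
# import urllib.request
# ready = wait_until_ready(
#     lambda: urllib.request.urlopen("http://localhost:8000/docs").status
# )
```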


Features

| Category | Capabilities |
|---|---|
| Monitoring | Automated uptime checks, real-time status tracking, and configurable availability intervals. |
| Benchmarking | Parallel multi-model testing with deep metrics for TTFT, TPS, and total latency. |
| Visualization | Live performance charts, historical trends, and side-by-side model comparisons. |
| Alerting | Proactive downtime detection and performance degradation notifications. |
| Integration | Native support for 100+ providers via LiteLLM abstraction. |

Architecture

ArgusLM is built for scale and reliability, leveraging a modern asynchronous stack.

┌─────────────────────────────────────────────────────────────────┐
│                         ArgusLM                                 │
├─────────────────────────────────────────────────────────────────┤
│  Frontend (React + Vite)           Backend (FastAPI)            │
│  ┌─────────────────────┐           ┌──────────────────────┐     │
│  │ Dashboard           │◄─────────►│ REST API + WebSocket │     │
│  │ Benchmarks          │           │ Background Scheduler │     │
│  │ Monitoring          │           │ Alert Engine         │     │
│  │ Providers           │           └──────────┬───────────┘     │
│  └─────────────────────┘                      │                 │
│                                               ▼                 │
│                              ┌─────────────────────────────┐    │
│                              │  LiteLLM Abstraction Layer  │    │
│                              └─────────────┬───────────────┘    │
│                                            │                    │
└────────────────────────────────────────────┼────────────────────┘
                                             ▼
              ┌──────────────────────────────────────────────────┐
              │                  LLM Providers                   │
              │  OpenAI │ Anthropic │ Bedrock │ Vertex │ Azure   │
              │  Ollama │ LM Studio │ xAI │ DeepSeek │ 100+      │
              └──────────────────────────────────────────────────┘

Usage Examples

Trigger Monitoring (HTTP API)

# Trigger a manual monitoring run
curl -X POST http://localhost:8000/api/v1/monitoring/run

# Get current monitoring configuration
curl http://localhost:8000/api/v1/monitoring/config

# Get uptime history for all providers (last 100 checks)
curl "http://localhost:8000/api/v1/monitoring/uptime?limit=100"
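The uptime history returned above can be reduced to an availability figure on the client side. A minimal sketch, assuming each check carries a `status` field (the field name is an assumption, not the documented schema):

```python
def uptime_percent(checks: list[dict]) -> float:
    """Percentage of checks with status 'up' (0.0 for an empty history)."""
    if not checks:
        return 0.0
    up = sum(1 for c in checks if c["status"] == "up")
    return 100.0 * up / len(checks)

history = [{"status": "up"}] * 97 + [{"status": "down"}] * 3
print(f"{uptime_percent(history):.1f}%")  # 97.0%
```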

Run Benchmarks (HTTP API)

# Start benchmark for specific models
curl -X POST http://localhost:8000/api/v1/benchmarks \
  -H "Content-Type: application/json" \
  -d '{
    "model_ids": ["uuid-1", "uuid-2"],
    "prompt_pack": "health_check",
    "max_tokens": 100,
    "num_runs": 5
  }'

# List all benchmarks
curl http://localhost:8000/api/v1/benchmarks

# Get results for specific benchmark run
curl http://localhost:8000/api/v1/benchmarks/{run_id}/results
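With `num_runs` greater than 1, per-run results are worth aggregating before comparing models. A hedged sketch of that aggregation — the `ttft_ms`/`tps` keys are illustrative stand-ins, not the exact ArgusLM results schema:

```python
from statistics import mean, median

def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-run metrics into mean/median summaries."""
    return {
        "ttft_ms_mean": mean(r["ttft_ms"] for r in runs),
        "ttft_ms_median": median(r["ttft_ms"] for r in runs),
        "tps_mean": mean(r["tps"] for r in runs),
    }

runs = [
    {"ttft_ms": 300, "tps": 42.0},
    {"ttft_ms": 350, "tps": 40.0},
    {"ttft_ms": 310, "tps": 44.0},
]
print(summarize_runs(runs))
```

Medians are useful alongside means here because a single cold-start outlier can dominate a small number of runs.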

Python SDK

pip install arguslm

from arguslm import ArgusLMClient
from arguslm.schemas import BenchmarkCreate

with ArgusLMClient(base_url="http://localhost:8000") as client:
    # Check provider uptime
    uptime = client.get_uptime_history(limit=10)
    for check in uptime.items:
        print(f"{check.model_name}: {check.status} ({check.ttft_ms}ms TTFT)")

    # Run a benchmark
    benchmark = client.start_benchmark(BenchmarkCreate(
        model_ids=["uuid-1", "uuid-2"],
        prompt_pack="shakespeare",
        num_runs=3,
    ))
    print(f"Benchmark started: {benchmark.id}")

Async support:

import asyncio

from arguslm import AsyncArgusLMClient

async def main():
    async with AsyncArgusLMClient() as client:
        config = await client.get_monitoring_config()
        providers = await client.list_providers()

asyncio.run(main())

Key Metrics

ArgusLM tracks the metrics that define real-world LLM performance:

  • Time to First Token (TTFT): Measure user-perceived responsiveness and cold-start latency.
  • Tokens per Second (TPS): Evaluate sustained streaming throughput independent of initial latency.
  • End-to-End Latency: Track total request duration for non-streaming workloads.
  • Availability: Monitor uptime and reliability trends with granular failure analysis.
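The two streaming metrics follow directly from three timestamps on a streamed response. A sketch of the arithmetic — note that TPS here is measured over the streaming window only (first token to last token), so it is independent of TTFT:

```python
def stream_metrics(request_at: float, first_token_at: float,
                   done_at: float, completion_tokens: int) -> dict:
    """TTFT = request start to first token; TPS = completion tokens over
    the streaming window (first token -> last token), excluding TTFT."""
    ttft_ms = (first_token_at - request_at) * 1000.0
    stream_s = done_at - first_token_at
    tps = completion_tokens / stream_s if stream_s > 0 else float(completion_tokens)
    return {"ttft_ms": ttft_ms, "tps": tps}

m = stream_metrics(request_at=0.0, first_token_at=0.5, done_at=2.5,
                   completion_tokens=100)
print(m)  # {'ttft_ms': 500.0, 'tps': 50.0}
```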

Dashboard Screenshots

Performance Trends Real-time tracking of latency and throughput trends across all configured providers.

Model Comparison Side-by-side performance comparison to identify the most efficient models for your workload.

Monitoring Configuration Configure granular monitoring intervals and thresholds for each provider.

Benchmark Runner Execute standardized benchmark suites to validate provider performance under load.


Configuration

| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql+asyncpg://... |
| SECRET_KEY | Session encryption key | required |
| ENCRYPTION_KEY | Credential encryption (Fernet) | required |

Detailed setup instructions are available in the Configuration Guide.
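If the bundled scripts/generate-secrets.py is not convenient, both required secrets can be produced with the standard library alone — a Fernet key is just the URL-safe base64 encoding of 32 random bytes. This is an equivalent fallback sketch, not the project's own script:

```python
import base64
import os
import secrets

# SECRET_KEY: any sufficiently long random string works for session signing.
secret_key = secrets.token_urlsafe(32)

# ENCRYPTION_KEY: Fernet keys are the URL-safe base64 encoding of 32 random
# bytes, so a valid key needs no third-party dependency to generate.
encryption_key = base64.urlsafe_b64encode(os.urandom(32)).decode()

print(f"SECRET_KEY={secret_key}")
print(f"ENCRYPTION_KEY={encryption_key}")
```

Append the printed lines to `.env`, as in the Quick Start.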


Local Development

Backend

pip install -e ".[server]"
alembic upgrade head
uvicorn arguslm.server.main:app --reload

Frontend

cd frontend
npm install
npm run dev

Tech Stack

| Layer | Technology |
|---|---|
| Backend | FastAPI, Python 3.11+, SQLAlchemy, Alembic |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS, Recharts |
| Database | PostgreSQL (Production) / SQLite (Development) |
| Abstraction | LiteLLM |

Installation

# SDK only (lightweight — for querying an ArgusLM instance)
pip install arguslm

# Full server (for self-hosted deployment without Docker)
pip install arguslm[server]

Documentation


Contributing

We welcome contributions from the community. Please review our Contributing Guidelines before submitting a Pull Request.


Author

Matthew (BlueT) Lien


License

ArgusLM is released under the Apache License 2.0.


Named after Argus Panoptes, the all-seeing giant of Greek mythology.
