Ollama Local Serve
Local LLM infrastructure with a professional monitoring dashboard for distributed AI applications. Serve Ollama-powered models across your network with seamless LangChain integration, OpenTelemetry instrumentation, and real-time metrics visualization.
Features
- Service Management: Easy start/stop control of Ollama server instances
- Network Accessible: Configure host/port for LAN accessibility
- LangChain Integration: Seamless integration with LangChain for remote LLM clients
- OpenTelemetry Instrumentation: Built-in metrics collection with OTEL support
- Real-time Monitoring Dashboard: Professional React dashboard with live metrics
- In-App Chat Interface: Floating chat bubble with streaming responses and markdown support
- Model Management: Pull, delete, and manage models directly from the dashboard
- Model Repository: Track favorites, usage stats, and preferences per model
- Multiple Database Backends: Export metrics to ClickHouse or PostgreSQL/TimescaleDB
- Enhanced Request Logging: Capture prompt/response text, client info, and token counts
- Data Management: Clear metrics/logs and view data summaries via API
- Health Checks: Built-in health check endpoints to monitor service status
- Docker Ready: Complete Docker Compose stack for production deployment
- Async/Await: Production-ready async patterns throughout
- Type Hints: Full type annotations with Pydantic configuration
Quick Start
Installation
# Basic installation
pip install ollama-local-serve
# With LangChain integration
pip install ollama-local-serve[langchain]
# With full monitoring stack
pip install ollama-local-serve[monitoring]
# All features
pip install ollama-local-serve[all]
Prerequisites
- Python 3.12 or higher
- Ollama installed on your system (download from https://ollama.com/download)
Basic Usage
import asyncio
from ollama_local_serve import OllamaService, NetworkConfig

async def main():
    # Create configuration for LAN accessibility
    config = NetworkConfig(
        host="0.0.0.0",   # Accessible from any network interface
        port=11434,       # Default Ollama port
        timeout=30,
        max_retries=3
    )

    # Create and start the service
    service = OllamaService(config)
    await service.start()

    # Check service health
    is_healthy = await service.health_check()
    print(f"Service is healthy: {is_healthy}")

    # Stop the service
    await service.stop()

asyncio.run(main())
Using Context Manager
import asyncio
from ollama_local_serve import OllamaService, NetworkConfig

async def main():
    config = NetworkConfig(host="0.0.0.0", port=11434)

    # Automatic cleanup with context manager
    async with OllamaService(config) as service:
        print(f"Service running at {service.base_url}")
        await service.health_check()
    # Service automatically stops when exiting context

asyncio.run(main())
Docker Deployment
Quick Start with Docker Compose
# Clone the repository
git clone https://github.com/AbhinaavRamesh/ollama-local-serve.git
cd ollama-local-serve
# Initialize environment
make init
# Start all services
make up
# View the dashboard
open http://localhost:3000
Available Services
| Service | Port | Description |
|---|---|---|
| Ollama | 11434 | LLM inference service |
| ClickHouse | 8123, 9000 | Time-series database |
| PostgreSQL | 5432 | TimescaleDB for relational storage |
| API Server | 8000 | FastAPI monitoring API |
| Dashboard | 3000 | React monitoring dashboard |
Note: The API Server is named ollama-monitor in docker-compose files and commands. This naming reflects the service's primary purpose as a monitoring API for Ollama. When you see references to ollama-monitor in docker commands or compose files, this refers to the API Server listed above.
Make Commands
Core Commands
make help # Show all available commands
make init # Initialize environment
make up # Start all services
make down # Stop all services
make logs # View logs
make health # Check service health
make dev # Start development environment
make clean # Remove all containers and volumes
Dependency Installation
make install # Install all dependencies (Python + Frontend)
make install-python # Install Python dependencies
make install-python-venv # Install Python deps in virtual environment
make install-frontend # Install frontend (Node.js) dependencies
make install-db-clients # Install database CLI clients (ClickHouse + PostgreSQL)
make install-clickhouse-client # Install ClickHouse client only
make install-postgres-client # Install PostgreSQL client only
make check-deps # Check if required dependencies are installed
Selective Service Startup (Toggle Databases)
make up-minimal # Start only Ollama + API (no databases, no frontend)
make up-clickhouse # Start full stack with ClickHouse (includes frontend)
make up-postgres # Start full stack with PostgreSQL (includes frontend)
# Or use environment variables to toggle services (with up-selective target):
make up-selective ENABLE_CLICKHOUSE=false # Disable ClickHouse
make up-selective ENABLE_POSTGRES=false # Disable PostgreSQL
make up-selective ENABLE_FRONTEND=false # Disable frontend dashboard
Local Development (without Docker)
make run-api # Run API server locally (port 8000)
make run-frontend # Run frontend dev server locally (port 5173)
make run-local # Run both API and frontend in parallel
Configuration
Environment Variables
Copy .env.example to .env and configure:
# Ollama Configuration
OLLAMA_HOST=http://ollama:11434
OLLAMA_MODEL=llama3.2
OLLAMA_TIMEOUT=120
# Instrumentation
ENABLE_INSTRUMENTATION=true
EXPORTER_TYPE=clickhouse # clickhouse, postgres, both, none
# ClickHouse
CLICKHOUSE_HOST=clickhouse
CLICKHOUSE_PORT=9000
CLICKHOUSE_DATABASE=ollama_metrics
# PostgreSQL
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_DATABASE=ollama_metrics
POSTGRES_USER=ollama
POSTGRES_PASSWORD=your_secure_password
Pydantic Configuration
All configuration classes support environment variable loading:
from ollama_local_serve.config import (
    NetworkConfig,
    InstrumentationConfig,
    ClickHouseConfig,
    PostgresConfig,
    APIConfig,
    AppConfig,
)

# Load from environment variables
config = AppConfig()

# Access nested configs
print(config.network.base_url)
print(config.clickhouse.connection_url)
print(config.api.cors_origins_list)
Monitoring & Instrumentation
Enable Instrumentation
from ollama_local_serve import OllamaService, NetworkConfig

config = NetworkConfig(
    host="0.0.0.0",
    port=11434,
    enable_instrumentation=True,
    exporter_type="clickhouse",  # or "postgres", "both"
    metrics_export_interval=5,
)

async with OllamaService(config) as service:
    # Metrics are automatically collected
    response = await service.generate("llama2", "Hello, world!")
Available Metrics
- ollama_requests_total - Total number of requests
- ollama_tokens_generated_total - Total tokens generated
- ollama_errors_total - Total errors
- ollama_request_latency_ms - Request latency histogram
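If the ClickHouse exporter is enabled, these metrics are written to the ollama_metrics database configured above. A sketch of inspecting them with the third-party clickhouse-driver package; the table and column names below are assumptions for illustration, so check schemas/clickhouse_init.sql for the actual layout:

from clickhouse_driver import Client  # third-party: pip install clickhouse-driver

# Connect over the native protocol (port 9000, per the services table)
client = Client(host="localhost", port=9000, database="ollama_metrics")

# Hypothetical table/column names -- consult schemas/clickhouse_init.sql
# for the real schema before running this query.
rows = client.execute(
    "SELECT metric_name, sum(metric_value) FROM metrics GROUP BY metric_name"
)
for name, total in rows:
    print(f"{name}: {total}")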
API Endpoints
The monitoring API provides:
# Health & Stats
GET /api/health - Health check
GET /api/stats/current - Current statistics
GET /api/stats/history - Historical metrics
GET /api/stats/logs - Request logs (with prompt/response text)
GET /api/models - Model statistics
# Chat
POST /api/chat - Stream chat with Ollama (SSE)
# Ollama Proxy
GET /api/ollama/models - List installed Ollama models
POST /api/ollama/pull - Pull a model (streaming progress)
DELETE /api/ollama/models/{name} - Delete a model
GET /api/ollama/library - Search model library
# Model Repository (PostgreSQL)
GET /api/models/repository - Get all models with preferences
GET /api/models/repository/{name} - Get model details
POST /api/models/repository - Add model to repository
PUT /api/models/repository/{name} - Update model (favorite, default)
POST /api/models/repository/sync - Sync with installed models
# Data Management
GET /api/data/summary - Get data summary
DELETE /api/data/metrics - Clear all metrics
DELETE /api/data/logs - Clear all request logs
DELETE /api/data/all - Clear all data
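As a quick smoke test, the read-only endpoints above can be exercised with any HTTP client. A minimal sketch using only the Python standard library (response payloads are treated as opaque JSON, since their shapes are not documented here):

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # API Server port from the services table

def get_json(path: str) -> dict:
    """Fetch a monitoring API endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}{path}") as resp:
        return json.loads(resp.read())

print(get_json("/api/health"))         # health check
print(get_json("/api/stats/current"))  # current statistics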
LangChain Integration
from ollama_local_serve import create_langchain_client

# Connect to a local or remote Ollama service
llm = create_langchain_client(
    base_url="http://192.168.1.100:11434",
    model="llama2",
    temperature=0.7
)

response = llm.invoke("What is the meaning of life?")
print(response)
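Because create_langchain_client returns a LangChain-compatible LLM, it composes with standard LangChain primitives. A short sketch assuming langchain-core is installed; the prompt and input text are illustrative:

from langchain_core.prompts import PromptTemplate
from ollama_local_serve import create_langchain_client

llm = create_langchain_client(
    base_url="http://192.168.1.100:11434",
    model="llama2",
)

# Compose a prompt -> LLM chain with the LCEL pipe operator
prompt = PromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | llm
print(chain.invoke({"text": "Ollama serves local LLMs over HTTP."}))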
Project Structure
ollama-local-serve/
├── ollama_local_serve/ # Python package
│ ├── __init__.py
│ ├── config.py # Pydantic configuration
│ ├── service.py # OllamaService class
│ ├── client.py # LangChain client
│ ├── exceptions.py # Custom exceptions
│ ├── api/ # FastAPI server
│ │ ├── server.py
│ │ ├── models.py
│ │ └── dependencies.py
│ ├── instrumentation/ # OTEL instrumentation
│ │ ├── metrics_provider.py
│ │ └── tracer.py
│ └── exporters/ # Database exporters
│ ├── base.py
│ ├── clickhouse_exporter.py
│ └── postgres_exporter.py
├── frontend/ # React dashboard
│ ├── src/
│ │ ├── components/
│ │ │ ├── chat/ # Chat bubble with streaming
│ │ │ ├── charts/ # Visualization components
│ │ │ └── ... # Other UI components
│ │ ├── pages/
│ │ ├── hooks/
│ │ ├── context/ # App and Theme context
│ │ └── utils/
│ ├── package.json
│ └── Dockerfile
├── schemas/ # Database schemas
│ ├── clickhouse_init.sql
│ └── postgres_init.sql
├── docker-compose.yml # Production stack
├── docker-compose.dev.yml # Development overrides
├── Dockerfile # API Dockerfile
├── Makefile # Convenience commands
├── pyproject.toml # Python project config
└── requirements-api.txt # API dependencies
Installation Options
# Core only
pip install ollama-local-serve
# With specific features
pip install ollama-local-serve[langchain] # LangChain integration
pip install ollama-local-serve[api] # FastAPI server
pip install ollama-local-serve[instrumentation] # OpenTelemetry
pip install ollama-local-serve[clickhouse] # ClickHouse exporter
pip install ollama-local-serve[postgres] # PostgreSQL exporter
pip install ollama-local-serve[monitoring] # Full monitoring stack
pip install ollama-local-serve[all] # Everything
# Development
pip install -e ".[dev]"
Error Handling
from ollama_local_serve import (
    OllamaServiceError,   # Base exception
    ConnectionError,      # Connection failures
    HealthCheckError,     # Health check failures
    ServiceStartError,    # Service start failures
    ServiceStopError,     # Service stop failures
)

try:
    await service.health_check()
except ConnectionError as e:
    print(f"Connection failed: {e}")
except HealthCheckError as e:
    print(f"Health check failed: {e}")
Development
Setup
# Clone repository
git clone https://github.com/AbhinaavRamesh/ollama-local-serve.git
cd ollama-local-serve
# Check what dependencies you have installed
make check-deps
# Option 1: Install all dependencies at once
make install
# Option 2: Use virtual environment for Python
make install-python-venv # Creates .venv and installs deps
source .venv/bin/activate # Activate the virtual environment
make install-frontend # Install frontend deps
# Option 3: Manual setup
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -e ".[dev]"
cd frontend && npm install
# Optional: Install database clients for local debugging
make install-db-clients
Code Quality
# Format code
black ollama_local_serve/
# Lint code
ruff check ollama_local_serve/
# Type checking
mypy ollama_local_serve/
# Run tests
pytest
Development Mode
# Option 1: Use Docker (hot reloading enabled)
make dev
# Or manually:
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up
# Option 2: Run locally without Docker
make run-local # Runs both API and frontend
# Or run separately:
# Run API locally (update the module path if needed):
python -m ollama_local_serve.api # API on http://localhost:8000
make run-frontend # Frontend on http://localhost:5173
API Reference
OllamaService
Main service class for managing Ollama server instances.
Methods:
- async start(startup_delay: float = 2.0) - Start the Ollama server
- async stop(timeout: float = 5.0) - Stop the Ollama server
- async health_check(retries: Optional[int] = None) - Check service health
- async get_models() - Get list of available models
- async generate(model: str, prompt: str) - Generate text
Properties:
- is_running: bool - Check if service is running
- base_url: str - Get the base URL of the service
- uptime_seconds: float - Get service uptime
- metrics_enabled: bool - Check if metrics are enabled
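Putting these methods together, a minimal end-to-end sketch built only from the signatures listed above (the model name llama3.2 is illustrative and must already be pulled):

import asyncio
from ollama_local_serve import OllamaService, NetworkConfig

async def main():
    config = NetworkConfig(host="0.0.0.0", port=11434)
    async with OllamaService(config) as service:
        models = await service.get_models()
        print(f"Installed models: {models}")

        # generate(model, prompt) per the method list above
        text = await service.generate("llama3.2", "Say hello in one sentence.")
        print(text)

asyncio.run(main())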
NetworkConfig
Configuration for network settings (Pydantic BaseSettings).
Attributes:
- host: str - Host address (default: "0.0.0.0")
- port: int - Port number (default: 11434)
- timeout: int - Connection timeout in seconds (default: 30)
- max_retries: int - Maximum retry attempts (default: 3)
Computed Properties:
- base_url: str - Get the base URL
- api_url: str - Get the API URL
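For illustration, a quick look at the computed properties (the exact URL format in the comment is an assumption based on the defaults above):

from ollama_local_serve import NetworkConfig

config = NetworkConfig(host="192.168.1.100", port=11434)

# base_url is expected to combine scheme, host, and port,
# e.g. "http://192.168.1.100:11434" (format assumed, not documented here)
print(config.base_url)
print(config.api_url)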
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Download files
File details
Details for the file ollama_local_serve-0.2.4.tar.gz.
File metadata
- Download URL: ollama_local_serve-0.2.4.tar.gz
- Upload date:
- Size: 51.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a6fde935c44756cab59e2665fff8d348be6e4c099ac65bd1c6d896f2b7dab5f |
| MD5 | c77835d5cf5b171f9bb8378e4f074827 |
| BLAKE2b-256 | c7dbf08fcba8f69e609f7c945d171928bbc3a6261f80d71faa78955bc44f156b |
File details
Details for the file ollama_local_serve-0.2.4-py3-none-any.whl.
File metadata
- Download URL: ollama_local_serve-0.2.4-py3-none-any.whl
- Upload date:
- Size: 54.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a17e25c5a303e90bbacac13769b6ddc639bc7b157e3975733af7757f50c7f66 |
| MD5 | 37b6c489948eb043a8489d88c6f4068d |
| BLAKE2b-256 | 4dc5929ba9bad61f9e2007f392b558c46ed9eaab42d32b6ed7abe6e2bca3139b |