
Local LLM infrastructure with monitoring dashboard for distributed AI applications

Project description

Ollama Local Serve

PyPI version Python 3.12+ License: MIT

Local LLM infrastructure with a professional monitoring dashboard for distributed AI applications. Serve Ollama-powered models across your network with seamless LangChain integration, OpenTelemetry instrumentation, and real-time metrics visualization.

Features

  • Service Management: Easy start/stop control of Ollama server instances
  • Network Accessible: Configure host and port to serve models across your LAN
  • LangChain Integration: Connect remote LangChain clients to served models with minimal setup
  • OpenTelemetry Instrumentation: Built-in metrics collection with OTEL support
  • Real-time Monitoring Dashboard: Professional React dashboard with live metrics
  • In-App Chat Interface: Floating chat bubble with streaming responses and markdown support
  • Model Management: Pull, delete, and manage models directly from the dashboard
  • Model Repository: Track favorites, usage stats, and preferences per model
  • Multiple Database Backends: Export metrics to ClickHouse or PostgreSQL/TimescaleDB
  • Enhanced Request Logging: Capture prompt/response text, client info, and token counts
  • Data Management: Clear metrics/logs and view data summaries via API
  • Health Checks: Built-in health check endpoints to monitor service status
  • Docker Ready: Complete Docker Compose stack for production deployment
  • Async/Await: Production-ready async patterns throughout
  • Type Hints: Full type annotations with Pydantic configuration

Quick Start

Installation

# Basic installation
pip install ollama-local-serve

# With LangChain integration
pip install ollama-local-serve[langchain]

# With LangGraph integration (includes LangChain)
pip install ollama-local-serve[langgraph]

# With full monitoring stack
pip install ollama-local-serve[monitoring]

# All features
pip install ollama-local-serve[all]

See Installation Guide for detailed installation options and prerequisites.

Basic Usage

import asyncio
from ollama_local_serve import OllamaService, NetworkConfig

async def main():
    config = NetworkConfig(host="0.0.0.0", port=11434)
    
    async with OllamaService(config) as service:
        print(f"Service running at {service.base_url}")
        await service.health_check()

asyncio.run(main())

See Installation Guide for more examples and error handling.
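
Once the service is listening on 0.0.0.0, any machine on your LAN can point a LangChain client at it. A minimal sketch, assuming the langchain-ollama package is available on the client side and that 192.168.1.50 stands in for the server's actual address (the package's own create_langchain_chat_client helper is shown under Common Workflows):

from langchain_ollama import ChatOllama

# Point LangChain's Ollama chat model at the remote OllamaService instance
llm = ChatOllama(model="llama3.2", base_url="http://192.168.1.50:11434")
print(llm.invoke("Reply with a one-sentence greeting.").content)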

Use Cases

Ollama Local Serve is ideal for:

  • Development & Testing: Quick local LLM setup with integrated monitoring
  • Research & Experimentation: Compare models, track metrics, benchmark performance
  • Small to Medium Scale Inference: Single or small cluster deployments (10-100 concurrent users)
  • AI Agent Development: Build ReAct agents with LangChain/LangGraph integration
  • Educational Projects: Learn about LLMs, monitoring, distributed systems
  • Internal Tools: Deploy custom AI features within organizations
  • Prototyping: Fast iteration with live metrics and dashboard feedback

Deployment Comparison

| Aspect      | Docker Compose       | Kubernetes                | Local Development  |
|-------------|----------------------|---------------------------|--------------------|
| Setup Time  | 5 minutes            | 15-30 minutes             | 2-3 minutes        |
| Scalability | Single machine       | Multi-node clusters       | Single machine     |
| Persistence | Volumes              | PersistentVolumeClaims    | Local filesystem   |
| Best For    | Development, testing | Production, scaling       | Quick prototyping  |
| GPU Support | Yes                  | Yes (NVIDIA plugin)       | Yes                |
| Cost        | Low                  | Moderate to high          | Free               |
| Monitoring  | Built-in dashboard   | Enhanced with Prometheus  | Dashboard included |

Recommendation: Start with Docker Compose for development, move to Kubernetes for production multi-node deployments.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                      Client Applications                     │
│  (Python Scripts, LangChain Agents, HTTP Clients, etc.)    │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ├─────────────────┬─────────────────┐
                 │                 │                 │
        ┌────────▼────────┐   ┌───▼──────────┐  ┌──▼─────────────┐
        │   FastAPI Server │   │  Ollama      │  │  React Dashboard
        │   (Port 8000)    │   │  (Port 11434)│  │  (Port 3000/5173)
        │                  │   │              │  │
        │ REST API Layer   │   │ LLM Engine   │  │ Real-time UI
        │ - Chat Endpoint  │   │ - Models     │  │ - Metrics Viz
        │ - Stats/Metrics  │   │ - Generation │  │ - System Health
        │ - Health Checks  │   │ - Streaming  │  │ - Model Mgmt
        └────────┬─────────┘   └──────────────┘  └────────────────┘
                 │
        ┌────────▼──────────────────────────┐
        │   Metrics Collection Layer        │
        │   (OpenTelemetry Instrumentation) │
        │   - Request tracking              │
        │   - Token counting                │
        │   - Performance metrics           │
        │   - Error tracking                │
        └────────┬──────────────────────────┘
                 │
        ┌────────▼──────────────────────────┐
        │   Storage Layer (Choose one/both) │
        ├──────────────────────────────────┤
        │  ClickHouse (Time-series)        │
        │  - Fast metrics queries           │
        │  - Real-time aggregations         │
        │                                   │
        │  PostgreSQL/TimescaleDB (Query)  │
        │  - Relational queries             │
        │  - Model metadata                 │
        └───────────────────────────────────┘
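
The FastAPI layer and the Ollama engine are separate processes on separate ports, so each can be probed on its own. A minimal sketch, assuming both run locally with the default ports shown above and that an HTTP client such as httpx is installed:

import httpx

# Ollama engine (port 11434): its native /api/tags endpoint lists installed models
tags = httpx.get("http://localhost:11434/api/tags", timeout=10).json()
print([model["name"] for model in tags.get("models", [])])

# The FastAPI layer (port 8000) and the React dashboard (3000/5173) run as
# separate processes; their endpoints are covered under Common Workflows.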

Common Workflows

Quick Local Testing

make init && make up
# Open http://localhost:3000 → Start chatting & monitoring

LangChain Agent Development

import asyncio
from ollama_local_serve import create_langchain_chat_client, OllamaService

async def main():
    async with OllamaService() as service:
        llm = create_langchain_chat_client(model="llama3.2")
        # Build your agent...
asyncio.run(main())
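
For ReAct-style agents, the served model pairs with LangGraph's prebuilt agent helper. A minimal sketch, assuming the [langgraph] extra is installed, the chosen model supports tool calling, and get_time is a purely hypothetical example tool:

import asyncio
from datetime import datetime, timezone

from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from ollama_local_serve import OllamaService, create_langchain_chat_client

@tool
def get_time() -> str:
    """Return the current UTC time."""  # hypothetical example tool
    return datetime.now(timezone.utc).isoformat()

async def main():
    async with OllamaService() as service:
        llm = create_langchain_chat_client(model="llama3.2")
        agent = create_react_agent(llm, tools=[get_time])
        result = await agent.ainvoke({"messages": [("user", "What time is it right now?")]})
        print(result["messages"][-1].content)

asyncio.run(main())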

Production Kubernetes Deployment

cd k8s
helm install ollama-serve . -n production -f values.yaml
# Configure ingress, GPU, and database backends

Performance Benchmarking

make up  # Start stack
# Generate load and monitor metrics in dashboard
# Check /api/stats/history for detailed performance data
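
After a load run, the same history endpoint can be pulled programmatically for offline analysis. A minimal sketch, assuming the API server is on localhost:8000 and httpx is installed (the exact response shape may differ from what is printed here):

import httpx

# Fetch recorded performance data from the FastAPI server
resp = httpx.get("http://localhost:8000/api/stats/history", timeout=30)
resp.raise_for_status()
print(resp.json())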

Documentation

Detailed documentation is organized into the following sections:

| Guide                        | Description                                           |
|------------------------------|-------------------------------------------------------|
| Installation Guide           | Installation methods, prerequisites, and basic usage  |
| Docker Deployment            | Docker Compose setup and Make commands                |
| Kubernetes Deployment        | Helm charts and K8s deployment                        |
| Configuration                | Environment variables and Pydantic config             |
| Monitoring & Instrumentation | Metrics, instrumentation, and API endpoints           |
| LangChain Integration        | LangChain and LangGraph usage examples                |
| Development Guide            | Setup, code quality, and development mode             |
| API Reference                | Python API and REST endpoint documentation            |
| GPU Testing Guide            | GPU setup, verification, and benchmarking             |

Project Structure

ollama-local-serve/
├── ollama_local_serve/          # Python package
│   ├── __init__.py
│   ├── config.py                # Pydantic configuration
│   ├── service.py               # OllamaService class
│   ├── client.py                # LangChain client
│   ├── exceptions.py            # Custom exceptions
│   ├── api/                     # FastAPI server
│   │   ├── server.py
│   │   ├── models.py
│   │   └── dependencies.py
│   ├── instrumentation/         # OTEL instrumentation
│   │   ├── metrics_provider.py
│   │   └── tracer.py
│   └── exporters/               # Database exporters
│       ├── base.py
│       ├── clickhouse_exporter.py
│       └── postgres_exporter.py
├── frontend/                    # React dashboard
│   ├── src/
│   │   ├── components/
│   │   │   ├── chat/            # Chat bubble with streaming
│   │   │   ├── charts/          # Visualization components
│   │   │   └── ...              # Other UI components
│   │   ├── pages/
│   │   ├── hooks/
│   │   ├── context/             # App and Theme context
│   │   └── utils/
│   ├── package.json
│   └── Dockerfile
├── schemas/                     # Database schemas
│   ├── clickhouse_init.sql
│   └── postgres_init.sql
├── k8s/                         # Kubernetes configuration
│   ├── values.yaml
│   ├── values-local.yaml
│   └── local-databases.yaml
├── docker-compose.yml           # Production stack
├── docker-compose.dev.yml       # Development overrides
├── Dockerfile                   # API Dockerfile
├── Makefile                     # Convenience commands
├── pyproject.toml               # Python project config
├── requirements-api.txt         # API dependencies
└── docs/                        # Documentation
    ├── INSTALLATION.md
    ├── DOCKER.md
    ├── KUBERNETES.md
    ├── CONFIGURATION.md
    ├── MONITORING.md
    ├── LANGCHAIN.md
    ├── API_REFERENCE.md
    └── DEVELOPMENT.md

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Project details


Download files

Download the file for your platform.

Source Distribution

ollama_local_serve-0.3.0.tar.gz (73.2 kB)

Uploaded Source

Built Distribution


ollama_local_serve-0.3.0-py3-none-any.whl (79.7 kB)

Uploaded Python 3

File details

Details for the file ollama_local_serve-0.3.0.tar.gz.

File metadata

  • Download URL: ollama_local_serve-0.3.0.tar.gz
  • Upload date:
  • Size: 73.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ollama_local_serve-0.3.0.tar.gz
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 56db4f5793ea1e4714e389bc26b141fb42f2923a86da0fd33941ce0414e55f20 |
| MD5         | 60068f777d394c6905f7e43e108fc085                                 |
| BLAKE2b-256 | 094285617d74aca557ca070702f339eae2279f88d1424a1d8b78c441e1eb4a70 |


File details

Details for the file ollama_local_serve-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for ollama_local_serve-0.3.0-py3-none-any.whl
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | e5508cba707d6d465e0d54e1c84e2d1a852f2da1f050bd99722401e30d68cd12 |
| MD5         | 6563d77d78f6fe82c3687f3f4b15fe6f                                 |
| BLAKE2b-256 | c075c5ba6b52167089d5953fc0deff31f66d464a14cb0fad561be185b56223d8 |

