
Ollama Local Serve

PyPI version Python 3.12+ License: MIT

Local LLM infrastructure with a professional monitoring dashboard for distributed AI applications. Serve Ollama-powered models across your network with seamless LangChain integration, OpenTelemetry instrumentation, and real-time metrics visualization.

Features

  • Service Management: Easy start/stop control of Ollama server instances
  • Network Accessible: Configure host/port for LAN accessibility
  • LangChain Integration: Seamless integration with LangChain for remote LLM clients
  • OpenTelemetry Instrumentation: Built-in metrics collection with OTEL support
  • Real-time Monitoring Dashboard: Professional React dashboard with live metrics
  • In-App Chat Interface: Floating chat bubble with streaming responses and markdown support
  • Model Management: Pull, delete, and manage models directly from the dashboard
  • Model Repository: Track favorites, usage stats, and preferences per model
  • Multiple Database Backends: Export metrics to ClickHouse or PostgreSQL/TimescaleDB
  • Enhanced Request Logging: Capture prompt/response text, client info, and token counts
  • Data Management: Clear metrics/logs and view data summaries via API
  • Health Checks: Built-in health check endpoints to monitor service status
  • Docker Ready: Complete Docker Compose stack for production deployment
  • Async/Await: Production-ready async patterns throughout
  • Type Hints: Full type annotations with Pydantic configuration
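
Beyond the service's own health-check endpoints, it is often useful to verify from a client machine that the Ollama port is reachable at all. A minimal stdlib-only sketch (not part of the package) for probing a host/port before wiring up clients:

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the default Ollama port before creating clients
if __name__ == "__main__":
    print("ollama reachable:", port_is_open("127.0.0.1", 11434))
```

This only confirms TCP reachability; use the service's health-check endpoints for application-level status.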

Quick Start

Installation

# Basic installation
pip install ollama-local-serve

# With LangChain integration (quote extras so shells like zsh don't expand the brackets)
pip install "ollama-local-serve[langchain]"

# With LangGraph integration (includes LangChain)
pip install "ollama-local-serve[langgraph]"

# With full monitoring stack
pip install "ollama-local-serve[monitoring]"

# All features
pip install "ollama-local-serve[all]"

See Installation Guide for detailed installation options and prerequisites.

Basic Usage

import asyncio
from ollama_local_serve import OllamaService, NetworkConfig

async def main():
    config = NetworkConfig(host="0.0.0.0", port=11434)
    
    async with OllamaService(config) as service:
        print(f"Service running at {service.base_url}")
        await service.health_check()

asyncio.run(main())

See Installation Guide for more examples and error handling.

Use Cases

Ollama Local Serve is ideal for:

  • Development & Testing: Quick local LLM setup with integrated monitoring
  • Research & Experimentation: Compare models, track metrics, benchmark performance
  • Small to Medium Scale Inference: Single or small cluster deployments (10-100 concurrent users)
  • AI Agent Development: Build ReAct agents with LangChain/LangGraph integration
  • Educational Projects: Learn about LLMs, monitoring, distributed systems
  • Internal Tools: Deploy custom AI features within organizations
  • Prototyping: Fast iteration with live metrics and dashboard feedback

Deployment Comparison

| Aspect      | Docker Compose       | Kubernetes               | Local Development  |
|-------------|----------------------|--------------------------|--------------------|
| Setup Time  | 5 minutes            | 15-30 minutes            | 2-3 minutes        |
| Scalability | Single machine       | Multi-node clusters      | Single machine     |
| Persistence | Volumes              | PersistentVolumeClaims   | Local filesystem   |
| Best For    | Development, testing | Production, scaling      | Quick prototyping  |
| GPU Support | Yes                  | Yes (NVIDIA plugin)      | Yes                |
| Cost        | Low                  | Moderate to High         | Free               |
| Monitoring  | Built-in dashboard   | Enhanced with Prometheus | Dashboard included |

Recommendation: Start with Docker Compose for development, move to Kubernetes for production multi-node deployments.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     Client Applications                     │
│  (Python Scripts, LangChain Agents, HTTP Clients, etc.)     │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ├─────────────────┬─────────────────┐
                 │                 │                 │
        ┌────────▼─────────┐   ┌───▼──────────┐  ┌───▼──────────────┐
        │  FastAPI Server  │   │  Ollama      │  │  React Dashboard │
        │  (Port 8000)     │   │  (Port 11434)│  │  (Port 3000/5173)│
        │                  │   │              │  │                  │
        │ REST API Layer   │   │ LLM Engine   │  │ Real-time UI     │
        │ - Chat Endpoint  │   │ - Models     │  │ - Metrics Viz    │
        │ - Stats/Metrics  │   │ - Generation │  │ - System Health  │
        │ - Health Checks  │   │ - Streaming  │  │ - Model Mgmt     │
        └────────┬─────────┘   └──────────────┘  └──────────────────┘
                 │
        ┌────────▼──────────────────────────┐
        │   Metrics Collection Layer        │
        │   (OpenTelemetry Instrumentation) │
        │   - Request tracking              │
        │   - Token counting                │
        │   - Performance metrics           │
        │   - Error tracking                │
        └────────┬──────────────────────────┘
                 │
        ┌────────▼──────────────────────────┐
        │   Storage Layer (Choose one/both) │
        ├───────────────────────────────────┤
        │  ClickHouse (Time-series)         │
        │  - Fast metrics queries           │
        │  - Real-time aggregations         │
        │                                   │
        │  PostgreSQL/TimescaleDB (Query)   │
        │  - Relational queries             │
        │  - Model metadata                 │
        └───────────────────────────────────┘

Common Workflows

Quick Local Testing

make init && make up
# Open http://localhost:3000 → Start chatting & monitoring

LangChain Agent Development

import asyncio
from ollama_local_serve import create_langchain_chat_client, OllamaService

async def main():
    async with OllamaService() as service:
        llm = create_langchain_chat_client(model="llama3.2")
        # Build your agent...

asyncio.run(main())

Production Kubernetes Deployment

cd k8s
helm install ollama-serve . -n production -f values.yaml
# Configure ingress, GPU, and database backends

Performance Benchmarking

make up  # Start stack
# Generate load and monitor metrics in dashboard
# Check /api/stats/history for detailed performance data
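
When summarizing latency samples pulled from /api/stats/history, percentiles are usually more informative than averages. A small stdlib-only helper (the sample latencies below are made up; see the API Reference for the endpoint's actual response schema):

```python
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples (seconds)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # nearest-rank index, clamped to the valid range
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

# Hypothetical per-request latencies collected during a load run
latencies = [0.8, 1.1, 0.9, 4.2, 1.0, 1.3, 0.7, 2.5, 1.2, 1.1]
print(f"p50={percentile(latencies, 50):.1f}s  p95={percentile(latencies, 95):.1f}s")
# p50=1.1s  p95=4.2s
```

Tail percentiles (p95/p99) surface the slow requests that a mean over the same data would hide.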

Documentation

Detailed documentation is organized into the following sections:

| Guide | Description |
|-------|-------------|
| Installation Guide | Installation methods, prerequisites, and basic usage |
| Docker Deployment | Docker Compose setup and Make commands |
| Kubernetes Deployment | Helm charts and K8s deployment |
| Configuration | Environment variables and Pydantic config |
| Monitoring & Instrumentation | Metrics, instrumentation, and API endpoints |
| LangChain Integration | LangChain and LangGraph usage examples |
| Development Guide | Setup, code quality, and development mode |
| API Reference | Python API and REST endpoint documentation |
| GPU Testing Guide | GPU setup, verification, and benchmarking |

Project Structure

ollama-local-serve/
├── ollama_local_serve/          # Python package
│   ├── __init__.py
│   ├── config.py                # Pydantic configuration
│   ├── service.py               # OllamaService class
│   ├── client.py                # LangChain client
│   ├── exceptions.py            # Custom exceptions
│   ├── api/                     # FastAPI server
│   │   ├── server.py
│   │   ├── models.py
│   │   └── dependencies.py
│   ├── instrumentation/         # OTEL instrumentation
│   │   ├── metrics_provider.py
│   │   └── tracer.py
│   └── exporters/               # Database exporters
│       ├── base.py
│       ├── clickhouse_exporter.py
│       └── postgres_exporter.py
├── frontend/                    # React dashboard
│   ├── src/
│   │   ├── components/
│   │   │   ├── chat/            # Chat bubble with streaming
│   │   │   ├── charts/          # Visualization components
│   │   │   └── ...              # Other UI components
│   │   ├── pages/
│   │   ├── hooks/
│   │   ├── context/             # App and Theme context
│   │   └── utils/
│   ├── package.json
│   └── Dockerfile
├── schemas/                     # Database schemas
│   ├── clickhouse_init.sql
│   └── postgres_init.sql
├── k8s/                         # Kubernetes configuration
│   ├── values.yaml
│   ├── values-local.yaml
│   └── local-databases.yaml
├── docker-compose.yml           # Production stack
├── docker-compose.dev.yml       # Development overrides
├── Dockerfile                   # API Dockerfile
├── Makefile                     # Convenience commands
├── pyproject.toml               # Python project config
├── requirements-api.txt         # API dependencies
└── docs/                        # Documentation
    ├── INSTALLATION.md
    ├── DOCKER.md
    ├── KUBERNETES.md
    ├── CONFIGURATION.md
    ├── MONITORING.md
    ├── LANGCHAIN.md
    ├── API_REFERENCE.md
    └── DEVELOPMENT.md

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
