
Local LLM infrastructure with monitoring dashboard for distributed AI applications

Project description

Ollama Local Serve

PyPI version Python 3.12+ License: MIT

Local LLM infrastructure with a professional monitoring dashboard for distributed AI applications. Serve Ollama-powered models across your network with seamless LangChain integration, OpenTelemetry instrumentation, and real-time metrics visualization.

Features

  • Service Management: Easy start/stop control of Ollama server instances
  • Network Accessible: Configure host and port to serve models across your LAN
  • LangChain Integration: Connect remote LangChain clients to served models with minimal setup
  • OpenTelemetry Instrumentation: Built-in metrics collection with OTEL support
  • Real-time Monitoring Dashboard: Professional React dashboard with live metrics
  • In-App Chat Interface: Floating chat bubble with streaming responses and markdown support
  • Model Management: Pull, delete, and manage models directly from the dashboard
  • Model Repository: Track favorites, usage stats, and preferences per model
  • Multiple Database Backends: Export metrics to ClickHouse or PostgreSQL/TimescaleDB
  • Enhanced Request Logging: Capture prompt/response text, client info, and token counts
  • Data Management: Clear metrics/logs and view data summaries via API
  • Health Checks: Built-in health check endpoints to monitor service status
  • Docker Ready: Complete Docker Compose stack for production deployment
  • Async/Await: Production-ready async patterns throughout
  • Type Hints: Full type annotations with Pydantic configuration

Quick Start

Installation

# Basic installation
pip install ollama-local-serve

# With LangChain integration
pip install ollama-local-serve[langchain]

# With LangGraph integration (includes LangChain)
pip install ollama-local-serve[langgraph]

# With full monitoring stack
pip install ollama-local-serve[monitoring]

# All features
pip install ollama-local-serve[all]

See Installation Guide for detailed installation options and prerequisites.

Basic Usage

import asyncio
from ollama_local_serve import OllamaService, NetworkConfig

async def main():
    config = NetworkConfig(host="0.0.0.0", port=11434)
    
    async with OllamaService(config) as service:
        print(f"Service running at {service.base_url}")
        await service.health_check()

asyncio.run(main())

See Installation Guide for more examples and error handling.
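
Once the service is listening on 0.0.0.0, any machine on your LAN can point a LangChain client at it. A minimal sketch, assuming the langchain-ollama package is available on the client side and that 192.168.1.50 stands in for the server's actual address (the package's own create_langchain_chat_client helper is shown under Common Workflows):

from langchain_ollama import ChatOllama

# Point LangChain's Ollama chat model at the remote OllamaService instance
llm = ChatOllama(model="llama3.2", base_url="http://192.168.1.50:11434")
print(llm.invoke("Reply with a one-sentence greeting.").content)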

Use Cases

Ollama Local Serve is ideal for:

  • Development & Testing: Quick local LLM setup with integrated monitoring
  • Research & Experimentation: Compare models, track metrics, benchmark performance
  • Small to Medium Scale Inference: Single or small cluster deployments (10-100 concurrent users)
  • AI Agent Development: Build ReAct agents with LangChain/LangGraph integration
  • Educational Projects: Learn about LLMs, monitoring, distributed systems
  • Internal Tools: Deploy custom AI features within organizations
  • Prototyping: Fast iteration with live metrics and dashboard feedback

Deployment Comparison

| Aspect      | Docker Compose       | Kubernetes                | Local Development  |
|-------------|----------------------|---------------------------|--------------------|
| Setup Time  | 5 minutes            | 15-30 minutes             | 2-3 minutes        |
| Scalability | Single machine       | Multi-node clusters       | Single machine     |
| Persistence | Volumes              | PersistentVolumeClaims    | Local filesystem   |
| Best For    | Development, testing | Production, scaling       | Quick prototyping  |
| GPU Support | Yes                  | Yes (NVIDIA plugin)       | Yes                |
| Cost        | Low                  | Moderate to high          | Free               |
| Monitoring  | Built-in dashboard   | Enhanced with Prometheus  | Dashboard included |

Recommendation: Start with Docker Compose for development, move to Kubernetes for production multi-node deployments.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                      Client Applications                     │
│  (Python Scripts, LangChain Agents, HTTP Clients, etc.)    │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ├─────────────────┬─────────────────┐
                 │                 │                 │
        ┌────────▼────────┐   ┌───▼──────────┐  ┌──▼─────────────┐
        │   FastAPI Server │   │  Ollama      │  │  React Dashboard
        │   (Port 8000)    │   │  (Port 11434)│  │  (Port 3000/5173)
        │                  │   │              │  │
        │ REST API Layer   │   │ LLM Engine   │  │ Real-time UI
        │ - Chat Endpoint  │   │ - Models     │  │ - Metrics Viz
        │ - Stats/Metrics  │   │ - Generation │  │ - System Health
        │ - Health Checks  │   │ - Streaming  │  │ - Model Mgmt
        └────────┬─────────┘   └──────────────┘  └────────────────┘
                 │
        ┌────────▼──────────────────────────┐
        │   Metrics Collection Layer        │
        │   (OpenTelemetry Instrumentation) │
        │   - Request tracking              │
        │   - Token counting                │
        │   - Performance metrics           │
        │   - Error tracking                │
        └────────┬──────────────────────────┘
                 │
        ┌────────▼──────────────────────────┐
        │   Storage Layer (Choose one/both) │
        ├──────────────────────────────────┤
        │  ClickHouse (Time-series)        │
        │  - Fast metrics queries           │
        │  - Real-time aggregations         │
        │                                   │
        │  PostgreSQL/TimescaleDB (Query)  │
        │  - Relational queries             │
        │  - Model metadata                 │
        └───────────────────────────────────┘
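
The FastAPI layer and the Ollama engine are separate processes on separate ports, so each can be probed on its own. A minimal sketch, assuming both run locally with the default ports shown above and that an HTTP client such as httpx is installed:

import httpx

# Ollama engine (port 11434): its native /api/tags endpoint lists installed models
tags = httpx.get("http://localhost:11434/api/tags", timeout=10).json()
print([model["name"] for model in tags.get("models", [])])

# The FastAPI layer (port 8000) and the React dashboard (3000/5173) run as
# separate processes; their endpoints are covered under Common Workflows.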

Common Workflows

Quick Local Testing

make init && make up
# Open http://localhost:3000 → Start chatting & monitoring

LangChain Agent Development

import asyncio
from ollama_local_serve import create_langchain_chat_client, OllamaService

async def main():
    async with OllamaService() as service:
        llm = create_langchain_chat_client(model="llama3.2")
        # Build your agent...
asyncio.run(main())
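
For ReAct-style agents, the served model pairs with LangGraph's prebuilt agent helper. A minimal sketch, assuming the [langgraph] extra is installed, the chosen model supports tool calling, and get_time is a purely hypothetical example tool:

import asyncio
from datetime import datetime, timezone

from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from ollama_local_serve import OllamaService, create_langchain_chat_client

@tool
def get_time() -> str:
    """Return the current UTC time."""  # hypothetical example tool
    return datetime.now(timezone.utc).isoformat()

async def main():
    async with OllamaService() as service:
        llm = create_langchain_chat_client(model="llama3.2")
        agent = create_react_agent(llm, tools=[get_time])
        result = await agent.ainvoke({"messages": [("user", "What time is it right now?")]})
        print(result["messages"][-1].content)

asyncio.run(main())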

Production Kubernetes Deployment

cd k8s
helm install ollama-serve . -n production -f values.yaml
# Configure ingress, GPU, and database backends

Performance Benchmarking

make up  # Start stack
# Generate load and monitor metrics in dashboard
# Check /api/stats/history for detailed performance data
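
After a load run, the same history endpoint can be pulled programmatically for offline analysis. A minimal sketch, assuming the API server is on localhost:8000 and httpx is installed (the exact response shape may differ from what is printed here):

import httpx

# Fetch recorded performance data from the FastAPI server
resp = httpx.get("http://localhost:8000/api/stats/history", timeout=30)
resp.raise_for_status()
print(resp.json())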

Documentation

Detailed documentation is organized into the following sections:

| Guide                        | Description                                           |
|------------------------------|-------------------------------------------------------|
| Installation Guide           | Installation methods, prerequisites, and basic usage  |
| Docker Deployment            | Docker Compose setup and Make commands                |
| Kubernetes Deployment        | Helm charts and K8s deployment                        |
| Configuration                | Environment variables and Pydantic config             |
| Monitoring & Instrumentation | Metrics, instrumentation, and API endpoints           |
| LangChain Integration        | LangChain and LangGraph usage examples                |
| Development Guide            | Setup, code quality, and development mode             |
| API Reference                | Python API and REST endpoint documentation            |
| GPU Testing Guide            | GPU setup, verification, and benchmarking             |

Project Structure

ollama-local-serve/
├── ollama_local_serve/          # Python package
│   ├── __init__.py
│   ├── config.py                # Pydantic configuration
│   ├── service.py               # OllamaService class
│   ├── client.py                # LangChain client
│   ├── exceptions.py            # Custom exceptions
│   ├── api/                     # FastAPI server
│   │   ├── server.py
│   │   ├── models.py
│   │   └── dependencies.py
│   ├── instrumentation/         # OTEL instrumentation
│   │   ├── metrics_provider.py
│   │   └── tracer.py
│   └── exporters/               # Database exporters
│       ├── base.py
│       ├── clickhouse_exporter.py
│       └── postgres_exporter.py
├── frontend/                    # React dashboard
│   ├── src/
│   │   ├── components/
│   │   │   ├── chat/            # Chat bubble with streaming
│   │   │   ├── charts/          # Visualization components
│   │   │   └── ...              # Other UI components
│   │   ├── pages/
│   │   ├── hooks/
│   │   ├── context/             # App and Theme context
│   │   └── utils/
│   ├── package.json
│   └── Dockerfile
├── schemas/                     # Database schemas
│   ├── clickhouse_init.sql
│   └── postgres_init.sql
├── k8s/                         # Kubernetes configuration
│   ├── values.yaml
│   ├── values-local.yaml
│   └── local-databases.yaml
├── docker-compose.yml           # Production stack
├── docker-compose.dev.yml       # Development overrides
├── Dockerfile                   # API Dockerfile
├── Makefile                     # Convenience commands
├── pyproject.toml               # Python project config
├── requirements-api.txt         # API dependencies
└── docs/                        # Documentation
    ├── INSTALLATION.md
    ├── DOCKER.md
    ├── KUBERNETES.md
    ├── CONFIGURATION.md
    ├── MONITORING.md
    ├── LANGCHAIN.md
    ├── API_REFERENCE.md
    └── DEVELOPMENT.md

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Project details


Download files

Download the file for your platform.

Source Distribution

ollama_local_serve-0.3.0.tar.gz (73.2 kB)

Uploaded Source

Built Distribution


ollama_local_serve-0.3.0-py3-none-any.whl (79.7 kB)

Uploaded Python 3

File details

Details for the file ollama_local_serve-0.3.0.tar.gz.

File metadata

  • Download URL: ollama_local_serve-0.3.0.tar.gz
  • Upload date:
  • Size: 73.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for ollama_local_serve-0.3.0.tar.gz
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 56db4f5793ea1e4714e389bc26b141fb42f2923a86da0fd33941ce0414e55f20 |
| MD5         | 60068f777d394c6905f7e43e108fc085                                 |
| BLAKE2b-256 | 094285617d74aca557ca070702f339eae2279f88d1424a1d8b78c441e1eb4a70 |


File details

Details for the file ollama_local_serve-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for ollama_local_serve-0.3.0-py3-none-any.whl
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | e5508cba707d6d465e0d54e1c84e2d1a852f2da1f050bd99722401e30d68cd12 |
| MD5         | 6563d77d78f6fe82c3687f3f4b15fe6f                                 |
| BLAKE2b-256 | c075c5ba6b52167089d5953fc0deff31f66d464a14cb0fad561be185b56223d8 |

