A simple Python package for Ollama utilities with built-in AI vibe tests

OllamaPy

A powerful terminal-based chat interface for Ollama with AI meta-reasoning capabilities and comprehensive performance analysis. OllamaPy provides an intuitive way to interact with local AI models while featuring unique "vibe tests" that evaluate AI decision-making consistency and timing performance.

Demo

Demo showing terminal app usage

Features

  • 🤖 Terminal Chat Interface - Clean, user-friendly chat experience in your terminal
  • 🔄 Streaming Responses - Real-time streaming for natural conversation flow
  • 📚 Model Management - Automatic model pulling and listing of available models
  • 🧠 Meta-Reasoning - AI analyzes user input and selects appropriate actions
  • 🛠️ Extensible Actions - Easy-to-extend action system with parameter support
  • 🧪 AI Vibe Tests - Built-in tests to evaluate AI consistency and reliability
  • ⏱️ Performance Analysis - Comprehensive timing analysis with consistency scoring
  • 📊 Interactive Reports - Rich HTML reports with timing visualizations
  • 🔢 Parameter Extraction - AI intelligently extracts parameters from natural language
  • 🏗️ Modular Architecture - Clean separation of concerns for easy testing and extension

Prerequisites

You need to have Ollama installed and running on your system.

# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama server
ollama serve

Installation

Install from PyPI:

pip install ollamapy

Or install from source:

git clone https://github.com/ScienceIsVeryCool/OllamaPy.git
cd OllamaPy
pip install .

Quick Start

Simply run the chat interface:

ollamapy

This will start a chat session with the default model (gemma3:4b). If the model isn't available locally, OllamaPy will automatically pull it for you.
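
The availability check described above can be sketched roughly as follows. This is a minimal illustration, not OllamaPy's actual code: `normalize` and `needs_pull` are hypothetical helpers, and the list of local models is supplied by hand rather than fetched from the server. The only fact it relies on is that Ollama treats a model name without a tag as `:latest`.

```python
from typing import Iterable


def normalize(model: str) -> str:
    """Return the model name with an explicit tag (Ollama defaults to ':latest')."""
    return model if ":" in model else f"{model}:latest"


def needs_pull(model: str, local_models: Iterable[str]) -> bool:
    """True if `model` is absent from the locally available models."""
    return normalize(model) not in {normalize(name) for name in local_models}


local = ["gemma3:4b", "mistral"]             # hypothetical local model list
print(needs_pull("gemma3:4b", local))        # False
print(needs_pull("mistral:latest", local))   # False
print(needs_pull("gemma2:2b", local))        # True
```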

Usage Examples

Basic Chat

# Start chat with default model
ollamapy

Custom Model

# Use a specific model
ollamapy --model gemma2:2b
ollamapy -m codellama:7b

Dual Model Setup (Analysis + Chat)

# Use a small, fast model for analysis and a larger model for chat
ollamapy --analysis-model gemma2:2b --model llama3.1:8b
ollamapy -a gemma2:2b -m mistral:7b

# The small model handles action selection; the large model handles the conversation

System Message

# Set context for the AI
ollamapy --system "You are a helpful coding assistant specializing in Python"
ollamapy -s "You are a creative writing partner"

Combined Options

# Use custom models with system message
ollamapy --analysis-model gemma2:2b --model mistral:7b --system "You are a helpful assistant"

Meta-Reasoning System

OllamaPy features a unique meta-reasoning system where the AI analyzes user input and dynamically selects from available actions. The AI examines the intent behind your message and chooses the most appropriate response action.

Dual Model Architecture

For optimal performance, you can use two different models:

  • Analysis Model: A smaller, faster model (like gemma2:2b) for quick action selection
  • Chat Model: A larger, more capable model (like llama3.1:8b) for generating responses

This architecture provides the best of both worlds - fast decision-making and high-quality responses.

# Example: Fast analysis with powerful chat
ollamapy --analysis-model gemma2:2b --model llama3.1:8b

Currently Available Actions

  • null - Default conversation mode. Used for normal chat when no special action is needed
  • fear - Responds to disturbing or delusional content with direct feedback
  • fileReader - Reads and displays file contents when user provides a file path
  • directoryReader - Explores entire directory contents for project analysis
  • getWeather - Provides weather information (accepts optional location parameter)
  • getTime - Returns the current date and time (accepts optional timezone parameter)
  • square_root - Calculates the square root of a number (requires number parameter)
  • calculate - Evaluates basic mathematical expressions (requires expression parameter)

How Meta-Reasoning Works

When you send a message, the AI:

  1. Analyzes your input to understand intent
  2. Selects the most appropriate action(s) from all available actions
  3. Extracts any required parameters from your input
  4. Executes the chosen action(s) with parameters
  5. Responds using the action's output as context
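
The five steps above can be sketched as a small self-contained loop. This is only an illustration under stated assumptions: the keyword rules in `analyze` stand in for the analysis model, the handlers and the `context` list are hypothetical, and the final reply is a placeholder for what the chat model would generate.

```python
import re
from typing import Callable, Dict, List, Tuple

context: List[str] = []  # action output collected for the chat model


def get_time() -> None:
    context.append("[Time] 12:00 (placeholder value)")


def square_root(number: float) -> None:
    context.append(f"[Math] sqrt({number:g}) = {number ** 0.5:g}")


# Hypothetical action table: action name -> handler
ACTIONS: Dict[str, Callable[..., None]] = {
    "getTime": get_time,
    "square_root": square_root,
}


def analyze(text: str) -> List[Tuple[str, dict]]:
    """Stand-in for the analysis model: steps 1-3 done with keyword rules."""
    selected: List[Tuple[str, dict]] = []
    lowered = text.lower()
    if "time" in lowered:
        selected.append(("getTime", {}))
    match = re.search(r"square root of (\d+(?:\.\d+)?)", lowered)
    if match:
        selected.append(("square_root", {"number": float(match.group(1))}))
    return selected or [("null", {})]


def handle(text: str) -> str:
    """Run steps 1-5 for one user message and return the reply."""
    context.clear()
    for name, params in analyze(text):   # steps 1-3
        if name in ACTIONS:              # "null" has no handler
            ACTIONS[name](**params)      # step 4
    # Step 5: a real chat model would receive `context` in its prompt.
    return " | ".join(context) if context else "(plain chat reply)"


print(handle("What is the square root of 16?"))  # [Math] sqrt(16) = 4
```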

Creating Custom Actions

The action system is designed to be easily extensible. Here's a comprehensive guide on creating your own actions:

Basic Action Structure

from ollamapy.actions import register_action, log

@register_action(
    name="action_name",
    description="When to use this action",
    vibe_test_phrases=["test phrase 1", "test phrase 2"],  # Optional
    parameters={  # Optional
        "param_name": {
            "type": "string",  # Either "string" or "number"
            "description": "What this parameter is for",
            "required": False  # True if the parameter must always be supplied
        }
    }
)
def action_name(param_name=None):
    """Your action implementation."""
    some_result = f"received {param_name!r}"  # Placeholder: compute your result here

    # Log results so the AI can use them as context.
    # Actions communicate via logging, not return values.
    log(f"[Action] Result: {some_result}")

Example 1: Simple Action (No Parameters)

from ollamapy.actions import register_action, log

@register_action(
    name="joke",
    description="Use when the user wants to hear a joke or needs cheering up",
    vibe_test_phrases=[
        "tell me a joke",
        "I need a laugh",
        "cheer me up",
        "make me smile"
    ]
)
def joke():
    """Tell a random joke."""
    import random
    jokes = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "Why did the scarecrow win an award? He was outstanding in his field!",
        "Why don't eggs tell jokes? They'd crack each other up!"
    ]
    selected_joke = random.choice(jokes)
    log(f"[Joke] {selected_joke}")

Example 2: Action with Required Parameter

from ollamapy.actions import register_action, log

@register_action(
    name="convert_temp",
    description="Convert temperature between Celsius and Fahrenheit",
    vibe_test_phrases=[
        "convert 32 fahrenheit to celsius",
        "what's 100C in fahrenheit?",
        "20 degrees celsius in F"
    ],
    parameters={
        "value": {
            "type": "number",
            "description": "The temperature value to convert",
            "required": True
        },
        "unit": {
            "type": "string",
            "description": "The unit to convert from (C or F)",
            "required": True
        }
    }
)
def convert_temp(value, unit):
    """Convert temperature between units."""
    unit = unit.upper()
    if unit == 'C':
        # Celsius to Fahrenheit
        result = (value * 9/5) + 32
        log(f"[Temperature] {value}°C = {result:.1f}°F")
    elif unit == 'F':
        # Fahrenheit to Celsius
        result = (value - 32) * 5/9
        log(f"[Temperature] {value}°F = {result:.1f}°C")
    else:
        log(f"[Temperature] Error: Unknown unit '{unit}'. Use 'C' or 'F'.")

Adding Your Actions to OllamaPy

  1. Create a new Python file for your actions (e.g., my_actions.py)
  2. Import and implement your actions using the patterns above
  3. Import your actions module before starting OllamaPy:

# my_script.py
from ollamapy import chat
import my_actions  # This registers your actions

# Now start chat with your custom actions available
chat()

Vibe Tests with Performance Analysis

Vibe tests are a built-in feature that evaluates how consistently AI models interpret human intent and choose appropriate actions. They also record timing data, so you can assess both accuracy and performance.

Running Vibe Tests

# Run vibe tests with default settings
ollamapy --vibetest

# Run with multiple iterations for statistical confidence
ollamapy --vibetest -n 5

# Test a specific model
ollamapy --vibetest --model gemma2:2b -n 3

# Use dual models for testing (analysis + chat)
ollamapy --vibetest --analysis-model gemma2:2b --model llama3.1:8b -n 5

# Extended statistical analysis
ollamapy --vibetest --analysis-model gemma2:2b --model llama3.1:8b -n 10

Understanding Results

Vibe tests evaluate multiple dimensions:

Accuracy Metrics:

  • Action Selection: How reliably the AI chooses the correct action
  • Parameter Extraction: How accurately the AI extracts required parameters
  • Consistency: How stable the AI's decisions are across multiple runs

Performance Metrics:

  • Response Time: Average, median, min/max execution times
  • Consistency Score: 0-100 score based on timing variability
  • Performance Categories: "Very Fast", "Fast", "Moderate", "Slow", "Very Slow"
  • Percentile Analysis: 25th, 75th, 95th percentiles for timing distribution
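
As a rough sketch of how such timing metrics could be derived from a list of run times, consider the following. The consistency formula here (100 minus the coefficient of variation in percent, clamped to 0-100) is an assumption made for illustration; the README does not state how OllamaPy computes its score. `summarize` is a hypothetical helper and needs at least two measurements.

```python
import statistics
from typing import Dict, List


def summarize(times: List[float]) -> Dict[str, float]:
    """Summarize run times (in seconds) with the metrics listed above."""
    mean = statistics.mean(times)
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    pct = statistics.quantiles(times, n=100, method="inclusive")
    # Assumed scoring rule: lower relative spread means a higher score.
    cv = statistics.pstdev(times) / mean if mean else 0.0
    return {
        "average": mean,
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
        "p25": pct[24],
        "p75": pct[74],
        "p95": pct[94],
        "consistency": max(0.0, min(100.0, 100.0 * (1.0 - cv))),
    }


stats = summarize([0.89, 1.10, 1.15, 1.20, 2.11])  # made-up sample timings
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```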

Visual Analytics:

  • Interactive HTML Reports: Rich visualizations with timing charts
  • Performance Comparison: Speed vs consistency scatter plots
  • Per-phrase Analysis: Detailed breakdown for each test phrase
  • Quadrant Analysis: Identifies optimal performance zones

Performance Insights

The timing analysis helps you:

  • Optimize Model Selection: Choose the best speed/accuracy trade-offs
  • Identify Bottlenecks: Find slow or inconsistent actions
  • Validate Stability: Ensure consistent performance across runs
  • Compare Configurations: Evaluate different model combinations

Example timing output:

Timing Analysis:
  Average: 1.23s | Median: 1.15s
  Range: 0.89s - 2.11s
  Performance: Fast
  Consistency: 87.3/100

Tests pass with a 60% or higher success rate, ensuring reasonable consistency in decision-making.
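
The pass rule can be expressed in a few lines. This is a sketch only: whether OllamaPy applies the 60% threshold per phrase, per action, or across the whole run is not stated here, so the per-action grouping below is an assumption, and `success_rate` and `passes` are hypothetical names.

```python
from typing import List

PASS_THRESHOLD = 0.60  # from the text: 60% or higher passes


def success_rate(expected: str, chosen: List[str]) -> float:
    """Fraction of runs in which the chosen action matched the expected one."""
    if not chosen:
        return 0.0
    return sum(1 for action in chosen if action == expected) / len(chosen)


def passes(expected: str, chosen: List[str]) -> bool:
    """True if the success rate meets the threshold."""
    return success_rate(expected, chosen) >= PASS_THRESHOLD


runs = ["getWeather", "getWeather", "null", "getWeather", "getTime"]
print(success_rate("getWeather", runs))  # 0.6
print(passes("getWeather", runs))        # True
```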

Chat Commands

While chatting, you can use these built-in commands:

  • quit, exit, bye - End the conversation
  • clear - Clear conversation history
  • help - Show available commands
  • model - Display current models (both chat and analysis)
  • models - List all available models
  • actions - Show available actions the AI can choose from

Python API

You can also use OllamaPy programmatically:

from ollamapy import OllamaClient, ModelManager, AnalysisEngine, ChatSession, TerminalInterface

# Create components
client = OllamaClient()
model_manager = ModelManager(client)
analysis_engine = AnalysisEngine("gemma2:2b", client)  # Fast analysis model
chat_session = ChatSession("llama3.1:8b", client, "You are a helpful assistant")

# Start a terminal interface
terminal = TerminalInterface(model_manager, analysis_engine, chat_session)
terminal.run()

# Or use components directly
messages = [{"role": "user", "content": "Hello!"}]
for chunk in client.chat_stream("gemma3:4b", messages):
    print(chunk, end="", flush=True)

# Execute actions programmatically
from ollamapy import execute_action
execute_action("square_root", {"number": 16})

# Run vibe tests programmatically with timing analysis
from ollamapy import run_vibe_tests
success = run_vibe_tests(
    model="llama3.1:8b",
    analysis_model="gemma2:2b", 
    iterations=5
)

Available Classes and Functions

Core Components:

  • OllamaClient - Low-level API client for Ollama
  • ModelManager - Model availability, pulling, and validation
  • AnalysisEngine - AI decision-making and action selection
  • ChatSession - Conversation state and response generation
  • TerminalInterface - Terminal UI and user interaction

Action System:

  • register_action() - Decorator for creating new actions
  • execute_action() - Execute an action with parameters
  • get_available_actions() - Get all registered actions
  • log() - Log messages from within actions

Testing & Analysis:

  • VibeTestRunner - Advanced vibe test runner with timing analysis
  • run_vibe_tests() - Simple function to run vibe tests
  • VibeTestReportGenerator - Generate rich HTML reports with visualizations
  • TimingStats - Sophisticated timing analysis with consistency scoring

Utilities:

  • convert_parameter_value() - Convert parameter types
  • extract_numbers_from_text() - Extract numbers from text
  • prepare_function_parameters() - Prepare parameters for function calls
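
To illustrate the kind of work these utilities do, here is a minimal sketch of number extraction and type coercion. The helpers `find_numbers` and `coerce` are hypothetical stand-ins written for this example; they are not the package's implementations, and their signatures are assumptions.

```python
import re
from typing import List, Union

Number = Union[int, float]


def find_numbers(text: str) -> List[Number]:
    """Return every integer or decimal found in `text`, in order."""
    found: List[Number] = []
    for token in re.findall(r"-?\d+(?:\.\d+)?", text):
        found.append(float(token) if "." in token else int(token))
    return found


def coerce(value: str, expected_type: str) -> Union[str, Number]:
    """Convert a raw string to the declared parameter type ('number' or 'string')."""
    if expected_type == "number":
        numbers = find_numbers(value)
        if not numbers:
            raise ValueError(f"no number found in {value!r}")
        return numbers[0]
    return value.strip()


print(find_numbers("convert 32 fahrenheit, then 37.5 celsius"))  # [32, 37.5]
print(coerce("about 16", "number"))                               # 16
```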

Configuration

OllamaPy connects to Ollama on http://localhost:11434 by default. If your Ollama instance is running elsewhere:

from ollamapy import OllamaClient

client = OllamaClient(base_url="http://your-ollama-server:11434")
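
Before pointing OllamaPy at a remote server, it can help to confirm that the server answers at all. The sketch below uses only the standard library and Ollama's model-listing endpoint (`GET /api/tags`); `tags_url` and `server_is_up` are hypothetical helpers, not part of OllamaPy.

```python
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def server_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server answers the request with HTTP 200."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, refused connections, and timeouts
        return False


print(tags_url("http://your-ollama-server:11434/"))
```

Stripping the trailing slash keeps the request URL well-formed whether or not the base URL ends in `/`.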

Supported Models

OllamaPy works with any model available in Ollama. Popular options include:

Recommended for Analysis (Fast):

  • gemma2:2b - Lightweight, excellent for action selection
  • gemma3:4b - Balanced speed and capability
  • llama3.2:3b - Fast and efficient

Recommended for Chat (Quality):

  • gemma3:4b (default) - Great all-around performance
  • gemma2:9b - Larger model for complex conversations
  • llama3.1:8b - High-quality responses
  • mistral:7b - Strong general-purpose model
  • codellama:7b - Specialized for coding tasks

Performance Optimization Examples:

# Speed-optimized: Fast analysis + moderate chat
ollamapy --analysis-model gemma2:2b --model gemma3:4b

# Quality-optimized: Moderate analysis + high-quality chat  
ollamapy --analysis-model gemma3:4b --model llama3.1:8b

# Balanced: Same capable model for both
ollamapy --model gemma3:4b

To see available models on your system: ollama list

Development

Clone the repository and install in development mode:

git clone https://github.com/ScienceIsVeryCool/OllamaPy.git
cd OllamaPy
pip install -e ".[dev]"

Run tests:

pytest

Run vibe tests with timing analysis:

pytest -m vibetest

Architecture Overview

OllamaPy uses a clean, modular architecture with performance monitoring:

┌───────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ TerminalInterface │    │  AnalysisEngine │    │   ChatSession   │
│                   │    │                 │    │                 │
│ • User input      │    │ • Action select │    │ • Conversation  │
│ • Commands        │    │ • Parameter     │    │ • Response gen  │
│ • Display         │    │   extraction    │    │ • History       │
│ • Timing display  │    │ • ⏱️ Timing      │    │                 │
└───────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
        ┌──────────────────────────────────┐    ┌─────────────────┐
        │             Testing & Analytics  │    │ OllamaClient    │
        │                                  │    │                 │
        │ • VibeTestRunner  • TimingStats  │    │ • HTTP API      │
        │ • ReportGenerator • Consistency  │    │ • Streaming     │
        │ • Performance Analysis           │    │ • Low-level     │
        └──────────────────────────────────┘    └─────────────────┘
                                 │
                    ┌────────────────┐
                    │  ModelManager  │
                    │                │
                    │ • Model pull   │
                    │ • Availability │
                    │ • Validation   │
                    └────────────────┘

Each component has a single responsibility and can be tested independently. The timing system is integrated throughout without affecting core functionality.

Troubleshooting

"Ollama server is not running!"

Make sure Ollama is installed and running:

ollama serve

Model not found

OllamaPy will automatically pull models, but you can also pull manually:

ollama pull gemma3:4b

Parameter extraction issues

  • Use a more capable analysis model: ollamapy --analysis-model llama3.2:3b
  • Ensure your action descriptions clearly indicate what parameters are needed
  • Check that your test phrases include the expected parameters

Vibe test failures

  • Try different models: ollamapy --vibetest --model gemma2:9b
  • Use separate analysis model: ollamapy --vibetest --analysis-model gemma2:2b
  • Increase iterations for better statistics: ollamapy --vibetest -n 10
  • Check that your test phrases clearly indicate the intended action

Performance issues

  • Use a smaller model for analysis: --analysis-model gemma2:2b
  • Check timing reports to identify slow actions
  • Ensure sufficient system resources for your chosen models
  • Check Ollama server performance with ollama ps
  • Review consistency scores in vibe test reports

Slow or inconsistent timing

  • Monitor consistency scores in vibe test reports
  • Try different model combinations for optimal speed/accuracy
  • Check system resources and Ollama server health
  • Use timing analysis to identify performance bottlenecks

Project Information

  • Version: 0.8.0
  • License: GPL-3.0-or-later
  • Author: The Lazy Artist
  • Python: >=3.8
  • Dependencies: requests>=2.25.0, plotly (for reports)

License

This project is licensed under the GPL-3.0-or-later license. See the LICENSE file for details.
