A simple Python package for Ollama utilities with built-in AI vibe tests
OllamaPy
A powerful terminal-based chat interface for Ollama with AI meta-reasoning capabilities and comprehensive performance analysis. OllamaPy provides an intuitive way to interact with local AI models while featuring unique "vibe tests" that evaluate AI decision-making consistency and timing performance.
Features
- Terminal Chat Interface - Clean, user-friendly chat experience in your terminal
- Streaming Responses - Real-time streaming for natural conversation flow
- Model Management - Automatic model pulling and listing of available models
- Meta-Reasoning - AI analyzes user input and selects appropriate actions
- Extensible Actions - Easy-to-extend action system with parameter support
- AI Vibe Tests - Built-in tests to evaluate AI consistency and reliability
- Performance Analysis - Comprehensive timing analysis with consistency scoring
- Interactive Reports - Rich HTML reports with timing visualizations
- Parameter Extraction - AI intelligently extracts parameters from natural language
- Modular Architecture - Clean separation of concerns for easy testing and extension
Prerequisites
You need to have Ollama installed and running on your system.
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh
# Start the Ollama server
ollama serve
Installation
Install from PyPI:
pip install ollamapy
Or install from source:
git clone https://github.com/ScienceIsVeryCool/OllamaPy.git
cd OllamaPy
pip install .
Quick Start
Simply run the chat interface:
ollamapy
This will start a chat session with the default model (gemma3:4b). If the model isn't available locally, OllamaPy will automatically pull it for you.
Usage Examples
Basic Chat
# Start chat with default model
ollamapy
Custom Model
# Use a specific model
ollamapy --model gemma2:2b
ollamapy -m codellama:7b
Dual Model Setup (Analysis + Chat)
# Use a small, fast model for analysis and a larger model for chat
ollamapy --analysis-model gemma2:2b --model llama3.2:7b
ollamapy -a gemma2:2b -m mistral:7b
# This is great for performance - small model does action selection, large model handles conversation
System Message
# Set context for the AI
ollamapy --system "You are a helpful coding assistant specializing in Python"
ollamapy -s "You are a creative writing partner"
Combined Options
# Use custom models with system message
ollamapy --analysis-model gemma2:2b --model mistral:7b --system "You are a helpful assistant"
Meta-Reasoning System
OllamaPy features a unique meta-reasoning system where the AI analyzes user input and dynamically selects from available actions. The AI examines the intent behind your message and chooses the most appropriate response action.
Dual Model Architecture
For optimal performance, you can use two different models:
- Analysis Model: a smaller, faster model (like gemma2:2b) for quick action selection
- Chat Model: a larger, more capable model (like llama3.2:7b) for generating responses
This architecture provides the best of both worlds - fast decision-making and high-quality responses.
# Example: Fast analysis with powerful chat
ollamapy --analysis-model gemma2:2b --model llama3.2:7b
Currently Available Actions
- null - Default conversation mode. Used for normal chat when no special action is needed
- fear - Responds to disturbing or delusional content with direct feedback
- fileReader - Reads and displays file contents when user provides a file path
- directoryReader - Explores entire directory contents for project analysis
- getWeather - Provides weather information (accepts optional location parameter)
- getTime - Returns the current date and time (accepts optional timezone parameter)
- square_root - Calculates the square root of a number (requires number parameter)
- calculate - Evaluates basic mathematical expressions (requires expression parameter)
How Meta-Reasoning Works
When you send a message, the AI:
- Analyzes your input to understand intent
- Selects the most appropriate action(s) from all available actions
- Extracts any required parameters from your input
- Executes the chosen action(s) with parameters
- Responds using the action's output as context
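The five steps above can be sketched in plain Python. This is a minimal illustration of the dispatch pattern, not OllamaPy's actual internals: the `ACTIONS` registry, `LOGS` buffer, and keyword-based selection below are hypothetical stand-ins (a real analysis model chooses the action, not string matching).

```python
import re

# Hypothetical stand-ins for OllamaPy's action registry and log buffer.
ACTIONS = {}
LOGS = []

def register_action(name, description, parameters=None):
    """Register a function in the action registry (decorator)."""
    def decorator(func):
        ACTIONS[name] = {"func": func, "description": description,
                         "parameters": parameters or {}}
        return func
    return decorator

def log(message):
    LOGS.append(message)

@register_action("square_root", "Calculate the square root of a number",
                 parameters={"number": {"type": "number", "required": True}})
def square_root(number):
    log(f"[SquareRoot] sqrt({number}) = {number ** 0.5}")

def handle(user_input):
    """Analyze input, pick an action, extract parameters, execute, respond."""
    # Steps 1-2: a real analysis model would choose here; we match keywords.
    name = "square_root" if "square root" in user_input else "null"
    if name not in ACTIONS:
        return "(normal chat response)"
    # Step 3: extract the required number parameter from the text.
    match = re.search(r"-?\d+(\.\d+)?", user_input)
    params = {"number": float(match.group())} if match else {}
    # Step 4: execute the chosen action; it communicates via log().
    ACTIONS[name]["func"](**params)
    # Step 5: the chat model would respond using the logged output as context.
    return LOGS[-1]

print(handle("what is the square root of 16?"))
# → [SquareRoot] sqrt(16.0) = 4.0
```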
Creating Custom Actions
The action system is designed to be easily extensible. Here's a comprehensive guide on creating your own actions:
Basic Action Structure
from ollamapy.actions import register_action
@register_action(
    name="action_name",
    description="When to use this action",
    vibe_test_phrases=["test phrase 1", "test phrase 2"],  # Optional
    parameters={  # Optional
        "param_name": {
            "type": "string",  # or "number"
            "description": "What this parameter is for",
            "required": True  # or False
        }
    }
)
def action_name(param_name=None):
    """Your action implementation."""
    from ollamapy.actions import log
    # Log results so the AI can use them as context.
    # Actions communicate via logging, not return values.
    log(f"[Action] Result: {some_result}")
Example 1: Simple Action (No Parameters)
from ollamapy.actions import register_action, log
@register_action(
    name="joke",
    description="Use when the user wants to hear a joke or needs cheering up",
    vibe_test_phrases=[
        "tell me a joke",
        "I need a laugh",
        "cheer me up",
        "make me smile"
    ]
)
def joke():
    """Tell a random joke."""
    import random
    jokes = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "Why did the scarecrow win an award? He was outstanding in his field!",
        "Why don't eggs tell jokes? They'd crack each other up!"
    ]
    selected_joke = random.choice(jokes)
    log(f"[Joke] {selected_joke}")
Example 2: Action with Required Parameter
@register_action(
    name="convert_temp",
    description="Convert temperature between Celsius and Fahrenheit",
    vibe_test_phrases=[
        "convert 32 fahrenheit to celsius",
        "what's 100C in fahrenheit?",
        "20 degrees celsius in F"
    ],
    parameters={
        "value": {
            "type": "number",
            "description": "The temperature value to convert",
            "required": True
        },
        "unit": {
            "type": "string",
            "description": "The unit to convert from (C or F)",
            "required": True
        }
    }
)
def convert_temp(value, unit):
    """Convert temperature between units."""
    unit = unit.upper()
    if unit == 'C':
        # Celsius to Fahrenheit
        result = (value * 9/5) + 32
        log(f"[Temperature] {value}°C = {result:.1f}°F")
    elif unit == 'F':
        # Fahrenheit to Celsius
        result = (value - 32) * 5/9
        log(f"[Temperature] {value}°F = {result:.1f}°C")
    else:
        log(f"[Temperature] Error: Unknown unit '{unit}'. Use 'C' or 'F'.")
Adding Your Actions to OllamaPy
- Create a new Python file for your actions (e.g., my_actions.py)
- Import and implement your actions using the patterns above
- Import your actions module before starting OllamaPy
# my_script.py
from ollamapy import chat
import my_actions # This registers your actions
# Now start chat with your custom actions available
chat()
Vibe Tests with Performance Analysis
Vibe tests are a built-in feature that evaluates how consistently AI models interpret human intent and choose appropriate actions. The tests also include comprehensive timing analysis to help you understand both accuracy and performance characteristics.
Running Vibe Tests
# Run vibe tests with default settings
ollamapy --vibetest
# Run with multiple iterations for statistical confidence
ollamapy --vibetest -n 5
# Test a specific model
ollamapy --vibetest --model gemma2:2b -n 3
# Use dual models for testing (analysis + chat)
ollamapy --vibetest --analysis-model gemma2:2b --model llama3.2:7b -n 5
# Extended statistical analysis
ollamapy --vibetest --analysis-model gemma2:2b --model llama3.2:7b -n 10
Understanding Results
Vibe tests evaluate multiple dimensions:
Accuracy Metrics:
- Action Selection: How reliably the AI chooses the correct action
- Parameter Extraction: How accurately the AI extracts required parameters
- Consistency: How stable the AI's decisions are across multiple runs
Performance Metrics:
- Response Time: Average, median, min/max execution times
- Consistency Score: 0-100 score based on timing variability
- Performance Categories: "Very Fast", "Fast", "Moderate", "Slow", "Very Slow"
- Percentile Analysis: 25th, 75th, 95th percentiles for timing distribution
Visual Analytics:
- Interactive HTML Reports: Rich visualizations with timing charts
- Performance Comparison: Speed vs consistency scatter plots
- Per-phrase Analysis: Detailed breakdown for each test phrase
- Quadrant Analysis: Identifies optimal performance zones
Performance Insights
The timing analysis helps you:
- Optimize Model Selection: Choose the best speed/accuracy trade-offs
- Identify Bottlenecks: Find slow or inconsistent actions
- Validate Stability: Ensure consistent performance across runs
- Compare Configurations: Evaluate different model combinations
Example timing output:
Timing Analysis:
Average: 1.23s | Median: 1.15s
Range: 0.89s - 2.11s
Performance: Fast
Consistency: 87.3/100
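As a rough sketch of how a 0-100 consistency score can be derived from timing variability, the snippet below scores runs by their coefficient of variation. This is an illustration only, not necessarily the formula TimingStats actually uses; the sample run times are made up.

```python
import statistics

def consistency_score(times):
    """Score timing consistency from 0 (erratic) to 100 (perfectly stable).

    Uses the coefficient of variation (stdev / mean): lower relative
    spread yields a higher score. Illustrative formula only.
    """
    if len(times) < 2:
        return 100.0
    mean = statistics.mean(times)
    cv = statistics.stdev(times) / mean if mean > 0 else 0.0
    return max(0.0, min(100.0, 100.0 * (1.0 - cv)))

runs = [1.10, 1.15, 1.23, 1.18, 1.12]  # hypothetical seconds per analysis call
print(f"Average: {statistics.mean(runs):.2f}s | "
      f"Consistency: {consistency_score(runs):.1f}/100")
```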
Tests pass with a 60% or higher success rate, ensuring reasonable consistency in decision-making.
Chat Commands
While chatting, you can use these built-in commands:
- quit, exit, bye - End the conversation
- clear - Clear conversation history
- help - Show available commands
- model - Display current models (both chat and analysis)
- models - List all available models
- actions - Show available actions the AI can choose from
Python API
You can also use OllamaPy programmatically:
from ollamapy import OllamaClient, ModelManager, AnalysisEngine, ChatSession, TerminalInterface
# Create components
client = OllamaClient()
model_manager = ModelManager(client)
analysis_engine = AnalysisEngine("gemma2:2b", client) # Fast analysis model
chat_session = ChatSession("llama3.2:7b", client, "You are a helpful assistant")
# Start a terminal interface
terminal = TerminalInterface(model_manager, analysis_engine, chat_session)
terminal.run()
# Or use components directly
messages = [{"role": "user", "content": "Hello!"}]
for chunk in client.chat_stream("gemma3:4b", messages):
    print(chunk, end="", flush=True)
# Execute actions programmatically
from ollamapy import execute_action
execute_action("square_root", {"number": 16})
# Run vibe tests programmatically with timing analysis
from ollamapy import run_vibe_tests
success = run_vibe_tests(
    model="llama3.2:7b",
    analysis_model="gemma2:2b",
    iterations=5
)
Available Classes and Functions
Core Components:
- OllamaClient - Low-level API client for Ollama
- ModelManager - Model availability, pulling, and validation
- AnalysisEngine - AI decision-making and action selection
- ChatSession - Conversation state and response generation
- TerminalInterface - Terminal UI and user interaction
Action System:
- register_action() - Decorator for creating new actions
- execute_action() - Execute an action with parameters
- get_available_actions() - Get all registered actions
- log() - Log messages from within actions
Testing & Analysis:
- VibeTestRunner - Advanced vibe test runner with timing analysis
- run_vibe_tests() - Simple function to run vibe tests
- VibeTestReportGenerator - Generate rich HTML reports with visualizations
- TimingStats - Sophisticated timing analysis with consistency scoring
Utilities:
- convert_parameter_value() - Convert parameter types
- extract_numbers_from_text() - Extract numbers from text
- prepare_function_parameters() - Prepare parameters for function calls
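To illustrate the kind of work these utilities do, here is a minimal sketch of a number-extraction helper in the spirit of extract_numbers_from_text. The function name and behavior below are illustrative, not OllamaPy's actual implementation.

```python
import re

def extract_numbers(text):
    """Find integer and decimal literals in free text (illustrative helper).

    Returns floats so downstream actions receive a uniform numeric type.
    """
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

print(extract_numbers("convert 32 fahrenheit, then take the square root of 2.25"))
# → [32.0, 2.25]
```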
Configuration
OllamaPy connects to Ollama on http://localhost:11434 by default. If your Ollama instance is running elsewhere:
from ollamapy import OllamaClient
client = OllamaClient(base_url="http://your-ollama-server:11434")
Supported Models
OllamaPy works with any model available in Ollama. Popular options include:
Recommended for Analysis (Fast):
- gemma2:2b - Lightweight, excellent for action selection
- gemma3:4b - Balanced speed and capability
- llama3.2:3b - Fast and efficient
Recommended for Chat (Quality):
- gemma3:4b (default) - Great all-around performance
- gemma2:9b - Larger model for complex conversations
- llama3.2:7b - High-quality responses
- mistral:7b - Strong general-purpose model
- codellama:7b - Specialized for coding tasks
Performance Optimization Examples:
# Speed-optimized: Fast analysis + moderate chat
ollamapy --analysis-model gemma2:2b --model gemma3:4b
# Quality-optimized: Moderate analysis + high-quality chat
ollamapy --analysis-model gemma3:4b --model llama3.2:7b
# Balanced: Same capable model for both
ollamapy --model gemma3:4b
To see available models on your system: ollama list
Development
Clone the repository and install in development mode:
git clone https://github.com/ScienceIsVeryCool/OllamaPy.git
cd OllamaPy
pip install -e ".[dev]"
Run tests:
pytest
Run vibe tests with timing analysis:
pytest -m vibetest
Architecture Overview
OllamaPy uses a clean, modular architecture with performance monitoring:
┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
│ TerminalInterface   │   │ AnalysisEngine      │   │ ChatSession         │
│                     │   │                     │   │                     │
│ • User input        │   │ • Action select     │   │ • Conversation      │
│ • Commands          │   │ • Parameter         │   │ • Response gen      │
│ • Display           │   │   extraction        │   │ • History           │
│ • Timing display    │   │ • Timing            │   │                     │
└──────────┬──────────┘   └──────────┬──────────┘   └──────────┬──────────┘
           │                         │                         │
           └─────────────────────────┼─────────────────────────┘
                                     │
┌────────────────────────────────────┴────┐   ┌─────────────────────┐
│ Testing & Analytics                     │   │ OllamaClient        │
│                                         │   │                     │
│ • VibeTestRunner    • TimingStats       │   │ • HTTP API          │
│ • ReportGenerator   • Consistency       │   │ • Streaming         │
│ • Performance Analysis                  │   │ • Low-level         │
└─────────────────────────────────────────┘   └──────────┬──────────┘
                                                         │
                                              ┌──────────┴──────────┐
                                              │ ModelManager        │
                                              │                     │
                                              │ • Model pull        │
                                              │ • Availability      │
                                              │ • Validation        │
                                              └─────────────────────┘
Each component has a single responsibility and can be tested independently. The timing system is integrated throughout without affecting core functionality.
Troubleshooting
"Ollama server is not running!"
Make sure Ollama is installed and running:
ollama serve
Model not found
OllamaPy will automatically pull models, but you can also pull manually:
ollama pull gemma3:4b
Parameter extraction issues
- Use a more capable analysis model: ollamapy --analysis-model llama3.2:3b
- Ensure your action descriptions clearly indicate what parameters are needed
- Check that your test phrases include the expected parameters
Vibe test failures
- Try different models: ollamapy --vibetest --model gemma2:9b
- Use a separate analysis model: ollamapy --vibetest --analysis-model gemma2:2b
- Increase iterations for better statistics: ollamapy --vibetest -n 10
- Check that your test phrases clearly indicate the intended action
Performance issues
- Use a smaller model for analysis: --analysis-model gemma2:2b
- Check timing reports to identify slow actions
- Ensure sufficient system resources for your chosen models
- Check Ollama server performance with ollama ps
- Review consistency scores in vibe test reports
Slow or inconsistent timing
- Monitor consistency scores in vibe test reports
- Try different model combinations for optimal speed/accuracy
- Check system resources and Ollama server health
- Use timing analysis to identify performance bottlenecks
Project Information
- Version: 0.8.0
- License: GPL-3.0-or-later
- Author: The Lazy Artist
- Python: >=3.8
- Dependencies: requests>=2.25.0, plotly (for reports)
License
This project is licensed under the GPL-3.0-or-later license. See the LICENSE file for details.