A simple Python package for Ollama utilities with built-in AI vibe tests
OllamaPy
A powerful terminal-based chat interface for Ollama with AI meta-reasoning capabilities and comprehensive performance analysis. OllamaPy provides an intuitive way to interact with local AI models while featuring unique "vibe tests" that evaluate AI decision-making consistency and timing performance.
Features
- Terminal Chat Interface - Clean, user-friendly chat experience in your terminal
- Streaming Responses - Real-time streaming for natural conversation flow
- Model Management - Automatic model pulling and listing of available models
- Meta-Reasoning - AI analyzes user input and selects appropriate actions
- Extensible Actions - Easy-to-extend action system with parameter support
- AI Vibe Tests - Built-in tests to evaluate AI consistency and reliability
- Performance Analysis - Comprehensive timing analysis with consistency scoring
- Interactive Reports - Rich HTML reports with timing visualizations
- Parameter Extraction - AI intelligently extracts parameters from natural language
- Modular Architecture - Clean separation of concerns for easy testing and extension
Prerequisites
You need to have Ollama installed and running on your system.
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh
# Start the Ollama server
ollama serve
Installation
Install from PyPI:
pip install ollamapy
Or install from source:
git clone https://github.com/ScienceIsVeryCool/OllamaPy.git
cd OllamaPy
pip install .
Quick Start
Simply run the chat interface:
ollamapy
This will start a chat session with the default model (gemma3:4b). If the model isn't available locally, OllamaPy will automatically pull it for you.
Usage Examples
Basic Chat
# Start chat with default model
ollamapy
Custom Model
# Use a specific model
ollamapy --model gemma2:2b
ollamapy -m codellama:7b
Dual Model Setup (Analysis + Chat)
# Use a small, fast model for analysis and a larger model for chat
ollamapy --analysis-model gemma2:2b --model llama3.2:7b
ollamapy -a gemma2:2b -m mistral:7b
# This is great for performance - small model does action selection, large model handles conversation
System Message
# Set context for the AI
ollamapy --system "You are a helpful coding assistant specializing in Python"
ollamapy -s "You are a creative writing partner"
Combined Options
# Use custom models with system message
ollamapy --analysis-model gemma2:2b --model mistral:7b --system "You are a helpful assistant"
Meta-Reasoning System
OllamaPy features a unique meta-reasoning system where the AI analyzes user input and dynamically selects from available actions. The AI examines the intent behind your message and chooses the most appropriate response action.
Dual Model Architecture
For optimal performance, you can use two different models:
- Analysis Model: a smaller, faster model (like gemma2:2b) for quick action selection
- Chat Model: a larger, more capable model (like llama3.2:7b) for generating responses
This architecture provides the best of both worlds - fast decision-making and high-quality responses.
# Example: Fast analysis with powerful chat
ollamapy --analysis-model gemma2:2b --model llama3.2:7b
Currently Available Actions
- null - Default conversation mode. Used for normal chat when no special action is needed
- fear - Responds to disturbing or delusional content with direct feedback
- fileReader - Reads and displays file contents when user provides a file path
- directoryReader - Explores entire directory contents for project analysis
- getWeather - Provides weather information (accepts optional location parameter)
- getTime - Returns the current date and time (accepts optional timezone parameter)
- square_root - Calculates the square root of a number (requires number parameter)
- calculate - Evaluates basic mathematical expressions (requires expression parameter)
How Meta-Reasoning Works
When you send a message, the AI:
- Analyzes your input to understand intent
- Selects the most appropriate action(s) from all available actions
- Extracts any required parameters from your input
- Executes the chosen action(s) with parameters
- Responds using the action's output as context
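The five steps above can be sketched in plain Python. This is a minimal illustration of the dispatch pattern, not OllamaPy's actual internals: the `ACTIONS` registry, `LOGS` buffer, and keyword-based selection below are hypothetical stand-ins (a real analysis model chooses the action, not string matching).

```python
import re

# Hypothetical stand-ins for OllamaPy's action registry and log buffer.
ACTIONS = {}
LOGS = []

def register_action(name, description, parameters=None):
    """Register a function in the action registry (decorator)."""
    def decorator(func):
        ACTIONS[name] = {"func": func, "description": description,
                         "parameters": parameters or {}}
        return func
    return decorator

def log(message):
    LOGS.append(message)

@register_action("square_root", "Calculate the square root of a number",
                 parameters={"number": {"type": "number", "required": True}})
def square_root(number):
    log(f"[SquareRoot] sqrt({number}) = {number ** 0.5}")

def handle(user_input):
    """Analyze input, pick an action, extract parameters, execute, respond."""
    # Steps 1-2: a real analysis model would choose here; we match keywords.
    name = "square_root" if "square root" in user_input else "null"
    if name not in ACTIONS:
        return "(normal chat response)"
    # Step 3: extract the required number parameter from the text.
    match = re.search(r"-?\d+(\.\d+)?", user_input)
    params = {"number": float(match.group())} if match else {}
    # Step 4: execute the chosen action; it communicates via log().
    ACTIONS[name]["func"](**params)
    # Step 5: the chat model would respond using the logged output as context.
    return LOGS[-1]

print(handle("what is the square root of 16?"))
# → [SquareRoot] sqrt(16.0) = 4.0
```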
Creating Custom Actions
The action system is designed to be easily extensible. Here's a comprehensive guide on creating your own actions:
Basic Action Structure
from ollamapy.actions import register_action
@register_action(
    name="action_name",
    description="When to use this action",
    vibe_test_phrases=["test phrase 1", "test phrase 2"],  # Optional
    parameters={  # Optional
        "param_name": {
            "type": "string",  # or "number"
            "description": "What this parameter is for",
            "required": True  # or False
        }
    }
)
def action_name(param_name=None):
    """Your action implementation."""
    from ollamapy.actions import log
    # Log results so the AI can use them as context.
    # Actions communicate via logging, not return values.
    log(f"[Action] Result: {some_result}")
Example 1: Simple Action (No Parameters)
from ollamapy.actions import register_action, log
@register_action(
    name="joke",
    description="Use when the user wants to hear a joke or needs cheering up",
    vibe_test_phrases=[
        "tell me a joke",
        "I need a laugh",
        "cheer me up",
        "make me smile"
    ]
)
def joke():
    """Tell a random joke."""
    import random
    jokes = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "Why did the scarecrow win an award? He was outstanding in his field!",
        "Why don't eggs tell jokes? They'd crack each other up!"
    ]
    selected_joke = random.choice(jokes)
    log(f"[Joke] {selected_joke}")
Example 2: Action with Required Parameter
@register_action(
    name="convert_temp",
    description="Convert temperature between Celsius and Fahrenheit",
    vibe_test_phrases=[
        "convert 32 fahrenheit to celsius",
        "what's 100C in fahrenheit?",
        "20 degrees celsius in F"
    ],
    parameters={
        "value": {
            "type": "number",
            "description": "The temperature value to convert",
            "required": True
        },
        "unit": {
            "type": "string",
            "description": "The unit to convert from (C or F)",
            "required": True
        }
    }
)
def convert_temp(value, unit):
    """Convert temperature between units."""
    unit = unit.upper()
    if unit == 'C':
        # Celsius to Fahrenheit
        result = (value * 9/5) + 32
        log(f"[Temperature] {value}°C = {result:.1f}°F")
    elif unit == 'F':
        # Fahrenheit to Celsius
        result = (value - 32) * 5/9
        log(f"[Temperature] {value}°F = {result:.1f}°C")
    else:
        log(f"[Temperature] Error: Unknown unit '{unit}'. Use 'C' or 'F'.")
Adding Your Actions to OllamaPy
- Create a new Python file for your actions (e.g., my_actions.py)
- Import and implement your actions using the patterns above
- Import your actions module before starting OllamaPy
# my_script.py
from ollamapy import chat
import my_actions # This registers your actions
# Now start chat with your custom actions available
chat()
Vibe Tests with Performance Analysis
Vibe tests are a built-in feature that evaluates how consistently AI models interpret human intent and choose appropriate actions. The tests also include comprehensive timing analysis to help you understand both accuracy and performance characteristics.
Running Vibe Tests
# Run vibe tests with default settings
ollamapy --vibetest
# Run with multiple iterations for statistical confidence
ollamapy --vibetest -n 5
# Test a specific model
ollamapy --vibetest --model gemma2:2b -n 3
# Use dual models for testing (analysis + chat)
ollamapy --vibetest --analysis-model gemma2:2b --model llama3.2:7b -n 5
# Extended statistical analysis
ollamapy --vibetest --analysis-model gemma2:2b --model llama3.2:7b -n 10
Understanding Results
Vibe tests evaluate multiple dimensions:
Accuracy Metrics:
- Action Selection: How reliably the AI chooses the correct action
- Parameter Extraction: How accurately the AI extracts required parameters
- Consistency: How stable the AI's decisions are across multiple runs
Performance Metrics:
- Response Time: Average, median, min/max execution times
- Consistency Score: 0-100 score based on timing variability
- Performance Categories: "Very Fast", "Fast", "Moderate", "Slow", "Very Slow"
- Percentile Analysis: 25th, 75th, 95th percentiles for timing distribution
Visual Analytics:
- Interactive HTML Reports: Rich visualizations with timing charts
- Performance Comparison: Speed vs consistency scatter plots
- Per-phrase Analysis: Detailed breakdown for each test phrase
- Quadrant Analysis: Identifies optimal performance zones
Performance Insights
The timing analysis helps you:
- Optimize Model Selection: Choose the best speed/accuracy trade-offs
- Identify Bottlenecks: Find slow or inconsistent actions
- Validate Stability: Ensure consistent performance across runs
- Compare Configurations: Evaluate different model combinations
Example timing output:
Timing Analysis:
Average: 1.23s | Median: 1.15s
Range: 0.89s - 2.11s
Performance: Fast
Consistency: 87.3/100
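As a rough sketch of how a 0-100 consistency score can be derived from timing variability, the snippet below scores runs by their coefficient of variation. This is an illustration only, not necessarily the formula TimingStats actually uses; the sample run times are made up.

```python
import statistics

def consistency_score(times):
    """Score timing consistency from 0 (erratic) to 100 (perfectly stable).

    Uses the coefficient of variation (stdev / mean): lower relative
    spread yields a higher score. Illustrative formula only.
    """
    if len(times) < 2:
        return 100.0
    mean = statistics.mean(times)
    cv = statistics.stdev(times) / mean if mean > 0 else 0.0
    return max(0.0, min(100.0, 100.0 * (1.0 - cv)))

runs = [1.10, 1.15, 1.23, 1.18, 1.12]  # hypothetical seconds per analysis call
print(f"Average: {statistics.mean(runs):.2f}s | "
      f"Consistency: {consistency_score(runs):.1f}/100")
```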
Tests pass with a 60% or higher success rate, ensuring reasonable consistency in decision-making.
Chat Commands
While chatting, you can use these built-in commands:
- quit, exit, bye - End the conversation
- clear - Clear conversation history
- help - Show available commands
- model - Display current models (both chat and analysis)
- models - List all available models
- actions - Show available actions the AI can choose from
Python API
You can also use OllamaPy programmatically:
from ollamapy import OllamaClient, ModelManager, AnalysisEngine, ChatSession, TerminalInterface
# Create components
client = OllamaClient()
model_manager = ModelManager(client)
analysis_engine = AnalysisEngine("gemma2:2b", client) # Fast analysis model
chat_session = ChatSession("llama3.2:7b", client, "You are a helpful assistant")
# Start a terminal interface
terminal = TerminalInterface(model_manager, analysis_engine, chat_session)
terminal.run()
# Or use components directly
messages = [{"role": "user", "content": "Hello!"}]
for chunk in client.chat_stream("gemma3:4b", messages):
    print(chunk, end="", flush=True)
# Execute actions programmatically
from ollamapy import execute_action
execute_action("square_root", {"number": 16})
# Run vibe tests programmatically with timing analysis
from ollamapy import run_vibe_tests
success = run_vibe_tests(
    model="llama3.2:7b",
    analysis_model="gemma2:2b",
    iterations=5
)
Available Classes and Functions
Core Components:
- OllamaClient - Low-level API client for Ollama
- ModelManager - Model availability, pulling, and validation
- AnalysisEngine - AI decision-making and action selection
- ChatSession - Conversation state and response generation
- TerminalInterface - Terminal UI and user interaction
Action System:
- register_action() - Decorator for creating new actions
- execute_action() - Execute an action with parameters
- get_available_actions() - Get all registered actions
- log() - Log messages from within actions
Testing & Analysis:
- VibeTestRunner - Advanced vibe test runner with timing analysis
- run_vibe_tests() - Simple function to run vibe tests
- VibeTestReportGenerator - Generate rich HTML reports with visualizations
- TimingStats - Sophisticated timing analysis with consistency scoring
Utilities:
- convert_parameter_value() - Convert parameter types
- extract_numbers_from_text() - Extract numbers from text
- prepare_function_parameters() - Prepare parameters for function calls
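To illustrate the kind of work these utilities do, here is a minimal sketch of a number-extraction helper in the spirit of extract_numbers_from_text. The function name and behavior below are illustrative, not OllamaPy's actual implementation.

```python
import re

def extract_numbers(text):
    """Find integer and decimal literals in free text (illustrative helper).

    Returns floats so downstream actions receive a uniform numeric type.
    """
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

print(extract_numbers("convert 32 fahrenheit, then take the square root of 2.25"))
# → [32.0, 2.25]
```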
Configuration
OllamaPy connects to Ollama on http://localhost:11434 by default. If your Ollama instance is running elsewhere:
from ollamapy import OllamaClient
client = OllamaClient(base_url="http://your-ollama-server:11434")
Supported Models
OllamaPy works with any model available in Ollama. Popular options include:
Recommended for Analysis (Fast):
- gemma2:2b - Lightweight, excellent for action selection
- gemma3:4b - Balanced speed and capability
- llama3.2:3b - Fast and efficient
Recommended for Chat (Quality):
- gemma3:4b (default) - Great all-around performance
- gemma2:9b - Larger model for complex conversations
- llama3.2:7b - High-quality responses
- mistral:7b - Strong general-purpose model
- codellama:7b - Specialized for coding tasks
Performance Optimization Examples:
# Speed-optimized: Fast analysis + moderate chat
ollamapy --analysis-model gemma2:2b --model gemma3:4b
# Quality-optimized: Moderate analysis + high-quality chat
ollamapy --analysis-model gemma3:4b --model llama3.2:7b
# Balanced: Same capable model for both
ollamapy --model gemma3:4b
To see available models on your system: ollama list
Development
Clone the repository and install in development mode:
git clone https://github.com/ScienceIsVeryCool/OllamaPy.git
cd OllamaPy
pip install -e ".[dev]"
Run tests:
pytest
Run vibe tests with timing analysis:
pytest -m vibetest
Architecture Overview
OllamaPy uses a clean, modular architecture with performance monitoring:
┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
│ TerminalInterface   │   │ AnalysisEngine      │   │ ChatSession         │
│                     │   │                     │   │                     │
│ • User input        │   │ • Action select     │   │ • Conversation      │
│ • Commands          │   │ • Parameter         │   │ • Response gen      │
│ • Display           │   │   extraction        │   │ • History           │
│ • Timing display    │   │ • Timing            │   │                     │
└──────────┬──────────┘   └──────────┬──────────┘   └──────────┬──────────┘
           │                         │                         │
           └─────────────────────────┼─────────────────────────┘
                                     │
┌────────────────────────────────────┴────┐   ┌─────────────────────┐
│ Testing & Analytics                     │   │ OllamaClient        │
│                                         │   │                     │
│ • VibeTestRunner    • TimingStats       │   │ • HTTP API          │
│ • ReportGenerator   • Consistency       │   │ • Streaming         │
│ • Performance Analysis                  │   │ • Low-level         │
└─────────────────────────────────────────┘   └──────────┬──────────┘
                                                         │
                                              ┌──────────┴──────────┐
                                              │ ModelManager        │
                                              │                     │
                                              │ • Model pull        │
                                              │ • Availability      │
                                              │ • Validation        │
                                              └─────────────────────┘
Each component has a single responsibility and can be tested independently. The timing system is integrated throughout without affecting core functionality.
Troubleshooting
"Ollama server is not running!"
Make sure Ollama is installed and running:
ollama serve
Model not found
OllamaPy will automatically pull models, but you can also pull manually:
ollama pull gemma3:4b
Parameter extraction issues
- Use a more capable analysis model: ollamapy --analysis-model llama3.2:3b
- Ensure your action descriptions clearly indicate what parameters are needed
- Check that your test phrases include the expected parameters
Vibe test failures
- Try different models: ollamapy --vibetest --model gemma2:9b
- Use a separate analysis model: ollamapy --vibetest --analysis-model gemma2:2b
- Increase iterations for better statistics: ollamapy --vibetest -n 10
- Check that your test phrases clearly indicate the intended action
Performance issues
- Use a smaller model for analysis: --analysis-model gemma2:2b
- Check timing reports to identify slow actions
- Ensure sufficient system resources for your chosen models
- Check Ollama server performance with ollama ps
- Review consistency scores in vibe test reports
Slow or inconsistent timing
- Monitor consistency scores in vibe test reports
- Try different model combinations for optimal speed/accuracy
- Check system resources and Ollama server health
- Use timing analysis to identify performance bottlenecks
Project Information
- Version: 0.8.0
- License: GPL-3.0-or-later
- Author: The Lazy Artist
- Python: >=3.8
- Dependencies: requests>=2.25.0, plotly (for reports)
License
This project is licensed under the GPL-3.0-or-later license. See the LICENSE file for details.