A multi-modal agent framework with unified interfaces for different AI model providers

Multimodal Agent Framework

A Python framework providing unified interfaces for AI model providers (OpenAI, Claude, Azure) with conversation management, tool calling, and persistent storage.

Features

Unified Interface: Consistent API across OpenAI, Claude, and Azure AI models
Multimodal Support: Text and base64-encoded image inputs in the same request
Tool Calling: Automatic function schema generation and execution
Conversation Persistence: Save and restore conversations with multiple storage backends (File, AWS S3)
Agent Handoff: Continue conversations between different AI models
Token Management: Built-in cost tracking and usage monitoring

Installation

pip install multimodal-agent-framework

Configuration

Create a .env file or set environment variables:

# For OpenAI GPT models (GPT-4o, o1, GPT-5, etc.)
OPENAI_API_KEY=your_openai_api_key_here

# For Claude models (Claude 4 Sonnet, Claude 3.5, etc.)
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# For Azure OpenAI services
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your_azure_api_key_here

# For AWS S3 conversation storage
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AGENT_CONVERSATIONS_BUCKET=your-s3-bucket-name
AGENT_CONVERSATIONS_FOLDER=conversations
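Before creating an agent, you can check that the variables above are visible to Python. The sketch below is hypothetical: the `REQUIRED_ENV` mapping and the `missing_env_vars` helper are not part of the package. It treats every variable listed under a heading as required, which may be stricter than the package itself. It only reads `os.environ`, so a .env file must be loaded first (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical grouping of the variables listed above, one entry per backend.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "claude": ["ANTHROPIC_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
    "s3": [
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AGENT_CONVERSATIONS_BUCKET",
        "AGENT_CONVERSATIONS_FOLDER",
    ],
}


def missing_env_vars(backend: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables for `backend` that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[backend] if not source.get(name)]


if __name__ == "__main__":
    # Print one status line per backend for the current process environment.
    for backend in REQUIRED_ENV:
        missing = missing_env_vars(backend)
        print(f"{backend}: " + ("ok" if not missing else "missing " + ", ".join(missing)))
```

Passing an explicit `env` mapping makes the check easy to exercise in tests without touching the real environment.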

Quick Start

from multimodal_agent_framework import MultiModalAgent, OpenAIConnector, get_openai_client

# Create agent
agent = MultiModalAgent(
    connector=OpenAIConnector(get_openai_client()),
    system_prompt="You are a helpful assistant."
)

# Start conversation
response = agent.start_conversation("Hello, how are you?")
print(response)

Conversation Persistence

Save and restore agent conversations with multiple storage backends:

File Storage (Local)

from multimodal_agent_framework.conversation_manager import AgentConversationManager, AgentConversation
from multimodal_agent_framework.conversation_manager.storage import FileStorage

# Create storage and manager
storage = FileStorage(base_path='./conversations')
manager = AgentConversationManager(storage=storage)

# Create and save conversation
conversation = AgentConversation(
    agent_name='my_agent',
    chat_history=[
        {'role': 'user', 'content': 'Hello'},
        {'role': 'assistant', 'content': 'Hi there!'}
    ],
    metadata={'topic': 'greeting'}
)

manager.save_conversation('user123', 'my_agent', conversation, 'chat456')

# Load conversation later
loaded = manager.load_conversation('user123', 'my_agent', 'chat456')
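The manager addresses each conversation by user ID, agent name, and conversation ID. As an illustration only, a file backend could map that triple to one JSON file. The class below is a hypothetical stand-in: `JsonFileStore` and its `<user>/<agent>/<conversation>.json` layout are assumptions, not the package's `FileStorage`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


class JsonFileStore:
    """Hypothetical file store: one JSON document per (user, agent, conversation)."""

    def __init__(self, base_path: str) -> None:
        self.base_path = Path(base_path)

    def _path(self, user_id: str, agent_name: str, conversation_id: str) -> Path:
        # Assumed layout: <base>/<user_id>/<agent_name>/<conversation_id>.json
        return self.base_path / user_id / agent_name / f"{conversation_id}.json"

    def save(self, user_id: str, agent_name: str, conversation_id: str,
             data: Dict[str, Any]) -> Path:
        path = self._path(user_id, agent_name, conversation_id)
        path.parent.mkdir(parents=True, exist_ok=True)  # create folders on first save
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
        return path

    def load(self, user_id: str, agent_name: str,
             conversation_id: str) -> Optional[Dict[str, Any]]:
        path = self._path(user_id, agent_name, conversation_id)
        if not path.exists():
            return None  # nothing saved under this key yet
        return json.loads(path.read_text(encoding="utf-8"))
```

Keying by all three IDs keeps different users and agents isolated from each other on disk.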

AWS S3 Storage

from multimodal_agent_framework.conversation_manager import AgentConversationManager
from multimodal_agent_framework.conversation_manager.storage import S3Storage

# Create S3 storage (requires AWS credentials in environment)
storage = S3Storage(
    bucket_name='my-conversations',
    conversations_folder='agent_chats'
)
manager = AgentConversationManager(storage=storage)

# Same API as file storage (`conversation` is the AgentConversation built in the previous example)
manager.save_conversation('user123', 'my_agent', conversation, 'chat456')
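Because FileStorage and S3Storage expose the same API to AgentConversationManager, the storage backend acts as a pluggable strategy. The sketch below illustrates that idea with a structural `typing.Protocol` and an in-memory backend that could serve in unit tests. `ConversationStore`, `InMemoryStore`, `conversation_key`, the `save`/`load` method names, and the folder-prefixed key layout are all hypothetical and may not match the package's actual storage interface.

```python
from typing import Any, Dict, Optional, Protocol


class ConversationStore(Protocol):
    """Hypothetical interface that every storage backend would satisfy."""

    def save(self, key: str, data: Dict[str, Any]) -> None: ...

    def load(self, key: str) -> Optional[Dict[str, Any]]: ...


class InMemoryStore:
    """Hypothetical backend that keeps conversations in a dict (handy for tests)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def save(self, key: str, data: Dict[str, Any]) -> None:
        self._items[key] = data

    def load(self, key: str) -> Optional[Dict[str, Any]]:
        return self._items.get(key)


def conversation_key(user_id: str, agent_name: str, conversation_id: str,
                     folder: str = "") -> str:
    """Build a '/'-separated key, e.g. an S3 object key under an optional folder."""
    parts = [folder] if folder else []
    parts += [user_id, agent_name, f"{conversation_id}.json"]
    return "/".join(parts)


def save_conversation(store: ConversationStore, user_id: str, agent_name: str,
                      conversation_id: str, data: Dict[str, Any]) -> str:
    # Callers depend only on the protocol, so any backend can be swapped in.
    key = conversation_key(user_id, agent_name, conversation_id)
    store.save(key, data)
    return key
```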

Agent Handoff with Persistence

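# Assumption: ClaudeConnector and get_claude_client are exported from the top-level package like their OpenAI counterparts
from multimodal_agent_framework import MultiModalAgent, OpenAIConnector, ClaudeConnector, get_openai_client, get_claude_client
from multimodal_agent_framework.conversation_manager import AgentConversation

# `manager` is the AgentConversationManager created in the storage examples above

# Start with OpenAI agent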
openai_agent = MultiModalAgent(
    connector=OpenAIConnector(get_openai_client()),
    system_prompt="You are a technical advisor."
)

response, chat_history = openai_agent.execute_user_ask("Explain microservices")

# Save conversation
conversation = AgentConversation(
    agent_name='technical_advisor',
    chat_history=chat_history,
    metadata={'topic': 'microservices'}
)
manager.save_conversation('user123', 'technical_advisor', conversation, 'session1')

# Load and continue with Claude
loaded = manager.load_conversation('user123', 'technical_advisor', 'session1')
claude_agent = MultiModalAgent(
    connector=ClaudeConnector(get_claude_client()),
    system_prompt="You are a code reviewer."
)

# Continue conversation with loaded history
response, updated_history = claude_agent.execute_user_ask(
    "Review the microservices approach",
    chat_history=loaded.chat_history
)
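Handoff works because chat_history is a list of role/content dictionaries that another connector can replay. Providers differ in the details, though. For example, Anthropic's Messages API takes the system prompt as a separate top-level `system` parameter rather than as a message with role "system". The sketch below shows one way a connector might prepare an OpenAI-originated history for Claude. `to_claude_messages` is hypothetical and not the framework's own code.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, Any]


def to_claude_messages(chat_history: List[Message],
                       system_prompt: str) -> Tuple[str, List[Message]]:
    """Prepare a provider-neutral history for a Claude request (hypothetical helper).

    Earlier system messages are dropped so the receiving agent's own prompt applies;
    only user/assistant turns are kept.
    """
    messages: List[Message] = []
    for message in chat_history:
        if message["role"] in ("user", "assistant"):
            messages.append({"role": message["role"], "content": message["content"]})
        # Other roles ("system", OpenAI-style "tool" results) are skipped in this sketch;
        # a real connector would need to convert tool results instead of dropping them.
    return system_prompt, messages
```

The returned pair corresponds to the `system` and `messages` arguments of the Anthropic SDK's `client.messages.create(...)`. Dropping the earlier system prompt lets "You are a code reviewer." take over, while the user and assistant turns carry the context forward.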

Advanced Features

Tool Calling

Use generate_function_schema to convert Python callables into the tool schema expected by the connectors and pass the resulting list through the tools argument of execute_user_ask.

from multimodal_agent_framework import generate_function_schema

def get_weather(location: str) -> dict:
    """Get weather information for a location"""
    return {"text": f"The weather in {location} is sunny and 75°F"}

tools = [generate_function_schema(get_weather)]

response, updated_history = agent.execute_user_ask(
    user_input="What's the weather in New York?",
    tools=tools,
    model="gpt-4o-mini"
)
print(response)
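To illustrate what schema generation involves, the sketch below builds a tool definition in the shape of OpenAI's function-calling format from a callable's signature, type hints, and docstring. It uses only `inspect` and `typing`. It is not `generate_function_schema` itself, whose exact output this page does not show. `sketch_function_schema` and its annotation-to-JSON-type mapping are assumptions, and `get_weather` here is a variant with a hypothetical optional `unit` parameter.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints

# Assumed mapping from Python annotations to JSON Schema types.
_JSON_TYPES: Dict[Any, str] = {
    str: "string", int: "integer", float: "number",
    bool: "boolean", list: "array", dict: "object",
}


def sketch_function_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function tool definition from a Python callable."""
    hints = get_type_hints(func)
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped parameters fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the model must supply it
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def get_weather(location: str, unit: str = "F") -> dict:
    """Get weather information for a location"""
    return {"text": f"The weather in {location} is sunny and 75°{unit}"}
```

The docstring becomes the tool description the model sees, so a precise one-line docstring directly improves tool selection.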

Multimodal Input

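import base64

# Read and base64-encode an image file ("photo.png" is a placeholder path)
with open("photo.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Process image with text
response = agent.add_message(
    text="Describe this image",
    base64_image={"data": base64_image, "img_fmt": "png"}
)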
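A connector has to translate the `{"data": ..., "img_fmt": ...}` dictionary into each provider's image format. OpenAI's Chat Completions API accepts an `image_url` content part whose URL can be a base64 data URL. Anthropic's Messages API uses an `image` block with a `base64` source and a `media_type`. The helpers below are a hypothetical sketch of that translation, not the framework's connector code.

```python
from typing import Any, Dict

ImageInput = Dict[str, str]  # {"data": <base64 string>, "img_fmt": "png" | "jpeg" | ...}


def _media_type(img_fmt: str) -> str:
    fmt = img_fmt.lower()
    return "image/jpeg" if fmt == "jpg" else f"image/{fmt}"  # normalize the "jpg" alias


def to_openai_part(image: ImageInput) -> Dict[str, Any]:
    """OpenAI Chat Completions content part carrying the image as a data URL."""
    url = f"data:{_media_type(image['img_fmt'])};base64,{image['data']}"
    return {"type": "image_url", "image_url": {"url": url}}


def to_claude_block(image: ImageInput) -> Dict[str, Any]:
    """Anthropic Messages API image block with an inline base64 source."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": _media_type(image["img_fmt"]),
            "data": image["data"],
        },
    }
```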

Token Monitoring

connector = OpenAIConnector(get_openai_client())  # e.g. the OpenAI connector from Quick Start

agent = MultiModalAgent(
    connector=connector,
    system_prompt="You are a helpful assistant.",
    token_callback=lambda tokens: print(f"Used: {tokens} tokens")
)
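Since token_callback takes a callable, a small stateful object can accumulate usage across calls instead of only printing it. The sketch assumes the callback receives a single integer token count, as the lambda above suggests. The real callback may pass a richer structure such as separate input and output counts. `TokenTracker` and its `price_per_million` parameter are hypothetical, and you would supply your provider's actual rate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenTracker:
    """Hypothetical callback object that accumulates token usage across calls."""

    price_per_million: float = 0.0  # supply your model's rate; no default price is assumed
    calls: List[int] = field(default_factory=list)

    def __call__(self, tokens: int) -> None:
        # Invoked like the lambda above, once per reported usage.
        self.calls.append(tokens)

    @property
    def total_tokens(self) -> int:
        return sum(self.calls)

    @property
    def estimated_cost(self) -> float:
        return self.total_tokens * self.price_per_million / 1_000_000
```

Pass an instance as `token_callback=tracker` when constructing MultiModalAgent, then read `tracker.total_tokens` or `tracker.estimated_cost` after a session.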

Supported Models

OpenAI: GPT-4o, o1/o3 series, GPT-5 series, search-enabled models
Claude: Claude 4 Sonnet, Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
Azure: Azure OpenAI models, Azure AI Inference

Development

Setup

git clone <repository-url>
cd multimodalagentframework
pip install -r requirements.txt
pip install -e ".[dev]"  # Install with development dependencies (quotes stop shells such as zsh from expanding the brackets)

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=multimodal_agent_framework

# Run specific test file
pytest tests/test_function_schema_generator.py

Code Quality

# Format code
black .

# Check formatting (run before committing)
black --check .

# Type checking
mypy multimodal_agent_framework/

License

MIT License

Project details

Maintainers

uditk2

Release history

0.1.10 (this version): Oct 7, 2025
0.1.9: Sep 28, 2025
0.1.7: Sep 22, 2025
0.1.6: Sep 16, 2025
0.1.4: Sep 16, 2025

Download files

Source Distribution

multimodal_agent_framework-0.1.10.tar.gz (40.7 kB)

Uploaded Oct 7, 2025 Source

Built Distribution

multimodal_agent_framework-0.1.10-py3-none-any.whl (32.7 kB)

Uploaded Oct 7, 2025 Python 3

File details

Details for the file multimodal_agent_framework-0.1.10.tar.gz.

File metadata

Download URL: multimodal_agent_framework-0.1.10.tar.gz
Upload date: Oct 7, 2025
Size: 40.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for multimodal_agent_framework-0.1.10.tar.gz
Algorithm	Hash digest
SHA256	`27350f5c73ecf23eac0b0e593ae4c4017c288423e5271fec732b38ae41e3bf9b`
MD5	`8ab7acbf86907ceb11b86a4dc2744350`
BLAKE2b-256	`e69994f5eb03f79e4e477c609cec0e14901bad980620ca18bd7af7b6998c5c16`
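To check a downloaded distribution against the SHA256 digest listed above, stream the file through `hashlib` and compare hex digests. The `sha256_of` and `verify_download` names below are illustrative. pip can also enforce hashes itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for multimodal_agent_framework-0.1.10.tar.gz
EXPECTED_SDIST_SHA256 = "27350f5c73ecf23eac0b0e593ae4c4017c288423e5271fec732b38ae41e3bf9b"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str = EXPECTED_SDIST_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_hex.lower()
```

For the wheel, call `verify_download(path, "<wheel SHA256 from its hash table>")` with the digest listed under the wheel's file details.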

Provenance

The following attestation bundles were made for multimodal_agent_framework-0.1.10.tar.gz:

Publisher: ci-cd.yml on uditk2/multimodalagentframework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: multimodal_agent_framework-0.1.10.tar.gz
- Subject digest: 27350f5c73ecf23eac0b0e593ae4c4017c288423e5271fec732b38ae41e3bf9b
- Sigstore transparency entry: 588643552
- Sigstore integration time: Oct 7, 2025
Source repository:
- Permalink: uditk2/multimodalagentframework@2fd91f06ba20607c1ba6cf764898c4aab1b86ed0
- Branch / Tag: refs/tags/v0.1.11
- Owner: https://github.com/uditk2
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-cd.yml@2fd91f06ba20607c1ba6cf764898c4aab1b86ed0
- Trigger Event: release

File details

Details for the file multimodal_agent_framework-0.1.10-py3-none-any.whl.

File metadata

Download URL: multimodal_agent_framework-0.1.10-py3-none-any.whl
Upload date: Oct 7, 2025
Size: 32.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for multimodal_agent_framework-0.1.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a4ea170827d34d340671b7d8d1cdad85759ec3ee6fd44691e21b4547c6bfef3a`
MD5	`37109028517c5c2b61819aef23aa1bed`
BLAKE2b-256	`0f2ba14e962d8aec4f6f08d557ec8ba6efc03307020a84c0e7f31f998faaa14f`

Provenance

The following attestation bundles were made for multimodal_agent_framework-0.1.10-py3-none-any.whl:

Publisher: ci-cd.yml on uditk2/multimodalagentframework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: multimodal_agent_framework-0.1.10-py3-none-any.whl
- Subject digest: a4ea170827d34d340671b7d8d1cdad85759ec3ee6fd44691e21b4547c6bfef3a
- Sigstore transparency entry: 588643598
- Sigstore integration time: Oct 7, 2025
Source repository:
- Permalink: uditk2/multimodalagentframework@2fd91f06ba20607c1ba6cf764898c4aab1b86ed0
- Branch / Tag: refs/tags/v0.1.11
- Owner: https://github.com/uditk2
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-cd.yml@2fd91f06ba20607c1ba6cf764898c4aab1b86ed0
- Trigger Event: release
