# Model Calling

A unified API for calling multiple LLM providers through a consistent, OpenAI-compatible interface.
## Key Features
- 🔄 OpenAI-compatible API: Uses the familiar chat completions format
- ☎️ Multiple Backends: Support for Ollama, vLLM, OpenAI, Anthropic, Cohere, and more
- 🛠️ Function Calling: Unified support for tools/function calling across models
- 📊 Streaming Support: Efficient streaming for all supported models
- 🔧 Runtime Configuration: Adjust model settings without restarting
- 📦 Importable Library: Run as a standalone service or import directly into your application
## Installation

```bash
pip install model-calling
```
## Quick Example

```python
from model_calling.client import SyncModelCallingClient

client = SyncModelCallingClient()
try:
    response = client.chat_completion(
        model="ollama/mistral-small3.1:24b",  # Use any model from any provider
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"}
        ]
    )
    print(response["choices"][0]["message"]["content"])
finally:
    client.close()
```
## Supported Providers
| Provider | Prefix | Example Models |
|---|---|---|
| Ollama (local) | ollama/ | mistral-small3.1, llama3, qwen |
| vLLM (cluster) | vllm/ | Any model deployed with vLLM |
| OpenAI | openai/ | gpt-4, gpt-3.5-turbo |
| Anthropic | anthropic/ | claude-3-opus, claude-3-sonnet |
| Cohere | cohere/ | command, command-r |
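The provider prefix is part of the model string itself: everything before the first `/` selects the backend, and the remainder is passed through as the provider's own model name. A quick sketch of that convention (the helper below is illustrative, not part of the library's public API):

```python
def split_model_id(model_id: str) -> tuple:
    """Split a 'provider/model' string into (provider, model name)."""
    provider, _, name = model_id.partition("/")
    return provider, name

# The same prefix convention covers local and hosted backends alike:
# split_model_id("ollama/mistral-small3.1:24b") -> ("ollama", "mistral-small3.1:24b")
# split_model_id("openai/gpt-4")                -> ("openai", "gpt-4")
```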
## Function Calling
Model Calling provides a consistent interface for function calling (tools) across all supported providers:
```python
import json
from model_calling.client import SyncModelCallingClient

def get_weather(location):
    """Placeholder implementation; replace with a real weather lookup."""
    return {"location": location, "forecast": "sunny", "temp_c": 18}

client = SyncModelCallingClient()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

try:
    # Initial request with tools
    response = client.chat_completion(
        model="ollama/mistral-small3.1:24b",
        messages=[
            {"role": "user", "content": "What's the weather like in Paris?"}
        ],
        tools=tools
    )

    # Check whether a function call was requested
    message = response["choices"][0]["message"]
    if "function_call" in message:
        function_name = message["function_call"]["name"]
        arguments = json.loads(message["function_call"]["arguments"])

        # Call your function with the arguments
        weather_data = get_weather(arguments["location"])

        # Continue the conversation with the function result
        final_response = client.chat_completion(
            model="ollama/mistral-small3.1:24b",
            messages=[
                {"role": "user", "content": "What's the weather like in Paris?"},
                {
                    "role": "assistant",
                    "content": "",
                    "function_call": {
                        "name": "get_weather",
                        "arguments": json.dumps({"location": "Paris, France"})
                    }
                },
                {
                    "role": "function",
                    "name": "get_weather",
                    "content": json.dumps(weather_data)
                }
            ]
        )
        print(final_response["choices"][0]["message"]["content"])
    else:
        print(message["content"])
finally:
    client.close()
```
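With more than one tool, the lookup-and-call step above generalizes to a small registry keyed by function name. The dispatch helper below is a sketch built on the message shape shown above, not library functionality:

```python
import json

def dispatch_function_call(message, registry):
    """Look up the requested function in a registry and call it with the decoded arguments."""
    call = message["function_call"]
    func = registry[call["name"]]
    kwargs = json.loads(call["arguments"])
    return func(**kwargs)

def get_weather(location):
    """Placeholder weather lookup used for illustration."""
    return {"location": location, "temp_c": 18}

registry = {"get_weather": get_weather}

# A function_call message in the shape the example above checks for
message = {
    "function_call": {
        "name": "get_weather",
        "arguments": json.dumps({"location": "Paris, France"}),
    }
}
weather_data = dispatch_function_call(message, registry)
```

Because every provider is normalized to the same message shape, one registry works regardless of which backend produced the call.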
## Using as a Service
Model Calling can be run as a service to provide a unified API for all your applications:
```bash
# Start the service
python -m model_calling
```
Then make API calls to the service:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/mistral-small3.1:24b",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ]
  }'
```
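From Python, the same endpoint can be called with nothing but the standard library. The sketch below builds the request; the `localhost:8000` address is an assumption based on the curl example above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used in the curl example (assumption)

# Same OpenAI-style request body as the curl call
payload = {
    "model": "ollama/mistral-small3.1:24b",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the service running, send the request and read the reply:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```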
## Using Hosted Providers
To use hosted providers like OpenAI and Anthropic, set your API keys in environment variables or a .env file:
```bash
# Create a .env file with your API keys
cp .env.example .env

# Edit .env with your API keys
```
Then you can use the hosted models:
```python
from model_calling.client import SyncModelCallingClient

client = SyncModelCallingClient()
try:
    # OpenAI
    response = client.chat_completion(
        model="openai/gpt-4",
        messages=[
            {"role": "user", "content": "What is quantum computing?"}
        ]
    )
    print(response["choices"][0]["message"]["content"])

    # Anthropic
    response = client.chat_completion(
        model="anthropic/claude-3-sonnet-20240229",
        messages=[
            {"role": "user", "content": "What is quantum computing?"}
        ]
    )
    print(response["choices"][0]["message"]["content"])
finally:
    client.close()
```
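A startup check can catch a missing key before a request fails mid-conversation. The variable names below follow each provider's usual convention (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `COHERE_API_KEY`) and are assumptions; confirm them against the project's `.env.example`:

```python
import os

# Conventional environment variable per hosted provider (assumed names)
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "cohere": "COHERE_API_KEY",
}

def missing_keys(providers, env=None):
    """Return the environment variables that still need to be set for the given providers."""
    env = os.environ if env is None else env
    return [
        REQUIRED_KEYS[p]
        for p in providers
        if p in REQUIRED_KEYS and not env.get(REQUIRED_KEYS[p])
    ]

# Example: warn before using hosted models
for key in missing_keys(["openai", "anthropic"]):
    print(f"Warning: {key} is not set")
```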
## Documentation
For complete documentation, visit the Model Calling Documentation.
## Examples
Check out the examples directory for more examples of how to use Model Calling, including:
- Basic chat completions
- Function calling
- Streaming responses
- Building agents
- Working with different providers
## License
MIT License