Inference Provider SDK (Python)

Python SDK for the Inference Provider V2 API. Build AI agents with RAG, tool calling, and MCP integration.

Installation

pip install inference-provider-sdk

Or with poetry:

poetry add inference-provider-sdk

Quick Start

from inference_provider import InferenceProviderClient

# Initialize client
client = InferenceProviderClient(
    api_key="ip_xxxxxxxxxx",
    api_secret="xxxxxxxxxxxxxx"
)

# Run agent inference
response = client.agents.run(
    agent_id="your-agent-id",
    user_message="Hello, world!"
)

print(response.response)
print(f"Cost: ${response.usage.cost}")
print(f"Tokens: {response.usage.total_tokens}")

Features

  • ✅ Full Python type hints with Pydantic
  • ✅ Both sync and async support
  • ✅ Agent inference with conversation history
  • ✅ RAG (Retrieval-Augmented Generation)
  • ✅ Tool calling (REST API, JavaScript, MCP)
  • ✅ MCP server integration
  • ✅ Provider and model management
  • ✅ Custom response formatting
  • ✅ Automatic retry with exponential backoff
  • ✅ Rate limit handling
  • ✅ Context manager support
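
The feature list mentions automatic retry with exponential backoff, but this README does not document the SDK's actual retry policy. As a rough illustration of the general technique only, the sketch below doubles the wait after each failed attempt and adds a little jitter. The helper names `backoff_delays` and `call_with_backoff`, the 0.5 s base delay, and the 10% jitter are all assumptions made for this example, not SDK behavior.

```python
import random
import time


def backoff_delays(max_retries=3, base_delay=0.5, factor=2.0):
    """Return the wait (in seconds) before each retry: base, base*factor, ..."""
    return [base_delay * (factor ** attempt) for attempt in range(max_retries)]


def call_with_backoff(func, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure, wait with exponential backoff and retry.

    Hypothetical helper for illustration only, not part of the SDK.
    Real code should catch only transient errors (timeouts, rate limits).
    """
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Up to 10% jitter so many clients do not retry in lockstep.
            sleep(delays[attempt] * (1 + random.uniform(0, 0.1)))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.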

Authentication

Environment Variables

export INFERENCE_API_KEY=ip_xxxxxxxxxx
export INFERENCE_API_SECRET=xxxxxxxxxxxxxx

# Auto-loads from environment
client = InferenceProviderClient()
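
If it helps to fail fast before constructing the client, a small pre-flight check along the following lines may be useful. It only reads the two environment variable names shown above; the helper name `resolve_credentials` and the rule that explicit arguments win over the environment are assumptions for this sketch, not documented SDK behavior.

```python
import os


def resolve_credentials(api_key=None, api_secret=None, env=None):
    """Prefer explicit arguments, then fall back to environment variables.

    Hypothetical helper; the SDK's own lookup order is not documented here.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("INFERENCE_API_KEY")
    secret = api_secret or env.get("INFERENCE_API_SECRET")
    missing = [
        name
        for name, value in (
            ("INFERENCE_API_KEY", key),
            ("INFERENCE_API_SECRET", secret),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return key, secret
```

The returned pair could then be passed as `api_key` and `api_secret` when creating the client.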

Explicit Configuration

client = InferenceProviderClient(
    api_key="ip_xxxxxxxxxx",
    api_secret="xxxxxxxxxxxxxx",
    base_url="https://your-instance.supabase.co",  # Optional
    timeout=60,  # Optional, default 60s
    max_retries=3,  # Optional, default 3
    debug=False  # Optional, default False
)

Usage Examples

Basic Agent Inference

from inference_provider import InferenceProviderClient

client = InferenceProviderClient()

response = client.agents.run(
    agent_id="agent-id",
    user_message="What is the weather today?"
)

print(response.response)

With Conversation History

from inference_provider.types import ConversationMessage

history = [
    ConversationMessage(role="user", content="My name is Alice"),
    ConversationMessage(role="assistant", content="Nice to meet you, Alice!")
]

response = client.agents.chat(
    agent_id="agent-id",
    message="What is my name?",
    history=history
)

print(response.response)
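
Because the example passes `history` explicitly, it looks as though the caller carries the conversation between calls (an assumption; the README does not say whether the server also stores it). A minimal sketch of that bookkeeping follows. `Message` is a stand-in dataclass so the snippet runs without the SDK, and `record_turn` and `max_messages` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for ConversationMessage so the sketch runs without the SDK."""
    role: str
    content: str


def record_turn(history, user_text, assistant_text, max_messages=20):
    """Return a new history with the latest turn appended.

    Only the most recent max_messages entries are kept; use an even
    number so the trimmed history still starts with a user message.
    """
    updated = history + [
        Message("user", user_text),
        Message("assistant", assistant_text),
    ]
    return updated[-max_messages:]
```

After each call, `record_turn(history, message, response.response)` would yield the history to pass into the next one.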

With RAG

response = client.agents.run_with_rag(
    agent_id="agent-id",
    message="Tell me about our product features",
    collection_id="collection-id",
    match_threshold=0.8,
    match_count=5
)

if response.rag:
    print(f"Found {response.rag.results_count} relevant documents")
    for result in response.rag.results:
        print(f"Similarity: {result.similarity}")
        print(f"Content: {result.content}")
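
The README does not define `match_threshold` and `match_count`. Assuming they behave like the usual similarity-search parameters (discard results scoring below the threshold, then keep at most `match_count` of the best), the sketch below shows that filtering on toy vectors. The use of cosine similarity and the helper names are assumptions; the service may score and rank differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, documents, match_threshold=0.8, match_count=5):
    """Return up to match_count (similarity, content) pairs scoring >= match_threshold.

    documents is a list of (content, embedding_vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), content) for content, vec in documents]
    kept = [pair for pair in scored if pair[0] >= match_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return kept[:match_count]
```

Under this reading, raising `match_threshold` trades recall for precision, while `match_count` caps how much retrieved text reaches the prompt.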

With Vision (Image Inputs)

from inference_provider.types import ImageInput

response = client.agents.run_with_vision(
    agent_id="agent-id",
    message="What do you see in this image?",
    images=[
        ImageInput(
            type="image_url",
            image_url={"url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."}
        )
    ]
)

print(response.response)
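
The `url` value above is a base64 data URL. One way to build such a string from raw image bytes or a local file, using only the standard library, is sketched below. The helper names `to_data_url` and `file_to_data_url` are hypothetical and not part of the SDK.

```python
import base64
import mimetypes


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URL: data:<mime>;base64,<payload>."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path):
    """Read an image file and return its data URL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    with open(path, "rb") as handle:
        return to_data_url(handle.read(), mime_type)
```

The result can be used as the `url` entry of the `image_url` dictionary shown in the example.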

With Variable Substitution

response = client.agents.run(
    agent_id="agent-id",
    user_message="Process this request",
    variables={
        "user_name": "Alice",
        "company_name": "Acme Corp"
    }
)

Agent Management

Create Agent

from inference_provider.types import Variable, VariableType

agent = client.agents.create(
    name="Customer Support Agent",
    description="Handles customer inquiries",
    system_prompt="You are a helpful customer support agent for {{company_name}}",
    model_name="gpt-4",
    temperature=0.7,
    max_tokens=2000,
    variables=[
        Variable(
            name="company_name",
            type=VariableType.TEXT,
            description="Company name",
            default_value="Acme Corp"
        )
    ],
    tags=["customer-support", "production"]
)

print(f"Created agent: {agent.id}")

List Agents

agents = client.agents.list()
active_agents = client.agents.list(is_active=True)

for agent in agents:
    print(f"{agent.name} ({agent.id})")

Update Agent

updated = client.agents.update(
    agent_id="agent-id",
    temperature=0.8,
    system_prompt="Updated prompt"
)

Delete Agent

client.agents.delete("agent-id")

Async Support

import asyncio
from inference_provider import AsyncInferenceProviderClient

async def main():
    async with AsyncInferenceProviderClient() as client:
        response = await client.agents.run(
            agent_id="agent-id",
            user_message="Hello, world!"
        )
        print(response.response)

asyncio.run(main())
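
A common reason to choose the async client is to run several requests concurrently. The sketch below shows the pattern with `asyncio.gather` and a semaphore to cap the number of requests in flight. `fake_run` is a stand-in for `await client.agents.run(...)` so the snippet runs without credentials, and `run_many` and `max_concurrency` are hypothetical names.

```python
import asyncio


async def fake_run(agent_id, user_message):
    """Stand-in for `await client.agents.run(...)`; simulates network latency."""
    await asyncio.sleep(0.01)
    return f"[{agent_id}] echo: {user_message}"


async def run_many(agent_id, messages, max_concurrency=2):
    """Run all messages concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(message):
        async with semaphore:
            return await fake_run(agent_id, message)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(m) for m in messages))


results = asyncio.run(run_many("agent-id", ["Hello", "Goodbye"]))
```

Capping concurrency is one way to stay under a rate limit while still overlapping network waits.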

Error Handling

from inference_provider import (
    InferenceProviderClient,
    AuthenticationError,
    ValidationError,
    NotFoundError,
    RateLimitError,
    APIError,
    NetworkError
)

client = InferenceProviderClient()

try:
    response = client.agents.run(
        agent_id="agent-id",
        user_message="Hello"
    )
except AuthenticationError as e:
    print(f"Invalid credentials: {e}")
except ValidationError as e:
    print(f"Invalid input: {e.message} (field: {e.field})")
except NotFoundError as e:
    print(f"Resource not found: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e.message}")
    print(f"Reset time: {e.reset_time}")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
except NetworkError as e:
    print(f"Network error: {e}")

Context Manager

# Sync
with InferenceProviderClient() as client:
    response = client.agents.run(
        agent_id="agent-id",
        user_message="Hello"
    )

# Async (must run inside a coroutine, e.g. via asyncio.run)
async with AsyncInferenceProviderClient() as client:
    response = await client.agents.run(
        agent_id="agent-id",
        user_message="Hello"
    )

Utility Functions

Text Chunking for RAG

from inference_provider.utils import chunk_text

text = "Long document text..."
chunks = chunk_text(text, chunk_size=500, chunk_overlap=50)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i + 1}: {chunk[:100]}...")
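
The README does not say whether `chunk_size` and `chunk_overlap` count characters or tokens, or how `chunk_text` treats word boundaries. Assuming plain character counts, an overlapping sliding window could look like the sketch below. `chunk_text_sketch` is a hypothetical name and is not the SDK's implementation.

```python
def chunk_text_sketch(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut
    at one boundary still appears whole in a neighboring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, the string `"abcdefghij"` becomes `["abcd", "defg", "ghij"]`: each chunk starts 3 characters after the previous one.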

Variable Substitution

from inference_provider.utils import substitute_variables, extract_variable_names

template = "Hello {{name}}, welcome to {{company}}!"
variables = {"name": "Alice", "company": "Acme Corp"}

result = substitute_variables(template, variables)
# => "Hello Alice, welcome to Acme Corp!"

names = extract_variable_names(template)
# => ["name", "company"]
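
To make the `{{name}}` placeholder syntax concrete, here is a minimal regex-based sketch of the same idea. It is not the SDK's implementation: the function names are hypothetical, and tolerating spaces inside the braces and leaving unknown placeholders untouched are assumptions about behavior the README does not specify.

```python
import re

# Matches {{name}} and, as an assumption, also {{ name }} with inner spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute_sketch(template, variables):
    """Replace each {{name}} with variables[name]; unknown names are left as-is."""
    def replace(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return _PLACEHOLDER.sub(replace, template)


def extract_names_sketch(template):
    """Return placeholder names in order of first appearance, without duplicates."""
    return list(dict.fromkeys(_PLACEHOLDER.findall(template)))
```

Extracting the names first makes it possible to check that every placeholder has a value before sending a request.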

Type Safety

The SDK provides comprehensive type hints:

from inference_provider.types import (
    Agent,
    AgentInferenceResponse,
    AIProvider,
    AIModel,
    ToolDefinition,
    DocumentCollection,
    MCPServer
)

# Full type safety with IDE auto-completion
response: AgentInferenceResponse = client.agents.run(
    agent_id="agent-id",
    user_message="Hello"
)

print(response.usage.total_tokens)
print(response.agent.name)

Development

# Install dependencies
poetry install

# Run tests
pytest

# Run tests with coverage
pytest --cov=inference_provider

# Format code
black inference_provider tests

# Lint
ruff check inference_provider tests

# Type check
mypy inference_provider

License

MIT
