# inference.sh Python SDK

Official Python SDK for inference.sh, the AI agent runtime for serverless AI inference.

Run AI models, build AI agents, and deploy generative AI applications. Access 150+ models including FLUX, Stable Diffusion, LLMs (Claude, GPT, Gemini), video generation (Veo, Seedance), and more.
## Installation

```bash
pip install inferencesh
```
## Client usage

```python
from inferencesh import inference, TaskStatus

# Create client
client = inference(api_key="your-api-key")

# Simple synchronous usage - waits for completion by default
result = client.tasks.run({
    "app": "your-app",
    "input": {"key": "value"},
    "infra": "cloud",
    "variant": "default"
})
print(f"Task ID: {result.get('id')}")
print(f"Output: {result.get('output')}")
```
### With setup parameters

Setup parameters configure the app instance (e.g., model selection). Workers with matching setup are "warm" and skip the setup phase:

```python
result = client.tasks.run({
    "app": "your-app",
    "setup": {"model": "schnell"},  # Setup parameters
    "input": {"prompt": "hello"}
})
```
### Run options

```python
# Wait for completion (default behavior)
result = client.tasks.run(params)  # wait=True is default

# Return immediately without waiting
task = client.tasks.run(params, wait=False)
task_id = task["id"]  # Use this to check status later

# Stream updates as they happen
for update in client.tasks.run(params, stream=True):
    print(f"Status: {TaskStatus(update['status']).name}")
    if update.get("status") == TaskStatus.COMPLETED:
        print(f"Output: {update.get('output')}")
```
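With `wait=False` you typically poll until the task reaches a terminal status. A stdlib-only sketch of that loop, where the hypothetical `fake_get` stands in for `client.tasks.get(task_id)` and the numeric codes match the task status values documented below:

```python
import time

def fake_get(task_id, _state={"n": 0}):
    # Stand-in for client.tasks.get: reports RUNNING (7) twice,
    # then COMPLETED (9) on the third poll.
    _state["n"] += 1
    return {"id": task_id, "status": 9 if _state["n"] >= 3 else 7}

def poll_until_done(task_id, interval=0.01):
    # Poll until the task reaches a terminal status:
    # COMPLETED (9), FAILED (10), or CANCELLED (11).
    while True:
        task = fake_get(task_id)
        if task["status"] in (9, 10, 11):
            return task
        time.sleep(interval)

print(poll_until_done("t1")["status"])  # 9
```

In real code, swap `fake_get` for `client.tasks.get` (or just use `client.tasks.wait_for_completion`, which does this for you).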
## Task management

```python
# Get current task state
task = client.tasks.get(task_id)
print(f"Status: {TaskStatus(task['status']).name}")

# Cancel a running task
client.tasks.cancel(task_id)

# Wait for a task to complete
result = client.tasks.wait_for_completion(task_id)

# Stream updates for an existing task
with client.tasks.stream(task_id) as stream:
    for update in stream:
        print(f"Status: {TaskStatus(update['status']).name}")
        if update.get("status") == TaskStatus.COMPLETED:
            print(f"Result: {update.get('output')}")
            break

# Access final result after streaming
print(f"Final result: {stream.result}")
```
## Task status values

```python
from inferencesh import TaskStatus

TaskStatus.RECEIVED    # 1  - Task received by server
TaskStatus.QUEUED      # 2  - Task queued for processing
TaskStatus.SCHEDULED   # 3  - Task scheduled to a worker
TaskStatus.PREPARING   # 4  - Worker preparing environment
TaskStatus.SERVING     # 5  - Model being loaded
TaskStatus.SETTING_UP  # 6  - Task setup in progress
TaskStatus.RUNNING     # 7  - Task actively running
TaskStatus.UPLOADING   # 8  - Uploading results
TaskStatus.COMPLETED   # 9  - Task completed successfully
TaskStatus.FAILED      # 10 - Task failed
TaskStatus.CANCELLED   # 11 - Task was cancelled
```
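Only the last three statuses are terminal; everything before them means the task is still in flight. A minimal sketch of a terminal-status check (the enum here is a local mirror of the values above for illustration; real code should import `TaskStatus` from `inferencesh`):

```python
from enum import IntEnum

# Local mirror of the TaskStatus values above — illustration only.
class TaskStatus(IntEnum):
    RECEIVED = 1
    QUEUED = 2
    SCHEDULED = 3
    PREPARING = 4
    SERVING = 5
    SETTING_UP = 6
    RUNNING = 7
    UPLOADING = 8
    COMPLETED = 9
    FAILED = 10
    CANCELLED = 11

# Terminal states: once reached, the status will not change again.
TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}

def is_terminal(status: int) -> bool:
    return TaskStatus(status) in TERMINAL

print(is_terminal(7))  # False - RUNNING
print(is_terminal(9))  # True - COMPLETED
```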
## Sessions (stateful execution)

Sessions allow you to maintain state across multiple task invocations. The worker stays warm between calls, preserving loaded models and in-memory state.

```python
# Start a new session
result = client.tasks.run({
    "app": "my-stateful-app",
    "input": {"prompt": "hello"},
    "session": "new"
})
session_id = result.get("session_id")
print(f"Session ID: {session_id}")

# Continue the session with another call
result2 = client.tasks.run({
    "app": "my-stateful-app",
    "input": {"prompt": "remember what I said?"},
    "session": session_id
})
```
### Custom session timeout

By default, sessions expire after 60 seconds of inactivity. You can customize this with `session_timeout` (1-3600 seconds):

```python
# Create a session with a 5-minute idle timeout
result = client.tasks.run({
    "app": "my-stateful-app",
    "input": {"prompt": "hello"},
    "session": "new",
    "session_timeout": 300  # 5 minutes
})
# Session stays alive for 5 minutes after each call
```

Notes:

- `session_timeout` is only valid when `session: "new"`
- Minimum timeout: 1 second
- Maximum timeout: 3600 seconds (1 hour)
- Each successful call resets the idle timer
For complete session documentation including error handling, best practices, and advanced patterns, see the Sessions Developer Guide.
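The idle-timer semantics above can be pictured with a toy model (this is not SDK code, just an illustration of "each successful call resets the idle timer"):

```python
# Toy model of the session idle timer: each successful call
# resets the expiry to now + timeout.
class SessionTimer:
    def __init__(self, timeout_s: int):
        if not (1 <= timeout_s <= 3600):
            raise ValueError("session_timeout must be 1-3600 seconds")
        self.timeout_s = timeout_s
        self.expires_at = None

    def on_call(self, now: float):
        # A successful call resets the idle timer.
        self.expires_at = now + self.timeout_s

    def is_expired(self, now: float) -> bool:
        return self.expires_at is not None and now >= self.expires_at

t = SessionTimer(300)
t.on_call(now=0)
print(t.is_expired(now=299))  # False - still within the 300s window
t.on_call(now=299)            # this call resets the timer
print(t.is_expired(now=598))  # False - would have expired at 300 without the reset
```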
## File upload

```python
from inferencesh import UploadFileOptions

# Upload from file path
file_obj = client.files.upload("/path/to/image.png")
print(f"URI: {file_obj['uri']}")

# Upload from bytes
file_obj = client.files.upload(
    b"raw bytes data",
    UploadFileOptions(
        filename="data.bin",
        content_type="application/octet-stream"
    )
)

# Upload with options
file_obj = client.files.upload(
    "/path/to/image.png",
    UploadFileOptions(
        filename="custom_name.png",
        content_type="image/png",
        public=True  # Make publicly accessible
    )
)
```

Note: Files in task input are automatically uploaded. You only need `files.upload()` for manual uploads.
## Agent chat

Chat with AI agents using `client.agents.create()`.

### Using a template agent

Use an existing agent from your workspace by its `namespace/name@shortid`:

```python
from inferencesh import inference

client = inference(api_key="your-api-key")

# Create agent from template
agent = client.agents.create("my-org/assistant@abc123")

# Send a message with streaming
def on_message(msg):
    content = msg.get("content", [])
    for c in content:
        if c.get("type") == "text" and c.get("text"):
            print(c["text"], end="", flush=True)

response = agent.send_message("Hello!", on_message=on_message)
print(f"\nChat ID: {agent.chat_id}")
```
### Creating an ad-hoc agent

Create agents on the fly without saving to your workspace:

```python
from inferencesh import inference, AdHocAgentOptions
from inferencesh import tool, string

client = inference(api_key="your-api-key")

# Define a client tool
weather_tool = (
    tool("get_weather")
    .description("Get current weather")
    .params({"city": string("City name")})
    .handler(lambda args: '{"temp": 72, "conditions": "sunny"}')
    .build()
)

# Create ad-hoc agent
agent = client.agents.create(AdHocAgentOptions(
    core_app="infsh/claude-sonnet-4@abc123",  # LLM to use
    system_prompt="You are a helpful assistant.",
    tools=[weather_tool]
))

def on_tool_call(call):
    print(f"[Tool: {call.name}]")

# Tools with handlers are auto-executed
response = agent.send_message(
    "What's the weather in Paris?",
    on_message=on_message,
    on_tool_call=on_tool_call
)
```
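Conceptually, "tools with handlers are auto-executed" means the SDK looks up the handler registered for the requested tool name and feeds its string result back to the model. A stdlib-only toy dispatcher illustrating that flow (not the SDK's actual internals):

```python
import json

# Registry of client-tool handlers, keyed by tool name —
# mirrors the weather_tool handler defined above.
tools = {
    "get_weather": lambda args: json.dumps({"temp": 72, "conditions": "sunny"}),
}

def handle_tool_call(name: str, args: dict) -> str:
    # Look up and run the handler; the returned string goes back to the model.
    handler = tools.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return handler(args)

print(handle_tool_call("get_weather", {"city": "Paris"}))
# {"temp": 72, "conditions": "sunny"}
```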
agent methods
| Method | Description |
|---|---|
send_message(text, ...) |
Send a message to the agent |
get_chat(chat_id=None) |
Get chat history |
stop_chat(chat_id=None) |
Stop current generation |
submit_tool_result(tool_id, result_or_action) |
Submit result for a client tool (string or {action, form_data}) |
stream_messages(chat_id=None, ...) |
Stream message updates |
stream_chat(chat_id=None, ...) |
Stream chat updates |
reset() |
Start a new conversation |
### Async agent

```python
from inferencesh import async_inference

client = async_inference(api_key="your-api-key")
agent = client.agents.create("my-org/assistant@abc123")
response = await agent.send_message("Hello!")
```
## Async client

```python
from inferencesh import async_inference, TaskStatus

async def main():
    client = async_inference(api_key="your-api-key")

    # Simple usage - wait for completion
    result = await client.tasks.run({
        "app": "your-app",
        "input": {"key": "value"},
        "infra": "cloud",
        "variant": "default"
    })
    print(f"Output: {result.get('output')}")

    # Return immediately without waiting
    task = await client.tasks.run(params, wait=False)

    # Stream updates
    async for update in await client.tasks.run(params, stream=True):
        print(f"Status: {TaskStatus(update['status']).name}")
        if update.get("status") == TaskStatus.COMPLETED:
            print(f"Output: {update.get('output')}")

    # Task management
    task = await client.tasks.get(task_id)
    await client.tasks.cancel(task_id)
    result = await client.tasks.wait_for_completion(task_id)

    # Stream existing task
    async with client.tasks.stream(task_id) as stream:
        async for update in stream:
            print(f"Update: {update}")
```
## File handling

The `File` class provides a standardized way to handle files in the inference.sh ecosystem:

```python
from infsh import File

# Basic file creation
file = File(path="/path/to/file.png")

# File with explicit metadata
file = File(
    path="/path/to/file.png",
    content_type="image/png",
    filename="custom_name.png",
    size=1024  # in bytes
)

# Create from path (automatically populates metadata)
file = File.from_path("/path/to/file.png")

# Check if file exists
exists = file.exists()

# Access file metadata
print(file.content_type)  # automatically detected if not specified
print(file.size)          # file size in bytes
print(file.filename)      # basename of the file

# Refresh metadata (useful if the file has changed)
file.refresh_metadata()
```

The `File` class automatically handles:

- MIME type detection
- file size calculation
- filename extraction from path
- file existence checking
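The kind of detection listed above can be done with the standard library alone. A sketch of what automatic metadata population might look like (illustrative only, not the `File` class's actual implementation):

```python
import mimetypes
import os

def detect_metadata(path: str) -> dict:
    # Guess the MIME type from the extension, as mimetypes does.
    content_type, _ = mimetypes.guess_type(path)
    exists = os.path.exists(path)
    return {
        "filename": os.path.basename(path),          # filename extraction
        "content_type": content_type or "application/octet-stream",
        "size": os.path.getsize(path) if exists else None,  # size calculation
        "exists": exists,                             # existence check
    }

meta = detect_metadata("/tmp/example.png")
print(meta["filename"])      # example.png
print(meta["content_type"])  # image/png
```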
## Creating an app

To create an inference app, inherit from `BaseApp` and define your input/output types:

```python
from infsh import BaseApp, BaseAppInput, BaseAppOutput, File

class AppInput(BaseAppInput):
    image: str  # URL or file path to image
    mask: str   # URL or file path to mask

class AppOutput(BaseAppOutput):
    image: File

class MyApp(BaseApp):
    async def setup(self):
        # Initialize your model here
        pass

    async def run(self, app_input: AppInput) -> AppOutput:
        # Process input and return output
        result_path = "/tmp/result.png"
        return AppOutput(image=File(path=result_path))

    async def unload(self):
        # Clean up resources
        pass
```
The app lifecycle has three main methods:

- `setup()`: called when the app starts; use it to initialize models
- `run()`: called for each inference request
- `unload()`: called when shutting down; use it to free resources
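The lifecycle order can be sketched with a toy driver (illustrative only; the real runtime is more involved and the app class here is hypothetical):

```python
import asyncio

class EchoApp:
    async def setup(self):
        self.ready = True           # e.g. load a model here

    async def run(self, app_input):
        return {"echo": app_input}  # handle one request

    async def unload(self):
        self.ready = False          # free resources

async def serve(app, requests):
    # setup once, run per request, unload once at shutdown.
    await app.setup()
    outputs = [await app.run(r) for r in requests]
    await app.unload()
    return outputs

print(asyncio.run(serve(EchoApp(), ["a", "b"])))  # [{'echo': 'a'}, {'echo': 'b'}]
```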
## Resources

- Documentation — getting started guides and API reference
- Blog — tutorials on AI agents, image generation, and more
- App store — browse 150+ AI models
- Discord — community support
- GitHub — open source projects
## License

MIT © inference.sh