SlimAgents

A lightweight and developer-friendly library for building and orchestrating AI agents.

SlimAgents wraps any LLM (via LiteLLM) with a simple Agent class that handles tool calling, streaming, structured outputs, multi-modal inputs, and agent handoffs — all in under 1200 lines of code.

Install

Requires Python 3.10+

pip install slimagents

Or install the latest development version:

pip install git+https://github.com/aremeis/slimagents.git

Quick start

from slimagents import Agent

def calculator(expression: str) -> str:
    """Evaluate a Python expression."""
    # eval() executes arbitrary code: fine for a demo, unsafe for untrusted input.
    return str(eval(expression))

agent = Agent(
    instructions="You are a helpful assistant. Use the calculator tool for math.",
    tools=[calculator],
)

value = agent.apply("What is 1234 * 5678?")
print(value)  # "1234 * 5678 = 7,006,652."

apply() is a synchronous convenience method that returns just the response value. For async code, call the agent directly:

value = await agent("What is 1234 * 5678?")

Tools

A tool is just a Python function. The function name, docstring, and type annotations are automatically converted to the LLM's tool schema — no decorators or registration needed.

def get_weather(city: str, unit: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"22 degrees {unit} in {city}"

agent = Agent(tools=[get_weather])
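To make the conversion concrete, here is a minimal sketch of how a function's name, docstring, and annotations could be turned into an OpenAI-style tool schema using only the standard library. This is not SlimAgents' actual implementation; the `function_to_schema` helper and its type mapping are assumptions made for illustration.

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_schema(func) -> dict:
    """Build an OpenAI-style tool schema from a plain function (illustrative only)."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_weather(city: str, unit: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"22 degrees {unit} in {city}"


schema = function_to_schema(get_weather)
print(schema["function"]["name"])
print(schema["function"]["parameters"]["required"])
```

Because `unit` has a default, only `city` ends up in the `required` list, which is roughly what an LLM needs to know to call the tool correctly.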

Async tools

Both sync and async tools are supported. When the LLM generates multiple tool calls, async tools run concurrently:

import httpx

async def fetch_url(url: str) -> str:
    """Fetch the content of a URL."""
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.text

agent = Agent(tools=[fetch_url])

Tools as methods

Tools can be methods on an Agent subclass, which allows you to encapsulate state and logic:

import python_weather
from slimagents import Agent

class WeatherAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant who answers questions about the weather.",
            tools=[self.get_temperature],
        )

    async def get_temperature(self, location: str) -> float:
        """Get the current temperature in a given location, in degrees Celsius."""
        async with python_weather.Client(unit=python_weather.METRIC) as client:
            weather = await client.get(location)
            return weather.temperature

agent = WeatherAgent()
value = agent.apply("What is the temperature difference between London and Paris?")
print(value)
# Example output:
# The temperature difference between London and Paris is 1°C, with London being warmer.

Since get_temperature is async, both calls run in parallel when the LLM requests them simultaneously.
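The effect of running async tools concurrently can be illustrated with plain `asyncio`. The sketch below is not SlimAgents' scheduler; it assumes a stand-in `get_temperature` coroutine with a simulated delay and uses `asyncio.gather` to show that the second call starts before the first one finishes.

```python
import asyncio

events = []  # records the order in which tool calls start and finish


async def get_temperature(location: str, delay: float) -> float:
    """Stand-in for an async tool; the delay simulates network latency."""
    events.append(f"start {location}")
    await asyncio.sleep(delay)
    events.append(f"end {location}")
    return 15.0 if location == "London" else 14.0


async def run_tool_calls():
    # gather() schedules both coroutines at once, so the Paris call starts
    # while the London call is still waiting.
    return await asyncio.gather(
        get_temperature("London", 0.05),
        get_temperature("Paris", 0.01),
    )


results = asyncio.run(run_tool_calls())
print(results)
print(events)
```

The results come back in the order the calls were issued, even though the shorter Paris call completes first.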

LLM support

SlimAgents uses LiteLLM under the hood, so you can use virtually any LLM. The default model is gpt-4.1. Specify any model string that LiteLLM supports:

# OpenAI
agent = Agent(model="gpt-4.1-mini")

# Anthropic
agent = Agent(model="anthropic/claude-sonnet-4-20250514")

# Google Gemini
agent = Agent(model="gemini/gemini-2.5-flash")

# Azure, AWS Bedrock, Ollama, etc. — see LiteLLM docs

Any extra keyword arguments are passed through to LiteLLM:

agent = Agent(model="gpt-4.1", api_key="sk-...", base_url="https://my-proxy.com")

LiteLLM parameters

All LiteLLM-specific parameters are supported via keyword arguments:

# Retry transient errors (429, 500+) with exponential backoff
agent = Agent(num_retries=3)

# Fallback to a different model on context window errors
agent = Agent(
    model="gpt-4.1",
    context_window_fallback_dict={"gpt-4.1": "gpt-4.1-mini"},
)

# Model fallbacks on any failure
agent = Agent(model="gpt-4.1", fallbacks=["anthropic/claude-sonnet-4-20250514"])

Instructions

Instructions become the system message. They can be a string or a callable for dynamic instructions:

from datetime import date

# Static instructions
agent = Agent(instructions="You are a helpful assistant.")

# Dynamic instructions via callable
agent = Agent(instructions=lambda: f"Today's date is {date.today()}")
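As a rough illustration of the string-or-callable contract, the following sketch resolves either form into a system message in OpenAI chat format. The helper names `resolve_instructions` and `build_system_message` are hypothetical and are not part of the SlimAgents API.

```python
from datetime import date


def resolve_instructions(instructions) -> str:
    """Return the prompt text, calling `instructions` first if it is callable."""
    return instructions() if callable(instructions) else instructions


def build_system_message(instructions) -> dict:
    # In OpenAI chat format the instructions become the "system" message.
    return {"role": "system", "content": resolve_instructions(instructions)}


static = build_system_message("You are a helpful assistant.")
dynamic = build_system_message(lambda: f"Today's date is {date.today()}")
print(static)
print(dynamic)
```

Resolving the callable at message-build time, rather than once at construction, is what lets the prompt reflect the current date on every call.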

You can also override the instructions property in a subclass for full control:

class StrictAgent(Agent):
    def __init__(self, max_responses: int):
        super().__init__(tools=[self.decrement])
        self._answers_left = max_responses

    @property
    def instructions(self) -> str:
        if self._answers_left > 0:
            return f"You have {self._answers_left} responses left. Call `decrement` before each response."
        return "You always answer 'I can't answer that.'."

    def decrement(self):
        """Call this before every response."""
        self._answers_left -= 1
        return "OK"

Memory

Memory is a list of message dicts in OpenAI chat format. There are two levels:

  • Default memory (agent.memory): always included in every call, set at construction or via the property.
  • Per-call memory: passed to run() / apply() and tracks the conversation for that call.

agent = Agent(instructions="You are a helpful assistant.")

# Maintain a conversation across multiple calls
memory = []
agent.apply("My name is Alice.", memory=memory)
value = agent.apply("What's my name?", memory=memory)
print(value)  # "Your name is Alice."

Use memory_delta to capture only the new messages added during a call:

delta = []
agent.apply("Hello!", memory=memory, memory_delta=delta)
print(len(delta))  # Number of new messages (user message + assistant response + any tool calls)
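The relationship between per-call memory and memory_delta can be sketched without an LLM. The example below assumes, based on the snippets above, that both lists are extended in place; `fake_apply` is a hypothetical stand-in that echoes the input instead of calling a model.

```python
from typing import Optional


def fake_apply(user_text: str, memory: list, memory_delta: Optional[list] = None) -> str:
    """Hypothetical stand-in for an agent call: echoes instead of calling an LLM."""
    new_messages = [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": f"You said: {user_text}"},
    ]
    memory.extend(new_messages)  # per-call memory accumulates the whole conversation
    if memory_delta is not None:
        memory_delta.extend(new_messages)  # delta receives only this call's messages
    return new_messages[-1]["content"]


memory = []
fake_apply("My name is Alice.", memory=memory)

delta = []
fake_apply("Hello!", memory=memory, memory_delta=delta)
print(len(memory))  # 4
print(len(delta))   # 2
```

The delta is useful when you want to persist or log only what a single call added, while the memory list keeps the full history.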

Handoffs

A tool can return an Agent instance to transfer control to a different agent. The new agent inherits the conversation memory:

sales_agent = Agent(
    name="Sales Agent",
    instructions="You are a sales agent. Help the customer with purchases.",
)

support_agent = Agent(
    name="Support Agent",
    instructions="You are a support agent. Help with technical issues.",
)

def transfer_to_sales():
    """Transfer the customer to the sales team."""
    return sales_agent

def transfer_to_support():
    """Transfer the customer to the support team."""
    return support_agent

triage = Agent(
    name="Triage",
    instructions="Route the customer to the right department.",
    tools=[transfer_to_sales, transfer_to_support],
)

response = triage.run_sync("I want to buy a new laptop.")
print(response.agent.name)  # "Sales Agent"
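The handoff mechanism can be pictured as a loop that swaps the active agent whenever a tool returns another agent. The toy below is only a sketch of that idea: `MiniAgent`, its keyword-based `route` method, and `run` are hypothetical and replace the LLM's tool choice with a simple string match.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MiniAgent:
    """Toy agent: `route` stands in for the LLM's decision to call a tool."""
    name: str
    tools: List[Callable] = field(default_factory=list)

    def route(self, text: str):
        # Hypothetical rule-based "model": pick the first tool whose name
        # suffix (e.g. "sales") appears in the user text.
        for tool in self.tools:
            if tool.__name__.split("_")[-1] in text.lower():
                return tool
        return None


def run(agent: MiniAgent, text: str) -> MiniAgent:
    """Follow handoffs until the active agent no longer returns another agent."""
    active = agent
    while True:
        tool = active.route(text)
        result = tool() if tool else None
        if isinstance(result, MiniAgent):
            active = result  # handoff: the returned agent takes over
        else:
            return active


sales_agent = MiniAgent(name="Sales Agent")
support_agent = MiniAgent(name="Support Agent")


def transfer_to_sales():
    return sales_agent


def transfer_to_support():
    return support_agent


triage = MiniAgent(name="Triage", tools=[transfer_to_sales, transfer_to_support])
print(run(triage, "I have a sales question about laptops.").name)
```

The key point is that the tool's return type, not a special API call, signals the transfer of control.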

Nested agent calls (non-handoff)

If you want an agent to process a sub-task and return the result as a tool output (without transferring control), use ToolResult:

from slimagents import ToolResult

researcher = Agent(instructions="You are a research assistant.")

def research(topic: str):
    """Research a topic using a specialized agent."""
    return ToolResult(agent=researcher, handoff=False)

agent = Agent(tools=[research])
# The researcher processes the topic, and its response becomes the tool result.
# Control stays with `agent`.

Structured outputs

Use response_format to get typed responses instead of plain strings.

Pydantic models

from pydantic import BaseModel
from slimagents import Agent

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str

agent = Agent[MovieReview](
    instructions="You are a movie critic.",
    response_format=MovieReview,
)

review = agent.apply("Review The Matrix")
print(review.title)    # "The Matrix"
print(review.rating)   # 9.0
print(review.summary)  # "A groundbreaking sci-fi film..."

JSON mode

Pass dict to get a parsed JSON dictionary:

agent = Agent[dict](response_format=dict)
data = agent.apply("Return a JSON object with fields: name, age")
print(data["name"])  # str

Primitive types

You can also use int, float, bool, or list as the response format:

agent = Agent[int](response_format=int)
count = agent.apply("How many continents are there?")
print(count)  # 7 (int, not str)
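One plausible way to turn raw model text into a typed primitive is to parse it as JSON and check the resulting type. The sketch below shows that approach under stated assumptions; `parse_primitive` is a hypothetical helper, and SlimAgents may handle this differently internally.

```python
import json


def parse_primitive(raw: str, response_format: type):
    """Parse raw model text as JSON and check it against the requested type."""
    value = json.loads(raw)
    is_bool = isinstance(value, bool)
    if response_format is float and isinstance(value, int) and not is_bool:
        return float(value)  # json.loads("7") yields an int, so widen it
    # bool is a subclass of int in Python, so reject it explicitly for int.
    if not isinstance(value, response_format) or (response_format is int and is_bool):
        raise ValueError(f"expected {response_format.__name__}, got {value!r}")
    return value


print(parse_primitive("7", int))
print(parse_primitive("[1, 2, 3]", list))
print(parse_primitive("true", bool))
```

A ValueError here is the kind of failure that a parse-retry mechanism could feed back to the model as a correction.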

Multi-modal inputs

Pass file-like objects, bytes, FileContent, or URLs alongside text. The agent handles base64 encoding and MIME type detection automatically:

from slimagents import Agent

agent = Agent(
    model="gemini/gemini-2.0-flash",
    instructions="Describe the contents of the provided files.",
)

# File object
with open("photo.jpg", "rb") as f:
    description = agent.apply("What's in this image?", f)

# Other file types, such as PDF documents
with open("report.pdf", "rb") as pdf:
    summary = agent.apply("Summarize this document", pdf)

For programmatic file content, use FileContent:

from slimagents.core import FileContent

# image_bytes is a placeholder for raw PNG bytes obtained elsewhere
content = FileContent(
    content=image_bytes,
    filename="chart.png",
    mime_type="image/png",
)
description = agent.apply("Describe this chart", content)
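For readers curious what base64 encoding and MIME type detection involve, here is a minimal standard-library sketch that turns raw bytes into a data URL. The `to_data_url` helper is an assumption for illustration and does not reflect how SlimAgents builds its messages.

```python
import base64
import mimetypes


def to_data_url(content: bytes, filename: str) -> str:
    """Encode raw bytes as a data URL, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # generic fallback for unknown types
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


png_header = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature that starts every PNG file
url = to_data_url(png_header, "chart.png")
print(url)
```

Guessing from the filename is the simplest strategy; inspecting the leading bytes of the content is a more robust alternative when no filename is available.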

Streaming

Enable streaming to receive tokens as they arrive:

response = await agent.run("Tell me a story", stream=True)

async for chunk in response:
    if isinstance(chunk, str):
        print(chunk, end="", flush=True)

Fine-tune what gets streamed:

response = await agent.run(
    "Tell me a story",
    stream=True,
    stream_tokens=True,        # Yield individual tokens as strings (default: True)
    stream_delimiters=True,    # Yield MessageDelimiter events for message boundaries
    stream_tool_calls=True,    # Yield tool call deltas as they arrive
    stream_response=True,      # Yield the final Response object at the end of the stream
)

When stream_response=True, the final item in the stream is a Response object:

from slimagents import Response

async for chunk in response:
    if isinstance(chunk, Response):
        print(f"\nTokens used: {chunk.metadata.total_tokens}")
    elif isinstance(chunk, str):
        print(chunk, end="")
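The consumption pattern above, string tokens followed by one final object, can be tried out without a model. In this sketch `fake_stream` and `FinalResponse` are hypothetical stand-ins for the real stream and the Response class.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FinalResponse:
    """Hypothetical stand-in for the final Response object."""
    value: str
    total_tokens: int


async def fake_stream(text: str):
    """Yield space-separated tokens, then one final summary object."""
    tokens = text.split(" ")
    for i, token in enumerate(tokens):
        # Re-attach the separating space so joined tokens reproduce the text.
        yield token if i == 0 else " " + token
    yield FinalResponse(value=text, total_tokens=len(tokens))


async def consume():
    pieces, final = [], None
    async for chunk in fake_stream("Once upon a time"):
        if isinstance(chunk, FinalResponse):
            final = chunk  # last item: the complete response
        elif isinstance(chunk, str):
            pieces.append(chunk)  # incremental token text
    return "".join(pieces), final


text, final = asyncio.run(consume())
print(text)
print(final.total_tokens)
```

Dispatching on the chunk type keeps the consumer loop simple even when several kinds of events share one stream.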

The Response object

run() and run_sync() return a Response[T] with:

response = agent.run_sync("Hello!")

response.value          # The response content (str, dict, or BaseModel depending on response_format)
response.memory_delta   # List of messages added during this call
response.agent          # The agent that produced the response (may differ from original if handoff occurred)
response.metadata       # ResponseMetadata with token counts and cost

ResponseMetadata tracks usage across all turns:

meta = response.metadata
meta.input_tokens       # Total input tokens
meta.output_tokens      # Total output tokens
meta.total_tokens       # Total tokens
meta.cost               # Total cost (USD)

Interactive CLI

Use run_demo_loop to quickly test an agent in your terminal:

from slimagents import Agent, run_demo_loop

agent = Agent(instructions="You are a helpful assistant.")
run_demo_loop(agent, stream=True)

Example session:

Starting SlimAgents CLI 🪶
User: Hello!
Agent: Hi there! How can I help you today?
User:

Logging

SlimAgents uses Python's standard logging module:

import logging
from slimagents import logger

logging.basicConfig(level=logging.INFO)
logger.setLevel(logging.DEBUG)  # Verbose agent logs

Hooks

Agent exposes a set of async lifecycle hooks you can override in a subclass to extend behavior: tracing, RAG injection, guardrails, semantic validation, tool authorization, graceful error recovery, and more. Hooks default to a no-op (except _on_tool_error, which re-raises by default), so overriding is opt-in.

| Hook | When it fires | Return |
| --- | --- | --- |
| _on_run_start(run_id, memory, memory_delta) | Once at the start of a run | |
| _before_chat_completion(messages) | Just before every LLM call | (possibly modified) messages list |
| _after_chat_completion(message, completion) | After every LLM call | (possibly modified) assistant message |
| _before_tool_call(tool_call) | Before each tool executes | None to proceed, or a ToolResult to short-circuit |
| _after_tool_call(tool_call, result) | After each tool returns | (possibly modified) ToolResult |
| _on_tool_error(tool_call, exc) | Tool argument parsing or execution raised | ToolResult to recover (default re-raises) |
| _on_handoff(from_agent, to_agent) | Control is transferred to another agent | |
| _validate_response(value, message) | Final response has been parsed against response_format | None to accept, or an error string to retry |
| _on_run_end(response) | Once with the final Response before it's returned | (possibly modified) Response |

Hooks fire for both streaming and non-streaming runs, and for recursive sub-agent calls (via ToolResult(agent=..., handoff=False)).

Injecting context before every call

class RagAgent(Agent):
    async def _before_chat_completion(self, messages):
        # retrieve_for() is a placeholder for your own async retrieval function.
        snippets = await retrieve_for(messages[-1]["content"])
        return messages + [{"role": "system", "content": f"Context:\n{snippets}"}]

Validating tool arguments / mocking tools

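from slimagents import Agent, ToolResult

class GuardedAgent(Agent):
    async def _before_tool_call(self, tool_call):
        # user_is_admin() is a placeholder for your own authorization check.
        if tool_call["function"]["name"] == "delete_user" and not user_is_admin():
            return ToolResult(value="Permission denied.")  # short-circuit the tool
        return None  # proceed with the real tool call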

Recovering from tool errors

By default a tool raising an exception aborts the run. Override _on_tool_error to convert the exception into a tool result and let the loop continue:

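from slimagents import Agent, ToolResult

class ResilientAgent(Agent):
    async def _on_tool_error(self, tool_call, exc):
        # Report the failure to the model instead of aborting the run.
        return ToolResult(value=f"Tool failed: {exc}. Try a different approach.")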
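The difference between the default behavior and a recovering handler comes down to where the exception is caught. The following sketch shows the general pattern in plain Python; `call_tool`, `divide`, and `recover` are hypothetical names and do not mirror SlimAgents' internals.

```python
def call_tool(tool, arguments: dict, on_error=None):
    """Run a tool; if it raises, let `on_error` turn the exception into a result."""
    try:
        return tool(**arguments)
    except Exception as exc:
        if on_error is None:
            raise  # no handler: the error propagates and aborts the run
        return on_error(exc)


def divide(a: float, b: float) -> float:
    return a / b


def recover(exc: Exception) -> str:
    return f"Tool failed: {exc}. Try a different approach."


ok = call_tool(divide, {"a": 6, "b": 3}, on_error=recover)
failed = call_tool(divide, {"a": 1, "b": 0}, on_error=recover)
print(ok)
print(failed)
```

Returning the error text as a tool result gives the model a chance to pick different arguments or another tool on its next turn.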

Semantic response validation with retry

_validate_response lets you reject a response that parses correctly but is semantically wrong, and prompt the model to try again. It shares the response_format_retries budget with the built-in JSON-parse retry.

When response_format is set, value is the parsed object (Pydantic model / dict / primitive). Otherwise it's the raw string content.

from pydantic import BaseModel
from slimagents import Agent

class Score(BaseModel):
    value: float
    reason: str

class StrictAgent(Agent):
    async def _validate_response(self, value: Score, message):
        if not 0.0 <= value.value <= 1.0:
            return f"value must be in [0, 1], got {value.value}"
        if len(value.reason) < 10:
            return "reason must be at least 10 characters"
        return None  # accept

agent = StrictAgent(
    instructions="Score the input from 0 to 1 and justify briefly.",
    response_format=Score,
    response_format_retries=3,  # up to 3 retries shared with JSON-parse retries
)
score = await agent("Is the sky blue?")

On a returned error string, slimagents appends a corrective user message to the conversation and lets the model try again — same mechanism as the built-in JSON-parse retry, but driven by your own rules.
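The retry mechanism described above can be sketched as a small loop: generate, validate, and on failure append a corrective message before trying again. This is an illustration under stated assumptions rather than the library's code; `run_with_validation`, `fake_model`, and the wording of the corrective message are all hypothetical.

```python
from typing import Callable, List, Optional


def run_with_validation(
    generate: Callable[[List[dict]], str],
    validate: Callable[[str], Optional[str]],
    prompt: str,
    retries: int = 3,
) -> str:
    """Call `generate`, re-prompting with the validator's error until it passes."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(retries + 1):  # first attempt plus up to `retries` retries
        answer = generate(messages)
        error = validate(answer)
        if error is None:
            return answer
        # Feed the rejected answer and the error back as a corrective message.
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user", "content": f"Invalid response: {error}"})
    raise ValueError(f"no valid response after {retries} retries")


# Hypothetical fake model: answers out of range first, then corrects itself.
canned = iter(["1.7", "0.9"])


def fake_model(messages: List[dict]) -> str:
    return next(canned)


def in_unit_interval(answer: str) -> Optional[str]:
    value = float(answer)
    return None if 0.0 <= value <= 1.0 else f"value must be in [0, 1], got {value}"


result = run_with_validation(fake_model, in_unit_interval, "Score this from 0 to 1.")
print(result)
```

Bounding the loop with a retry budget prevents a model that never satisfies the validator from running indefinitely.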

Origin

SlimAgents started as a fork of OpenAI's Swarm framework. Major differences:

  • Works with any LLM (not just OpenAI)
  • Designed for subclassing Agent to encapsulate behavior
  • Async-native with concurrent tool execution
  • Multi-modal input support
  • Structured outputs with Pydantic
  • Proper Python logging

License

MIT
