Selectools
Production-ready AI agents with tool calling, RAG, and hybrid search. Connect LLMs to your Python functions, embed and search your documents with vector + keyword fusion, stream responses in real time, and dynamically manage tools at runtime. Works with OpenAI, Anthropic, Gemini, and Ollama. Tracks costs automatically.
What's New in v0.14.1
Critical streaming fix — All streaming methods across every provider were silently dropping tool definitions:
- 13 bugs fixed:
  - stream() and astream() in OpenAI, Anthropic, Gemini, Ollama, and FallbackProvider now correctly pass tools to the API and yield ToolCall objects
  - Agents using run(stream=True), arun(stream=True), or astream() can now use tools (previously broken across all providers)
  - Ollama _format_messages() now correctly handles the tool role and assistant tool_calls in multi-turn conversations
  - FallbackProvider.astream() now has proper error handling, failover, and circuit breaker support
Test suite massively expanded — 141 new tests (total: 1100):
- Regression tests for every prior bug fix to prevent reintroduction
- Recording-provider tests that verify exact arguments passed to streaming methods
- Unit tests for 6 modules that previously only had E2E coverage (policy, structured, trace, fallback, format_messages, batch)
Full changelog: CHANGELOG.md
v0.14.0 highlights
- AgentObserver Protocol — 15 lifecycle events with run_id/call_id correlation for Langfuse, Datadog, OpenTelemetry
- Model Registry Update — 145 models with March 2026 pricing (GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro)
- OpenAI max_tokens → max_completion_tokens auto-detection for GPT-5.x and o-series models
- 11 additional bug fixes for structured output, policy bypass, memory trimming, async timeouts, and more
Why Selectools
| Capability | What You Get |
|---|---|
| Provider Agnostic | Switch between OpenAI, Anthropic, Gemini, Ollama with one line. Your tools stay identical. |
| Structured Output | Pydantic or JSON Schema response_format with auto-retry on validation failure. |
| Execution Traces | Every run() returns result.trace — structured timeline of LLM calls, tool picks, and executions. |
| Reasoning Visibility | result.reasoning surfaces why the agent chose a tool, extracted from LLM responses. |
| Provider Fallback | FallbackProvider tries providers in priority order with circuit breaker on failure. |
| Batch Processing | agent.batch() / agent.abatch() for concurrent multi-prompt classification. |
| Tool Policy Engine | Declarative allow/review/deny rules with glob patterns. Human-in-the-loop approval callbacks. |
| Hybrid Search | BM25 keyword + vector semantic search with RRF/weighted fusion and cross-encoder reranking. |
| Advanced Chunking | Fixed, recursive, semantic (embedding-based), and contextual (LLM-enriched) chunking strategies. |
| E2E Streaming | Token-level astream() with native tool call support. Parallel tool execution via asyncio.gather. |
| Dynamic Tools | Load tools from files/directories at runtime. Add, remove, replace tools without restarting. |
| Response Caching | LRU + TTL in-memory cache and Redis backend. Avoid redundant LLM calls for identical requests. |
| Routing Mode | Agent selects a tool without executing it. Use for intent classification and request routing. |
| AgentObserver Protocol | 15-event lifecycle observer with run_id/call_id correlation. Built-in LoggingObserver for structured JSON logs. |
| Production Hardened | Retries with backoff, per-tool timeouts, iteration caps, cost warnings, observability hooks + observers. |
| Library-First | Not a framework. No magic globals, no hidden state. Use as much or as little as you need. |
What's Included
- 5 LLM Providers: OpenAI, Anthropic, Gemini, Ollama + FallbackProvider (auto-failover)
- Structured Output: Pydantic / JSON Schema response_format with auto-retry
- Execution Traces: result.trace with typed timeline of every agent step
- Reasoning Visibility: result.reasoning explains why the agent chose a tool
- Batch Processing: agent.batch() / agent.abatch() for concurrent classification
- Tool Policy Engine: Declarative allow/review/deny rules with human-in-the-loop
- 4 Embedding Providers: OpenAI, Anthropic/Voyage, Gemini (free!), Cohere
- 4 Vector Stores: In-memory, SQLite, Chroma, Pinecone
- Hybrid Search: BM25 + vector fusion with Cohere/Jina reranking
- Advanced Chunking: Semantic + contextual chunking for better retrieval
- Dynamic Tool Loading: Plugin system with hot-reload support
- Response Caching: InMemoryCache and RedisCache with stats tracking
- Model Registry: 145 models as type-safe constants with pricing and metadata
- Pre-built Toolbox: 22 tools for files, data, text, datetime, web
- 28 Examples: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, and more
- AgentObserver Protocol: 15 lifecycle events with run_id correlation, LoggingObserver, OTel export
- 1100 Tests: Unit, integration, and E2E with real API calls
Install
pip install selectools # Core + basic RAG
pip install selectools[rag] # + Chroma, Pinecone, Voyage, Cohere, PyPDF
pip install selectools[cache] # + Redis cache
pip install selectools[rag,cache] # Everything
Set your API key:
export OPENAI_API_KEY="sk-..."
Quick Start
New to Selectools? Follow the 5-minute Quickstart tutorial — no API key needed.
Tool Calling Agent (No API Key)
from selectools import Agent, AgentConfig, tool
from selectools.providers.stubs import LocalProvider
@tool(description="Look up the price of a product")
def get_price(product: str) -> str:
prices = {"laptop": "$999", "phone": "$699", "headphones": "$149"}
return prices.get(product.lower(), f"No price found for {product}")
agent = Agent(
tools=[get_price],
provider=LocalProvider(),
config=AgentConfig(max_iterations=3),
)
result = agent.ask("How much is a laptop?")
print(result.content)
Tool Calling Agent (OpenAI)
from selectools import Agent, AgentConfig, OpenAIProvider, tool
from selectools.models import OpenAI
@tool(description="Search the web for information")
def search(query: str) -> str:
return f"Results for: {query}"
agent = Agent(
tools=[search],
provider=OpenAIProvider(default_model=OpenAI.GPT_4O_MINI.id),
config=AgentConfig(max_iterations=5),
)
result = agent.ask("Search for Python tutorials")
print(result.content)
RAG Agent
from selectools import OpenAIProvider
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.models import OpenAI
from selectools.rag import RAGAgent, VectorStore
embedder = OpenAIEmbeddingProvider(model=OpenAI.Embeddings.TEXT_EMBEDDING_3_SMALL.id)
store = VectorStore.create("memory", embedder=embedder)
agent = RAGAgent.from_directory(
directory="./docs",
provider=OpenAIProvider(default_model=OpenAI.GPT_4O_MINI.id),
vector_store=store,
chunk_size=500, top_k=3,
)
result = agent.ask("What are the main features?")
print(result.content)
print(agent.get_usage_summary()) # LLM + embedding costs
Hybrid Search (Keyword + Semantic)
from selectools import Agent
from selectools.rag import BM25, HybridSearcher, FusionMethod, HybridSearchTool, VectorStore
# embedder, provider, and chunked_docs as prepared in the RAG example above
store = VectorStore.create("memory", embedder=embedder)
store.add_documents(chunked_docs)
searcher = HybridSearcher(
vector_store=store,
vector_weight=0.6,
keyword_weight=0.4,
fusion=FusionMethod.RRF,
)
searcher.add_documents(chunked_docs)
# Use with agent
hybrid_tool = HybridSearchTool(searcher=searcher, top_k=5)
agent = Agent(tools=[hybrid_tool.search_knowledge_base], provider=provider)
Streaming with Parallel Tools
import asyncio
from selectools import Agent, AgentConfig
from selectools.types import StreamChunk, AgentResult
agent = Agent(
tools=[tool_a, tool_b, tool_c],
provider=provider,
config=AgentConfig(parallel_tool_execution=True), # Default: enabled
)
async for item in agent.astream("Run all tasks"):
if isinstance(item, StreamChunk):
print(item.content, end="", flush=True)
elif isinstance(item, AgentResult):
print(f"\nDone in {item.iterations} iterations")
Key Features
Hybrid Search & Reranking
Combine semantic search with BM25 keyword matching for better recall on exact terms, names, and acronyms:
from selectools.rag import BM25, HybridSearcher, CohereReranker, FusionMethod
searcher = HybridSearcher(
vector_store=store,
fusion=FusionMethod.RRF,
reranker=CohereReranker(), # Optional cross-encoder reranking
)
results = searcher.search("GDPR compliance", top_k=5)
See docs/modules/HYBRID_SEARCH.md for full documentation.
Advanced Chunking
Go beyond fixed-size splitting with embedding-aware and LLM-enriched chunking:
from selectools.rag import SemanticChunker, ContextualChunker
# Split at topic boundaries using embedding similarity
semantic = SemanticChunker(embedder=embedder, similarity_threshold=0.75)
# Enrich each chunk with LLM-generated context (Anthropic-style contextual retrieval)
contextual = ContextualChunker(base_chunker=semantic, provider=provider)
enriched_docs = contextual.split_documents(documents)
See docs/modules/ADVANCED_CHUNKING.md for full documentation.
Dynamic Tool Loading
Discover and load @tool functions from files and directories at runtime:
from selectools.tools import ToolLoader
# Load tools from a plugin directory
tools = ToolLoader.from_directory("./plugins", recursive=True)
agent.add_tools(tools)
# Hot-reload after editing a plugin
updated = ToolLoader.reload_file("./plugins/search.py")
agent.replace_tool(updated[0])
# Remove tools the agent no longer needs
agent.remove_tool("deprecated_search")
See docs/modules/DYNAMIC_TOOLS.md for full documentation.
Response Caching
Avoid redundant LLM calls with pluggable caching:
from selectools import Agent, AgentConfig, InMemoryCache
cache = InMemoryCache(max_size=1000, default_ttl=300)
agent = Agent(
tools=[...],
provider=provider,
config=AgentConfig(cache=cache),
)
# Same question twice -> second call is instant (cache hit)
agent.ask("What is Python?")
agent.reset()
agent.ask("What is Python?")
print(cache.stats) # CacheStats(hits=1, misses=1, hit_rate=50.00%)
For distributed setups: from selectools.cache_redis import RedisCache
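A minimal Redis-backed setup might look like the sketch below; the RedisCache constructor arguments shown are assumptions, so verify them against cache_redis:
from selectools import Agent, AgentConfig
from selectools.cache_redis import RedisCache
cache = RedisCache(url="redis://localhost:6379", default_ttl=300)  # args are an assumption
agent = Agent(tools=[...], provider=provider, config=AgentConfig(cache=cache))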
Routing Mode
Agent selects a tool without executing it -- use for intent classification:
config = AgentConfig(routing_only=True)
agent = Agent(tools=[send_email, schedule_meeting, search_kb], provider=provider, config=config)
result = agent.ask("Book a meeting with Alice tomorrow")
print(result.tool_name) # "schedule_meeting"
print(result.tool_args) # {"attendee": "Alice", "date": "tomorrow"}
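Because routing mode selects a tool but never executes it, dispatch is up to you. A minimal sketch, using hypothetical handler functions of your own keyed by the tool names above:
handlers = {
    "send_email": handle_email,          # hypothetical handlers you own
    "schedule_meeting": handle_meeting,
    "search_kb": handle_search,
}
if result.tool_name:
    response = handlers[result.tool_name](**result.tool_args)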
Structured Output
Get typed, validated results from the LLM:
from pydantic import BaseModel
from typing import Literal
class Classification(BaseModel):
intent: Literal["billing", "support", "sales", "cancel"]
confidence: float
priority: Literal["low", "medium", "high"]
result = agent.ask("I want to cancel my account", response_format=Classification)
print(result.parsed) # Classification(intent="cancel", confidence=0.95, priority="high")
Auto-retries with error feedback when validation fails.
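The feature list says a plain JSON Schema also works as response_format. A hedged sketch (the exact schema-dict shape accepted here is an assumption; see docs/modules/AGENT.md):
# JSON Schema instead of a Pydantic model (dict shape assumed for illustration)
schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["billing", "support", "sales", "cancel"]},
        "confidence": {"type": "number"},
    },
    "required": ["intent", "confidence"],
}
result = agent.ask("I want to cancel my account", response_format=schema)
print(result.parsed)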
Execution Traces & Reasoning
See exactly what your agent did and why:
result = agent.run("Classify this ticket")
# Structured timeline of every step
for step in result.trace:
print(f"{step.type} | {step.duration_ms:.0f}ms | {step.summary}")
# Why the agent chose a tool
print(result.reasoning) # "Customer is asking about billing, routing to billing_support"
# Export for dashboards
result.trace.to_json("trace.json")
Provider Fallback
Automatic failover with circuit breaker:
from selectools import FallbackProvider, OpenAIProvider, AnthropicProvider
provider = FallbackProvider([
OpenAIProvider(default_model="gpt-4o-mini"),
AnthropicProvider(default_model="claude-haiku"),
])
agent = Agent(tools=[...], provider=provider)
# If OpenAI is down → tries Anthropic automatically
Batch Processing
Classify multiple requests concurrently:
results = await agent.abatch(
["Cancel my subscription", "How do I upgrade?", "My payment failed"],
max_concurrency=10,
)
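A synchronous agent.batch() is also available, and example 26 mentions structured batch with error isolation. Passing response_format through batch() as below is an assumption about the signature:
# Sketch: response_format support in batch() is assumed (see example 26)
results = agent.batch(
    ["Cancel my subscription", "How do I upgrade?", "My payment failed"],
    response_format=Classification,  # the model from Structured Output above
)
for r in results:
    print(r.parsed.intent)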
Tool Policy & Human-in-the-Loop
Declarative safety rules with approval callbacks:
from selectools import ToolPolicy
policy = ToolPolicy(
allow=["search_*", "read_*"],
review=["send_*", "create_*"],
deny=["delete_*"],
)
async def confirm(tool_name, tool_args, reason):
    # get_user_approval is your own prompt (CLI, Slack, web UI) returning True/False
    return await get_user_approval(tool_name, tool_args)
config = AgentConfig(tool_policy=policy, confirm_action=confirm)
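With this config attached, the rules apply roughly as follows (a sketch; the tool names are hypothetical):
agent = Agent(tools=[search_docs, send_invoice, delete_records], provider=provider, config=config)
# search_docs(...)    matches "search_*"  -> allowed, runs immediately
# send_invoice(...)   matches "send_*"    -> routed through confirm() for approval
# delete_records(...) matches "delete_*"  -> denied, never executed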
AgentObserver Protocol
Class-based observability with run_id correlation for Langfuse, OpenTelemetry, Datadog, or custom integrations:
from selectools import Agent, AgentConfig, AgentObserver, LoggingObserver
class MyObserver(AgentObserver):
def on_tool_end(self, run_id, call_id, tool_name, result, duration_ms):
print(f"[{run_id}] {tool_name} finished in {duration_ms:.1f}ms")
def on_provider_fallback(self, run_id, failed_provider, next_provider, error):
print(f"[{run_id}] {failed_provider} failed, falling back to {next_provider}")
agent = Agent(
tools=[...], provider=provider,
config=AgentConfig(observers=[MyObserver(), LoggingObserver()]),
)
15 lifecycle events: run, LLM, tool, iteration, batch, policy, structured output, fallback, retry, memory trim. See observer.py for full reference.
E2E Streaming & Parallel Execution
- agent.astream() yields StreamChunk (text deltas) then AgentResult (final)
- Multiple tool calls execute concurrently via asyncio.gather() (3 tools @ 0.15s each = ~0.15s total)
- Fallback chain: astream -> acomplete -> complete via executor
- Context propagation with contextvars for tracing/auth
See docs/modules/STREAMING.md for full documentation.
Providers
| Provider | Streaming | Vision | Native Tools | Cost |
|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Paid |
| Anthropic | Yes | Yes | Yes | Paid |
| Gemini | Yes | Yes | Yes | Free tier |
| Ollama | Yes | No | No | Free (local) |
| Fallback | Yes | Yes | Yes | Varies (wraps others) |
| Local | No | No | No | Free (testing) |
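The one-line provider switch from the feature table looks like this in practice, using the two provider classes shown elsewhere in this README (search is the tool from Quick Start):
from selectools import Agent, OpenAIProvider, AnthropicProvider
agent = Agent(tools=[search], provider=OpenAIProvider(default_model="gpt-4o-mini"))
# Same tool, different backend: only the provider line changes
agent = Agent(tools=[search], provider=AnthropicProvider(default_model="claude-haiku"))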
from selectools.models import OpenAI, Anthropic, Gemini, Ollama
# IDE autocomplete for all 145 models with pricing metadata
model = OpenAI.GPT_4O_MINI
print(f"Cost: ${model.prompt_cost}/${model.completion_cost} per 1M tokens")
print(f"Context: {model.context_window:,} tokens")
Embedding Providers
from selectools.embeddings import (
OpenAIEmbeddingProvider, # text-embedding-3-small/large
AnthropicEmbeddingProvider, # Voyage AI (voyage-3, voyage-3-lite)
GeminiEmbeddingProvider, # FREE (text-embedding-001/004)
CohereEmbeddingProvider, # embed-english-v3.0
)
Vector Stores
from selectools.rag import VectorStore
store = VectorStore.create("memory", embedder=embedder) # Fast, no persistence
store = VectorStore.create("sqlite", embedder=embedder, db_path="docs.db") # Persistent
store = VectorStore.create("chroma", embedder=embedder, persist_directory="./chroma")
store = VectorStore.create("pinecone", embedder=embedder, index_name="my-index")
Agent Configuration
config = AgentConfig(
model="gpt-4o-mini",
temperature=0.0,
max_tokens=2000,
max_iterations=6,
max_retries=3,
retry_backoff_seconds=2.0,
request_timeout=60.0,
tool_timeout_seconds=30.0,
cost_warning_threshold=0.50,
parallel_tool_execution=True,
routing_only=False,
stream=False,
cache=None, # InMemoryCache or RedisCache
tool_policy=None, # ToolPolicy with allow/review/deny rules
confirm_action=None, # Human-in-the-loop approval callback
approval_timeout=60.0, # Seconds before auto-deny
enable_analytics=True,
verbose=False,
hooks={ # Lifecycle callbacks
"on_tool_start": lambda name, args: ...,
"on_tool_end": lambda name, result, duration: ...,
"on_llm_end": lambda response, usage: ...,
},
system_prompt="You are a helpful assistant...",
)
Tool Definition
@tool Decorator (Recommended)
from selectools import tool
@tool(description="Calculate compound interest")
def calculate_interest(principal: float, rate: float, years: int) -> str:
amount = principal * (1 + rate / 100) ** years
return f"After {years} years: ${amount:.2f}"
Tool Registry
from selectools import ToolRegistry
registry = ToolRegistry()
@registry.tool(description="Search the knowledge base")
def search_kb(query: str, max_results: int = 5) -> str:
return f"Results for: {query}"
agent = Agent(tools=registry.all(), provider=provider)
Injected Parameters
Keep secrets out of the LLM's view:
from selectools import Tool, ToolParameter  # import path assumed; see docs/modules/TOOLS.md
db_tool = Tool(
    name="query_db",
    description="Execute SQL query",
    parameters=[ToolParameter(name="sql", param_type=str, description="SQL query")],
    function=query_database,  # your own function: query_database(sql, db_connection)
    injected_kwargs={"db_connection": db_conn},  # Hidden from the LLM, injected at execution time
)
Streaming Tools
from typing import Generator
@tool(description="Process large file", streaming=True)
def process_file(filepath: str) -> Generator[str, None, None]:
with open(filepath) as f:
for i, line in enumerate(f, 1):
yield f"[Line {i}] {line.strip()}\n"
config = AgentConfig(hooks={"on_tool_chunk": lambda name, chunk: print(chunk, end="")})
Conversation Memory
from selectools import Agent, ConversationMemory
memory = ConversationMemory(max_messages=20)
agent = Agent(tools=[...], provider=provider, memory=memory)
agent.ask("My name is Alice")
agent.ask("What's my name?") # Remembers "Alice"
Cost Tracking
result = agent.ask("Search and summarize")
print(f"Total cost: ${agent.total_cost:.6f}")
print(f"Total tokens: {agent.total_tokens:,}")
print(agent.get_usage_summary())
# Includes LLM + embedding costs, per-tool breakdown
Examples
Examples are numbered by difficulty. Start from 01 and work your way up.
| # | Example | Features | API Key? |
|---|---|---|---|
| 01 | 01_hello_world.py | First agent, @tool, ask() | No |
| 02 | 02_search_weather.py | ToolRegistry, multiple tools | No |
| 03 | 03_toolbox.py | 22 pre-built tools (file, data, text, datetime) | No |
| 04 | 04_conversation_memory.py | Multi-turn memory | Yes |
| 05 | 05_cost_tracking.py | Token counting, cost warnings | Yes |
| 06 | 06_async_agent.py | arun(), concurrent agents, FastAPI | Yes |
| 07 | 07_streaming_tools.py | Generator-based streaming | Yes |
| 08 | 08_streaming_parallel.py | astream(), parallel execution, StreamChunk | Yes |
| 09 | 09_caching.py | InMemoryCache, RedisCache, cache stats | Yes |
| 10 | 10_routing_mode.py | Routing mode, intent classification | Yes |
| 11 | 11_tool_analytics.py | Call counts, success rates, timing | Yes |
| 12 | 12_observability_hooks.py | Lifecycle hooks, tool validation | Yes |
| 13 | 13_dynamic_tools.py | ToolLoader, plugins, hot-reload | Yes |
| 14 | 14_rag_basic.py | RAG pipeline, document loading, vector search | Yes + [rag] |
| 15 | 15_semantic_search.py | Pure semantic search, metadata filtering | Yes + [rag] |
| 16 | 16_rag_advanced.py | PDFs, SQLite persistence, custom chunking | Yes + [rag] |
| 17 | 17_rag_multi_provider.py | Embedding/store/chunk-size comparisons | Yes + [rag] |
| 18 | 18_hybrid_search.py | BM25 + vector fusion, RRF, reranking | Yes + [rag] |
| 19 | 19_advanced_chunking.py | Semantic and contextual chunking | Yes + [rag] |
| 20 | 20_customer_support_bot.py | Multi-tool customer support workflow | Yes |
| 21 | 21_data_analysis_agent.py | Data exploration and analysis | Yes |
| 22 | 22_ollama_local.py | Fully local LLM via Ollama | No (Ollama) |
| 23 | 23_structured_output.py | Pydantic response_format, auto-retry, JSON extraction | No |
| 24 | 24_traces_and_reasoning.py | AgentTrace timeline, reasoning visibility, JSON export | No |
| 25 | 25_provider_fallback.py | FallbackProvider, circuit breaker, failover chain | No |
| 26 | 26_batch_processing.py | batch(), abatch(), structured batch, error isolation | No |
| 27 | 27_tool_policy.py | ToolPolicy, deny_when, HITL approval, memory trimming | No |
Run any example:
python examples/01_hello_world.py # No API key needed
python examples/14_rag_basic.py # Needs OPENAI_API_KEY
Documentation
Comprehensive technical documentation is available in docs/:
| Module | Description |
|---|---|
| AGENT | Agent loop, structured output, traces, reasoning, batch, policy |
| STREAMING | E2E streaming, parallel execution, routing |
| TOOLS | Tool definition, validation, registry |
| DYNAMIC_TOOLS | ToolLoader, plugins, hot-reload |
| HYBRID_SEARCH | BM25, fusion, reranking |
| ADVANCED_CHUNKING | Semantic & contextual chunking |
| RAG | Complete RAG pipeline |
| EMBEDDINGS | Embedding providers |
| VECTOR_STORES | Storage backends |
| PROVIDERS | LLM provider adapters + FallbackProvider |
| MEMORY | Conversation memory + tool-pair trimming |
| USAGE | Cost tracking & analytics |
| MODELS | Model registry & pricing |
| PARSER | Tool call parsing |
| PROMPT | System prompt generation |
Tests
pytest tests/ -x -q # All tests
pytest tests/ -k "not e2e" # Skip E2E (no API keys needed)
1100 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, and E2E integration.
License
LGPL-3.0-or-later - Use freely in commercial applications. Only modifications to the library itself must be shared. See LICENSE.
Contributing
See CONTRIBUTING.md. We welcome contributions for new tools, providers, vector stores, examples, and documentation.
File details
Details for the file selectools-0.14.1.tar.gz.
File metadata
- Download URL: selectools-0.14.1.tar.gz
- Size: 140.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 719d3ae5d11855c7fa2264f97dbce870def959b41caffe2bfd7f3f157010c7a3 |
| MD5 | 0944946035269e3af2367bc3469f3d96 |
| BLAKE2b-256 | e68ed51169960cd3680beacbe7cd084a114a3409d9db9617da197558131c4a95 |
File details
Details for the file selectools-0.14.1-py3-none-any.whl.
File metadata
- Download URL: selectools-0.14.1-py3-none-any.whl
- Size: 145.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 998bdadd3cd1e828b8237ef28ef07a3c481de7bd9a6970e30f84d25f10724d78 |
| MD5 | 0dbd4a67bfec968c6ad91dab491c7136 |
| BLAKE2b-256 | 247970ba414a4aa156c100fae7c57b3b3b6c19b9deaa3857623558e21ede2215 |