
Selectools

PyPI version Documentation License: LGPL v3 Python 3.9+

Production-ready AI agents with tool calling, RAG, and hybrid search. Connect LLMs to your Python functions, embed and search your documents with vector + keyword fusion, stream responses in real time, and dynamically manage tools at runtime. Works with OpenAI, Anthropic, Gemini, and Ollama. Tracks costs automatically.

What's New in v0.16.4

Parallel Execution Safety — Bug fixes for parallel tool execution, guardrail mutation, and streaming usage tracking:

  • Parallel coherence + screening — Coherence checks and output screening now run correctly during parallel tool execution.
  • Guardrail immutability — Input guardrails no longer mutate the caller's message list.
  • Streaming usage tracking — astream() now correctly tracks token usage.
  • ask/aask parent_run_id — Convenience methods now propagate parent_run_id correctly.
  • 1640 tests across unit, integration, regression, and E2E suites

Full changelog: CHANGELOG.md

v0.15.x highlights
  • v0.15.0: Enterprise Reliability — Guardrails engine (5 built-in), audit logging (4 privacy levels), tool output screening (15 patterns), coherence checking
v0.14.x highlights
  • v0.14.1: Critical streaming fix — 13 bugs fixed across all providers; 141 new tests (total: 1100)
  • v0.14.0: AgentObserver Protocol (25 events), 145 models with March 2026 pricing, OpenAI max_completion_tokens auto-detection, 11 bug fixes

Why Selectools

Capability | What You Get
Provider Agnostic | Switch between OpenAI, Anthropic, Gemini, Ollama with one line. Your tools stay identical.
Structured Output | Pydantic or JSON Schema response_format with auto-retry on validation failure.
Execution Traces | Every run() returns result.trace — structured timeline of LLM calls, tool picks, and executions.
Reasoning Visibility | result.reasoning surfaces why the agent chose a tool, extracted from LLM responses.
Provider Fallback | FallbackProvider tries providers in priority order with circuit breaker on failure.
Batch Processing | agent.batch() / agent.abatch() for concurrent multi-prompt classification.
Tool Policy Engine | Declarative allow/review/deny rules with glob patterns. Human-in-the-loop approval callbacks.
Hybrid Search | BM25 keyword + vector semantic search with RRF/weighted fusion and cross-encoder reranking.
Advanced Chunking | Fixed, recursive, semantic (embedding-based), and contextual (LLM-enriched) chunking strategies.
E2E Streaming | Token-level astream() with native tool call support. Parallel tool execution via asyncio.gather.
Dynamic Tools | Load tools from files/directories at runtime. Add, remove, replace tools without restarting.
Response Caching | LRU + TTL in-memory cache and Redis backend. Avoid redundant LLM calls for identical requests.
Routing Mode | Agent selects a tool without executing it. Use for intent classification and request routing.
Guardrails Engine | Input/output validation pipeline with PII redaction, topic blocking, toxicity detection, and format enforcement.
Audit Logging | JSONL audit trail with privacy controls (redact, hash, omit) and daily rotation.
Tool Output Screening | Prompt injection detection with 15 built-in patterns. Per-tool or global.
Coherence Checking | LLM-based verification that tool calls match user intent — catches injection-driven tool misuse.
Persistent Sessions | SessionStore with JSON file, SQLite, and Redis backends. Auto-save/load with TTL expiry.
Entity Memory | LLM-based entity extraction with deduplication, LRU pruning, and system prompt injection.
Knowledge Graph | Relationship triple extraction with in-memory and SQLite storage and keyword-based querying.
Cross-Session Knowledge | Daily logs + persistent facts with auto-registered remember tool.
AgentObserver Protocol | 25-event lifecycle observer with run_id/call_id correlation. Built-in LoggingObserver for structured JSON logs.
Production Hardened | Retries with backoff, per-tool timeouts, iteration caps, cost warnings, observability hooks + observers.
Library-First | Not a framework. No magic globals, no hidden state. Use as much or as little as you need.

What's Included

  • 5 LLM Providers: OpenAI, Anthropic, Gemini, Ollama + FallbackProvider (auto-failover)
  • Structured Output: Pydantic / JSON Schema response_format with auto-retry
  • Execution Traces: result.trace with typed timeline of every agent step
  • Reasoning Visibility: result.reasoning explains why the agent chose a tool
  • Batch Processing: agent.batch() / agent.abatch() for concurrent classification
  • Tool Policy Engine: Declarative allow/review/deny rules with human-in-the-loop
  • 4 Embedding Providers: OpenAI, Anthropic/Voyage, Gemini (free!), Cohere
  • 4 Vector Stores: In-memory, SQLite, Chroma, Pinecone
  • Hybrid Search: BM25 + vector fusion with Cohere/Jina reranking
  • Advanced Chunking: Semantic + contextual chunking for better retrieval
  • Dynamic Tool Loading: Plugin system with hot-reload support
  • Response Caching: InMemoryCache and RedisCache with stats tracking
  • 146 Model Registry: Type-safe constants with pricing and metadata
  • Pre-built Toolbox: 24 tools for files, data, text, datetime, web
  • Persistent Sessions: 3 backends (JSON file, SQLite, Redis) with TTL
  • Entity Memory: LLM-based named entity extraction and tracking
  • Knowledge Graph: Triple extraction with in-memory and SQLite storage
  • Cross-Session Knowledge: Daily logs + persistent memory with remember tool
  • 38 Examples: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, and more
  • AgentObserver Protocol: 25 lifecycle events with run_id correlation, LoggingObserver, OTel export
  • 1640 Tests: Unit, integration, regression, and E2E with real API calls

Install

pip install selectools                    # Core + basic RAG
pip install selectools[rag]               # + Chroma, Pinecone, Voyage, Cohere, PyPDF
pip install selectools[cache]             # + Redis cache
pip install selectools[rag,cache]         # Everything

Set your API key:

export OPENAI_API_KEY="sk-..."

Quick Start

New to Selectools? Follow the 5-minute Quickstart tutorial — no API key needed.

Tool Calling Agent (No API Key)

from selectools import Agent, AgentConfig, tool
from selectools.providers.stubs import LocalProvider

@tool(description="Look up the price of a product")
def get_price(product: str) -> str:
    prices = {"laptop": "$999", "phone": "$699", "headphones": "$149"}
    return prices.get(product.lower(), f"No price found for {product}")

agent = Agent(
    tools=[get_price],
    provider=LocalProvider(),
    config=AgentConfig(max_iterations=3),
)

result = agent.ask("How much is a laptop?")
print(result.content)

Tool Calling Agent (OpenAI)

from selectools import Agent, AgentConfig, OpenAIProvider, tool
from selectools.models import OpenAI

@tool(description="Search the web for information")
def search(query: str) -> str:
    return f"Results for: {query}"

agent = Agent(
    tools=[search],
    provider=OpenAIProvider(default_model=OpenAI.GPT_4O_MINI.id),
    config=AgentConfig(max_iterations=5),
)

result = agent.ask("Search for Python tutorials")
print(result.content)

RAG Agent

from selectools import OpenAIProvider
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.models import OpenAI
from selectools.rag import RAGAgent, VectorStore

embedder = OpenAIEmbeddingProvider(model=OpenAI.Embeddings.TEXT_EMBEDDING_3_SMALL.id)
store = VectorStore.create("memory", embedder=embedder)

agent = RAGAgent.from_directory(
    directory="./docs",
    provider=OpenAIProvider(default_model=OpenAI.GPT_4O_MINI.id),
    vector_store=store,
    chunk_size=500, top_k=3,
)

result = agent.ask("What are the main features?")
print(result.content)
print(agent.get_usage_summary())  # LLM + embedding costs

Hybrid Search (Keyword + Semantic)

from selectools import Agent
from selectools.rag import HybridSearcher, FusionMethod, HybridSearchTool, VectorStore

# embedder, chunked_docs, and provider as defined in the earlier examples
store = VectorStore.create("memory", embedder=embedder)
store.add_documents(chunked_docs)

searcher = HybridSearcher(
    vector_store=store,
    vector_weight=0.6,
    keyword_weight=0.4,
    fusion=FusionMethod.RRF,
)
searcher.add_documents(chunked_docs)

# Use with an agent
hybrid_tool = HybridSearchTool(searcher=searcher, top_k=5)
agent = Agent(tools=[hybrid_tool.search_knowledge_base], provider=provider)

Streaming with Parallel Tools

import asyncio

from selectools import Agent, AgentConfig
from selectools.types import StreamChunk, AgentResult

# tool_a, tool_b, tool_c, and provider as defined elsewhere
agent = Agent(
    tools=[tool_a, tool_b, tool_c],
    provider=provider,
    config=AgentConfig(parallel_tool_execution=True),  # Default: enabled
)

async def main() -> None:
    async for item in agent.astream("Run all tasks"):
        if isinstance(item, StreamChunk):
            print(item.content, end="", flush=True)
        elif isinstance(item, AgentResult):
            print(f"\nDone in {item.iterations} iterations")

asyncio.run(main())

Key Features

Hybrid Search & Reranking

Combine semantic search with BM25 keyword matching for better recall on exact terms, names, and acronyms:

from selectools.rag import HybridSearcher, CohereReranker, FusionMethod

# store as created in the hybrid search quick start above
searcher = HybridSearcher(
    vector_store=store,
    fusion=FusionMethod.RRF,
    reranker=CohereReranker(),  # Optional cross-encoder reranking
)
results = searcher.search("GDPR compliance", top_k=5)

See docs/modules/HYBRID_SEARCH.md for full documentation.

Advanced Chunking

Go beyond fixed-size splitting with embedding-aware and LLM-enriched chunking:

from selectools.rag import SemanticChunker, ContextualChunker

# Split at topic boundaries using embedding similarity
semantic = SemanticChunker(embedder=embedder, similarity_threshold=0.75)

# Enrich each chunk with LLM-generated context (Anthropic-style contextual retrieval)
contextual = ContextualChunker(base_chunker=semantic, provider=provider)
enriched_docs = contextual.split_documents(documents)

See docs/modules/ADVANCED_CHUNKING.md for full documentation.

Dynamic Tool Loading

Discover and load @tool functions from files and directories at runtime:

from selectools.tools import ToolLoader

# Load tools from a plugin directory
tools = ToolLoader.from_directory("./plugins", recursive=True)
agent.add_tools(tools)

# Hot-reload after editing a plugin
updated = ToolLoader.reload_file("./plugins/search.py")
agent.replace_tool(updated[0])

# Remove tools the agent no longer needs
agent.remove_tool("deprecated_search")

See docs/modules/DYNAMIC_TOOLS.md for full documentation.

Response Caching

Avoid redundant LLM calls with pluggable caching:

from selectools import Agent, AgentConfig, InMemoryCache

cache = InMemoryCache(max_size=1000, default_ttl=300)
agent = Agent(
    tools=[...],
    provider=provider,
    config=AgentConfig(cache=cache),
)

# Same question twice -> second call is instant (cache hit)
agent.ask("What is Python?")
agent.reset()
agent.ask("What is Python?")

print(cache.stats)  # CacheStats(hits=1, misses=1, hit_rate=50.00%)

For distributed setups: from selectools.cache_redis import RedisCache
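
A minimal sketch of wiring it in; the constructor arguments below (Redis URL, TTL) are assumptions, so check the caching docs for the exact signature:

from selectools import Agent, AgentConfig
from selectools.cache_redis import RedisCache

# Hypothetical arguments; consult the module docs for the real signature.
cache = RedisCache(url="redis://localhost:6379/0", default_ttl=300)
agent = Agent(tools=[...], provider=provider, config=AgentConfig(cache=cache))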

Routing Mode

The agent selects a tool without executing it — useful for intent classification:

config = AgentConfig(routing_only=True)
agent = Agent(tools=[send_email, schedule_meeting, search_kb], provider=provider, config=config)

result = agent.ask("Book a meeting with Alice tomorrow")
print(result.tool_name)  # "schedule_meeting"
print(result.tool_args)  # {"attendee": "Alice", "date": "tomorrow"}

Structured Output

Get typed, validated results from the LLM:

from pydantic import BaseModel
from typing import Literal

class Classification(BaseModel):
    intent: Literal["billing", "support", "sales", "cancel"]
    confidence: float
    priority: Literal["low", "medium", "high"]

result = agent.ask("I want to cancel my account", response_format=Classification)
print(result.parsed)  # Classification(intent="cancel", confidence=0.95, priority="high")

Auto-retries with error feedback when validation fails.
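
response_format also accepts a raw JSON Schema dict (per the feature list above), useful when you don't want a Pydantic model. A sketch; the dict-valued result.parsed shown here is an assumption:

schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["billing", "support", "sales", "cancel"]},
        "confidence": {"type": "number"},
    },
    "required": ["intent", "confidence"],
}

result = agent.ask("I want to cancel my account", response_format=schema)
print(result.parsed)  # e.g. {"intent": "cancel", "confidence": 0.95}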

Execution Traces & Reasoning

See exactly what your agent did and why:

result = agent.run("Classify this ticket")

# Structured timeline of every step
for step in result.trace:
    print(f"{step.type} | {step.duration_ms:.0f}ms | {step.summary}")

# Why the agent chose a tool
print(result.reasoning)  # "Customer is asking about billing, routing to billing_support"

# Export for dashboards
result.trace.to_json("trace.json")

Provider Fallback

Automatic failover with circuit breaker:

from selectools import FallbackProvider, OpenAIProvider, AnthropicProvider

provider = FallbackProvider([
    OpenAIProvider(default_model="gpt-4o-mini"),
    AnthropicProvider(default_model="claude-haiku"),
])
agent = Agent(tools=[...], provider=provider)
# If OpenAI is down → tries Anthropic automatically

Batch Processing

Classify multiple requests concurrently:

# Inside an async context:
results = await agent.abatch(
    ["Cancel my subscription", "How do I upgrade?", "My payment failed"],
    max_concurrency=10,
)
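
The synchronous agent.batch() from the feature table works the same way outside an event loop. A sketch, assuming it takes the same arguments and returns one result per prompt:

results = agent.batch(
    ["Cancel my subscription", "How do I upgrade?", "My payment failed"],
    max_concurrency=10,
)
for r in results:
    print(r.content)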

Tool Policy & Human-in-the-Loop

Declarative safety rules with approval callbacks:

from selectools import ToolPolicy

policy = ToolPolicy(
    allow=["search_*", "read_*"],
    review=["send_*", "create_*"],
    deny=["delete_*"],
)

async def confirm(tool_name, tool_args, reason):
    return await get_user_approval(tool_name, tool_args)

config = AgentConfig(tool_policy=policy, confirm_action=confirm)

AgentObserver Protocol

Class-based observability with run_id correlation for Langfuse, OpenTelemetry, Datadog, or custom integrations:

from selectools import Agent, AgentConfig, AgentObserver, LoggingObserver

class MyObserver(AgentObserver):
    def on_tool_end(self, run_id, call_id, tool_name, result, duration_ms):
        print(f"[{run_id}] {tool_name} finished in {duration_ms:.1f}ms")

    def on_provider_fallback(self, run_id, failed_provider, next_provider, error):
        print(f"[{run_id}] {failed_provider} failed, falling back to {next_provider}")

agent = Agent(
    tools=[...], provider=provider,
    config=AgentConfig(observers=[MyObserver(), LoggingObserver()]),
)

25 lifecycle events: run, LLM, tool, iteration, batch, policy, structured output, fallback, retry, memory trim, guardrail, coherence, screening, session, entity, KG. See observer.py for full reference.

E2E Streaming & Parallel Execution

  • agent.astream() yields StreamChunk (text deltas) then AgentResult (final)
  • Multiple tool calls execute concurrently via asyncio.gather() (3 tools @ 0.15s each = ~0.15s total)
  • Fallback chain: astream -> acomplete -> complete via executor
  • Context propagation with contextvars for tracing/auth (see the sketch below)

See docs/modules/STREAMING.md for full documentation.
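
The last bullet can be made concrete with standard-library contextvars. A sketch; the request_id variable and whoami tool are hypothetical, and the claim being illustrated is that caller-scoped context survives the asyncio.gather fan-out:

import asyncio
import contextvars

from selectools import Agent, tool

# Hypothetical context variable set by the caller, read inside a tool.
request_id = contextvars.ContextVar("request_id", default="-")

@tool(description="Report the current request id")
def whoami() -> str:
    return f"request: {request_id.get()}"

async def main() -> None:
    agent = Agent(tools=[whoami], provider=provider)  # provider as defined earlier
    request_id.set("req-42")
    async for item in agent.astream("Which request is this?"):
        print(item)

asyncio.run(main())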

Providers

Provider | Streaming | Vision | Native Tools | Cost
OpenAI | Yes | Yes | Yes | Paid
Anthropic | Yes | Yes | Yes | Paid
Gemini | Yes | Yes | Yes | Free tier
Ollama | Yes | No | No | Free (local)
Fallback | Yes | Yes | Yes | Varies (wraps others)
Local | No | No | No | Free (testing)

from selectools.models import OpenAI, Anthropic, Gemini, Ollama

# IDE autocomplete for all 146 models with pricing metadata
model = OpenAI.GPT_4O_MINI
print(f"Cost: ${model.prompt_cost}/${model.completion_cost} per 1M tokens")
print(f"Context: {model.context_window:,} tokens")

Embedding Providers

from selectools.embeddings import (
    OpenAIEmbeddingProvider,     # text-embedding-3-small/large
    AnthropicEmbeddingProvider,  # Voyage AI (voyage-3, voyage-3-lite)
    GeminiEmbeddingProvider,     # FREE (text-embedding-001/004)
    CohereEmbeddingProvider,     # embed-english-v3.0
)
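
Any of these can be passed wherever an embedder is expected, such as the VectorStore factory from the RAG example. The no-argument Gemini constructor below is an assumption (it presumably reads its API key from the environment):

from selectools.embeddings import GeminiEmbeddingProvider
from selectools.rag import VectorStore

embedder = GeminiEmbeddingProvider()  # assumed default constructor; free tier
store = VectorStore.create("memory", embedder=embedder)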

Vector Stores

from selectools.rag import VectorStore

store = VectorStore.create("memory", embedder=embedder)           # Fast, no persistence
store = VectorStore.create("sqlite", embedder=embedder, db_path="docs.db")  # Persistent
store = VectorStore.create("chroma", embedder=embedder, persist_directory="./chroma")
store = VectorStore.create("pinecone", embedder=embedder, index_name="my-index")

Agent Configuration

config = AgentConfig(
    model="gpt-4o-mini",
    temperature=0.0,
    max_tokens=2000,
    max_iterations=6,
    max_retries=3,
    retry_backoff_seconds=2.0,
    request_timeout=60.0,
    tool_timeout_seconds=30.0,
    cost_warning_threshold=0.50,
    parallel_tool_execution=True,
    routing_only=False,
    stream=False,
    cache=None,                  # InMemoryCache or RedisCache
    tool_policy=None,            # ToolPolicy with allow/review/deny rules
    confirm_action=None,         # Human-in-the-loop approval callback
    approval_timeout=60.0,       # Seconds before auto-deny
    enable_analytics=True,
    verbose=False,
    hooks={                      # Lifecycle callbacks
        "on_tool_start": lambda name, args: ...,
        "on_tool_end": lambda name, result, duration: ...,
        "on_llm_end": lambda response, usage: ...,
    },
    system_prompt="You are a helpful assistant...",
)

Tool Definition

@tool Decorator (Recommended)

from selectools import tool

@tool(description="Calculate compound interest")
def calculate_interest(principal: float, rate: float, years: int) -> str:
    amount = principal * (1 + rate / 100) ** years
    return f"After {years} years: ${amount:.2f}"

Tool Registry

from selectools import ToolRegistry

registry = ToolRegistry()

@registry.tool(description="Search the knowledge base")
def search_kb(query: str, max_results: int = 5) -> str:
    return f"Results for: {query}"

agent = Agent(tools=registry.all(), provider=provider)

Injected Parameters

Keep secrets out of the LLM's view:

from selectools import Tool, ToolParameter

db_tool = Tool(
    name="query_db",
    description="Execute SQL query",
    parameters=[ToolParameter(name="sql", param_type=str, description="SQL query")],
    function=query_database,                     # your query function
    injected_kwargs={"db_connection": db_conn},  # Hidden from the LLM
)

The LLM only ever sees the sql parameter; db_connection is supplied at call time.

Streaming Tools

from typing import Generator

@tool(description="Process large file", streaming=True)
def process_file(filepath: str) -> Generator[str, None, None]:
    with open(filepath) as f:
        for i, line in enumerate(f, 1):
            yield f"[Line {i}] {line.strip()}\n"

config = AgentConfig(hooks={"on_tool_chunk": lambda name, chunk: print(chunk, end="")})

Conversation Memory

from selectools import Agent, ConversationMemory

memory = ConversationMemory(max_messages=20)
agent = Agent(tools=[...], provider=provider, memory=memory)

agent.ask("My name is Alice")
agent.ask("What's my name?")  # Remembers "Alice"

Cost Tracking

result = agent.ask("Search and summarize")

print(f"Total cost: ${agent.total_cost:.6f}")
print(f"Total tokens: {agent.total_tokens:,}")
print(agent.get_usage_summary())
# Includes LLM + embedding costs, per-tool breakdown
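
Registry pricing metadata (see the Providers section) also supports ahead-of-time estimates. The token counts below are hypothetical; the per-1M-token units follow the model registry snippet:

from selectools.models import OpenAI

model = OpenAI.GPT_4O_MINI
prompt_tokens, completion_tokens = 12_000, 1_500  # hypothetical workload
estimated = (prompt_tokens * model.prompt_cost
             + completion_tokens * model.completion_cost) / 1_000_000
print(f"Estimated cost: ${estimated:.6f}")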

Examples

Examples are numbered by difficulty. Start from 01 and work your way up.

# | Example | Features | API Key?
01 | 01_hello_world.py | First agent, @tool, ask() | No
02 | 02_search_weather.py | ToolRegistry, multiple tools | No
03 | 03_toolbox.py | 22 pre-built tools (file, data, text, datetime) | No
04 | 04_conversation_memory.py | Multi-turn memory | Yes
05 | 05_cost_tracking.py | Token counting, cost warnings | Yes
06 | 06_async_agent.py | arun(), concurrent agents, FastAPI | Yes
07 | 07_streaming_tools.py | Generator-based streaming | Yes
08 | 08_streaming_parallel.py | astream(), parallel execution, StreamChunk | Yes
09 | 09_caching.py | InMemoryCache, RedisCache, cache stats | Yes
10 | 10_routing_mode.py | Routing mode, intent classification | Yes
11 | 11_tool_analytics.py | Call counts, success rates, timing | Yes
12 | 12_observability_hooks.py | Lifecycle hooks, tool validation | Yes
13 | 13_dynamic_tools.py | ToolLoader, plugins, hot-reload | Yes
14 | 14_rag_basic.py | RAG pipeline, document loading, vector search | Yes + [rag]
15 | 15_semantic_search.py | Pure semantic search, metadata filtering | Yes + [rag]
16 | 16_rag_advanced.py | PDFs, SQLite persistence, custom chunking | Yes + [rag]
17 | 17_rag_multi_provider.py | Embedding/store/chunk-size comparisons | Yes + [rag]
18 | 18_hybrid_search.py | BM25 + vector fusion, RRF, reranking | Yes + [rag]
19 | 19_advanced_chunking.py | Semantic and contextual chunking | Yes + [rag]
20 | 20_customer_support_bot.py | Multi-tool customer support workflow | Yes
21 | 21_data_analysis_agent.py | Data exploration and analysis | Yes
22 | 22_ollama_local.py | Fully local LLM via Ollama | No (Ollama)
23 | 23_structured_output.py | Pydantic response_format, auto-retry, JSON extraction | No
24 | 24_traces_and_reasoning.py | AgentTrace timeline, reasoning visibility, JSON export | No
25 | 25_provider_fallback.py | FallbackProvider, circuit breaker, failover chain | No
26 | 26_batch_processing.py | batch(), abatch(), structured batch, error isolation | No
27 | 27_tool_policy.py | ToolPolicy, deny_when, HITL approval, memory trimming | No

Run any example:

python examples/01_hello_world.py   # No API key needed
python examples/14_rag_basic.py     # Needs OPENAI_API_KEY

Documentation

Read the full documentation — hosted on GitHub Pages with search, dark mode, and easy navigation.

Also available in docs/:

Module | Description
AGENT | Agent loop, structured output, traces, reasoning, batch, policy
STREAMING | E2E streaming, parallel execution, routing
TOOLS | Tool definition, validation, registry
DYNAMIC_TOOLS | ToolLoader, plugins, hot-reload
HYBRID_SEARCH | BM25, fusion, reranking
ADVANCED_CHUNKING | Semantic & contextual chunking
RAG | Complete RAG pipeline
EMBEDDINGS | Embedding providers
VECTOR_STORES | Storage backends
PROVIDERS | LLM provider adapters + FallbackProvider
MEMORY | Conversation memory + tool-pair trimming
USAGE | Cost tracking & analytics
MODELS | Model registry & pricing
PARSER | Tool call parsing
PROMPT | System prompt generation

Tests

pytest tests/ -x -q          # All tests
pytest tests/ -k "not e2e"   # Skip E2E (no API keys needed)

1640 tests covering parsing, the agent loop, providers, the RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, and E2E integration.

License

LGPL-3.0-or-later - Use freely in commercial applications. Only modifications to the library itself must be shared. See LICENSE.

Contributing

See CONTRIBUTING.md. We welcome contributions for new tools, providers, vector stores, examples, and documentation.


Roadmap | Changelog | Documentation
