A comprehensive framework for building agents with Small Language Models

effGen

Build AI Agents with Small Language Models

Fast • Efficient • Powerful

📰 News & Updates

	Date	Update
🖼️	21 May 2026	v0.2.8 Released: First-class multimodal input — image, audio, and video across 6 providers (Gemini, OpenAI, Groq, Anthropic, Together, HF). New `multimodal` preset, `MultimodalDescribeTool`, unified `Message` content schema, 5 cookbook walkthroughs. See changelog
📚	20 May 2026	v0.2.7 Released: 31 prompt templates across 7 domains — research, coding, data/SQL, legal, medical, creative, business — with golden eval harness, interactive playground, and auto-generated gallery. See changelog
🚀	19 May 2026	v0.2.6 Released: 14 new tools — OCR, AudioTranscribe, ImageInfo, ImageCaption, PDF, DOCX, Excel, Weather, Geocode, Maps, EmailSMTP, EmailIMAP, SlackWebhook, DiscordWebhook. New presets: `media`, `notify`. 58+ built-in tools total. See changelog
🚀	18 May 2026	v0.2.5 Released: 13 new free tools — PubMed, ArXiv, SemanticScholar, RSS, News, YouTubeTranscript, YouTubeMetadata, Reddit, HackerNews, Translate, LanguageDetect, QRGenerate, QRRead. 44+ built-in tools total. See changelog
🚀	14 May 2026	v0.2.4 Released: ModelRouter with CostBased/LatencyBased/FirstAvailable policies, transparent provider failover, cross-process SQLite rate-limit coordination, persistent cost tracker + `effgen cost` dashboard CLI. See changelog
🚀	4 May 2026	v0.2.3 Released: 5 new cloud backends (Groq, Together AI, Fireworks, Replicate, HuggingFace Inference) — 9 providers total. Unified ProviderRegistry, `effgen doctor` auth check, backend parity matrix. See changelog
🚀	25 Apr 2026	v0.2.1 Released: Cerebras backend (4 free-tier models, streaming, native tool-calling, rate-limit coordinator, cost tracking) + OpenAI gpt-5/gpt-5.4-nano/o-series with `reasoning_effort`, prompt caching, structured outputs v2, and OpenAI native tools (web_search, code_interpreter, file_search). See changelog
🚀	9 Apr 2026	v0.2.0 Released: Major release — native tool calling, guardrails, multi-agent orchestration, RAG pipeline, 31 tools, eval framework, production API server, MLX Apple Silicon support, Python & TypeScript SDKs. See changelog
🍎	8 Apr 2026	MLX & Apple Silicon support merged (PR #4): Native Metal GPU acceleration via MLX & MLX-VLM backends. `pip install effgen[mlx]`
🔧	25 Mar 2026	v0.1.3 Released: Verification hardening — smarter loop detection, "skip the tool" prompting, model-aware token counting, sub-agent depth limits, circuit breaker persistence. See changelog
🔧	12 Mar 2026	v0.1.2 Released: Test-driven hardening — 10 example agents, 19 bug fixes, cross-model compatibility matrix (11 models, 73% pass rate). See changelog
🔒	6 Mar 2026	v0.1.1 Released: Stabilization — fixed license/metadata consistency, improved error handling, added 6 examples, expanded test suite. See changelog
🎉	1 Mar 2026	v0.1.0 Released: Major feature release — 14 built-in tools, agent presets, plugin system, real streaming, memory integration, ACP/MCP protocols, CI/CD, and comprehensive test suite. See changelog
🔧	3 Feb 2026	v0.0.2 Released: vLLM backend fixes with automatic chat template support, GPU memory control, improved OOM error handling, and multi-model family compatibility
📄	2 Feb 2026	Preprint available: effGen: Enabling Small Language Models as Capable Autonomous Agents
🚀	31 Jan 2026	Initial release of effGen framework (v0.0.1)

🤔 What is effGen?

effGen transforms Small Language Models into powerful AI agents. While most frameworks require massive LLMs, effGen is optimized from the ground up for efficient, smaller models — delivering fast, capable agents without the compute overhead.

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, PythonREPL

# Load a small but mighty model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")

# Create agent with tools
config = AgentConfig(
    name="math_agent",
    model=model,
    tools=[Calculator(), PythonREPL()]
)
agent = Agent(config=config)

# Run computation
result = agent.run("What is 24344 * 334?")
print(f"Answer: {result.output}")

⚡ Installation

Requires Python 3.10 or newer. Tested on Python 3.10, 3.11, 3.12, 3.13.

📦 From PyPI (Recommended)

pip install effgen

🍎 Apple Silicon (MLX)

pip install effgen[mlx]          # Text models on Apple Silicon
pip install effgen[mlx-vlm]      # Vision-Language models on Apple Silicon

🚀 With vLLM for Faster Inference

pip install effgen[vllm]

📊 Optional Extras

pip install effgen[cerebras]  # Cerebras inference backend (cerebras-cloud-sdk)
pip install effgen[rag]       # RAG pipeline (sentence-transformers, faiss-cpu)
pip install effgen[finance]   # Finance tools (yfinance)
pip install effgen[data]      # Data science tools (matplotlib, plotly)
pip install effgen[eval]      # Evaluation (rouge-score, nltk)
pip install effgen[gguf]      # GGUF model support (llama-cpp-python)

🔧 From Source

git clone https://github.com/ctrl-gaurav/effGen.git
cd effGen

# Quick install
./install.sh

# Full install (includes vLLM + dev tools)
./install.sh --full

# Manual install
pip install -e .

🚀 Quick Start

💻 CLI Usage

# Run a task
effgen run "What is the capital of France?"

# Interactive chat
effgen chat

# Start API server
effgen serve --port 8000

# Interactive wizard
effgen

🐍 Python API

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator

# Load model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")

# Configure agent
config = AgentConfig(
    name="calculator_agent",
    model=model,
    tools=[Calculator()],
    system_prompt="You are a helpful math assistant."
)

# Create and run
agent = Agent(config=config)
result = agent.run("Calculate 15% tip on $85.50")
print(result.output)
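
When checking an agent's arithmetic, an independent reference value is handy. The following is a minimal standard-library sketch, not part of effGen; the helper name `tip_amount` and the half-up rounding rule are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def tip_amount(bill: str, percent: str) -> Decimal:
    """Return the tip rounded to whole cents using half-up rounding."""
    raw = Decimal(bill) * Decimal(percent) / Decimal(100)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Reference values for the prompt "Calculate 15% tip on $85.50"
tip = tip_amount("85.50", "15")
total = Decimal("85.50") + tip
print(f"tip={tip} total={total}")
```

Using `Decimal` rather than `float` keeps the half-cent case (12.825) from being rounded unpredictably.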

✨ Features

🧠 SLM Optimized (small models)
🍎 Apple Silicon (MLX + Metal GPU)
🛡️ Guardrails (PII, injection, safety)
📚 RAG Pipeline (ingest, search, cite)
👥 Multi-Agent (DAG workflows)
🖼️ Multimodal (image/audio/video)
🏭 Production API (OpenAI-compatible)

🆕 What's New in v0.2.8

effGen v0.2.8 makes multimodal input a first-class citizen. Send images, audio clips, and short video to any vision-capable provider through a unified Message schema — the adapter handles the translation, not your code. No breaking API changes.

Image input — Gemini, OpenAI gpt-4o, Groq, Anthropic (code-only), Together, HF. Automatic resize/MIME validation via image_pre.py. Raises CapabilityNotSupportedError cleanly when the provider doesn't support vision.

Audio input — Gemini native inline audio, OpenAI Whisper transcription + gpt-4o audio, HF Inference ASR. Auto-downsamples to 16 kHz mono; chunks files over provider max duration. Anthropic raises CapabilityNotSupportedError.

Video input — Gemini native video for providers that accept raw video; frame-sampling fallback (ffmpeg) for all others. MissingSystemDependency with install hints when ffmpeg is absent.

Unified message schema — TextPart, ImagePart, AudioPart, VideoPart form a typed ContentPart union. Message.content is always a List[ContentPart]; backwards-compatible string constructor still works.

multimodal preset — create_agent("multimodal", model) wires Gemini Flash-Lite (primary) + OpenAI gpt-4o-mini (fallback) with ImageInfo, ImageCaption, OCR, AudioTranscribe, MultimodalDescribeTool, and the full tool suite.

5 cookbook walkthroughs — image Q&A, audio transcribe + reason, video summarize, OCR + LLM structured extraction, chart reading from an image.
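
The audio chunking rule mentioned above (split files that exceed the provider's maximum duration) comes down to interval arithmetic. This is a rough sketch of that idea only, not effGen's implementation; the function name `chunk_spans` and the 30-second limit are hypothetical.

```python
def chunk_spans(duration_s: float, max_chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive spans no longer than max_chunk_s."""
    if duration_s < 0 or max_chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and max_chunk_s must be > 0")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


# A 75-second clip with an assumed 30-second provider limit
spans = chunk_spans(75.0, 30.0)
print(spans)
```

Each span could then be sent as a separate request and the transcripts concatenated in order.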

from effgen import image_from, audio_from, video_from, load_model
from effgen.core.messages import Message, Role
from effgen.presets import create_agent

model = load_model("gemini-2.0-flash", provider="gemini")
agent = create_agent("multimodal", model)

# Image question
img = image_from("https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/240px-PNG_transparency_demonstration_1.png")
msg = Message(role=Role.USER, content=[img, "What is in this image?"])
result = agent.run_message(msg)
print(result.output)

# Audio transcription
aud = audio_from("/tmp/clip.mp3")
msg = Message(role=Role.USER, content=[aud, "Transcribe and summarize."])
result = agent.run_message(msg)

# Multimodal preset via CLI
effgen run --preset multimodal "Describe this image" --image /tmp/photo.jpg

See the multimodal overview and the cookbook index.
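
To picture how a typed content union can stay backwards-compatible with plain strings, here is a small self-contained sketch. It borrows the names `TextPart` and `ImagePart` from the description above, but the fields (`text`, `url`) and the helper `normalize_content` are assumptions for illustration and may not match effGen's actual classes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TextPart:
    text: str  # assumed field name


@dataclass(frozen=True)
class ImagePart:
    url: str  # assumed field name


ContentPart = Union[TextPart, ImagePart]


def normalize_content(content: Union[str, List[Union[str, ContentPart]]]) -> List[ContentPart]:
    """Coerce a bare string, or a list mixing strings and parts, into a list of parts."""
    if isinstance(content, str):
        return [TextPart(content)]
    return [TextPart(item) if isinstance(item, str) else item for item in content]


# Mixed list, as in the image question example
parts = normalize_content([ImagePart("https://example.com/cat.png"), "What is in this image?"])
# Legacy string form
legacy = normalize_content("hello")
print(parts, legacy)
```

Normalizing once at the boundary means downstream adapters only ever see a list of typed parts.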

🆕 What's New in v0.2.7

effGen v0.2.7 ships the Prompt Library — a curated, domain-organized catalog of 31 reusable prompt templates across 7 domains, paired with a golden evaluation harness and an interactive playground CLI. No breaking API changes.

Research — literature review (zero-shot + CoT), paper summary, citation extraction, methodology critique.

Coding — code review, bug diagnosis, refactoring plan, test generation, docstring fill.

Data / SQL — NL-to-SQL with JSON output + sqlglot validation, SQL explain, SQL optimize, data profile, ETL plan.

Legal — contract summary, clause classify, research brief. Every template enforces a mandatory legal disclaimer verbatim.

Medical — symptom triage, drug interaction, medical literature synthesis. Every template enforces a mandatory medical disclaimer verbatim.

Creative — story continuation (×2), poetry forms, character bio, world building.

Business — meeting summary, email draft (formal/casual), OKR generation, SWOT analysis, elevator pitch (≤150 words, word-count-verified).
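
Two of the checks described above, a disclaimer that must appear verbatim and a word limit on the elevator pitch, are simple to express. The sketch below is illustrative only: the function names and the disclaimer text are placeholders, not effGen's actual harness or wording.

```python
def contains_verbatim(output: str, required: str) -> bool:
    """True only if the required text appears exactly, character for character."""
    return required in output


def within_word_limit(output: str, limit: int) -> bool:
    """True if the whitespace-separated word count does not exceed the limit."""
    return len(output.split()) <= limit


DISCLAIMER = "This is not legal advice."  # placeholder text, not the real disclaimer

answer = "The clause limits liability to direct damages. This is not legal advice."
paraphrased = "The clause limits liability. This isn't legal advice."
pitch = " ".join(["word"] * 150)

print(contains_verbatim(answer, DISCLAIMER))       # exact match present
print(contains_verbatim(paraphrased, DISCLAIMER))  # paraphrase does not count
print(within_word_limit(pitch, 150))
```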

# Discover templates
effgen prompts list
effgen prompts list --domain research --format markdown

# Evaluate (no model needed — golden test)
effgen prompts eval

# Live eval (runs through a real model)
effgen prompts eval --domain coding --live --model llama3.1-8b

# Interactive playground
effgen prompts playground

from effgen.prompts.library import registry

p = registry.get("data.sql_from_nl.v1")
sql_prompt = p.template(
    schema_ddl="CREATE TABLE orders (id INT, total FLOAT, created_at DATE)",
    question="Total revenue this month",
    dialect="sqlite",
)

See the full prompt gallery for all 31 templates.
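
For readers curious how a registry keyed by versioned IDs such as `data.sql_from_nl.v1` might work, here is a minimal sketch built on the standard library's `string.Template`. The class `PromptRegistry`, its methods, and the template text are hypothetical and are not effGen's implementation.

```python
from string import Template


class PromptRegistry:
    """Toy registry mapping versioned prompt IDs to fill-in templates."""

    def __init__(self) -> None:
        self._templates: dict[str, Template] = {}

    def register(self, prompt_id: str, text: str) -> None:
        if prompt_id in self._templates:
            raise ValueError(f"duplicate prompt id: {prompt_id}")
        self._templates[prompt_id] = Template(text)

    def render(self, prompt_id: str, **fields: str) -> str:
        # substitute() raises KeyError when a placeholder is left unfilled
        return self._templates[prompt_id].substitute(**fields)


demo = PromptRegistry()
demo.register(
    "data.sql_from_nl.v1",
    "Dialect: $dialect\nSchema:\n$schema_ddl\nQuestion: $question\nReturn one SQL query.",
)
prompt = demo.render(
    "data.sql_from_nl.v1",
    schema_ddl="CREATE TABLE orders (id INT, total FLOAT, created_at DATE)",
    question="Total revenue this month",
    dialect="sqlite",
)
print(prompt)
```

Putting the version in the ID lets a revised template ship as `...v2` without silently changing what existing callers receive.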

🆕 What's New in v0.2.6

effGen v0.2.6 adds 14 new built-in tools across document, media, and communication categories — bringing the total to 58+ — plus two new presets (media, notify). No breaking API changes.

OCR — OCRTool (Tesseract local + OCR.space fallback). Raises a clear error with per-OS install hints when no backend is available.
Audio Transcription — AudioTranscribeTool (faster-whisper local; HF Inference fallback; GPU auto-detected).
Image Analysis — ImageInfoTool (Pillow metadata, zero network) + ImageCaptionTool (router-driven vision provider).
Document Parsing — PDFTool (pypdf + pdfplumber), DOCXTool (python-docx), ExcelTool (openpyxl + pandas). Added to research and general presets.
Geo / Weather — WeatherTool (Open-Meteo, free), GeocodeTool (Nominatim/OSM, 1 req/s), MapsTool (staticmap PNG).
Email & Webhooks — EmailSMTPTool, EmailIMAPTool, SlackWebhookTool, DiscordWebhookTool. All in the new notify preset. Webhook URLs are redacted in logs.
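
The "local backend first, remote fallback, clear error with per-OS install hints" pattern described for OCRTool can be sketched generically. Everything below is an assumption for illustration: the helper `run_with_fallback`, the exception name, and the example install commands are not taken from effGen.

```python
import platform

# Example hints only; adjust for your package manager
INSTALL_HINTS = {
    "Linux": "sudo apt-get install tesseract-ocr",
    "Darwin": "brew install tesseract",
    "Windows": "choco install tesseract",
}


class NoBackendError(RuntimeError):
    """Raised when every backend in the chain has failed."""


def run_with_fallback(backends, payload, system=None):
    """Try each (name, callable) in order; return (name, result) from the first that works."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(payload)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    hint = INSTALL_HINTS.get(system or platform.system(), "see the Tesseract installation docs")
    raise NoBackendError(f"no OCR backend available ({'; '.join(errors)}). Try: {hint}")


def local_ocr(path):  # stand-in for a missing local install
    raise RuntimeError("tesseract not found")


def remote_ocr(path):  # stand-in for a hosted OCR service
    return f"text from {path}"


used, text = run_with_fallback([("local", local_ocr), ("remote", remote_ocr)], "scan.png")
print(used, text)
```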

from effgen.tools.builtin.ocr import OCRTool
result = OCRTool().execute({"operation": "extract", "image_path": "/tmp/scan.png"})
print(result["data"]["text"])

See the full tool gallery for quickstart snippets for all 58+ tools.
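
Redacting webhook URLs in logs, as noted above for the notify tools, can be approximated with a regular expression. This is a sketch under the assumption that the secret part of a Slack or Discord webhook is everything after the fixed path prefix; the pattern and the function name `redact_webhooks` are illustrative, not effGen's code.

```python
import re

# Keep the recognizable prefix, drop the secret path that follows it
_WEBHOOK_RE = re.compile(
    r"(https://(?:hooks\.slack\.com/services|discord(?:app)?\.com/api/webhooks)/)\S+"
)


def redact_webhooks(message: str) -> str:
    """Replace the secret portion of any webhook URL found in a log message."""
    return _WEBHOOK_RE.sub(r"\1[REDACTED]", message)


slack_line = "POST https://hooks.slack.com/services/T000/B000/XXXX failed with 500"
discord_line = "sent to https://discord.com/api/webhooks/123456/abcdef"
print(redact_webhooks(slack_line))
print(redact_webhooks(discord_line))
```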

🆕 ModelRouter — Smart Multi-Provider Routing (v0.2.4)

Route requests across 9 cloud providers automatically — pick the cheapest, fastest, or first available:

from effgen import PolicyBasedRouter, RoutingContext, CostBasedPolicy, LatencyBasedPolicy
from effgen.models.capabilities import Capability

# Build a router: try fastest first, fall back to cheapest
router = PolicyBasedRouter(policies=[LatencyBasedPolicy(), CostBasedPolicy()])

ctx = RoutingContext(
    prompt_tokens_estimate=500,
    user_budget_usd=0.01,       # stay within $0.01
    latency_budget_ms=3000,     # need response in under 3s
    required_capabilities={Capability.chat},
)

decision = router.route(ctx)
print(decision.chosen)      # e.g., ProviderModelPair("cerebras", "llama3.1-8b")
print(decision.eliminated)  # [(pair, reason), ...] — fully explainable
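
To make the "chosen plus eliminated-with-reasons" idea concrete, here is a self-contained sketch of budget filtering followed by a fastest-first choice. It is not the effGen router: the `Candidate` class, the `route` function, the provider names, and all prices and latencies are invented for illustration, and treating cost as a tie-breaker after latency is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    usd_per_1k_tokens: float  # invented price
    typical_latency_ms: int   # invented latency


def route(candidates, prompt_tokens, budget_usd, latency_budget_ms):
    """Return (chosen, eliminated) where eliminated is a list of (candidate, reason)."""
    eligible, eliminated = [], []
    for c in candidates:
        est_cost = c.usd_per_1k_tokens * prompt_tokens / 1000
        if est_cost > budget_usd:
            eliminated.append((c, f"estimated cost ${est_cost:.4f} exceeds budget"))
        elif c.typical_latency_ms > latency_budget_ms:
            eliminated.append((c, f"latency {c.typical_latency_ms} ms exceeds budget"))
        else:
            eligible.append(c)
    if not eligible:
        return None, eliminated
    # Fastest first; price breaks ties
    chosen = min(eligible, key=lambda c: (c.typical_latency_ms, c.usd_per_1k_tokens))
    return chosen, eliminated


pool = [
    Candidate("provider_a", "small-model", 0.0001, 400),
    Candidate("provider_b", "large-model", 0.05, 300),
    Candidate("provider_c", "slow-model", 0.0002, 5000),
]
chosen, eliminated = route(pool, prompt_tokens=500, budget_usd=0.01, latency_budget_ms=3000)
print(chosen)
for cand, reason in eliminated:
    print(cand.provider, "->", reason)
```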

Transparent failover — route_and_execute retries on rate-limits, 5xx errors, or timeouts and seamlessly moves to the next-best provider.
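
The failover behaviour can be pictured as a loop over ranked providers that moves on whenever a retryable error occurs. The sketch below is a simplified illustration and not `route_and_execute` itself; `RetryableError`, `execute_with_failover`, and the stand-in provider functions are hypothetical.

```python
class RetryableError(Exception):
    """Stands in for rate-limit, 5xx, or timeout failures."""


def execute_with_failover(providers, request):
    """Call providers in ranked order; return (name, result, failed_attempts)."""
    attempts = []
    for name, call in providers:
        try:
            result = call(request)
        except RetryableError as exc:
            attempts.append((name, str(exc)))  # remember why this provider was skipped
            continue
        return name, result, attempts
    raise RuntimeError(f"all providers failed: {attempts}")


def flaky(request):  # stand-in for a rate-limited provider
    raise RetryableError("429 rate limited")


def healthy(request):  # stand-in for a working provider
    return request.upper()


name, result, attempts = execute_with_failover([("first", flaky), ("second", healthy)], "hello")
print(name, result, attempts)
```

Non-retryable exceptions are deliberately left to propagate, since retrying a malformed request on another provider would rarely help.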

Cost dashboard — track every API call:

effgen cost today          # per-provider per-model table
effgen cost week           # rolling 7-day view
effgen cost set-budget 1.0 # set $1/day cap
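
A per-provider, per-model cost table with a daily cap can be modelled with a single SQLite table. The sketch below uses an in-memory database and an invented schema and data; it shows the aggregation idea only and does not reflect how effGen stores its cost data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (day TEXT, provider TEXT, model TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [  # invented sample rows
        ("2026-05-21", "provider_a", "model_x", 0.25),
        ("2026-05-21", "provider_a", "model_x", 0.5),
        ("2026-05-21", "provider_b", "model_y", 0.125),
        ("2026-05-20", "provider_a", "model_x", 1.0),
    ],
)


def daily_table(conn, day):
    """Return (provider, model, call_count, total_cost) rows for one day."""
    cur = conn.execute(
        "SELECT provider, model, COUNT(*), SUM(cost_usd) FROM calls "
        "WHERE day = ? GROUP BY provider, model ORDER BY provider, model",
        (day,),
    )
    return cur.fetchall()


def over_budget(conn, day, budget_usd):
    """True if the day's total spend exceeds the cap."""
    (spent,) = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM calls WHERE day = ?", (day,)
    ).fetchone()
    return spent > budget_usd


table = daily_table(conn, "2026-05-21")
print(table)
print(over_budget(conn, "2026-05-21", 1.0))
```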

🎯 Agent Presets

Get started instantly with ready-to-use agent configurations:

from effgen import load_model
from effgen.presets import create_agent

model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")

# One-line agent creation
math_agent     = create_agent("math", model)       # Calculator + PythonREPL
research_agent = create_agent("research", model)   # WebSearch + URLFetch + Wikipedia + PubMed + ArXiv + PDF + DOCX + Excel
coding_agent   = create_agent("coding", model)     # CodeExecutor + PythonREPL + FileOps + Bash
general_agent  = create_agent("general", model)    # 64+ built-in tools
minimal_agent  = create_agent("minimal", model)    # Direct inference, no tools
media_agent    = create_agent("media", model)      # AudioTranscribe + ImageCaption
notify_agent   = create_agent("notify", model)     # EmailSMTP + EmailIMAP + Slack + Discord
multimodal_agent = create_agent("multimodal", model) # Image/Audio/Video + MultimodalDescribe

# CLI preset support
effgen run --preset math "What is sqrt(144)?"
effgen run --preset research "Tell me about quantum computing"

🛠️ Built-in Tools (64+)

🔢 Calculator (Math & Units)	🌐 WebSearch (DuckDuckGo)	💻 CodeExecutor (Sandboxed)	🐍 PythonREPL (Interactive)	📁 FileOps (Read/Write)	🔍 Retrieval (RAG + BM25)	🎯 AgenticSearch (ripgrep)
🖥️ BashTool (Shell Cmds)	🌤️ WeatherTool (Open-Meteo)	📋 JSONTool (Query/Validate)	🕐 DateTimeTool (Timezones)	📝 TextProcessing (Regex/Count)	🔗 URLFetch (Web Scrape)	📖 Wikipedia (Free API)
📰 News & RSS (NewsAPI + feeds)	📚 PubMed/ArXiv (Academic search)	🎬 YouTube (Transcript + meta)	💬 Reddit / HN (Social feeds)	🌍 Translate (LibreTranslate)	🔠 LanguageDetect (55+ langs)	▩ QR Gen/Read (zbar fallback)
🔍 OCR (Tesseract + OCR.space)	🎙️ AudioTranscribe (faster-whisper)	🖼️ ImageInfo + Caption (Pillow + VLM)	📄 PDF/DOCX/Excel (Document parsing)	🗺️ Geocode + Maps (OSM/Nominatim)	✉️ Email SMTP/IMAP (Send + read)	📣 Slack + Discord (Webhooks)

See the full tool gallery for quickstart snippets for all 64+ tools.

📚 Examples

python examples/basic/basic_agent.py               # Basic agent (Transformers backend)

python examples/basic/basic_agent_vllm.py          # Basic agent (vLLM backend - 5-10x faster)

python examples/web_retrieval/web_agent.py         # Web search agent

python examples/web_retrieval/retrieval_agent.py   # RAG-based retrieval

python examples/web_retrieval/agentic_search_agent.py # Grep-based agentic search

📖 More Examples

Multi-Tool Agent

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, WebSearch, PythonREPL

model = load_model("Qwen/Qwen2.5-3B-Instruct")

config = AgentConfig(
    name="research_agent",
    model=model,
    tools=[Calculator(), WebSearch(), PythonREPL()],
    system_prompt="You are a research assistant."
)

agent = Agent(config=config)
result = agent.run("Search for the population of Tokyo and calculate what percentage it is of Japan's total population")

Streaming

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator

model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")
agent = Agent(config=AgentConfig(
    name="stream_demo", model=model,
    tools=[Calculator()], enable_streaming=True
))

for token in agent.stream("What is 2 + 2?"):
    print(token, end="", flush=True)

Memory (Multi-Turn)

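from effgen import Agent, load_model
from effgen.core.agent import AgentConfig

model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")

# enable_memory=True keeps earlier turns available to later calls
agent = Agent(config=AgentConfig(
    name="memory_demo", model=model,
    tools=[], enable_memory=True
))

agent.run("My name is Alice and I'm working on quantum computing.")
result = agent.run("What's my name and what am I working on?")
print(result.output)
# → "Your name is Alice and you're working on quantum computing."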

Retrieval Agent (RAG)

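from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Retrieval

model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")

# Point the retrieval tool at a local knowledge base
retrieval_tool = Retrieval(knowledge_base_path="./docs")
config = AgentConfig(name="qa_agent", model=model, tools=[retrieval_tool])
agent = Agent(config=config)
result = agent.run("What does the documentation say about configuration?")
print(result.output)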
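
The tools table lists Retrieval as "RAG + BM25". For readers unfamiliar with BM25, here is a compact standalone sketch of Okapi BM25 scoring (using the variant that adds 1 inside the logarithm so IDF stays non-negative). It is a teaching example with whitespace tokenization and made-up documents, not effGen's retrieval code.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "set the timeout in the configuration file",
    "agents can call tools",
    "configuration options include timeout and retries",
]
scores = bm25_scores("configuration timeout", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(scores, docs[best])
```

With equal term counts, the shorter matching document ranks higher because of the length normalization controlled by `b`.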

🔒 Security

🐳 Docker Sandbox (isolated execution)
🛡️ Input Validation (automatic sanitization)
⚡ Rate Limiting (configurable limits)

📋 For security policies and vulnerability reporting, see SECURITY.md

📖 Citation

If you use effGen in your research, please cite our paper:

@software{srivastava2026effgen,
      title={effGen: Enabling Small Language Models as Capable Autonomous Agents},
      author={Gaurav Srivastava and Aafiya Hussain and Chi Wang and Yingyan Celine Lin and Xuan Wang},
      year={2026},
      eprint={2602.00887},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.00887},
}

📄 License

Apache License 2.0 — see LICENSE for details.

Made with ❤️ for the AI community

Release history

Version	Released
0.2.8	May 21, 2026
0.2.7	May 20, 2026
0.2.6	May 19, 2026
0.2.5	May 18, 2026
0.2.4	May 14, 2026
0.2.3	May 4, 2026
0.2.2	Apr 28, 2026
0.2.1	Apr 25, 2026
0.2.0	Apr 10, 2026
0.1.3	Mar 25, 2026
0.1.2	Mar 13, 2026
0.1.1	Mar 6, 2026
0.1.0	Mar 1, 2026
0.0.2	Feb 3, 2026
0.0.1	Jan 31, 2026
