Skip to main content

A comprehensive framework for building agents with Small Language Models

Project description

effGen

effGen

Build AI Agents with Small Language Models

Fast โ€ข Efficient โ€ข Powerful


CI arXiv PyPI Python License

Total Downloads Monthly Downloads Stars Forks

Paper Website Docs PyPI


๐Ÿ“ฐ News & Updates

Date Update
๐Ÿš€ 25 Apr 2026 v0.2.1 Released: Cerebras backend (4 free-tier models, streaming, native tool-calling, rate-limit coordinator, cost tracking) + OpenAI gpt-5/gpt-5.4-nano/o-series with reasoning_effort, prompt caching, structured outputs v2, and OpenAI native tools (web_search, code_interpreter, file_search). See changelog
๐Ÿš€ 9 Apr 2026 v0.2.0 Released: Major release โ€” native tool calling, guardrails, multi-agent orchestration, RAG pipeline, 31 tools, eval framework, production API server, MLX Apple Silicon support, Python & TypeScript SDKs. See changelog
๐ŸŽ 8 Apr 2026 MLX & Apple Silicon support merged (PR #4): Native Metal GPU acceleration via MLX & MLX-VLM backends. pip install effgen[mlx]
๐Ÿ”ง 25 Mar 2026 v0.1.3 Released: Verification hardening โ€” smarter loop detection, "skip the tool" prompting, model-aware token counting, sub-agent depth limits, circuit breaker persistence. See changelog
๐Ÿ”ง 12 Mar 2026 v0.1.2 Released: Test-driven hardening โ€” 10 example agents, 19 bug fixes, cross-model compatibility matrix (11 models, 73% pass rate). See changelog
๐Ÿ”’ 6 Mar 2026 v0.1.1 Released: Stabilization โ€” fixed license/metadata consistency, improved error handling, added 6 examples, expanded test suite. See changelog
๐ŸŽ‰ 1 Mar 2026 v0.1.0 Released: Major feature release โ€” 14 built-in tools, agent presets, plugin system, real streaming, memory integration, ACP/MCP protocols, CI/CD, and comprehensive test suite. See changelog
๐Ÿ”ง 3 Feb 2026 v0.0.2 Released: vLLM backend fixes with automatic chat template support, GPU memory control, improved OOM error handling, and multi-model family compatibility
๐Ÿ“„ 2 Feb 2026 Preprint available: EffGen: Enabling Small Language Models as Capable Autonomous Agents
๐Ÿš€ 31 Jan 2026 Initial release of effGen framework (v0.0.1)

๐Ÿค” What is effGen?

effGen transforms Small Language Models into powerful AI agents. While most frameworks require massive LLMs, effGen is optimized from the ground up for efficient, smaller models โ€” delivering fast, capable agents without the compute overhead.

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, PythonREPL

# Load a small but mighty model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")

# Create agent with tools
config = AgentConfig(
    name="math_agent",
    model=model,
    tools=[Calculator(), PythonREPL()]
)
agent = Agent(config=config)

# Run computation
result = agent.run("What is 24344 * 334?")
print(f"Answer: {result.output}")

โšก Installation

Requires Python 3.10 or newer. Tested on Python 3.10, 3.11, 3.12, 3.13.

๐Ÿ“ฆ From PyPI (Recommended)

pip install effgen

๐ŸŽ Apple Silicon (MLX)

pip install effgen[mlx]          # Text models on Apple Silicon
pip install effgen[mlx-vlm]      # Vision-Language models on Apple Silicon

๐Ÿš€ With vLLM for Faster Inference

pip install effgen[vllm]

๐Ÿ“Š Optional Extras

pip install effgen[cerebras]  # Cerebras inference backend (cerebras-cloud-sdk)
pip install effgen[rag]       # RAG pipeline (sentence-transformers, faiss-cpu)
pip install effgen[finance]   # Finance tools (yfinance)
pip install effgen[data]      # Data science tools (matplotlib, plotly)
pip install effgen[eval]      # Evaluation (rouge-score, nltk)
pip install effgen[gguf]      # GGUF model support (llama-cpp-python)

๐Ÿ”ง From Source

git clone https://github.com/ctrl-gaurav/effGen.git
cd effGen

# Quick install
./install.sh

# Full install (includes vLLM + dev tools)
./install.sh --full

# Manual install
pip install -e .

๐Ÿš€ Quick Start

๐Ÿ’ป CLI Usage

# Run a task
effgen run "What is the capital of France?"

# Interactive chat
effgen chat

# Start API server
effgen serve --port 8000

# Interactive wizard
effgen

๐Ÿ Python API

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator

# Load model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")

# Configure agent
config = AgentConfig(
    name="calculator_agent",
    model=model,
    tools=[Calculator()],
    system_prompt="You are a helpful math assistant."
)

# Create and run
agent = Agent(config=config)
result = agent.run("Calculate 15% tip on $85.50")
print(result.output)

โœจ Features

๐Ÿง 
SLM Optimized
Small models

๐ŸŽ
Apple Silicon
MLX + Metal GPU

๐Ÿ›ก๏ธ
Guardrails
PII, injection, safety

๐Ÿ“š
RAG Pipeline
Ingest, search, cite

๐Ÿ‘ฅ
Multi-Agent
DAG workflows

๐Ÿ”ง
31 Tools
+ MCP/A2A/ACP

๐Ÿญ
Production API
OpenAI-compat


๐ŸŽฏ Agent Presets

Get started instantly with ready-to-use agent configurations:

from effgen import load_model
from effgen.presets import create_agent

model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")

# One-line agent creation
math_agent = create_agent("math", model)       # Calculator + PythonREPL
research_agent = create_agent("research", model) # WebSearch + URLFetch + Wikipedia
coding_agent = create_agent("coding", model)     # CodeExecutor + PythonREPL + FileOps + Bash
general_agent = create_agent("general", model)   # All 11 tools
minimal_agent = create_agent("minimal", model)   # Direct inference, no tools
# CLI preset support
effgen run --preset math "What is sqrt(144)?"
effgen run --preset research "Tell me about quantum computing"

๐Ÿ› ๏ธ Built-in Tools (31)

๐Ÿ”ข
Calculator
Math & Units

๐ŸŒ
WebSearch
DuckDuckGo

๐Ÿ’ป
CodeExecutor
Sandboxed

๐Ÿ
PythonREPL
Interactive

๐Ÿ“
FileOps
Read/Write

๐Ÿ”
Retrieval
RAG + BM25

๐ŸŽฏ
AgenticSearch
ripgrep

๐Ÿ–ฅ๏ธ
BashTool
Shell Cmds

๐ŸŒค๏ธ
WeatherTool
Open-Meteo

๐Ÿ“‹
JSONTool
Query/Validate

๐Ÿ•
DateTimeTool
Timezones

๐Ÿ“
TextProcessing
Regex/Count

๐Ÿ”—
URLFetch
Web Scrape

๐Ÿ“–
Wikipedia
Free API


๐Ÿ“š Examples

python examples/basic/basic_agent.py               # Basic agent (Transformers backend)

python examples/basic/basic_agent_vllm.py          # Basic agent (vLLM backend - 5-10x faster)

python examples/web_retrieval/web_agent.py         # Web search agent

python examples/web_retrieval/retrieval_agent.py   # RAG-based retrieval

python examples/web_retrieval/agentic_search_agent.py # Grep-based agentic search
๐Ÿ“– More Examples

Multi-Tool Agent

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, WebSearch, PythonREPL

model = load_model("Qwen/Qwen2.5-3B-Instruct")

config = AgentConfig(
    name="research_agent",
    model=model,
    tools=[Calculator(), WebSearch(), PythonREPL()],
    system_prompt="You are a research assistant."
)

agent = Agent(config=config)
result = agent.run("Search for the population of Tokyo and calculate what percentage it is of Japan's total population")

Streaming

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator

model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")
agent = Agent(config=AgentConfig(
    name="stream_demo", model=model,
    tools=[Calculator()], enable_streaming=True
))

for token in agent.stream("What is 2 + 2?"):
    print(token, end="", flush=True)

Memory (Multi-Turn)

agent = Agent(config=AgentConfig(
    name="memory_demo", model=model,
    tools=[], enable_memory=True
))

agent.run("My name is Alice and I'm working on quantum computing.")
result = agent.run("What's my name and what am I working on?")
# โ†’ "Your name is Alice and you're working on quantum computing."

Retrieval Agent (RAG)

from effgen.tools.builtin import Retrieval

retrieval_tool = Retrieval(knowledge_base_path="./docs")
config = AgentConfig(name="qa_agent", model=model, tools=[retrieval_tool])
agent = Agent(config=config)
result = agent.run("What does the documentation say about configuration?")

๐Ÿ”’ Security

๐Ÿณ
Docker Sandbox
Isolated execution

๐Ÿ›ก๏ธ
Input Validation
Auto sanitization

โšก
Rate Limiting
Configurable limits

๐Ÿ“‹ For security policies and vulnerability reporting, see SECURITY.md


๐Ÿ“– Citation

If you use effGen in your research, please cite our paper:

@software{srivastava2026effgen,
      title={effGen: Enabling Small Language Models as Capable Autonomous Agents},
      author={Gaurav Srivastava and Aafiya Hussain and Chi Wang and Yingyan Celine Lin and Xuan Wang},
      year={2026},
      eprint={2602.00887},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.00887},
}

๐Ÿ”— Links

Paper Website Docs PyPI Issues


๐Ÿ“„ License

Apache License 2.0 โ€” see LICENSE for details.


Get Started Examples Paper GitHub

Made with โค๏ธ for the AI community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

effgen-0.2.3.tar.gz (691.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

effgen-0.2.3-py3-none-any.whl (816.4 kB view details)

Uploaded Python 3

File details

Details for the file effgen-0.2.3.tar.gz.

File metadata

  • Download URL: effgen-0.2.3.tar.gz
  • Upload date:
  • Size: 691.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.12

File hashes

Hashes for effgen-0.2.3.tar.gz
Algorithm Hash digest
SHA256 db7727b6e7bf6b770f31095cb5988bcd757271864d33cae4a7509e751cec6c10
MD5 81e4eaaac0cf3fcc4997ea516ac8740a
BLAKE2b-256 24a55f2db6d7578216f848dbcba47f86fbd9a41d805b0d33ee705b60995e8c81

See more details on using hashes here.

File details

Details for the file effgen-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: effgen-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 816.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.12

File hashes

Hashes for effgen-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e90bfaea2b6e82d86c248e0a428c21ca3cac0758e7621fcbc1a1093040493e68
MD5 1bd9478e4bf6ce1960859bc069d3227b
BLAKE2b-256 da2d7bfa86c202902dd56d6b618d398ac0919700410a27b89b0ede41adb4a970

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page