AgentForge — ReAct agents on open-weight LLMs with tools (RAG, REPL, web, SQL, calculator) and an eval harness. Pairs with ragforge-ml and turboquant-ml.

AgentForge

ReAct agents on open-weight LLMs — tools, memory, and an eval harness.
Pairs with ragforge-ml for retrieval and turboquant-ml for quantized model serving.


Why AgentForge?

Most "agent framework" projects use proprietary models (GPT-4, Claude) behind a DSL of Runnable.invoke() chains nobody can debug. AgentForge is the opposite: ReAct loops on open-weight LLMs (Llama, Qwen, Mistral), with a small registry of well-bounded tools, and an evaluation harness so you can measure whether your agent is actually doing what you asked.

Three opinions:

  1. Open models first. Defaults work on Qwen/Qwen2.5-3B-Instruct and any chat-template HF model. No API key required. Plug in turboquant-ml to serve the model quantized.
  2. ReAct, not magic. The loop is a 60-line function (agent.py:run) that alternates Thought / Action / Observation steps. Easy to read, easy to debug.
  3. Tools have hard boundaries. Python REPL runs in an AST-whitelisted sandbox; SQL is read-only; web search is rate-limited; RAG retrieval is delegated to ragforge-ml.
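To illustrate what an "AST-whitelisted sandbox" can mean in practice, here is a minimal sketch built on the standard-library ast module. It is not AgentForge's actual implementation: the names ALLOWED_NODES, check_whitelisted, and safe_eval are hypothetical, and the whitelist is deliberately restricted to arithmetic expressions.

```python
import ast

# Hypothetical whitelist (an assumption for this sketch): arithmetic only.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.UAdd, ast.USub,
)


def check_whitelisted(source: str) -> ast.Expression:
    """Parse `source` as an expression and reject any node type not whitelisted."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return tree


def safe_eval(source: str):
    """Evaluate a whitelisted expression with no builtins in scope."""
    tree = check_whitelisted(source)
    code = compile(tree, "<sandbox>", "eval")
    return eval(code, {"__builtins__": {}}, {})


if __name__ == "__main__":
    print(safe_eval("47 * 1337"))
    try:
        safe_eval("__import__('os').system('ls')")
    except ValueError as exc:
        print("rejected:", exc)
```

Names, calls, and attribute access are rejected before anything runs, because their node types are not on the list. A syntax whitelist alone does not bound CPU or memory use (an expression such as 9 ** 9 ** 9 still passes the check), so a real sandbox would likely pair it with time and size limits.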

Features

Stage   Default
------  -------
LLM     Any HuggingFace chat-template model. Optional bnb-nf4 via turboquant-ml.
Loop    ReAct with max_steps, structured Thought/Action/Observation parser
Tools   calculator, python (sandboxed), web_search (DuckDuckGo), sql (read-only sqlite), rag (RAGforge)
Memory  In-memory conversation, persistent SQLite store
Eval    task_completion, tool_accuracy, step_efficiency, final_answer_match
Serve   FastAPI /ask, /tools, /health
CLI     agentforge ask / eval / tools / serve
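
One common way to get a "read-only sqlite" boundary like the one listed for the sql tool is to open the database through a URI with mode=ro, so the SQLite engine itself refuses writes. The sketch below uses only the standard library. Whether AgentForge enforces read-only access this way is an assumption, and open_read_only and demo are hypothetical names.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that writes fail at the engine level."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple:
    """Create a throwaway database, then show reads succeed and writes fail."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "demo.db")

        # Seed the database through a normal read-write connection.
        rw = sqlite3.connect(db_path)
        rw.execute("CREATE TABLE orders (id INTEGER, total REAL)")
        rw.execute("INSERT INTO orders VALUES (1, 9.5)")
        rw.commit()
        rw.close()

        ro = open_read_only(db_path)
        rows = ro.execute("SELECT id, total FROM orders").fetchall()
        try:
            ro.execute("INSERT INTO orders VALUES (2, 1.0)")
            write_blocked = False
        except sqlite3.OperationalError:
            write_blocked = True
        ro.close()
        return rows, write_blocked


if __name__ == "__main__":
    print(demo())
```

Enforcing the restriction in the connection rather than by inspecting the SQL text avoids having to parse model-generated queries for write statements.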

Installation

The PyPI distribution is agentforge-ml (the unsuffixed agentforge name was taken by an unrelated project). The Python import name is agentforge, and the CLI is installed as both agentforge and af:

pip install agentforge-ml                       # core
pip install "agentforge-ml[tools]"              # + sympy + duckduckgo-search
pip install "agentforge-ml[rag]"                # + ragforge-ml integration
pip install "agentforge-ml[quantized]"          # + turboquant-ml NF4 path
pip install "agentforge-ml[serve]"              # + FastAPI
pip install "agentforge-ml[all]"                # everything

60-second tour

from agentforge import Agent
from agentforge.tools import Calculator, WebSearch, PythonREPL

agent = Agent.from_defaults(
    model_id="Qwen/Qwen2.5-3B-Instruct",
    tools=[Calculator(), PythonREPL(), WebSearch()],
)

result = agent.run("What is 47 * 1337, then take its square root?")
print(result.final_answer)
for step in result.steps:
    print(f"  [{step.tool}] {step.action_input!r} -> {step.observation!r}")

With RAG

from agentforge import Agent
from agentforge.tools import RAGTool
from ragforge import Pipeline

rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["docs/"])

agent = Agent.from_defaults(
    model_id="Qwen/Qwen2.5-3B-Instruct",
    tools=[RAGTool(rag)],
)
print(agent.run("What is our company refund policy?").final_answer)

CLI

af ask "What is 17 squared?" --tools calculator
af ask "Latest CVE for log4j?" --tools web_search
af eval data/eval_set.jsonl --tools calculator,python_repl
af serve --tools calculator,python_repl --port 8080

ReAct loop, in a picture

question -> [LLM] Thought + Action -> [Tool] Observation
            ^                                       |
            |_______________________________________|
                       up to max_steps

If the LLM emits Final Answer: the loop exits. Otherwise it continues until max_steps is reached. The parser is forgiving: it tolerates variations in whitespace and letter case, and if the model output is truncated it falls back to the last completed step.
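
The loop described above can be sketched in a few dozen lines. This is an illustration, not the code in agent.py: the "Action:" / "Action Input:" line format, the regular expressions, and the names run, Step, Result, scripted_llm, and multiply are all assumptions made for the example, and the LLM is replaced by a scripted stand-in so the snippet runs without a model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumed output format: "Action: <tool>" followed by "Action Input: <text>".
ACTION_RE = re.compile(
    r"Action\s*:\s*(?P<tool>\w+)\s*\n\s*Action Input\s*:\s*(?P<input>.*)",
    re.IGNORECASE,
)
FINAL_RE = re.compile(r"Final Answer\s*:\s*(?P<answer>.*)", re.IGNORECASE | re.DOTALL)


@dataclass
class Step:
    tool: str
    action_input: str
    observation: str


@dataclass
class Result:
    final_answer: Optional[str]
    steps: List[Step] = field(default_factory=list)


def run(question: str, llm: Callable[[str], str],
        tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> Result:
    """Alternate LLM replies and tool calls until a final answer or max_steps."""
    transcript = f"Question: {question}\n"
    steps: List[Step] = []
    for _ in range(max_steps):
        reply = llm(transcript)
        final = FINAL_RE.search(reply)
        if final:
            return Result(final.group("answer").strip(), steps)
        action = ACTION_RE.search(reply)
        if action is None:
            break  # unparseable reply: stop and keep the completed steps
        name = action.group("tool").lower()
        tool_input = action.group("input").strip()
        tool = tools.get(name)
        observation = tool(tool_input) if tool else f"unknown tool: {name}"
        steps.append(Step(name, tool_input, observation))
        transcript += f"{reply}\nObservation: {observation}\n"
    return Result(None, steps)


def scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a model: returns the canned replies in order."""
    it = iter(replies)
    return lambda transcript: next(it)


def multiply(expr: str) -> str:
    """Toy calculator that only understands 'a * b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


if __name__ == "__main__":
    replies = [
        "Thought: I need to multiply.\nAction: calculator\nAction Input: 47 * 1337",
        "Thought: I have the product.\nFinal Answer: 62839",
    ]
    result = run("What is 47 * 1337?", scripted_llm(replies), {"calculator": multiply})
    print(result.final_answer, result.steps)
```

When the step budget runs out without a Final Answer:, this sketch returns final_answer=None together with the steps completed so far. How AgentForge itself reports that case is not specified here.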

Eval harness

Built-in, pure Python, no judge model required:

Metric              What it measures
------------------  ----------------
task_completion     Did the agent produce a Final Answer:?
final_answer_match  Does the answer contain the ground-truth string (case-folded substring)?
tool_accuracy       Of the steps, what fraction used the expected tool?
step_efficiency     ground_truth_steps / actual_steps, clipped to [0, 1]

af eval examples/eval_set.jsonl --tools all
+--------------------+--------+
|  metric            |  mean  |
+--------------------+--------+
| task_completion    |  0.95  |
| final_answer_match |  0.81  |
| tool_accuracy      |  0.88  |
| step_efficiency    |  0.72  |
+--------------------+--------+
n=80  ·  p50=2.4s  ·  p95=8.1s
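
The four metric definitions are simple enough to write out directly. The functions below are a sketch derived from the table's wording, not the package's eval module. The signatures, the single expected_tool argument, and the handling of empty inputs (scored as 0.0) are assumptions.

```python
from typing import List, Optional


def task_completion(final_answer: Optional[str]) -> float:
    """1.0 if the agent produced a final answer, else 0.0."""
    return 1.0 if final_answer else 0.0


def final_answer_match(final_answer: Optional[str], ground_truth: str) -> float:
    """Case-folded substring match of the ground truth inside the answer."""
    if final_answer is None:
        return 0.0
    return 1.0 if ground_truth.casefold() in final_answer.casefold() else 0.0


def tool_accuracy(used_tools: List[str], expected_tool: str) -> float:
    """Fraction of steps that used the expected tool (0.0 if no steps; assumption)."""
    if not used_tools:
        return 0.0
    return sum(tool == expected_tool for tool in used_tools) / len(used_tools)


def step_efficiency(ground_truth_steps: int, actual_steps: int) -> float:
    """ground_truth_steps / actual_steps, clipped to [0, 1]."""
    if actual_steps <= 0:
        return 0.0  # assumption: an agent that took no steps scores zero
    return max(0.0, min(1.0, ground_truth_steps / actual_steps))


if __name__ == "__main__":
    print(final_answer_match("The answer is 62839.", "62839"))
    print(tool_accuracy(["calculator", "python", "calculator", "calculator"], "calculator"))
    print(step_efficiency(2, 4))
```

Because final_answer_match is a substring test, a short ground-truth string such as "4" also matches an answer like "42", so ground-truth strings are best kept specific.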

Architecture

agentforge/
├── core/         # ReAct loop + parser + prompts
├── tools/        # registry, calculator, python repl, web search, sql, rag
├── memory/       # conversation, persistent sqlite
├── llm/          # HuggingFace causal LM wrapper
├── eval/         # 4 metrics + orchestrator
├── serve/        # FastAPI app
└── cli.py        # af / agentforge

Every stage is a small module behind a small interface (LLM, Tool, Memory) — swap any of them in two lines.
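
As an illustration of what a small Tool interface might look like, the sketch below defines a structural protocol and a toy registry. None of it is taken from the AgentForge source: the attribute names (name, description), the run method, the WordCount tool, and the REGISTRY / register helpers are hypothetical.

```python
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class Tool(Protocol):
    """Hypothetical tool interface: a name, a description, and a run method."""

    name: str
    description: str

    def run(self, tool_input: str) -> str: ...


class WordCount:
    """Example tool that satisfies the protocol without inheriting from it."""

    name = "word_count"
    description = "Counts whitespace-separated words in the input."

    def run(self, tool_input: str) -> str:
        return str(len(tool_input.split()))


# Hypothetical registry keyed by tool name.
REGISTRY: Dict[str, Tool] = {}


def register(tool: Tool) -> None:
    """Add a tool to the registry, rejecting objects that lack the interface."""
    if not isinstance(tool, Tool):
        raise TypeError(f"{tool!r} does not implement the Tool protocol")
    REGISTRY[tool.name] = tool


if __name__ == "__main__":
    register(WordCount())
    print(REGISTRY["word_count"].run("swap any of them in two lines"))
```

With a structural protocol, swapping an implementation only requires an object of the same shape, which is one way to keep the swap down to a couple of lines.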

Roadmap

  • ReAct loop with structured parsing
  • Tool protocol + registry
  • 5 built-in tools (calculator, python, web, sql, rag)
  • Persistent SQLite memory
  • Eval: task completion, final-answer match, tool accuracy, step efficiency
  • FastAPI server + Typer CLI
  • turboquant-ml integration (NF4 / GPTQ / AWQ models)
  • Plan-and-execute pattern alongside ReAct
  • Streaming step output in /ask
  • Tool-use chat templates (Qwen tool format, Llama-3 tool format)
  • Multi-agent coordination

License

MIT.
