An effective and easy-to-use agentic framework with extensible tools for complex reasoning.
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
Updates
- TBD: We're excited to collaborate with the community to expand OctoTools to more tools, domains, and beyond! Join our Discord to get started!
- 2025-04-17 🚀: Support for a broader range of LLM engines is now available! See the full list of supported LLM engines here.
- 2025-03-08 📺: Thrilled to have OctoTools featured in a tutorial by Discover AI on YouTube! Watch the engaging video here.
- 2025-02-16 📄: Our paper is now available as a preprint on arXiv! Read it here!
Get Started
YouTube Tutorial
We are excited to have a tutorial video on OctoTools by Discover AI on YouTube!
Introduction
We introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage.
(1) Tool cards define tool-usage metadata and encapsulate heterogeneous tools, so new tools can be integrated without additional training or framework refinement. (2) The planner governs both high-level and low-level planning to address the global objective and refine actions step by step. (3) The executor instantiates tool calls by generating executable commands and saves structured results in the context. The final answer is summarized from the full trajectory in the context. In addition, a task-specific toolset optimization algorithm learns a beneficial subset of tools for each downstream task.
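To make the division of labor concrete, here is a minimal sketch of a plan-and-execute loop in the spirit of the description above. It is not OctoTools' implementation: the planner and summarizer, which would be LLM calls, are replaced by hypothetical rule-based stand-ins, tools are plain callables rather than full tool cards, and every name in the block (`solve`, `Context`, `Step`, `Sum_Tool`) is illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# (tool name, sub-goal, keyword arguments) chosen by the planner, or None to stop.
Action = Optional[Tuple[str, str, Dict[str, Any]]]


@dataclass
class Step:
    tool_name: str
    sub_goal: str
    command: Dict[str, Any]
    result: Any


@dataclass
class Context:
    query: str
    global_plan: str = ""
    steps: List[Step] = field(default_factory=list)


def solve(
    query: str,
    tools: Dict[str, Callable[..., Any]],
    plan_high_level: Callable[[str], str],
    plan_next: Callable[[Context], Action],
    summarize: Callable[[Context], str],
    max_steps: int = 5,
) -> str:
    context = Context(query=query)
    # High-level planning: analyze the query once to frame the global objective.
    context.global_plan = plan_high_level(query)
    for _ in range(max_steps):
        # Low-level planning: pick the next tool call (or stop) given the context so far.
        action = plan_next(context)
        if action is None:
            break
        tool_name, sub_goal, command = action
        # Execution: run the tool and save the structured result in the context.
        result = tools[tool_name](**command)
        context.steps.append(Step(tool_name, sub_goal, command, result))
    # The final answer is summarized from the full trajectory.
    return summarize(context)


# Hypothetical rule-based stand-in for the LLM's low-level planner.
def toy_next(context: Context) -> Action:
    if not context.steps:
        return ("Sum_Tool", "Add the numbers", {"numbers": [1, 2, 3, 4, 5]})
    return None  # Stop after one step.


answer = solve(
    "What is 1 + 2 + 3 + 4 + 5?",
    tools={"Sum_Tool": lambda numbers: sum(numbers)},
    plan_high_level=lambda q: "Add the numbers with a summing tool.",
    plan_next=toy_next,
    summarize=lambda ctx: f"Answer: {ctx.steps[-1].result}",
)
print(answer)  # Answer: 15
```

Keeping each step as a structured record (tool, sub-goal, command, result) is what lets the final summary draw on the full trajectory rather than on the last tool output alone.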
We validate OctoTools’ generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. OctoTools also outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6% when given the same set of tools.
Supported LLM Engines
We support a broad range of LLM engines, including GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and more.
| Model Family | Engines (Multi-modal) | Engines (Text-Only) | Official Model List |
|---|---|---|---|
| OpenAI | gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1, o3, o1-pro, o4-mini (soon) | gpt-3.5-turbo, gpt-4, o1-mini, o3-mini | OpenAI Models |
| Anthropic | claude-3-haiku-20240307, claude-3-sonnet-20240229, claude-3-opus-20240229, claude-3-5-sonnet-20240620, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-7-sonnet-20250219 | | Anthropic Models |
| TogetherAI | Most multi-modal models, including meta-llama/Llama-4-Scout-17B-16E-Instruct, Qwen/QwQ-32B, Qwen/Qwen2-VL-72B-Instruct | Most text-only models, including meta-llama/Llama-3-70b-chat-hf, Qwen/Qwen2-72B-Instruct | TogetherAI Models |
| DeepSeek | | deepseek-chat, deepseek-reasoner | DeepSeek Models |
Installation
Create a conda environment from the conda.yaml file:
```bash
conda env create -f conda.yaml
```
Activate the environment and install requirements:
```bash
conda activate octotools
pip install -e .
```
Create a .env file and set OPENAI_API_KEY, GOOGLE_API_KEY, GOOGLE_CX, and any other keys your tools need. For example:
```bash
# The content of the .env file
# Used for GPT-4o-powered tools
OPENAI_API_KEY=<your-api-key-here>
# Used for the Google Search tool
GOOGLE_API_KEY=<your-api-key-here>
GOOGLE_CX=<your-cx-here>
# Used for the Advanced Object Detector tool (Optional)
DINO_KEY=<your-dino-key-here>
```
Obtain a Google API Key and Google CX according to the Google Custom Search API documentation.
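Before running any tools, it can help to confirm that the .env file actually contains real values rather than the placeholders shown above. The helper below is a hypothetical, standalone pre-flight check; it is not part of OctoTools, it only understands simple `KEY=VALUE` lines and `#` comments, and it treats DINO_KEY as optional because the example marks it so.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Keys from the example .env; DINO_KEY is optional, so it is not required here.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CX")


def parse_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(path: str, required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    values = parse_env_file(path)
    missing = []
    for key in required:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    if Path(".env").is_file():
        problems = missing_keys(".env")
        print("Missing keys:", ", ".join(problems) if problems else "none")
    else:
        print("No .env file found in the current directory")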
Install GNU parallel to run benchmark experiments in parallel:
```bash
sudo apt-get update
sudo apt-get install parallel
```
Test tools in the toolbox
Using Python_Code_Generator_Tool as an example, test the availability of the tool by running the following:
```bash
cd octotools/tools/python_code_generator
python tool.py
```
Expected output:
```
Execution Result: {'printed_output': 'The sum of all the numbers in the list is: 15', 'variables': {'numbers': [1, 2, 3, 4, 5], 'total_sum': 15}}
```
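The expected output suggests that the tool returns a dictionary holding the text the generated code printed and the variables it defined. The following is a minimal sketch of producing a result of that shape, assuming the generated code arrives as a plain Python string. `run_generated_code` is a hypothetical helper, not the tool's implementation, and unlike a production tool it does not sandbox the code it runs.

```python
import contextlib
import io


def run_generated_code(code: str) -> dict:
    """Execute a code string and return its printed output and top-level variables.

    Hypothetical helper: it runs the code in-process with no sandboxing or timeout.
    """
    namespace: dict = {}
    buffer = io.StringIO()
    # Capture everything the code prints instead of writing it to the console.
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    # Drop dunder entries such as __builtins__, which exec adds to the namespace.
    variables = {k: v for k, v in namespace.items() if not k.startswith("__")}
    return {"printed_output": buffer.getvalue().strip(), "variables": variables}


generated = (
    "numbers = [1, 2, 3, 4, 5]\n"
    "total_sum = sum(numbers)\n"
    "print(f'The sum of all the numbers in the list is: {total_sum}')\n"
)
print("Execution Result:", run_generated_code(generated))
```

Run as a script, this prints the same line as the expected output above.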
You can also test all tools available in the toolbox by running the following:
```bash
cd octotools/tools
source test_all_tools.sh
```
Expected testing log:
```
Testing advanced_object_detector...
✅ advanced_object_detector passed
Testing arxiv_paper_searcher...
✅ arxiv_paper_searcher passed
...
Testing wikipedia_knowledge_searcher...
✅ wikipedia_knowledge_searcher passed
Done testing all tools
Failed: 0
```
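A rough Python equivalent of such a test loop is sketched below. It is not the contents of test_all_tools.sh. It assumes each tool lives in its own subdirectory containing a tool.py and that a non-zero exit code means the tool failed. The failure marker (❌) is also an assumption, since only passing lines appear in the log above.

```python
import subprocess
import sys
from pathlib import Path


def run_all_tool_tests(tools_dir: str) -> int:
    """Run every <tool>/tool.py under tools_dir and return the number of failures.

    Assumption: a tool passes when its tool.py exits with status 0.
    """
    failed = 0
    for tool_dir in sorted(Path(tools_dir).iterdir()):
        script = tool_dir / "tool.py"
        if not script.is_file():
            continue  # Skip files and directories that are not tools.
        print(f"Testing {tool_dir.name}...")
        # Run from inside the tool's directory, as in the single-tool example.
        proc = subprocess.run(
            [sys.executable, "tool.py"], cwd=tool_dir, capture_output=True, text=True
        )
        if proc.returncode == 0:
            print(f"✅ {tool_dir.name} passed")
        else:
            print(f"❌ {tool_dir.name} failed")
            failed += 1
    print("Done testing all tools")
    print(f"Failed: {failed}")
    return failed


if __name__ == "__main__":
    run_all_tool_tests(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Capturing each tool's output keeps the log as compact as the one above; print `proc.stderr` in the failure branch if you need to debug a failing tool.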
Run inference on benchmarks
Using CLEVR-Math as an example, run inference on a benchmark by:
```bash
cd octotools/tasks
# Run inference on clevr-math using GPT-4o only
source clevr-math/run_gpt4o.sh
# Run inference on clevr-math using OctoTools with only the base tool
source clevr-math/run_octotool_base.sh
# Run inference on clevr-math using OctoTools with an optimized toolset
source clevr-math/run_octotools.sh
```
More benchmarks are available in the tasks directory.
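The introduction mentions a task-specific toolset optimization algorithm, and run_octotools.sh uses an optimized toolset. The exact procedure is described in the paper. The sketch below shows one simple greedy variant, offered as an assumption rather than the authors' algorithm. It scores the base toolset on a validation split, then keeps every candidate tool that beats that score when added on its own. `evaluate`, `Base_Tool`, `Tool_A`, and `Tool_B` are hypothetical, and the toy scores are made up for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Set


def optimize_toolset(
    base_tools: Set[str],
    candidate_tools: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Greedy selection: keep each candidate that beats the base toolset on its own."""
    base_score = evaluate(frozenset(base_tools))
    selected = set(base_tools)
    for tool in candidate_tools:
        if tool in base_tools:
            continue
        # Score the base toolset plus this single tool on held-out validation data.
        if evaluate(frozenset(base_tools | {tool})) > base_score:
            selected.add(tool)
    return selected


# Toy validation accuracies, made up for illustration (not results from the paper).
def toy_evaluate(tools: FrozenSet[str]) -> float:
    score = 0.50
    if "Tool_A" in tools:
        score += 0.10
    if "Tool_B" in tools:
        score -= 0.05
    return score


print(sorted(optimize_toolset({"Base_Tool"}, ["Tool_A", "Tool_B"], toy_evaluate)))
```

Scoring each tool separately keeps the cost linear in the number of candidate tools. The trade-off is that interactions between tools are ignored.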
Experiments
Main results
To demonstrate the generality of our OctoTools framework, we conduct comprehensive evaluations on 16 diverse benchmarks spanning two modalities, five domains, and four reasoning types. These benchmarks encompass a wide range of complex reasoning tasks, including visual understanding, numerical calculation, knowledge retrieval, and multi-step reasoning.
More results are available in the paper or at the project page.
In-depth analysis
We provide a set of in-depth analyses to help you understand the framework. For instance, we visualize the tool usage of OctoTools and its baselines across 16 tasks. The results show that OctoTools draws on different external tools to address task-specific challenges. Explore more findings in our paper or on the project page.
Example visualizations
We provide a set of example visualizations to help you understand the framework. Explore them at the project page.
Customize OctoTools
The design of each tool card is modular relative to the OctoTools framework, enabling users to integrate diverse tools without modifying the underlying framework or agent logic. New tool cards can be added, replaced, or updated with minimal effort, making OctoTools robust and extensible as tasks grow in complexity.
To customize OctoTools for your own tasks:
- Add a new tool card: Implement your tool following the structure of the existing tools.
- Replace or update existing tools: You can replace or update tools in the toolbox. For example, we provide the Object_Detector_Tool to detect objects in images using an open-source model. We also provide an alternative tool, the Advanced_Object_Detector_Tool, which detects objects in images using API calls.
- Enable tools for your tasks: You can enable the whole toolset or a subset of tools for your own tasks by setting the enabled_tools argument in tasks/solve.py.
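The three customization steps can be sketched in a minimal, self-contained form, assuming a tool card can be modeled as usage metadata plus an `execute` callable. `ToolCard`, `build_toolbox`, and `select_tools` are hypothetical names. They are not OctoTools' actual tool base class or the code in tasks/solve.py. The two detector cards are stubs that reuse the tool names above, and they return placeholder data instead of running real detectors.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class ToolCard:
    """Hypothetical tool card: usage metadata plus the function that runs the tool."""
    name: str
    description: str
    input_types: Dict[str, str]
    output_type: str
    execute: Callable[..., Any]
    demo_commands: List[str] = field(default_factory=list)


def build_toolbox(cards: Iterable[ToolCard]) -> Dict[str, ToolCard]:
    # A later card with the same name replaces an earlier one, so swapping in an
    # updated tool requires no change to the planner or executor.
    return {card.name: card for card in cards}


def select_tools(toolbox: Dict[str, ToolCard], enabled_tools: List[str]) -> Dict[str, ToolCard]:
    # Keep only the enabled subset and fail loudly on names not in the toolbox.
    missing = [name for name in enabled_tools if name not in toolbox]
    if missing:
        raise KeyError(f"Unknown tools: {missing}")
    return {name: toolbox[name] for name in enabled_tools}


# Stub detectors standing in for the open-source and API-based implementations.
local_detector = ToolCard(
    name="Object_Detector_Tool",
    description="Detects objects in an image with a local open-source model (stub).",
    input_types={"image": "str - path to an image", "labels": "list[str]"},
    output_type="list[dict]",
    execute=lambda image, labels: [{"label": label, "source": "local"} for label in labels],
    demo_commands=['execute(image="example.png", labels=["cat"])'],
)
api_detector = ToolCard(
    name="Advanced_Object_Detector_Tool",
    description="Detects objects in an image via an external API (stub).",
    input_types={"image": "str - path to an image", "labels": "list[str]"},
    output_type="list[dict]",
    execute=lambda image, labels: [{"label": label, "source": "api"} for label in labels],
)

toolbox = build_toolbox([local_detector, api_detector])
enabled = select_tools(toolbox, ["Advanced_Object_Detector_Tool"])
print(enabled["Advanced_Object_Detector_Tool"].execute(image="example.png", labels=["cat"]))
```

Because the planner would only ever see the cards returned by `select_tools`, restricting or swapping tools stays a configuration change rather than a code change.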
Resources
Inspiration
This project draws inspiration from several remarkable projects:
- 📕 Chameleon – Chameleon is an early attempt that augments LLMs with tools, which is a major source of inspiration. A journey of a thousand miles begins with a single step.
- 📘 TextGrad – We admire and appreciate TextGrad for its innovative and elegant framework design.
- 📗 AutoGen – A trending project that excels in building agentic systems.
- 📙 LangChain – A powerful framework for constructing agentic systems, known for its rich functionalities.
Citation
```bibtex
@article{lu2025octotools,
  title={OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning},
  author={Lu, Pan and Chen, Bowen and Liu, Sheng and Thapa, Rahul and Boen, Joseph and Zou, James},
  journal={arXiv preprint arXiv:2502.11271},
  year={2025}
}
```
Our Team
- Pan Lu
- Bowen Chen
- Sheng Liu
- Rahul Thapa
- Joseph Boen
- James Zou
Contributors
We truly look forward to open-source contributions to OctoTools! If you are interested in contributing, collaborating, or reporting issues, please don't hesitate to contact us!
We are also looking forward to your feedback and suggestions!