Alita with Sandbox and Dynamic MCP Box
CAlita: Adaptive LLM-based Iterative Task Automation
Inspired by the paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".
CAlita is an intelligent meta-agent system that automatically invents Python scripts as tools to solve complex tasks through an iterative CodeReAct (Code Reasoning and Acting) loop. The system can analyze natural language requirements, detect capability gaps, search for external resources, generate and register executable code, and manage isolated execution environments between tasks. CAlita is based on OpenAlita, rebuilt with an E2B sandbox and a dynamic McpBox.
Key Features
- Intelligent Task Analysis: Uses LLM-powered brainstorming to analyze tasks and detect capability gaps
- Dynamic Code Generation: Automatically generates self-contained Python scripts based on task specifications
- Sandbox Execution: Executes scripts and code in an E2B sandbox
- External Resource Integration: Searches and incorporates web resources when needed
- Iterative Refinement: Learns from execution failures and refines solutions automatically
- MCP Registry: Stores successful Model Context Protocol (MCP) tools in the McpBox for reuse
- Comprehensive Benchmarking: Supports evaluation on GAIA, MathVista, and PathVQA datasets
Architecture
CAlita consists of several core components:
Core Modules
- ManagerAgent: Central orchestrator that coordinates WebAgent, McpToolAgent, and McpCreationAgent
- McpCreationAgentPro: A simpler and more efficient variant of McpCreationAgent
- McpCreationAgent: Central coordinator that orchestrates the entire pipeline
- MCPBrainstorm: Analyzes tasks and generates tool specifications using LLM
- ResearchAgent: Performs intelligent information retrieval using LangGraph and MCP tools
- ScriptGenerator: Generates executable Python scripts from specifications
- CodeRunner: Executes scripts in E2B Sandbox
- MCPRegistry: Persistently stores successful Model Context Protocol (MCP) tools in the McpBox
- Benchmark: Evaluation framework for multiple datasets
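The idea behind a persistent tool registry can be illustrated with a small sketch. This is not the actual MCPRegistry or McpBox code: `ToolRegistry`, its methods, and the JSON file layout are hypothetical and only show how successful scripts might be stored and looked up across runs.

```python
import json
from pathlib import Path
from typing import Optional


class ToolRegistry:
    """Hypothetical JSON-backed store mapping tool names to script source."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.tools: dict[str, str] = {}
        if path.exists():
            # Reload tools saved by an earlier run.
            self.tools = json.loads(path.read_text(encoding="utf-8"))

    def register(self, name: str, script: str) -> None:
        """Add or overwrite a tool and persist the registry to disk."""
        self.tools[name] = script
        self.path.write_text(json.dumps(self.tools, indent=2), encoding="utf-8")

    def lookup(self, name: str) -> Optional[str]:
        """Return the stored script, or None if the tool is unknown."""
        return self.tools.get(name)
```

A new `ToolRegistry` pointed at the same file sees every tool registered earlier, which is the property that makes reuse between tasks possible.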
Detailed Process Flow:
- Task Analysis: Analyze input task and detect capability gaps
- Resource Gathering: Search external resources if gaps are detected
- Script Generation: Generate self-contained Python script
- Execution: Run script and capture output
- Registration: Store successful scripts as reusable MCPs
- Iteration: Refine based on feedback if execution fails
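The flow above can be sketched as a simple generate, execute, and refine loop. This is a minimal illustration under stated assumptions, not CAlita's implementation: `generate_script` is a hypothetical stand-in for the LLM-backed ScriptGenerator, the `registry` dict stands in for MCPRegistry, and scripts run in a local subprocess rather than an E2B sandbox.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_script(script: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run a script in a local subprocess (stand-in for a sandbox) and capture output."""
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    ok = proc.returncode == 0
    return ok, (proc.stdout if ok else proc.stderr).strip()


def solve(
    task: str,
    generate_script: Callable[[str, Optional[str]], str],
    registry: dict[str, str],
    max_iterations: int = 3,
) -> Optional[str]:
    """Generate, execute, and refine a script until it succeeds or attempts run out."""
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        script = generate_script(task, feedback)  # hypothetical LLM call
        ok, output = run_script(script)
        if ok:
            registry[task] = script  # registration: keep the working script for reuse
            return output
        feedback = output  # iteration: feed the error into the next attempt
    return None


def fake_generator(task: str, feedback: Optional[str]) -> str:
    # The first attempt is deliberately broken; the retry "fixes" it.
    return "print(1 / 0)" if feedback is None else "print(sum(range(5)))"


if __name__ == "__main__":
    registry: dict[str, str] = {}
    print(solve("sum 0..4", fake_generator, registry))  # prints 10
```

The fake generator fails once with a ZeroDivisionError and succeeds on the second attempt, so the loop exercises both the refinement and the registration paths.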
Prerequisites
- Python 3.13+
- Required Python packages (see installation section)
Installation
- Clone the repository:
  git clone <repository-url>
  cd CAlita_repo
- Install dependencies:
  uv sync --python 3.13
- Set up configuration. Copy config.yaml.example to config.yaml and update the API keys:
  api:
    litellm_api_key: "your-actual-litellm-api-key-here"
    openai_api_key: "your-actual-openai-api-key-here"
    anthropic_api_key: "your-actual-anthropic-api-key-here"  # If using Anthropic models
- Set the environment variables:
  export E2B_API_KEY=XX
  export E2B_ACCESS_TOKEN=XX
  export LITELLM_API_KEY=XX
- Run the CAlita app:
  uv run calita
- Run the McpBox server:
  uv run calita-mcpbox
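Before starting the app, it can help to confirm that the three environment variables named above are set. The check below is an optional sketch and not part of CAlita; `missing_env_vars` is a hypothetical helper.

```python
import os

# Variable names taken from the installation steps above.
REQUIRED_ENV_VARS = ("E2B_API_KEY", "E2B_ACCESS_TOKEN", "LITELLM_API_KEY")


def missing_env_vars(environ=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```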
Configuration
The system is configured through config.yaml. Key configuration sections:
Agent Configuration
agent:
  primary_llm: "openai/qwen3-235b-a22b"  # Primary LLM model
  coder_llm: "openai/qwen3-coder-480b-a35b-instruct"  # Coder model
  reason_llm: "openai/qwen3-235b-a22b-thinking-2507"  # Reasoning model
  script_gen_prompt_template: "templates/script_template.txt"
API Configuration
api:
  litellm_api_key: "<YOUR_LITELLM_API_KEY_HERE>"  # LiteLLM open-source API key
  litellm_api_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"  # Open-source API endpoint
  openai_api_key: "<YOUR_OPENAI_API_KEY_HERE>"  # OpenAI API key
  openai_api_url: "https://api.openai.com/v1"  # OpenAI API endpoint
  anthropic_api_key: "<YOUR_ANTHROPIC_API_KEY_HERE>"  # Anthropic API key
  anthropic_base_url: "https://api.anthropic.com"  # Anthropic API endpoint (optional)
Benchmark Configuration
benchmark:
  gaia:
    dataset_path: "data/gaia.json"
  mathvista:
    sample_size: 100
    dataset_path: "data/mathvista.json"
  pathvqa:
    sample_size: 100
    dataset_path: "data/pathvqa.json"
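One plausible reading of sample_size is that it caps how many examples are drawn from a dataset file. The sketch below assumes each dataset is a JSON list of records and that sampling is random with a fixed seed. Neither assumption is confirmed by this README, and `load_samples` is a hypothetical helper rather than a CAlita function.

```python
import json
import random
from pathlib import Path
from typing import Any, Optional


def load_samples(dataset_path: str, sample_size: Optional[int] = None, seed: int = 0) -> list[Any]:
    """Load a JSON list and return at most sample_size randomly chosen records."""
    records = json.loads(Path(dataset_path).read_text(encoding="utf-8"))
    if sample_size is None or sample_size >= len(records):
        return records  # no cap configured, or the dataset is already small enough
    # A seeded generator keeps the subset reproducible between benchmark runs.
    return random.Random(seed).sample(records, sample_size)
```

Fixing the seed means repeated runs evaluate the same subset, which keeps scores comparable across configurations.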
Usage
Single Task Mode
Run CAlita on a single natural language task:
uv run calita
Then enter your task when prompted:
Enter a natural language query/task: Calculate the fibonacci sequence up to 100
Benchmark Mode
Run evaluation on benchmark datasets:
uv run calita
This will evaluate the system on GAIA, MathVista, and PathVQA datasets and output metrics including pass@1 and pass@3 scores.
Programmatic Usage
from calita.manager_agent import ManagerAgent
from calita.utils.utils import get_global_config, setup_logging
# Load configuration
config = get_global_config("config.yaml")
setup_logging(config)
# Initialize the agent
manager = ManagerAgent(config)
# Process a task
result = manager.generate("Create a function to sort a list of numbers")
print(result)
Project Structure
CAlita_repo/
├── calita/                          # Source code
│   ├── manager_sub_agents           # Manager and sub-agent coordination code
│   ├── mcp_creation                 # MCP tool creation and execution code
│   ├── tools                        # Tool code (e.g. PyPI)
│   └── utils                        # Utility code
├── examples/                        # Example code
├── mcp_config/                      # MCP server config
├── templates/                       # Prompt templates
│   ├── brain_storm_template.txt     # Task analysis
│   ├── script_template.txt          # ScriptGenerator: create the MCP tool script
│   ├── final_result_template.txt    # FinalResultAgent: evaluate and format the result
│   ├── mcp_tool_fetch_template.txt  # McpToolAgent: fetch MCP tools
│   └── task_plan_template.txt       # TaskPlanAgent: plan the task
├── data/                            # Dataset files (create this directory)
│   ├── gaia.json
│   ├── mathvista.json
│   └── pathvqa.json
└── logs/                            # Log files (auto-created)
    └── CAlita.log
Evaluation Metrics
The system supports comprehensive evaluation with the following metrics:
- Pass@1: Success rate on first attempt
- Pass@3: Success rate within 3 attempts
- Dataset-specific metrics:
- GAIA: Breakdown by difficulty levels (Level 1, 2, 3)
- MathVista: Mathematical reasoning accuracy
- PathVQA: Medical image question answering accuracy
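Following the definitions above, Pass@k can be computed as the fraction of tasks with at least one success among the first k attempts. The sketch assumes attempt outcomes are recorded as one list of booleans per task; that layout and the `pass_at_k` name are illustrative and not taken from CAlita's benchmark code.

```python
def pass_at_k(results: list[list[bool]], k: int) -> float:
    """Fraction of tasks solved within the first k attempts."""
    if not results:
        return 0.0
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


results = [
    [True, False, False],   # solved on the first attempt
    [False, False, True],   # solved on the third attempt
    [False, False, False],  # never solved
    [False, True, True],    # solved on the second attempt
]
print(pass_at_k(results, 1))  # 0.25
print(pass_at_k(results, 3))  # 0.75
```

Pass@3 can never be lower than Pass@1 on the same results, since every task solved on the first attempt also counts as solved within three.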
Logging
Logs are automatically generated in logs/CAlita.log. Configure logging level in config.yaml:
logging:
  level: "INFO"  # DEBUG, INFO, WARNING, ERROR
  log_file: "logs/CAlita.log"
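For reference, the two settings above map directly onto Python's standard logging module. The sketch below shows one way to apply them; it is an assumption about how such a config could be consumed, not the body of CAlita's own `setup_logging`, and the logger name is hypothetical.

```python
import logging
from pathlib import Path


def configure_logging(level: str = "INFO", log_file: str = "logs/CAlita.log") -> logging.Logger:
    """Create the log directory if needed and attach a file handler at the given level."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("calita_example")  # hypothetical logger name
    logger.setLevel(level.upper())  # setLevel accepts level names such as "DEBUG"
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling the function more than once attaches another handler each time, so it is meant to run once at startup.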
Inspiration and Credits
This project is inspired by the Alita project by CharlesQ9 and the concepts presented in the research paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".
- Original Alita Project: CharlesQ9/Alita on GitHub
- Research Paper: Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (arXiv:2505.20286)
Full credits to the authors and contributors of these works for the foundational architecture and ideas.
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- OpenAI for providing the LLM API
- The research community for benchmark datasets (GAIA, MathVista, PathVQA)
- Contributors and maintainers of the open-source libraries used
Support
For questions, issues, or contributions, please:
- Open an issue on GitHub
- Check the logs in logs/CAlita.log for debugging
- Ensure your OpenAI API key is properly configured