
Alita with Sandbox and Dynamic MCP Box

Project description

CAlita: Adaptive LLM-based Iterative Task Automation

Inspired by the paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".

CAlita is an intelligent meta-agent system that automatically invents Python scripts as tools to solve complex tasks through an iterative CodeReAct (Code Reasoning and Acting) loop. The system can analyze natural language requirements, detect capability gaps, search for external resources, generate and register executable code, and manage isolated execution environments between tasks. CAlita is based on OpenAlita, rebuilt with an E2B sandbox and a dynamic McpBox.

🚀 Key Features

  • Intelligent Task Analysis: Uses LLM-powered brainstorming to analyze tasks and detect capability gaps
  • Dynamic Code Generation: Automatically generates self-contained Python scripts based on task specifications
  • Sandbox Execution: Executes generated scripts and code in an E2B sandbox
  • External Resource Integration: Searches and incorporates web resources when needed
  • Iterative Refinement: Learns from execution failures and refines solutions automatically
  • MCP Registry: Stores and reuses successful Model Context Protocol (MCP) tools in McpBox
  • Comprehensive Benchmarking: Supports evaluation on GAIA, MathVista, and PathVQA datasets

๐Ÿ—๏ธ Architecture

CAlita consists of several core components:

Core Modules

  • ManagerAgent: Central orchestrator that coordinates WebAgent, McpToolAgent, and McpCreationAgent
  • McpCreationAgentPro: A simpler, more efficient variant of McpCreationAgent
  • McpCreationAgent: Coordinator that orchestrates the MCP creation pipeline
  • MCPBrainstorm: Analyzes tasks and generates tool specifications using LLM
  • ResearchAgent: Performs intelligent information retrieval using LangGraph and MCP tools
  • ScriptGenerator: Generates executable Python scripts from specifications
  • CodeRunner: Executes scripts in E2B Sandbox
  • MCPRegistry: Persistent storage that registers successful MCP tools into McpBox
  • Benchmark: Evaluation framework for multiple datasets
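
The CodeRunner executes each generated script in an isolated E2B sandbox. As a rough local stand-in (no sandbox account required), a subprocess-based runner illustrates the success/output contract; `run_script` is an illustrative name here, not the real CodeRunner API:

```python
import subprocess
import sys

def run_script(code: str, timeout: int = 30) -> tuple:
    """Run a Python snippet in a child process and capture its output.

    Local stand-in for CodeRunner, which runs in an E2B sandbox; the
    (success, output) contract is the part being illustrated.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    ok = proc.returncode == 0
    return ok, proc.stdout if ok else proc.stderr
```

On success the caller gets stdout; on failure it gets stderr, which the iterative loop can feed back into the next brainstorming round.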

Workflow

flowchart TD
    A["🎯 Input Task"] --> B["🧠 McpCreationAgent.generate()"]
    B --> C["📊 MCPBrainstorm.brainstorm()"]
    C --> D{"🔍 Capability Gap Detected?"}

    D -->|Yes| E["🌐 ResearchAgent.search()"]
    D -->|No| F["📝 ScriptGenerator.generate_script()"]

    E --> G["🔗 ResearchAgent.retrieve()"]
    G --> H["📚 Collect External Resources"]
    H --> F

    F --> I["🏗️ EnvironmentManager.create_environment()"]
    I --> J["📦 EnvironmentManager.install_dependencies()"]
    J --> K["▶️ CodeRunner.run_script()"]

    K --> L{"✅ Execution Successful?"}

    L -->|Yes| M["💾 MCPRegistry.register_mcp()"]
    L -->|No| N{"🔄 Max Iterations Reached?"}

    N -->|No| O["📝 Update Context with Error"]
    O --> C
    N -->|Yes| P["❌ Return Failure"]

    M --> Q["✨ Return Success Result"]

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#fff3e0
    style E fill:#e8f5e8
    style F fill:#fff8e1
    style K fill:#fce4ec
    style M fill:#e0f2f1
    style Q fill:#e8f5e8
    style P fill:#ffebee

Detailed Process Flow:

  1. Task Analysis: Analyze input task and detect capability gaps
  2. Resource Gathering: Search external resources if gaps are detected
  3. Script Generation: Generate self-contained Python script
  4. Execution: Run script and capture output
  5. Registration: Store successful scripts as reusable MCPs
  6. Iteration: Refine based on feedback if execution fails
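
The six steps above can be sketched as a minimal loop. The stub callables stand in for MCPBrainstorm, ScriptGenerator, and CodeRunner; every name here is illustrative, not the real CAlita API:

```python
from typing import Callable

def codereact_loop(
    task: str,
    brainstorm: Callable[[str], str],
    generate_script: Callable[[str], str],
    execute: Callable[[str], tuple],
    max_iterations: int = 3,
) -> dict:
    """Analyze -> generate -> execute, feeding errors back into the context."""
    context = task
    registry = {}  # stand-in for MCPRegistry / McpBox
    output = ""
    for attempt in range(1, max_iterations + 1):
        spec = brainstorm(context)        # task analysis / tool specification
        script = generate_script(spec)    # self-contained Python script
        ok, output = execute(script)      # sandboxed execution
        if ok:
            registry[task] = script       # register successful tool for reuse
            return {"success": True, "output": output, "attempts": attempt}
        context = f"{task}\nPrevious error: {output}"  # refine with feedback
    return {"success": False, "output": output, "attempts": max_iterations}
```

The key design point is that a failed execution does not abort the task: the error text is appended to the context so the next brainstorming pass can correct course.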

📋 Prerequisites

  • Python 3.13+
  • uv (used for dependency management and running the app)
  • Required Python packages (see installation section)

🛠️ Installation

  1. Clone the repository:

    git clone <repository-url>
    cd CAlita_repo
    
  2. Install dependencies:

    uv sync --python 3.13
    
  3. Set up configuration:

    • Copy config.yaml.example to config.yaml and update the API keys:
    api:
      litellm_api_key: "your-actual-litellm-api-key-here"
      openai_api_key: "your-actual-openai-api-key-here"
      anthropic_api_key: "your-actual-anthropic-api-key-here"  # If using Anthropic models
    
  4. Run Calita App:

      uv run calita
    
  5. Run McpBox Server:

      uv run calita-mcpbox
    

⚙️ Configuration

The system is configured through config.yaml. Key configuration sections:

Agent Configuration

agent:
  primary_llm: "openai/qwen3-235b-a22b"                   # Primary LLM model
  coder_llm: "openai/qwen3-coder-480b-a35b-instruct"      # Coder model
  reason_llm: "openai/qwen3-235b-a22b-thinking-2507"      # Reason model
  script_gen_prompt_template: "templates/script_template.txt"

API Configuration

api:
  litellm_api_key: "<YOUR_LITELLM_API_KEY_HERE>"  # LiteLLM API key
  litellm_api_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"   # LiteLLM-compatible API endpoint
  openai_api_key: "<YOUR_OPENAI_API_KEY_HERE>"  # OpenAI API key
  openai_api_url: "https://api.openai.com/v1"   # OpenAI API endpoint
  anthropic_api_key: "<YOUR_ANTHROPIC_API_KEY_HERE>"  # Anthropic API key
  anthropic_base_url: "https://api.anthropic.com"  # Anthropic API endpoint (optional)
exa:
  exa_api_key: "<YOUR_EXA_API_KEY_HERE>"        # Exa API key for semantic search

Benchmark Configuration

benchmark:
  gaia:
    dataset_path: "data/gaia.json"
  mathvista:
    sample_size: 100
    dataset_path: "data/mathvista.json"
  pathvqa:
    sample_size: 100
    dataset_path: "data/pathvqa.json"

🚀 Usage

Single Task Mode

Run CAlita on a single natural language task:

uv run calita

Set the experiment mode in config.yaml:

misc:
  experiment_mode: "single_task"

Then enter your task when prompted:

Enter a natural language query/task: Calculate the fibonacci sequence up to 100
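
For a task like this, a successful run would produce a self-contained script along these lines (an illustrative example of the kind of output ScriptGenerator aims for, not a captured trace):

```python
def fibonacci_up_to(limit: int) -> list:
    """Return the Fibonacci numbers that do not exceed `limit`."""
    seq = [0, 1]
    while seq[-1] + seq[-2] <= limit:
        seq.append(seq[-1] + seq[-2])
    return seq

print(fibonacci_up_to(100))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```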

Benchmark Mode

Run evaluation on benchmark datasets:

uv run calita

Set the experiment mode in config.yaml:

misc:
  experiment_mode: "benchmark"

This will evaluate the system on GAIA, MathVista, and PathVQA datasets and output metrics including pass@1 and pass@3 scores.
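
As a minimal sketch, pass@k can be read as the fraction of tasks solved within k attempts (the actual Benchmark module may use a different formula, e.g. the unbiased pass@k estimator):

```python
from typing import List, Optional

def pass_at_k(first_success: List[Optional[int]], k: int) -> float:
    """Fraction of tasks solved within k attempts.

    first_success[i] is the 1-based attempt on which task i first
    succeeded, or None if it never succeeded.
    """
    if not first_success:
        return 0.0
    solved = sum(1 for a in first_success if a is not None and a <= k)
    return solved / len(first_success)

# Five tasks: solved on attempts 1, 2, never, 3, 1
results = [1, 2, None, 3, 1]
# pass@1 = 2/5, pass@3 = 4/5
```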

Programmatic Usage

from calita.manager_agent import ManagerAgent
from calita.utils.utils import get_global_config

# Load configuration
config = get_global_config("config.yaml")

# Initialize the agent
manager = ManagerAgent(config)

# Process a task
result = manager.generate("Create a function to sort a list of numbers")
print(result)

๐Ÿ“ Project Structure

CAlita_repo/
├── calita/               # Source code
│   ├── manager_sub_agents/   # Sub-agents coordinated by ManagerAgent
│   ├── mcp_creation/         # MCP tool creation and execution
│   ├── tools/                # Tool code (e.g. PyPI helpers)
│   └── utils/                # Utility code
├── examples/             # Example code
├── mcp_config/           # MCP server configuration
├── templates/            # Prompt templates
│   ├── brain_storm_template.txt      # Task analysis
│   ├── script_template.txt           # ScriptGenerator MCP tool scripts
│   ├── final_result_template.txt     # FinalResultAgent result evaluation and formatting
│   ├── mcp_tool_fetch_template.txt   # McpToolAgent tool fetching
│   └── task_plan_template.txt        # TaskPlanAgent task planning
├── data/                 # Dataset files (create this directory)
│   ├── gaia.json
│   ├── mathvista.json
│   └── pathvqa.json
└── logs/                 # Log files (auto-created)
    └── CAlita.log

📊 Evaluation Metrics

The system supports comprehensive evaluation with the following metrics:

  • Pass@1: Success rate on first attempt
  • Pass@3: Success rate within 3 attempts
  • Dataset-specific metrics:
    • GAIA: Breakdown by difficulty levels (Level 1, 2, 3)
    • MathVista: Mathematical reasoning accuracy
    • PathVQA: Medical image question answering accuracy

๐Ÿ” Logging

Logs are automatically generated in logs/CAlita.log. Configure logging level in config.yaml:

logging:
  level: "INFO"              # DEBUG, INFO, WARNING, ERROR
  log_file: "logs/CAlita.log"

Inspiration and Credits

This project is inspired by the Alita project by CharlesQ9 and the concepts presented in the research paper "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution".

Original Alita Project: CharlesQ9/Alita on GitHub
Research Paper: Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (arXiv:2505.20286)
Full credits to the authors and contributors of these works for the foundational architecture and ideas.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for providing the LLM API
  • The research community for benchmark datasets (GAIA, MathVista, PathVQA)
  • Contributors and maintainers of the open-source libraries used

📞 Support

For questions, issues, or contributions, please:

  • Open an issue on GitHub
  • Check the logs in logs/CAlita.log for debugging
  • Ensure your OpenAI API key is properly configured


Download files

Download the file for your platform.

Source Distribution

calita-0.1.2.tar.gz (872.9 kB)

Uploaded Source

Built Distribution


calita-0.1.2-py3-none-any.whl (49.8 kB)

Uploaded Python 3

File details

Details for the file calita-0.1.2.tar.gz.

File metadata

  • Download URL: calita-0.1.2.tar.gz
  • Upload date:
  • Size: 872.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for calita-0.1.2.tar.gz:

  • SHA256: eaac71373acbb6becc95a0a17ffd8f26735790e062dcd916c64f44ba0ba9fa09
  • MD5: c2a1a388e1c195614c2953d14b1b52fd
  • BLAKE2b-256: 5ad09b79b12433e54a547e25815fd0f8c703b5c1d8395c811f9370675cbd919c


File details

Details for the file calita-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: calita-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 49.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for calita-0.1.2-py3-none-any.whl:

  • SHA256: 8a8fe482a5b9ecf981bfb19d56ca0400bb634ccb62529e4c1cbe05b94ca99580
  • MD5: 295a578c2b30ce88ba6576ae05952835
  • BLAKE2b-256: fa3640061c24f65400ace17df5d38e26462797a88cca06410006319e80a31e51

