AI web automation agent swarm with self-cloning capabilities
Project description
Kagebunshin
Kagebunshin is a web-browsing, research-focused agent swarm with self-cloning capabilities. Built on the foundation of advanced language models, this system enables economically viable parallel web automation.
Q&A
Q: What does it do?
It works very similarly to how ChatGPT agent functions, with additional features on top:
- cloning itself to navigate multiple branches simultaneously
- communicating with other clones through a group chat feature: agents can "post" what they are working on to their internal group chat, so that no two clones work on the same thing, and to encourage emergent behaviors.
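Conceptually, the group chat acts like a shared claim board: a clone posts the task it is taking, and other clones skip tasks that are already claimed. The following is a hypothetical pure-Python sketch of that idea only; Kagebunshin's actual group chat runs over Redis and its API differs.

```python
# Hypothetical sketch of the "claim before you work" idea behind the group chat.
# The class and method names here are illustrative, not Kagebunshin's real API.

class GroupChat:
    """A shared board where clones post the task they are claiming."""

    def __init__(self):
        self.claims = {}  # task description -> name of the clone that claimed it

    def claim(self, agent: str, task: str) -> bool:
        """Return True if this agent claimed the task first, False if taken."""
        if task in self.claims:
            return False  # another clone already posted this task
        self.claims[task] = agent
        return True

room = GroupChat()
assert room.claim("clone-1", "scrape page A") is True
assert room.claim("clone-2", "scrape page A") is False  # duplicate work avoided
assert room.claim("clone-2", "scrape page B") is True
```

In the real system the board is shared across processes (hence Redis), but the dedup logic is the same first-writer-wins check.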
Q: Why now?
While everyone is focusing on GPT-5's performance, I looked at GPT-5-nano's. It matches or even outperforms the previous gpt-4.1-mini at 5-10x less cost. This means we can run 5 parallel agents on nano for the same cost as running 1 agent on 4.1-mini. As far as I know, ChatGPT agent runs on gpt-4.1-mini (they must have updated it by now, right?). This means the swarm can be extremely useful when you need quantity over quality, such as data collection, scraping, etc.
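To make the cost math concrete, here is the arithmetic behind the claim, using an arbitrary per-task cost for gpt-4.1-mini and the 5-10x ratio stated above (no real price list is assumed):

```python
# Illustrative arithmetic only; C is an arbitrary per-task cost for gpt-4.1-mini.
C = 10.0
nano_low = C / 5    # nano at the "5x cheaper" end of the claimed range
nano_high = C / 10  # nano at the "10x cheaper" end

five_nano_worst = 5 * nano_low   # five parallel nano agents, worst case
five_nano_best = 5 * nano_high   # five parallel nano agents, best case

assert five_nano_worst == C      # same cost as one mini agent
assert five_nano_best == C / 2   # half the cost of one mini agent
```

So even at the conservative end of the ratio, five nano agents cost no more than a single mini agent.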
Q: Limitations?
- It is a legion of "dumber" agents. It can handle rote work like aggregating and collecting data, but drawing novel conclusions should not be left to it. Instead, let a smarter GPT do the synthesis.
- Scalability: on my laptop it works just fine. However, we don't know what devils are hiding in the details once we scale up. I have set up comprehensive bot-detection evasion, but it might not be enough at production scale.
Please let me know if you have any questions or comments. Thank you!
Features
- Self-cloning (hence the name, lol) for parallelized execution
- "Agent Group Chat" for communication between clones, mitigating duplicated work & encouraging emergent behavior
- Tool-augmented agent loop via LangGraph
- Human-like delays, typing, scrolling
- Browser fingerprint and stealth adjustments
- Tab management and PDF handling
Installation
From PyPI (Recommended)
# Using uv (recommended)
uv add kagebunshin
uv run playwright install chromium
# Or using pip
pip install kagebunshin
playwright install chromium
Development Installation
For development or to get the latest features:
# Using uv
git clone https://github.com/SiwooBae/kagebunshin.git
cd kagebunshin
uv python install 3.13
uv venv -p 3.13
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
uv run playwright install chromium
# Using pip
git clone https://github.com/SiwooBae/kagebunshin.git
cd kagebunshin
pip install -e .
playwright install chromium
Environment Setup
Set your API key in your environment:
export OPENAI_API_KEY="your-openai-api-key"
# or for Anthropic (if configured)
export ANTHROPIC_API_KEY="your-anthropic-api-key"
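A quick stdlib-only preflight check before launching the agent can save a failed run partway through a task. The helper name and error message below are illustrative, not part of Kagebunshin:

```python
import os

def preflight(provider: str = "openai") -> str:
    """Return the API key for the chosen provider, or raise a clear error.

    Hypothetical helper: checks the environment variables named in the
    Environment Setup section above.
    """
    var = "OPENAI_API_KEY" if provider == "openai" else "ANTHROPIC_API_KEY"
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running Kagebunshin")
    return key
```

Call `preflight()` once at startup so a missing key fails fast with a readable message instead of a mid-run API error.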
Usage
Command Line Interface
# Run the agent (using uv)
uv run -m kagebunshin "Your task description"
# Run with interactive REPL mode
uv run -m kagebunshin --repl
# Reference a markdown file as the task
uv run -m kagebunshin -r @kagebunshin/config/prompts/useful_query_templates/literature_review.md
# Combine custom query with markdown file reference
uv run -m kagebunshin "Execute this task" -r @path/to/template.md
# Available query templates:
# - @kagebunshin/config/prompts/useful_query_templates/literature_review.md
# - @kagebunshin/config/prompts/useful_query_templates/E2E_testing.md
# Or if installed with pip
kagebunshin "Your task"
kagebunshin --repl
kagebunshin -r @path/to/file.md
Programmatic Usage
Simple API (Recommended)
The simplified Agent class provides comprehensive configuration without needing to edit settings files:
import asyncio
from kagebunshin import Agent

# Simplest usage - uses intelligent defaults
async def main():
    agent = Agent(task="Find me some desk toys")
    result = await agent.run()
    print(result)

asyncio.run(main())
With Custom LLM
import asyncio
from kagebunshin import Agent
from langchain_openai import ChatOpenAI  # ChatOpenAI now lives in langchain-openai

async def main():
    agent = Agent(
        task="Find repo stars and analyze trends",
        llm=ChatOpenAI(model="gpt-4o-mini", temperature=0)
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
Full Configuration Example
agent = Agent(
    task="Complex research with multiple steps",

    # LLM Configuration
    llm_model="gpt-5",                       # Model name
    llm_provider="openai",                   # "openai" or "anthropic"
    llm_reasoning_effort="high",             # "minimal", "low", "medium", "high"
    llm_temperature=0.1,                     # Temperature (0.0-2.0)

    # Summarizer Configuration
    summarizer_model="gpt-5-nano",           # Cheaper model for summaries
    enable_summarization=True,               # Enable action summaries

    # Browser Configuration
    headless=False,                          # Visible browser
    viewport_width=1280,                     # Browser viewport width
    viewport_height=1280,                    # Browser viewport height
    browser_executable_path="/path/chrome",  # Custom browser
    user_data_dir="~/chrome-profile",        # Persistent profile

    # Workflow Configuration
    recursion_limit=200,                     # Max recursion depth
    max_iterations=150,                      # Max iterations
    timeout=120,                             # Timeout per operation

    # Multi-agent Configuration
    group_room="research_team",              # Group chat room
    username="lead_researcher",              # Agent name
)
result = await agent.run()
Available Parameters
LLM Configuration:
- llm: Pre-configured LLM instance (optional)
- llm_model: Model name (default: "gpt-5-mini")
- llm_provider: "openai" or "anthropic" (default: "openai")
- llm_reasoning_effort: "minimal", "low", "medium", "high" (default: "low")
- llm_temperature: Temperature 0.0-2.0 (default: 1.0)
Summarizer Configuration:
- summarizer_model: Model for summaries (default: "gpt-5-nano")
- summarizer_provider: Provider for summarizer (default: "openai")
- enable_summarization: Enable action summaries (default: False)
Browser Configuration:
- headless: Run in headless mode (default: False)
- viewport_width: Browser width (default: 1280)
- viewport_height: Browser height (default: 1280)
- browser_executable_path: Custom browser path (default: auto-detect)
- user_data_dir: Persistent profile directory (default: temporary)
Workflow Configuration:
- recursion_limit: Max recursion depth (default: 150)
- max_iterations: Max iterations per task (default: 100)
- timeout: Timeout per operation in seconds (default: 60)
Multi-agent Configuration:
- group_room: Group chat room name (default: "lobby")
- username: Agent name (default: auto-generated)
Advanced API
For more control over the browser lifecycle, use the lower-level KageBunshinAgent:
import asyncio
from kagebunshin import KageBunshinAgent
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        orchestrator = await KageBunshinAgent.create(context)
        async for chunk in orchestrator.astream("Your task"):
            print(chunk)
        await browser.close()

asyncio.run(main())
BrowseComp eval
Evaluate Kagebunshin on OpenAI's BrowseComp benchmark.
Prereqs:
- Ensure Playwright browsers are installed (see Installation). If using Chromium: uv run playwright install chromium.
- Set OPENAI_API_KEY for the grader model.
Quick start (uv):
uv run -m evals.run_browsercomp --headless --num-examples 20 --grader-model gpt-5 --grader-provider openai
Quick start (pip):
python -m evals.run_browsercomp --headless --num-examples 20 --grader-model gpt-5 --grader-provider openai
Options:
- --num-examples N: sample N problems from the test set. When provided, --n-repeats must remain 1.
- --n-repeats N: repeat each example N times (only when running the full set).
- --headless: run the browser without a visible window.
- --browser {chromium,chrome}: choose Playwright Chromium or your local Chrome.
- --grader-model, --grader-provider: LLM used for grading (default gpt-5 on openai).
- --report PATH: path to save the HTML report (defaults to runs/browsecomp-report-<timestamp>.html).
Output:
- Prints aggregate metrics (e.g., accuracy) to stdout.
- Saves a standalone HTML report with prompts, responses, and per-sample scores.
Configuration
Edit kagebunshin/config/settings.py to customize:
- LLM Settings: Model/provider, temperature, reasoning effort
- Browser Settings: Executable path, user data directory, permissions
- Stealth Features: Fingerprint profiles, human behavior simulation
- Group Chat: Redis connection settings for agent coordination
- Performance: Concurrency limits, timeouts, delays
Development
Setting up for development
git clone https://github.com/SiwooBae/kagebunshin.git
cd kagebunshin
uv sync --all-extras
uv run playwright install chromium
Code Quality
The project includes tools for maintaining code quality:
# Format code
uv run black .
uv run isort .
# Lint code
uv run flake8 kagebunshin/
# Type checking
uv run mypy kagebunshin/
Testing
Kagebunshin includes a comprehensive unit test suite following TDD (Test-Driven Development) principles:
# Run all tests
uv run pytest
# Run tests with verbose output
uv run pytest -v
# Run specific test module
uv run pytest tests/core/test_agent.py
# Run tests with coverage report
uv run pytest --cov=kagebunshin
# Run tests in watch mode (requires pytest-watch)
ptw -- --testmon
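As a flavor of the TDD style the suite follows, a unit test for a small pure helper might look like this. Both the helper and the tests are hypothetical stand-ins, not members of the actual suite:

```python
# Hypothetical example in the suite's TDD style; not an actual Kagebunshin test.

def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())

def test_normalize_whitespace_collapses_runs():
    assert normalize_whitespace("  hello \n  world ") == "hello world"

def test_normalize_whitespace_empty_input():
    assert normalize_whitespace("") == ""
```

Pytest discovers any `test_*` function automatically, so tests like these run with a plain `uv run pytest`.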
Test Structure
The test suite covers all major components with 155 comprehensive tests:
tests/
├── conftest.py                # Shared fixtures and test configuration
├── core/                      # Core functionality tests (63 tests)
│   ├── test_agent.py          # KageBunshinAgent initialization & workflow (15 tests)
│   ├── test_state.py          # State models and validation (14 tests)
│   └── test_state_manager.py  # Browser operations & page management (34 tests)
├── tools/                     # Agent tools tests (11 tests)
│   └── test_delegation.py     # Shadow clone delegation system
├── communication/             # Group chat tests (17 tests)
│   └── test_group_chat.py     # Redis-based communication
├── utils/                     # Utility function tests (35 tests)
│   ├── test_formatting.py     # Text/HTML formatting & normalization (27 tests)
│   └── test_naming.py         # Agent name generation (8 tests)
└── automation/                # Browser automation tests (29 tests)
    └── test_behavior.py       # Human behavior simulation

# Configuration files (in project root):
pytest.ini                     # Pytest configuration with asyncio support
Project Structure
Kagebunshin features a clean, modular architecture optimized for readability and extensibility:
kagebunshin/
├── core/                      # Core agent functionality
│   ├── agent.py               # Main KageBunshinAgent orchestrator
│   ├── state.py               # State models and data structures
│   └── state_manager.py       # Browser state operations
│
├── automation/                # Browser automation & stealth
│   ├── behavior.py            # Human behavior simulation
│   ├── fingerprinting.py      # Browser fingerprint evasion
│   └── browser/               # Browser-specific utilities
│
├── tools/                     # Agent tools & capabilities
│   └── delegation.py          # Agent cloning and delegation
│
├── communication/             # Agent coordination
│   └── group_chat.py          # Redis-based group chat
│
├── cli/                       # Command-line interface
│   ├── runner.py              # CLI runner and REPL
│   └── ui/                    # Future UI components
│
├── config/                    # Configuration management
│   ├── settings.py            # All configuration settings
│   └── prompts/               # System prompts and query templates
│       ├── kagebunshin_system_prompt.md     # Main system prompt
│       ├── kagebunshin_system_prompt_v2.md  # Alternative system prompt
│       ├── tell_the_cur_state.md            # State description prompt
│       └── useful_query_templates/          # Pre-built query templates
│           ├── literature_review.md         # Academic literature review
│           └── E2E_testing.md               # End-to-end testing
│
└── utils/                     # Shared utilities
    ├── formatting.py          # HTML/text formatting for LLM
    ├── logging.py             # Logging utilities
    └── naming.py              # Agent name generation
Key Components
- Core Agent: Orchestrates web automation tasks using LangGraph
- Automation: Human-like behavior simulation and stealth browsing
- Tools: Agent delegation system for parallel task execution
- Communication: Redis-based group chat for agent coordination
- CLI: Interactive command-line interface with streaming updates
Contributing
We welcome contributions! Please read CONTRIBUTING.md for guidelines on how to contribute to this project.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with LangGraph for agent orchestration
- Uses Playwright for browser automation
- Inspired by the need for cost-effective parallel web automation
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file kagebunshin-0.1.5.tar.gz.
File metadata
- Download URL: kagebunshin-0.1.5.tar.gz
- Upload date:
- Size: 337.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 420520d124eb9c23a5bc94084928e7168b65ef8af9e2ddc965e57875425c9ba1 |
| MD5 | 6791a30673bf24aff053486809a8e71a |
| BLAKE2b-256 | b833fc81a39905767fe0c45870c3853e40714fa154409dcfee823065fb789aaf |
File details
Details for the file kagebunshin-0.1.5-py3-none-any.whl.
File metadata
- Download URL: kagebunshin-0.1.5-py3-none-any.whl
- Upload date:
- Size: 162.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d37145c3db4d9b856d0c878e0902f079831b965799d2d7bb50a6445057faa8e7 |
| MD5 | e20ee67434ea4836012807aaaba3be1a |
| BLAKE2b-256 | 28c3c21889d50fe61630d5991b44bcc0c273f97f08000bb4f0bf5e2fc55e61c0 |