Let LLMs interact with websites through a simple interface
Project description
Short Example
pip install browser-use
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        task='Go to hackernews on show hn and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    await agent.run()

asyncio.run(main())
Demo
Prompt: Go to hackernews on show hn and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)
Prompt: Search the top 3 AI companies 2024 and find out what concrete hardware each is using for their model. (1x speed)
Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)
Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)
Local Setup
- Create a virtual environment and install dependencies:
# We recommend using uv
pip install .
- Add your API keys to the .env file:
cp .env.example .env
You can use any LLM supported by LangChain by adding the appropriate environment variables. See the LangChain models documentation for available options.
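For example, a minimal sketch of swapping in a different LangChain chat model (the task below is illustrative; it assumes ANTHROPIC_API_KEY is set in your .env):

from browser_use import Agent
from langchain_anthropic import ChatAnthropic

# Any LangChain chat model can be passed as `llm`.
# Assumes ANTHROPIC_API_KEY is present in the environment (e.g. via .env).
agent = Agent(
    task='Summarize the top stories on hackernews.',  # illustrative task
    llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
)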
Features
- Universal LLM Support - Works with any Language Model
- Interactive Element Detection - Automatically finds interactive elements
- Multi-Tab Management - Seamless handling of browser tabs
- XPath Extraction for scraping functions - No more manual DevTools inspection
- Vision Model Support - Process visual page information
- Customizable Actions - Add your own browser interactions (e.g. add data to a database which the LLM can use)
- Handles dynamic content - don't worry about cookies or changing content
- Chain-of-thought prompting with memory - Solve long-term tasks
- Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
Advanced Examples
Chain of Agents
You can persist the browser across multiple agents and chain them together.
import asyncio

from langchain_anthropic import ChatAnthropic
from browser_use import Agent, Controller

# A shared Controller persists the browser state across agents
controller = Controller()

async def main():
    # The first agent opens the tabs
    agent1 = Agent(
        task='Open 5 VC websites in the New York area.',
        llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
        controller=controller,
    )
    # The second agent works with the tabs the first one opened
    agent2 = Agent(
        task='Give me the names of the founders of the companies in all tabs.',
        llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
        controller=controller,
    )

    await agent1.run()
    founders, history = await agent2.run()
    print(founders)

asyncio.run(main())
You can use the history to run the agents again deterministically.
Command Line Usage
Run examples directly from the command line:
python examples/try.py "Your query here" --provider [openai|anthropic]
Anthropic
You need to add ANTHROPIC_API_KEY to your environment variables. Example usage:
python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic
OpenAI
You need to add OPENAI_API_KEY to your environment variables. Example usage:
python examples/try.py "Go to hackernews on show hn and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai
🤖 Supported Models
All LangChain chat models are supported. Tested with:
- GPT-4o
- GPT-4o Mini
- Claude 3.5 Sonnet
- Llama 3.1 405B
Limitations
- When extracting page content, the message gets longer and the LLM responds more slowly.
- Currently a single agent run costs about $0.01.
- Sometimes the agent repeats the same task over and over again.
- Some elements you want to interact with might not be extracted.
- What should we focus on most?
- Robustness
- Speed
- Cost reduction
Roadmap
- Save agent actions and execute them deterministically
- Pydantic forced output
- Third party SERP API for faster Google Search results
- Multi-step action execution to increase speed
- Test on mind2web dataset
- Add more browser actions
Contributing
Contributions are welcome! Feel free to open issues for bugs or feature requests, and join the Discord for discussions and support.
Made with ❤️ by the Browser-Use team
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
File details
Details for the file browser_use-0.1.0.tar.gz.
File metadata
- Download URL: browser_use-0.1.0.tar.gz
- Upload date:
- Size: 23.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 49761da1adfafe1b301f585aecf7e072b8485d4abce530d9d9bec642a8311b05
MD5 | 62cc1ad1044ae1ddbc071f8d2bf459dc
BLAKE2b-256 | e93d246ad57b18780305522e79014a09baf342a085b308d060e32e9bae38fc20
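If you want to check a downloaded file against the SHA256 digest above, here is a minimal sketch using Python's standard hashlib (the local path is assumed to be the file as downloaded):

import hashlib

# Compare a downloaded sdist against the SHA256 digest published above.
expected = '49761da1adfafe1b301f585aecf7e072b8485d4abce530d9d9bec642a8311b05'

with open('browser_use-0.1.0.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print('OK' if digest == expected else 'MISMATCH')

The same check works for the wheel below with its own digest.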
File details
Details for the file browser_use-0.1.0-py3-none-any.whl.
File metadata
- Download URL: browser_use-0.1.0-py3-none-any.whl
- Upload date:
- Size: 24.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | d9839f0e22cdb00cfdc0962bc6ea03d4b2c51ef00c9021b490ac95cab756f06d
MD5 | aac7eb3d799170cdf62aacceb06d3808
BLAKE2b-256 | 4888c81d99ccf2b922cb6fe835c6fb4841eb85e007aaf90e55e0b73bfb838560