Let LLMs interact with websites through a simple interface

🌐 Browser-Use

Open-Source Web Automation with LLMs

GitHub stars License: MIT Python 3.11+ Discord

Short Example

pip install browser-use

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    agent = Agent(
        task='Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    # agent.run() is awaited, so it has to be called from inside a coroutine
    await agent.run()


asyncio.run(main())

Demo

Prompt: Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)

Prompt: Search the top 3 AI companies 2024 and find out what concrete hardware each is using for their model. (1x speed)

Kayak flight search demo

Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)

Photos search demo

Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)

Local Setup

  1. Create a virtual environment and install dependencies:
# I recommend using uv
pip install .
  2. Add your API keys to the .env file:
cp .env.example .env

You can use any LLM supported by LangChain by setting the appropriate environment variables. See the LangChain models documentation for available options.
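As a rough sketch of the environment-variable setup described above, the helper below checks whether the key for a chosen provider is set before an agent is created. The variable names OPENAI_API_KEY and ANTHROPIC_API_KEY are the ones this README mentions for the two providers; the `PROVIDER_ENV_VARS` mapping and the `missing_api_key` function are hypothetical names used only for illustration and are not part of browser-use.

```python
import os

# Hypothetical mapping for illustration; the variable names come from this README.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_api_key(provider, environ=None):
    """Return the name of the unset variable for a provider, or None if it is set."""
    environ = os.environ if environ is None else environ
    try:
        var_name = PROVIDER_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    # An empty string counts as missing, the same as an absent variable.
    return None if environ.get(var_name) else var_name
```

Note that this only inspects the process environment. The values in the .env file still have to be loaded into it first, which this sketch assumes has already happened.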

Features

  • Universal LLM Support - Works with any Language Model
  • Interactive Element Detection - Automatically finds interactive elements
  • Multi-Tab Management - Seamless handling of browser tabs
  • XPath Extraction for scraping functions - No more manual DevTools inspection
  • Vision Model Support - Process visual page information
  • Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
  • Handles dynamic content - don't worry about cookies or changing content
  • Chain-of-thought prompting with memory - Solve long-term tasks
  • Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
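To make the self-correcting idea in the list above more concrete, here is a minimal, generic sketch of a retry loop that feeds the previous error back into the next attempt. It is not browser-use's implementation: `run_with_self_correction`, `fake_llm`, and `fake_browser` are hypothetical stand-ins, and a real agent would call a language model and a browser instead.

```python
def run_with_self_correction(propose_action, execute, max_attempts=3):
    """Ask for an action, run it, and pass any error back into the next attempt."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        action = propose_action(feedback)
        try:
            return execute(action)
        except Exception as error:
            # Keep the failure so the next proposal can take it into account.
            feedback = f"attempt {attempt} with {action!r} failed: {error}"
    raise RuntimeError(f"gave up after {max_attempts} attempts; {feedback}")


def fake_llm(feedback):
    # Stand-in for an LLM call: it fixes the selector once it sees an error.
    return "click #submit" if feedback else "click #sumbit"


def fake_browser(action):
    # Stand-in for a browser: only one action succeeds.
    if action != "click #submit":
        raise ValueError("element not found")
    return "clicked"


result = run_with_self_correction(fake_llm, fake_browser)
```

The first attempt fails on the misspelled selector, and the second attempt succeeds because the error message was passed back to the proposer.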

Advanced Examples

Chain of Agents

You can persist the browser across multiple agents and chain them together.

import asyncio

from langchain_anthropic import ChatAnthropic
from browser_use import Agent, Controller


async def main():
    # Persist browser state across agents
    controller = Controller()

    # Both agents use the same model settings
    llm = ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3)

    # Initialize the browser agents with the shared controller
    agent1 = Agent(
        task='Open 5 VC websites in the New York area.',
        llm=llm,
        controller=controller,
    )
    agent2 = Agent(
        task='Give me the names of the founders of the companies in all tabs.',
        llm=llm,
        controller=controller,
    )

    # agent2 continues in the browser tabs that agent1 left open
    await agent1.run()
    founders, history = await agent2.run()

    print(founders)


asyncio.run(main())

You can use the history to run the agents again deterministically.
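One way to picture deterministic re-running is as a record-and-replay loop: each step is stored as an action name plus parameters and later dispatched in the same order without asking the LLM again. The sketch below assumes a made-up JSON history format and made-up handler names (`open_url`, `click`); the real structure of the history returned by browser-use is not shown in this README and may differ.

```python
import json


def replay(history_json, handlers):
    """Run each recorded step in order by dispatching to a handler by name."""
    results = []
    for step in json.loads(history_json):
        handler = handlers[step["action"]]
        results.append(handler(**step["params"]))
    return results


# Hypothetical recorded history; the real browser-use format may look different.
recorded = json.dumps([
    {"action": "open_url", "params": {"url": "https://example.com"}},
    {"action": "click", "params": {"selector": "#founders"}},
])

# Stand-in handlers that only describe what a browser would do.
handlers = {
    "open_url": lambda url: f"opened {url}",
    "click": lambda selector: f"clicked {selector}",
}

replayed = replay(recorded, handlers)
```

Because no model call happens during replay, the same history always produces the same sequence of handler calls.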

Command Line Usage

Run examples directly from the command line:

python examples/try.py "Your query here" --provider [openai|anthropic]

Anthropic

You need to add ANTHROPIC_API_KEY to your environment variables. Example usage:

python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic

OpenAI

You need to add OPENAI_API_KEY to your environment variables. Example usage:

python examples/try.py "Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai
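The command-line shape used above (one quoted query plus a --provider choice) can be parsed with the standard library's argparse. The following is only a sketch of how such a script might read its arguments. It is not the contents of examples/try.py, and the default provider is an assumption, since the README does not state one.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run a browser agent on a query.")
    parser.add_argument("query", help="Task for the agent, in plain language")
    parser.add_argument(
        "--provider",
        choices=["openai", "anthropic"],
        default="openai",  # assumed default; the README does not state one
        help="Which LLM provider to use",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["Your query here", "--provider", "anthropic"])
```

Restricting --provider with `choices` makes argparse reject a misspelled provider name before any API key is looked up.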

🤖 Supported Models

All LangChain chat models are supported. Tested with:

  • GPT-4o
  • GPT-4o Mini
  • Claude 3.5 Sonnet
  • Llama 3.1 405B

Limitations

  • When extracting page content, the message length increases and the LLM gets slower.
  • Currently, one agent run costs about $0.01.
  • Sometimes the agent tries to repeat the same task over and over again.
  • Some elements you want to interact with might not be extracted.
  • What should we focus on the most?
    • Robustness
    • Speed
    • Cost reduction

Roadmap

  • Save agent actions and execute them deterministically
  • Pydantic forced output
  • Third party SERP API for faster Google Search results
  • Multi-step action execution to increase speed
  • Test on mind2web dataset
  • Add more browser actions

Contributing

Contributions are welcome! Feel free to open issues for bugs or feature requests.

Feel free to join the Discord for discussions and support.


Star ⭐ this repo if you find it useful!
Made with ❤️ by the Browser-Use team
