
Make websites accessible for AI agents


Browser Use is the easiest way to connect your AI agents with the browser. If you have used Browser Use in a project, feel free to show it off in our Discord.

Quick start

With pip:

pip install browser-use

(optional) install playwright:

playwright install

Spin up your agent:

from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio

async def main():
    agent = Agent(
        task="Find a one-way flight from Bali to Oman on 12 January 2025 on Google Flights. Return me the cheapest option.",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())

And don't forget to add your API keys to your .env file.

OPENAI_API_KEY=
ANTHROPIC_API_KEY=
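
If you'd rather load those keys at runtime than export them in your shell, python-dotenv's load_dotenv() is the usual choice. For illustration, a minimal hand-rolled loader (the load_env helper below is hypothetical, not part of browser-use) behaves roughly like this:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value per line; blank lines and # comments ignored."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the real environment.
            os.environ.setdefault(key.strip(), value.strip())
```

In practice, `pip install python-dotenv` and call `load_dotenv()` before constructing the agent.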

Demos

Prompt: Read my CV & find ML jobs, save them to a file, and then start applying for them in new tabs, if you need help, ask me. (8x speed)

https://github.com/user-attachments/assets/171fb4d6-0355-46f2-863e-edb04a828d04

Prompt: Find flights on kayak.com from Zurich to Beijing from 25.12.2024 to 02.02.2025. (8x speed)


Prompt: Solve the captcha. (2x speed)

Prompt: Look up models with a license of cc-by-sa-4.0 and sort by most likes on Hugging Face, save top 5 to file. (1x speed)

https://github.com/user-attachments/assets/de73ee39-432c-4b97-b4e8-939fd7f323b3

Features ⭐

  • Vision + HTML extraction
  • Automatic multi-tab management
  • Extract clicked elements' XPaths and repeat exact LLM actions
  • Add custom actions (e.g. save to file, push to database, notify me, get human input)
  • Self-correcting
  • Use any LLM supported by LangChain (e.g. GPT-4o, GPT-4o mini, Claude 3.5 Sonnet, Llama 3.1 405B, etc.)
  • Parallelize as many agents as you want

Register custom actions

If you want to add custom actions your agent can take, you can register them like this:

You can use both sync and async functions.

from browser_use.agent.service import Agent
from browser_use.browser.service import Browser
from browser_use.controller.service import Controller

# Initialize controller first
controller = Controller()

@controller.action('Ask user for information')
def ask_human(question: str, display_question: bool) -> str:
    return input(f'\n{question}\nInput: ')
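
Sync and async actions can coexist because the caller can check, at dispatch time, whether it was handed a coroutine function. A minimal sketch of that pattern in plain Python (a generic illustration, not browser-use's actual internals):

```python
import asyncio
import inspect

async def dispatch(action, *args):
    """Call an action, awaiting it only if it is a coroutine function."""
    if inspect.iscoroutinefunction(action):
        return await action(*args)
    return action(*args)

def greet(name: str) -> str:             # a sync action
    return f"Hello, {name}"

async def fetch_note(name: str) -> str:  # an async action
    await asyncio.sleep(0)
    return f"Note for {name}"

print(asyncio.run(dispatch(greet, "Ada")))       # Hello, Ada
print(asyncio.run(dispatch(fetch_note, "Ada")))  # Note for Ada
```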

Or define your parameters using Pydantic:

from typing import Optional

from pydantic import BaseModel

class JobDetails(BaseModel):
    title: str
    company: str
    job_link: str
    salary: Optional[str] = None

@controller.action('Save job details which you found on page', param_model=JobDetails, requires_browser=True)
async def save_job(params: JobDetails, browser: Browser):
    print(params)

    # use the browser normally
    page = browser.get_current_page()
    page.go_to(params.job_link)

and then run your agent:

from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model_name='claude-3-5-sonnet-20240620', timeout=25, stop=None, temperature=0.3)
agent = Agent(task=task, llm=model, controller=controller)

await agent.run()

Parallelize agents

In 99% of cases you should use one Browser instance and parallelize the agents with one context per agent. You can also reuse the context after an agent finishes.

browser = Browser()
for i in range(10):
    # This creates a new context and automatically closes it after the agent finishes (via `__aexit__`)
    async with browser.new_context() as context:
        agent = Agent(task=f"Task {i}", llm=model, browser_context=context)
        await agent.run()

        # ... reuse context

If you would like to learn more about how this works under the hood, see the Playwright browser-context documentation.
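
To actually run the agents concurrently rather than one after another, the usual tool is asyncio.gather. The sketch below uses a placeholder run_task coroutine (hypothetical) where the context-plus-agent block above would go:

```python
import asyncio

async def run_task(i: int) -> str:
    # Placeholder for: async with browser.new_context() as context: await Agent(...).run()
    await asyncio.sleep(0)
    return f"Task {i} done"

async def run_all(n: int) -> list[str]:
    # Schedule all tasks at once; results come back in submission order.
    return await asyncio.gather(*(run_task(i) for i in range(n)))

print(asyncio.run(run_all(3)))  # ['Task 0 done', 'Task 1 done', 'Task 2 done']
```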

Context vs Browser

If you don't specify a browser or browser_context, the agent will create a new browser instance and context.

Get XPath history

To get the entire history of everything the agent has done, you can use the output of the run method:

history: list[AgentHistory] = await agent.run()

print(history)

Browser configuration

You can configure the browser using the BrowserConfig and BrowserContextConfig classes.

The most important options are:

  • headless: Whether to run the browser in headless mode
  • keep_open: Whether to keep the browser open after the script finishes
  • disable_security: Whether to disable browser security features (useful when dealing with cross-origin requests, e.g. iframes)
  • cookies_file: Path to a cookies file for persistence
  • minimum_wait_page_load_time: Minimum time to wait before getting the page state for the LLM input
  • wait_for_network_idle_page_load_time: Time to wait for network requests to finish before getting the page state
  • maximum_wait_page_load_time: Maximum time to wait for the page to load before proceeding anyway
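
As a sketch of how these options group: the option names below come from the list above, but the numeric values are illustrative choices of mine, and the constructor wiring in the trailing comment is an assumption to verify against your installed version's exports.

```python
# Browser-level options (see BrowserConfig).
browser_options = dict(
    headless=True,          # no visible browser window
    keep_open=False,        # close the browser when the script exits
    disable_security=True,  # helps with cross-origin iframes
)

# Context-level options (see BrowserContextConfig).
context_options = dict(
    cookies_file='cookies.json',           # hypothetical path for cookie persistence
    minimum_wait_page_load_time=0.5,       # seconds before capturing page state
    wait_for_network_idle_page_load_time=1.0,
    maximum_wait_page_load_time=5.0,       # proceed anyway after this long
)

# e.g. Browser(config=BrowserConfig(**browser_options)) and
# browser.new_context(config=BrowserContextConfig(**context_options));
# check the exact constructors and import paths for your version.
```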

More examples

For more examples see the examples folder or join the Discord and show off your project.

Telemetry

We collect anonymous usage data to help us understand how the library is being used and to identify potential issues. There is no privacy risk, as no personal information is collected. We collect data with PostHog.

You can opt out of telemetry by setting the ANONYMIZED_TELEMETRY=false environment variable.
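
Besides exporting the variable in your shell, you can set it from Python, provided you do so before browser_use is imported (environment variables like this are typically read at import time):

```python
import os

# Opt out of anonymized telemetry; set this before importing browser_use.
os.environ["ANONYMIZED_TELEMETRY"] = "false"
```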

Contributing

Contributions are welcome! Feel free to open issues for bugs or feature requests.

Local Setup

  1. Create a virtual environment and install dependencies:

# To install all dependencies including dev
pip install ."[dev]"

  2. Add your API keys to the .env file:

cp .env.example .env

or copy the following to your .env file:

OPENAI_API_KEY=
ANTHROPIC_API_KEY=

You can use any LLM model supported by LangChain by adding the appropriate environment variables. See langchain models for available options.

Building the package

hatch build

Feel free to join the Discord for discussions and support.


Star ⭐ this repo if you find it useful!
Made with ❤️ by the Browser-Use team
