
A library for automating web tasks


Web-operator

OpenAI recently released Operator, an agent that can browse the web and perform actions based on prompts. My approach differs from such agents, and from the browser-use library, in that it has the potential to automate all sorts of computer tasks, not just web browsing. As part of this work, I built a basic library called web-operator that performs simple computer tasks.

Disclaimer: This is a starting point, and handling more complex scenarios will require further development.

A specific use case I envision is web search. For example, when we need to search the web on a specific topic, we often scrape results with Selenium or Playwright. However, these scrapers frequently get blocked, especially by Google sites. If the library can search the way a human does, these sites may not detect the scraping.

Disclaimer: Currently, the library can navigate sites based on instructions, but data scraping is not yet implemented.

Design

[Figure: Architecture of the library]

At the heart of this library is the Qwen2.5-VL-7B-Instruct model, a 7-billion-parameter vision-language model that understands both images and text. It goes beyond simple object detection by analyzing the structure and content of images, including text, charts, and layouts. This makes it well suited to automating computer tasks, where understanding the visual context of a screen or interface is crucial.
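The default config (shown in the config section) lists two models: a Gemini model for step creation and Qwen2.5-VL for computer use. The library's internals are not described here, so the following is only a hypothetical sketch of how such a plan-then-act loop could be structured. run_task, StepResult, fake_plan, and fake_act are illustrative names, not web-operator's API; the stubs stand in for the real model calls so the sketch runs without a GPU or API key.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical planner/executor split; NOT web-operator's actual API.
Planner = Callable[[str], List[str]]      # e.g. a text model that returns steps
Executor = Callable[[str, bytes], bool]   # e.g. a vision model acting on a screenshot


@dataclass
class StepResult:
    step: str
    ok: bool


def run_task(query: str, plan: Planner, act: Executor,
             screenshot: Callable[[], bytes]) -> List[StepResult]:
    """Plan once with the text model, then execute each step against the current screen."""
    results: List[StepResult] = []
    for step in plan(query):
        ok = act(step, screenshot())  # the vision model sees a fresh screenshot per step
        results.append(StepResult(step, ok))
        if not ok:                    # stop at the first failed step
            break
    return results


# Stub "models" so the sketch runs offline.
def fake_plan(query: str) -> List[str]:
    return [line.strip() for line in query.splitlines() if line.strip()]


def fake_act(step: str, image: bytes) -> bool:
    return "fail" not in step


if __name__ == "__main__":
    out = run_task("open firefox\nclick the address bar", fake_plan, fake_act, lambda: b"")
    print([r.ok for r in out])  # [True, True]
```

Planning once up front keeps the expensive text-model call out of the per-step loop; a real agent might instead re-plan after a failed step.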

Requirements

Running the Hugging Face model locally requires an NVIDIA GPU with at least 12 GB of VRAM. A Gemini API key is also required.
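To confirm the GPU requirement before installing, a minimal stdlib sketch follows. It assumes the NVIDIA driver's nvidia-smi tool is on PATH; the helper names and the 12,000 MiB threshold are illustrative choices, not part of web-operator.

```python
import shutil
import subprocess
from typing import List, Optional

# A "12 GB" card may report slightly less than 12 * 1024 MiB, so use a loose floor.
MIN_VRAM_MIB = 12_000


def parse_vram_mib(nvidia_smi_output: str) -> List[int]:
    """Parse 'memory.total' output from nvidia-smi (csv, noheader, nounits): one MiB value per GPU."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_vram_mib() -> Optional[List[int]]:
    """Return total VRAM per GPU in MiB, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_mib(out)


def has_enough_vram(totals_mib: List[int], minimum: int = MIN_VRAM_MIB) -> bool:
    """True if at least one GPU meets the minimum."""
    return any(t >= minimum for t in totals_mib)


if __name__ == "__main__":
    totals = query_vram_mib()
    if totals is None:
        print("nvidia-smi not found; no NVIDIA GPU detected")
    else:
        print("GPU VRAM (MiB):", totals, "OK" if has_enough_vram(totals) else "below ~12 GB")
```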

Installation

  1. Set up a conda environment with Python 3.12

  2. Web-operator and other software installation

    conda config --add channels pytorch
    conda config --add channels conda-forge
    conda config --add channels nvidia
    
    python -m pip install --upgrade web-operator
    python -m pip install git+https://github.com/huggingface/transformers@f3f6c86582611976e72be054675e2bf0abb5f775
    
    
    
  3. Environment Setup

    This guide explains how to set up and manage environment variables for your project using python-dotenv.

    a. Install the python-dotenv library using pip:

    pip install python-dotenv
    

    b. Create a .env file in your project's root directory with the following structure:

    GEMINI_API_KEY=your_gemini_api_key
    

    c. Add .env to your .gitignore file to avoid accidentally committing sensitive information.

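    d. Load the environment variables in your code:

    from dotenv import load_dotenv
    import os

    load_dotenv()  # load variables from the .env file into os.environ
    gemini_api_key = os.getenv("GEMINI_API_KEY")  # read back to confirm it loaded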
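Once the steps above are done, a quick preflight check can catch common setup mistakes before the model is loaded. This is a hypothetical stdlib-only sketch: the preflight and installed_version names are illustrative, and the Python 3.12 check only reflects the version used in this guide (other versions may work but are untested here).

```python
import os
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def preflight(required_env: str = "GEMINI_API_KEY") -> List[str]:
    """Collect setup problems instead of failing on the first one."""
    problems: List[str] = []
    if sys.version_info[:2] != (3, 12):
        problems.append(
            f"guide uses Python 3.12, running {sys.version_info.major}.{sys.version_info.minor}"
        )
    for dist in ("web-operator", "transformers", "python-dotenv"):
        if installed_version(dist) is None:
            problems.append(f"{dist} is not installed")
    if not os.environ.get(required_env):
        problems.append(f"{required_env} is not set (did load_dotenv() find your .env?)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "environment looks ready")
```

Call it after load_dotenv() so that variables from .env are already in os.environ.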
    

How to change the basic config

  1. Print the current config of a Supervisor instance (created as in the Example section):
print(supervisor.config)

# Typical output
{'debug': False, 'step_creation_model': 'gemini-2.0-pro-exp-02-05', 'computer_use_model': 'Qwen/Qwen2.5-VL-7B-Instruct'}
  2. Modify a setting:
supervisor.config["debug"] = True
print(supervisor.config)

# Typical output
{'debug': True, 'step_creation_model': 'gemini-2.0-pro-exp-02-05', 'computer_use_model': 'Qwen/Qwen2.5-VL-7B-Instruct'}
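supervisor.config appears to be a plain dict, so a misspelled key (for example "degub") would likely be added silently rather than rejected. A hypothetical helper that guards against this is sketched below; update_config and KNOWN_KEYS are illustrative, and the key set is taken only from the typical output above, so the real Supervisor may accept other keys.

```python
from typing import Any, Dict

# Keys seen in the typical output above; the real Supervisor may accept others.
KNOWN_KEYS = {"debug", "step_creation_model", "computer_use_model"}


def update_config(config: Dict[str, Any], **changes: Any) -> Dict[str, Any]:
    """Apply changes in place, rejecting unknown keys so a typo fails loudly."""
    unknown = set(changes) - KNOWN_KEYS
    if unknown:
        raise KeyError(f"unknown config key(s): {sorted(unknown)}")
    config.update(changes)
    return config


if __name__ == "__main__":
    cfg = {"debug": False,
           "step_creation_model": "gemini-2.0-pro-exp-02-05",
           "computer_use_model": "Qwen/Qwen2.5-VL-7B-Instruct"}
    print(update_config(cfg, debug=True)["debug"])  # True
```

With a real instance this would be update_config(supervisor.config, debug=True), called before supervisor.configure().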

Example

from web_operator.supervisor import Supervisor
from dotenv import load_dotenv

load_dotenv()  

query = """
action 1: open the firefox web browser
action 2: click on the address bar
action 3: type scholar.google.com 
action 4: press enter for search
action 5: type openai in the search box of google scholar
action 6: press enter for search
action 7: close the browser
"""
user_query = query

supervisor = Supervisor()

supervisor.config["debug"] = True

# Change the config before calling configure().
supervisor.configure()
supervisor.run(user_query)
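The example query is a numbered list of "action N: ..." lines. When building queries programmatically, a small helper keeps the numbering consistent. build_query is a hypothetical name, not part of web-operator, and this only mirrors the format used in the example; the library may not require it exactly.

```python
from typing import Iterable


def build_query(actions: Iterable[str]) -> str:
    """Format actions as 'action N: ...' lines, mirroring the example query."""
    lines = [f"action {i}: {text.strip()}" for i, text in enumerate(actions, start=1)]
    # Leading/trailing newlines match the triple-quoted string in the example.
    return "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    user_query = build_query([
        "open the firefox web browser",
        "click on the address bar",
        "close the browser",
    ])
    print(user_query)
    # With a configured Supervisor, as in the example: supervisor.run(user_query)
```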

Download files


Source Distribution

web_operator-1.0.1.tar.gz (12.2 kB)

Uploaded Source

Built Distribution


web_operator-1.0.1-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file web_operator-1.0.1.tar.gz.

File metadata

  • Download URL: web_operator-1.0.1.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for web_operator-1.0.1.tar.gz
Algorithm Hash digest
SHA256 d253ea14c813504c69b9b0f82e301a1e72a858b1192e6f4350b8bafc4209e727
MD5 3740db21b04a04ebd81d66b5f084024c
BLAKE2b-256 cf3b93caf21367d1e3365d1e8de971a8ba86225245b40fb3be72c83568ec1bb6
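To check a manually downloaded archive against the SHA256 digest listed above, a minimal sketch using only hashlib follows; sha256_of and verify are illustrative helper names. It assumes the file was saved in the current directory under its original name.

```python
import hashlib
from pathlib import Path

# SHA256 of web_operator-1.0.1.tar.gz as listed above.
EXPECTED_SDIST_SHA256 = "d253ea14c813504c69b9b0f82e301a1e72a858b1192e6f4350b8bafc4209e727"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("web_operator-1.0.1.tar.gz")
    if archive.exists():
        print("hash OK" if verify(archive, EXPECTED_SDIST_SHA256) else "hash MISMATCH")
```

pip can also enforce this itself in hash-checking mode, using --require-hashes with --hash=sha256:... entries in a requirements file.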


File details

Details for the file web_operator-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: web_operator-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 12.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for web_operator-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f05082b2b545c5ddc7f78b71a94755d5dfe4b00db8af2d65f7163f4e280695a3
MD5 8913817548c8e3916f3a985c239d6621
BLAKE2b-256 a98a8ef19e589286b220618dd9ac967c9af2ffe8c72c673ba0cf410a91dd3f85

