
Automate computer tasks in Python

Project description

🤖 AskUI Vision Agent

⚡ Automate computer tasks in Python ⚡

Join the AskUI Discord.

🔧 Setup

1. Install AskUI Agent OS

Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system.

Windows
AMD64

AskUI Installer for AMD64

ARM64

AskUI Installer for ARM64

Linux

⚠️ Warning: Agent OS does not currently work on Wayland. Switch to Xorg to use it.
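
To check which session type a Linux desktop is running before installing, one option is to read the XDG_SESSION_TYPE environment variable, which most login managers set to x11 or wayland. The following is a minimal sketch under that assumption: the helper name is_wayland_session is hypothetical and not part of AskUI, and the WAYLAND_DISPLAY fallback is only a heuristic.

```python
import os
from collections.abc import Mapping


def is_wayland_session(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristically report whether the desktop session uses Wayland.

    Hypothetical helper for illustration; not part of AskUI.
    """
    session_type = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session_type:
        return session_type == "wayland"
    # Fallback heuristic: Wayland compositors usually export WAYLAND_DISPLAY.
    return bool(env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if is_wayland_session():
        print("Wayland session detected: switch to Xorg before using Agent OS.")
    else:
        print("No Wayland session detected.")
```

The environment is passed in as a parameter so the check can be exercised with a plain dictionary instead of the real process environment.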

AMD64

curl -o /tmp/AskUI-Suite-24.9.1-User-Installer-Linux-x64-Full.run https://files.askui.com/releases/Installer/24.9.1/AskUI-Suite-24.9.1-User-Installer-Linux-x64-Full.run
bash /tmp/AskUI-Suite-24.9.1-User-Installer-Linux-x64-Full.run

ARM64

curl -o /tmp/AskUI-Suite-24.9.1-User-Installer-Linux-ARM64-Full.run https://files.askui.com/releases/Installer/24.9.1/AskUI-Suite-24.9.1-User-Installer-Linux-ARM64-Full.run
bash /tmp/AskUI-Suite-24.9.1-User-Installer-Linux-ARM64-Full.run

MacOS

curl -o /tmp/AskUI-Suite-24.9.1-User-Installer-MacOS-ARM64-Full.run https://files.askui.com/releases/Installer/24.9.1/AskUI-Suite-24.9.1-User-Installer-MacOS-ARM64-Full.run
bash /tmp/AskUI-Suite-24.9.1-User-Installer-MacOS-ARM64-Full.run

2. Install vision-agent in your Python environment

pip install askui

Note: Requires Python version >=3.10.
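
To confirm that the interpreter you are about to install into meets this requirement, a quick check along the following lines may help. This is a sketch using only the standard library; the helper name python_is_supported is hypothetical.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the note above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version tuple is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```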

3. Authenticate with an Automation Model Provider

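|                    | AskUI                                               | Anthropic                |
|--------------------|-----------------------------------------------------|--------------------------|
| ENV Variables      | ASKUI_WORKSPACE_ID, ASKUI_TOKEN                     | ANTHROPIC_API_KEY        |
| Supported Commands | click()                                             | click(), get(), act()    |
| Description        | Faster Inference, European Server, Enterprise Ready | Supports complex actions |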

To get started, set the environment variables required to authenticate with your chosen model provider.

How to set an environment variable?

Linux & MacOS

Use export to set an environment variable:

export ANTHROPIC_API_KEY=<your-api-key-here>

Windows PowerShell

Set an environment variable with $env:

$env:ANTHROPIC_API_KEY="<your-api-key-here>"
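
Before starting an agent, it can be useful to verify that the variables for at least one provider are actually set. The sketch below checks the variable names listed for each provider in this section. The helper configured_providers and the provider keys askui and anthropic are hypothetical names used for illustration, not part of the AskUI API.

```python
import os
from collections.abc import Mapping

# Environment variables required per provider, as listed in this section.
REQUIRED_ENV = {
    "askui": ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN"),
    "anthropic": ("ANTHROPIC_API_KEY",),
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose required variables are all set and non-empty."""
    return [
        provider
        for provider, names in REQUIRED_ENV.items()
        if all(env.get(name) for name in names)
    ]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```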

▶️ Start Building

from askui import VisionAgent

# Initialize your agent context manager
with VisionAgent() as agent:
    # Use the webbrowser tool to start browsing
    agent.tools.webbrowser.open_new("http://www.google.com")

    # Start to automate individual steps
    agent.click("url bar")
    agent.type("http://www.google.com")
    agent.keyboard("enter")

    # Extract information from the screen
    datetime = agent.get("What is the datetime at the top of the screen?")
    print(datetime)

    # Or let the agent work on its own
    agent.act("search for a flight from Berlin to Paris in January")

🎛️ Model Selection

Instead of relying on the default model for the entire automation script, you can specify a model for each click command using the model_name parameter.

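|         | AskUI                             | Anthropic                            |
|---------|-----------------------------------|--------------------------------------|
| click() | askui-combo, askui-pta, askui-ocr | anthropic-claude-3-5-sonnet-20241022 |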

Example: agent.click("Preview", model_name="askui-combo")
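
Because model_name is set per call, one possible pattern is to try several models in turn for the same target. The sketch below illustrates the idea with a hypothetical helper, click_with_fallback, that accepts any callable with the same shape as agent.click. It assumes that a failed click raises an exception, which this document does not confirm.

```python
from collections.abc import Callable, Sequence

# AskUI model names listed for click() in this section.
CLICK_MODELS = ("askui-combo", "askui-pta", "askui-ocr")


def click_with_fallback(
    click: Callable[..., object],
    target: str,
    model_names: Sequence[str] = CLICK_MODELS,
) -> str:
    """Try each model in order and return the name of the first that succeeds.

    Hypothetical helper; assumes click(target, model_name=...) raises on failure.
    """
    last_error = None
    for name in model_names:
        try:
            click(target, model_name=name)
            return name
        except Exception as error:  # assumption: failures surface as exceptions
            last_error = error
    raise RuntimeError(f"No model could click {target!r}") from last_error
```

Inside a with VisionAgent() as agent: block, this would be called as click_with_fallback(agent.click, "Preview").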

🛠️ Direct Tool Use

Under the hood, agents use a set of tools. You can also access these tools directly.

Agent OS

The controller for the operating system.

agent.tools.os.click("left", 2) # clicking
agent.tools.os.mouse(100, 100) # mouse movement
agent.tools.os.keyboard_tap("v", modifier_keys=["control"]) # Paste
# and many more

Web browser

The webbrowser tool, powered by Python's webbrowser module, lets you open web browsers in your environment directly.

agent.tools.webbrowser.open_new("http://www.google.com")
# also check out open and open_new_tab

Clipboard

The clipboard tool, powered by pyperclip, lets you interact with the clipboard.

agent.tools.clipboard.copy("...")
result = agent.tools.clipboard.paste()

📜 Logging

Want a better understanding of what your agent is doing? Set the log_level to DEBUG.

import logging

from askui import VisionAgent

with VisionAgent(log_level=logging.DEBUG) as agent:
    ...  # your automation steps
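
If you prefer to switch verbosity without editing the script, the level can be derived from a string. The following is a minimal sketch using only the standard logging module. The helper parse_log_level is hypothetical, as is the LOG_LEVEL variable name mentioned in the note after the code.

```python
import logging


def parse_log_level(name: str, default: int = logging.INFO) -> int:
    """Map a level name such as "debug" to its numeric logging level.

    Hypothetical helper; unknown names fall back to the default level.
    """
    level = getattr(logging, name.strip().upper(), None)
    return level if isinstance(level, int) else default


if __name__ == "__main__":
    print(parse_log_level("debug"))
```

The result could then be passed as log_level, for example VisionAgent(log_level=parse_log_level(os.environ.get("LOG_LEVEL", "info"))).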

🖥️ Multi-Monitor Support

You have multiple monitors? Choose which one to automate by setting display to 1 or 2.

from askui import VisionAgent

with VisionAgent(display=1) as agent:
    ...  # your automation steps

What is AskUI Vision Agent?

AskUI Vision Agent is a versatile AI-powered framework that enables you to automate computer tasks in Python.

It connects Agent OS with powerful computer use models like Anthropic's Claude Sonnet 3.5 v2 and the AskUI Prompt-to-Action series. It is your entry point for building complex automation scenarios: give the agent detailed instructions, or let it explore new challenges on its own.

Agent OS is a custom-built OS controller designed to enhance your automation experience.

It offers powerful features such as

  • multi-screen support,
  • support for all major operating systems (incl. Windows, MacOS and Linux),
  • process visualizations,
  • real Unicode character typing.

More features, such as application selection, background automation, and video streaming, are to be released soon.
