LLM-first vision toolkit for GUI grounding, OCR, and more

Lookit

LLM-first vision toolkit for GUI grounding, OCR, and more. Built with LangChain and Qwen3-VL. Outputs minimal plain text optimized for token efficiency.

Quick Start

Choose your setup method:

Method             Use Case
Skills Setup       Claude Code, DeepAgents, or other agent frameworks (recommended)
CLI Installation   Standalone command-line usage

Skills Setup

Skills are self-contained and auto-download the binary on first run. No Python or dependencies required.

Skill          Description
computer-use   GUI grounding for desktop screenshots
mobile-use     GUI grounding for mobile screenshots
ocr            Text extraction from screenshots

Prerequisites

Choose ONE backend option:

Option A: Ollama Cloud (recommended, no local setup)
  1. Create an Ollama account at ollama.com
  2. Go to ollama.com/settings/keys
  3. Click "Create new key" and copy the API key

You'll use these settings:

LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1

Option B: Ollama Local (requires local setup)
  1. Install Ollama: ollama.com/download
  2. Pull the model:
    ollama pull qwen3-vl
    
  3. Start Ollama (runs automatically after install, or run ollama serve)

You'll use these settings:

LOOKIT_API_KEY=ollama
LOOKIT_MODEL=qwen3-vl
LOOKIT_BASE_URL=http://localhost:11434/v1

Option C: LM Studio (local GUI app)
  1. Download LM Studio: lmstudio.ai
  2. Search and download a Qwen3-VL model (e.g., qwen/qwen3-vl-8b)
  3. Start the local server (Server tab → Start Server)

You'll use these settings (model name uses owner/model format):

LOOKIT_API_KEY=lmstudio
LOOKIT_MODEL=qwen/qwen3-vl-8b
LOOKIT_BASE_URL=http://127.0.0.1:1234/v1
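
The three backend options differ only in the values of the same three settings. As a minimal sketch, the presets listed above can be kept in one table and rendered into the KEY=value form shown. The names `BACKENDS` and `render_env` are hypothetical and used only for illustration; they are not part of lookit.

```python
# Hypothetical helper (not part of lookit): the three backend presets
# from the Prerequisites section, keyed by an illustrative short name.
BACKENDS = {
    "ollama-cloud": {
        "LOOKIT_API_KEY": "your-api-key-here",
        "LOOKIT_MODEL": "qwen3-vl:235b-cloud",
        "LOOKIT_BASE_URL": "https://ollama.com/v1",
    },
    "ollama-local": {
        "LOOKIT_API_KEY": "ollama",
        "LOOKIT_MODEL": "qwen3-vl",
        "LOOKIT_BASE_URL": "http://localhost:11434/v1",
    },
    "lmstudio": {
        "LOOKIT_API_KEY": "lmstudio",
        "LOOKIT_MODEL": "qwen/qwen3-vl-8b",
        "LOOKIT_BASE_URL": "http://127.0.0.1:1234/v1",
    },
}


def render_env(backend: str) -> str:
    """Return the settings for one backend as KEY=value lines."""
    try:
        settings = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"
```

Calling `render_env("ollama-local")` returns the three lines listed under Option B, ready to be written to a config file.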

Install Skills

Claude Code

Step 1: Download skills

# Set version to download
VERSION="0.1.1"

# Create skill directories
mkdir -p ~/.claude/skills/{computer-use,mobile-use,ocr}/{bin,config}

# Download computer-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/SKILL.md" -o ~/.claude/skills/computer-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/bin/lookit" -o ~/.claude/skills/computer-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/config/lookit.env.example" -o ~/.claude/skills/computer-use/config/lookit.env.example
chmod +x ~/.claude/skills/computer-use/bin/lookit

# Download mobile-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/SKILL.md" -o ~/.claude/skills/mobile-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/bin/lookit" -o ~/.claude/skills/mobile-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/config/lookit.env.example" -o ~/.claude/skills/mobile-use/config/lookit.env.example
chmod +x ~/.claude/skills/mobile-use/bin/lookit

# Download ocr skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/SKILL.md" -o ~/.claude/skills/ocr/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/bin/lookit" -o ~/.claude/skills/ocr/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/config/lookit.env.example" -o ~/.claude/skills/ocr/config/lookit.env.example
chmod +x ~/.claude/skills/ocr/bin/lookit

# Create config files from examples
cp ~/.claude/skills/computer-use/config/lookit.env.example ~/.claude/skills/computer-use/config/lookit.env
cp ~/.claude/skills/mobile-use/config/lookit.env.example ~/.claude/skills/mobile-use/config/lookit.env
cp ~/.claude/skills/ocr/config/lookit.env.example ~/.claude/skills/ocr/config/lookit.env
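
The download commands above repeat the same three files for each of the three skills. As a rough sketch, the URL and destination pairs can be generated in a loop. The function name `skill_downloads` is hypothetical; the URL layout is copied from the curl commands above, and nothing is downloaded here.

```python
from pathlib import PurePosixPath

# URL prefix and file layout copied from the curl commands above.
REPO_RAW = "https://raw.githubusercontent.com/atom2ueki/lookit"
SKILLS = ("computer-use", "mobile-use", "ocr")
FILES = ("SKILL.md", "bin/lookit", "config/lookit.env.example")


def skill_downloads(version: str, skills_root: str) -> list:
    """Return (url, destination) pairs for every file of every skill."""
    pairs = []
    for skill in SKILLS:
        for rel in FILES:
            url = f"{REPO_RAW}/{version}/skills/{skill}/{rel}"
            dest = str(PurePosixPath(skills_root) / skill / rel)
            pairs.append((url, dest))
    return pairs
```

For example, `skill_downloads("0.1.1", "~/.claude/skills")` yields nine pairs. A script built on this would still need to expand `~`, create the bin and config directories, fetch each URL, and mark bin/lookit as executable, as the shell commands do.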

Step 2: Configure API settings

Edit each config file with your API settings from the Prerequisites section:

# Edit each config (use any text editor: nano, vim, code, etc.)
nano ~/.claude/skills/computer-use/config/lookit.env
nano ~/.claude/skills/mobile-use/config/lookit.env
nano ~/.claude/skills/ocr/config/lookit.env

Example config for Ollama Cloud:

LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
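
Because the same three settings must be filled in for every skill, a quick check for missing values can catch an unedited config. The sketch below assumes lookit.env uses plain KEY=value lines, as in the example above; the exact syntax lookit accepts is not documented here, and `parse_env` and `missing_keys` are hypothetical helper names.

```python
# The three settings shown in the example config above.
REQUIRED_KEYS = ("LOOKIT_API_KEY", "LOOKIT_MODEL", "LOOKIT_BASE_URL")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: optional double quotes around the value are ignored.
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_keys(text: str) -> list:
    """List required settings that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Reading a config file and passing its text to `missing_keys` returns an empty list when all three settings are present.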

Step 3: Verify setup

# Test the skill (downloads binary on first run)
~/.claude/skills/computer-use/bin/lookit --help

DeepAgents CLI

DeepAgents is an agent framework built on LangChain and LangGraph.

Step 1: Download skills

# Install deepagents CLI
pip install deepagents-cli

# Set version to download
VERSION="0.1.1"

# Create skill directories
mkdir -p ~/.deepagents/default/skills/{computer-use,mobile-use,ocr}/{bin,config}

# Download computer-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/SKILL.md" -o ~/.deepagents/default/skills/computer-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/bin/lookit" -o ~/.deepagents/default/skills/computer-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/config/lookit.env.example" -o ~/.deepagents/default/skills/computer-use/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/computer-use/bin/lookit

# Download mobile-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/SKILL.md" -o ~/.deepagents/default/skills/mobile-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/bin/lookit" -o ~/.deepagents/default/skills/mobile-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/config/lookit.env.example" -o ~/.deepagents/default/skills/mobile-use/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/mobile-use/bin/lookit

# Download ocr skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/SKILL.md" -o ~/.deepagents/default/skills/ocr/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/bin/lookit" -o ~/.deepagents/default/skills/ocr/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/config/lookit.env.example" -o ~/.deepagents/default/skills/ocr/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/ocr/bin/lookit

# Create config files from examples
cp ~/.deepagents/default/skills/computer-use/config/lookit.env.example ~/.deepagents/default/skills/computer-use/config/lookit.env
cp ~/.deepagents/default/skills/mobile-use/config/lookit.env.example ~/.deepagents/default/skills/mobile-use/config/lookit.env
cp ~/.deepagents/default/skills/ocr/config/lookit.env.example ~/.deepagents/default/skills/ocr/config/lookit.env

Step 2: Configure API settings

Edit each config file with your API settings from the Prerequisites section:

# Edit each config (use any text editor: nano, vim, code, etc.)
nano ~/.deepagents/default/skills/computer-use/config/lookit.env
nano ~/.deepagents/default/skills/mobile-use/config/lookit.env
nano ~/.deepagents/default/skills/ocr/config/lookit.env

Example config for Ollama Cloud:

LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1

Step 3: Verify setup

# Test the skill (downloads binary on first run)
~/.deepagents/default/skills/computer-use/bin/lookit --help

# Verify skills are detected
deepagents skills list

Programmatic Integration

For integrating skills into your own LangChain agents, see deepagents PR #611 (WIP).

# Install in your project
pip install lookit

# Create .env file with your API settings from Prerequisites section
cat << 'EOF' > .env
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
EOF

from deepagents import create_deep_agent
from deepagents.backends.filesystem import FilesystemBackend
from deepagents.middleware import SkillsMiddleware

# Create backend and skills middleware
backend = FilesystemBackend()
skills_middleware = SkillsMiddleware(
    backend=backend,
    registries=[
        {"path": "/skills/user/", "name": "user"},
        {"path": "/skills/project/", "name": "project"},
    ],
)

# Create agent with skills middleware
agent = create_deep_agent(
    model="openai:gpt-4o",
    middleware=[skills_middleware],
)

# Agent will automatically discover and use lookit skills
result = agent.invoke({
    "messages": [{"role": "user", "content": "Click the submit button in screenshot.png"}]
})

CLI Installation

Use this method for standalone command-line usage (requires Python). Complete the Prerequisites first to get your API settings.

Step 1: Install

pip install lookit

Step 2: Configure

Add to your shell profile (~/.zshrc on macOS, ~/.bashrc on Linux):

For Ollama Cloud (recommended, see Prerequisites for API key):

export LOOKIT_API_KEY="your-api-key-here"
export LOOKIT_MODEL="qwen3-vl:235b-cloud"
export LOOKIT_BASE_URL="https://ollama.com/v1"

For Ollama Local (see Prerequisites for setup):

export LOOKIT_API_KEY="ollama"
export LOOKIT_MODEL="qwen3-vl"
export LOOKIT_BASE_URL="http://localhost:11434/v1"

Then reload your profile: source ~/.zshrc (or source ~/.bashrc on Linux)

Step 3: Verify

lookit --help

Usage

The same screenshot yields different results depending on the mode and prompt:

OCR Mode

lookit "extract the transaction history" -s screenshot.png --mode ocr

Result:

Max Now Pte. Ltd.
Singapore SG
24 Dec 2025 10:07:13
SGD 70.85
140 points
Pending
...

Computer Mode

lookit "click search" -s screenshot.png --mode computer

Result:

left_click 2910,365
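
The invocations above can also be driven from a script. The following is a minimal sketch that uses only the flags documented on this page. The helper names `build_command` and `run_lookit` are hypothetical, and it is an assumption that the result is printed to stdout (this page only states that debug info goes to stderr).

```python
import subprocess

# Modes listed in the Arguments section.
MODES = ("computer", "mobile", "ocr")


def build_command(query: str, screenshot: str, mode: str, debug: bool = False) -> list:
    """Build the argument list for one lookit invocation."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    cmd = ["lookit", query, "-s", screenshot, "--mode", mode]
    if debug:
        cmd.append("--debug")
    return cmd


def run_lookit(query: str, screenshot: str, mode: str) -> str:
    """Run lookit and return its stdout (assumed to hold the result)."""
    completed = subprocess.run(
        build_command(query, screenshot, mode),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.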

Output Format

Action Modes (computer/mobile)

left_click 960,324
type "hello world"
swipe 500,800 to 500,200
key Control+c
scroll -100
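
An agent consuming these lines needs to split each one into an action name and its arguments. The sketch below handles the five example lines above. The grammar is inferred from those examples only and is not a specification of lookit's output; `Action` and `parse_action` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    points: list = field(default_factory=list)  # [(x, y), ...] coordinates
    text: str = ""  # any other argument: typed text, key combo, scroll amount


# Matches an "x,y" coordinate pair such as 960,324.
_POINT = re.compile(r"(-?\d+),(-?\d+)")


def parse_action(line: str) -> Action:
    """Parse one action line into its name and arguments."""
    name, _, rest = line.strip().partition(" ")
    if not name:
        raise ValueError("empty action line")
    rest = rest.strip()
    # Quoted argument, e.g. type "hello world": keep the text verbatim.
    if len(rest) >= 2 and rest.startswith('"') and rest.endswith('"'):
        return Action(name, text=rest[1:-1])
    # One or more coordinate pairs, e.g. a click or a swipe.
    points = [(int(x), int(y)) for x, y in _POINT.findall(rest)]
    if points:
        return Action(name, points=points)
    # Anything else (key combo, scroll amount) is kept as raw text.
    return Action(name, text=rest)
```

For instance, `parse_action("swipe 500,800 to 500,200")` produces an `Action` with two points, while `parse_action("scroll -100")` keeps `-100` as text for the caller to interpret.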

OCR Mode

Returns extracted text directly.

Arguments

Argument           Description
query              Natural language instruction
-s, --screenshot   Path to screenshot (required)
-m, --mode         computer, mobile, or ocr (required)
--debug            Debug mode (for humans): print info to stderr, save annotated image

Actions

Computer

left_click, right_click, double_click, type, key, scroll, mouse_move

Mobile

click, long_press, swipe, type, system_button
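
Since each mode has its own action vocabulary, a consumer may want to reject lines whose action does not belong to the requested mode. This is a small sketch using the two lists above; `ACTIONS` and `is_known_action` are hypothetical names, not part of lookit.

```python
# Action names copied from the Computer and Mobile lists above.
ACTIONS = {
    "computer": {
        "left_click", "right_click", "double_click",
        "type", "key", "scroll", "mouse_move",
    },
    "mobile": {"click", "long_press", "swipe", "type", "system_button"},
}


def is_known_action(mode: str, line: str) -> bool:
    """True if the first word of an output line is an action listed for mode."""
    parts = line.split(maxsplit=1)
    return bool(parts) and parts[0] in ACTIONS.get(mode, set())
```

OCR mode returns plain text rather than actions, so `is_known_action("ocr", ...)` is always false in this sketch.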

License

MIT
