Lookit

LLM-first vision toolkit for GUI grounding, OCR, and more. Built with LangChain and Qwen3-VL. Outputs minimal plain text optimized for token efficiency.

Installation

1. Install CLI

pip install lookit

2. Configure Environment

Works with any OpenAI-compatible API (Ollama, vLLM, OpenRouter, etc.):

macOS / Linux:

# Appends to ~/.zshrc (macOS default). On Linux with bash, replace ~/.zshrc with ~/.bashrc here and in the source command below.
cat << 'EOF' >> ~/.zshrc
# Ollama local
export LOOKIT_API_KEY="ollama"
export LOOKIT_MODEL="qwen3-vl"
export LOOKIT_BASE_URL="http://localhost:11434/v1"
EOF

source ~/.zshrc

For better results, use Ollama Cloud with the larger model (this requires an Ollama API key):

export LOOKIT_API_KEY="your-ollama-api-key"
export LOOKIT_MODEL="qwen3-vl:235b-cloud"
export LOOKIT_BASE_URL="https://ollama.com/v1"
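
Before running the CLI, it can help to confirm that the three variables above are visible to the current shell. The following is a minimal sketch, not part of lookit: `load_lookit_env` is a hypothetical helper, and treating all three variables as mandatory is an assumption, since the package may define defaults.

```python
import os

# The three variables named in the configuration step above.
REQUIRED_VARS = ("LOOKIT_API_KEY", "LOOKIT_MODEL", "LOOKIT_BASE_URL")


def load_lookit_env(environ=None):
    """Return the LOOKIT_* settings, raising if any is missing or empty.

    Hypothetical helper: lookit itself may fall back to defaults.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        for name, value in load_lookit_env().items():
            # Avoid echoing the key itself.
            shown = "***" if name == "LOOKIT_API_KEY" else value
            print(f"{name}={shown}")
    except RuntimeError as err:
        print(err)
```

Run it in a new terminal after `source ~/.zshrc`; a "Missing environment variables" message usually means the exports went to a different shell profile.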

Skills Setup

Skills follow the Agent Skills specification.

Skill          Description
computer-use   GUI grounding for desktop screenshots
mobile-use     GUI grounding for mobile screenshots
ocr            Text extraction from screenshots

Claude Code

mkdir -p ~/.claude/skills/{computer-use,mobile-use,ocr}

curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/computer-use/SKILL.md -o ~/.claude/skills/computer-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/mobile-use/SKILL.md -o ~/.claude/skills/mobile-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/ocr/SKILL.md -o ~/.claude/skills/ocr/SKILL.md

DeepAgents CLI

DeepAgents is an agent framework built on LangChain and LangGraph.

# Install deepagents CLI
pip install deepagents-cli

# Create skill directories
mkdir -p ~/.deepagents/default/skills/{computer-use,mobile-use,ocr}

# Download skills
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/computer-use/SKILL.md -o ~/.deepagents/default/skills/computer-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/mobile-use/SKILL.md -o ~/.deepagents/default/skills/mobile-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/ocr/SKILL.md -o ~/.deepagents/default/skills/ocr/SKILL.md

# Verify skills installed
deepagents skills list
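
Both setups download the same three SKILL.md files and differ only in the destination root. Where curl is unavailable, the same steps can be sketched in Python with the standard library. The helper names `skill_targets` and `install_skills` are illustrative and not part of lookit or deepagents. The URLs and directory layout are copied from the commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and skill names taken from the curl commands above.
RAW_BASE = "https://raw.githubusercontent.com/atom2ueki/lookit/main/skills"
SKILLS = ("computer-use", "mobile-use", "ocr")


def skill_targets(skills_root):
    """Pair each SKILL.md URL with its destination path under skills_root."""
    root = Path(skills_root).expanduser()
    return [
        (f"{RAW_BASE}/{name}/SKILL.md", root / name / "SKILL.md")
        for name in SKILLS
    ]


def install_skills(skills_root):
    """Create the skill directories and download each SKILL.md (needs network)."""
    for url, dest in skill_targets(skills_root):
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, dest)


# Example calls (not executed here):
#   install_skills("~/.claude/skills")
#   install_skills("~/.deepagents/default/skills")
```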

Programmatic Integration

To integrate the skills into your own LangChain agents, see deepagents PR #611 (WIP). Because that PR is still in progress, the API used in the snippet below may change:

from deepagents import create_deep_agent
from deepagents.backends.filesystem import FilesystemBackend
from deepagents.middleware import SkillsMiddleware

# Create backend and skills middleware
backend = FilesystemBackend()
skills_middleware = SkillsMiddleware(
    backend=backend,
    registries=[
        {"path": "/skills/user/", "name": "user"},
        {"path": "/skills/project/", "name": "project"},
    ],
)

# Create agent with skills middleware
agent = create_deep_agent(
    model="openai:gpt-4o",
    middleware=[skills_middleware],
)

# Agent will automatically discover and use lookit skills
result = agent.invoke({
    "messages": [{"role": "user", "content": "Click the submit button in screenshot.png"}]
})

Usage

The same screenshot produces different results depending on the mode and prompt:

OCR mode:

lookit "extract the transaction history" -s screenshot.png --mode ocr

Result:

Max Now Pte. Ltd.
Singapore SG
24 Dec 2025 10:07:13
SGD 70.85
140 points
Pending
...

Computer mode:

lookit "click search" -s screenshot.png --mode computer

Result:

left_click 2910,365

Output Format

Action Modes (computer/mobile)

left_click 960,324
type "hello world"
swipe 500,800 to 500,200
key Control+c
scroll -100
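
An agent consuming this output needs to turn each line into a structured action. The sketch below parses only the five line shapes shown above; the grammar is inferred from those examples rather than from a published specification, and `parse_action` is a hypothetical helper that does not ship with lookit. Quoting is handled with `shlex`, which assumes shell-style quoting for `type` text.

```python
import re
import shlex

_POINT = re.compile(r"^(-?\d+),(-?\d+)$")


def _point(token):
    """Convert an 'x,y' token into a pair of ints."""
    match = _POINT.match(token)
    if not match:
        raise ValueError(f"not an x,y pair: {token!r}")
    return int(match.group(1)), int(match.group(2))


def parse_action(line):
    """Split one output line into (action, args), based on the examples above."""
    tokens = shlex.split(line)  # keeps "hello world" as a single token
    if not tokens:
        raise ValueError("empty action line")
    name, rest = tokens[0], tokens[1:]
    if not rest:
        return name, {}
    if name == "type":
        return name, {"text": " ".join(rest)}
    if name == "key":
        return name, {"keys": rest[0].split("+")}
    if name == "scroll":
        return name, {"amount": int(rest[0])}
    if name == "swipe":
        # Shape seen above: swipe X1,Y1 to X2,Y2
        if len(rest) != 3 or rest[1] != "to":
            raise ValueError(f"unexpected swipe format: {line!r}")
        return name, {"start": _point(rest[0]), "end": _point(rest[2])}
    if len(rest) == 1:
        # Remaining examples carry a single x,y point (e.g. left_click).
        return name, {"point": _point(rest[0])}
    raise ValueError(f"unrecognized action line: {line!r}")


for sample in ['left_click 960,324', 'type "hello world"', 'key Control+c']:
    print(parse_action(sample))
```

Actions whose argument format is not shown above (for example `system_button`) raise `ValueError` when they carry something other than a single point, so the caller can decide how to handle them.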

OCR Mode

Returns extracted text directly.

Arguments

Argument           Description
query              Natural language instruction
-s, --screenshot   Path to screenshot (required)
--mode             computer, mobile, or ocr (required)
--debug            Debug mode (for humans): print info to stderr, save annotated image
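
To call the CLI from a script, the arguments above map onto a `subprocess` invocation. This is a rough sketch: `build_lookit_command` and `run_lookit` are hypothetical wrappers, and it assumes the result is written to stdout and that a failure produces a non-zero exit code, neither of which is stated above.

```python
import subprocess

# Modes listed in the arguments table above.
MODES = ("computer", "mobile", "ocr")


def build_lookit_command(query, screenshot, mode, debug=False):
    """Assemble the argument list for one lookit call."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    cmd = ["lookit", query, "-s", str(screenshot), "--mode", mode]
    if debug:
        cmd.append("--debug")
    return cmd


def run_lookit(query, screenshot, mode):
    """Run the CLI and return its stdout (needs lookit installed and configured)."""
    done = subprocess.run(
        build_lookit_command(query, screenshot, mode),
        capture_output=True,
        text=True,
        check=True,  # assumption: errors surface as a non-zero exit code
    )
    return done.stdout.strip()


print(build_lookit_command("click search", "screenshot.png", "computer"))
```

Passing the query as a single list element avoids shell quoting issues with spaces in the instruction.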

Actions

Computer

left_click, right_click, double_click, type, key, scroll, mouse_move

Mobile

click, long_press, swipe, type, system_button
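
The two action lists can double as a sanity check on model output before anything is executed. A small sketch, assuming each output line begins with the action name as in the Output Format examples; `unexpected_actions` is an illustrative name, not a lookit function.

```python
# Action names copied from the lists above.
ACTIONS = {
    "computer": {"left_click", "right_click", "double_click", "type",
                 "key", "scroll", "mouse_move"},
    "mobile": {"click", "long_press", "swipe", "type", "system_button"},
}


def unexpected_actions(output, mode):
    """Return the lines of `output` whose first word is not a listed action for `mode`."""
    allowed = ACTIONS[mode]
    bad = []
    for line in output.splitlines():
        words = line.split()
        if words and words[0] not in allowed:
            bad.append(line)
    return bad


# A swipe is a mobile action, so it is flagged in computer mode.
print(unexpected_actions("left_click 960,324\nswipe 500,800 to 500,200", "computer"))
```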

License

MIT
