LLM-first vision toolkit for GUI grounding, OCR, and more
Lookit
LLM-first vision toolkit for GUI grounding, OCR, and more. Built with LangChain and Qwen3-VL. Outputs minimal plain text optimized for token efficiency.
Quick Start
Choose your setup method:
| Method | Use Case |
|---|---|
| Skills Setup | Claude Code, DeepAgents, or other agent frameworks (recommended) |
| CLI Installation | Standalone command-line usage |
Skills Setup
Skills are self-contained and auto-download the binary on first run. No Python or dependencies required.
| Skill | Description |
|---|---|
| computer-use | GUI grounding for desktop screenshots |
| mobile-use | GUI grounding for mobile screenshots |
| ocr | Text extraction from screenshots |
Prerequisites
Choose ONE backend option:
Option A: Ollama Cloud (recommended, no local setup)
- Create an Ollama account at ollama.com
- Go to ollama.com/settings/keys
- Click "Create new key" and copy the API key
You'll use these settings:
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
Option B: Ollama Local (requires local setup)
- Install Ollama: ollama.com/download
- Pull the model:
ollama pull qwen3-vl
- Start Ollama (runs automatically after install, or run ollama serve)
You'll use these settings:
LOOKIT_API_KEY=ollama
LOOKIT_MODEL=qwen3-vl
LOOKIT_BASE_URL=http://localhost:11434/v1
Option C: LM Studio (local GUI app)
- Download LM Studio: lmstudio.ai
- Search and download a Qwen3-VL model (e.g., qwen/qwen3-vl-8b)
- Start the local server (Server tab → Start Server)
You'll use these settings (model name uses owner/model format):
LOOKIT_API_KEY=lmstudio
LOOKIT_MODEL=qwen/qwen3-vl-8b
LOOKIT_BASE_URL=http://127.0.0.1:1234/v1
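The three options differ only in the values of the three LOOKIT_* settings. As a minimal sketch, the values listed above can be kept in one place and rendered as KEY=VALUE lines. The `PRESETS` dict, its preset names, and `render_env` are hypothetical helpers for illustration, not part of lookit.

```python
# Hypothetical helper: preset names and functions are illustrative only.
# The setting values are copied from Options A-C above.
PRESETS = {
    "ollama-cloud": {
        "LOOKIT_API_KEY": "your-api-key-here",  # placeholder, replace with your own key
        "LOOKIT_MODEL": "qwen3-vl:235b-cloud",
        "LOOKIT_BASE_URL": "https://ollama.com/v1",
    },
    "ollama-local": {
        "LOOKIT_API_KEY": "ollama",
        "LOOKIT_MODEL": "qwen3-vl",
        "LOOKIT_BASE_URL": "http://localhost:11434/v1",
    },
    "lm-studio": {
        "LOOKIT_API_KEY": "lmstudio",
        "LOOKIT_MODEL": "qwen/qwen3-vl-8b",
        "LOOKIT_BASE_URL": "http://127.0.0.1:1234/v1",
    },
}


def render_env(preset: str) -> str:
    """Return the settings of one backend as KEY=VALUE lines."""
    settings = PRESETS[preset]
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(render_env("ollama-local"), end="")
```

The rendered text can be pasted into a lookit.env file or adapted into export lines for a shell profile.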
Install Skills
Claude Code
Step 1: Download skills
# Set version to download
VERSION="0.1.1"
# Create skill directories
mkdir -p ~/.claude/skills/{computer-use,mobile-use,ocr}/{bin,config}
# Download computer-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/SKILL.md" -o ~/.claude/skills/computer-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/bin/lookit" -o ~/.claude/skills/computer-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/config/lookit.env.example" -o ~/.claude/skills/computer-use/config/lookit.env.example
chmod +x ~/.claude/skills/computer-use/bin/lookit
# Download mobile-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/SKILL.md" -o ~/.claude/skills/mobile-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/bin/lookit" -o ~/.claude/skills/mobile-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/config/lookit.env.example" -o ~/.claude/skills/mobile-use/config/lookit.env.example
chmod +x ~/.claude/skills/mobile-use/bin/lookit
# Download ocr skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/SKILL.md" -o ~/.claude/skills/ocr/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/bin/lookit" -o ~/.claude/skills/ocr/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/config/lookit.env.example" -o ~/.claude/skills/ocr/config/lookit.env.example
chmod +x ~/.claude/skills/ocr/bin/lookit
# Create config files from examples
cp ~/.claude/skills/computer-use/config/lookit.env.example ~/.claude/skills/computer-use/config/lookit.env
cp ~/.claude/skills/mobile-use/config/lookit.env.example ~/.claude/skills/mobile-use/config/lookit.env
cp ~/.claude/skills/ocr/config/lookit.env.example ~/.claude/skills/ocr/config/lookit.env
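The commands above follow a regular pattern: three skills, three files each, all fetched from the same version tag. The sketch below only enumerates the resulting URL and destination pairs so the layout is easier to see; it does not download anything or set permissions. `download_plan` and the constant names are hypothetical, not part of lookit.

```python
from pathlib import PurePosixPath

# Values taken from the curl commands above; the names are illustrative.
BASE = "https://raw.githubusercontent.com/atom2ueki/lookit"
SKILLS = ("computer-use", "mobile-use", "ocr")
FILES = ("SKILL.md", "bin/lookit", "config/lookit.env.example")


def download_plan(version: str, root: str) -> list[tuple[str, str]]:
    """List (url, destination) pairs matching the curl commands.

    The root is kept as written ("~" is not expanded here).
    """
    plan = []
    for skill in SKILLS:
        for rel in FILES:
            url = f"{BASE}/{version}/skills/{skill}/{rel}"
            dest = str(PurePosixPath(root) / skill / rel)
            plan.append((url, dest))
    return plan


if __name__ == "__main__":
    for url, dest in download_plan("0.1.1", "~/.claude/skills"):
        print(url, "->", dest)
```

Passing ~/.deepagents/default/skills as the root gives the paths used in the DeepAgents CLI instructions.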
Step 2: Configure API settings
Edit each config file with your API settings from the Prerequisites section:
# Edit each config (use any text editor: nano, vim, code, etc.)
nano ~/.claude/skills/computer-use/config/lookit.env
nano ~/.claude/skills/mobile-use/config/lookit.env
nano ~/.claude/skills/ocr/config/lookit.env
Example config for Ollama Cloud:
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
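Before moving on, it can help to confirm that each edited file still contains all three settings. The sketch below assumes lookit.env is a plain list of KEY=VALUE lines, as in the example above; how lookit itself parses the file is not documented here, and `parse_env` and `missing_keys` are hypothetical helpers.

```python
# The three setting names come from the example config above.
REQUIRED = ("LOOKIT_API_KEY", "LOOKIT_MODEL", "LOOKIT_BASE_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED if not settings.get(key)]


if __name__ == "__main__":
    sample = "LOOKIT_API_KEY=ollama\nLOOKIT_MODEL=qwen3-vl\n"
    print(missing_keys(parse_env(sample)))
```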
Step 3: Verify setup
# Test the skill (downloads binary on first run)
~/.claude/skills/computer-use/bin/lookit --help
DeepAgents CLI
DeepAgents is an agent framework built on LangChain and LangGraph.
Step 1: Download skills
# Install deepagents CLI
pip install deepagents-cli
# Set version to download
VERSION="0.1.1"
# Create skill directories
mkdir -p ~/.deepagents/default/skills/{computer-use,mobile-use,ocr}/{bin,config}
# Download computer-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/SKILL.md" -o ~/.deepagents/default/skills/computer-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/bin/lookit" -o ~/.deepagents/default/skills/computer-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/computer-use/config/lookit.env.example" -o ~/.deepagents/default/skills/computer-use/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/computer-use/bin/lookit
# Download mobile-use skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/SKILL.md" -o ~/.deepagents/default/skills/mobile-use/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/bin/lookit" -o ~/.deepagents/default/skills/mobile-use/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/mobile-use/config/lookit.env.example" -o ~/.deepagents/default/skills/mobile-use/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/mobile-use/bin/lookit
# Download ocr skill
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/SKILL.md" -o ~/.deepagents/default/skills/ocr/SKILL.md
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/bin/lookit" -o ~/.deepagents/default/skills/ocr/bin/lookit
curl -sL "https://raw.githubusercontent.com/atom2ueki/lookit/$VERSION/skills/ocr/config/lookit.env.example" -o ~/.deepagents/default/skills/ocr/config/lookit.env.example
chmod +x ~/.deepagents/default/skills/ocr/bin/lookit
# Create config files from examples
cp ~/.deepagents/default/skills/computer-use/config/lookit.env.example ~/.deepagents/default/skills/computer-use/config/lookit.env
cp ~/.deepagents/default/skills/mobile-use/config/lookit.env.example ~/.deepagents/default/skills/mobile-use/config/lookit.env
cp ~/.deepagents/default/skills/ocr/config/lookit.env.example ~/.deepagents/default/skills/ocr/config/lookit.env
Step 2: Configure API settings
Edit each config file with your API settings from the Prerequisites section:
# Edit each config (use any text editor: nano, vim, code, etc.)
nano ~/.deepagents/default/skills/computer-use/config/lookit.env
nano ~/.deepagents/default/skills/mobile-use/config/lookit.env
nano ~/.deepagents/default/skills/ocr/config/lookit.env
Example config for Ollama Cloud:
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
Step 3: Verify setup
# Test the skill (downloads binary on first run)
~/.deepagents/default/skills/computer-use/bin/lookit --help
# Verify skills are detected
deepagents skills list
Programmatic Integration
For integrating skills into your own LangChain agents, see deepagents PR #611 (WIP).
# Install in your project
pip install lookit
# Create .env file with your API settings from Prerequisites section
cat << 'EOF' > .env
LOOKIT_API_KEY=your-api-key-here
LOOKIT_MODEL=qwen3-vl:235b-cloud
LOOKIT_BASE_URL=https://ollama.com/v1
EOF
from deepagents import create_deep_agent
from deepagents.backends.filesystem import FilesystemBackend
from deepagents.middleware import SkillsMiddleware
# Create backend and skills middleware
backend = FilesystemBackend()
skills_middleware = SkillsMiddleware(
    backend=backend,
    registries=[
        {"path": "/skills/user/", "name": "user"},
        {"path": "/skills/project/", "name": "project"},
    ],
)
# Create agent with skills middleware
agent = create_deep_agent(
    model="openai:gpt-4o",
    middleware=[skills_middleware],
)
# Agent will automatically discover and use lookit skills
result = agent.invoke({
    "messages": [{"role": "user", "content": "Click the submit button in screenshot.png"}]
})
CLI Installation
Use this method for standalone command-line usage (requires Python). Complete the Prerequisites first to get your API settings.
Step 1: Install
pip install lookit
Step 2: Configure
Add to your shell profile (~/.zshrc on macOS, ~/.bashrc on Linux):
For Ollama Cloud (recommended, see Prerequisites for API key):
export LOOKIT_API_KEY="your-api-key-here"
export LOOKIT_MODEL="qwen3-vl:235b-cloud"
export LOOKIT_BASE_URL="https://ollama.com/v1"
For Ollama Local (see Prerequisites for setup):
export LOOKIT_API_KEY="ollama"
export LOOKIT_MODEL="qwen3-vl"
export LOOKIT_BASE_URL="http://localhost:11434/v1"
Then reload your profile: source ~/.zshrc (or source ~/.bashrc on Linux)
Step 3: Verify
lookit --help
Usage
The same screenshot gives different results depending on the mode and prompt:
| OCR Mode | Computer Mode |
|---|---|
| lookit "extract the transaction history" -s screenshot.png --mode ocr | lookit "click search" -s screenshot.png --mode computer |
| Max Now Pte. Ltd. Singapore SG 24 Dec 2025 10:07:13 SGD 70.85 140 points Pending... | left_click 2910,365 |
Output Format
Action Modes (computer/mobile)
left_click 960,324
type "hello world"
swipe 500,800 to 500,200
key Control+c
scroll -100
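An agent or script that consumes these lines needs to split each one into an action name and its arguments. The parser below is a sketch inferred only from the five example lines above; the full output grammar (for instance how quotes inside typed text are escaped) is not specified here, and `parse_action` with its dictionary keys is a hypothetical helper.

```python
import re

# One "x,y" coordinate pair, as in "960,324".
POINT = r"(-?\d+),(-?\d+)"


def parse_action(line: str) -> dict:
    """Split one action line into its name and arguments."""
    name, _, rest = line.strip().partition(" ")
    rest = rest.strip()

    # Two points joined by "to", e.g. swipe 500,800 to 500,200
    match = re.fullmatch(rf"{POINT} to {POINT}", rest)
    if match:
        x1, y1, x2, y2 = map(int, match.groups())
        return {"action": name, "start": (x1, y1), "end": (x2, y2)}

    # A single point, e.g. left_click 960,324
    match = re.fullmatch(POINT, rest)
    if match:
        return {"action": name, "point": (int(match.group(1)), int(match.group(2)))}

    # Quoted text, e.g. type "hello world"
    if len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"':
        return {"action": name, "text": rest[1:-1]}

    # A bare integer, e.g. scroll -100
    if re.fullmatch(r"-?\d+", rest):
        return {"action": name, "amount": int(rest)}

    # Anything else is kept verbatim, e.g. key Control+c
    return {"action": name, "value": rest}


if __name__ == "__main__":
    print(parse_action("swipe 500,800 to 500,200"))
```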
OCR Mode
Returns extracted text directly.
Arguments
| Argument | Description |
|---|---|
| query | Natural language instruction |
| -s, --screenshot | Path to screenshot (required) |
| -m, --mode | computer, mobile, or ocr (required) |
| --debug | Debug mode (for humans): print info to stderr, save annotated image |
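When calling the CLI from another program, the arguments above map onto a short argument list. The wrapper below is a sketch that uses only the flags from the table; it assumes lookit is on PATH, that the result is written to stdout, and that a failed run exits with a non-zero status. None of these assumptions is confirmed here, and `build_command` and `run_lookit` are hypothetical helpers.

```python
import subprocess

# Mode names taken from the -m, --mode row above.
MODES = ("computer", "mobile", "ocr")


def build_command(query: str, screenshot: str, mode: str, debug: bool = False) -> list[str]:
    """Build the argument list for one lookit call."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    cmd = ["lookit", query, "-s", screenshot, "--mode", mode]
    if debug:
        cmd.append("--debug")
    return cmd


def run_lookit(query: str, screenshot: str, mode: str) -> str:
    """Run the CLI and return its stdout (assumed to hold the result)."""
    done = subprocess.run(
        build_command(query, screenshot, mode),
        capture_output=True,
        text=True,
        check=True,  # assumption: failures are signalled by the exit status
    )
    return done.stdout.strip()


if __name__ == "__main__":
    print(build_command("click search", "screenshot.png", "computer"))
```

Passing the query as a single list element avoids shell quoting problems with prompts that contain spaces or quotes.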
Actions
Computer
left_click, right_click, double_click, type, key, scroll, mouse_move
Mobile
click, long_press, swipe, type, system_button
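Because each mode has its own action vocabulary, a caller can reject unexpected output before acting on it. The check below is a sketch built from the two lists above; `ACTIONS` and `is_known_action` are hypothetical names, and the check looks only at the first word of a line.

```python
# Action names copied from the Computer and Mobile lists above.
ACTIONS = {
    "computer": {"left_click", "right_click", "double_click", "type", "key", "scroll", "mouse_move"},
    "mobile": {"click", "long_press", "swipe", "type", "system_button"},
}


def is_known_action(mode: str, line: str) -> bool:
    """True if the first word of an output line is an action listed for the mode."""
    words = line.split(maxsplit=1)
    return bool(words) and words[0] in ACTIONS.get(mode, set())


if __name__ == "__main__":
    print(is_known_action("computer", "left_click 2910,365"))
```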
License
MIT