LLM-first vision toolkit for GUI grounding, OCR, and more
Lookit
LLM-first vision toolkit for GUI grounding, OCR, and more. Built with LangChain and Qwen3-VL. Outputs minimal plain text optimized for token efficiency.
Installation
1. Install CLI
pip install lookit
2. Configure Environment
Works with any OpenAI-compatible API (Ollama, vLLM, OpenRouter, etc.):
macOS / Linux:
# Appends to ~/.zshrc (macOS default); on Linux with bash, use ~/.bashrc here and in the source line
cat << 'EOF' >> ~/.zshrc
# Ollama local
export LOOKIT_API_KEY="ollama"
export LOOKIT_MODEL="qwen3-vl"
export LOOKIT_BASE_URL="http://localhost:11434/v1"
EOF
source ~/.zshrc
For better results, use Ollama Cloud with the larger model (requires an Ollama API key):
export LOOKIT_API_KEY="your-ollama-api-key"
export LOOKIT_MODEL="qwen3-vl:235b-cloud"
export LOOKIT_BASE_URL="https://ollama.com/v1"
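A calling script can check that the three variables above are set before it invokes the CLI. The following is a minimal sketch; `LookitEnv` and `load_lookit_env` are hypothetical helper names, not part of lookit, and how lookit itself reads these variables is not shown here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LookitEnv:
    """Hypothetical container for the three LOOKIT_* settings."""
    api_key: str
    model: str
    base_url: str


def load_lookit_env(environ=None) -> LookitEnv:
    """Read the LOOKIT_* variables and fail early if any is missing or empty."""
    env = os.environ if environ is None else environ
    names = ("LOOKIT_API_KEY", "LOOKIT_MODEL", "LOOKIT_BASE_URL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return LookitEnv(
        api_key=env["LOOKIT_API_KEY"],
        model=env["LOOKIT_MODEL"],
        base_url=env["LOOKIT_BASE_URL"],
    )
```

Calling `load_lookit_env()` with no arguments reads the real process environment; passing a dict makes the check easy to exercise in tests.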
Skills Setup
Skills follow the Agent Skills specification.
| Skill | Description |
|---|---|
| computer-use | GUI grounding for desktop screenshots |
| mobile-use | GUI grounding for mobile screenshots |
| ocr | Text extraction from screenshots |
Claude Code
mkdir -p ~/.claude/skills/{computer-use,mobile-use,ocr}
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/computer-use/SKILL.md -o ~/.claude/skills/computer-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/mobile-use/SKILL.md -o ~/.claude/skills/mobile-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/ocr/SKILL.md -o ~/.claude/skills/ocr/SKILL.md
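Where `curl` is not available, the same download can be scripted with the Python standard library. This is a sketch only: `skill_targets` and `install_skills` are hypothetical names introduced for illustration, and the URLs are the ones used in the commands above.

```python
from pathlib import Path
from urllib.request import urlopen

RAW_BASE = "https://raw.githubusercontent.com/atom2ueki/lookit/main/skills"
SKILLS = ("computer-use", "mobile-use", "ocr")


def skill_targets(skills_root):
    """Return (url, destination path) pairs, one per skill's SKILL.md."""
    root = Path(skills_root).expanduser()
    return [
        (f"{RAW_BASE}/{name}/SKILL.md", root / name / "SKILL.md")
        for name in SKILLS
    ]


def install_skills(skills_root):
    """Download every SKILL.md into <skills_root>/<skill>/SKILL.md."""
    for url, dest in skill_targets(skills_root):
        # Same effect as `mkdir -p` on the skill directory.
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url, timeout=30) as response:
            dest.write_bytes(response.read())
```

For the Claude Code layout this would be called as `install_skills("~/.claude/skills")`; any other skills directory can be passed the same way.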
DeepAgents CLI
DeepAgents is an agent framework built on LangChain and LangGraph.
# Install deepagents CLI
pip install deepagents-cli
# Create skill directories
mkdir -p ~/.deepagents/default/skills/{computer-use,mobile-use,ocr}
# Download skills
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/computer-use/SKILL.md -o ~/.deepagents/default/skills/computer-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/mobile-use/SKILL.md -o ~/.deepagents/default/skills/mobile-use/SKILL.md
curl -sL https://raw.githubusercontent.com/atom2ueki/lookit/main/skills/ocr/SKILL.md -o ~/.deepagents/default/skills/ocr/SKILL.md
# Verify skills installed
deepagents skills list
Programmatic Integration
For integrating skills into your own LangChain agents, see deepagents PR #611 (WIP).
from deepagents import create_deep_agent
from deepagents.backends.filesystem import FilesystemBackend
from deepagents.middleware import SkillsMiddleware

# Create backend and skills middleware
backend = FilesystemBackend()
skills_middleware = SkillsMiddleware(
    backend=backend,
    registries=[
        {"path": "/skills/user/", "name": "user"},
        {"path": "/skills/project/", "name": "project"},
    ],
)

# Create agent with skills middleware
agent = create_deep_agent(
    model="openai:gpt-4o",
    middleware=[skills_middleware],
)

# Agent will automatically discover and use lookit skills
result = agent.invoke({
    "messages": [{"role": "user", "content": "Click the submit button in screenshot.png"}]
})
Usage
The same screenshot gives different results depending on the mode and prompt:
| OCR Mode | Computer Mode |
|---|---|
| lookit "extract the transaction history" -s screenshot.png --mode ocr | lookit "click search" -s screenshot.png --mode computer |
| Max Now Pte. Ltd.Singapore SG24 Dec 2025 10:07:13SGD 70.85140 pointsPending... | left_click 2910,365 |
Output Format
Action Modes (computer/mobile)
left_click 960,324
type "hello world"
swipe 500,800 to 500,200
key Control+c
scroll -100
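Because each action is a single plain-text line, a consumer can parse it with a few string operations. The sketch below handles only the five forms shown above; `Action` and `parse_action` are hypothetical names, and the quoting rule for `type` (text wrapped in double quotes, no escapes) is an assumption drawn from the one example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

_POINT = re.compile(r"^(-?\d+),(-?\d+)$")


@dataclass
class Action:
    name: str
    points: List[Tuple[int, int]] = field(default_factory=list)  # click / swipe coordinates
    text: Optional[str] = None    # payload of `type` or `key`
    amount: Optional[int] = None  # payload of `scroll`


def parse_action(line: str) -> Action:
    """Parse one action line such as 'left_click 960,324'."""
    name, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    action = Action(name=name)
    if name == "type":
        # Assumption: the text is wrapped in double quotes, as in the example.
        quoted = len(rest) >= 2 and rest[0] == rest[-1] == '"'
        action.text = rest[1:-1] if quoted else rest
    elif name == "key":
        action.text = rest
    elif name == "scroll":
        action.amount = int(rest)
    else:
        # Collect every "x,y" token; the "to" in a swipe is skipped.
        for token in rest.split():
            match = _POINT.match(token)
            if match:
                action.points.append((int(match.group(1)), int(match.group(2))))
    return action
```

For example, `parse_action("swipe 500,800 to 500,200").points` yields the start and end points as two `(x, y)` tuples. Actions whose argument format is not shown in this README, such as `system_button`, would need their own branch.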
OCR Mode
Returns extracted text directly.
Arguments
| Argument | Description |
|---|---|
| query | Natural language instruction |
| -s, --screenshot | Path to screenshot (required) |
| --mode | computer, mobile, or ocr (required) |
| --debug | Debug mode (for humans): print info to stderr, save annotated image |
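To call the CLI from a script, the arguments in the table can be assembled into an argument list and passed to `subprocess`. This is a sketch under two assumptions not stated in this README: that the result is written to stdout, and that the CLI exits with a non-zero status on failure. `build_lookit_command` and `run_lookit` are hypothetical helper names.

```python
import subprocess

MODES = ("computer", "mobile", "ocr")


def build_lookit_command(query, screenshot, mode, debug=False):
    """Assemble the argv list for one lookit call, using only the documented flags."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    command = ["lookit", query, "-s", str(screenshot), "--mode", mode]
    if debug:
        command.append("--debug")
    return command


def run_lookit(query, screenshot, mode, debug=False):
    """Run the CLI and return its stdout with surrounding whitespace stripped."""
    result = subprocess.run(
        build_lookit_command(query, screenshot, mode, debug),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

Passing the command as a list rather than a shell string means the query needs no extra quoting, even when it contains spaces or quotation marks.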
Actions
Computer
left_click, right_click, double_click, type, key, scroll, mouse_move
Mobile
click, long_press, swipe, type, system_button
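Before acting on a returned line, a caller may want to confirm that the action name belongs to the requested mode. The lookup below is a minimal sketch built from the two lists above; `ACTIONS` and `is_known_action` are hypothetical names, not part of lookit.

```python
ACTIONS = {
    "computer": frozenset(
        {"left_click", "right_click", "double_click", "type", "key", "scroll", "mouse_move"}
    ),
    "mobile": frozenset({"click", "long_press", "swipe", "type", "system_button"}),
}


def is_known_action(mode, line):
    """Return True if the first word of `line` is a listed action for `mode`."""
    parts = line.split(maxsplit=1)
    return bool(parts) and parts[0] in ACTIONS.get(mode, frozenset())
```

With these sets, `is_known_action("mobile", "swipe 500,800 to 500,200")` is true, while the same line checked against `"computer"` is rejected.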
License
MIT