OmniScout CLI (harness): local-first browser automation, semantic search, and research for AI agents

scout — OmniScout CLI

Local-first browser automation, semantic search, and research for AI agents. No cloud APIs, no hosted browser sessions, no MCP, no SDK.

The CLI is the interface.

Install

Requires Python 3.11+ and Google Chrome (on macOS, the standard install location is /Applications/Google Chrome.app).

Recommended: install as a global tool

cd cli
uv tool install --editable .   # puts scout (and its aliases) in ~/.local/bin on PATH
scout install              # verifies Chrome + prefetches embedding model

After this, scout works from any directory. Edits to source files are picked up live (editable install).

omniscout and harness remain available as compatibility aliases.

Alternative: project venv

If you prefer not to install globally:

cd cli
uv venv --python 3.11 .venv
uv pip install -e ".[dev]"
source .venv/bin/activate
scout install

If you don't have Chrome installed, run scout install --bundled to also download Playwright's bundled Chromium (~190MB).

scout install also prefetches the local sentence-transformers model into OmniScout's app data directory so later commands do not need to fetch it again. Use --no-model to skip model prefetch.

Quickstart

# Search the web (DuckDuckGo HTML + local embedding rerank)
scout search "local-first browser agents"
# same command via the omniscout compatibility alias:
omniscout search "local-first browser agents"

# Extract a URL to clean Markdown
scout extract https://example.com

# Capture a screenshot of a real page using your installed Chrome
scout browser screenshot https://example.com --out page.png

# Run a multi-step research pipeline (search -> crawl -> extract -> rerank -> summarize)
scout research "state of local AI agents in 2026"

# Manage persistent browser profiles (cookies, logins persist across runs)
scout profile create work
scout browser open https://news.ycombinator.com --profile work --headful

# Long-lived browser sessions (other tools can attach via CDP)
scout session start --headful
scout session list
scout session kill --all

JSON output (for agents)

Every command emits structured JSON when invoked with --json (or with HARNESS_JSON=1 in the environment). Logs always go to stderr; stdout is reserved for the structured result.

HARNESS_JSON=1 scout search "robotics simulators" --limit 5
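
As a rough illustration of the stdout/stderr split described above, an agent written in Python might call the CLI along these lines. The helper name run_json is hypothetical, and the shape of the returned JSON is not specified in this README, so the sketch only parses it without assuming any fields.

```python
import json
import os
import subprocess


def run_json(argv, extra_env=None):
    """Run a command and return (parsed stdout JSON, raw stderr text).

    Hypothetical helper: it relies only on the documented contract that
    stdout carries the structured result and logs go to stderr.
    """
    env = {**os.environ, **(extra_env or {})}
    proc = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout), proc.stderr
```

With scout on PATH, a call such as run_json(["scout", "search", "robotics simulators", "--limit", "5"], {"HARNESS_JSON": "1"}) would mirror the shell command above. Because check=True is set, a non-zero exit raises subprocess.CalledProcessError instead of returning partial output.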

Architecture

harness/
  app.py              # Typer root
  commands/           # CLI sub-commands (thin)
  engines/
    browser.py        # Playwright + system Chrome
    extractor.py      # trafilatura + markdownify
    crawler.py        # async httpx + Chrome fallback
    search/
      ddg.py          # DuckDuckGo HTML
      embed.py        # sentence-transformers (all-MiniLM-L6-v2)
      index.py        # embedded Qdrant on-disk
      rerank.py       # cosine rerank
      pipeline.py     # ddg | index | hybrid
    research.py       # full pipeline (search -> crawl -> extract -> rerank -> summarize)
  store/
    cache.py          # SQLite + content-hashed HTML cache
    sessions.py       # SQLite registry of browser sessions
  models.py           # pydantic result types (the JSON contract)
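
To make the "cosine rerank" step in the tree above concrete, here is a minimal sketch in plain Python. It is not the code in rerank.py: the function names and the (item, vector) input shape are assumptions, and the real engine presumably works on sentence-transformers embeddings rather than hand-written lists.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, candidates):
    """Sort (item, vector) pairs by descending similarity to query_vec.

    Returns a list of (item, score) pairs. Hypothetical signature.
    """
    scored = [(item, cosine(query_vec, vec)) for item, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

For example, with query_vec = [1.0, 0.0], a candidate pointing the same way scores 1.0 and an orthogonal one scores 0.0, so the former is ranked first.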

On-disk state lives under ~/Library/Application Support/harness/ on macOS and $XDG_DATA_HOME/harness/ on Linux:

Path             Purpose
profiles/        Persistent Chrome user-data-dirs
qdrant/          Embedded vector index
sessions.sqlite  Registry of long-lived browser sessions
cache/pages/     Content-hashed HTML cache used by extract and the crawler
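
One way a content-hashed page cache like cache/pages/ could work is sketched below. Everything here is an assumption made for illustration: the SHA-256 choice, the pages table schema, the .html file naming, and the function names are not taken from OmniScout's cache.py.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) a hypothetical SQLite index mapping URL -> content digest."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, digest TEXT NOT NULL)")
    return db


def store_page(db, pages_dir, url, html):
    """Write html under its SHA-256 digest and record which URL produced it."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    pages = Path(pages_dir)
    pages.mkdir(parents=True, exist_ok=True)
    path = pages / f"{digest}.html"
    if not path.exists():  # identical content from different URLs is stored once
        path.write_text(html, encoding="utf-8")
    db.execute("INSERT OR REPLACE INTO pages (url, digest) VALUES (?, ?)", (url, digest))
    db.commit()
    return digest


def load_page(db, pages_dir, url):
    """Return the cached HTML for url, or None if the URL was never stored."""
    row = db.execute("SELECT digest FROM pages WHERE url = ?", (url,)).fetchone()
    if row is None:
        return None
    return (Path(pages_dir) / f"{row[0]}.html").read_text(encoding="utf-8")
```

Keying files by content rather than by URL means two URLs that serve the same HTML share a single file on disk.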

Override these locations with the HARNESS_DATA_DIR, HARNESS_CONFIG_DIR, or HARNESS_CACHE_DIR environment variables, or with settings in ~/Library/Application Support/harness/config.toml.
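
The lookup order for the data directory might be sketched as follows. This is a guess at the logic, not OmniScout's actual resolver: the function name data_dir is hypothetical, the fallback to ~/.local/share when XDG_DATA_HOME is unset follows the XDG Base Directory convention, and any precedence of config.toml settings is left out because it is not documented here.

```python
import os
import sys
from pathlib import Path

APP_NAME = "harness"


def data_dir(env=None, platform=None):
    """Resolve the data directory; env and platform are injectable for testing."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    override = env.get("HARNESS_DATA_DIR")
    if override:  # an explicit override wins (assumed precedence)
        return Path(override).expanduser()

    home = Path(env.get("HOME") or str(Path.home()))
    if platform == "darwin":
        return home / "Library" / "Application Support" / APP_NAME

    xdg = env.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else home / ".local" / "share"
    return base / APP_NAME
```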

Configuration

config.toml example:

default_source = "ddg"           # search source default
search_limit = 10
research_results = 8
request_throttle_seconds = 1.0   # per-host throttle in the crawler
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
embedding_local_only = true         # default; never fetch model files at query time
browser_channel = "chrome"       # uses installed Google Chrome
# browser_executable = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
summary_sentences = 6

Set HARNESS_EMBED_LOCAL_ONLY=0 to allow runtime Hugging Face fetches.

Testing

.venv/bin/pytest

Tests run offline — search, extract, and the research pipeline are all exercised against saved HTML fixtures and patched network seams.

Why local Chrome?

Using your system Chrome (browser_channel = "chrome") gives you:

  • Real cookies, login state, extensions, and font rendering
  • No extra ~190MB Chromium download
  • The same user-agent fingerprint as your daily browsing
  • Cleaner integration with scout session start for long-lived sessions that other tools can attach to over CDP

If Chrome isn't available, the engine falls back to Playwright's bundled Chromium (downloaded by scout install --bundled).
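
The fallback described above could look roughly like this with Playwright's Python API, where launch(channel="chrome") asks for the installed Google Chrome and launch() without a channel uses the bundled Chromium. The helper names are hypothetical, and the broad except clause is a simplification: the real engine may inspect the error more carefully.

```python
def launch_browser(chromium, headless=True):
    """Try system Chrome first, then Playwright's bundled Chromium.

    `chromium` is a Playwright BrowserType (for example `p.chromium`).
    """
    try:
        return chromium.launch(channel="chrome", headless=headless)
    except Exception:
        # Assumption: any launch failure means Chrome is unavailable.
        return chromium.launch(headless=headless)


def open_title(url):
    """Usage sketch: requires `pip install playwright` and a usable browser."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = launch_browser(p.chromium)
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Passing the browser type in as an argument keeps the fallback logic testable without a real browser.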

Publisher: pypi-publish.yml on sriramramnath/omniscout
