Local-only role discovery for finding high-fit startup opportunities.

PathScout

PathScout is a local-only role discovery CLI for finding high-fit startup opportunities before they become obvious job posts.

It fetches broad signals, scores them against a personal fit profile, stores deduped observations in SQLite, and emits a canonical JSON artifact plus a readable Markdown digest.

What PathScout Is

  • A local-only CLI for monitoring companies, careers pages, RSS feeds, portfolio lists, and manual notes.
  • A fit-profile engine for surfacing target roles, hidden-search hypotheses, and weaker watch signals.
  • An explainable findings scanner: every surfaced item includes score, tier, reasons, flags, source metadata, and suppression state.

What PathScout Is Not

  • It is not a hosted marketplace.
  • It is not a recruiting CRM.
  • It is not a general-purpose job board scraper.
  • It does not provide hosted storage, sync, or remote persistence.

Install

From GitHub:

pipx install git+https://github.com/ckoglmeier/pathscout.git

From a local checkout:

pipx install .

For development, run the module directly from a checkout:

python3 -m pathscout doctor
python3 -m pathscout run --dry-run --format both

Quick Start

pathscout start
pathscout next
pathscout init
pathscout setup
pathscout doctor
pathscout run --format both

pathscout start is a read-only startup checklist. It shows what exists, what is missing, and the next recommended command without creating or editing files.

pathscout next prints only the next recommended action. /next is also accepted as an alias.

pathscout setup is an interactive guided setup flow. It walks through environment, role/function, locations, avoid terms, background, proof points, constraints, and network context in order, saving answers into local JSON files as it goes.

During init, PathScout asks two onboarding questions in this order:

  1. What is the right environment for you?
  2. What is the right role for you?

For scripted setup, pass answers directly:

pathscout init \
  --environment "Remote AI startups" \
  --role "Founding Product Lead"

Use --no-input to create a default sample config without prompts.

Outputs:

  • data/pathscout.sqlite: local state and dedupe history.
  • outputs/latest.json: canonical machine-readable findings artifact.
  • outputs/latest.md: human-readable digest rendered from the JSON findings.
  • outputs/packages/: optional portable opportunity packages created from findings.
  • config/profile.json: personal fit profile.
  • config/background.sample.json: tracked example candidate context.
  • config/background.local.json: private candidate context and proof points.
  • config/sources.json: source adapter configuration.
  • config/watchlist.json: curated company list.
  • config/suppressions.json: structured ignored findings.

Configuration

PathScout uses schema-versioned JSON files.

config/profile.json is the personal fit model. It contains target roles, stages, domains, excluded domains, location preferences, travel constraints, authority terms, and scoring thresholds.

config/sources.json describes inputs. Each source uses this adapter contract:

{
  "id": "watchlist_careers",
  "type": "watchlist_careers",
  "name": "Watchlist careers pages",
  "enabled": true,
  "config": {
    "path": "config/watchlist.json"
  }
}

id is stable and scriptable. name is display-only. type selects the adapter. config is adapter-specific.
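
As a rough illustration of the contract, a source entry could be checked as follows. This is a sketch, not PathScout's own validation: the function name validate_source and the expected types are assumptions inferred from the example entry above.

```python
import json

# Keys come from the example entry above; the expected types are an assumption.
REQUIRED_KEYS = {"id": str, "type": str, "name": str, "enabled": bool, "config": dict}


def validate_source(entry: dict) -> list[str]:
    """Return a list of problems found in one source entry (empty if none)."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in entry:
            problems.append(f"missing key: {key}")
        elif not isinstance(entry[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


entry = json.loads("""
{
  "id": "watchlist_careers",
  "type": "watchlist_careers",
  "name": "Watchlist careers pages",
  "enabled": true,
  "config": {"path": "config/watchlist.json"}
}
""")
print(validate_source(entry))  # prints []
```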

Network resilience

The watchlist_careers, web_page, and rss adapters share a single network chokepoint (pathscout.fetchers.http_get) that:

  • Retries transient network failures (timeouts, connection errors) with jittered exponential backoff before giving up.
  • Honors ETag/Last-Modified response headers via an injectable ResponseCache, reusing the cached body on a 304 Not Modified response instead of downloading and parsing a fresh one.
  • Logs fetch failures through the standard logging module (logging.getLogger("pathscout.fetchers")) instead of swallowing them silently. Attach a handler to that logger to see what failed and why.
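
The retry behavior in the first bullet can be sketched with the standard library alone. This is a minimal illustration of jittered exponential backoff, not the code behind pathscout.fetchers.http_get: the function name fetch_with_retry, its parameters, and the choice to treat HTTP error statuses as non-transient are all assumptions.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, base_delay=0.5, timeout=3.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying transient failures with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The server answered with an error status; not retried in this sketch.
            raise
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The opener and sleep parameters exist only so the function can be exercised without a network or real delays. Randomizing the wait keeps many clients that failed together from retrying at the same instant.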

watchlist_careers additionally supports a per-host rate_limit_seconds config field, enforcing a minimum delay between requests to the same host (independent of the source's overall max_elapsed_seconds run budget):

{
  "id": "watchlist_careers",
  "type": "watchlist_careers",
  "name": "Watchlist careers pages",
  "enabled": true,
  "config": {
    "path": "config/watchlist.json",
    "timeout_seconds": 3,
    "candidate_paths": ["careers", "jobs"],
    "max_elapsed_seconds": 300,
    "rate_limit_seconds": 1
  }
}
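
A per-host minimum delay like rate_limit_seconds could be enforced along these lines. The class name HostRateLimiter and its wait method are hypothetical, chosen for illustration; the README does not document the limiter's real interface.

```python
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_seconds = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until a request to url's host is allowed, then record it."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_delay_seconds - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request[host] = now


limiter = HostRateLimiter(min_delay_seconds=1)
limiter.wait("https://example.com/careers")  # first request to the host: no delay
```

Because the delay is tracked per host, probing careers and jobs on one company's site is spaced out while requests to different companies are not held up.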

ResponseCache and the per-host rate limiter are constructor-injectable rather than global state. A long-lived caller, such as a scheduled worker running fetches for many users, can supply persistent implementations in place of the in-memory, one-per-run defaults the CLI uses.
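
To make the caching idea concrete, here is a small in-memory sketch of conditional requests. The README does not document the ResponseCache interface, so the names InMemoryResponseCache, CachedResponse, get, put, conditional_headers, and resolve_body are all hypothetical. Only the HTTP mechanics are standard: If-None-Match and If-Modified-Since request headers, and a 304 Not Modified reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


class InMemoryResponseCache:
    """Hypothetical cache: maps a URL to the last response stored for it."""

    def __init__(self):
        self._entries = {}

    def get(self, url):
        return self._entries.get(url)

    def put(self, url, entry):
        self._entries[url] = entry


def conditional_headers(cache, url):
    """Build the request headers that let a server answer 304 Not Modified."""
    entry = cache.get(url)
    headers = {}
    if entry and entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry and entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def resolve_body(cache, url, status, body, response_headers):
    """Return the body to parse: the cached one on 304, else store the new one."""
    if status == 304:
        entry = cache.get(url)
        if entry is None:
            raise LookupError(f"304 for {url} but nothing is cached")
        return entry.body
    cache.put(url, CachedResponse(body, response_headers.get("ETag"),
                                  response_headers.get("Last-Modified")))
    return body
```

A persistent variant would keep the same get and put shape and swap the dictionary for a table or file store, which is the kind of substitution constructor injection allows.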

config/suppressions.json stores structured ignores:

{
  "schema_version": 1,
  "suppressions": [
    {
      "id": "finding-content-hash",
      "scope": "finding",
      "reason": "Not a fit",
      "expires_at": "2026-12-31",
      "created_at": "2026-06-29"
    }
  ]
}

Suppressions affect output visibility. They do not delete observations from SQLite.
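
The expiry logic implied by expires_at might look like the sketch below. The helper names is_active and suppressed_ids are hypothetical, and treating the expiry date itself as still suppressed is an assumption; the README does not say whether the date is inclusive.

```python
from datetime import date


def is_active(suppression, today):
    """A suppression applies until its optional expires_at date has passed."""
    expires_at = suppression.get("expires_at")
    if not expires_at:
        return True  # no expiry: suppressed indefinitely
    # Assumption: the expiry date itself still counts as suppressed.
    return today <= date.fromisoformat(expires_at)


def suppressed_ids(suppressions_doc, today):
    """Collect finding-scoped IDs whose suppression is still in effect."""
    return {
        s["id"]
        for s in suppressions_doc.get("suppressions", [])
        if s.get("scope") == "finding" and is_active(s, today)
    }


doc = {
    "schema_version": 1,
    "suppressions": [
        {
            "id": "finding-content-hash",
            "scope": "finding",
            "reason": "Not a fit",
            "expires_at": "2026-12-31",
            "created_at": "2026-06-29",
        }
    ],
}
print(suppressed_ids(doc, date(2026, 7, 1)))  # prints {'finding-content-hash'}
```

A digest renderer could then skip any finding whose ID is in the returned set while leaving the stored observations untouched.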

Source Types

The v0.2 runner supports standard-library fetches for:

  • manual: config-entered notes for companies or opportunities you want tracked.
  • watchlist: turns every active watchlist company into a hidden-search observation.
  • watchlist_careers: probes active watchlist companies' careers pages for posted role evidence.
  • portfolio: turns companies from config/portfolio.json into relationship-context observations.
  • web_page: fetches a single web page.
  • rss: fetches an RSS or Atom feed.

radar_portfolio remains as a deprecated alias for one release. Use portfolio for new config.
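
One common way to carry a deprecated alias for a release is to map it to the new name and emit a warning. The sketch below shows that pattern; normalize_source_type and DEPRECATED_TYPES are hypothetical names, and nothing here describes how PathScout resolves the alias internally.

```python
import warnings

# Deprecated type name -> current type name, per the note above.
DEPRECATED_TYPES = {"radar_portfolio": "portfolio"}


def normalize_source_type(source_type):
    """Map a deprecated source type to its replacement, warning when it is used."""
    replacement = DEPRECATED_TYPES.get(source_type)
    if replacement is None:
        return source_type
    warnings.warn(
        f"source type {source_type!r} is deprecated; use {replacement!r}",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```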

Commands

pathscout start
pathscout next
pathscout init
pathscout setup
pathscout doctor
pathscout watchlist
pathscout portfolio
pathscout review
pathscout explain <finding-id>
pathscout notes <finding-id> --add "Question to verify before outreach"
pathscout thesis <finding-id>
pathscout package <finding-id>
pathscout suppress <finding-id> --reason "Not a fit"
pathscout run --format json
pathscout run --format markdown
pathscout run --format both

Useful paths can be overridden:

pathscout run \
  --profile config/profile.json \
  --sources config/sources.json \
  --watchlist config/watchlist.json \
  --suppressions config/suppressions.json \
  --db data/pathscout.sqlite \
  --json-out outputs/latest.json \
  --out outputs/latest.md

Digest Tiers

  • Act Now: explicit target role or recruiter-visible mandate with strong fit signals.
  • Hidden Search Hypothesis: no role posted, but company signals suggest a likely hiring need.
  • Watch Signal: weaker signal, lower-level posting, or incomplete evidence.
  • Filtered: captured for history but excluded from the main digest.

Review And Suppress

Use review to scan findings from the latest JSON artifact without opening the file:

pathscout review --limit 10
pathscout review --tier "Act Now"

Use explain to inspect why a finding surfaced:

pathscout explain <finding-id>

Use notes to keep local judgment attached to a finding or company:

pathscout notes <finding-id> --add "Ask a former employee whether this team is still founder-led"
pathscout notes --company "Northstar Robotics"

Use thesis to generate a local role-thesis package from a finding. Copy config/background.sample.json to config/background.local.json first if you want the thesis to include private candidate context:

pathscout thesis <finding-id>

Thesis packages are written to outputs/theses/ and are generated from the same JSON finding objects used by review and Markdown digests. They include the company moment, problem map, proposed function, fit argument, 90-180 day wedge, notes, and evidence gaps. They are thinking artifacts, not generated job descriptions or send-ready outreach.

Use suppress to hide a finding from later Markdown digests while keeping the raw observation in SQLite and the finding marked in JSON:

pathscout suppress <finding-id> --reason "Not a fit" --expires 2026-12-31

Careers pages are parsed into separate role findings when PathScout can identify role-title rows. If a page does not expose clear role titles, PathScout falls back to one page-level finding.

Package Exports

Use package to create a portable, human-readable and agent-readable opportunity package from a finding in outputs/latest.json:

pathscout package <finding-id>

Each package includes a manifest, a human Markdown brief, agent instructions, and canonical JSON data under outputs/packages/. See docs/artifacts.md for the artifact contract.

config/background.local.json, legacy config/background.json, data/notes.json, outputs/theses/, and outputs/packages/ are ignored by default because they may contain private candidate context.

See DATA_CONTRACT.md and docs/source_of_truth.md for the local-only storage boundary and agent-readable artifact contract. Network source fetches collect evidence for local runs; they are not hosted storage or sync.

Design Borrowed From

PathScout follows scanner-style findings: stable IDs, evidence, severity-like tiers, reasons, flags, and suppressions.

The config split borrows from dbt-style separation of personal profile from project config. Source IDs follow the pre-commit convention: stable machine IDs plus human names. Suppressions borrow from security scanners: structured ignores with reasons and optional expiration dates.
