
Generate sales ice-breakers for a prospect from public-data signals.


IceBreak

A CLI that turns a free-text prospect prompt into a sales briefing. You type something like:

"Patrick McKenzie at Stripe -- pitching observability"

and IceBreak resolves the prospect's identity, walks you through a field-by-field verification panel (so you catch wrong anchors before they poison the brief), then fans out to nine public sources in parallel: GitHub, Hacker News, Google News, Wikipedia, the company website, the prospect's personal site, LinkedIn previews, web-search snippets, and Crunchbase. Finally, it asks an LLM to synthesize a dossier and a handful of candidate openers you can read in 30 seconds.

It's a single-process Python CLI: no server, no signup beyond your own API keys, and all output goes to your terminal. The cache is a local SQLite file.


Setup (one-time)

You need uv installed.

uv sync
cp .env.example .env

Then edit .env. The minimum useful configuration:

Key | Why
One LLM key (GROQ_API_KEY is free) | Required. Runs the prompt parser, dossier, and briefing.
TAVILY_API_KEY (free, no credit card) | Strongly recommended. Without it, identity enrichment can't auto-find a LinkedIn URL, Twitter handle, or company website from just a name.
GITHUB_TOKEN | Strongly recommended. Without it, the GitHub API caps you at 60 requests/hour.

That's it. Crunchbase, LinkedIn previews, Wikipedia, HN, Google News, and the company website all run unauthenticated.
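The table implies a simple startup rule: fail without an LLM key, warn without the recommended ones. A minimal sketch of that check, assuming the configuration arrives as a plain mapping of environment variables. `check_env`, `LLM_KEYS`, and `RECOMMENDED` are hypothetical names; only GROQ_API_KEY, TAVILY_API_KEY, and GITHUB_TOKEN come from the table, and IceBreak's real validation may differ.

```python
from collections.abc import Mapping

# Hypothetical: LLM provider keys this sketch recognises.
# Only GROQ_API_KEY is named in the README; extend for other providers.
LLM_KEYS = ("GROQ_API_KEY",)

# Recommended keys and what degrades without them (from the setup table).
RECOMMENDED = {
    "TAVILY_API_KEY": "identity enrichment can't auto-find LinkedIn / Twitter / website",
    "GITHUB_TOKEN": "GitHub API is capped at 60 requests/hour",
}


def check_env(env: Mapping[str, str]) -> list[str]:
    """Raise if no LLM key is set; return warnings for missing recommended keys."""
    if not any(env.get(key) for key in LLM_KEYS):
        raise RuntimeError("No LLM provider configured")
    return [f"{key} missing: {why}" for key, why in RECOMMENDED.items() if not env.get(key)]
```

Call it as `check_env(os.environ)` after loading .env; an empty list means the recommended setup is complete.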

Run it

uv run icebreaker "Patrick McKenzie at Stripe -- pitching observability"

You'll see:

  1. Prompt parse — what name + company + scenario IceBreak extracted.
  2. Identity enrichment — what GitHub / LinkedIn / Twitter / website it resolved.
  3. Field-by-field verification — for each of the 8 identity fields (name, company, website, personal site, linkedin, github, twitter, bio_snippet) the panel shows the resolved value, where it came from, AND any other candidates that were considered. Per field:
    • [a]ccept — keep current value, advance
    • [e]dit — type a new value
    • [1]/[2]/[3]... — pick one of the other candidates the picker considered. Each is rendered with its title and snippet so you can compare them, e.g. when choosing between several "Shivank Prajapati" LinkedIn URLs.
    • [r]e-search — fire a per-field tailored Tavily query (site:linkedin.com for LinkedIn, site:github.com for GitHub, and so on) and merge any new candidates into the list. Each re-search costs exactly one Tavily call. For prompt-derived fields (name, company), [r] falls back to full re-enrichment.
    • [c]lear — set the field to None
    • [q]uit — abort the run
  4. Connector preview — every connector lists what URL or query it'll fire (github → github.com/janedoe, hackernews → "Jane Doe" "Acme"). Type s <name> to skip a source you don't want to run, then go.
  5. Connector fan-out — N sources in parallel, each bounded by the per-source timeout.
  6. Aggregation → dossier → briefing — facts deduped, profile synthesized, openers generated.
  7. Brief — rendered to your terminal.
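The per-field commands in step 3 behave like a small state machine. The sketch below is illustrative only: `FieldState`, `apply_action`, and `research_query` are hypothetical names, only the linkedin and github `site:` filters come from the text (other fields get no filter here), and [r] is modelled as returning the query string to send rather than calling Tavily. [r] and [q] are left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# site: filters for per-field re-search; only these two are named in the README.
SITE_FILTERS = {"linkedin": "site:linkedin.com", "github": "site:github.com"}

# Prompt-derived fields fall back to full re-enrichment instead of a tailored query.
PROMPT_FIELDS = {"name", "company"}


@dataclass(frozen=True)
class FieldState:
    name: str
    value: str | None
    candidates: tuple[str, ...] = ()


def research_query(field_name: str, person: str, company: str) -> str | None:
    """Return a tailored query, or None to signal full re-enrichment."""
    if field_name in PROMPT_FIELDS:
        return None
    site = SITE_FILTERS.get(field_name, "")
    return f'{site} "{person}" "{company}"'.strip()


def apply_action(state: FieldState, key: str, arg: str = "") -> FieldState:
    """Apply one panel keystroke: a, e, c, or a 1-based candidate number."""
    if key == "a":  # accept: keep the current value
        return state
    if key == "e":  # edit: user-typed replacement
        return replace(state, value=arg)
    if key == "c":  # clear: set the field to None
        return replace(state, value=None)
    if key.isdigit():  # pick one of the other candidates
        idx = int(key) - 1
        if not 0 <= idx < len(state.candidates):
            raise ValueError(f"no candidate [{key}]")
        return replace(state, value=state.candidates[idx])
    raise ValueError(f"unsupported key {key!r}")  # [r] and [q] handled by the caller
```

Keeping the state immutable makes each keystroke a pure transition, which is easy to unit-test without a terminal.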

Useful flags

  • --inspect — render per-source facts in tables and dump structured JSON to .cache/inspect-<ts>.json. Skips the final LLM briefing call. Use this when iterating on a connector to see what it actually contributed.
  • --raw — also capture the raw response per source (HTML, JSON, RSS) to .cache/raw-<ts>.json. Combine with --inspect for full debugging.
  • --browse — launch Microsoft Edge with a persistent profile so Crunchbase can be scraped past Cloudflare. Solve the challenge once interactively; cookies persist in .cache/edge-profile/.
  • --github <user> / --linkedin-url <url> — override identity enrichment when you already know the handles.
  • --no-verify — skip the human-in-the-loop confirmation. Required for non-interactive runs.
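For anyone scripting around IceBreak, the documented flags can be expressed as a standard-library argparse parser. This is a sketch of the option surface only, not IceBreak's actual CLI code (which may use a different framework); `build_parser` and the positional `prompt` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented icebreaker flags using argparse."""
    p = argparse.ArgumentParser(prog="icebreaker")
    p.add_argument("prompt", help='e.g. "Jane Doe at Acme -- pitching observability"')
    p.add_argument("--inspect", action="store_true",
                   help="per-source fact tables + JSON dump; skips the final briefing call")
    p.add_argument("--raw", action="store_true",
                   help="also capture the raw response per source")
    p.add_argument("--browse", action="store_true",
                   help="use a persistent Edge profile so Crunchbase gets past Cloudflare")
    p.add_argument("--github", metavar="USER", help="override GitHub identity enrichment")
    p.add_argument("--linkedin-url", metavar="URL", help="override LinkedIn identity enrichment")
    p.add_argument("--no-verify", action="store_true",
                   help="skip the verification panel (non-interactive runs)")
    return p
```

argparse maps `--linkedin-url` and `--no-verify` to the attributes `linkedin_url` and `no_verify`.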

What each source contributes

Source | Best for | Failure mode
github | Engineers, OSS maintainers: bio, pinned repos (with a token; otherwise top repos by stars), READMEs, recent commits, repo homepages → personal site discovery | 60 requests/hour without a token
hackernews | Technical founders, infra/devtools: stories + comments | None worth mentioning
googlenews | Recent press, funding, launches | Rate limit on the RSS endpoint
wikipedia | Famous people / public companies | Long-tail prospects → empty
website | Company description, team page, founding year | Marketing-only sites are thin
personal_site | Engineer's blog or portfolio (auto-discovered from the GitHub user.blog or repo homepage fields) | Skipped when no personal URL is discovered
linkedin (preview) | Headline + short bio (no login needed) | Authwall (status 999) on most pages
crunchbase | Funding stage, founding date, HQ | Cloudflare blocks plain HTTP; use --browse
web_search | Synthetic source: re-uses the bio snippet from Tavily/Brave web search | Skipped if no bio_snippet was discovered

The Orchestrator runs them all in parallel with a per-source timeout. A slow or failing source never blocks the brief — it just shows up in sources_missed in the output.
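A minimal asyncio sketch of that fan-out, assuming each connector is an async callable returning a list of fact strings. `fan_out` and its signature are hypothetical; only the per-source timeout and the `sources_missed` bucket come from the text.

```python
import asyncio
from collections.abc import Awaitable, Callable

Connector = Callable[[], Awaitable[list[str]]]


async def fan_out(
    connectors: dict[str, Connector], timeout: float
) -> tuple[dict[str, list[str]], list[str]]:
    """Run every connector concurrently; slow or failing ones land in sources_missed."""

    async def run_one(name: str, fn: Connector):
        try:
            return name, await asyncio.wait_for(fn(), timeout)
        except Exception:  # timeouts and connector errors are treated alike
            return name, None

    # gather preserves input order, so sources_missed is deterministic.
    results = await asyncio.gather(*(run_one(n, f) for n, f in connectors.items()))
    facts = {name: out for name, out in results if out is not None}
    sources_missed = [name for name, out in results if out is None]
    return facts, sources_missed
```

Because each connector is wrapped individually, one stuck source costs at most `timeout` seconds and never cancels its siblings.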

Caching

A local SQLite cache (.cache/icebreaker.db) sits in front of the two cost-bearing surfaces:

  • Web search (Tavily / Brave): 24h TTL, keyed by (provider, query, max_results).
  • GitHub: 6h TTL on the full per-username response bundle (profile + events + repos + READMEs).

Re-running the same prompt within these windows pays nothing for those calls. Force a refresh by deleting the DB:

rm .cache/icebreaker.db

The other connectors aren't cached — they're cheap public endpoints and their freshness matters more than the saved roundtrip.
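A minimal sketch of a cache with the same shape: SQLite storage, a composite key such as (provider, query, max_results), and a per-entry expiry. The class, table, and column names are hypothetical rather than IceBreak's schema, and it defaults to an in-memory database so it never touches .cache/icebreaker.db.

```python
from __future__ import annotations

import json
import sqlite3
import time


class TTLCache:
    """SQLite-backed cache; entries expire ttl_s seconds after being written."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT, expires REAL)"
        )

    @staticmethod
    def make_key(*parts: object) -> str:
        # e.g. ("tavily", query, max_results) -> stable JSON string
        return json.dumps(parts)

    def get(self, key: str, now: float | None = None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, expires FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or row[1] <= now:
            return None  # miss or expired
        return json.loads(row[0])

    def set(self, key: str, value, ttl_s: float, now: float | None = None) -> None:
        now = time.time() if now is None else now
        with self.db:  # the connection context manager commits the upsert
            self.db.execute(
                "INSERT OR REPLACE INTO cache (key, value, expires) VALUES (?, ?, ?)",
                (key, json.dumps(value), now + ttl_s),
            )
```

With this shape, the two documented windows are just different `ttl_s` values: `24 * 3600` for web search and `6 * 3600` for the GitHub bundle.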

Tests

uv run pytest

147 tests; the suite runs in about 6 seconds. It uses respx to mock HTTP, so it doesn't hit live APIs. Each test gets an isolated cache via tests/conftest.py.

Evals (development-only)

There's a small rubric-based eval harness in evals/ for regression testing during development — it's not part of the production runtime and doesn't need to be deployed.

uv run python -m evals.run                  # facts + dossier only (cheap)
uv run python -m evals.run --with-briefing  # also runs Summarizer

It scores each prospect in evals/prospects.yaml against expected fields (name resolved, min facts, dossier mentions specific topic, etc.) and writes a JSONL artifact under .cache/evals-<ts>.jsonl. Use it after non-trivial pipeline changes (picker tweaks, prompt edits, LLM-provider swaps) to confirm a known-good prospect still produces the expected brief.
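A sketch of what a rubric check and its JSONL artifact could look like. The expected-field keys (`name`, `min_facts`, `dossier_mentions`) and the result layout are hypothetical stand-ins for whatever evals/prospects.yaml actually contains, and the pass rate here is assumed to be computed over individual checks.

```python
import json


def score(expected: dict, result: dict) -> dict[str, bool]:
    """Evaluate one prospect's result against hypothetical rubric keys."""
    checks = {}
    if "name" in expected:
        checks["name_resolved"] = result.get("name") == expected["name"]
    if "min_facts" in expected:
        checks["min_facts"] = len(result.get("facts", [])) >= expected["min_facts"]
    if "dossier_mentions" in expected:
        dossier = result.get("dossier", "").lower()
        checks["dossier_mentions"] = expected["dossier_mentions"].lower() in dossier
    return checks


def to_jsonl(rows: list[dict]) -> str:
    """Serialise one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def pass_rate(rows: list[dict]) -> float:
    """Fraction of individual checks that passed across all prospects."""
    outcomes = [ok for row in rows for ok in row["checks"].values()]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```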

End-users running uv run icebreaker "..." never touch evals.

Optional: pre-push regression hook

There's a checked-in .githooks/pre-push that runs the eval rubric before each git push, blocking the push when the overall pass rate drops below 50% (configurable). It skips automatically when:

  • only docs / non-code files changed in the commits being pushed
  • .env is missing (no API keys to run with)
  • uv isn't on PATH

One-time setup per clone:

git config core.hooksPath .githooks

No chmod is needed on Windows with Git Bash; on Linux/macOS, run chmod +x .githooks/pre-push once.

Tweak the threshold without editing the script:

ICEBREAK_EVAL_MIN=0.8 git push     # require 80% pass rate
ICEBREAK_EVAL_MIN=1.0 git push     # strict — every check must pass
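Whatever language the hook itself is written in, its decision logic can be sketched in Python as below. The ICEBREAK_EVAL_MIN variable and the 0.5 default come from the text; `docs_only`, `should_block`, and the rule for what counts as a docs file are assumptions for illustration.

```python
from collections.abc import Mapping

DEFAULT_MIN = 0.5  # the documented default threshold (50%)
DOC_SUFFIXES = (".md", ".rst", ".txt")  # hypothetical definition of "docs" files


def docs_only(changed_paths: list[str]) -> bool:
    """True when every changed path looks like documentation (assumed rule)."""
    return bool(changed_paths) and all(
        p.startswith("docs/") or p.lower().endswith(DOC_SUFFIXES) for p in changed_paths
    )


def should_block(pass_rate: float, env: Mapping[str, str]) -> bool:
    """Block the push when the pass rate is below ICEBREAK_EVAL_MIN (default 0.5)."""
    threshold = float(env.get("ICEBREAK_EVAL_MIN", DEFAULT_MIN))
    return pass_rate < threshold
```

Called as `should_block(rate, os.environ)`, this reproduces the examples above: `ICEBREAK_EVAL_MIN=1.0` blocks unless every check passes.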

Bypass for a one-off WIP push:

git push --no-verify

Architecture

ProspectInput → PromptParser → IdentityResolver → IdentityEnricher → Orchestrator (fan-out) → Aggregator → DossierBuilder → Summarizer → Brief.
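The arrow chain is a linear pipeline in which each stage consumes the previous stage's output; the Orchestrator fans out internally but still hands a single value to the Aggregator. A minimal sketch of that shape, with a toy aggregator performing the "facts deduped" step. `run_pipeline` and `aggregate` are hypothetical, and the case-insensitive string match is an assumption, since the real Aggregator's dedup rule isn't documented here.

```python
from collections.abc import Callable
from functools import reduce
from typing import Any

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[Stage], value: Any) -> Any:
    """Feed each stage's output into the next, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


def aggregate(facts_by_source: dict[str, list[str]]) -> list[str]:
    """Toy Aggregator: flatten per-source facts, dropping case-insensitive duplicates."""
    seen: set[str] = set()
    out: list[str] = []
    for facts in facts_by_source.values():
        for fact in facts:
            key = fact.strip().lower()
            if key not in seen:
                seen.add(key)
                out.append(fact)  # keep the first-seen wording
    return out
```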

Full data-flow walkthrough: docs/ARCHITECTURE.md.

Troubleshooting

Quickest answers in docs/TROUBLESHOOTING.md. Common ones:

Symptom | Likely cause / fix
No LLM provider configured | No LLM key in .env; add a *_API_KEY (Groq is free)
Identity has no LinkedIn / X / website | TAVILY_API_KEY missing
GitHub 403 rate-limit | Add GITHUB_TOKEN to .env
Crunchbase always misses | Cloudflare-blocked; re-run with --browse
LinkedIn shows outcome: authwall in --raw | Expected: the public preview endpoint refuses many profiles
Brief feels generic | Run with --inspect to see which sources actually contributed



Download files


Source Distribution

icebreak_cli-0.0.1.tar.gz (167.0 kB)

Built Distribution


icebreak_cli-0.0.1-py3-none-any.whl (79.8 kB)

File details

Details for the file icebreak_cli-0.0.1.tar.gz.

File metadata

  • Download URL: icebreak_cli-0.0.1.tar.gz
  • Size: 167.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.11.7 (uv publish)

File hashes

Hashes for icebreak_cli-0.0.1.tar.gz
Algorithm | Hash digest
SHA256 | 653d1abaa68f75a79f4b1f61dba8fbe162c050ecc31037423ea780b0d46e88e3
MD5 | 28f002b662a93bb0857df77c25e4efc9
BLAKE2b-256 | ffcfcd196babeb1d8e51d867b5f8f065bf62f3d884c390b90fb62d1773b738ff


File details

Details for the file icebreak_cli-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: icebreak_cli-0.0.1-py3-none-any.whl
  • Size: 79.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.11.7 (uv publish)

File hashes

Hashes for icebreak_cli-0.0.1-py3-none-any.whl
Algorithm | Hash digest
SHA256 | 3a6791bde87f1e067495eda59213ca4351fa2512b7365c68f3534377df5b1f88
MD5 | 0618f4452f75dfe598590a2b0cc7fe69
BLAKE2b-256 | 4061b087a9838e2c8d144a8d1735dfa47f16430d269441c34872c92ef2f93498

