Generate sales ice-breakers for a prospect from public-data signals.
IceBreak
A CLI that turns a free-text prospect prompt into a sales briefing. You type something like:
"Patrick McKenzie at Stripe -- pitching observability"
and IceBreak resolves the candidate and walks you through a field-by-field verification panel, so you catch wrong anchors before they poison the brief. It then fans out to up to 9 public sources in parallel (GitHub, Hacker News, Google News, Wikipedia, the company website, your prospect's personal site, LinkedIn previews, web-search snippets, Crunchbase) and asks an LLM to synthesize a dossier and a handful of opener candidates you can read in 30 seconds.
It's a single-process Python CLI. No server, no signup beyond your own API keys, all output goes to your terminal. Cache is local SQLite.
Setup (one-time)
You need uv installed.
uv sync
cp .env.example .env
Then edit .env. The minimum to be useful:
| Key | Why |
|---|---|
| One LLM key (GROQ_API_KEY is free) | Required. Runs the prompt parser, dossier, and briefing. |
| TAVILY_API_KEY (free, no CC) | Strongly recommended. Without it, identity enrichment can't auto-find LinkedIn URL / Twitter handle / company website from just a name. |
| GITHUB_TOKEN | Strongly recommended. Without it, GitHub API caps you at 60 req/hour. |
That's it. Crunchbase, LinkedIn previews, Wikipedia, HN, Google News, and the company website all run unauthenticated.
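If you want to sanity-check your .env before a run, the key table above can be expressed as a small check. This is only a sketch: `check_keys` is a hypothetical helper, not part of IceBreak, and it assumes GROQ_API_KEY is the only LLM key you use (pass other names via `llm_keys` if your setup differs).

```python
from __future__ import annotations

from typing import Mapping


def check_keys(
    env: Mapping[str, str],
    llm_keys: tuple[str, ...] = ("GROQ_API_KEY",),  # assumption: extend for other providers
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the keys described in the setup table."""
    errors: list[str] = []
    warnings: list[str] = []

    # One LLM key is required: it runs the prompt parser, dossier, and briefing.
    if not any(env.get(key) for key in llm_keys):
        errors.append("No LLM key set: add one to .env (GROQ_API_KEY is free).")

    # The next two are strongly recommended, so they only produce warnings.
    if not env.get("TAVILY_API_KEY"):
        warnings.append("TAVILY_API_KEY missing: identity enrichment can't auto-find handles.")
    if not env.get("GITHUB_TOKEN"):
        warnings.append("GITHUB_TOKEN missing: GitHub API is capped at 60 req/hour.")

    return errors, warnings
```

Call it as `check_keys(os.environ)` after loading .env; an empty `errors` list means the minimum setup is in place.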
Run it
uv run icebreaker "Patrick McKenzie at Stripe -- pitching observability"
You'll see:
- Prompt parse: what name + company + scenario IceBreak extracted.
- Identity enrichment: what GitHub / LinkedIn / Twitter / website it resolved.
- Field-by-field verification: for each of the 8 identity fields (name, company, website, personal site, linkedin, github, twitter, bio_snippet) the panel shows the resolved value, where it came from, and any other candidates that were considered. Per field:
  - `[a]ccept`: keep current value, advance
  - `[e]dit`: type a new value
  - `[1]/[2]/[3]...`: pick one of the other candidates the picker considered (rendered with title + snippet so you can compare, e.g., to choose between several "Shivank Prajapati" LinkedIn URLs)
  - `[r]e-search`: fire a per-field tailored Tavily query (`site:linkedin.com` for LinkedIn, `site:github.com` for GitHub, etc.). Merges new candidates into the list. One Tavily call per re-search. For prompt-derived fields (name, company), `[r]` falls back to full re-enrichment.
  - `[c]lear`: set the field to None
  - `[q]uit`: abort the run
- Connector preview: every connector lists what URL or query it'll fire (`github → github.com/janedoe`, `hackernews → "Jane Doe" "Acme"`). Type `s <name>` to skip a source you don't want to run, then `go`.
- Connector fan-out: N sources in parallel, each bounded by the per-source timeout.
- Aggregation → dossier → briefing: facts deduped, profile synthesized, openers generated.
- Brief: rendered to your terminal.
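The per-field choices in the verification step can be modelled as a small state transition. The sketch below is illustrative only: `FieldState`, `apply_choice`, and `research_query` are hypothetical names, the exact query string IceBreak sends to Tavily is an assumption, and the `[r]` action itself is left out because it needs a live search call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldState:
    name: str
    value: str | None
    candidates: list[str] = field(default_factory=list)


class Quit(Exception):
    """Raised when the user aborts the run with [q]."""


def apply_choice(state: FieldState, choice: str, new_value: str | None = None) -> FieldState:
    """Apply one verification action and return the updated field."""
    choice = choice.strip().lower()
    if choice == "a":  # accept: keep the current value
        return state
    if choice == "e":  # edit: replace with a typed value
        if not new_value:
            raise ValueError("edit needs a new value")
        return FieldState(state.name, new_value, state.candidates)
    if choice == "c":  # clear: set the field to None
        return FieldState(state.name, None, state.candidates)
    if choice == "q":  # quit: abort the run
        raise Quit()
    if choice.isdecimal():  # 1/2/3...: pick another candidate (1-based)
        index = int(choice) - 1
        if not 0 <= index < len(state.candidates):
            raise ValueError(f"no candidate {choice}")
        return FieldState(state.name, state.candidates[index], state.candidates)
    raise ValueError(f"unknown choice: {choice!r}")


# Assumed mapping from field name to a site filter; only these two appear in the README.
SITE_FILTERS = {"linkedin": "site:linkedin.com", "github": "site:github.com"}


def research_query(field_name: str, person: str, company: str) -> str:
    """Build a per-field tailored search query (format is an assumption)."""
    base = f'"{person}" "{company}"'
    site = SITE_FILTERS.get(field_name)
    return f"{site} {base}" if site else base
```

Returning a new `FieldState` instead of mutating in place keeps each action easy to test and makes an undo step straightforward if you ever want one.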
Useful flags
- `--inspect`: render per-source facts in tables and dump structured JSON to `.cache/inspect-<ts>.json`. Skips the final LLM briefing call. Use this when iterating on a connector to see what it actually contributed.
- `--raw`: also capture the raw response per source (HTML, JSON, RSS) to `.cache/raw-<ts>.json`. Combine with `--inspect` for full debugging.
- `--browse`: launch Microsoft Edge with a persistent profile so Crunchbase can be scraped past Cloudflare. Solve the challenge once interactively; cookies persist in `.cache/edge-profile/`.
- `--github <user>` / `--linkedin-url <url>`: override identity enrichment when you already know the handles.
- `--no-verify`: skip the human-in-the-loop confirmation. Required for non-interactive runs.
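For reference, here is how the flags above could be declared with Python's standard argparse. This is a sketch of the documented surface only; the real CLI may be built with a different library, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; defaults and help text are assumptions."""
    parser = argparse.ArgumentParser(prog="icebreaker")
    parser.add_argument("prompt", help='free-text prompt, e.g. "Jane Doe at Acme -- pitching X"')
    parser.add_argument("--inspect", action="store_true", help="per-source tables + JSON dump, no briefing call")
    parser.add_argument("--raw", action="store_true", help="also capture the raw response per source")
    parser.add_argument("--browse", action="store_true", help="use a persistent browser profile for Crunchbase")
    parser.add_argument("--github", metavar="USER", help="override the resolved GitHub handle")
    parser.add_argument("--linkedin-url", metavar="URL", help="override the resolved LinkedIn URL")
    parser.add_argument("--no-verify", action="store_true", help="skip interactive verification")
    return parser
```

A scripted run would typically combine `--no-verify` with explicit `--github` / `--linkedin-url` overrides, since nobody is there to correct a wrong anchor.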
What each source contributes
| Source | Best for | Failure mode |
|---|---|---|
| github | Engineers, OSS maintainers: bio, pinned repos (when token set, falls back to top-by-stars), READMEs, recent commits, repo homepages → personal site discovery | 60 req/hr without token |
| hackernews | Technical founders, infra/devtools: stories + comments | None worth mentioning |
| googlenews | Recent press, funding, launches | Rate limit on the RSS endpoint |
| wikipedia | Famous people / public companies | Long-tail prospects → empty |
| website | Company description, team page, founding year | Marketing-only sites are thin |
| personal_site | Engineer's blog or portfolio (auto-discovered from GitHub user.blog or repo homepage fields) | Skipped when no personal URL discovered |
| linkedin (preview) | Headline + short bio (no login needed) | Authwall (status 999) on most pages |
| crunchbase | Funding stage, founding date, HQ | Cloudflare blocks plain HTTP; use --browse |
| web_search | Synthetic source: re-uses the bio snippet from Tavily/Brave web search | Skipped if no bio_snippet was discovered |
The Orchestrator runs them all in parallel with a per-source timeout. A
slow or failing source never blocks the brief — it just shows up in
sources_missed in the output.
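The fan-out behaviour described here (parallel sources, a per-source timeout, failures collected rather than raised) maps naturally onto asyncio. The following is a minimal sketch under that assumption; `fan_out` and the `Connector` type are hypothetical, and the real Orchestrator may be structured differently.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional

# A connector is modelled as a zero-argument coroutine function returning facts.
Connector = Callable[[], Awaitable[list[str]]]


async def fan_out(
    connectors: dict[str, Connector], timeout: float
) -> tuple[dict[str, list[str]], list[str]]:
    """Run every connector concurrently; return (facts_by_source, sources_missed)."""

    async def run_one(name: str, connector: Connector) -> tuple[str, Optional[list[str]]]:
        try:
            # wait_for cancels the connector if it exceeds the per-source timeout.
            return name, await asyncio.wait_for(connector(), timeout)
        except Exception:
            # Timeouts and connector errors are both treated as a missed source.
            return name, None

    results = await asyncio.gather(*(run_one(n, c) for n, c in connectors.items()))
    facts = {name: found for name, found in results if found is not None}
    sources_missed = [name for name, found in results if found is None]
    return facts, sources_missed
```

Because each connector is wrapped individually, one slow source costs at most `timeout` seconds of wall-clock time and never prevents the others from contributing.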
Caching
A local SQLite cache (.cache/icebreaker.db) sits in front of the two
cost-bearing surfaces:
- Web search (Tavily / Brave): 24h TTL, keyed by (provider, query, max_results).
- GitHub: 6h TTL on the full per-username response bundle (profile + events + repos + READMEs).
Re-running the same prompt within these windows pays nothing for those calls. Force a refresh by deleting the DB:
rm .cache/icebreaker.db
The other connectors aren't cached — they're cheap public endpoints and their freshness matters more than the saved roundtrip.
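To make the TTL behaviour concrete, here is a minimal sketch of a SQLite-backed cache with per-entry expiry. The table layout, the `TTLCache` class, and the `search_key` helper are assumptions for illustration; the actual schema of .cache/icebreaker.db is not documented here.

```python
import json
import sqlite3
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Key/value cache in SQLite where each entry carries its own expiry time."""

    def __init__(self, path: str = ":memory:", clock: Callable[[], float] = time.time) -> None:
        self._db = sqlite3.connect(path)
        self._clock = clock  # injectable so tests can fast-forward time
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def get(self, key: str) -> Optional[Any]:
        row = self._db.execute(
            "SELECT value, expires_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or row[1] <= self._clock():
            return None  # missing or expired: caller should refetch
        return json.loads(row[0])

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), self._clock() + ttl_seconds),
        )
        self._db.commit()


def search_key(provider: str, query: str, max_results: int) -> str:
    """Cache key for a web search, built from (provider, query, max_results)."""
    return json.dumps(["search", provider, query, max_results])
```

With this shape, a web search would be stored with `ttl_seconds=24 * 3600` and a GitHub bundle with `ttl_seconds=6 * 3600`, matching the windows listed above.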
Tests
uv run pytest
147 tests, runs in ~6s. The suite uses respx to mock HTTP, so it doesn't
hit live APIs. Each test gets an isolated cache via tests/conftest.py.
Evals (development-only)
There's a small rubric-based eval harness in evals/ for regression
testing during development — it's not part of the production runtime
and doesn't need to be deployed.
uv run python -m evals.run # facts + dossier only (cheap)
uv run python -m evals.run --with-briefing # also runs Summarizer
It scores each prospect in evals/prospects.yaml against expected fields
(name resolved, min facts, dossier mentions specific topic, etc.) and
writes a JSONL artifact under .cache/evals-<ts>.jsonl. Use it after
non-trivial pipeline changes (picker tweaks, prompt edits, LLM-provider
swaps) to confirm a known-good prospect still produces the expected brief.
End-users running uv run icebreaker "..." never touch evals.
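As a rough picture of what a rubric check looks like, the sketch below scores one pipeline result against expected fields and serializes it as a JSONL line. The field names (`name`, `min_facts`, `dossier_mentions`) and both functions are hypothetical; check evals/prospects.yaml for the real schema.

```python
from __future__ import annotations

import json
from typing import Any


def score_prospect(result: dict[str, Any], expected: dict[str, Any]) -> dict[str, bool]:
    """Compare one pipeline result against the expectations set for a prospect."""
    checks: dict[str, bool] = {}
    if "name" in expected:
        checks["name_resolved"] = result.get("name") == expected["name"]
    if "min_facts" in expected:
        checks["min_facts"] = len(result.get("facts", [])) >= expected["min_facts"]
    if "dossier_mentions" in expected:
        # Case-insensitive substring match on the synthesized dossier text.
        dossier = result.get("dossier", "").lower()
        checks["dossier_mentions"] = expected["dossier_mentions"].lower() in dossier
    return checks


def to_jsonl_line(prospect_id: str, checks: dict[str, bool]) -> str:
    """Serialize one scored prospect as a single JSONL record."""
    record = {"prospect": prospect_id, "checks": checks, "passed": all(checks.values())}
    return json.dumps(record, sort_keys=True)
```

Keeping each check a named boolean makes it easy to see which expectation regressed when a known-good prospect stops passing.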
Optional: pre-push regression hook
There's a checked-in .githooks/pre-push that runs the eval rubric
before each git push, blocking the push when the overall pass rate
drops below 50% (configurable). Skips automatically when:
- only docs / non-code files changed in the commits being pushed
- `.env` is missing (no API keys to run with)
- `uv` isn't on `PATH`
One-time setup per clone:
git config core.hooksPath .githooks
That's it — no chmod needed on Windows + Git Bash; on Linux/Mac run
chmod +x .githooks/pre-push once.
Tweak the threshold without editing the script:
ICEBREAK_EVAL_MIN=0.8 git push # require 80% pass rate
ICEBREAK_EVAL_MIN=1.0 git push # strict — every check must pass
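The threshold logic amounts to comparing the overall pass rate against ICEBREAK_EVAL_MIN, with 0.5 as the default. Here is that decision sketched in Python for clarity; the actual hook is a script in .githooks/ and may differ, `should_block_push` is a hypothetical name, and treating "no checks ran" as non-blocking is an assumption.

```python
from typing import Mapping


def should_block_push(
    passed: int, total: int, env: Mapping[str, str], default_min: float = 0.5
) -> bool:
    """Return True when the pass rate is below the configured minimum."""
    raw = env.get("ICEBREAK_EVAL_MIN", "")
    threshold = float(raw) if raw else default_min
    if total == 0:
        return False  # assumption: nothing was evaluated, so nothing to block on
    return passed / total < threshold
```

Note the strict comparison: a pass rate exactly equal to the threshold is allowed through, so `ICEBREAK_EVAL_MIN=1.0` blocks only when at least one check fails.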
Bypass for a one-off WIP push:
git push --no-verify
Architecture
ProspectInput → PromptParser → IdentityResolver → IdentityEnricher → Orchestrator (fan-out) → Aggregator → DossierBuilder → Summarizer → Brief.
Full data-flow walkthrough: docs/ARCHITECTURE.md.
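The stage chain above is a linear pipeline where each stage takes the previous stage's output. The sketch below shows that shape with toy stand-ins: `parse_prompt` is a naive string splitter (the real PromptParser is LLM-backed), `dedupe_facts` is a guess at what the Aggregator's deduplication might look like, and `run_pipeline` is a hypothetical driver.

```python
from __future__ import annotations

from typing import Any, Callable

Stage = Callable[[Any], Any]


def parse_prompt(text: str) -> dict[str, Any]:
    """Toy stand-in for PromptParser: splits 'Name at Company -- scenario'."""
    who, _, scenario = text.partition(" -- ")
    name, _, company = who.partition(" at ")
    return {"name": name.strip(), "company": company.strip(), "scenario": scenario.strip()}


def dedupe_facts(state: dict[str, Any]) -> dict[str, Any]:
    """Toy stand-in for Aggregator: drop facts that differ only by case or spacing."""
    seen: set[str] = set()
    unique: list[str] = []
    for fact in state.get("facts", []):
        key = " ".join(fact.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(fact)
    return {**state, "facts": unique}


def run_pipeline(stages: list[tuple[str, Stage]], prospect_input: Any) -> tuple[Any, list[str]]:
    """Feed the input through each named stage in order; return output and a trace."""
    value, trace = prospect_input, []
    for name, stage in stages:
        value = stage(value)
        trace.append(name)
    return value, trace
```

Passing stages as `(name, callable)` pairs keeps the order explicit and gives you a trace for free, which is handy when a brief comes out wrong and you need to find the stage responsible.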
Troubleshooting
Quickest answers in docs/TROUBLESHOOTING.md. Common ones:
| Symptom | Likely cause |
|---|---|
| No LLM provider configured | Add a *_API_KEY to .env (Groq is free) |
| Identity has no LinkedIn / X / website | TAVILY_API_KEY missing |
| GitHub 403 rate-limit | Add GITHUB_TOKEN to .env |
| Crunchbase always misses | Cloudflare-blocked. Re-run with --browse |
| LinkedIn shows outcome: authwall in --raw | Expected: public preview endpoint refuses many profiles |
| Brief feels generic | Run with --inspect to see which sources actually contributed |