Agent-first company and contact discovery for outbound lead generation.

Leads

An agent-first, memory-first company and contact research engine. Strict JSON specs drive deterministic memory retrieval, focused Exa searches, structured LLM evaluation, targeted official-site enrichment, persistence, and reviewable CSV/Markdown/JSON artifacts.

Design and rebuild notes live in NOTES/.

Install

The canonical install path is pipx. The package is published as leads-cli because leads is already taken on PyPI, but it still installs the leads command. The installer scripts are thin convenience wrappers around pipx install leads-cli or pipx upgrade leads-cli, followed by leads init.

macOS and Linux

curl -fsSL https://raw.githubusercontent.com/paoloauletta/leads/main/install.sh | bash

Windows PowerShell

irm https://raw.githubusercontent.com/paoloauletta/leads/main/install.ps1 | iex

Direct pipx install

pipx install leads-cli
leads init

Use LEADS_SKIP_INIT=1 with either installer when you want to install first and run onboarding later.

Onboarding

Run:

leads init

The wizard creates one local workspace, stores config and secrets, initializes the SQLite database, and installs bundled skills into the agent targets you choose, such as Codex, Claude Code, or OpenCode. After setup, use one of those agents to create a spec, run discovery, and summarize the selected leads.

Runtime data defaults to the OS-appropriate Leads application data folder. Override it with LEADS_HOME=/path/to/data when needed.

Leads supports OpenAI-compatible providers, Anthropic Claude, and Google Gemini from onboarding. LLM_RESPONSE_FORMAT=auto uses strict JSON Schema with OpenAI, native structured-output APIs with Anthropic/Gemini, and validated JSON Object mode with DeepSeek or other compatible providers. Override it only when a provider documents support for a different mode.
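
The auto behaviour described above amounts to a per-provider lookup with a fallback. The sketch below is illustrative only: resolve_response_format, the provider keys, and the mode strings are hypothetical names, not the leads-cli API or its actual setting values.

```python
# Hypothetical sketch: function name, provider keys, and mode strings are
# illustrative assumptions, not values taken from leads-cli.
AUTO_MODES = {
    "openai": "json_schema",            # strict JSON Schema
    "anthropic": "native_structured",   # native structured-output API
    "gemini": "native_structured",      # native structured-output API
}


def resolve_response_format(provider: str, override: str = "auto") -> str:
    """Pick a structured-output mode for a provider."""
    if override != "auto":
        # An explicit override always wins over the automatic choice.
        return override
    # DeepSeek and other compatible providers fall back to JSON Object mode,
    # whose output is validated after the call.
    return AUTO_MODES.get(provider.lower(), "json_object")
```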

Workspace Layout

leads init creates one workspace root with these top-level directories:

backups/
config/
data/
logs/
runs/
skills/
specs/

  • config/ contains local settings, secrets, and runtime metadata.
  • data/company_memory.db is the SQLite memory database.
  • specs/companies/ and specs/contacts/ are where agent-created specs belong.
  • runs/ contains discovery and enrichment artifacts.
  • backups/ stores migration and reset backups.
  • skills/ stores bundled skill copies and install metadata.
  • logs/leads.log is a CLI diagnostic log for troubleshooting; it is not lead evidence or a run artifact.

Commands

leads init
leads doctor
leads init-db
leads version
leads update --check
leads migrate --check
leads config show
leads skills status
leads companies discover --spec company_search_spec.json
leads companies enrich DISCOVERY_RUN_ID
leads companies show-run RUN_ID
leads companies inspect RUN_ID --domain example.com
leads companies export RUN_ID
leads companies rerun RUN_ID
leads companies show-enrichment ENRICHMENT_RUN_ID
leads companies inspect-enrichment ENRICHMENT_RUN_ID --domain example.com
leads companies export-enrichment ENRICHMENT_RUN_ID
leads contacts validate-spec --spec contact_search_spec.json
leads contacts discover --spec contact_search_spec.json
leads contacts enrich CONTACT_DISCOVERY_RUN_ID
leads contacts show-run CONTACT_DISCOVERY_RUN_ID
leads contacts inspect CONTACT_DISCOVERY_RUN_ID --person "Jane Smith"
leads contacts export CONTACT_DISCOVERY_RUN_ID
leads contacts show-enrichment CONTACT_ENRICHMENT_RUN_ID
leads contacts inspect-enrichment CONTACT_ENRICHMENT_RUN_ID --person "Jane Smith"
leads contacts export-enrichment CONTACT_ENRICHMENT_RUN_ID

leads init-db creates company_memory.db and its schema. If the database already exists, it asks before resetting it. An accepted reset moves the existing runs/ directory to a timestamped archive such as runs-previousdb-20260622T184500Z/, then creates a new empty runs/ directory.
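
The archive step on reset can be pictured roughly as follows. This is a sketch under assumptions: archive_name and archive_runs are hypothetical helpers, and the timestamp format is inferred from the example name above rather than taken from the leads-cli source.

```python
from datetime import datetime, timezone
from pathlib import Path


def archive_name(now: datetime) -> str:
    """Build a name like runs-previousdb-20260622T184500Z from a UTC time."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"runs-previousdb-{stamp}"


def archive_runs(workspace: Path, now: datetime) -> Path:
    """Move runs/ aside under a timestamped name, then recreate it empty."""
    runs = workspace / "runs"
    target = workspace / archive_name(now)
    if runs.exists():
        runs.rename(target)
    runs.mkdir()
    return target
```

Calling archive_runs(workspace, datetime.now(timezone.utc)) leaves the old artifacts untouched in the archive directory, so a reset never deletes previous run output.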

leads migrate --check is read-only. leads migrate --apply creates a timestamped backup before supported structural schema changes and refuses unknown migration paths.

Use --verbose on discover to print generated queries and candidate-level decisions.

Development Setup

python -m venv .venv
.venv/bin/pip install -e '.[dev]'
.venv/bin/leads init

For a local smoke test, create or copy a company spec, configure provider keys during onboarding, then run:

leads companies discover --spec company_search_spec.json

Multiple verticals

Use verticals to request OR semantics: companies may match construction, healthcare, or engineering; they do not need to match all three. Each vertical gets an independent memory scan, gap calculation, Exa query plan, and evaluation lane.

Each vertical now uses one simple shape: key, label, and optional query hints. Use search_terms when the label alone is too broad or niche, and exclude_terms when a vertical needs a few search-time negatives. Old specs that still contain mode, seed_terms, or anti_terms remain readable and normalize to the new shape.

balance_mode controls final selection. soft (the default) fills an equal quality-gated floor per vertical, then reallocates unused slots to good companies from stronger lanes. strict keeps equal caps and may return fewer companies. none selects good companies in discovery order.
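
To make the three modes concrete, here is a simplified sketch of the selection step. It is not the leads-cli implementation: the select function and its input shape are hypothetical, the candidates are assumed to be quality-gated already, and lanes are derived from the candidates rather than from the spec.

```python
from collections import defaultdict


def select(candidates, limit, balance_mode="soft"):
    """Select up to `limit` companies.

    candidates: (vertical, company) pairs that already passed the quality
    gate, listed in discovery order.
    """
    if balance_mode == "none":
        # No balancing: take good companies in discovery order.
        return [company for _, company in candidates[:limit]]

    lanes = defaultdict(list)
    for vertical, company in candidates:
        lanes[vertical].append(company)

    # Equal share per vertical.
    floor = limit // len(lanes) if lanes else 0
    picked = [c for lane in lanes.values() for c in lane[:floor]]

    if balance_mode == "soft":
        # Reallocate unused slots to the lanes that still have good companies.
        leftovers = [c for lane in lanes.values() for c in lane[floor:]]
        picked += leftovers[: limit - len(picked)]
    # "strict" keeps the equal caps and may return fewer than `limit`.
    return picked
```

With three construction companies, one healthcare company, and limit=4, soft returns all four, strict returns three, and none returns the first four in discovery order.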

The legacy single vertical object remains accepted for existing specs.

Memory policy

novelty_mode controls whether saved companies can enter a run:

  • unused_memory (default) searches memory first and only considers companies never selected before.
  • only_new skips memory candidates and removes externally rediscovered domains already in memory.
  • full_memory searches all matching memory, including companies selected in previous runs.

Old prefer_new and allow_known specs remain readable and normalize to unused_memory and full_memory, respectively.
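
The policy above, including the legacy aliases, can be sketched as a small normalizer plus a memory filter. The function names and the selected_before record field are hypothetical; only the mode names and their meaning come from this README.

```python
LEGACY_NOVELTY = {"prefer_new": "unused_memory", "allow_known": "full_memory"}
VALID_NOVELTY = {"unused_memory", "only_new", "full_memory"}


def normalize_novelty_mode(value: str = "unused_memory") -> str:
    """Map legacy names to the current ones and reject unknown values."""
    mode = LEGACY_NOVELTY.get(value, value)
    if mode not in VALID_NOVELTY:
        raise ValueError(f"unknown novelty_mode: {value!r}")
    return mode


def memory_candidates(records, novelty_mode="unused_memory"):
    """Return the saved companies allowed to enter a run.

    records: dicts with a hypothetical boolean 'selected_before' field.
    """
    mode = normalize_novelty_mode(novelty_mode)
    if mode == "only_new":
        return []  # memory candidates are skipped entirely
    if mode == "unused_memory":
        return [r for r in records if not r["selected_before"]]
    return list(records)  # full_memory: everything that matches
```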

Enrichment

Enrichment is always a separate command run after discovery completes:

leads companies discover --spec company_search_spec.json
leads companies enrich DISCOVERY_RUN_ID

Enrichment consumes selected companies directly from the completed discovery run. It retains company name, root domain, target vertical, geography, employee estimate, ownership type, and discovery evidence, then finds only the missing LinkedIn company profile, phone, complete in-scope address, and independence status.

Each enrichment execution gets a random run ID such as company-enrich-a1b2c3d4e5f6. That ID is used both for CLI follow-up commands and the enrich artifact folder under the source discovery run.
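
An ID of that shape (a prefix plus 12 hexadecimal characters) can be produced with the standard library. This is a sketch of the format only; new_run_id is a hypothetical helper and the generator leads-cli actually uses is not documented here.

```python
import secrets


def new_run_id(prefix: str = "company-enrich") -> str:
    """Return an ID such as company-enrich-a1b2c3d4e5f6."""
    # token_hex(6) yields 6 random bytes rendered as 12 hex characters.
    return f"{prefix}-{secrets.token_hex(6)}"
```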

Fresh enrichment facts are reused by company/domain before any website request. The bounded website pass reads the homepage and best contact/location/about pages; unresolved fields can use a narrow Exa corroboration search. Output is split into enriched.csv, review.csv, and blocked.csv, while the enrichment run.json keeps field provenance, conflicts, and the per-company trace.

LinkedIn enrichment first checks company-profile links exposed by the official website, including footer icon links. Only /company/... URLs are accepted; personal profiles, jobs, and posts are discarded. If the official site has no profile link, enrichment performs a narrow LinkedIn company search. The normalized URL and its source page are saved in enrichment memory and exported as linkedin_url.
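
The /company/... rule can be illustrated with a small URL filter. This is a sketch, not the leads-cli code: the function name is hypothetical, and the normalized form (https://www.linkedin.com/company/<slug>) is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_linkedin_company_url(url: str) -> Optional[str]:
    """Return a normalized company-profile URL, or None if it is not one."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Personal profiles (/in/...), jobs, and posts are discarded.
    if len(parts) < 2 or parts[0].lower() != "company":
        return None
    # Drop query strings, fragments, and trailing sub-pages such as /about.
    return f"https://www.linkedin.com/company/{parts[1]}"
```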

By default, complete profiles with unknown independence remain in review. Add --allow-unknown-independence only when that uncertainty is acceptable. Generic values such as privately_held never count as proof of independence.

To exclude family businesses during enrichment, add this to the discovery spec:

"exclude": {
  "structured": {"ownership_signals": ["family_owned"]}
}

Enrichment still records the company as independent, but sends it to blocked.csv with a fit_conflict and excluded_family_owned flag. The ownership signal is retained in enrichment memory, so the same rule applies when a later run reuses fresh facts.
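
Taken together, the independence and family-ownership rules route each complete profile to one of the three output files. The sketch below assumes hypothetical profile fields (independence, ownership_signals) and a hypothetical route function; it does not model companies confirmed as non-independent.

```python
def route(profile: dict, spec: dict, allow_unknown_independence: bool = False):
    """Return (bucket, flags) for a complete enrichment profile."""
    excluded = set(
        spec.get("exclude", {}).get("structured", {}).get("ownership_signals", [])
    )
    signals = set(profile.get("ownership_signals", []))

    # Excluded family businesses are blocked even when they are independent.
    if "family_owned" in excluded and "family_owned" in signals:
        return "blocked", ["fit_conflict", "excluded_family_owned"]

    # Only an explicit "independent" counts as proof; generic values such as
    # "privately_held" are treated like "unknown".
    proven = profile.get("independence") == "independent"
    if not proven and not allow_unknown_independence:
        return "review", []
    return "enriched", []
```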

Contact Discovery

Contact discovery is a separate phase after company enrichment. It starts from a completed company-enrich-<id> run, uses only that run's ready companies by default, and finds current people matching structured role targets.

cp examples/contact_search_spec.json contact_search_spec.json
leads contacts validate-spec --spec contact_search_spec.json
leads contacts discover --spec contact_search_spec.json

For every company and role, the command reuses accepted contact memory from the last 30 days, then uses one Exa people-index query plus one official-domain evidence query for each remaining per-company gap. The LLM evaluates identity, current employment at the exact target company, and requested-title fit. A model cannot force an acceptance when those explicit checks are not satisfied.
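
The last point, that the model's verdict is gated by explicit checks, can be sketched as follows. The check names, the gate_verdict function, and the choice to downgrade a failed acceptance to review are all assumptions made for illustration.

```python
# Hypothetical check names mirroring the three criteria described above.
REQUIRED_CHECKS = ("identity_match", "current_at_target_company", "title_fit")


def gate_verdict(model_status: str, checks: dict) -> str:
    """Let an acceptance through only when every explicit check passed."""
    all_passed = all(checks.get(name, False) for name in REQUIRED_CHECKS)
    if model_status == "accepted" and not all_passed:
        # Assumption: a failed gate downgrades to review instead of rejecting.
        return "review"
    return model_status
```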

Artifacts are split into accepted.csv, review.csv, and rejected.csv. All three use the same client-facing columns:

company_name, company_domain, contact_name, title, linkedin_url,
email, phone, status, notes

email and phone are intentionally blank during discovery. Full queries, raw Exa results, evidence, role keys, verdict details, and memory/live source decisions are retained in run.json.
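
A minimal sketch of writing rows in that column layout with Python's csv module, forcing email and phone to stay blank at the discovery stage. The to_discovery_csv helper is hypothetical and only illustrates the artifact shape.

```python
import csv
import io

COLUMNS = [
    "company_name", "company_domain", "contact_name", "title", "linkedin_url",
    "email", "phone", "status", "notes",
]


def to_discovery_csv(rows) -> str:
    """Render contact rows as CSV text using the shared client columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        # Channels are filled in later by contact enrichment, never here.
        writer.writerow({**row, "email": "", "phone": ""})
    return buffer.getvalue()
```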

Contact Enrichment

Contact enrichment is a separate Apollo-backed command after contact discovery:

leads contacts enrich contact-discover-a1b2c3d4e5f6

Only accepted contacts enter enrichment. Live-web discovery remains authoritative for the person's identity, current company, title, role, and LinkedIn URL; Apollo can add email and phone channels but cannot overwrite those facts. Exact identity and company/email-domain checks classify each person as ready, review, or blocked, with raw Apollo trace and flags retained in run.json.

Apollo bulk requests are sent in groups of 10. Phone and waterfall requests are asynchronous, so the default email-and-phone command requires APOLLO_WEBHOOK_URL and polls Apollo's request result until completion. Use --no-phone for an email-only run when a webhook is unavailable. Fresh Apollo results are reused for 14 days unless --refresh is supplied.
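
The batching and reuse rules can be sketched without touching the Apollo API itself. Both helpers below are hypothetical; only the group size of 10, the 14-day window, and the effect of --refresh come from the text above.

```python
from datetime import datetime, timedelta, timezone


def batches(items, size: int = 10):
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def is_fresh(fetched_at: datetime, now: datetime,
             max_age_days: int = 14, refresh: bool = False) -> bool:
    """Decide whether a stored result may be reused instead of re-requested."""
    if refresh:
        return False  # --refresh always forces a new request
    return now - fetched_at <= timedelta(days=max_age_days)
```

For example, 23 accepted contacts would be sent as three requests of 10, 10, and 3 people, minus any contacts whose stored result is still fresh.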

Artifacts live below the source contact run in contacts/contact-discover-<id>/enrich/contact-enrich-<id>/ and retain the same compact client columns used by discovery. Output is split into ready.csv, review.csv, and blocked.csv.
