CLI for extracting emails and sending Customer.io events

   _____ _                   ________                    
  / ___/(_)___ _____  ____ _/ / ____/___  _________ ____ 
  \__ \/ / __ `/ __ \/ __ `/ / /_  / __ \/ ___/ __ `/ _ \
 ___/ / / /_/ / / / / /_/ / / __/ / /_/ / /  / /_/ /  __/
/____/_/\__, /_/ /_/\__,_/_/_/    \____/_/   \__, /\___/ 
       /____/                               /____/       

Mine developer signals. Enrich with emails. Fire into your CRM.

PyPI v0.6.0 | Python 3.11+ | MIT | Built on Backboard.io | Customer.io | Devpost

SignalForge scrapes Devpost hackathons, GitHub forks, and RB2B visitor exports, enriches every lead with real emails, then fires them straight into Customer.io. One command. Hundreds of warm leads.


What's inside

| Command | What it does |
|---|---|
| signalforge-devpost-search | Search Devpost by keyword → enrich with emails → export CSV |
| signalforge-participants | Scrape one hackathon's participants → CSV |
| signalforge-harvest | Walk the full hackathon listing → SQLite → delta Customer.io events |
| signalforge-github-forks | Mine fork owners from any GitHub repo → emails → SQLite |
| signalforge-gh-search | Search GitHub repos by keyword → collect owner emails → SQLite |
| signalforge-rb2b | Import RB2B visitor CSVs → SQLite → visited_site events |
| signalforge-auto | Full daily scrape: RB2B today + open hackathons + all tracked GitHub repos (no emit) |
| signalforge-emit-all | Flush every unsent event across all sources in one shot |
| signalforge-emit-batch | Emit up to --batch-size events per source bucket; cron-friendly |
| signalforge-auto-batch | One cron command: daily scrape + emit batch in a single run |
| signalforge-lookup | Search the DB by email, name, or username and show full lead context |
| signalforge-assistant | Interactive AI analyst REPL over your lead database |

Install

pip install signalforge-cli

Or with uv (recommended for local dev):

uv sync

30-second quickstart

# 1. Copy env and fill in your keys
cp .env.example .env

# 2. Search Devpost and get a CSV of leads with emails
signalforge-devpost-search "ai agents" -o leads.csv

# 3. Scrape all open hackathons, enrich new participants, emit to Customer.io
signalforge-harvest --emit-events

How it works

  Devpost / GitHub / RB2B
         │
         ▼
  ┌──────────────────────┐
  │   fast scan / search │  (no enrichment yet, just IDs)
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │   SQLite upsert      │  detect NEW rows only (delta)
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │   email enrichment   │  GitHub API → profile walking → regex
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │   Customer.io emit   │  identify + track  (once per lead, ever)
  └──────────────────────┘

Delta logic: on re-runs, only new participants get the expensive enrichment. Already-emitted leads are never re-fired. Safe to run on a cron.
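
The delta step can be pictured with a small sketch. This is not SignalForge's actual code: the `upsert_participants` helper and the reduced table layout are assumptions for illustration, loosely based on the participants table described in the SQLite schema section. The idea is that an insert that is ignored (the row already exists) marks a known lead, while a successful insert marks a new one that still needs enrichment.

```python
import sqlite3

def upsert_participants(conn, hackathon_url, usernames, now):
    """Insert unseen (hackathon_url, username) rows and return only the new usernames."""
    new = []
    for username in usernames:
        cur = conn.execute(
            "INSERT OR IGNORE INTO participants "
            "(hackathon_url, username, first_seen_at, last_seen_at) "
            "VALUES (?, ?, ?, ?)",
            (hackathon_url, username, now, now),
        )
        if cur.rowcount == 1:
            new.append(username)  # freshly inserted: candidate for enrichment
        else:
            # Already known: only refresh the last-seen timestamp.
            conn.execute(
                "UPDATE participants SET last_seen_at = ? "
                "WHERE hackathon_url = ? AND username = ?",
                (now, hackathon_url, username),
            )
    conn.commit()
    return new

# Hypothetical, reduced table layout for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants ("
    " hackathon_url TEXT, username TEXT,"
    " first_seen_at TEXT, last_seen_at TEXT, event_emitted_at TEXT,"
    " PRIMARY KEY (hackathon_url, username))"
)
url = "https://example.devpost.com"
first_run = upsert_participants(conn, url, ["alice", "bob"], "2026-03-30")
second_run = upsert_participants(conn, url, ["alice", "bob", "carol"], "2026-03-31")
print(first_run, second_run)
```

On the second run only `carol` comes back, so only she would go through the expensive enrichment step.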


Environment

Copy .env.example → .env:

| Variable | Required for | Notes |
|---|---|---|
| BACKBOARD_API_KEY | signalforge | Backboard account key |
| DEVPOST_ASSISTANT_ID | auto | Saved on first run, reused after |
| DEVPOST_SESSION | signalforge-participants, signalforge-harvest | _devpost cookie from browser DevTools |
| GITHUB_TOKEN | optional | PAT for 5,000 req/hr vs 60. Zero scopes needed |
| CUSTOMERIO_SITE_ID | --emit-events | Customer.io Track API |
| CUSTOMERIO_API_KEY | --emit-events | Customer.io Track API |
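
As a rough aid for checking a .env before a run, the sketch below maps a few features from the table to the variables they need and reports which ones are unset. The `REQUIRED` mapping and the `missing_vars` helper are hypothetical and not part of the SignalForge CLI.

```python
import os

# Hypothetical mapping, derived from the "Required for" column above.
REQUIRED = {
    "--emit-events": ("CUSTOMERIO_SITE_ID", "CUSTOMERIO_API_KEY"),
    "signalforge-participants": ("DEVPOST_SESSION",),
    "signalforge-harvest": ("DEVPOST_SESSION",),
}

def missing_vars(feature, env=os.environ):
    """Return the variable names a feature needs that are unset or empty."""
    return [name for name in REQUIRED.get(feature, ()) if not env.get(name)]

# Example with an explicit dict instead of the real environment.
example_env = {"CUSTOMERIO_SITE_ID": "abc123"}
print(missing_vars("--emit-events", example_env))
```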

Commands

signalforge-devpost-search: Devpost project search

Search Devpost by keyword, enrich each hit with the detail page + author email, export CSV.

signalforge-devpost-search "ai agents" --output results.csv
signalforge-devpost-search "climate tech" "developer tools" -o results.csv

signalforge-participants: single hackathon

Scrape one hackathon's full participant list.

# First time: hand over your session cookie
signalforge-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Subsequent runs: reuses saved session
signalforge-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Fast mode (skip email enrichment)
signalforge-participants "https://..." --no-email -o out.csv

# Enrich + emit to Customer.io in one shot
signalforge-participants "https://..." --emit-events -o out.csv

signalforge-harvest: full automated pipeline

Walks the Devpost hackathon listing, scrapes every participant, stores in SQLite, and emits Customer.io events for net-new leads.

# Standard run: open hackathons, 3 pages, enrich + emit
signalforge-harvest --emit-events

# Bulk first scrape without enrichment (fast)
signalforge-harvest --pages 5 --no-email

# Catch up: emit all unsent leads already in the DB (no scraping needed)
signalforge-harvest --emit-unsent

# Re-scan for new joiners, enrich + emit delta
signalforge-harvest --rescrape --emit-events

# Include ended hackathons too
signalforge-harvest --status open --status ended --pages 5

# Export everyone who has a LinkedIn URL but no email yet (CSV for manual outreach)
signalforge-harvest --export-linkedin -o linkedin_leads.csv

Flags

| Flag | Default | Description |
|---|---|---|
| --pages N | 3 | Hackathon listing pages (9 hackathons each) |
| --hackathons N | 0 (all) | Stop after the first N hackathons |
| --max-participants N | 0 (unlimited) | Cap per hackathon |
| --jwt TOKEN | .env | Devpost _devpost session cookie |
| --db PATH | devpost_harvest.db | SQLite path |
| --status | open | open / ended / upcoming (repeatable) |
| --no-email | off | Skip enrichment entirely |
| --emit-events | off | Emit Customer.io events for delta participants |
| --emit-unsent | off | Just emit, no scraping |
| --rescrape | off | Re-scrape already-seen hackathons |
| --export-linkedin | off | Export CSV of all leads with LinkedIn but no email |
| --output / -o PATH | stdout | Output path for --export-linkedin |

SQLite schema

  • hackathons: url, title, org, state, dates, registrations, prize, themes, last_scraped_at
  • participants: (hackathon_url, username) PK + enrichment fields + first_seen_at, last_seen_at, event_emitted_at
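
For orientation, here is a sketch of how these two tables might look as SQLite DDL, together with a query that counts leads still waiting to be emitted. The column types, the exact set of enrichment columns, and the sample rows are assumptions. Only the column names listed above come from this README.

```python
import sqlite3

# Assumed DDL: names follow the list above, types and enrichment columns are guesses.
SCHEMA = """
CREATE TABLE hackathons (
    url TEXT PRIMARY KEY,
    title TEXT, org TEXT, state TEXT, dates TEXT,
    registrations INTEGER, prize TEXT, themes TEXT,
    last_scraped_at TEXT
);
CREATE TABLE participants (
    hackathon_url TEXT NOT NULL,
    username TEXT NOT NULL,
    email TEXT,          -- assumed enrichment field
    github_url TEXT,     -- assumed enrichment field
    linkedin_url TEXT,   -- assumed enrichment field
    first_seen_at TEXT, last_seen_at TEXT, event_emitted_at TEXT,
    PRIMARY KEY (hackathon_url, username)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO hackathons (url, title, state) VALUES ('h1', 'Demo Hack', 'open')")
conn.executemany(
    "INSERT INTO participants (hackathon_url, username, email, event_emitted_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("h1", "alice", "alice@example.com", None),        # has email, not emitted
        ("h1", "bob", "bob@example.com", "2026-03-01"),    # already emitted
        ("h1", "carol", None, None),                       # no email yet
    ],
)

# Leads that have an email but no emitted event, grouped by hackathon.
unsent = conn.execute(
    "SELECT h.title, COUNT(*) FROM participants p "
    "JOIN hackathons h ON h.url = p.hackathon_url "
    "WHERE p.email IS NOT NULL AND p.event_emitted_at IS NULL "
    "GROUP BY h.url"
).fetchall()
print(unsent)
```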

Customer.io events

Event name depends on how old the hackathon is:

| Condition | Event name |
|---|---|
| Hackathon is open, or closed within the last 30 days | devpost_hackathon |
| Hackathon closed more than 30 days ago | closed_hackathon |

Email = Customer.io user ID. Payload: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.
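
The naming rule and payload can be sketched as follows. The function names, the `is_open` / `closed_on` inputs, and the shape of the returned dict are hypothetical. The event names, the 30-day threshold, and the payload field list are taken from this section.

```python
from datetime import date

PAYLOAD_FIELDS = (
    "hackathon_url", "hackathon_title", "username", "name",
    "specialty", "profile_url", "github_url", "linkedin_url",
)

def event_name(is_open, closed_on, today):
    """Open or closed within the last 30 days -> devpost_hackathon, else closed_hackathon."""
    if is_open or (today - closed_on).days <= 30:
        return "devpost_hackathon"
    return "closed_hackathon"

def build_event(lead, is_open, closed_on, today):
    """Assemble a hypothetical event dict; the email doubles as the Customer.io user ID."""
    return {
        "user_id": lead["email"],
        "name": event_name(is_open, closed_on, today),
        "data": {field: lead.get(field) for field in PAYLOAD_FIELDS},
    }

today = date(2026, 3, 31)
lead = {"email": "alice@example.com", "username": "alicedev", "name": "Alice Smith"}
recent = build_event(lead, False, date(2026, 3, 1), today)   # closed 30 days ago
old = build_event(lead, False, date(2026, 2, 1), today)      # closed 58 days ago
print(recent["name"], old["name"])
```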

Email templates in emails/ use {{customer.first_name}} and {{event.*}} Liquid variables.


signalforge-github-forks: GitHub fork mining

Pull every fork owner from a repo, enrich with public emails, store in the same SQLite DB.

# Built-in presets
signalforge-github-forks --preset mem0 --emit-events
signalforge-github-forks --preset supermemory --no-email

# Any repo
signalforge-github-forks --repo owner/repo --limit 1000 --mode first_n

| Flag | Default | Description |
|---|---|---|
| --preset | (none) | mem0 or supermemory shorthand |
| --repo OWNER/REPO | (none) | Any public GitHub repo |
| --limit N | 2000 | Max forks to process |
| --mode | preset-dependent | top_by_pushed or first_n |
| --no-email | off | Skip email lookup |
| --emit-events | off | Emit Customer.io events |
| --force-email | off | Re-enrich all forks, not just new ones |
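
To illustrate what looking up a public email can involve, the sketch below reads a user profile from the GitHub REST API (`GET /users/{username}`, whose response includes a public `email` field that may be null) and falls back to a regex over the bio and blog fields. The helper names and the fallback order are assumptions. SignalForge's own enrichment may work differently.

```python
import json
import re
import urllib.request
from typing import Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def pick_email(profile: dict) -> Optional[str]:
    """Prefer the public email field, then try a regex over free-text fields."""
    if profile.get("email"):
        return profile["email"]
    for field in ("bio", "blog"):
        match = EMAIL_RE.search(profile.get(field) or "")
        if match:
            return match.group(0)
    return None

def fetch_profile(username: str, token: Optional[str] = None) -> dict:
    """Fetch a user profile from the GitHub REST API (performs a network call)."""
    req = urllib.request.Request(
        f"https://api.github.com/users/{username}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # A token raises the rate limit; no scopes are needed for public data.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Offline example using a hand-written profile dict.
sample = {"login": "alicedev", "email": None, "bio": "Reach me at alice@example.com!"}
print(pick_email(sample))
```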

signalforge-auto: full daily scrape

Runs all three scrapers in sequence, then exits without emitting events. Use this as your daily cron job; fire --emit-unsent on each source afterwards.

What it runs:

  1. signalforge-rb2b --fetch-date TODAY: pulls today's RB2B visitor export
  2. signalforge-harvest --status open --pages 100: walks all open Devpost hackathons
  3. signalforge-github-forks --repo OWNER/REPO --limit 5000: for every repo already tracked in the DB

# Standard daily run
signalforge-auto

# Custom date / page depth
signalforge-auto --fetch-date 2026-03-31 --pages 50 --fork-limit 2000

# Skip email enrichment (much faster, enrich later with --force-email)
signalforge-auto --no-email

# Then flush the queue when ready
signalforge-harvest --emit-unsent
signalforge-github-forks --emit-unsent
signalforge-rb2b --emit-unsent

| Flag | Default | Description |
|---|---|---|
| --db PATH | devpost_harvest.db | SQLite path |
| --pages N | 100 | Devpost listing pages |
| --fork-limit N | 5000 | Max forks per GitHub repo |
| --fetch-date YYYY-MM-DD | today | RB2B export date |
| --no-email | off | Skip email enrichment |
| --jwt TOKEN | .env | Devpost session cookie |

signalforge-rb2b: RB2B visitor import

Load RB2B daily export CSVs and fire visited_site events for identified visitors.

# Import and emit new identified visitors
signalforge-rb2b daily_2026-03-*.csv --emit-events

# Just drain the unsent queue
signalforge-rb2b --emit-unsent

signalforge-gh-search: GitHub repo search

Search GitHub repos by keyword, collect owner emails via the GitHub API, and store results in the harvest DB (hackathon_url = github:search:<query-slug>).

# Search and enrich owners
signalforge-gh-search "ai memory" --max 200 --emit-events

# Top results by stars (default), forks, or recency
signalforge-gh-search "langchain rag" --sort forks --max 500

# Skip enrichment now, drain later
signalforge-gh-search "vector database" --no-email
signalforge-gh-search --emit-unsent

| Flag | Default | Description |
|---|---|---|
| query | (none) | GitHub search query (e.g. "AI memory") |
| --max N | 100 | Max repos to retrieve (GitHub caps at 1000) |
| --sort | stars | stars / forks / updated |
| --db PATH | devpost_harvest.db | SQLite path |
| --no-email | off | Skip email enrichment |
| --force-email | off | Re-enrich owners already in the DB |
| --emit-events | off | Emit github_search events to Customer.io |
| --emit-limit N | 0 (all) | Cap --emit-events to N owners |
| --emit-unsent | off | Skip search, flush unsent queue only |

signalforge-emit-all: flush all unsent events

Drain the unsent queue for every source in one shot: Devpost hackathons, GitHub fork owners, GitHub search owners, and RB2B visitors.

signalforge-emit-all
signalforge-emit-all --db my_harvest.db

| Flag | Default | Description |
|---|---|---|
| --db PATH | devpost_harvest.db | SQLite path |

signalforge-auto-batch: daily scrape + emit in one command

The single cron entry you actually need. Runs the full daily scrape (signalforge-auto) and then immediately emits one batch from every source bucket (signalforge-emit-batch).

# Add to crontab: runs at 6 AM daily
0 6 * * * /path/to/venv/bin/signalforge-auto-batch >> /var/log/signalforge.log 2>&1

# Manual run with defaults
signalforge-auto-batch

# Smaller emit batch (useful during warm-up while the queue is large)
signalforge-auto-batch --batch-size 500

# Skip email enrichment for a faster run
signalforge-auto-batch --no-email --batch-size 1000

| Flag | Default | Description |
|---|---|---|
| --batch-size N | 2000 | Max events to emit per source bucket after scrape |
| --pages N | 100 | Devpost listing pages to fetch |
| --fork-limit N | 5000 | Max forks per GitHub repo |
| --fetch-date YYYY-MM-DD | today | RB2B export date |
| --no-email | off | Skip email enrichment |
| --jwt TOKEN | .env | Devpost session cookie |
| --db PATH | devpost_harvest.db | SQLite path |

signalforge-emit-batch: batched emit (cron-friendly)

Emit up to --batch-size events from each of four source buckets in a single run. Unlike signalforge-emit-all (which drains the entire queue), this lets you pace delivery: run it on a cron until the queue is empty.

| Bucket | Event name | Source |
|---|---|---|
| Devpost open / recently-closed (≤30 days) | devpost_hackathon | harvest DB |
| Devpost old-closed (>30 days) | closed_hackathon | harvest DB |
| GitHub fork owners | github_fork | harvest DB |
| RB2B identified visitors | visited_site | harvest DB |

# Emit up to 2000 per bucket (default)
signalforge-emit-batch

# Smaller batches: good for warming up or rate-limiting
signalforge-emit-batch --batch-size 500

# Custom DB
signalforge-emit-batch --batch-size 1000 --db my_harvest.db

| Flag | Default | Description |
|---|---|---|
| --batch-size N | 2000 | Max events to emit per source bucket |
| --db PATH | devpost_harvest.db | SQLite path |
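
The pacing idea for a single bucket can be sketched like this: select at most N unsent rows, send each one, and stamp it as emitted so the next run picks up where this one stopped. The `emit_batch` helper, the `send` callback, and the reduced table are hypothetical stand-ins, not the CLI's real internals.

```python
import sqlite3

def emit_batch(conn, send, batch_size, now):
    """Send up to batch_size unsent leads and mark each as emitted."""
    rows = conn.execute(
        "SELECT hackathon_url, username, email FROM participants "
        "WHERE email IS NOT NULL AND event_emitted_at IS NULL "
        "ORDER BY first_seen_at, username LIMIT ?",
        (batch_size,),
    ).fetchall()
    for hackathon_url, username, email in rows:
        send(email, {"hackathon_url": hackathon_url, "username": username})
        conn.execute(
            "UPDATE participants SET event_emitted_at = ? "
            "WHERE hackathon_url = ? AND username = ?",
            (now, hackathon_url, username),
        )
    conn.commit()
    return len(rows)

# Demo: five unsent leads drained two at a time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants ("
    " hackathon_url TEXT, username TEXT, email TEXT,"
    " first_seen_at TEXT, event_emitted_at TEXT,"
    " PRIMARY KEY (hackathon_url, username))"
)
conn.executemany(
    "INSERT INTO participants VALUES ('h1', ?, ?, '2026-03-30', NULL)",
    [(f"user{i}", f"user{i}@example.com") for i in range(5)],
)
outbox = []  # stands in for the real Customer.io call
counts = [
    emit_batch(conn, lambda email, data: outbox.append(email), 2, "2026-03-31")
    for _ in range(4)
]
print(counts)  # [2, 2, 1, 0]
```

Because emitted rows are stamped, repeated runs shrink the queue until a run sends nothing, which is the signal that the bucket is empty.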

signalforge-lookup: contact lookup

Search the SQLite DB by email address, name, or username and print the full lead record with hackathon context.

signalforge-lookup alice@example.com
signalforge-lookup "Alice Smith"
signalforge-lookup alicedev
signalforge-lookup alice --db my_harvest.db

| Flag | Default | Description |
|---|---|---|
| query | (none) | Email, name, or username to search |
| --db PATH | devpost_harvest.db | SQLite path |
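
A lookup of this kind can be approximated with a single substring query across the three fields. The sketch below is an assumption about how such a search might be written, not the command's real implementation. The `lookup` helper and the `name` column are hypothetical.

```python
import sqlite3

def lookup(conn, query):
    """Case-insensitive substring match on email, name, or username."""
    like = f"%{query}%"
    return conn.execute(
        "SELECT username, name, email, hackathon_url FROM participants "
        "WHERE email LIKE ? OR name LIKE ? OR username LIKE ? "
        "ORDER BY username",
        (like, like, like),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants ("
    " hackathon_url TEXT, username TEXT, name TEXT, email TEXT,"
    " PRIMARY KEY (hackathon_url, username))"
)
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?, ?)",
    [
        ("h1", "alicedev", "Alice Smith", "alice@example.com"),
        ("h2", "bob", "Bob Jones", "bob@example.com"),
    ],
)
by_email = lookup(conn, "alice@example.com")
by_name = lookup(conn, "bob jones")
print(by_email, by_name)
```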

signalforge-assistant: AI analyst REPL

Interactive natural-language interface to your lead database powered by a Backboard AI assistant. Ask questions, query leads, and get summaries without writing SQL.

signalforge-assistant
signalforge-assistant --db my_harvest.db
> How many participants have emails from the last 30 days?
> Show me leads from AI-themed hackathons with a GitHub URL
> Which hackathons have the most unemitted participants?

| Flag | Default | Description |
|---|---|---|
| --db PATH | devpost_harvest.db | SQLite path |

Requires BACKBOARD_API_KEY. Model defaults to gpt-4o-mini; override with BACKBOARD_MODEL and BACKBOARD_LLM_PROVIDER.


Requirements

  • Python 3.11+
  • uv (for local dev)
  • Backboard API key (for signalforge-devpost-search keyword search and signalforge-assistant)

Development

uv run python -m devpost_scraper.cli "ai agents" --output out.csv

