
CLI for extracting emails and sending Customer.io events


   _____ _                   ________                    
  / ___/(_)___ _____  ____ _/ / ____/___  _________ ____ 
  \__ \/ / __ `/ __ \/ __ `/ / /_  / __ \/ ___/ __ `/ _ \
 ___/ / / /_/ / / / / /_/ / / __/ / /_/ / /  / /_/ /  __/
/____/_/\__, /_/ /_/\__,_/_/_/    \____/_/   \__, /\___/ 
       /____/                               /____/       

Mine developer signals. Enrich with emails. Fire into your CRM.

🟣 PyPI v0.5.1 | 🐍 Python 3.11+ | 📄 MIT | ⚡ Built on Backboard.io | 📬 Customer.io | 🏆 Devpost

SignalForge scrapes Devpost hackathons, GitHub forks, and RB2B visitor exports, enriches every lead with real emails, then fires them straight into Customer.io. One command. Hundreds of warm leads.


What's inside

Command                     What it does
signalforge-devpost-search  Search Devpost by keyword → enrich with emails → export CSV
signalforge-participants    Scrape one hackathon's participants → CSV
signalforge-harvest         Walk the full hackathon listing → SQLite → delta Customer.io events
signalforge-github-forks    Mine fork owners from any GitHub repo → emails → SQLite
signalforge-rb2b            Import RB2B visitor CSVs → SQLite → visited_site events
signalforge-auto            Full daily scrape: RB2B today + open hackathons + all tracked GitHub repos (no emit)

Install

pip install signalforge-cli

Or with uv (recommended for local dev):

uv sync

30-second quickstart

# 1. Copy env and fill in your keys
cp .env.example .env

# 2. Search Devpost and get a CSV of leads with emails
signalforge-devpost-search "ai agents" -o leads.csv

# 3. Scrape all open hackathons, enrich new participants, emit to Customer.io
signalforge-harvest --emit-events

How it works

  Devpost / GitHub / RB2B
         │
         ▼
  ┌──────────────────────┐
  │  fast scan / search  │  (no enrichment yet, just IDs)
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │  SQLite upsert       │  detect NEW rows only (delta)
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │  email enrichment    │  GitHub API → profile walking → regex
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │  Customer.io emit    │  identify + track (once per lead, ever)
  └──────────────────────┘

Delta logic: on re-runs, only new participants get the expensive enrichment. Already-emitted leads are never re-fired. Safe to run on a cron.
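
The delta step can be pictured as a plain SQLite upsert keyed on (hackathon_url, username), with event_emitted_at left NULL until a lead has actually been fired. A minimal sketch of that idea (not the package's actual code; the columns follow the schema section below):

import sqlite3
import time

con = sqlite3.connect("devpost_harvest.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS participants (
        hackathon_url    TEXT NOT NULL,
        username         TEXT NOT NULL,
        email            TEXT,
        first_seen_at    REAL,
        last_seen_at     REAL,
        event_emitted_at REAL,
        PRIMARY KEY (hackathon_url, username)
    )""")

def upsert(hackathon_url, username):
    """Insert a participant row; return True only when it is brand new."""
    now = time.time()
    cur = con.execute(
        "INSERT OR IGNORE INTO participants "
        "(hackathon_url, username, first_seen_at, last_seen_at) "
        "VALUES (?, ?, ?, ?)",
        (hackathon_url, username, now, now),
    )
    if cur.rowcount == 0:  # already known: just bump last_seen_at
        con.execute(
            "UPDATE participants SET last_seen_at = ? "
            "WHERE hackathon_url = ? AND username = ?",
            (now, hackathon_url, username),
        )
    return cur.rowcount == 1

# Only brand-new rows pay for enrichment; --emit-unsent later drains every
# row whose event_emitted_at is still NULL.
if upsert("https://example.devpost.com", "octocat"):
    pass  # enrich, then emit, then stamp event_emitted_at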


Environment

Copy .env.example → .env:

Variable              Required for                                    Notes
BACKBOARD_API_KEY     signalforge-devpost-search                      Backboard account key
DEVPOST_ASSISTANT_ID  auto                                            Saved on first run, reused after
DEVPOST_SESSION       signalforge-participants, signalforge-harvest   _devpost cookie from browser DevTools
GITHUB_TOKEN          optional                                        PAT for 5,000 req/hr vs 60. Zero scopes needed
CUSTOMERIO_SITE_ID    --emit-events                                   Customer.io Track API
CUSTOMERIO_API_KEY    --emit-events                                   Customer.io Track API
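
A filled-in .env might look like this (every value here is a placeholder):

# .env (placeholder values only)
BACKBOARD_API_KEY=your-backboard-key
DEVPOST_SESSION=paste-the-_devpost-cookie-value-here
GITHUB_TOKEN=ghp_your_personal_access_token
CUSTOMERIO_SITE_ID=your-site-id
CUSTOMERIO_API_KEY=your-track-api-key
# DEVPOST_ASSISTANT_ID is written automatically on first run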

Commands

signalforge-devpost-search — Devpost project search

Search Devpost by keyword, enrich each hit from its project detail page with the author's email, and export to CSV.

signalforge-devpost-search "ai agents" --output results.csv
signalforge-devpost-search "climate tech" "developer tools" -o results.csv

signalforge-participants — single hackathon

Scrape one hackathon's full participant list.

# First time โ€” hand over your session cookie
signalforge-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Subsequent runs โ€” reuses saved session
signalforge-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Fast mode (skip email enrichment)
signalforge-participants "https://..." --no-email -o out.csv

# Enrich + emit to Customer.io in one shot
signalforge-participants "https://..." --emit-events -o out.csv

signalforge-harvest — full automated pipeline

Walks the Devpost hackathon listing, scrapes every participant, stores them in SQLite, and emits Customer.io events for net-new leads.

# Standard run โ€” open hackathons, 3 pages, enrich + emit
signalforge-harvest --emit-events

# Bulk first scrape without enrichment (fast)
signalforge-harvest --pages 5 --no-email

# Catch up: emit all unsent leads already in the DB (no scraping needed)
signalforge-harvest --emit-unsent

# Re-scan for new joiners, enrich + emit delta
signalforge-harvest --rescrape --emit-events

# Include ended hackathons too
signalforge-harvest --status open --status ended --pages 5

# Export everyone who has a LinkedIn URL but no email yet (CSV for manual outreach)
signalforge-harvest --export-linkedin -o linkedin_leads.csv

Flags

Flag                  Default             Description
--pages N             3                   Hackathon listing pages (9 hackathons each)
--hackathons N        0 (all)             Stop after the first N hackathons
--max-participants N  0 (unlimited)       Cap per hackathon
--jwt TOKEN           .env                Devpost _devpost session cookie
--db PATH             devpost_harvest.db  SQLite path
--status STATUS       open                open / ended / upcoming (repeatable)
--no-email            off                 Skip enrichment entirely
--emit-events         off                 Emit Customer.io events for delta participants
--emit-unsent         off                 Just emit, no scraping
--rescrape            off                 Re-scrape already-seen hackathons
--export-linkedin     off                 Export CSV of all leads with LinkedIn but no email
--output / -o PATH    stdout              Output path for --export-linkedin

SQLite schema

  • hackathons — url, title, org, state, dates, registrations, prize, themes, last_scraped_at
  • participants — (hackathon_url, username) PK + enrichment fields + first_seen_at, last_seen_at, event_emitted_at
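
Spelled out as DDL, the two tables plausibly look like the sketch below. The field names come from the lists above and from the event payload in the next section; the exact column types are an assumption:

import sqlite3

sqlite3.connect("devpost_harvest.db").executescript("""
CREATE TABLE IF NOT EXISTS hackathons (
    url             TEXT PRIMARY KEY,
    title           TEXT,
    org             TEXT,
    state           TEXT,
    start_date      TEXT,
    end_date        TEXT,
    registrations   INTEGER,
    prize           TEXT,
    themes          TEXT,
    last_scraped_at REAL
);
CREATE TABLE IF NOT EXISTS participants (
    hackathon_url    TEXT NOT NULL,
    username         TEXT NOT NULL,
    name             TEXT,
    email            TEXT,
    specialty        TEXT,
    profile_url      TEXT,
    github_url       TEXT,
    linkedin_url     TEXT,
    first_seen_at    REAL,
    last_seen_at     REAL,
    event_emitted_at REAL,
    PRIMARY KEY (hackathon_url, username)
);
""")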

Customer.io events

Event name: devpost_hackathon. The lead's email is used as the Customer.io person ID. Payload: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.

Email templates in emails/ use {{customer.first_name}} and {{event.*}} Liquid variables.
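
Per lead, the emit step boils down to two Track API calls, identify then track, keyed by email. A sketch using the standard Customer.io Track v1 endpoints over raw HTTP (the package may well use a client library instead; payload values here are illustrative):

import requests

SITE_ID = "your-site-id"     # CUSTOMERIO_SITE_ID
API_KEY = "your-track-key"   # CUSTOMERIO_API_KEY
email = "jane@example.com"   # email doubles as the Customer.io person ID

# identify: create or update the person
requests.put(
    f"https://track.customer.io/api/v1/customers/{email}",
    auth=(SITE_ID, API_KEY),
    json={"email": email, "first_name": "Jane"},
).raise_for_status()

# track: fire the devpost_hackathon event with the payload fields above
requests.post(
    f"https://track.customer.io/api/v1/customers/{email}/events",
    auth=(SITE_ID, API_KEY),
    json={
        "name": "devpost_hackathon",
        "data": {
            "hackathon_url": "https://example.devpost.com",
            "hackathon_title": "Example Hackathon",
            "username": "octocat",
            "name": "Jane Doe",
            "specialty": "full-stack",
            "profile_url": "https://devpost.com/octocat",
            "github_url": "https://github.com/octocat",
            "linkedin_url": "https://www.linkedin.com/in/octocat",
        },
    },
).raise_for_status()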


signalforge-github-forks — GitHub fork mining

Pull every fork owner from a repo, enrich with public emails, store in the same SQLite DB.

# Built-in presets
signalforge-github-forks --preset mem0 --emit-events
signalforge-github-forks --preset supermemory --no-email

# Any repo
signalforge-github-forks --repo owner/repo --limit 1000 --mode first_n

Flag               Default           Description
--preset           —                 mem0 or supermemory shorthand
--repo OWNER/REPO  —                 Any public GitHub repo
--limit N          2000              Max forks to process
--mode             preset-dependent  top_by_pushed or first_n
--no-email         off               Skip email lookup
--emit-events      off               Emit Customer.io events
--force-email      off               Re-enrich all forks, not just new ones
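
Conceptually, the miner needs just two public GitHub REST endpoints: list a repo's forks, then fetch each owner's profile for a public email. A rough sketch (the real command adds pagination, profile-walking fallbacks, and SQLite storage):

import requests

TOKEN = None  # set a GITHUB_TOKEN PAT to lift the limit from 60 to 5,000 req/hr
headers = {"Accept": "application/vnd.github+json"}
if TOKEN:
    headers["Authorization"] = f"Bearer {TOKEN}"

repo = "octocat/Hello-World"  # any public repo
forks = requests.get(
    f"https://api.github.com/repos/{repo}/forks",
    params={"sort": "newest", "per_page": 100},
    headers=headers,
).json()

for fork in forks:
    owner = fork["owner"]["login"]
    user = requests.get(
        f"https://api.github.com/users/{owner}", headers=headers
    ).json()
    if user.get("email"):  # public profile email; often null, hence the fallbacks
        print(owner, user["email"])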

signalforge-auto — full daily scrape

Runs all three scrapers in sequence, then exits without emitting events. Use this as your daily cron job; fire --emit-unsent on each source afterwards.

What it runs:

  1. signalforge-rb2b --fetch-date TODAY — pulls today's RB2B visitor export
  2. signalforge-harvest --status open --pages 100 — walks all open Devpost hackathons
  3. signalforge-github-forks --repo OWNER/REPO --limit 5000 — for every repo already tracked in the DB

# Standard daily run
signalforge-auto

# Custom date / page depth
signalforge-auto --fetch-date 2026-03-31 --pages 50 --fork-limit 2000

# Skip email enrichment (much faster, enrich later with --force-email)
signalforge-auto --no-email

# Then flush the queue when ready
signalforge-harvest --emit-unsent
signalforge-github-forks --emit-unsent
signalforge-rb2b --emit-unsent

Flag                     Default             Description
--db PATH                devpost_harvest.db  SQLite path
--pages N                100                 Devpost listing pages
--fork-limit N           5000                Max forks per GitHub repo
--fetch-date YYYY-MM-DD  today               RB2B export date
--no-email               off                 Skip email enrichment
--jwt TOKEN              .env                Devpost session cookie

signalforge-rb2b — RB2B visitor import

Load RB2B daily export CSVs and fire visited_site events for identified visitors.

# Import and emit new identified visitors
signalforge-rb2b daily_2026-03-*.csv --emit-events

# Just drain the unsent queue
signalforge-rb2b --emit-unsent
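
The import itself is a straightforward CSV pass that keeps only identified visitors before the usual SQLite upsert and visited_site emit. A sketch, assuming the export has an Email column (actual RB2B headers may differ):

import csv
import glob

identified = []
for path in sorted(glob.glob("daily_2026-03-*.csv")):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("Email"):  # skip anonymous visitors
                identified.append(row)

# each identified visitor is then upserted into SQLite and queued for a
# visited_site event, mirroring the Devpost delta flow above
print(f"{len(identified)} identified visitors")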

Requirements

  • Python 3.11+
  • uv (for local dev)
  • Backboard API key (for signalforge-devpost-search keyword search only)

Development

Run the search CLI straight from a source checkout:

uv run python -m devpost_scraper.cli "ai agents" --output out.csv
