
CLI for extracting emails and sending customer.io events


   _____ _                   ________                    
  / ___/(_)___ _____  ____ _/ / ____/___  _________ ____ 
  \__ \/ / __ `/ __ \/ __ `/ / /_  / __ \/ ___/ __ `/ _ \
 ___/ / / /_/ / / / / /_/ / / __/ / /_/ / /  / /_/ /  __/
/____/_/\__, /_/ /_/\__,_/_/_/    \____/_/   \__, /\___/ 
       /____/                               /____/       

Mine developer signals. Enrich with emails. Fire into your CRM.

🟣 PyPI v0.4.2  |  🐍 Python 3.11+  |  📄 MIT  |  ⚡ Built on Backboard.io  |  📬 Customer.io  |  🏆 Devpost

SignalForge scrapes Devpost hackathons, GitHub forks, and RB2B visitor exports, enriches every lead with real emails, then fires them straight into Customer.io. One command. Hundreds of warm leads.


What's inside

Command                    What it does
signalforge                Search Devpost by keyword → enrich with emails → export CSV
signalforge-participants   Scrape one hackathon's participants → CSV
signalforge-harvest        Walk the full hackathon listing → SQLite → delta Customer.io events
signalforge-github-forks   Mine fork owners from any GitHub repo → emails → SQLite
signalforge-rb2b           Import RB2B visitor CSVs → SQLite → visited_site events

Install

pip install signalforge-cli

Or with uv (recommended for local dev):

uv sync

30-second quickstart

# 1. Copy env and fill in your keys
cp .env.example .env

# 2. Search Devpost and get a CSV of leads with emails
signalforge "ai agents" -o leads.csv

# 3. Scrape all open hackathons, enrich new participants, emit to Customer.io
signalforge-harvest --emit-events

How it works

  Devpost / GitHub / RB2B
         │
         ▼
  ┌──────────────────────┐
  │   fast scan / search │  (no enrichment yet, just IDs)
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │   SQLite upsert      │  detect NEW rows only (delta)
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │   email enrichment   │  GitHub API → profile walking → regex
  └──────────┬───────────┘
             │
             ▼
  ┌──────────────────────┐
  │   Customer.io emit   │  identify + track  (once per lead, ever)
  └──────────────────────┘

Delta logic: on re-runs, only new participants get the expensive enrichment. Already-emitted leads are never re-fired. Safe to run on a cron.
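The delta pattern boils down to a primary-key upsert where only genuinely new rows trigger the expensive follow-up work. A minimal sketch (table and column names are illustrative, not the CLI's actual schema):

```python
import sqlite3

# Illustrative schema: (hackathon_url, username) is the composite primary key,
# mirroring the delta behavior described above.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE participants (
        hackathon_url    TEXT,
        username         TEXT,
        email            TEXT,
        event_emitted_at TEXT,
        PRIMARY KEY (hackathon_url, username)
    )
""")

def upsert(url: str, username: str) -> bool:
    # INSERT OR IGNORE leaves already-seen rows untouched, so rowcount
    # tells us whether this participant is net-new.
    cur = con.execute(
        "INSERT OR IGNORE INTO participants (hackathon_url, username) VALUES (?, ?)",
        (url, username),
    )
    return cur.rowcount == 1  # True only for new rows

assert upsert("https://example.devpost.com", "alice") is True
assert upsert("https://example.devpost.com", "alice") is False  # re-run: no-op
```

Because `INSERT OR IGNORE` is a no-op for existing rows, a cron re-run touches the database but never re-enriches a lead it has already seen.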


Environment

Copy .env.example → .env:

Variable              Required for                                    Notes
BACKBOARD_API_KEY     signalforge                                     Backboard account key
DEVPOST_ASSISTANT_ID  auto                                            Saved on first run, reused after
DEVPOST_SESSION       signalforge-participants, signalforge-harvest   _devpost cookie from browser DevTools
GITHUB_TOKEN          optional                                        PAT for 5,000 req/hr vs 60. Zero scopes needed
CUSTOMERIO_SITE_ID    --emit-events                                   Customer.io Track API
CUSTOMERIO_API_KEY    --emit-events                                   Customer.io Track API

Commands

signalforge — Devpost project search

Search Devpost by keyword, enrich each hit with the detail page + author email, export CSV.

signalforge "ai agents" --output results.csv
signalforge "climate tech" "developer tools" -o results.csv

signalforge-participants — single hackathon

Scrape one hackathon's full participant list.

# First time — hand over your session cookie
signalforge-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Subsequent runs โ€” reuses saved session
signalforge-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Fast mode (skip email enrichment)
signalforge-participants "https://..." --no-email -o out.csv

# Enrich + emit to Customer.io in one shot
signalforge-participants "https://..." --emit-events -o out.csv

signalforge-harvest — full automated pipeline

Walks the Devpost hackathon listing, scrapes every participant, stores in SQLite, and emits Customer.io events for net-new leads.

# Standard run — open hackathons, 3 pages, enrich + emit
signalforge-harvest --emit-events

# Bulk first scrape without enrichment (fast)
signalforge-harvest --pages 5 --no-email

# Catch up: emit all unsent leads already in the DB (no scraping needed)
signalforge-harvest --emit-unsent

# Re-scan for new joiners, enrich + emit delta
signalforge-harvest --rescrape --emit-events

# Include ended hackathons too
signalforge-harvest --status open --status ended --pages 5

Flags

Flag                  Default             Description
--pages N             3                   Hackathon listing pages (9 hackathons each)
--hackathons N        0 (all)             Stop after the first N hackathons
--max-participants N  0 (unlimited)       Cap per hackathon
--jwt TOKEN           .env                Devpost _devpost session cookie
--db PATH             devpost_harvest.db  SQLite path
--status              open                open / ended / upcoming (repeatable)
--no-email            off                 Skip enrichment entirely
--emit-events         off                 Emit Customer.io events for delta participants
--emit-unsent         off                 Just emit, no scraping
--rescrape            off                 Re-scrape already-seen hackathons

SQLite schema

  • hackathons — url, title, org, state, dates, registrations, prize, themes, last_scraped_at
  • participants — (hackathon_url, username) PK + enrichment fields + first_seen_at, last_seen_at, event_emitted_at

Customer.io events

Event name: devpost_hackathon. The lead's email address doubles as the Customer.io person ID. Event payload: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.

Email templates in emails/ use {{customer.first_name}} and {{event.*}} Liquid variables.
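As a sketch of what one emission looks like against the Customer.io Track v1 API: each lead gets an identify call (create/update the person) followed by a track call (fire the event). The request shapes below follow the public Track API; the CLI's internals may differ.

```python
TRACK = "https://track.customer.io/api/v1"

def build_requests(lead: dict):
    """Build the identify + track request pair for one lead.
    Sketch only; field names mirror the payload listed above."""
    email = lead["email"]  # email doubles as the Customer.io person ID
    identify = ("PUT", f"{TRACK}/customers/{email}",
                {"email": email, "first_name": lead.get("name", "")})
    track = ("POST", f"{TRACK}/customers/{email}/events",
             {"name": "devpost_hackathon",
              "data": {k: lead.get(k, "") for k in (
                  "hackathon_url", "hackathon_title", "username", "name",
                  "specialty", "profile_url", "github_url", "linkedin_url")}})
    return identify, track

identify, track = build_requests({"email": "dev@example.com", "name": "Ada",
                                  "username": "ada",
                                  "hackathon_url": "https://x.devpost.com"})
assert identify[0] == "PUT"
assert track[2]["name"] == "devpost_hackathon"
```

Both requests authenticate with HTTP Basic auth using CUSTOMERIO_SITE_ID as the username and CUSTOMERIO_API_KEY as the password.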


signalforge-github-forks — GitHub fork mining

Pull every fork owner from a repo, enrich with public emails, store in the same SQLite DB.

# Built-in presets
signalforge-github-forks --preset mem0 --emit-events
signalforge-github-forks --preset supermemory --no-email

# Any repo
signalforge-github-forks --repo owner/repo --limit 1000 --mode first_n

Flag               Default           Description
--preset           —                 mem0 or supermemory shorthand
--repo OWNER/REPO  —                 Any public GitHub repo
--limit N          2000              Max forks to process
--mode             preset-dependent  top_by_pushed or first_n
--no-email         off               Skip email lookup
--emit-events      off               Emit Customer.io events
--force-email      off               Re-enrich all forks, not just new ones

signalforge-rb2b — RB2B visitor import

Load RB2B daily export CSVs and fire visited_site events for identified visitors.

# Import and emit new identified visitors
signalforge-rb2b daily_2026-03-*.csv --emit-events

# Just drain the unsent queue
signalforge-rb2b --emit-unsent
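The import step reduces to parsing the export and keeping only rows that carry an email (i.e. identified visitors). A sketch, assuming hypothetical column headers "Email" and "Page" — check them against your actual RB2B export:

```python
import csv
import io

def identified_visitors(csv_text: str) -> list[tuple[str, dict]]:
    """Filter an RB2B export down to identified visitors and pair each
    with a visited_site event payload. Column names here ("Email",
    "Page") are assumptions, not the guaranteed RB2B header names."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Email"], {"name": "visited_site",
                        "data": {"page": row.get("Page", "")}})
        for row in rows
        if row.get("Email")  # anonymous rows have no email and are skipped
    ]

sample = "Email,Page\nalice@example.com,/pricing\n,/home\n"
events = identified_visitors(sample)
assert len(events) == 1                       # anonymous /home visit dropped
assert events[0][0] == "alice@example.com"
```

As with the other commands, the unsent-queue bookkeeping lives in SQLite, so --emit-unsent can drain leads imported on a previous run.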

Requirements

  • Python 3.11+
  • uv (for local dev)
  • Backboard API key (for signalforge keyword search only)

Development

uv run python -m devpost_scraper.cli "ai agents" --output out.csv
