
CLI for extracting emails and sending Customer.io events


SignalForge

   _____ _                   ________                    
  / ___/(_)___ _____  ____ _/ / ____/___  _________ ____ 
  \__ \/ / __ `/ __ \/ __ `/ / /_  / __ \/ ___/ __ `/ _ \
 ___/ / / /_/ / / / / /_/ / / __/ / /_/ / /  / /_/ /  __/
/____/_/\__, /_/ /_/\__,_/_/_/    \____/_/   \__, /\___/ 
       /____/                               /____/       

SignalForge is a CLI toolkit for mining developer signals from public sources, enriching them with emails, storing results in SQLite, and emitting Customer.io events.

Commands:

Command                    Purpose
signalforge                Search Devpost projects by keyword, enrich with emails, export CSV
signalforge-participants   Scrape a single hackathon's participant list, export CSV
signalforge-harvest        Walk the hackathon listing, scrape all participants, store in SQLite, emit delta events
signalforge-github-forks   Mine GitHub fork owners and optionally enrich with emails
signalforge-rb2b           Import RB2B visitor CSVs, store in SQLite, emit visited_site events

Requirements

  • Python 3.11+
  • uv
  • A Backboard API key (for signalforge only)

Install

uv sync

Environment

Copy .env.example to .env and fill in:

Variable               Required for                                    Notes
BACKBOARD_API_KEY      signalforge                                     Backboard account key
DEVPOST_ASSISTANT_ID   auto                                            Persisted on first run
DEVPOST_SESSION        signalforge-participants, signalforge-harvest   _devpost cookie from browser DevTools
GITHUB_TOKEN           optional                                        GitHub PAT for 5000 req/hr (vs 60); no scopes needed
CUSTOMERIO_SITE_ID     --emit-events                                   Customer.io Track API site ID
CUSTOMERIO_API_KEY     --emit-events                                   Customer.io Track API key
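A filled-in .env might look like the sketch below (all values are placeholders; DEVPOST_ASSISTANT_ID is omitted because it is persisted automatically on first run):

```shell
BACKBOARD_API_KEY=bb_xxxxxxxxxxxx
DEVPOST_SESSION=eyJhbGciOiJIUzI1NiJ9...
GITHUB_TOKEN=ghp_xxxxxxxxxxxx
CUSTOMERIO_SITE_ID=your_site_id
CUSTOMERIO_API_KEY=your_track_api_key
```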

signalforge

Search Devpost projects by keyword, enrich each with detail page + author email, export CSV.

uv run signalforge "ai agents" --output results.csv
uv run signalforge "climate tech" "developer tools" -o results.csv

# Or via start.sh
./start.sh "ai agents" --output results.csv

signalforge-participants

Scrape a single hackathon's participant list and export to CSV.

# First time — pass session cookie
uv run signalforge-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Reuse saved session from .env
uv run signalforge-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Skip email enrichment
uv run signalforge-participants "https://..." --no-email -o out.csv

# Emit Customer.io events after scrape
uv run signalforge-participants "https://..." --emit-events -o out.csv

signalforge-harvest

Automated pipeline: walk the hackathon listing → scrape participants → store in SQLite → emit Customer.io events for delta (new) participants.

Basic usage

# Scrape 3 pages of open hackathons (27 hackathons), enrich new participants, emit events
uv run signalforge-harvest --emit-events

# Fast first run — scrape without email enrichment
uv run signalforge-harvest --no-email

Flags

Flag                             Default              Description
--pages N                        3                    Number of hackathon listing pages to fetch (9 per page)
--hackathons N                   0 (all)              Only process the first N hackathons from the listing
--jwt TOKEN                      .env                 Devpost _devpost session cookie
--db PATH                        devpost_harvest.db   SQLite database path
--status {open,ended,upcoming}   open                 Hackathon status filter (repeatable)
--max-participants N             0 (unlimited)        Cap on participants scraped per hackathon
--no-email                       off                  Skip email enrichment entirely (even for new participants)
--emit-events                    off                  Emit Customer.io events for unemitted participants during the scrape
--emit-unsent                    off                  Skip scraping; just emit events for all unsent participants in the DB
--rescrape                       off                  Re-scrape hackathons already scraped in a previous run

How it works

Phase 1: Discover hackathons
  GET /api/hackathons?status[]=open → paginated JSON listing

Phase 2: Per hackathon
  2a. Fast scan — scrape all participant pages (no enrichment, ~1 req per 20 participants)
  2b. Upsert into SQLite → detect delta (new participants not previously in DB)
  2c. Email-enrich delta only — GitHub API + link walking (skipped with --no-email)
  2d. Emit Customer.io events for unemitted participants (only with --emit-events)

Delta logic

On subsequent runs the fast scan re-fetches participant lists, but only new participants (those not already in SQLite) receive the expensive email enrichment. Already-emitted participants are never re-emitted, so re-runs are fast and safe to repeat.
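The delta logic can be sketched with plain sqlite3 (simplified to the primary-key and timestamp columns; upsert_participants is a hypothetical helper, not the package's actual code):

```python
import sqlite3

def upsert_participants(conn, hackathon_url, scraped):
    """Upsert scraped participants; return only the usernames new this run."""
    new = []
    for username in scraped:
        cur = conn.execute(
            "SELECT 1 FROM participants WHERE hackathon_url = ? AND username = ?",
            (hackathon_url, username),
        )
        if cur.fetchone() is None:
            conn.execute(
                "INSERT INTO participants (hackathon_url, username, first_seen_at, last_seen_at) "
                "VALUES (?, ?, datetime('now'), datetime('now'))",
                (hackathon_url, username),
            )
            new.append(username)  # only these get the expensive email enrichment
        else:
            # Known participant: just refresh the last-seen timestamp
            conn.execute(
                "UPDATE participants SET last_seen_at = datetime('now') "
                "WHERE hackathon_url = ? AND username = ?",
                (hackathon_url, username),
            )
    conn.commit()
    return new

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants (hackathon_url TEXT, username TEXT, "
    "first_seen_at TEXT, last_seen_at TEXT, event_emitted_at TEXT, "
    "PRIMARY KEY (hackathon_url, username))"
)
url = "https://example.devpost.com"
first = upsert_participants(conn, url, ["alice", "bob"])
second = upsert_participants(conn, url, ["alice", "bob", "carol"])
print(first)   # ['alice', 'bob']
print(second)  # ['carol']
```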

Common workflows

# Initial bulk scrape (no events yet)
uv run signalforge-harvest --pages 5

# Emit all unsent events from the DB (no scraping, no JWT needed)
uv run signalforge-harvest --emit-unsent

# Quick delta check on first hackathon only
uv run signalforge-harvest --hackathons 1 --rescrape --emit-events

# Re-scan all hackathons for new participants, enrich + emit
uv run signalforge-harvest --rescrape --emit-events

# Include ended hackathons
uv run signalforge-harvest --status open --status ended

# Fast delta scan (skip email enrichment for new participants too)
uv run signalforge-harvest --rescrape --no-email

SQLite schema

The database (devpost_harvest.db) has two tables:

  • hackathons — id, url, title, org, state, dates, registrations, prize, themes. last_scraped_at is set after participants are scraped.
  • participants — (hackathon_url, username) primary key, enrichment fields, first_seen_at, last_seen_at, event_emitted_at.
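A rough approximation of the two tables (column lists abridged; the real schema carries more enrichment fields):

```python
import sqlite3

SCHEMA = """
CREATE TABLE hackathons (
    id              INTEGER PRIMARY KEY,
    url             TEXT UNIQUE,
    title           TEXT,
    org             TEXT,
    state           TEXT,
    registrations   INTEGER,
    prize           TEXT,
    themes          TEXT,
    last_scraped_at TEXT      -- set after participants are scraped
);
CREATE TABLE participants (
    hackathon_url    TEXT,
    username         TEXT,
    email            TEXT,    -- one of several enrichment fields
    first_seen_at    TEXT,
    last_seen_at     TEXT,
    event_emitted_at TEXT,    -- NULL until the Customer.io event is sent
    PRIMARY KEY (hackathon_url, username)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['hackathons', 'participants']
```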

Customer.io events

Event name: devpost_hackathon. The participant's email address is used as the Customer.io user ID.

Event data: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.
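As a sketch, assembling that event body from an enriched participant row might look like this (build_event is a hypothetical helper, not the package's actual function):

```python
def build_event(participant: dict) -> dict:
    """Assemble the devpost_hackathon event body, keyed by the participant's email."""
    data_fields = (
        "hackathon_url", "hackathon_title", "username", "name",
        "specialty", "profile_url", "github_url", "linkedin_url",
    )
    return {
        "identifier": participant["email"],   # Customer.io user ID
        "name": "devpost_hackathon",
        "data": {f: participant.get(f) for f in data_fields},
    }

event = build_event({
    "email": "dev@example.com",
    "hackathon_url": "https://example.devpost.com",
    "hackathon_title": "Example Hack",
    "username": "devgirl",
})
print(event["name"])               # devpost_hackathon
print(event["data"]["specialty"])  # None (not scraped for this participant)
```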

Email templates in emails/ use {{customer.first_name}} and {{event.*}} Liquid variables.
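A template in emails/ might open roughly like this (the copy is illustrative; only the variable names come from this README):

```liquid
Hi {{customer.first_name}},

Saw your project at {{event.hackathon_title}} ({{event.hackathon_url}}).
```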


signalforge-github-forks

Mine GitHub fork owners and optionally enrich them with emails; results are stored in the same SQLite DB under a synthetic hackathon_url such as github:forks:owner/repo.
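A minimal sketch of the synthetic key and the GitHub REST endpoint the fork list comes from (both helper names are hypothetical):

```python
def forks_hackathon_url(repo: str) -> str:
    """Synthetic hackathon_url that namespaces fork owners in the shared SQLite DB."""
    return f"github:forks:{repo}"

def forks_page_url(repo: str, page: int, per_page: int = 100) -> str:
    """GitHub REST endpoint for listing a repo's forks one page at a time."""
    return f"https://api.github.com/repos/{repo}/forks?per_page={per_page}&page={page}"

print(forks_hackathon_url("owner/repo"))
# github:forks:owner/repo
print(forks_page_url("owner/repo", 1))
# https://api.github.com/repos/owner/repo/forks?per_page=100&page=1
```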

# Presets
uv run signalforge-github-forks --preset mem0 --emit-events
uv run signalforge-github-forks --preset supermemory --no-email

# Custom repo
uv run signalforge-github-forks --repo owner/repo --limit 1000 --mode first_n

signalforge-rb2b

Import RB2B visitor exports into SQLite and emit Customer.io visited_site events for identified visitors.

# Import CSV(s) and emit events for newly added identified visitors
uv run signalforge-rb2b daily_2026-03-*.csv --emit-events

# Emit any unsent identified visitors from the DB
uv run signalforge-rb2b --emit-unsent
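Conceptually, the import step reads each CSV and keeps only identified visitors, i.e. rows that carry an email. A sketch (the column names email and page_visited are assumptions about the RB2B export format):

```python
import csv
import io

def identified_visitors(csv_text: str):
    """Yield rows that carry an email, the only visitors Customer.io can identify."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = (row.get("email") or "").strip()
        if email:
            yield {"email": email, "page": row.get("page_visited", "")}

sample = """email,page_visited
dev@example.com,/pricing
,/docs
founder@example.com,/
"""
visitors = list(identified_visitors(sample))
print(len(visitors))         # 2 (the anonymous /docs visit is dropped)
print(visitors[0]["email"])  # dev@example.com
```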

Development

uv run python -m devpost_scraper.cli "ai agents" --output out.csv
