CLI for extracting emails and sending Customer.io events
```text
   _____ _                   ________
  / ___/(_)___ _____  ____ _/ / ____/___  _________ ____
  \__ \/ / __ `/ __ \/ __ `/ / /_  / __ \/ ___/ __ `/ _ \
 ___/ / / /_/ / / / / /_/ / / __/ / /_/ / /  / /_/ /  __/
/____/_/\__, /_/ /_/\__,_/_/_/    \____/_/   \__, /\___/
       /____/                               /____/
```
Mine developer signals. Enrich with emails. Fire into your CRM.
PyPI v0.5.0 | Python 3.11+ | MIT | Built on Backboard.io | Customer.io | Devpost
SignalForge scrapes Devpost hackathons, GitHub forks, and RB2B visitor exports → enriches every lead with real emails → then fires them straight into Customer.io. One command. Hundreds of warm leads.
## What's inside

| Command | What it does |
|---|---|
| `signalforge` | Search Devpost by keyword → enrich with emails → export CSV |
| `signalforge-participants` | Scrape one hackathon's participants → CSV |
| `signalforge-harvest` | Walk the full hackathon listing → SQLite → delta Customer.io events |
| `signalforge-github-forks` | Mine fork owners from any GitHub repo → emails → SQLite |
| `signalforge-rb2b` | Import RB2B visitor CSVs → SQLite → `visited_site` events |
| `signalforge-auto` | Full daily scrape: RB2B today + open hackathons + all tracked GitHub repos (no emit) |
## Install

```bash
pip install signalforge-cli
```

Or with uv (recommended for local dev):

```bash
uv sync
```
## 30-second quickstart

```bash
# 1. Copy env and fill in your keys
cp .env.example .env

# 2. Search Devpost and get a CSV of leads with emails
signalforge "ai agents" -o leads.csv

# 3. Scrape all open hackathons, enrich new participants, emit to Customer.io
signalforge-harvest --emit-events
```
## How it works

```text
Devpost / GitHub / RB2B
           │
           ▼
┌─────────────────────┐
│ fast scan / search  │  (no enrichment yet, just IDs)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│    SQLite upsert    │  detect NEW rows only (delta)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  email enrichment   │  GitHub API → profile walking → regex
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Customer.io emit   │  identify + track (once per lead, ever)
└─────────────────────┘
```
Delta logic: on re-runs, only new participants get the expensive enrichment. Already-emitted leads are never re-fired. Safe to run on a cron.
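The delta behavior boils down to an `INSERT OR IGNORE` keyed on the composite primary key, plus an `event_emitted_at` guard column. A minimal sketch with an illustrative schema (the real table carries more fields):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE participants (
        hackathon_url    TEXT,
        username         TEXT,
        first_seen_at    TEXT DEFAULT (datetime('now')),
        event_emitted_at TEXT,
        PRIMARY KEY (hackathon_url, username)
    )
""")

def upsert(rows):
    """Insert rows, returning only the ones that were actually new."""
    new = []
    for url, user in rows:
        cur = conn.execute(
            "INSERT OR IGNORE INTO participants (hackathon_url, username) "
            "VALUES (?, ?)",
            (url, user),
        )
        if cur.rowcount:  # rowcount is 1 only when the row did not exist before
            new.append((url, user))
    return new

first = upsert([("https://x.devpost.com", "alice"), ("https://x.devpost.com", "bob")])
second = upsert([("https://x.devpost.com", "alice"), ("https://x.devpost.com", "carol")])
print(len(first), len(second))  # 2 1 -- only carol is new on the second run
```

Only the rows returned by `upsert` would go on to the expensive enrichment step, which is why re-runs are cheap.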
## Environment

Copy `.env.example` → `.env`:

| Variable | Required for | Notes |
|---|---|---|
| `BACKBOARD_API_KEY` | `signalforge` | Backboard account key |
| `DEVPOST_ASSISTANT_ID` | auto | Saved on first run, reused after |
| `DEVPOST_SESSION` | `signalforge-participants`, `signalforge-harvest` | `_devpost` cookie from browser DevTools |
| `GITHUB_TOKEN` | optional | PAT for 5,000 req/hr vs 60. Zero scopes needed |
| `CUSTOMERIO_SITE_ID` | `--emit-events` | Customer.io Track API |
| `CUSTOMERIO_API_KEY` | `--emit-events` | Customer.io Track API |
## Commands

### `signalforge` – Devpost project search

Search Devpost by keyword, enrich each hit with the detail page and author email, and export a CSV.

```bash
signalforge "ai agents" --output results.csv
signalforge "climate tech" "developer tools" -o results.csv
```
### `signalforge-participants` – single hackathon

Scrape one hackathon's full participant list.

```bash
# First time – hand over your session cookie
signalforge-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Subsequent runs – reuses the saved session
signalforge-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Fast mode (skip email enrichment)
signalforge-participants "https://..." --no-email -o out.csv

# Enrich + emit to Customer.io in one shot
signalforge-participants "https://..." --emit-events -o out.csv
```
### `signalforge-harvest` – full automated pipeline

Walks the Devpost hackathon listing, scrapes every participant, stores results in SQLite, and emits Customer.io events for net-new leads.

```bash
# Standard run – open hackathons, 3 pages, enrich + emit
signalforge-harvest --emit-events

# Bulk first scrape without enrichment (fast)
signalforge-harvest --pages 5 --no-email

# Catch up: emit all unsent leads already in the DB (no scraping needed)
signalforge-harvest --emit-unsent

# Re-scan for new joiners, enrich + emit the delta
signalforge-harvest --rescrape --emit-events

# Include ended hackathons too
signalforge-harvest --status open --status ended --pages 5

# Export everyone who has a LinkedIn URL but no email yet (CSV for manual outreach)
signalforge-harvest --export-linkedin -o linkedin_leads.csv
```
#### Flags

| Flag | Default | Description |
|---|---|---|
| `--pages N` | 3 | Hackathon listing pages (9 hackathons each) |
| `--hackathons N` | 0 (all) | Stop after the first N hackathons |
| `--max-participants N` | 0 (unlimited) | Cap per hackathon |
| `--jwt TOKEN` | `.env` | Devpost `_devpost` session cookie |
| `--db PATH` | `devpost_harvest.db` | SQLite path |
| `--status` | open | open / ended / upcoming (repeatable) |
| `--no-email` | off | Skip enrichment entirely |
| `--emit-events` | off | Emit Customer.io events for delta participants |
| `--emit-unsent` | off | Just emit, no scraping |
| `--rescrape` | off | Re-scrape already-seen hackathons |
| `--export-linkedin` | off | Export CSV of all leads with LinkedIn but no email |
| `--output` / `-o PATH` | stdout | Output path for `--export-linkedin` |
#### SQLite schema

- `hackathons` – url, title, org, state, dates, registrations, prize, themes, `last_scraped_at`
- `participants` – `(hackathon_url, username)` PK + enrichment fields + `first_seen_at`, `last_seen_at`, `event_emitted_at`
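For reference, illustrative DDL matching the fields listed above; the actual column names and types in `devpost_harvest.db` may differ:

```python
import sqlite3

# Sketch of the two tables described in the docs (types assumed).
SCHEMA = """
CREATE TABLE IF NOT EXISTS hackathons (
    url             TEXT PRIMARY KEY,
    title           TEXT,
    org             TEXT,
    state           TEXT,
    dates           TEXT,
    registrations   INTEGER,
    prize           TEXT,
    themes          TEXT,
    last_scraped_at TEXT
);
CREATE TABLE IF NOT EXISTS participants (
    hackathon_url    TEXT,
    username         TEXT,
    email            TEXT,
    first_seen_at    TEXT,
    last_seen_at     TEXT,
    event_emitted_at TEXT,
    PRIMARY KEY (hackathon_url, username)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['hackathons', 'participants']
```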
#### Customer.io events

Event name: `devpost_hackathon`. The email address doubles as the Customer.io user ID.

Payload: `hackathon_url`, `hackathon_title`, `username`, `name`, `specialty`, `profile_url`, `github_url`, `linkedin_url`.

Email templates in `emails/` use `{{customer.first_name}}` and `{{event.*}}` Liquid variables.
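The emit step maps onto two Customer.io Track API calls per lead: an identify (`PUT /api/v1/customers/{id}`) followed by a track (`POST .../customers/{id}/events`). A sketch that only builds the request shapes instead of sending them, so they are easy to inspect offline; the `build_requests` helper and its field handling are illustrative, not the shipped code:

```python
import base64

TRACK = "https://track.customer.io/api/v1"

def build_requests(lead: dict, site_id: str, api_key: str):
    """Return (method, url, headers, body) tuples for one lead's
    identify + track calls, without performing any network I/O."""
    auth = base64.b64encode(f"{site_id}:{api_key}".encode()).decode()
    headers = {"Authorization": f"Basic {auth}",
               "Content-Type": "application/json"}
    email = lead["email"]  # email doubles as the Customer.io user ID
    identify = ("PUT", f"{TRACK}/customers/{email}", headers,
                {"email": email,
                 "first_name": lead.get("name", "").split(" ")[0]})
    track = ("POST", f"{TRACK}/customers/{email}/events", headers,
             {"name": "devpost_hackathon",
              "data": {k: lead.get(k) for k in (
                  "hackathon_url", "hackathon_title", "username", "name",
                  "specialty", "profile_url", "github_url", "linkedin_url")}})
    return identify, track

identify, track = build_requests(
    {"email": "jane@dev.io", "name": "Jane Doe", "username": "jane"},
    "SITE_ID", "API_KEY")
print(track[1])  # https://track.customer.io/api/v1/customers/jane@dev.io/events
```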
### `signalforge-github-forks` – GitHub fork mining

Pull every fork owner from a repo, enrich with public emails, and store them in the same SQLite DB.

```bash
# Built-in presets
signalforge-github-forks --preset mem0 --emit-events
signalforge-github-forks --preset supermemory --no-email

# Any repo
signalforge-github-forks --repo owner/repo --limit 1000 --mode first_n
```
| Flag | Default | Description |
|---|---|---|
| `--preset` | – | `mem0` or `supermemory` shorthand |
| `--repo OWNER/REPO` | – | Any public GitHub repo |
| `--limit N` | 2000 | Max forks to process |
| `--mode` | preset-dependent | `top_by_pushed` or `first_n` |
| `--no-email` | off | Skip email lookup |
| `--emit-events` | off | Emit Customer.io events |
| `--force-email` | off | Re-enrich all forks, not just new ones |
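Under the hood, fork mining rests on GitHub's paginated `GET /repos/{owner}/{repo}/forks` endpoint (at most 100 forks per page). A sketch of how `--limit` and `--mode` might translate into page URLs; this mapping is an assumption, not the tool's actual code:

```python
def fork_pages(repo: str, limit: int = 2000, mode: str = "top_by_pushed"):
    """Yield the paginated URLs needed to fetch up to `limit` forks."""
    # The endpoint supports sort=newest|oldest|stargazers|watchers;
    # "top_by_pushed" would need a client-side sort on each fork's pushed_at.
    sort = "newest" if mode == "first_n" else "stargazers"
    per_page = 100
    pages = -(-limit // per_page)  # ceiling division
    for page in range(1, pages + 1):
        yield (f"https://api.github.com/repos/{repo}/forks"
               f"?sort={sort}&per_page={per_page}&page={page}")

urls = list(fork_pages("mem0ai/mem0", limit=250))
print(len(urls))  # 3 -- three pages of 100 cover 250 forks
```

An unauthenticated client hits the 60 req/hr ceiling fast at this page size, which is why the optional `GITHUB_TOKEN` matters for large repos.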
### `signalforge-auto` – full daily scrape

Runs all three scrapers in sequence, then exits without emitting events. Use it as your daily cron job; fire `--emit-unsent` on each source afterwards.

What it runs:

- `signalforge-rb2b --fetch-date TODAY` – pulls today's RB2B visitor export
- `signalforge-harvest --status open --pages 100` – walks all open Devpost hackathons
- `signalforge-github-forks --repo OWNER/REPO --limit 5000` – for every repo already tracked in the DB
```bash
# Standard daily run
signalforge-auto

# Custom date / page depth
signalforge-auto --fetch-date 2026-03-31 --pages 50 --fork-limit 2000

# Skip email enrichment (much faster, enrich later with --force-email)
signalforge-auto --no-email

# Then flush the queue when ready
signalforge-harvest --emit-unsent
signalforge-github-forks --emit-unsent
signalforge-rb2b --emit-unsent
```
| Flag | Default | Description |
|---|---|---|
| `--db PATH` | `devpost_harvest.db` | SQLite path |
| `--pages N` | 100 | Devpost listing pages |
| `--fork-limit N` | 5000 | Max forks per GitHub repo |
| `--fetch-date YYYY-MM-DD` | today | RB2B export date |
| `--no-email` | off | Skip email enrichment |
| `--jwt TOKEN` | `.env` | Devpost session cookie |
### `signalforge-rb2b` – RB2B visitor import

Load RB2B daily export CSVs and fire `visited_site` events for identified visitors.

```bash
# Import and emit new identified visitors
signalforge-rb2b daily_2026-03-*.csv --emit-events

# Just drain the unsent queue
signalforge-rb2b --emit-unsent
```
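The import's filtering step amounts to keeping only rows RB2B managed to identify. A sketch with a hypothetical export shape; the real RB2B column names may differ:

```python
import csv
import io

# Hypothetical export: anonymous visits have an empty email column and
# therefore never become visited_site events.
SAMPLE = """email,first_name,last_name,page_url
jane@dev.io,Jane,Doe,https://example.com/pricing
,,,https://example.com/blog
bob@corp.com,Bob,Ray,https://example.com/docs
"""

def identified_visitors(csv_text: str):
    """Return only the rows that carry an identified email."""
    return [row for row in csv.DictReader(io.StringIO(csv_text)) if row["email"]]

visitors = identified_visitors(SAMPLE)
print([v["email"] for v in visitors])  # ['jane@dev.io', 'bob@corp.com']
```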
## Requirements

## Development

```bash
uv run python -m devpost_scraper.cli "ai agents" --output out.csv
```