# Devpost Scraper

CLI toolkit for extracting Devpost hackathon data, enriching participants with emails, storing results in SQLite, and emitting Customer.io events.
Three commands:
| Command | Purpose |
|---|---|
| `devpost-scraper` | Search Devpost projects by keyword, enrich with emails, export CSV |
| `devpost-participants` | Scrape a single hackathon's participant list, export CSV |
| `devpost-harvest` | Walk the hackathon listing, scrape all participants, store in SQLite, emit delta events |
## Requirements

### Install

```
uv sync
```

### Environment

Copy `.env.example` → `.env` and fill in:
| Variable | Required for | Notes |
|---|---|---|
| `BACKBOARD_API_KEY` | `devpost-scraper` | Backboard account key |
| `DEVPOST_ASSISTANT_ID` | auto | Persisted on first run |
| `DEVPOST_SESSION` | `devpost-participants`, `devpost-harvest` | `_devpost` cookie from browser DevTools |
| `GITHUB_TOKEN` | optional | GitHub PAT for 5000 req/hr (vs 60). No scopes needed |
| `CUSTOMERIO_SITE_ID` | `--emit-events` | Customer.io Track API |
| `CUSTOMERIO_API_KEY` | `--emit-events` | Customer.io Track API |
## devpost-scraper

Search Devpost projects by keyword, enrich each with its detail page and author email, and export to CSV.

```
uv run devpost-scraper "ai agents" --output results.csv
uv run devpost-scraper "climate tech" "developer tools" -o results.csv

# Or via start.sh
./start.sh "ai agents" --output results.csv
```
## devpost-participants

Scrape a single hackathon's participant list and export to CSV.

```
# First time — pass the session cookie
uv run devpost-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Reuse saved session from .env
uv run devpost-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Skip email enrichment
uv run devpost-participants "https://..." --no-email -o out.csv

# Emit Customer.io events after scrape
uv run devpost-participants "https://..." --emit-events -o out.csv
```
## devpost-harvest

Automated pipeline: walk the hackathon listing → scrape participants → store in SQLite → emit Customer.io events for the delta (new participants).

### Basic usage

```
# Scrape 3 pages of open hackathons (27 hackathons), enrich new participants, emit events
uv run devpost-harvest --emit-events

# Fast first run — scrape without email enrichment
uv run devpost-harvest --no-email
```
### Flags

| Flag | Default | Description |
|---|---|---|
| `--pages N` | 3 | Number of hackathon listing pages to fetch (9 per page) |
| `--hackathons N` | 0 (all) | Only process the first N hackathons from the listing |
| `--jwt TOKEN` | `.env` | Devpost `_devpost` session cookie |
| `--db PATH` | `devpost_harvest.db` | SQLite database path |
| `--status {open,ended,upcoming}` | `open` | Hackathon status filter (repeatable) |
| `--max-participants N` | 0 (unlimited) | Cap participants scraped per hackathon |
| `--no-email` | off | Skip email enrichment entirely (even for new participants) |
| `--emit-events` | off | Emit Customer.io events for unemitted participants during scrape |
| `--emit-unsent` | off | Skip scraping — just emit events for all unsent participants in DB |
| `--rescrape` | off | Re-scrape hackathons already scraped in a previous run |
### How it works

**Phase 1: Discover hackathons**

`GET /api/hackathons?status[]=open` → paginated JSON listing

**Phase 2: Per hackathon**

- 2a. Fast scan — scrape all participant pages (no enrichment, ~1 request per 20 participants)
- 2b. Upsert into SQLite → detect the delta (new participants not previously in DB)
- 2c. Email-enrich the delta only — GitHub API + link walking (skipped with `--no-email`)
- 2d. Emit Customer.io events for unemitted participants (only with `--emit-events`)
### Delta logic

On subsequent runs the fast scan re-fetches participant lists, but only new participants (those not already in SQLite) get the expensive email enrichment. Already-emitted participants are never re-emitted. This makes re-runs fast and safe to repeat.
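The phase 2 loop with its delta logic can be sketched roughly as follows. This is an illustrative reconstruction, not the tool's actual internals: the `harvest` function, its callback parameters, and the two-column `participants` table are assumptions made for the example.

```python
# Hypothetical sketch of the per-hackathon harvest loop: fast scan, upsert,
# delta detection, enrichment of new participants only, then event emission.
import sqlite3

def harvest(conn, hackathons, scrape_participants, enrich_email, emit_event,
            email=True, emit=False):
    for h in hackathons:
        # Phase 2a: fast scan, no enrichment
        participants = scrape_participants(h["url"])
        # Phase 2b: upsert and collect the delta (usernames not yet stored)
        delta = []
        for p in participants:
            row = conn.execute(
                "SELECT 1 FROM participants WHERE hackathon_url=? AND username=?",
                (h["url"], p["username"]),
            ).fetchone()
            if row is None:
                delta.append(p)
            conn.execute(
                "INSERT OR REPLACE INTO participants (hackathon_url, username) "
                "VALUES (?, ?)",
                (h["url"], p["username"]),
            )
        # Phase 2c: enrich only the delta (skipped with --no-email)
        if email:
            for p in delta:
                p["email"] = enrich_email(p)
        # Phase 2d: emit events only for new participants (--emit-events)
        if emit:
            for p in delta:
                emit_event(p)
    conn.commit()
```

Because already-stored participants never enter `delta`, a second run over the same listing only pays the enrichment and emission cost for participants who joined since the last run.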
### Common workflows

```
# Initial bulk scrape (no events yet)
uv run devpost-harvest --pages 5

# Emit all unsent events from the DB (no scraping, no JWT needed)
uv run devpost-harvest --emit-unsent

# Quick delta check on the first hackathon only
uv run devpost-harvest --hackathons 1 --rescrape --emit-events

# Re-scan all hackathons for new participants, enrich + emit
uv run devpost-harvest --rescrape --emit-events

# Include ended hackathons
uv run devpost-harvest --status open --status ended

# Fast delta scan (skip email enrichment for new participants too)
uv run devpost-harvest --rescrape --no-email
```
### SQLite schema

The database (`devpost_harvest.db`) has two tables:

- `hackathons` — id, url, title, org, state, dates, registrations, prize, themes. `last_scraped_at` is set after participants are scraped.
- `participants` — `(hackathon_url, username)` primary key, enrichment fields, `first_seen_at`, `last_seen_at`, `event_emitted_at`.
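The columns listed above suggest DDL along these lines. The exact column types and the `open_db` helper are assumptions for illustration; only the table names, the described fields, and the composite primary key come from the description:

```python
# Illustrative schema matching the two tables described above.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hackathons (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE,
    title TEXT,
    org TEXT,
    state TEXT,
    dates TEXT,
    registrations INTEGER,
    prize TEXT,
    themes TEXT,
    last_scraped_at TEXT
);
CREATE TABLE IF NOT EXISTS participants (
    hackathon_url TEXT,
    username TEXT,
    name TEXT,
    email TEXT,
    specialty TEXT,
    profile_url TEXT,
    github_url TEXT,
    linkedin_url TEXT,
    first_seen_at TEXT,
    last_seen_at TEXT,
    event_emitted_at TEXT,
    PRIMARY KEY (hackathon_url, username)
);
"""

def open_db(path="devpost_harvest.db"):
    """Open the SQLite database and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The composite `(hackathon_url, username)` key is what lets the harvest upsert the same participant across runs while still detecting them as new in other hackathons.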
### Customer.io events

Event name: `devpost_hackathon`. Uses the participant email as the Customer.io user ID.

Event data: `hackathon_url`, `hackathon_title`, `username`, `name`, `specialty`, `profile_url`, `github_url`, `linkedin_url`.

Email templates in `emails/` use `{{customer.first_name}}` and `{{event.*}}` Liquid variables.
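The Customer.io Track API takes one POST per customer event, authenticated with HTTP Basic auth using the site ID and API key. A minimal sketch of building that request for the event described above; the `build_event_request` helper is illustrative, and the endpoint path is assumed to be the v1 Track API:

```python
# Build the URL, headers, and JSON body for one devpost_hackathon event.
# The participant email serves as the Customer.io user ID, per the docs above.
import base64
import json
from urllib.parse import quote

TRACK_URL = "https://track.customer.io/api/v1/customers/{id}/events"

def build_event_request(email, site_id, api_key, data):
    url = TRACK_URL.format(id=quote(email, safe=""))
    token = base64.b64encode(f"{site_id}:{api_key}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"name": "devpost_hackathon", "data": data})
    return url, headers, body
```

The `data` dict would carry the event fields listed above (`hackathon_url`, `username`, and so on), which the Liquid templates then read as `{{event.*}}`.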
## Development

```
uv run python -m devpost_scraper.cli "ai agents" --output out.csv
```