
Devpost Scraper

CLI toolkit for extracting Devpost hackathon data, enriching participants with emails, storing results in SQLite, and emitting Customer.io events.

Three commands:

Command               Purpose
devpost-scraper       Search Devpost projects by keyword, enrich with emails, export CSV
devpost-participants  Scrape a single hackathon's participant list, export CSV
devpost-harvest       Walk the hackathon listing, scrape all participants, store in SQLite, emit delta events

Requirements

  • Python 3.11+
  • uv
  • A Backboard API key (for devpost-scraper only)

Install

uv sync

Environment

Copy .env.example to .env and fill in:

Variable              Required for                           Notes
BACKBOARD_API_KEY     devpost-scraper                        Backboard account key
DEVPOST_ASSISTANT_ID  auto                                   Persisted on first run
DEVPOST_SESSION       devpost-participants, devpost-harvest  _devpost cookie from browser DevTools
GITHUB_TOKEN          optional                               GitHub PAT for 5000 req/hr (vs 60); no scopes needed
CUSTOMERIO_SITE_ID    --emit-events                          Customer.io Track API credentials
CUSTOMERIO_API_KEY    --emit-events                          Customer.io Track API credentials
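A quick way to fail fast is to check the required variables before a run. This is a hypothetical helper, not part of the package; the variable names come from the table above, and the per-command mapping is an assumption based on the "Required for" column.

```python
# Hypothetical pre-flight check: which env vars does a command need
# that are not currently set? Mapping follows the table above.
import os

REQUIRED = {
    "devpost-scraper": ["BACKBOARD_API_KEY"],
    "devpost-participants": ["DEVPOST_SESSION"],
    "devpost-harvest": ["DEVPOST_SESSION"],
}

def missing_vars(command: str) -> list[str]:
    """Return the env vars a command needs that are unset or empty."""
    return [v for v in REQUIRED.get(command, []) if not os.environ.get(v)]
```

DEVPOST_ASSISTANT_ID is deliberately absent: per the table it is persisted automatically on first run.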

devpost-scraper

Search Devpost projects by keyword, enrich each with detail page + author email, export CSV.

uv run devpost-scraper "ai agents" --output results.csv
uv run devpost-scraper "climate tech" "developer tools" -o results.csv

# Or via start.sh
./start.sh "ai agents" --output results.csv

devpost-participants

Scrape a single hackathon's participant list and export to CSV.

# First time — pass session cookie
uv run devpost-participants "https://authorizedtoact.devpost.com/participants" \
  --jwt "<_devpost cookie value>" -o participants.csv

# Reuse saved session from .env
uv run devpost-participants "https://authorizedtoact.devpost.com/participants" -o out.csv

# Skip email enrichment
uv run devpost-participants "https://..." --no-email -o out.csv

# Emit Customer.io events after scrape
uv run devpost-participants "https://..." --emit-events -o out.csv

devpost-harvest

Automated pipeline: walk the hackathon listing → scrape participants → store in SQLite → emit Customer.io events for delta (new) participants.

Basic usage

# Scrape 3 pages of open hackathons (27 hackathons), enrich new participants, emit events
uv run devpost-harvest --emit-events

# Fast first run — scrape without email enrichment
uv run devpost-harvest --no-email

Flags

Flag                            Default             Description
--pages N                       3                   Number of hackathon listing pages to fetch (9 per page)
--hackathons N                  0 (all)             Only process the first N hackathons from the listing
--jwt TOKEN                     .env                Devpost _devpost session cookie
--db PATH                       devpost_harvest.db  SQLite database path
--status {open,ended,upcoming}  open                Hackathon status filter (repeatable)
--max-participants N            0 (unlimited)       Cap participants scraped per hackathon
--no-email                      off                 Skip email enrichment entirely (even for new participants)
--emit-events                   off                 Emit Customer.io events for unemitted participants during scrape
--emit-unsent                   off                 Skip scraping; just emit events for all unsent participants in DB
--rescrape                      off                 Re-scrape hackathons already scraped in a previous run

How it works

Phase 1: Discover hackathons
  GET /api/hackathons?status[]=open → paginated JSON listing

Phase 2: Per hackathon
  2a. Fast scan — scrape all participant pages (no enrichment, ~1 req per 20 participants)
  2b. Upsert into SQLite → detect delta (new participants not previously in DB)
  2c. Email-enrich delta only — GitHub API + link walking (skipped with --no-email)
  2d. Emit Customer.io events for unemitted participants (only with --emit-events)
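Phase 1 can be sketched as a simple paginated fetch. This is an illustration, not the package's code: the endpoint path and `status[]`/`page` query parameters come from the description above, while the JSON shape (a "hackathons" key, an empty page as the terminator) is an assumption.

```python
# Sketch of Phase 1: page through the hackathon listing endpoint.
import json
import urllib.parse
import urllib.request

def listing_url(page: int, status: str = "open") -> str:
    """Build the listing URL for one page (status[] is bracket-encoded)."""
    query = urllib.parse.urlencode({"status[]": status, "page": page})
    return f"https://devpost.com/api/hackathons?{query}"

def discover_hackathons(pages: int = 3, status: str = "open") -> list:
    """Fetch up to `pages` listing pages; stop early on an empty page.
    Assumes each response is JSON with a 'hackathons' array."""
    results = []
    for page in range(1, pages + 1):
        with urllib.request.urlopen(listing_url(page, status)) as resp:
            batch = json.load(resp).get("hackathons", [])
        if not batch:
            break
        results.extend(batch)
    return results
```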

Delta logic

On subsequent runs the fast scan re-fetches participant lists, but only new participants (those not previously in SQLite) receive the expensive email enrichment. Already-emitted participants are never re-emitted, so re-runs are fast and safe to repeat.
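The delta mechanism above can be sketched with a simplified two-column schema. Column names here are assumptions (the real table has more fields); the key idea is INSERT OR IGNORE against the (hackathon_url, username) primary key, so only rows that were actually inserted count as the delta.

```python
# Sketch of the delta upsert against a simplified participants table.
import sqlite3

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a minimal participants table (illustrative schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS participants (
               hackathon_url TEXT,
               username      TEXT,
               first_seen_at TEXT DEFAULT CURRENT_TIMESTAMP,
               last_seen_at  TEXT,
               PRIMARY KEY (hackathon_url, username))"""
    )
    return conn

def upsert_participants(conn, hackathon_url, usernames):
    """Upsert usernames; return only those not previously in the DB."""
    new = []
    for username in usernames:
        cur = conn.execute(
            "INSERT OR IGNORE INTO participants (hackathon_url, username) "
            "VALUES (?, ?)",
            (hackathon_url, username),
        )
        if cur.rowcount == 1:   # row was inserted -> new participant
            new.append(username)
        else:                   # already known -> just refresh last_seen_at
            conn.execute(
                "UPDATE participants SET last_seen_at = CURRENT_TIMESTAMP "
                "WHERE hackathon_url = ? AND username = ?",
                (hackathon_url, username),
            )
    conn.commit()
    return new
```

Only the returned delta would then be handed to email enrichment, which is what keeps re-runs cheap.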

Common workflows

# Initial bulk scrape (no events yet)
uv run devpost-harvest --pages 5

# Emit all unsent events from the DB (no scraping, no JWT needed)
uv run devpost-harvest --emit-unsent

# Quick delta check on first hackathon only
uv run devpost-harvest --hackathons 1 --rescrape --emit-events

# Re-scan all hackathons for new participants, enrich + emit
uv run devpost-harvest --rescrape --emit-events

# Include ended hackathons
uv run devpost-harvest --status open --status ended

# Fast delta scan (skip email enrichment for new participants too)
uv run devpost-harvest --rescrape --no-email

SQLite schema

The database (devpost_harvest.db) has two tables:

  • hackathons — id, url, title, org, state, dates, registrations, prize, themes. last_scraped_at is set after participants are scraped.
  • participants — (hackathon_url, username) primary key, enrichment fields, first_seen_at, last_seen_at, event_emitted_at.
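As a rough picture of the schema, here is hypothetical DDL matching the columns listed above. Exact types, additional enrichment columns, and constraints in the real devpost_harvest.db may differ.

```python
# Illustrative DDL for the two tables described above (types assumed).
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hackathons (
    id              INTEGER PRIMARY KEY,
    url             TEXT UNIQUE,
    title           TEXT,
    org             TEXT,
    state           TEXT,
    dates           TEXT,
    registrations   INTEGER,
    prize           TEXT,
    themes          TEXT,
    last_scraped_at TEXT   -- set after participants are scraped
);
CREATE TABLE IF NOT EXISTS participants (
    hackathon_url    TEXT,
    username         TEXT,
    name             TEXT,
    email            TEXT,  -- one of the enrichment fields (assumed)
    first_seen_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    last_seen_at     TEXT,
    event_emitted_at TEXT,
    PRIMARY KEY (hackathon_url, username)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```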

Customer.io events

Event name: devpost_hackathon. Uses participant email as the Customer.io user ID.

Event data: hackathon_url, hackathon_title, username, name, specialty, profile_url, github_url, linkedin_url.

Email templates in emails/ use {{customer.first_name}} and {{event.*}} Liquid variables.
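For orientation, emitting one event maps onto Customer.io's public Track API roughly as follows. The endpoint and basic-auth scheme (site ID and API key as username:password) follow Customer.io's documented Track API; treat the exact field list, and this helper itself, as illustrative rather than the package's implementation.

```python
# Sketch: build one Track API request for a devpost_hackathon event.
import base64
import json
from urllib.parse import quote

def build_event_request(email, site_id, api_key, data):
    """Return (url, headers, body) for POSTing one event.
    The participant email is the Customer.io user ID, so it is
    percent-encoded into the URL path."""
    url = f"https://track.customer.io/api/v1/customers/{quote(email)}/events"
    token = base64.b64encode(f"{site_id}:{api_key}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"name": "devpost_hackathon", "data": data})
    return url, headers, body
```

A real sender would POST this and, on success, stamp the participant's event_emitted_at so the event is never re-sent.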


Development

uv run python -m devpost_scraper.cli "ai agents" --output out.csv
