A polite scraper for NYC tech events from GarysGuide

garys_nyc_events

A polite, dependency-light Python library for extracting NYC tech events from GarysGuide.

Features

  • Scrapes the GarysGuide events page
  • Polite throttling and a browser-like User-Agent
  • Extracts title, date, price (including "FREE"), and URL
  • Includes a newsletter HTML fallback parser

Install

pip install garys_nyc_events

PyPI + Poetry Setup

poetry install

Publish to PyPI

poetry build
poetry publish

Releases (GitHub → PyPI)

This repo includes a GitHub Actions workflow that publishes to PyPI when you push a version tag.

  1. Add a GitHub repo secret named PYPI_API_TOKEN with your PyPI API token.
  2. Ensure tool.poetry.version in pyproject.toml is set to the version you are releasing (for example, 0.1.1 for the tag v0.1.1).
  3. Create and push a matching tag:
git tag v0.1.1
git push origin v0.1.1

The workflow verifies the tag matches v{version}, runs tests, builds, checks the dist, then publishes.

Publish to TestPyPI

poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi

Configure PyPI Token (Recommended)

poetry config pypi-token.pypi YOUR_TOKEN

Usage

from garys_nyc_events import (
    GarysGuideScraper,
    get_events,
    get_events_ai_json,
    get_events_safe,
    parse_newsletter_html,
)

# Live scrape (polite delay included)
events = get_events()

# Safe mode: returns [] instead of raising on network errors
events = get_events_safe()

# JSON output of AI-related events (filtered by title)
ai_events_json = get_events_ai_json()
print(ai_events_json)

# Parse raw HTML from a newsletter export
raw_html = "<html>...your email html...</html>"
newsletter_events = parse_newsletter_html(raw_html)

# Class-based usage (custom delay)
scraper = GarysGuideScraper(delay_seconds=2.0)
events = scraper.get_events()
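
The usage comments describe get_events_ai_json() as filtering events by title and returning JSON. The sketch below shows one way such a filter could be written over the event dictionaries. It is an illustration only: the keyword pattern, the helper names filter_ai_events and events_to_json, and the sample data are all assumptions, not the library's implementation.

```python
import json
import re

# Hypothetical keyword pattern; the library's real filter terms are not documented here.
# Word boundaries keep "AI" from matching inside words such as "Paint".
AI_PATTERN = re.compile(r"\b(ai|llm|genai|machine learning)\b", re.IGNORECASE)


def filter_ai_events(events):
    """Keep only events whose title mentions an AI-related keyword."""
    return [e for e in events if AI_PATTERN.search(e.get("title", ""))]


def events_to_json(events):
    """Serialize a list of event dictionaries to a JSON string."""
    return json.dumps(events, indent=2)


# Made-up sample data using the documented keys.
sample_events = [
    {"title": "AI Founders Meetup", "date": "Jan 5", "price": "FREE",
     "url": "https://example.com/events/1", "source": "example"},
    {"title": "Paint Night for Engineers", "date": "Jan 6", "price": "$20",
     "url": "https://example.com/events/2", "source": "example"},
]

ai_json = events_to_json(filter_ai_events(sample_events))
print(ai_json)
```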

How the Scraper Works

  • Selects anchors where href contains /events/
  • Walks up to the nearest tr, li, div, or article to capture context
  • If the container is a table row, it uses the first cell for date and the last cell for price
  • Extracts prices using $ amounts or FREE
  • Normalizes relative URLs to full URLs
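
The steps above can be sketched with the standard library alone. This is a simplified illustration rather than the library's parser: it handles only the table-row case, and BASE_URL, PRICE_RE, extract_price, and RowEventParser are hypothetical names chosen for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.garysguide.com"  # assumed base for resolving relative links
PRICE_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?|\bFREE\b", re.IGNORECASE)


def extract_price(text):
    """Return the first '$' amount or 'FREE' found in text, else an empty string."""
    match = PRICE_RE.search(text)
    if not match:
        return ""
    found = match.group(0)
    return "FREE" if found.upper() == "FREE" else found.replace(" ", "")


class RowEventParser(HTMLParser):
    """Collects events from table rows that contain an /events/ link."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._cells = None     # cell texts for the current <tr>, None outside a row
        self._link = None      # [href, title text] for the row's event anchor
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._link = [], None
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
        elif tag == "a" and self._cells is not None and self._link is None:
            href = dict(attrs).get("href") or ""
            if "/events/" in href:
                self._link = [href, ""]
                self._in_link = True

    def handle_data(self, data):
        if self._cells:  # only collect text once a cell has been opened
            self._cells[-1] += data
        if self._in_link:
            self._link[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "tr" and self._cells is not None:
            if self._link and self._cells:
                self.events.append({
                    "title": self._link[1].strip(),
                    "date": self._cells[0].strip(),           # first cell
                    "price": extract_price(self._cells[-1]),  # last cell
                    "url": urljoin(BASE_URL, self._link[0]),  # relative -> absolute
                })
            self._cells = None


html = """
<table>
  <tr><td>Mon, Jan 5</td><td><a href="/events/abc123/Sample-Meetup">Sample Meetup</a></td><td>FREE</td></tr>
  <tr><td>Tue, Jan 6</td><td><a href="/events/def456/Paid-Workshop">Paid Workshop</a></td><td>$25</td></tr>
  <tr><td>No link</td><td>Nothing here</td><td>$5</td></tr>
</table>
"""
parser = RowEventParser()
parser.feed(html)
for event in parser.events:
    print(event)
```

Rows without an /events/ link are skipped, so the third sample row produces no event. Handling li, div, and article containers would need additional context tracking that this sketch leaves out.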

Notes

  • The public API returns a list of dictionaries with keys: title, date, price, url, source.
  • The scraper throttles its requests by default; pass delay_seconds to GarysGuideScraper to change the delay.
  • The live end-to-end (E2E) test is disabled by default. Set RUN_E2E=1 to enable it.
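
Because the public API returns plain dictionaries, downstream code can work with them directly. The sketch below describes the documented keys with a TypedDict and filters for free events. The Event type, the free_events helper, the assumption that every value is a string, and the sample data are illustrative and not part of the package.

```python
from typing import List, TypedDict


class Event(TypedDict):
    """Shape of one event; assumes every value is a string."""
    title: str
    date: str
    price: str
    url: str
    source: str


def free_events(events: List[Event]) -> List[Event]:
    """Return only the events whose price is 'FREE' (case-insensitive)."""
    return [e for e in events if e["price"].strip().upper() == "FREE"]


# Made-up sample data; real 'source' values are not documented here.
sample: List[Event] = [
    {"title": "Demo Night", "date": "Jan 5", "price": "FREE",
     "url": "https://example.com/events/1", "source": "example"},
    {"title": "Paid Workshop", "date": "Jan 6", "price": "$25",
     "url": "https://example.com/events/2", "source": "example"},
]

for event in free_events(sample):
    print(event["title"], event["url"])
```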

Development

poetry install
poetry run pytest
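
The Notes section says the live E2E test only runs when RUN_E2E=1 is set. One common way to gate a test on an environment variable is sketched below. It is an assumption about how such a gate could look, not a copy of this project's test suite, and the class and test names are hypothetical. pytest collects unittest.TestCase classes, so it works with poetry run pytest.

```python
import os
import unittest

# The live test is opt-in: it runs only when RUN_E2E is exactly "1".
RUN_E2E = os.environ.get("RUN_E2E") == "1"


@unittest.skipUnless(RUN_E2E, "live test disabled; set RUN_E2E=1 to enable")
class LiveScrapeTest(unittest.TestCase):
    def test_live_scrape_returns_list(self):
        # Imported inside the test so a skipped run never touches the network.
        from garys_nyc_events import get_events

        events = get_events()
        self.assertIsInstance(events, list)
```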

Verify Build Artifacts

./scripts/verify_build.sh

Contributing

See CONTRIBUTING.md.

Changelog

See CHANGELOG.md.

License

MIT
