A polite scraper for NYC tech events from GarysGuide

garys_nyc_events

A polite, dependency-light Python library for extracting NYC tech events from GarysGuide.

Features

  • Scrapes the GarysGuide events page
  • Polite throttling and a browser-like User-Agent
  • Extracts title, date, price (including "FREE"), and URL
  • Includes a newsletter HTML fallback parser
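
For readers unfamiliar with polite scraping, the throttling and User-Agent features could look roughly like the sketch below. It is not the library's implementation: `Throttle`, `build_request`, and the `BROWSER_UA` string are hypothetical names and values chosen for illustration, and only the standard library is used.

```python
import time
import urllib.request

# Hypothetical User-Agent value; the string the library actually sends is not documented here.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to keep delay_seconds between calls."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay_seconds - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Build a GET request that carries the browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
```

Calling `wait()` before each fetch keeps requests at least `delay_seconds` apart; the injectable `clock` and `sleep` arguments exist only so the timing logic can be tested without real delays.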

Install

pip install garys_nyc_events

PyPI + Poetry Setup

poetry install

Publish to PyPI

poetry build
poetry publish

Releases (GitHub → PyPI)

This repo includes a GitHub Actions workflow that publishes to PyPI when you push a version tag.

  1. Add a GitHub repo secret named PYPI_API_TOKEN with your PyPI API token.
  2. Ensure tool.poetry.version in pyproject.toml is set to the version you are releasing.
  3. Create and push a matching tag:
git tag v0.1.0
git push origin v0.1.0

The workflow verifies the tag matches v{version}, runs tests, builds, checks the dist, then publishes.

Publish to TestPyPI

poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi

Configure PyPI Token (Recommended)

poetry config pypi-token.pypi YOUR_TOKEN

Usage

from garys_nyc_events import (
    GarysGuideScraper,
    get_events,
    get_events_ai_json,
    get_events_safe,
    parse_newsletter_html,
)

# Live scrape (polite delay included)
events = get_events()

# Safe mode: returns [] instead of raising on network errors
events = get_events_safe()

# JSON output of AI-related events (filtered by title)
ai_events_json = get_events_ai_json()
print(ai_events_json)

# Parse raw HTML from a newsletter export
raw_html = "<html>...your email html...</html>"
newsletter_events = parse_newsletter_html(raw_html)

# Class-based usage (custom delay)
scraper = GarysGuideScraper(delay_seconds=2.0)
events = scraper.get_events()
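
To illustrate what "filtered by title" might mean for the AI JSON output, here is a standalone sketch that does not call the library. The keyword pattern, the function name `filter_ai_events_json`, and the sample events are all assumptions made for the example; the library's real matching rules may differ.

```python
import json
import re

# Hypothetical keyword pattern; word boundaries keep "ai" from matching inside "Email".
AI_PATTERN = re.compile(r"\b(ai|llm|genai|machine learning)\b", re.IGNORECASE)


def filter_ai_events_json(events):
    """Keep events whose title mentions an AI keyword and serialize them as JSON."""
    matches = [e for e in events if AI_PATTERN.search(e.get("title", ""))]
    return json.dumps(matches, indent=2)


# Made-up sample data with placeholder URLs.
sample_events = [
    {"title": "AI Builders Meetup", "price": "FREE", "url": "https://example.com/events/1"},
    {"title": "Email Marketing 101", "price": "$25", "url": "https://example.com/events/2"},
    {"title": "Fintech Happy Hour", "price": "FREE", "url": "https://example.com/events/3"},
]

print(filter_ai_events_json(sample_events))
```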

How the Scraper Works

  • Selects anchors where href contains /events/
  • Walks up to the nearest tr, li, div, or article to capture context
  • If the container is a table row, it uses the first cell for date and the last cell for price
  • Extracts prices by matching $ amounts or the word FREE
  • Normalizes relative URLs to full URLs
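
The table-row case of the steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: `RowParser`, `extract_price`, the price regex, and the `BASE_URL` value are hypothetical, the sample HTML is invented, and the sketch omits the `li`/`div`/`article` containers and the `source` key.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.garysguide.com/"  # assumed base for resolving relative links
PRICE_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?|\bFREE\b", re.IGNORECASE)


def extract_price(text):
    """Return the first $ amount or FREE found in text, else None."""
    match = PRICE_RE.search(text)
    return match.group(0).upper() if match else None


class RowParser(HTMLParser):
    """Collect events from <tr> rows that contain an /events/ link."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._cells = None       # cell texts of the current row; None outside a row
        self._link = None        # [href, title] of the row's first event anchor
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._link = [], None
        elif tag in ("td", "th") and self._cells is not None:
            self._cells.append("")
        elif tag == "a" and self._cells is not None:
            href = dict(attrs).get("href") or ""
            if "/events/" in href and self._link is None:
                self._link = [href, ""]
                self._in_anchor = True

    def handle_data(self, data):
        if self._cells:               # inside a cell of the current row
            self._cells[-1] += data
        if self._in_anchor:           # inside the event anchor: this is the title
            self._link[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False
        elif tag == "tr" and self._cells is not None:
            if self._link and self._cells:
                self.events.append({
                    "title": self._link[1].strip(),
                    "date": self._cells[0].strip(),            # first cell
                    "price": extract_price(self._cells[-1]),   # last cell
                    "url": urljoin(BASE_URL, self._link[0]),   # relative -> absolute
                })
            self._cells = None


# Invented sample markup, shaped like a simple events table.
html = """
<table>
  <tr><td>Mon, Jan 05</td><td><a href="/events/abc123/Sample-Meetup">Sample Meetup</a></td><td>FREE</td></tr>
  <tr><td>Tue, Jan 06</td><td><a href="/events/def456/Sample-Summit">Sample Summit</a></td><td>$25</td></tr>
</table>
"""

parser = RowParser()
parser.feed(html)
for event in parser.events:
    print(event)
```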

Notes

  • The public API returns a list of dictionaries with keys: title, date, price, url, source.
  • The scraper is polite by default; pass delay_seconds to GarysGuideScraper to change the delay between requests.
  • The live end-to-end test is disabled by default. Set RUN_E2E=1 when running the tests to enable it.
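
Because the public API returns plain dictionaries, results can be processed with ordinary list operations. The sketch below assumes every value is a string and uses invented sample data (including the `source` values); `Event` and `free_events` are hypothetical helpers, not part of the library.

```python
from typing import TypedDict


class Event(TypedDict):
    """Assumed shape of one event; the value types are a guess."""
    title: str
    date: str
    price: str
    url: str
    source: str


def free_events(events):
    """Return only the events whose price is FREE (case-insensitive)."""
    return [e for e in events if (e.get("price") or "").strip().upper() == "FREE"]


# Made-up sample data with placeholder URLs and source labels.
sample: list[Event] = [
    {"title": "Sample Meetup", "date": "Mon, Jan 05", "price": "FREE",
     "url": "https://example.com/events/1", "source": "web"},
    {"title": "Sample Summit", "date": "Tue, Jan 06", "price": "$25",
     "url": "https://example.com/events/2", "source": "web"},
]

for event in free_events(sample):
    print(event["date"], event["title"], event["url"])
```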

Development

poetry install
poetry run pytest

Verify Build Artifacts

./scripts/verify_build.sh

Contributing

See CONTRIBUTING.md.

Changelog

See CHANGELOG.md.

License

MIT
