A polite scraper for NYC tech events from GarysGuide
Project description
garys_nyc_events
A polite, dependency-light Python library for extracting NYC tech events from GarysGuide.
Features
- Scrapes the GarysGuide events page
- Polite throttling and a browser-like User-Agent
- Extracts title, date, price (including "FREE"), and URL
- Includes a newsletter HTML fallback parser
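As a rough illustration of the "polite throttling" and "browser-like User-Agent" features, here is a minimal standard-library sketch. The `Throttle` class, the `build_request` helper, and the `USER_AGENT` string are hypothetical names and values chosen for this example; they are not taken from the library, whose internals are not shown here.

```python
import time
import urllib.request

# Assumed header value for illustration only; the library's real string may differ.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(self, delay_seconds=2.0, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay_seconds - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Build a request object carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Calling `wait()` before each request keeps requests at least `delay_seconds` apart. The clock and sleep functions are injectable so the behavior can be checked without real delays.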
Install
pip install garys_nyc_events
PyPI + Poetry Setup
poetry install
Publish to PyPI
poetry build
poetry publish
Releases (GitHub → PyPI)
This repo includes a GitHub Actions workflow that publishes to PyPI when you push a version tag.
- Add a GitHub repo secret named PYPI_API_TOKEN with your PyPI API token.
- Ensure tool.poetry.version in pyproject.toml is set.
- Create and push a matching tag:
git tag v0.1.0
git push origin v0.1.0
The workflow verifies the tag matches v{version}, runs tests, builds, checks the dist, then publishes.
Publish to TestPyPI
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi
Configure PyPI Token (Recommended)
poetry config pypi-token.pypi YOUR_TOKEN
Usage
from garys_nyc_events import (
    GarysGuideScraper,
    get_events,
    get_events_ai_json,
    get_events_safe,
    parse_newsletter_html,
)
# Live scrape (polite delay included)
events = get_events()
# Safe mode: returns [] instead of raising on network errors
events = get_events_safe()
# JSON output of AI-related events (filtered by title)
ai_events_json = get_events_ai_json()
print(ai_events_json)
# Parse raw HTML from a newsletter export
raw_html = "<html>...your email html...</html>"
newsletter_events = parse_newsletter_html(raw_html)
# Class-based usage (custom delay)
scraper = GarysGuideScraper(delay_seconds=2.0)
events = scraper.get_events()
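The README says the AI-related JSON output is "filtered by title" but does not specify the matching rule. The following is only a sketch of what such a filter could look like. The keyword pattern `AI_TITLE_RE` and the helpers `filter_ai_events` and `events_to_json` are assumptions made for illustration, not the library's implementation.

```python
import json
import re

# Hypothetical keyword pattern; the library's real matching rule is not documented here.
AI_TITLE_RE = re.compile(
    r"\b(ai|artificial intelligence|machine learning|llm)\b", re.IGNORECASE
)


def filter_ai_events(events):
    """Keep only events whose title matches the assumed AI keyword pattern."""
    return [e for e in events if AI_TITLE_RE.search(e.get("title", ""))]


def events_to_json(events):
    """Serialize a list of event dictionaries to a JSON string."""
    return json.dumps(events, indent=2)
```

The word boundaries in the pattern keep titles such as "Training Day" from matching on the letters "ai" inside another word.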
How the Scraper Works
- Selects anchors where href contains /events/
- Walks up to the nearest tr, li, div, or article to capture context
- If the container is a table row, it uses the first cell for date and the last cell for price
- Extracts prices using $ amounts or FREE
- Normalizes relative URLs to full URLs
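The steps above can be sketched with the standard library alone. This is a simplified illustration, not the library's code. It handles only the table-row case, and the names `RowEventParser`, `extract_price`, `PRICE_RE`, and `BASE_URL` (including the assumed site root) are hypothetical. Matching FREE case-insensitively is also an assumption.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.garysguide.com"  # assumed root for resolving relative links
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d+)?|\bFREE\b", re.IGNORECASE)


def extract_price(text):
    """Return the first $ amount or FREE found in text, else None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    token = match.group(0)
    return "FREE" if token.upper() == "FREE" else token


class RowEventParser(HTMLParser):
    """Collect events from table rows that contain an /events/ anchor."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._cells = None     # cell texts of the current <tr>, None outside a row
        self._link = None      # [href, title] of the first /events/ anchor in the row
        self._in_link = False  # True while inside that anchor's text

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._link, self._in_link = [], None, False
        elif tag in ("td", "th") and self._cells is not None:
            self._cells.append("")
        elif tag == "a" and self._cells is not None and self._link is None:
            href = dict(attrs).get("href") or ""
            if "/events/" in href:
                self._link = [href, ""]
                self._in_link = True

    def handle_data(self, data):
        if self._cells:  # inside a cell of the current row
            self._cells[-1] += data
        if self._in_link:
            self._link[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "tr" and self._cells is not None:
            if self._link is not None and self._cells:
                self.events.append({
                    "title": self._link[1].strip(),
                    "date": self._cells[0].strip(),            # first cell: date
                    "price": extract_price(self._cells[-1]),   # last cell: price
                    "url": urljoin(BASE_URL, self._link[0]),   # relative -> absolute
                })
            self._cells = None
```

To use it, create a `RowEventParser`, call `feed()` with the page HTML, and read the `events` list. Rows without an `/events/` link are skipped, and links that are already absolute pass through `urljoin` unchanged.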
Notes
- The public API returns a list of dictionaries with keys: title, date, price, url, source.
- The scraper is polite by default; adjust delay_seconds if needed.
- Live E2E test is disabled by default. Run with RUN_E2E=1 to enable.
Development
poetry install
poetry run pytest
Verify Build Artifacts
./scripts/verify_build.sh
Contributing
See CONTRIBUTING.md.
Changelog
See CHANGELOG.md.
License
MIT
File details
Details for the file garys_nyc_events-0.1.0.tar.gz.
File metadata
- Download URL: garys_nyc_events-0.1.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a2111dec40325942383cf02ae73da6d134518b00bbe67523da7253aebc1d788 |
| MD5 | 4c1c186503a64eea9c1a61dfcfb662d2 |
| BLAKE2b-256 | bdd8e8f4c00550c6e8d83017e03898e64ea30025178926fb0bb21775e6bd2f27 |
File details
Details for the file garys_nyc_events-0.1.0-py3-none-any.whl.
File metadata
- Download URL: garys_nyc_events-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5d7e78c7f5203593420e462fd81fd5ce90d44beca5c5a65f50a3816731d25f1 |
| MD5 | 727b489e605013869420c22f7cd7746b |
| BLAKE2b-256 | 21e5a8a7cdba9620410f13939a922403ec148ad9bd7f76be1f41939e580f3999 |