Hybrid Python/Node.js web scraper for Major League Baseball (MLB) data.
Project description
vigorish
vigorish is a hybrid Python/Node.js application that scrapes MLB data from mlb.com, brooksbaseball.net and baseball-reference.com.
My goal is to capture as much data as possible, ranging from PitchFX measurements at the most granular level to play-by-play data (play descriptions, substitutions, manager challenges, etc.) and individual player pitch/bat stats at the highest level.
Requirements
- Python 3.6+
- Node.js 10+ (tested with Node.js 11-13)
- Xvfb
- AWS account (optional but recommended, used to store scraped data in S3)
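
As a rough pre-install check, the requirements above can be sketched in a few lines of Python. This is a minimal sketch and not part of vigorish: the function name `check_requirements` is hypothetical, and it assumes the Node.js and Xvfb executables are named `node` and `Xvfb` on your PATH (some systems install Node.js as `nodejs`). It only checks that the executables are present, not which Node.js version is installed, and it does not check for an AWS account.

```python
import shutil
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 6)
# Assumed executable names; adjust if your system names them differently.
REQUIRED_EXECUTABLES = ("node", "Xvfb")


def check_requirements(python_version=None, which=shutil.which):
    """Return a list of human-readable problems (empty if every check passes)."""
    if python_version is None:
        python_version = sys.version_info[:2]
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    for exe in REQUIRED_EXECUTABLES:
        # shutil.which returns None when the executable is not on PATH.
        if which(exe) is None:
            problems.append(f"{exe} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

An empty result means the interpreter version is new enough and both executables were found; anything else is printed as a list of missing pieces.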
Project Documentation
For a step-by-step install guide and instructions for configuring and using vigorish, please visit the link below:
Vigorish: Hybrid Python/Node.Js Web Scraper
Credits
vigorish relies on the projects listed below, either directly or as a dev dependency. It would not have been possible for me to create vigorish without these projects. Thanks to all of the creators/maintainers for making them available (projects are listed alphabetically):
Built Distribution
Hashes for vigorish-0.3.12-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3f360b5329fd69d4c3817d3c6dfc3c8c5e64f7d520b506ee5c0d6466169701e8
MD5 | 4af6c8a5ae142f87fbfda868f24b209f
BLAKE2b-256 | 7998bba1e31b9dfa466eb1e6265f1031260aac2dd64e4bf67f446eeb362f3f77
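
If you download the wheel manually, one way to compare it against the SHA256 digest listed above is sketched below using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path in the usage comment assumes the wheel was saved to the current directory under its published name.

```python
import hashlib

# SHA256 digest published above for vigorish-0.3.12-py3-none-any.whl.
EXPECTED_SHA256 = "3f360b5329fd69d4c3817d3c6dfc3c8c5e64f7d520b506ee5c0d6466169701e8"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Example usage (assumes the wheel is in the current directory):
# matches_expected("vigorish-0.3.12-py3-none-any.whl")
```

When installing with pip, a similar check can be enforced automatically through pip's hash-checking mode in a requirements file.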