Hybrid Python/Node.js web scraper for Major League Baseball (MLB) data.
Project description
vigorish
vigorish is a hybrid Python/Node.js application that scrapes MLB data from mlb.com, brooksbaseball.net and baseball-reference.com.
My goal is to capture as much data as possible, ranging from PitchFX measurements at the most granular level to play-by-play data (play descriptions, substitutions, manager challenges, etc.) and individual player pitching/batting stats at the highest level.
Requirements
- Python 3.6+
- Node.js 10+ (tested with Node.js 11-13)
- Xvfb
- AWS account (optional but recommended, used to store scraped data in S3)
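Before installing, the requirements above can be checked from Python itself. The sketch below is a hypothetical pre-install helper, not something vigorish ships; it assumes the Node.js and Xvfb executables are on PATH as `node` and `Xvfb`, and it does not check the optional AWS account. It sticks to calls available in Python 3.6 (for example `stdout=subprocess.PIPE` rather than `capture_output`).

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 6)   # "Python 3.6+"
MIN_NODE_MAJOR = 10   # "Node.js 10+"


def parse_node_version(text: str) -> Tuple[int, ...]:
    """Turn `node --version` output such as 'v12.18.3' into (12, 18, 3)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def python_ok(version_info=sys.version_info) -> bool:
    """True if the (major, minor) Python version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def node_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Node.js version, or None if `node` is not on PATH (assumption)."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (Python 3.6 compatible)
        check=True,
    )
    return parse_node_version(result.stdout)


def check_requirements() -> dict:
    """Map each hard requirement to True (satisfied) or False (missing/too old)."""
    node = node_version()
    return {
        "python": python_ok(),
        "node": node is not None and node[0] >= MIN_NODE_MAJOR,
        "xvfb": shutil.which("Xvfb") is not None,  # assumes the binary is named Xvfb
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'OK' if ok else 'missing or too old'}")
```

Running the file prints one line per requirement; anything reported as missing should be installed before following the install guide.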
Project Documentation
For a step-by-step install guide and instructions for configuring/using vigorish, please visit the link below:
Vigorish: Hybrid Python/Node.Js Web Scraper
Credits
vigorish relies on the projects listed below, either directly or as dev dependencies. It would not have been possible for me to create vigorish without these projects, so thanks to all of the creators/maintainers for making them available (projects are listed alphabetically):
Download files
File details
Details for the file vigorish-0.7.0.tar.gz.
File metadata
- Download URL: vigorish-0.7.0.tar.gz
- Upload date:
- Size: 2.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 39d405f810e7dabbf451af0832f0d95c0bd49c606618491b610988eee7c34cd8
MD5 | 407b53314f558a2ed097c60b972ef088
BLAKE2b-256 | e30582491ed935fdd412331e2e20ee5ca5ce4929c15e353c3f36e03b8189bb27
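A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library's `hashlib`; the helper names `sha256_of` and `verify_sdist` are hypothetical and not part of vigorish. The file is read in fixed-size chunks so the archive never has to be held in memory all at once.

```python
import hashlib

# SHA256 digest published above for vigorish-0.7.0.tar.gz
EXPECTED_SHA256 = "39d405f810e7dabbf451af0832f0d95c0bd49c606618491b610988eee7c34cd8"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in 64 KiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sdist(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file at `path` matches the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_sdist("vigorish-0.7.0.tar.gz")` returns `True` only if the archive in the current directory is byte-for-byte the published file. pip can enforce the same check at install time through its hash-checking mode (a `--hash=sha256:...` entry for the requirement in a requirements file).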
File details
Details for the file vigorish-0.7.0-py3-none-any.whl.
File metadata
- Download URL: vigorish-0.7.0-py3-none-any.whl
- Upload date:
- Size: 2.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2cdd0139ffb52b45f5f67a610c1fa922a0c3e2d4c569e1cd901fff71c59f07a5
MD5 | 05752b637244b00f8177393febc87fff
BLAKE2b-256 | c96c4c77fc210d476893e0ecd3d2e2d86ba4c4b7d2178f9fce3feae7891d6b37
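All three digests in the table can be computed in a single pass over the wheel. In this hypothetical sketch (not part of vigorish), "BLAKE2b-256" is read as BLAKE2b configured for a 32-byte digest via `hashlib.blake2b(digest_size=32)`, which is a distinct parameterization rather than a truncated 64-byte BLAKE2b output. MD5 is included only because it is listed; SHA256 or BLAKE2b-256 is the stronger check.

```python
import hashlib
from typing import Dict

# Digests published above for vigorish-0.7.0-py3-none-any.whl
EXPECTED = {
    "md5": "05752b637244b00f8177393febc87fff",
    "sha256": "2cdd0139ffb52b45f5f67a610c1fa922a0c3e2d4c569e1cd901fff71c59f07a5",
    "blake2b_256": "c96c4c77fc210d476893e0ecd3d2e2d86ba4c4b7d2178f9fce3feae7891d6b37",
}


def all_digests(path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute MD5, SHA256 and BLAKE2b-256 of a file, reading it only once."""
    hashers = {
        "md5": hashlib.md5(),
        "sha256": hashlib.sha256(),
        # BLAKE2b with a 32-byte (256-bit) output length
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():  # feed the same chunk to every hasher
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def mismatches(path, expected: Dict[str, str] = EXPECTED) -> Dict[str, str]:
    """Return {algorithm: actual_digest} for every digest that differs; empty means all match."""
    actual = all_digests(path)
    return {
        name: actual[name]
        for name, want in expected.items()
        if actual[name] != want.lower()
    }
```

`mismatches("vigorish-0.7.0-py3-none-any.whl")` should return an empty dict for an intact download; any key it returns names an algorithm whose digest did not match.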