sportsball
A library for pulling in and normalising sports stats.
Dependencies :globe_with_meridians:
Python 3.11.6:
- pandas
- requests
- requests-cache
- python-dateutil
- tqdm
- beautifulsoup
- openpyxl
- joblib
- pyarrow
- ipython
- pytz
- python-dotenv
- geocoder
- retry-requests
- timezonefinder
- nba_api
- pydantic
- flatten_json
- pygooglenews
- extruct
- wikipedia-api
- tweepy
- pytest-is-running
- PySocks
- func-timeout
- tenacity
- random_user_agent
- wayback
- cryptography
- feedparser
- dateparser
- playwright
- cchardet
- lxml
- gender-guesser
- scrapesession
- pyhigh
- datefinder
Raison D'être :thought_balloon:
sportsball aims to be a library for pulling in historical information about sporting games in a standardised form that is easy to process.
Its models are designed to work across many different sports.
The supported leagues are:
- 🏉 AFL
- 🏉 AFLW
- 🎾 ATP
- ⚽ BUNDESLIGA
- ⚽ EPL
- ⚽ FIFA
- 🐎 HKJC
- 🏏 IPL
- ⚽ LALIGA
- ⚾ MLB
- 🏀 NBA
- 🏀 NCAAB
- 🏀 NCAABW
- 🏈 NCAAF
- 🏈 NFL
- 🏒 NHL
- 🏀 WNBA
- 🎾 WTA
Architecture :triangular_ruler:
sportsball is an object-oriented library. The entities are organised like so:
- Game: A game within a season.
  - Team: The team within the game. Note that in games with individual players a team exists as a wrapper.
    - Player: A player within the team.
      - Address: The address information of a player's birth.
      - Owner: The owner of the player.
      - Venue: The college of the player.
    - Odds: The odds for the team to win the game.
      - Bookie: The bookie publishing the odds.
    - News: News about the team the day before the game.
    - Social: Social posts from the team the day before the game.
    - Coach: A coach for the team.
  - Venue: The venue the game was played in.
    - Address: The address information of a venue.
      - Weather: The weather at the address.
  - Dividend: The dividends the game pays out.
  - Umpire: The umpires adjudicating the game.
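To make the nesting concrete, here is a minimal sketch of a few of these entities and how they contain one another. It uses plain dataclasses so it runs without extra packages, whereas the library itself builds pydantic models. All class fields and the `player_names` helper are assumptions made for illustration and are not taken from the sportsball source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    city: str  # hypothetical field


@dataclass
class Venue:
    name: str  # hypothetical field
    address: Optional[Address] = None


@dataclass
class Player:
    name: str  # hypothetical field
    birth_address: Optional[Address] = None
    college: Optional[Venue] = None


@dataclass
class Team:
    name: str  # hypothetical field
    players: List[Player] = field(default_factory=list)


@dataclass
class Game:
    teams: List[Team] = field(default_factory=list)
    venue: Optional[Venue] = None


def player_names(game: Game) -> List[str]:
    """Walk Game -> Team -> Player and collect the player names."""
    return [player.name for team in game.teams for player in team.players]


# In an individual sport each player is wrapped in a one-player team.
match = Game(
    teams=[
        Team(name="A. Example", players=[Player(name="A. Example")]),
        Team(name="B. Example", players=[Player(name="B. Example")]),
    ],
    venue=Venue(name="Centre Court", address=Address(city="London")),
)
```

Wrapping individual players in a team keeps the traversal identical for team sports and individual sports, which is presumably why the wrapper exists.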
Caching
This library caches aggressively because of the large volume of data it needs. Requests about a recent game (generally within the last 7 days) bypass the cache. The cache layers are:
- A joblib disk cache that caches calls to pydantic model creation functions. This changes on every version update to keep the models in sync. This is the fastest cache.
- A requests cache backed by sqlite that caches requests forever.
- A lookup against the Wayback Machine; if an archived response is found, it is used.
It is strongly recommended that you define proxies in the PROXIES environment variable. The more proxies are available, the easier it is to collect data.
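The lookup order described in this section can be sketched roughly as follows. This is not the library's implementation: a plain dict stands in for the sqlite-backed requests cache, the archive and live fetchers are passed in as callables, and the names `fetch`, `is_recent`, and `RECENT_WINDOW` are hypothetical. The exact order in which the layers are consulted is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional

# Assumption: "generally in the last 7 days" is treated as a fixed window.
RECENT_WINDOW = timedelta(days=7)


def is_recent(game_time: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the game is inside the recent window (or in the future)."""
    now = now or datetime.now(timezone.utc)
    return now - game_time <= RECENT_WINDOW


def fetch(
    url: str,
    game_time: datetime,
    cache: Dict[str, str],
    archive_lookup: Callable[[str], Optional[str]],
    live_fetch: Callable[[str], str],
    now: Optional[datetime] = None,
) -> str:
    """Resolve a URL through the cache layers, bypassing them for recent games."""
    if is_recent(game_time, now):
        # Recent data may still change, so skip every cache layer.
        return live_fetch(url)
    if url in cache:
        return cache[url]
    # Try an archived copy before going to the live site.
    body = archive_lookup(url)
    if body is None:
        body = live_fetch(url)
    cache[url] = body
    return body
```

Passing the fetchers in as callables keeps the sketch free of network access, so the ordering can be checked with simple stand-in functions.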
Installation :inbox_tray:
This is a Python package hosted on PyPI, so to install it simply run the following command:
pip install sportsball
or install using this local repository:
python setup.py install --old-and-unmanageable
Usage example :eyes:
There are many different ways of using sportsball, but we generally recommend the CLI.
CLI
To fetch a dataframe containing information about a league, run the following command:
sportsball --league=nfl -
The final argument is the file to write to; in this case - means stdout.
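If you want to drive the CLI from a script, one option is to call it through `subprocess`. The sketch below assumes only the invocation shown above; the helper names `build_command` and `fetch_league` are hypothetical. Because the format written to stdout is not specified here, the output is returned as raw bytes and left unparsed.

```python
import subprocess
from typing import List


def build_command(league: str, output: str = "-") -> List[str]:
    """Build the argument list for the sportsball CLI call shown above."""
    return ["sportsball", f"--league={league}", output]


def fetch_league(league: str) -> bytes:
    """Run the CLI and return whatever it writes to stdout, unparsed."""
    result = subprocess.run(build_command(league), check=True, capture_output=True)
    return result.stdout
```

Calling `fetch_league("nfl")` requires the package to be installed and may take a long time on a cold cache.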
Python
To pull a dataframe containing all the information for a particular league, the following example can be used:
from sportsball import sportsball as spb

# Create the client, select a league, then flatten its games into a dataframe.
ball = spb.SportsBall()
league = ball.league(spb.League.AFL)
df = league.to_frame()
This results in a dataframe where each game is represented by all its features.
Environment
If you wish to use the providers that require API keys, you can create a .env file with the following variables inside it:
GOOGLE_API_KEY=APIKEY
GRIBSTREAM_API_KEY=APIKEY
X_API_KEY=APIKEY
X_API_SECRET_KEY=APISECRETKEY
X_ACCESS_TOKEN=ACCESSTOKEN
X_ACCESS_TOKEN_SECRET=ACCESSTOKENSECRET
PROXIES=CSVPROXIESLIST
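The PROXIES value is a comma-separated list. As an illustration of how such a value might be read, here is a small sketch using only the standard library. How sportsball itself parses the variable is not shown here, and the function names `parse_proxies` and `proxies_from_env` are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def parse_proxies(raw: Optional[str]) -> List[str]:
    """Split a comma-separated proxy list, dropping blanks and whitespace."""
    if not raw:
        return []
    return [item.strip() for item in raw.split(",") if item.strip()]


def proxies_from_env(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Read the PROXIES variable from the given environment mapping."""
    return parse_proxies(environ.get("PROXIES"))
```

For example, `PROXIES=http://proxy-a:8080,http://proxy-b:8080` would yield two entries. The variables in the .env file can be loaded into the process environment with python-dotenv, which is already listed as a dependency.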
License :memo:
The project is available under the MIT License.
Download files
Source Distribution
File details
Details for the file sportsball-0.34.148.tar.gz.
File metadata
- Download URL: sportsball-0.34.148.tar.gz
- Upload date:
- Size: 746.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eac2ffa9580dd09f78e5492b7679838d374c4204823cea544ebec585ae163de2 |
| MD5 | 31b141a9da39207ab987c1904958ba23 |
| BLAKE2b-256 | 6f628c6b2f8e23dae26e0de35ae4fb67bde40f7137bfca6e73d84f4d95566bc5 |