An API for scraping data from understat.com

understatAPI

This is a Python API for scraping data from understat.com. Understat is a website with football data for six European leagues, covering every season since 2014/15. The leagues available are the Premier League, La Liga, Ligue 1, Serie A, Bundesliga and the Russian Premier League.

Installation

To install the package, run:

pip install understatapi

If you would like to use the package with the latest development changes, you can clone this repo and install the package from source:

git clone git@github.com:collinb9/understatAPI understatAPI
cd understatAPI
python setup.py install

Quick Start


NOTE

This package is in very early stages of development and the API is likely to change


The API contains endpoints which reflect the structure of the understat website. The table below shows the endpoints and the pages on understat.com to which they correspond:

Endpoint Webpage
UnderstatClient.league https://understat.com/league/<league_name>
UnderstatClient.team https://understat.com/team/<team_name>/
UnderstatClient.player https://understat.com/player/<player_id>
UnderstatClient.match https://understat.com/match/<match_id>
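
As a rough illustration of the mapping in the table, the sketch below builds the page URL behind each endpoint name. The helper `endpoint_url` and the `URL_PATTERNS` dictionary are hypothetical names introduced here and are not part of understatapi. The sketch assumes match pages live under `/match/<match_id>`.

```python
# Hypothetical helper, not part of understatapi: maps endpoint names to page URLs.
BASE_URL = "https://understat.com"

URL_PATTERNS = {
    "league": "{base}/league/{identifier}",
    "team": "{base}/team/{identifier}",
    "player": "{base}/player/{identifier}",
    "match": "{base}/match/{identifier}",  # assumed path for match pages
}


def endpoint_url(endpoint: str, identifier: str) -> str:
    """Return the understat.com page that an endpoint name corresponds to."""
    try:
        pattern = URL_PATTERNS[endpoint]
    except KeyError:
        raise ValueError(f"unknown endpoint: {endpoint!r}") from None
    return pattern.format(base=BASE_URL, identifier=identifier)


print(endpoint_url("league", "EPL"))
print(endpoint_url("team", "Manchester_United"))
```

League and team identifiers are names, while player and match identifiers are numeric ids passed as strings.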

Every function in the public API corresponds to one of the tables visible on the understat webpage for the endpoint to which it belongs. Each function returns a pandas DataFrame with the relevant data. Below are some examples of using the API. Note that some of the functions in the league and team endpoints accept human-readable strings as identifiers, but player and match must receive an id number.

from understatapi import UnderstatClient

understat = UnderstatClient()
# get data for every player playing in the Premier League in 2019/20
league_player_data = understat.league(league="EPL").get_player_data(season="2019")
# Get the name and id of the player with the highest xG this season
# First we need to change the type of the 'xG' column, by default it is a string
league_player_data["xG"] = league_player_data["xG"].astype(float)
league_player_data = league_player_data.sort_values(by="xG", ascending=False)
player_id, player_name = league_player_data.iloc[0][["id", "player_name"]].values
# Get data for every shot this player has taken in a league match (for all seasons)
player_shot_data = understat.player(player=player_id).get_shot_data()

# Second example: match and roster data for a single team
from understatapi import UnderstatClient

understat = UnderstatClient()
# get data for every league match involving Manchester United
team_match_data = understat.team(team="Manchester_United").get_match_data(season="2019")
# get the id for the first match of the season
match_id = team_match_data.iloc[0]["id"]
# get the rosters for both teams in that match
roster_data = understat.match(match=match_id).get_roster_data()
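
The type conversion in the first example applies to any numeric column that arrives as a string. The sketch below uses a small hand-made DataFrame as a stand-in for a real response and converts several columns at once with `pandas.to_numeric`. The rows and values are invented for illustration and are not understat data, and the `goals` column name is an assumption.

```python
import pandas as pd

# Stand-in for a DataFrame returned by the API; all values are made up.
players = pd.DataFrame(
    {
        "id": ["1", "2", "3"],
        "player_name": ["Player A", "Player B", "Player C"],
        "xG": ["10.5", "18.2", "7.9"],
        "goals": ["9", "20", "8"],  # assumed column name
    }
)

# Convert every numeric-looking column from strings in one step.
numeric_columns = ["xG", "goals"]
players[numeric_columns] = players[numeric_columns].apply(pd.to_numeric)

# Sorting now compares numbers rather than strings.
top = players.sort_values(by="xG", ascending=False).iloc[0]
print(top["id"], top["player_name"])
```

Sorting the unconverted strings would order them lexicographically, so converting first avoids subtle ranking errors.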

You can also use the UnderstatClient class as a context manager, which persists some information about the session between requests and closes the session after it has been used. This is the recommended way to interact with the API.

from understatapi import UnderstatClient

with UnderstatClient() as understat:
    team_match_data = understat.team(team="Manchester_United").get_match_data()
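
To make the idea of a session-persisting context manager concrete, here is a minimal sketch of the general pattern. `FakeSession` and `SketchClient` are hypothetical classes invented for this illustration. They make no network requests and do not show how understatapi is actually implemented.

```python
class FakeSession:
    """Stand-in for an HTTP session; records URLs instead of requesting them."""

    def __init__(self) -> None:
        self.closed = False
        self.requested = []

    def get(self, url: str) -> str:
        if self.closed:
            raise RuntimeError("session is closed")
        self.requested.append(url)
        return f"response for {url}"

    def close(self) -> None:
        self.closed = True


class SketchClient:
    """Hypothetical client that shares one session across all requests."""

    def __init__(self) -> None:
        self.session = FakeSession()

    def __enter__(self) -> "SketchClient":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Close the session on exit, even if the body raised an exception.
        self.session.close()

    def fetch(self, url: str) -> str:
        return self.session.get(url)


with SketchClient() as client:
    client.fetch("https://understat.com/league/EPL")
    client.fetch("https://understat.com/team/Manchester_United")

print(client.session.closed)
print(len(client.session.requested))
```

Both requests go through the same session object, and the session is closed as soon as the `with` block ends.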

There are some more examples here (TODO: Add more examples and link to them).

For a full API reference, see the documentation (TODO: Add link to docs).

Contributing

If you find any bugs in the code or have any feature requests, please open an issue and I'll try to address it as soon as possible. If you would like to implement the changes yourself, you can make a pull request:

  • Clone this repo: git clone git@github.com:collinb9/understatAPI
  • Create a branch to work off: git checkout -b descriptive_branch_name
  • Make and commit your changes
  • Push your changes: git push
  • Come back to this page, and click on Pull requests -> New pull request

Before a pull request can be merged, the code has to pass a number of checks that are run using TravisCI. These checks are:

  • Check that the code has been formatted using black
  • Lint the code using pylint
  • Check type annotations using mypy
  • Run the unit tests and check that they have 100% coverage

These checks are in place to ensure a consistent style and quality across the code. To check whether the changes you have made will pass these checks, run:

pip install -r requirements.txt
pip install -r test_requirements.txt
chmod +x ./run_tests.sh
./run_tests.sh

Don't let these checks deter you from making a pull request. Make the changes for the new functionality or bug fix, and I will be happy to help get the code to a stage where it passes the checks.

Versioning

The versioning for this project follows the semantic versioning conventions.

TODO

  • Add functionality for using the search bar on understat
  • Make APIClient a context manager that allows you to persist a session
  • Create an async API alongside the current synchronous one
