
Allows you to fully download a Strava leaderboard and save it into a CSV file for further statistical analysis


About

Author

This script was written by Dominik Rappaport. You can contact me via email: dominik@rappaport.at.

Introduction

The Strava Segment Downloader is a Python script that downloads the full leaderboard of a given Strava segment. The data is stored in a CSV file, the de facto standard format for exchanging statistical data.

Why would you want to use this script?

Strava does not provide its users with advanced analysis methods for segment leaderboards. You cannot apply advanced filters or calculate statistical values like mean, median, or standard deviation. All of this can easily be done with software like R or Excel, and the CSV file generated by this script can be imported into these tools directly.
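As an illustration, the same statistics can also be computed with Python's standard library. The sketch below is not part of segment_downloader; it assumes a column named time that holds Strava-style durations such as 45s, 5:07 or 1:02:03. Check the header of your leaderboard_<segment id>.csv and adjust the column name and the parser if the actual format differs.

```python
from __future__ import annotations

import csv
import statistics
from typing import Iterable


def parse_duration(text: str) -> int:
    """Convert an assumed Strava-style time ("45s", "5:07", "1:02:03") to seconds."""
    seconds = 0
    for part in text.strip().rstrip("s").split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def summarize_times(lines: Iterable[str], column: str = "time") -> dict[str, float]:
    """Return count, mean, median and standard deviation (in seconds) of one column."""
    # "time" is a hypothetical column name; empty cells are skipped
    times = [parse_duration(row[column]) for row in csv.DictReader(lines) if row[column]]
    return {
        "count": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times),  # sample standard deviation, needs >= 2 values
    }
```

For a downloaded file, open it with open("leaderboard_12345678.csv", newline="") and pass the file object to summarize_times.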

Background details

Strava provides a public API for interacting with its data programmatically. That would be the most natural way of fetching the leaderboard data. Unfortunately, Strava deprecated the API endpoint for downloading leaderboards in 2020. This link provides more information:

https://developers.strava.com/docs/segment-changes/

The background of that controversial decision is described in an article by the well-known cycling blogger DC Rainmaker:

https://www.dcrainmaker.com/2020/05/strava-leaderboard-reduces.html

As a consequence, traditional screen scraping is the only remaining way to get that data. Because Strava's website makes extensive use of JavaScript, the leaderboard is not contained in the raw HTML that a parser like BeautifulSoup sees. We therefore use Selenium to remote-control a real browser, which executes the JavaScript and renders the page.

Challenges that come with screen scraping

Screen scraping is a fragile method of getting data from a website. The website's structure may change at any time, and the script may break as a consequence.

In addition, Strava imposes measures to prevent people from doing exactly that. In particular, it limits the number of requests you can make to its website. If you exceed that limit, you will be blocked from accessing the leaderboard data for a certain period of time (typically 24 hours).

To cope with this, you can interrupt the script with Ctrl+C (SIGINT) and continue on another day. With the --resume switch, the script continues where it left off. Obviously, this may introduce inconsistencies in the data, as the leaderboard may have changed in the meantime.
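Interrupt-and-resume behaviour of this kind usually follows a common pattern: the first SIGINT only sets a flag, the loop finishes the page it is working on, and the progress is written to disk so that a later run can pick it up. The sketch below illustrates that pattern with Python's signal module. It is not the script's actual code; the JSON state file and the list of page-fetching callables are hypothetical stand-ins for the Selenium logic.

```python
from __future__ import annotations

import json
import signal
from pathlib import Path
from typing import Callable, Sequence


class GracefulInterrupt:
    """Turn SIGINT (Ctrl+C) into a flag instead of an immediate KeyboardInterrupt."""

    def __init__(self) -> None:
        self.requested = False
        self._previous = signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame) -> None:
        self.requested = True

    def restore(self) -> None:
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGINT, previous)


def download(pages: Sequence[Callable[[], list]], state_file: Path, resume: bool = False) -> bool:
    """Fetch pages in order, saving progress to state_file. Returns True when complete."""
    state = {"next_page": 0, "rows": []}
    if resume and state_file.exists():
        state = json.loads(state_file.read_text())  # continue where the last run stopped
    guard = GracefulInterrupt()
    try:
        for index in range(state["next_page"], len(pages)):
            if guard.requested:
                break  # stop between pages, never in the middle of one
            state["rows"].extend(pages[index]())
            state["next_page"] = index + 1
    finally:
        guard.restore()
        state_file.write_text(json.dumps(state))  # also runs if a page raises an error
    return state["next_page"] == len(pages)
```

Saving in a finally block means progress is kept even when a page fails, which is the situation the rate limit described below tends to produce.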

Strava also throttles clients that access the site too frequently, which again works against us. When this throttling kicks in, the website freezes in the "Loading" state, and the script typically fails with the following error message:

Error: Can't navigate to the next page (Element <a href="/segments/..."> is not clickable at point (856,935) because another element <div class="loading-panel"> obscures it).

Please refer to the section Usage with large segments for further details on how to deal with these challenges.

How to use the script

Installation

segment_downloader is distributed as a Python package, and several installation methods are available.

Using pip

Running pip installs the package into your current Python environment. Installing into the system-wide Python used to be possible, but many modern Linux distributions now refuse this (the system interpreter is marked as "externally managed", see PEP 668).

pip install segment_downloader

Using pipx or uv

Both pipx and uv enable global tool installation. The package can be installed as follows:

pipx install segment_downloader

or

uv tool install segment_downloader

Usage

Selenium starts the browser with a blank profile, so we have to log in to Strava first. Logging in on every run is risky: if you use the script frequently, Strava may temporarily block your account. To avoid this, a separate authentication script logs in to Strava once and saves the session cookies to a file. The main script then uses this file to authenticate.

Username and password are read from environment variables. I decided to use environment variables instead of command-line parameters to make it easier to run the script programmatically, for example in GitHub Actions together with GitHub secrets.

export STRAVA_USERNAME="your_username"
export STRAVA_PASSWORD="your_password"
segment_downloader_authenticate
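Reading credentials from the environment takes only a few lines of Python. The helper below is hypothetical and not taken from the package; it shows how a script can fail early with a clear message when one of the two variables is missing.

```python
from __future__ import annotations

import os

REQUIRED = ("STRAVA_USERNAME", "STRAVA_PASSWORD")


def read_credentials() -> tuple[str, str]:
    """Return (username, password) from the environment or exit with a clear message."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"Missing environment variable(s): {', '.join(missing)}")
    return os.environ["STRAVA_USERNAME"], os.environ["STRAVA_PASSWORD"]
```

In GitHub Actions, both variables would typically be filled from repository secrets through the env key of the workflow step that runs segment_downloader_authenticate.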

The script saves the cookies in a file cookies.pkl. As of today, the filename is hardcoded.
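Since the file ends in .pkl, it presumably contains a pickled list of cookies. A common way to do this with Selenium is sketched below. It relies on the WebDriver methods get_cookies() and add_cookie(), but it illustrates the technique rather than the package's own implementation, and the function names are hypothetical.

```python
from __future__ import annotations

import pickle
from pathlib import Path

COOKIE_FILE = Path("cookies.pkl")


def save_cookies(driver, path: Path = COOKIE_FILE) -> None:
    """Store the browser's cookies (a list of dicts from get_cookies()) in a pickle file."""
    with path.open("wb") as fh:
        pickle.dump(driver.get_cookies(), fh)


def load_cookies(driver, path: Path = COOKIE_FILE) -> int:
    """Add previously saved cookies to the browser; returns how many were added."""
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with path.open("rb") as fh:
        cookies = pickle.load(fh)
    # The browser must already show a strava.com page, because add_cookie()
    # only accepts cookies for the domain that is currently loaded.
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

With a real browser, the main script would first open a Strava page (for example driver = webdriver.Firefox() followed by driver.get("https://www.strava.com")), then call load_cookies(driver) and reload the page.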

Then you can run the main script passing the segment ID as a command line parameter:

segment_downloader 12345678

The script will download the leaderboard of the segment with the ID 12345678. It creates a CSV file with the name leaderboard_12345678.csv.

You can interrupt the script at any time using Ctrl+C, as described in the section Challenges that come with screen scraping. To continue where you left off, use the --resume switch:

segment_downloader --resume 12345678

Usage with large segments

To work around Strava's rate limit, we recommend the following strategy:

  1. Download the segment in smaller chunks. At the moment, the script aborts with an error when Strava blocks it, and the download effort of that run is lost.
  2. Use Ctrl+C to interrupt the download and the --resume option to continue it. Repeat this as often as necessary.
  3. This can be automated with a tool like gtimeout (GNU timeout, installed under that name on macOS by Homebrew's coreutils package). The following example illustrates how to download a large segment in chunks of 10 minutes.
# First download
gtimeout -f -s INT 10m segment_downloader 2891805
# Call the script with the resume option as often as needed by repeating the following line:
gtimeout -f -s INT 10m segment_downloader --resume 2891805

Note: By default, gtimeout sends the signal again if the script doesn't exit immediately. Since the script catches SIGINT to save its data, a second signal breaks it. This will hopefully be fixed in the future; in the meantime, the -f option prevents it. Of course, the approach above could also be implemented manually, without a tool like gtimeout.
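A manual variant in Python could look like the sketch below. It starts the downloader with subprocess, sends exactly one SIGINT when a chunk's time is up, waits for the script to save its data, and repeats with --resume. It assumes that the command is the segment_downloader entry point from the Usage section and that the script ending on its own (rather than being interrupted) means the run is over; the exit code is returned so that errors such as the rate-limit failure described above stay visible. POSIX only, since it sends SIGINT to a child process.

```python
from __future__ import annotations

import signal
import subprocess


def run_chunk(cmd: list[str], seconds: float) -> tuple[bool, int]:
    """Run cmd for at most `seconds`; returns (exited_on_its_own, return_code)."""
    proc = subprocess.Popen(cmd)
    try:
        return True, proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # one SIGINT only, so the script can save its data
        return False, proc.wait()  # no timeout here: let the script finish saving


def download_in_chunks(segment_id: str, seconds: float = 600, max_rounds: int = 100,
                       command: list[str] | None = None) -> int:
    """Repeat chunked runs, adding --resume after the first, until the script exits by itself."""
    base = command if command is not None else ["segment_downloader"]
    for round_no in range(max_rounds):
        extra = ["--resume"] if round_no > 0 else []
        finished, code = run_chunk([*base, *extra, segment_id], seconds)
        if finished:
            return code  # 0 presumably means done; anything else deserves a look
    raise RuntimeError(f"segment {segment_id} not finished after {max_rounds} chunks")
```

Calling download_in_chunks("2891805") mirrors the gtimeout example with 10-minute chunks.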

Notes and Warnings

  • The script uses the Firefox browser and expects the Strava pages to be in English. It may fail if the pages are in a different language, because elements such as buttons and category filters are identified by their labels.
  • The script tries to compile a single leaderboard with all data in one table. In Strava, however, attributes such as age group, sex and weight class are not included in the full leaderboard table. For example, you can only tell whether a user is male or female if their entry appears when the respective filter is applied. To get the full data, the script downloads the leaderboard for each category separately and joins the tables.
  • Please note that no user is obliged to specify their sex, weight or age, or to keep these values up to date. You may therefore end up with missing or incorrect data in these columns.
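The joining step can be pictured as follows. In this hypothetical sketch, each filtered leaderboard (for example the one filtered by sex) is reduced to the list of athlete IDs it contains, and every row of the full table receives the label of the filter under which the athlete appeared. Athletes who never appear under any filter of a category keep an empty value, which is exactly the missing data mentioned above. The key athlete_id and the column names are assumptions, not the script's real CSV layout.

```python
from __future__ import annotations


def join_categories(full_rows: list[dict[str, str]],
                    category_tables: dict[str, dict[str, list[str]]],
                    key: str = "athlete_id") -> list[dict[str, str]]:
    """Add one column per category (e.g. "sex") to the rows of the full leaderboard.

    category_tables maps a column name to {filter label: athlete IDs shown under that filter}.
    """
    # Invert each category into an athlete -> label lookup table
    lookups = {
        column: {athlete: label for label, athletes in by_label.items() for athlete in athletes}
        for column, by_label in category_tables.items()
    }
    joined = []
    for row in full_rows:
        new_row = dict(row)  # leave the input rows untouched
        for column, lookup in lookups.items():
            # Athletes missing from every filtered list get an empty value
            new_row[column] = lookup.get(row[key], "")
        joined.append(new_row)
    return joined
```

The joined rows can then be written back to CSV with csv.DictWriter.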



Download files


Source Distribution

segment_downloader-0.1.0.tar.gz (8.5 kB)

Built Distribution


segment_downloader-0.1.0-py3-none-any.whl (14.3 kB)

File details

Details for the file segment_downloader-0.1.0.tar.gz.

File metadata

  • Download URL: segment_downloader-0.1.0.tar.gz
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.1

File hashes

Hashes for segment_downloader-0.1.0.tar.gz
  • SHA256: 58e5ef784ff454fccd5d7ffc190147d6d5c4e8da2c2d4585d97c3eeb57252b0e
  • MD5: ebeeeecd93d28b17f57a2a4f72fdb021
  • BLAKE2b-256: 75430de7662d456fbd6b33ba5e8ee3e076b9b00e74cb63cdd55cfa4044c4842e


File details

Details for the file segment_downloader-0.1.0-py3-none-any.whl.


File hashes

Hashes for segment_downloader-0.1.0-py3-none-any.whl
  • SHA256: 00145d93c84711cfb03289a5afb83800dad15aa1817ecc00fcd841c47af1aab2
  • MD5: fe5b3d9793215dbe51261cec56ef8ddf
  • BLAKE2b-256: 62d32a9a81c6758720d34733b4c3d8386c770533d50d64fe863776b12c648bb3

