Gandula grabs football data and puts it back into play.

gandula

gandula is a Python library developed by the Sports Analytics Lab at the Federal University of Minas Gerais (UFMG) for working with Gradient football tracking and event data.

Gandula is the Brazilian Portuguese word for ball boy. The term dates back to the 1930s and derives from "gandulo", which in archaic Portuguese means slacker or beggar. At the time, it was used for idle boys who did nothing but watch football at the pitches of Rio and who helped out by fetching balls kicked out of play. In 1939, Clube de Regatas Vasco da Gama hired the Argentinian striker Bernardo Gandulla, who was known for returning the ball as a gesture of fair play, and the word gandula then spread across the country. In our gandula, the ball is the data, and the data scientists and analysts are the stars of the game.


Data Sources

gandula supports two data sources:

  • S3 tracking data — Stream or download tracking frames (player/ball positions at 30fps) directly from Gradient's S3 bucket. Requires AWS credentials (PFF_AWS_ACCESS_KEY_ID / PFF_AWS_SECRET_ACCESS_KEY). Each match includes tracking data (.jsonl.bz2), metadata (metadata.json), and rosters (rosters.json).

  • Local event data (Gradient v2.6) — Load match events from local JSON files in the Gradient v2.6 format ({game_id}.json). Each file contains possession events with embedded tracking snapshots, video URLs, grades, and more.
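Because each S3 tracking file is bzip2-compressed JSON Lines (.jsonl.bz2), a downloaded copy can also be inspected without gandula, for example to peek at the frame schema before loading a whole match. The sketch below uses only the standard library; it assumes one JSON object per line and makes no assumption about the fields inside each record. `iter_jsonl_bz2` is a hypothetical helper, not part of gandula.

```python
import bz2
import json
from typing import Any, BinaryIO, Iterator, Union


def iter_jsonl_bz2(source: Union[str, BinaryIO]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a .jsonl.bz2 file.

    `source` may be a path or an already-open binary file object.
    Lines are decompressed and parsed lazily, so a full match never has
    to fit in memory at once.
    """
    with bz2.open(source, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, `next(iter_jsonl_bz2(path))` returns only the first frame record of a local tracking file, which is a cheap way to see its structure.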


Quick Start

Installation (development)

git clone git@github.com:SALabUFMG/gandula.git
cd gandula

Set up the environment with uv:

uv sync

For S3 access:

uv sync --extra s3

For pitch control (requires PyTorch):

uv sync --extra pitch-control

Setup

Create a .env file in the project root with your AWS credentials:

PFF_AWS_ACCESS_KEY_ID='your_access_key'
PFF_AWS_SECRET_ACCESS_KEY='your_secret_key'

Then load them in your shell before running code:

export $(cat .env | xargs)

Or set them directly in Python before calling any gandula S3 function:

import os
os.environ['PFF_AWS_ACCESS_KEY_ID'] = 'your_access_key'
os.environ['PFF_AWS_SECRET_ACCESS_KEY'] = 'your_secret_key'

All gandula S3 functions will pick up the credentials automatically.
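If you would rather not export the variables by hand, the .env file can be read from Python. The python-dotenv package's `load_dotenv()` is the usual tool for this; the standard-library sketch below is an alternative that only handles simple `KEY='value'` lines like the ones above (no multi-line values, variable expansion, or escape sequences) and by default does not overwrite variables that are already set. `parse_env_lines` and `load_env_file` are hypothetical helper names, not gandula functions.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=value / KEY='value' lines, skipping blanks and comments."""
    parsed: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):  # tolerate `export KEY=value` lines
            key = key[len("export "):].strip()
        value = value.strip()
        # Strip one pair of matching surrounding quotes, as in the example .env.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        parsed[key] = value
    return parsed


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read `path` and copy its variables into os.environ."""
    with open(path, encoding="utf-8") as fh:
        parsed = parse_env_lines(fh)
    for key, value in parsed.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Calling `load_env_file()` once at the top of a script or notebook, before any S3 call, has the same effect as the shell `export` line.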


Usage

The best way to get started is the walkthrough notebook, which covers every major feature end-to-end:

  1. Loading event data from Gradient v2.6 JSON files
  2. Exploring and filtering events by type
  3. Converting events to DataFrames
  4. Loading tracking data from S3 (frames, metadata, rosters)
  5. Visualizing tracking frames (single frame & animated sequences)
  6. Converting frames to DataFrames
  7. Joining events with tracking data
  8. Exporting frames as GIF, PNG, and MP4
  9. Feature engineering (player speed, ball speed)
  10. Pitch coordinate transformation
  11. Pitch control computation & visualization
  12. Accessing video URLs
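Item 9 refers to player and ball speed. As a rough illustration of what such a feature involves (not gandula's own implementation), speed can be estimated by finite differences between consecutive frames at the 30 fps rate stated for the S3 tracking data. The sketch assumes a list of (x, y) positions for a single player or the ball, one per frame with no gaps; `speeds_from_positions` and `moving_average` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

FPS = 30  # frame rate stated for the S3 tracking data


def speeds_from_positions(
    positions: Sequence[Tuple[float, float]], fps: float = FPS
) -> List[float]:
    """Speed between each pair of consecutive frames (distance units per second).

    Returns len(positions) - 1 values; value i covers frames i -> i + 1.
    """
    return [
        math.hypot(x1 - x0, y1 - y0) * fps
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing moving average to damp frame-level jitter."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total / min(i + 1, window))
    return out
```

If positions are in metres, speeds come out in m/s. Some smoothing is usually worthwhile, because differencing multiplies any position noise by the frame rate.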

Quick examples

import gandula

# --- Event data ---
events = gandula.get_events('41177.json')
df = gandula.gradient_events_to_dataframe(events)

# --- S3 tracking data ---
matches = gandula.list_s3_matches(competition_id=1, season='2025-2026')
frames = gandula.get_s3_frames(matches[0])

# --- Visualize ---
gandula.view(frames[0])

# --- Export ---
gandula.export(frames[100:200], fmt='gif', filename='play')

# --- Pitch control ---
from gandula.utils import compute_pitch_control_from_frames

result = compute_pitch_control_from_frames(
    frames,
    attacking_team='home',
    start_frame=frames[100].frame_id,
    end_frame=frames[400].frame_id,
    period=1,
)
gandula.view(result, frame_index=0)
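The pitch control call above uses gandula's own model, with PyTorch as the optional dependency. For intuition about what a "share of the pitch controlled" means, here is a much cruder stand-in. It is a nearest-player (Voronoi-style) partition that assigns each grid cell to whichever team has the closer player and ignores velocities, reaction times, and ball travel. This is not gandula's model. The 105 x 68 m pitch, the centre-spot origin, and the name `nearest_player_control` are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nearest_player_control(
    home: Sequence[Point],
    away: Sequence[Point],
    length: float = 105.0,
    width: float = 68.0,
    nx: int = 50,
    ny: int = 32,
) -> float:
    """Share of grid cells closer to a home player than to any away player.

    Coordinates are assumed to be metres with the origin at the centre spot.
    Cells exactly equidistant from both teams count as half controlled.
    """
    if not home or not away:
        raise ValueError("both teams need at least one player")
    dx, dy = length / nx, width / ny
    home_share = 0.0
    for i in range(nx):
        x = -length / 2 + (i + 0.5) * dx  # cell centre, x
        for j in range(ny):
            y = -width / 2 + (j + 0.5) * dy  # cell centre, y
            d_home = min(math.hypot(x - px, y - py) for px, py in home)
            d_away = min(math.hypot(x - px, y - py) for px, py in away)
            if d_home < d_away:
                home_share += 1.0
            elif d_home == d_away:
                home_share += 0.5
    return home_share / (nx * ny)
```

A full pitch control model replaces the plain distance with an estimated time to reach each location, which is why it needs heavier computation.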

More notebooks

  • pff-load-from-json.ipynb: Load and explore Gradient v2.6 event data
  • pff-data-transformation.ipynb: Transform events to DataFrames, filter, and group
  • pff-search.ipynb: Search events by type and extract video URLs
  • pff-tracking.ipynb: Load, visualize, and export S3 tracking data
  • pff-defensive-line-height.ipynb: Defensive line metric from tracking data
  • pff-events-withing-tracking-to-pandas.ipynb: Join events with tracking data in pandas
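Joining events with tracking data comes down to finding, for each event, the matching tracking frame. The notebook may do this differently, for example via frame ids or the tracking snapshots embedded in v2.6 events. The sketch below shows one generic approach, nearest-timestamp matching with binary search, under stated assumptions: event and frame times are in seconds on the same clock within one period, and frame times are sorted. `nearest_frame_index` and `join_events_to_frames` are hypothetical helpers.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_frame_index(frame_times: Sequence[float], t: float) -> int:
    """Index of the frame whose timestamp is closest to t.

    frame_times must be sorted ascending; exact ties go to the earlier frame.
    """
    if not frame_times:
        raise ValueError("no frames to match against")
    i = bisect_left(frame_times, t)  # first frame with time >= t
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    before, after = frame_times[i - 1], frame_times[i]
    return i if after - t < t - before else i - 1


def join_events_to_frames(
    event_times: Sequence[float], frame_times: Sequence[float]
) -> List[int]:
    """Nearest frame index for every event time (O(log n) per event)."""
    return [nearest_frame_index(frame_times, t) for t in event_times]
```

Events before the first or after the last frame snap to the boundary frame, so in practice a tolerance check (for example, discarding matches more than one frame interval away) is worth adding.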

Development

Install dev dependencies and pre-commit hooks:

uv sync --extra dev
pre-commit install

Run tests:

uv run pytest tests/

License & Copyright

The main image is "Ballkid at soccer, China" by Micah Sittig, licensed under CC BY 2.0.

Download files

Source Distribution

gandula-2.0.0.tar.gz (3.5 MB)

Built Distribution

gandula-2.0.0-py3-none-any.whl (46.8 kB)

File details

Details for the file gandula-2.0.0.tar.gz.

File metadata

  • Download URL: gandula-2.0.0.tar.gz
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.19

File hashes

Hashes for gandula-2.0.0.tar.gz
  • SHA256: 489783d0c4ad464807786b34896483785e5acf831c7948718d00180c5758f501
  • MD5: 5765d2091a36a9c29e9dbb1370e13bf0
  • BLAKE2b-256: b767f6343a437aec492782c7ba837ab5d74436f517a721890cdb14e40ceccd9a

File details

Details for the file gandula-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: gandula-2.0.0-py3-none-any.whl
  • Size: 46.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.19

File hashes

Hashes for gandula-2.0.0-py3-none-any.whl
  • SHA256: 9116e7bf54f58ea99c9f09d5bc614b39491e3413e87027f8fc7080419fb087bb
  • MD5: 05df6c296191e4d09ea20aaaacf0eab5
  • BLAKE2b-256: 0d0d5ba10bdfba6a349a79b47a6504679686025d93ccb7bb72e679c3dccc9f11
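The published digests let you check a downloaded file before installing it. From the command line, `pip hash <file>` prints its SHA256. The standard-library sketch below does the same check in Python. `sha256_of_stream` and `verify_download` are hypothetical helper names, and the expected value is the SHA256 listed for the wheel.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO, Union


def sha256_of_stream(fh: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a binary stream, read in chunks rather than all at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected_sha256: str) -> bool:
    """True if the file at `path` matches the published SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # SHA256 published for the wheel; adjust the path to where you saved it.
    wheel = Path("gandula-2.0.0-py3-none-any.whl")
    expected = "9116e7bf54f58ea99c9f09d5bc614b39491e3413e87027f8fc7080419fb087bb"
    if wheel.exists():
        print("SHA256 OK" if verify_download(wheel, expected) else "SHA256 MISMATCH")
    else:
        print(f"{wheel} not found")
```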
