Gandula grabs football data and puts it back into play.

gandula

gandula is a Python library developed by the Sports Analytics Lab at the Federal University of Minas Gerais (UFMG) for working with PFF football tracking and event data.

Gandula is the Brazilian Portuguese word for ball boy. It dates from the 1930s and derives from "gandulo", an archaic Portuguese word meaning slacker or beggar. In the 30s, the word came to refer to idle boys who spent their days watching football at the pitches in Rio. These "gandulas" helped out by fetching balls kicked out of play. In 1939, Clube de Regatas Vasco da Gama hired the Argentinian striker Bernardo Gandulla, who was known for returning the ball as a gesture of fair play, and the term gandula then spread across the country. In our gandula, the ball is the data, and the data scientists and analysts are the stars of the game.


Data Sources

gandula supports two data sources:

  • S3 tracking data — Stream or download tracking frames (player/ball positions at 30fps) directly from PFF's S3 bucket. Requires AWS credentials (PFF_AWS_ACCESS_KEY_ID / PFF_AWS_SECRET_ACCESS_KEY). Each match includes tracking data (.jsonl.bz2), metadata (metadata.json), and rosters (rosters.json).

  • Local event data (Gradient v2.6) — Load match events from local JSON files in the Gradient v2.6 format ({game_id}.json). Each file contains possession events with embedded tracking snapshots, video URLs, grades, and more.
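To give a feel for the .jsonl.bz2 tracking format mentioned above, here is a minimal sketch that streams a bz2-compressed JSON Lines file using only the Python standard library. It is not gandula's own loader, and the demo file and its field names (`frame`, `ball`, `x`, `y`) are made up for illustration; the real PFF schema may differ.

```python
import bz2
import json
import tempfile
from pathlib import Path


def iter_jsonl_bz2(path):
    """Yield one decoded JSON object per non-empty line of a .jsonl.bz2 file."""
    # bz2.open in text mode decompresses lazily, so the file is streamed.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_demo_file(path):
    """Write two made-up frames; the field names are hypothetical."""
    demo_frames = [
        {"frame": 0, "ball": {"x": 0.0, "y": 0.0}},
        {"frame": 1, "ball": {"x": 0.4, "y": 0.1}},
    ]
    with bz2.open(path, mode="wt", encoding="utf-8") as handle:
        for frame in demo_frames:
            handle.write(json.dumps(frame) + "\n")


demo_path = Path(tempfile.mkdtemp()) / "demo.jsonl.bz2"
write_demo_file(demo_path)
loaded_frames = list(iter_jsonl_bz2(demo_path))
print(len(loaded_frames), loaded_frames[1]["ball"]["x"])
```

Because the reader is a generator, a full match can be processed frame by frame without holding the decompressed file in memory.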


Quick Start

Installation (development)

git clone git@github.com:SALabUFMG/gandula.git
cd gandula

Set up the environment with uv:

uv sync

For S3 access:

uv sync --extra s3

For pitch control (requires PyTorch):

uv sync --extra pitch-control

Setup

Create a .env file in the project root with your AWS credentials:

PFF_AWS_ACCESS_KEY_ID='your_access_key'
PFF_AWS_SECRET_ACCESS_KEY='your_secret_key'

Then load them at the top of your scripts or notebooks:

from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())

All gandula S3 functions will pick up the credentials automatically.
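Before calling any S3 function, it can help to confirm that both variables are actually set. The check below is a small sketch using only the standard library; `missing_pff_credentials` is a hypothetical helper, not part of gandula.

```python
import os

# Variable names taken from the Setup section above.
REQUIRED_VARS = ("PFF_AWS_ACCESS_KEY_ID", "PFF_AWS_SECRET_ACCESS_KEY")


def missing_pff_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_pff_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Both PFF credentials are set.")
```

Run it after `load_dotenv(...)` so that values from the .env file are already in the environment.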


Usage

The best way to get started is the walkthrough notebook, which covers every major feature end-to-end:

  1. Loading event data from Gradient v2.6 JSON files
  2. Exploring and filtering events by type
  3. Converting events to DataFrames
  4. Loading tracking data from S3 (frames, metadata, rosters)
  5. Visualizing tracking frames (single frame & animated sequences)
  6. Converting frames to DataFrames
  7. Joining events with tracking data
  8. Exporting frames as GIF, PNG, and MP4
  9. Feature engineering (player speed, ball speed)
  10. Pitch coordinate transformation
  11. Pitch control computation & visualization
  12. Accessing video URLs
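As an illustration of the feature-engineering step (player speed, ball speed), the sketch below estimates speed from consecutive positions sampled at 30fps. It is a generic finite-difference calculation, not necessarily how gandula or the notebook computes it; the function name, the plain `(x, y)` tuples, and the sample track are assumptions made for this example.

```python
import math

FPS = 30  # tracking frame rate stated in the Data Sources section


def speeds_from_positions(positions, fps=FPS):
    """Return the speed between consecutive (x, y) samples, in units per second."""
    dt = 1.0 / fps  # time elapsed between two consecutive frames
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


# Hypothetical track: 0.3 units per frame along x, then stationary.
track = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (0.6, 0.0)]
print(speeds_from_positions(track))
```

If the coordinates are in meters, the result is in meters per second. Raw frame-to-frame differences tend to be noisy, so smoothing over a few frames is a common follow-up.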

Quick examples

import gandula

# --- Event data ---
events = gandula.get_events('41177.json')
df = gandula.gradient_events_to_dataframe(events)

# --- S3 tracking data ---
matches = gandula.list_s3_matches(competition_id=1, season='2025-2026')
frames = gandula.get_s3_frames(matches[0])

# --- Visualize ---
gandula.view(frames[0])

# --- Export ---
gandula.export(frames[100:200], fmt='gif', filename='play')

# --- Pitch control ---
from gandula.utils import compute_pitch_control_from_frames

result = compute_pitch_control_from_frames(
    frames,
    attacking_team='home',
    start_frame=frames[100].frame_id,
    end_frame=frames[400].frame_id,
    period=1,
)
gandula.view(result, frame_index=0)
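The examples above slice frames by list index (for instance `frames[100:200]`). Since tracking is sampled at 30fps, a time window can be turned into such a slice with simple arithmetic. The helpers below are hypothetical and assume the list holds consecutive frames with no gaps and that index 0 corresponds to time zero, which may not hold across periods or dropped frames.

```python
FPS = 30  # frame rate stated in the Data Sources section


def frame_slice(start_s, end_s, fps=FPS):
    """Map a time window in seconds to a slice of frame indices."""
    if end_s < start_s:
        raise ValueError("end_s must not be smaller than start_s")
    return slice(round(start_s * fps), round(end_s * fps))


def duration_seconds(n_frames, fps=FPS):
    """Duration covered by n_frames consecutive frames."""
    return n_frames / fps


window = frame_slice(10.0, 20.0)
print(window.start, window.stop)
print(duration_seconds(200 - 100))
```

Under these assumptions, `frames[frame_slice(10.0, 20.0)]` would select ten seconds of play, and the 100 frames in `frames[100:200]` cover a little over three seconds.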

More notebooks

Notebook                                     What it shows
pff-load-from-json.ipynb                     Load and explore Gradient v2.6 event data
pff-data-transformation.ipynb                Transform events to DataFrames, filter, group
pff-search.ipynb                             Search events by type, extract video URLs
pff-tracking.ipynb                           Load, visualize, and export S3 tracking data
pff-defensive-line-height.ipynb              Defensive line metric from tracking data
pff-events-withing-tracking-to-pandas.ipynb  Join events with tracking data in pandas

Documentation


Development

Install dev dependencies and pre-commit hooks:

uv sync --extra dev
pre-commit install

Run tests:

uv run pytest tests/

License & Copyright

The main image is "Ballkid at soccer, China" by Micah Sittig, licensed under CC BY 2.0.
