Skip to main content

Standardizing soccer tracking- and event data

Project description

kloppy: standardizing soccer tracking- and event data

klop·pen (klopte, heeft geklopt) - juist zijn; overeenkomen, uitkomen met: dat klopt, dat kan kloppen is juist; dat klopt als een zwerende vinger dat is helemaal juist

PyPI Latest Release Downloads Powered by PySport

What is it?

kloppy is a Python package providing (de)serializers for soccer tracking- and event data, standardized data models, filters, and transformers designed to make working with different tracking- and event data like a breeze. It aims to be the fundamental building blocks for loading, filtering and tranforming tracking- and event data.

Main Features

Here are just a few of the things that kloppy does well:

Loading data

  • Directly load Public datasets to get started right away.
  • Understandable Standardized models for tracking- and event datasets
  • Out-of-the-box (De)serializing tracking- and event data from different sources into standardized models and vice versa

Processing data

  • Flexible pitch dimensions transformer for changing a dataset pitch dimensions from one to another (eg OPTA's 100x100 -> TRACAB meters)
  • Intelligent orientation transforming orientation of a dataset (eg from TRACAB fixed orientation to "Home Team" orientation)

Pattern matching

  • Search for complexe patterns in event data.
  • Use kloppy-query to export fragments to XML file

Where to get it

The source code is currently hosted on GitHub at: https://github.com/PySport/kloppy

Installers for the latest released version are available at the Python package index.

# or PyPI
pip install kloppy

Quickstart

We added some helper functions to get started really quickly. The helpers allow eay loading, transforming and converting to pandas of tracking data.

from kloppy import (
    load_metrica_tracking_data, 
    load_tracab_tracking_data,
    load_epts_tracking_data, 
    load_statsbomb_event_data,
    load_opta_event_data,
    to_pandas, 
    transform
)

# metrica data
dataset = load_metrica_tracking_data('home_file.csv', 'away_file.csv')
# or tracab
dataset = load_tracab_tracking_data('meta.xml', 'raw_data.txt')
# or epts
dataset = load_epts_tracking_data('meta.xml', 'raw_data.txt')

# or event data: statsbomb
dataset = load_statsbomb_event_data('event_data.json', 'lineup.json')
# opta
dataset = load_opta_event_data('f24_data.xml', 'f7_data.xml')


dataset = transform(dataset, to_pitch_dimensions=[[0, 108], [-34, 34]])
pandas_data_frame = to_pandas(dataset)

Public datasets / Very quick start

More and more companies are publishing (demo) datasets to get you started. Inspired by the tensorflow_datasets package, we added a "dataset loader" which does all the heavy lifting for you: find urls, download files, organize and load them.

from kloppy import datasets

dataset = datasets.load("metrica_tracking", options={'sample_rate': 1./12, 'limit': 10})

Standardized models

Most providers use different names for the same thing. This module tries to model the real world as much as possible. Understandable models are important and in some cases this means performance is subordinate to models that are easy to reason about. Please browse to source of domain.models to find the available models.

(De)serializing data

When working with tracking- or event data we need to deserialize it from the format the provider uses. kloppy will provide both deserializing as serializing. This will make it possible to read format one, transform and filter and store in a different format.

from kloppy import TRACABSerializer

serializer = TRACABSerializer()

with open("tracab_data.dat", "rb") as raw, \
        open("tracab_metadata.xml", "rb") as meta:

    dataset = serializer.deserialize(
        inputs={
            'raw_data': raw,
            'meta_data': meta
        },
        options={
            "sample_rate": 1 / 12
        }
    )
    
    # start working with dataset

or Metrica data

from kloppy import MetricaTrackingSerializer

serializer = MetricaTrackingSerializer()

with open("Sample_Game_1_RawTrackingData_Away_Team.csv", "rb") as raw_away, \
        open("Sample_Game_1_RawTrackingData_Home_Team.csv", "rb") as raw_home:

    dataset = serializer.deserialize(
        inputs={
            'raw_data_home': raw_home,
            'raw_data_away': raw_away
        },
        options={
            "sample_rate": 1 / 12
        }
    )
    
    # start working with dataset

or EPTS data

from kloppy import EPTSSerializer

serializer = EPTSSerializer()

with open("raw_data.txt", "rb") as raw, \
        open("metadata.xml", "rb") as meta:

    dataset = serializer.deserialize(
        inputs={
            'raw_data': raw,
            'meta_data': meta
        },
        options={
            "sample_rate": 1 / 12
        }
    )
    
    # start working with dataset

or StatsBomb event data

from kloppy import StatsBombSerializer

serializer = StatsBombSerializer()

with open("events/123123.json", "rb") as event_data, \
        open("lineup/123123.json", "rb") as lineup_data:

    dataset = serializer.deserialize(
        inputs={
            'event_data': event_data,
            'lineup_data': lineup_data
        },
        options={
            "event_types": ["pass", "shot", "carry", "take_on"]
        }
    )
    
    # start working with dataset

or Opta event data

from kloppy import OptaSerializer

serializer = OptaSerializer()

with open("f24_data.xml", "rb") as f24_data, \
        open("f7_data.xml", "rb") as f7_data:

    dataset = serializer.deserialize(
        inputs={
            'f24_data': f24_data,
            'f7_data': f7_data
        },
        options={
            "event_types": ["pass", "shot"]
        }
    )
    
    # start working with dataset

Transform the pitch dimensions

Data providers use their own pitch dimensions. Some use actual meters while others use 100x100. Use the Transformer to get from one pitch dimensions to another one.

from kloppy.domain import Transformer, PitchDimensions, Dimension

# use deserialized `dataset`
new_dataset = Transformer.transform_dataset(
    dataset,
    to_pitch_dimensions=PitchDimensions(
        x_dim=Dimension(0, 100),
        y_dim=Dimension(0, 100)
    )
)

Transform the orientation

Data providers can use different orientations. Some use a fixed orientation and others use ball owning team.

from kloppy.domain import Transformer, Orientation

new_dataset = Transformer.transform_dataset(
    dataset,
    to_orientation=Orientation.BALL_OWNING_TEAM
)

Transforming pitch dimensions and orientation at the same time

from kloppy.domain import Transformer, PitchDimensions, Dimension, Orientation

# use deserialized `dataset`
new_dataset = Transformer.transform_dataset(
    dataset,
    to_pitch_dimensions=PitchDimensions(
        x_dim=Dimension(0, 100),
        y_dim=Dimension(0, 100)
    ),
    to_orientation=Orientation.BALL_OWNING_TEAM
)

Adding additional columns to DataFrame

You can add more information to the DataFrame returned by to_pandas by passing a Dictionary in the format {'column_name': function or scalar}. Function should take a frame (for tracking data) or an event (for event data) as argument and return a valid input for a DataFrame.

from kloppy import datasets, to_pandas
dataset = datasets.load("statsbomb")

# Statsbomb events may include a list of related events.
# This function returns the first event from that list, if it exists.
get_first_related_event = lambda x: x.raw_event['related_events'][0] if 'related_events' in x.raw_event else None
to_pandas(dataset, additional_columns={'related_event': get_first_related_event, 'match': 'test1'})

Contributing to kloppy

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

A overview on how to contribute can be found in the contributing guide.

If you are simply looking to start working with the kloppy codebase, navigate to the GitHub "issues" tab and start looking through interesting issues.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kloppy-0.6.2.tar.gz (48.2 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page