
The f1_etl package

This package contains an ETL pipeline for extracting, transforming, and preparing Formula 1 telemetry data for time series classification tasks, specifically designed for safety car prediction and other F1 data science applications.

Features

  • Automated Data Extraction: Pull telemetry data from FastF1 for entire seasons
  • Time Series Generation: Create sliding window sequences from raw telemetry
  • Feature Engineering: Handle missing values, normalization, and data type conversion
  • Track Status Integration: Align telemetry with track status for safety car prediction
  • Flexible Configuration: Support for custom features, window sizes, and prediction horizons
  • Caching Support: Cache raw data to avoid repeated API calls

Installation

The project is managed with uv, but plain pip works just as well if you prefer it.

Install:

  • From Source...
    uv pip install -e .
    
  • From Wheel...
    uv build
    uv pip install dist/f1_etl-*-py3-none-any.whl
    

Verify:

uv pip list | grep f1-etl

Quick Start

Basic Usage - Single Race

from f1_etl import SessionConfig, DataConfig, create_safety_car_dataset

# Define a single race session
session = SessionConfig(
    year=2024,
    race="Monaco Grand Prix",
    session_type="R"  # Race
)

# Configure the dataset
config = DataConfig(
    sessions=[session],
    cache_dir="./f1_cache"
)

# Generate the dataset
dataset = create_safety_car_dataset(
    config=config,
    window_size=100,
    prediction_horizon=10
)

print(f"Generated {dataset['config']['n_sequences']} sequences")
print(f"Features: {dataset['config']['feature_names']}")
print(f"Class distribution: {dataset['class_distribution']}")

Full Season Dataset

from f1_etl import create_season_configs

# Generate configs for all 2024 races
race_configs = create_season_configs(2024, session_types=['R'])

# Create dataset configuration
config = DataConfig(
    sessions=race_configs,
    cache_dir="./f1_cache"
)

# Generate the complete dataset
dataset = create_safety_car_dataset(
    config=config,
    window_size=150,
    prediction_horizon=20,
    normalization_method='standard'
)

# Access the data
X = dataset['X']  # Shape: (n_sequences, window_size, n_features)
y = dataset['y']  # Encoded labels
metadata = dataset['metadata']  # Sequence metadata

Multiple Session Types

# Include practice, qualifying, and race sessions
all_configs = create_season_configs(
    2024, 
    session_types=['FP1', 'FP2', 'FP3', 'Q', 'R']
)

config = DataConfig(
    sessions=all_configs,
    drivers=['HAM', 'VER', 'LEC'],  # Specific drivers only
    cache_dir="./f1_cache"
)

dataset = create_safety_car_dataset(config=config)

Custom Target Variable

# Use a different target column (not track status)
dataset = create_safety_car_dataset(
    config=config,
    target_column='Speed',  # Predict speed instead
    window_size=50,
    prediction_horizon=5
)

Machine Learning Integration

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Generate dataset
dataset = create_safety_car_dataset(config=config)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    dataset['X'], dataset['y'], test_size=0.2, random_state=42
)

# For sklearn models, reshape to 2D
n_samples, n_timesteps, n_features = X_train.shape
X_train_2d = X_train.reshape(n_samples, n_timesteps * n_features)
X_test_2d = X_test.reshape(X_test.shape[0], -1)

# Train a model
clf = RandomForestClassifier()
clf.fit(X_train_2d, y_train)
score = clf.score(X_test_2d, y_test)
print(f"Accuracy: {score:.3f}")

Advanced Configuration

# Custom feature engineering
dataset = create_safety_car_dataset(
    config=config,
    window_size=200,
    prediction_horizon=15,
    handle_non_numeric='encode',  # or 'drop'
    normalization_method='minmax',  # or 'standard', 'per_sequence'
    target_column='TrackStatus',
    enable_debug=True  # Detailed logging
)

# Access preprocessing components for reuse
feature_engineer = dataset['feature_engineer']
label_encoder = dataset['label_encoder']

# Use on new data (new_X: raw sequences shaped like X; new_y: raw labels)
new_X_normalized = feature_engineer.normalize_sequences(new_X, fit=False)
new_y_encoded = label_encoder.transform(new_y)

Configuration Options

SessionConfig

  • year: F1 season year
  • race: Race name (e.g., "Monaco Grand Prix")
  • session_type: Session type ('R', 'Q', 'FP1', etc.)

DataConfig

  • sessions: List of SessionConfig objects
  • drivers: Optional list of driver abbreviations
  • cache_dir: Directory for caching raw data
  • include_weather: Include weather data (default: True)

Pipeline Parameters

  • window_size: Length of each time series sequence
  • prediction_horizon: Steps ahead to predict
  • handle_non_numeric: How to handle non-numeric features ('encode' or 'drop')
  • normalization_method: Normalization strategy ('standard', 'minmax', 'per_sequence')
  • target_column: Column to predict (default: 'TrackStatus')
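To make the interplay of window_size, prediction_horizon, and the step size concrete, here is a minimal sketch of the general sliding-window technique these parameters describe. This is an illustration of the concept, not the package's exact internals; the array names and loop are mine.

```python
import numpy as np

# Toy telemetry: 20 timesteps, 2 features, plus a per-timestep target
data = np.arange(40, dtype=float).reshape(20, 2)
labels = np.arange(20)

window_size = 5
prediction_horizon = 3
step = window_size // 2  # 50% overlap, the documented default

X, y = [], []
# Pair each window of `window_size` steps with the target value
# `prediction_horizon` steps after the window ends
for start in range(0, len(data) - window_size - prediction_horizon + 1, step):
    end = start + window_size
    X.append(data[start:end])
    y.append(labels[end + prediction_horizon - 1])

X = np.stack(X)
print(X.shape)  # (7, 5, 2) -> (n_sequences, window_size, n_features)
```

Note how the horizon shortens the usable range: windows whose target index would run past the end of the trace are simply dropped.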

Output Structure

dataset = {
    'X': np.ndarray,              # Normalized feature sequences
    'y': np.ndarray,              # Encoded target labels
    'y_raw': np.ndarray,          # Original target values
    'metadata': List[Dict],       # Sequence metadata
    'label_encoder': LabelEncoder, # For inverse transformation
    'feature_engineer': FeatureEngineer,  # For applying to new data
    'raw_telemetry': pd.DataFrame, # Original telemetry data
    'class_distribution': Dict,    # Label distribution
    'config': Dict                # Pipeline configuration
}
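Since 'label_encoder' is a scikit-learn LabelEncoder, decoding predictions back to the original target values follows the usual sklearn pattern. A standalone sketch with hypothetical track-status labels (in practice you would call dataset['label_encoder'] instead of fitting a fresh one):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw labels; real values come from dataset['y_raw']
y_raw = np.array(['green', 'safety_car', 'green', 'yellow'])

le = LabelEncoder()
y = le.fit_transform(y_raw)          # integer-encoded, as in dataset['y']

decoded = le.inverse_transform(y)    # back to the original labels
assert list(decoded) == list(y_raw)
```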

Error Handling

The pipeline includes robust error handling:

  • Missing telemetry data for specific drivers
  • Insufficient data for sequence generation
  • Track status alignment issues
  • Feature processing errors

Enable debug logging to troubleshoot issues:

dataset = create_safety_car_dataset(config=config, enable_debug=True)

Performance Tips

  1. Use caching: Set cache_dir to avoid re-downloading data
  2. Filter drivers: Specify drivers list to reduce data volume
  3. Adjust window size: Smaller windows = more sequences but less context
  4. Choose appropriate step size: Default is window_size // 2 for 50% overlap

License

TBD
