Python wrapper for MLB Stats API

PyMLB StatsAPI

Python 3.10+ | License: MIT

A clean, Pythonic wrapper for MLB Stats API endpoints with automatic schema-driven parameter validation.

✨ Features

  • 🎯 Clean API: Parameters are intelligently routed to path or query params based on the schema configuration
  • 🪶 Lean: Only requires requests - no heavy dependencies
  • 📋 Schema-driven: All endpoints and methods generated from JSON schemas
  • ✅ Type-safe: Automatic parameter validation from API schemas
  • 🔄 Dynamic: Zero hardcoded models - updates via schema changes only
  • 🧪 Well-tested: Comprehensive unit tests with pytest and BDD test suite with stub capture/replay
  • 📚 Self-documenting: Auto-generated docstrings from API schemas
  • 🚀 Fast: Stub-based testing runs in <1 second

🚀 Quick Start

Installation

# With pip
pip install pymlb-statsapi

# With uv (recommended)
uv add pymlb-statsapi

Basic Usage

from pymlb_statsapi import api

# Get the game schedule for a specific date
response = api.Schedule.schedule(sportId=1, date="2024-10-27")
data = response.json()

# Get latest live game data (no timecode = most recent)
response = api.Game.liveGameV1(game_pk=747175)
data = response.json()

# Get game data at specific time
response = api.Game.liveGameV1(game_pk=747175, timecode="20241027_233000")
data = response.json()

# Get team information
response = api.Team.team(teamId=147, season=2024)
team_data = response.json()

# Save response to gzipped file with metadata
result = response.gzip(prefix="mlb-data")
print(f"Saved to: {result['path']}")
print(f"Captured at: {result['timestamp']}")

🔧 Smart Parameter Validation

Parameters accept both integers and strings - the library handles type conversion automatically:

# These are equivalent - use whichever is more convenient
api.Game.liveGameV1(game_pk=747175)          # Integer (Pythonic)
api.Game.liveGameV1(game_pk="747175")        # String (API format)

api.Team.team(teamId=147)                     # Integer
api.Team.team(teamId="147")                   # String

# The MLB API sometimes returns IDs as strings in responses
# You can pass them directly without conversion:
games = api.Schedule.schedule(sportId=1, date="2024-10-27").json()
for game in games['dates'][0]['games']:
    # game['gamePk'] is an integer from the API
    live_data = api.Game.liveGameV1(game_pk=game['gamePk'])

    # Or if you have a string ID from elsewhere:
    game_id = "747175"  # From database, user input, etc.
    live_data = api.Game.liveGameV1(game_pk=game_id)  # Works!

Why this matters: The MLB Stats API returns some fields as integers and others as strings. This flexible parameter handling means you never need to worry about type conversion - just pass what you have!
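
As a rough illustration of the idea (not the library's actual validation code), accepting either form can be as simple as normalizing ID-like values to strings before they are placed in a URL. The helper name `normalize_param` is hypothetical and exists only for this sketch:

```python
def normalize_param(value):
    """Return the string form of an ID-like parameter.

    Hypothetical helper for illustration; pymlb_statsapi's own
    validation is schema-driven and may behave differently.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool):
        raise TypeError("booleans are not valid ID values")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str) and value.strip():
        return value.strip()
    raise TypeError(f"unsupported parameter value: {value!r}")


# Integer and string forms normalize to the same value
print(normalize_param(747175) == normalize_param("747175"))
```

With this kind of normalization, callers can pass IDs straight from an API response, a database row, or user input without converting them first.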

📖 Documentation

🔍 Start Here: Schema Reference

The Schema Reference is the heart of this library - browse all 21 MLB Stats API endpoints with detailed parameter docs and working examples for every method.

Additional Resources

  • Full Documentation - Complete guide on ReadTheDocs
  • API Reference - Implementation documentation
  • Examples - Check the examples/ directory for working code samples
  • Testing Guide - See the Testing documentation

🏗️ Architecture

Config-Driven Design

All MLB API endpoints are defined as JSON schemas rather than hardcoded. The schemas were sourced from the MLB Stats API Beta documentation site (https://beta-statsapi.mlb.com/docs/), which is no longer publicly available. They are stored under:

pymlb_statsapi/resources/schemas/statsapi/stats_api_1_0/
├── schedule.json  → Schedule endpoint with methods
├── game.json      → Game endpoint (live feed, boxscore, etc.)
├── team.json      → Team endpoint (roster, stats, etc.)
├── person.json    → Person endpoint (player data)
└── ...

Each schema defines which parameters are path parameters vs query parameters. Method paths are mapped in endpoint-model.json:

{
  "schedule": {
    "schedule": {
      "path": "/v1/schedule",
      "name": "schedule"
    },
    "tieGames": {
      "path": "/v1/schedule/games/tied",
      "name": "tieGames"
    }
  }
}
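
To make the mapping concrete, here is a minimal sketch of how a path could be looked up from a structure shaped like the excerpt above. The `method_path` function is hypothetical, and the JSON is inlined rather than read from the package so the example stands alone:

```python
import json

# Inlined copy of the excerpt above; the real file ships with the package
ENDPOINT_MODEL = json.loads("""
{
  "schedule": {
    "schedule": {"path": "/v1/schedule", "name": "schedule"},
    "tieGames": {"path": "/v1/schedule/games/tied", "name": "tieGames"}
  }
}
""")


def method_path(model, endpoint, method):
    """Return the URL path configured for endpoint.method (hypothetical helper)."""
    try:
        return model[endpoint][method]["path"]
    except KeyError:
        raise LookupError(f"unknown method: {endpoint}.{method}") from None


print(method_path(ENDPOINT_MODEL, "schedule", "tieGames"))
```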

Clean API: Intelligent Parameter Routing

The library automatically routes parameters to path or query parameters based on schema configuration:

# Parameters are routed correctly based on the schema
response = api.Game.liveGameV1(game_pk=747175, timecode="20241027_233000")
# Resolves to: /api/v1/game/747175/feed/live?timecode=20241027_233000
#              game_pk → path parameter, timecode → query parameter

response = api.Schedule.schedule(sportId=1, date="2024-10-27")
# Resolves to: /api/v1/schedule?sportId=1&date=2024-10-27
#              Both are query parameters

# Latest game data (omit optional timecode)
response = api.Game.liveGameV1(game_pk=747175)
# Resolves to: /api/v1/game/747175/feed/live
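
The routing described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `{name}` placeholder syntax in the path template and the `build_url` function are hypothetical, not taken from the library's source.

```python
import re
from urllib.parse import urlencode


def build_url(path_template, **params):
    """Fill {placeholders} from params; send everything else to the query string."""
    path_names = re.findall(r"\{(\w+)\}", path_template)
    missing = [name for name in path_names if name not in params]
    if missing:
        raise ValueError(f"missing path parameters: {missing}")

    path = path_template
    for name in path_names:
        path = path.replace("{" + name + "}", str(params[name]))

    # Remaining parameters become query parameters; None means "omitted"
    query = {k: v for k, v in params.items()
             if k not in path_names and v is not None}
    return path + ("?" + urlencode(query) if query else "")


print(build_url("/api/v1/game/{game_pk}/feed/live",
                game_pk=747175, timecode="20241027_233000"))
print(build_url("/api/v1/schedule", sportId=1, date="2024-10-27"))
```

Deriving the path parameters from the template keeps the caller's signature flat: the caller never has to say which arguments belong in the path.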

Key Components

Dynamic Factory (factory.py):

  • Generates endpoint classes and methods from schemas at runtime
  • Creates clean function signatures with proper parameter handling
  • Handles method overloading (e.g., seasons() with/without seasonId)
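
For readers unfamiliar with runtime class generation, the following sketch shows the general technique using `type()` and closures. The schema shape (`path`, `summary`, `params`) and the function names are assumptions made for this example. The real factory.py is more involved, and this sketch does not cover method overloading.

```python
def make_method(endpoint_name, method_name, spec):
    """Build one method from a schema entry (hypothetical schema shape)."""
    def method(self, **params):
        unknown = set(params) - set(spec["params"])
        if unknown:
            raise TypeError(f"unexpected parameters: {sorted(unknown)}")
        # A real implementation would issue the HTTP request here
        return {"endpoint": endpoint_name, "method": method_name,
                "path": spec["path"], "params": params}

    method.__name__ = method_name
    method.__doc__ = spec.get("summary", "")  # docstring taken from the schema
    return method


def make_endpoint(endpoint_name, methods):
    """Create a class whose methods are generated from the schema."""
    namespace = {name: make_method(endpoint_name, name, spec)
                 for name, spec in methods.items()}
    return type(endpoint_name.capitalize(), (), namespace)


SCHEMA = {"schedule": {"schedule": {"path": "/v1/schedule",
                                    "summary": "View schedule info",
                                    "params": ["sportId", "date"]}}}

Schedule = make_endpoint("schedule", SCHEMA["schedule"])
call = Schedule().schedule(sportId=1, date="2024-10-27")
print(call["path"], Schedule.schedule.__doc__)
```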

Registry (registry.py):

  • Central api singleton that loads all endpoints
  • Provides discovery API for exploring available methods

API Response (factory.py: APIResponse):

  • Wraps requests.Response with metadata
  • Provides .json(), .save_json(), .gzip(), .get_path(), .get_uri() methods
  • Generates consistent resource paths for file storage

🎓 Examples

Working with Different Endpoints

from pymlb_statsapi import api

# Schedule queries
response = api.Schedule.schedule(
    sportId=1,
    date="2024-10-27",
    teamId=147
)

# Get all teams
response = api.Team.teams(sportId=1, season=2024)

# Get player information
response = api.Person.people(personId=660271)

# Get season information (overloaded method)
response = api.Season.seasons(sportId=1)  # All seasons
response = api.Season.seasons(seasonId=2024)  # Specific season

# Get season hitting stats
response = api.Stats.stats(
    group="hitting",
    stats="season",
    season=2024,
    sportId=1
)

File Storage

# Auto-generate file path
result = response.save_json(prefix="mlb-data")
print(f"Saved to: {result['path']}")
print(f"Bytes written: {result['bytes_written']}")

# Explicit file path
response.save_json("/path/to/file.json")

# Gzipped JSON
response.gzip(prefix="mlb-data")
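
If you want to read a gzipped capture back, or write one without the library, the standard library is enough. The sketch below assumes nothing about the library's on-disk metadata format. `save_gzipped_json` and `load_gzipped_json` are hypothetical helpers, and `bytes_written` here means the compressed file size.

```python
import gzip
import json
import tempfile
from pathlib import Path


def save_gzipped_json(data, path):
    """Write data as gzip-compressed JSON and report where it went."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as fh:
        fh.write(json.dumps(data).encode("utf-8"))
    return {"path": str(path), "bytes_written": path.stat().st_size}


def load_gzipped_json(path):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "mlb-data" / "schedule.json.gz"
    result = save_gzipped_json({"dates": []}, target)
    print(load_gzipped_json(result["path"]))
```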

URI Generation for Different Protocols

# File protocol (default)
uri = response.get_uri(protocol="file", prefix="mlb-data")
# Result: file:///path/to/.var/local/mlb_statsapi/mlb-data/schedule/schedule/date=2025-06-01.json

# S3 protocol (requires PYMLB_STATSAPI__S3_BUCKET env var)
uri = response.get_uri(protocol="s3", prefix="raw-data", gzip=True)
# Result: s3://my-bucket/raw-data/schedule/schedule/date=2025-06-01.json.gz

# Redis protocol
uri = response.get_uri(protocol="redis", prefix="mlb")
# Result: redis://localhost:6379/0/mlb/schedule/schedule/date=2025-06-01
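
The three results above share one resource key and differ only in scheme, location, and file extension. A minimal sketch of that pattern follows. The `build_uri` function, its default locations, and the way multiple parameters are joined are all assumptions for illustration, not the library's implementation.

```python
def build_uri(protocol, prefix, endpoint, method, params, *,
              compress=False, bucket="my-bucket",
              root="/tmp/mlb_statsapi", redis="localhost:6379/0"):
    """Compose a storage URI from a shared resource key (illustrative only)."""
    # Assumption: parameters are rendered as key=value path segments
    segments = [f"{k}={v}" for k, v in sorted(params.items())]
    key = "/".join([prefix, endpoint, method, *segments])

    if protocol == "redis":
        return f"redis://{redis}/{key}"  # the Redis example above has no extension

    key += ".json.gz" if compress else ".json"
    if protocol == "s3":
        return f"s3://{bucket}/{key}"
    if protocol == "file":
        return f"file://{root}/{key}"
    raise ValueError(f"unsupported protocol: {protocol!r}")


print(build_uri("s3", "raw-data", "schedule", "schedule",
                {"date": "2025-06-01"}, compress=True))
```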

API Discovery

# List all available endpoints
print(api.get_endpoint_names())
# ['schedule', 'game', 'team', 'person', 'season', ...]

# List methods for an endpoint
endpoint = api.get_endpoint("schedule")
print(endpoint.get_method_names())
# ['schedule', 'tieGames', 'postseason', ...]

# Get detailed method information
info = api.get_method_info("schedule", "schedule")
print(info["path"])          # /v1/schedule
print(info["summary"])       # View schedule info
print(info["path_params"])   # []
print(info["query_params"])  # [{"name": "sportId", ...}, ...]

🧪 Testing

Unit Tests

# Run unit tests
pytest

# With coverage
pytest --cov=pymlb_statsapi --cov-report=html

# Specific test file
pytest tests/unit/pymlb_statsapi/model/test_factory.py

BDD Tests with Stubs (Fast)

# Run all BDD tests with stubs (completes in <1 second)
behave

# Or explicitly
STUB_MODE=replay behave

Capture Fresh Stubs

# Capture stubs by making real API calls
STUB_MODE=capture behave

# Capture stubs for specific endpoint
STUB_MODE=capture behave features/schedule.feature
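
The capture/replay idea behind `STUB_MODE` can be sketched as follows. This is a simplified illustration, not the project's test harness: `fetch_with_stub`, the stub file layout, and the `fail_if_called` helper are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def fetch_with_stub(name, fetch, stub_dir, mode=None):
    """Return data from a saved stub (replay) or from fetch() (capture)."""
    mode = mode or os.environ.get("STUB_MODE", "replay")
    stub = Path(stub_dir) / f"{name}.json"
    if mode == "capture":
        data = fetch()  # the real call happens only in capture mode
        stub.parent.mkdir(parents=True, exist_ok=True)
        stub.write_text(json.dumps(data), encoding="utf-8")
        return data
    if mode == "replay":
        return json.loads(stub.read_text(encoding="utf-8"))
    raise ValueError(f"unknown STUB_MODE: {mode!r}")


def fail_if_called():
    raise RuntimeError("replay mode must not hit the network")


with tempfile.TemporaryDirectory() as tmp:
    captured = fetch_with_stub("schedule", lambda: {"dates": []}, tmp, mode="capture")
    replayed = fetch_with_stub("schedule", fail_if_called, tmp, mode="replay")
    print(captured == replayed)
```

Because replay never calls the fetcher, tests stay fast and deterministic once stubs have been captured.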

Run Specific BDD Tests

# Test specific feature
behave features/game.feature

# Verbose output
behave -v features/season.feature

# Test with specific tag
behave --tags=@game

🛠️ Development

Setup

# Clone repository
git clone https://github.com/power-edge/pymlb_statsapi.git
cd pymlb_statsapi

# Install dependencies
uv sync

# Install pre-commit hooks
pre-commit install

Code Quality

# Linting
ruff check .

# Auto-fix linting issues
ruff check --fix .

# Formatting
ruff format .

# Run all pre-commit hooks
pre-commit run --all-files

# Security scan
bandit -r pymlb_statsapi/

Building

# Build package
hatch build

# Or with uv
uv build

# Version is auto-generated from git tags via hatch-vcs

Publishing

For local publishing (requires .env configuration):

# Set up credentials
cp .env.example .env
# Edit .env with your PyPI tokens

# Publish to TestPyPI (for testing)
./scripts/publish.sh testpypi

# Publish to PyPI (production)
./scripts/publish.sh pypi

For automated publishing, use GitHub Actions:

  • Push a version tag (e.g., v1.2.0) to trigger automatic PyPI publishing
  • Configure PYPI_TOKEN secret in GitHub repository settings

📊 Test Coverage

  • 30 unit tests with pytest covering core functionality
  • 39 BDD scenarios covering all major endpoints
  • 277 test steps with path/query parameter variations
  • Stub-based testing for fast, deterministic CI/CD
  • Real-world data using completed games (October 2024 World Series)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to fork the repository and submit a Pull Request.

A git helper for common operations is available in scripts/git.sh. It presents the following menu:

================================
   Git Workflow Helper
================================
1) Format & commit changes
2) Create release (bump version & tag)
3) Push to remote
4) Full release (format, commit, bump, tag, push, build)
5) Status
6) Exit

🙏 Acknowledgments

  • Built on the excellent MLB Stats API
  • Inspired by various MLB data projects in the community
  • Thanks to all contributors!

Made with ❤️ by the PyMLB StatsAPI Team
