Football data pipeline, MCP server tools, and pluggable storage for soccer analytics.

football-data-mcp

A multi-source football data pipeline and MCP server that lets Claude (and any MCP-compatible AI assistant) answer real football analytics questions — player scouting, similarity search, market value filtering, xG tables, match shot maps, and more.

The PyPI distribution is named football-data-mcp, the same as this repository. ScraperFC is the name of the upstream scraper package vendored under src/ScraperFC/ (imported as, e.g., from ScraperFC import Sofascore); it is not the distribution name of this project.

Built on top of ScraperFC by Owen Seymour.


What it does

Pulls data from four sources, merges them into a single unified dataset, and serves it through a Model Context Protocol (MCP) server:

| Source | What it contributes |
|---|---|
| SofaScore | 80+ match stats per player: rating, progressive carries, big chances, dribbles, aerials, accurate long balls, etc. |
| FBref | xG, npxG, xA, progressive passes received, GCA, SCA, pass completion % |
| Understat | xg_chain, xg_buildup (involvement in build-up play) |
| Transfermarkt | Market value, contract expiration, height, nationality, position |
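
As a rough illustration of what merging per-source records into one row per player can involve, the sketch below joins two small record sets on a normalised player name. The function names, the choice of join key, and the placeholder rows are all assumptions made for illustration; the project's actual merge logic lives in collect_data/build/ and may work differently.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Lower-case, strip accents, and collapse whitespace (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(ascii_only.lower().split())


def merge_sources(*sources: list[dict]) -> dict[str, dict]:
    """Combine rows from several sources into one dict per normalised name.

    Later sources add new columns; they never overwrite columns already present.
    """
    merged: dict[str, dict] = {}
    for rows in sources:
        for row in rows:
            key = normalise_name(row["player"])
            target = merged.setdefault(key, {"player": row["player"]})
            for column, value in row.items():
                target.setdefault(column, value)
    return merged


# Hypothetical placeholder rows, not real data.
sofascore_rows = [{"player": "José Example", "rating": 7.1}]
fbref_rows = [{"player": "Jose  Example", "xg": 0.4}]

unified = merge_sources(sofascore_rows, fbref_rows)
```

Joining on a normalised name alone is fragile when two players share a name, so a real pipeline would likely add club or season to the key.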

Coverage: 10 leagues · 3 seasons (2023-24, 2024-25, 2025-26) · 18,800+ player records · 146 columns


The 10 MCP tools

Once connected, Claude can use these tools directly in conversation:

| Tool | What you can ask |
|---|---|
| get_player | "Show me everything on Bukayo Saka" |
| scout_position | "Top 10 pressing forwards in the Bundesliga this season" |
| compare_players | "Compare Salah and Son across all stats" |
| find_similar_players | "Find players similar to Bellingham under €80m" |
| get_league_table | "xG league table for Serie A, home games only" |
| get_match | "Shot map from the El Clasico in March" |
| get_sofascore_match | "Deep SofaScore stats for a specific fixture" |
| get_club_elo | "ClubElo strength for Real Madrid" |
| get_player_history | "Haaland's xG per game across the season" |
| data_status | Coverage check across all data sources |
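
Under the hood, an MCP client invokes a tool by sending a JSON-RPC tools/call request. The sketch below builds such a request for get_player. The argument key "name" is a guess used for illustration (the real schema is defined in soccer_server/registry.py), and a real client must complete the MCP initialize handshake before calling tools.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Stdio transports exchange one JSON message per line, so avoid embedded newlines.
    return json.dumps(message, separators=(",", ":"))


# "name" is a hypothetical argument key; check the tool schema for the real one.
line = build_tool_call(1, "get_player", {"name": "Bukayo Saka"})
```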

Setup

1. Install dependencies

From a clone of this repo (editable install for development):

pip install -e .

That installs the football-data-mcp distribution and puts two CLI commands on your PATH:

  • soccer-mcp — same as python -m soccer_server (stdio MCP server).
  • collect-data — same as python -m collect_data (data pipeline CLI).

End users can install the released package from PyPI:

pip install football-data-mcp

2. Collect the data

# Full collection (slow: runs headless Chrome for FBref and SofaScore)
python3 -m collect_data
# Equivalent: collect-data (console script installed by pip)
# Equivalent: python3 collect_data.py (thin wrapper around the package)

# Individual sources
python3 -m collect_data --sofascore-only
python3 -m collect_data --understat-only
python3 -m collect_data --transfermarkt-only

# Supplementary data (xG tables, match shots, rosters)
python3 -m collect_data --understat-tables-only
python3 -m collect_data --understat-matches-only

# Rebuild the unified Parquet from already-collected raw files
python3 -m collect_data --rebuild-only
# Optional spreadsheet export alongside Parquet:
python3 -m collect_data --rebuild-only --export-csv
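
If you prefer to refresh sources one at a time from a script, the per-source flags above can be chained. This is a minimal sketch: the helper names are hypothetical, and running --rebuild-only as the last step is an assumption about how you would want to refresh the unified table after partial collections.

```python
import subprocess
import sys

# Flags taken from the commands listed above.
SOURCE_FLAGS = ["--sofascore-only", "--understat-only", "--transfermarkt-only"]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one collect_data invocation per source, followed by a rebuild."""
    commands = [[python, "-m", "collect_data", flag] for flag in SOURCE_FLAGS]
    commands.append([python, "-m", "collect_data", "--rebuild-only"])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = build_commands("python3")
```

Calling run_all(commands) from the repository root would execute the four steps in sequence; check=True makes a failed scrape abort the run before the rebuild.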

3. Connect to Claude Desktop

Add this to your claude_desktop_config.json (use one of the patterns below).

Recommended — run the package module from the repo (no need for the venv bin on PATH):

{
  "mcpServers": {
    "soccer-data": {
      "command": "python3",
      "args": ["-m", "soccer_server"],
      "cwd": "/path/to/football-data-mcp"
    }
  }
}

If soccer-mcp is on your PATH (after pip install -e . or pip install football-data-mcp):

{
  "mcpServers": {
    "soccer-data": {
      "command": "soccer-mcp",
      "cwd": "/path/to/football-data-mcp"
    }
  }
}

Legacy configs that pointed at a single file still work — python3 soccer_server.py is a thin shim that delegates to the same server:

{
  "mcpServers": {
    "soccer-data": {
      "command": "python3",
      "args": ["soccer_server.py"],
      "cwd": "/path/to/football-data-mcp"
    }
  }
}

On macOS the config file lives at: ~/Library/Application Support/Claude/claude_desktop_config.json

Restart Claude Desktop. The 10 tools will appear automatically.
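
If you would rather script the config change than edit the JSON by hand, a sketch along these lines adds the soccer-data entry without disturbing other servers already listed. The helper names are hypothetical, and the cwd path is the same placeholder used in the examples above.

```python
import json
from pathlib import Path


def add_server(config: dict, name: str, command: str, args: list[str], cwd: str) -> dict:
    """Return a copy of config with one mcpServers entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args, "cwd": cwd}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, **entry) -> None:
    """Read the JSON config at path (if it exists), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config, **entry), indent=2))


# An existing config with an unrelated (hypothetical) server entry.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_server(
    existing, "soccer-data", "python3", ["-m", "soccer_server"], "/path/to/football-data-mcp"
)
```

Back up claude_desktop_config.json before pointing update_config_file at it.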

Optional: set MCP_STDIO_TOOL_HINTS=0 in the server environment if you do not want extra _stdio_note lines on tool errors (HTTP wrappers typically ignore hints and use error / error_code only).


Data files

Raw files are stored in data/raw/. The merged player table is written to data/unified_player_stats.parquet (and optionally data/unified_player_stats.csv if you pass --export-csv). The MCP server reads Parquet first and falls back to CSV for older installs.
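
The Parquet-first, CSV-fallback lookup described above can be pictured roughly as follows. This is a sketch of the idea only; the function name is hypothetical and the server's real loader (see soccer_server/cache.py and data_loading.py) may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_unified_table(data_dir: Path) -> Optional[Path]:
    """Prefer the Parquet file, fall back to the CSV export, else return None."""
    for filename in ("unified_player_stats.parquet", "unified_player_stats.csv"):
        candidate = data_dir / filename
        if candidate.is_file():
            return candidate
    return None
```

A None result means neither file exists yet, which usually indicates that the collection or rebuild step has not been run.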

Storage backends (local vs R2)

  • Default: DATA_BACKEND=local (or unset). All paths live under data/ in the repo.
  • Cloudflare R2: set DATA_BACKEND=r2 and install the extras with pip install -e ".[r2]" (from a clone) or pip install "football-data-mcp[r2]" (from PyPI). Required environment variables: R2_BUCKET, R2_ENDPOINT_URL, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY. Object keys mirror the local layout (e.g. raw/foo.parquet, unified_player_stats.parquet).
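
Before starting the server against R2, it can help to check that the required variables are set. The sketch below is a standalone pre-flight check, not part of the package. Treating an empty DATA_BACKEND as local and rejecting unknown values are assumptions.

```python
R2_REQUIRED = ("R2_BUCKET", "R2_ENDPOINT_URL", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY")


def missing_backend_settings(env: dict) -> list[str]:
    """List required variables that are unset or empty for the chosen backend."""
    backend = env.get("DATA_BACKEND") or "local"  # assumption: empty means local
    if backend == "local":
        return []
    if backend == "r2":
        return [name for name in R2_REQUIRED if not env.get(name)]
    raise ValueError(f"Unknown DATA_BACKEND: {backend!r}")


# Example environment with only the bucket configured (placeholder value).
missing = missing_backend_settings({"DATA_BACKEND": "r2", "R2_BUCKET": "my-bucket"})
```

In practice you would pass dict(os.environ) and stop with a clear message if the returned list is non-empty.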

Leagues covered

| League | Seasons |
|---|---|
| England Premier League | 2023-24, 2024-25, 2025-26 |
| England EFL Championship | 2023-24, 2024-25, 2025-26 |
| Spain La Liga | 2023-24, 2024-25, 2025-26 |
| Germany Bundesliga | 2023-24, 2024-25, 2025-26 |
| Italy Serie A | 2023-24, 2024-25, 2025-26 |
| France Ligue 1 | 2023-24, 2024-25, 2025-26 |
| Netherlands Eredivisie | 2023-24, 2024-25, 2025-26 |
| Portugal Primeira Liga | 2023-24, 2024-25, 2025-26 |
| UEFA Champions League | 2023-24, 2024-25, 2025-26 |
| UEFA Europa League | 2023-24, 2024-25, 2025-26 |

Transfermarkt data (market value, contract, nationality) covers the 8 domestic leagues for 2024-25, with a 99.6% match rate against the player table.


Project structure

football-data-mcp/
├── collect_data.py          # Compatibility CLI wrapper (runs ``python -m collect_data``)
├── soccer_server.py         # Compatibility shim (runs ``python -m soccer_server``)
├── collect_data/            # Pipeline package
│   ├── config.py            # League lists, rename maps, seasons
│   ├── storage.py           # Paths, StorageBackend, save_raw, CheckpointTracker, freshness
│   ├── backends/            # ``local`` + ``r2`` implementations (``DATA_BACKEND``)
│   ├── helpers.py           # Name normalisation, retries, season helpers
│   ├── pipeline.py          # ``main()`` CLI (argparse + dispatch)
│   ├── collectors/          # One module per source (fbref, understat, …)
│   └── build/               # ``unified.py`` + ``financials.py`` merge layer
├── soccer_server/           # MCP server package (10 tools, stdio transport)
│   ├── tools.py             # Tool implementations
│   ├── registry.py          # ``TOOLS`` map (schemas + callables)
│   ├── cache.py             # Unified-table cache (optional TTL for hosted use)
│   ├── data_loading.py      # Filters + ClubElo / SofaScore helpers
│   ├── transport_stdio.py   # JSON-RPC stdin/stdout loop
│   └── __main__.py          # ``python -m soccer_server``
├── src/ScraperFC/           # ScraperFC scrapers (upstream: oseymour/ScraperFC)
└── data/
    ├── unified_player_stats.parquet   # Main merged dataset (gitignored)
    ├── unified_player_stats.csv       # Optional export (gitignored)
    └── raw/                           # Per-source parquet files (gitignored)

Contributing

This project builds on ScraperFC. Bug fixes to the underlying scrapers are contributed back upstream — if you find something broken in a scraper, consider opening an issue or PR there too.

For issues specific to the pipeline (collect_data package / collect-data / collect_data.py) or the MCP server (soccer_server package / soccer-mcp / python -m soccer_server), open an issue here.


Credits

Download files

| File | Type | Size | Tags |
|---|---|---|---|
| football_data_mcp-0.1.0.tar.gz | Source distribution | 127.5 kB | Source |
| football_data_mcp-0.1.0-py3-none-any.whl | Built distribution | 123.8 kB | Python 3 |

Both files carry attestation bundles from publisher publish.yml on kupsas/football-data-mcp. The source distribution was uploaded via twine/6.1.0 CPython/3.13.12 using Trusted Publishing.

File hashes

Hashes for football_data_mcp-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d987ec33e3849f1e72d592a55d1d0ae728260adcbc078736089ab76b19eeaf7 |
| MD5 | ab9fde76d0cb60e7018ef7ad18f8ac26 |
| BLAKE2b-256 | ae557c8d7c26301b46e9db2493fd1242a72240f4deb217edf6033f38e9ea587a |

Hashes for football_data_mcp-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 379eb98dc754e797ebe2a8523b1497d0422ddaed4c85561715544c3f860c73ae |
| MD5 | 1006a74fe6b44678e77627bd6222636e |
| BLAKE2b-256 | a401b9a02df004e164ac9ca1e44f8b6964789c38f58844d8f60d9f7d50dff9f8 |