Football data pipeline, MCP server tools, and pluggable storage for soccer analytics.
football-data-mcp
A multi-source football data pipeline and MCP server that lets Claude (and any MCP-compatible AI assistant) answer real football analytics questions — player scouting, similarity search, market value filtering, xG tables, match shot maps, and more.
The installable distribution on PyPI is named football-data-mcp, the same as this repository. The name ScraperFC refers only to the upstream scraper package vendored under src/ScraperFC/ (imported as, e.g., from ScraperFC import Sofascore); it is not the PyPI distribution name for this project.
Built on top of ScraperFC by Owen Seymour.
What it does
Pulls data from four sources, merges them into a single unified dataset, and serves it through a Model Context Protocol (MCP) server:
| Source | What it contributes |
|---|---|
| SofaScore | 80+ match stats per player: rating, progressive carries, big chances, dribbles, aerials, accurate long balls, etc. |
| FBref | xG, npxG, xA, progressive passes received, GCA, SCA, pass completion % |
| Understat | xg_chain, xg_buildup (involvement in build-up play) |
| Transfermarkt | Market value, contract expiration, height, nationality, position |
Coverage: 10 leagues · 3 seasons (2023-24, 2024-25, 2025-26) · 18,800+ player records · 146 columns
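As a rough illustration of the merge step, the sketch below joins per-source rows on a shared key. The key fields (`player`, `season`), the `merge_sources` helper, and the sample values are assumptions made for this example; they are not the pipeline's actual schema, code, or data.

```python
from collections import defaultdict

def merge_sources(*sources):
    """Merge per-source rows that share the same (player, season) key."""
    merged = defaultdict(dict)
    for rows in sources:
        for row in rows:
            key = (row["player"], row["season"])
            # Later sources add columns to (or overwrite) earlier ones
            merged[key].update(row)
    return [merged[key] for key in sorted(merged)]

# Placeholder rows; the numbers are made up for illustration only
sofascore = [{"player": "Player A", "season": "2024-25", "rating": 7.1}]
fbref = [{"player": "Player A", "season": "2024-25", "xg": 5.2}]
understat = [{"player": "Player A", "season": "2024-25", "xg_chain": 9.8}]

unified = merge_sources(sofascore, fbref, understat)
print(unified)
```

In practice the join key has to survive spelling differences between sources, which is why name normalisation matters for the match rate.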
The 10 MCP tools
Once connected, Claude can use these tools directly in conversation:
| Tool | What you can ask |
|---|---|
| `get_player` | "Show me everything on Bukayo Saka" |
| `scout_position` | "Top 10 pressing forwards in the Bundesliga this season" |
| `compare_players` | "Compare Salah and Son across all stats" |
| `find_similar_players` | "Find players similar to Bellingham under €80m" |
| `get_league_table` | "xG league table for Serie A, home games only" |
| `get_match` | "Shot map from the El Clasico in March" |
| `get_sofascore_match` | "Deep SofaScore stats for a specific fixture" |
| `get_club_elo` | "ClubElo strength for Real Madrid" |
| `get_player_history` | "Haaland's xG per game across the season" |
| `data_status` | Coverage check across all data sources |
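Behind the scenes, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds one such message as a single line of JSON, as the stdio transport expects. The `build_tool_call` helper and the `name` argument key are hypothetical; the real argument names come from each tool's schema.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as one line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)

# The argument key "name" is an assumption; check the tool schema for real keys
line = build_tool_call(1, "get_player", {"name": "Bukayo Saka"})
print(line)
```

Claude Desktop constructs these messages itself, so this is only useful for debugging the server by hand.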
Setup
1. Install dependencies
From a clone of this repo (editable install for development):
pip install -e .
That installs the football-data-mcp distribution and puts two CLI commands on your PATH:
- `soccer-mcp`: same as `python -m soccer_server` (stdio MCP server).
- `collect-data`: same as `python -m collect_data` (data pipeline CLI).
After the first PyPI release, end users can install with:
pip install football-data-mcp
2. Collect the data
# Full collection (takes a while — runs headless Chrome for FBref + SofaScore)
python3 -m collect_data
# Equivalent: collect-data (console script from pip install)
# (equivalent: python3 collect_data.py — thin wrapper around the package)
# Individual sources
python3 -m collect_data --sofascore-only
python3 -m collect_data --understat-only
python3 -m collect_data --transfermarkt-only
# Supplementary data (xG tables, match shots, rosters)
python3 -m collect_data --understat-tables-only
python3 -m collect_data --understat-matches-only
# Rebuild the unified Parquet from already-collected raw files
python3 -m collect_data --rebuild-only
# Optional spreadsheet export alongside Parquet:
python3 -m collect_data --rebuild-only --export-csv
3. Connect to Claude Desktop
Add this to your claude_desktop_config.json (use one of the patterns below).
Recommended — run the package module from the repo (no need for the venv bin on PATH):
{
"mcpServers": {
"soccer-data": {
"command": "python3",
"args": ["-m", "soccer_server"],
"cwd": "/path/to/football-data-mcp"
}
}
}
If soccer-mcp is on your PATH (after pip install -e . or pip install football-data-mcp):
{
"mcpServers": {
"soccer-data": {
"command": "soccer-mcp",
"cwd": "/path/to/football-data-mcp"
}
}
}
Legacy configs that pointed at a single file still work — python3 soccer_server.py is a thin shim that delegates to the same server:
{
"mcpServers": {
"soccer-data": {
"command": "python3",
"args": ["soccer_server.py"],
"cwd": "/path/to/football-data-mcp"
}
}
}
On macOS the config file lives at:
~/Library/Application Support/Claude/claude_desktop_config.json
Restart Claude Desktop. The 10 tools will appear automatically.
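If you would rather not edit the JSON by hand, a small script can add the server entry while leaving other settings untouched. This is a minimal sketch, not part of the package: the `add_server` helper is hypothetical, and it assumes the config file is either absent or already valid JSON.

```python
import json
from pathlib import Path

def add_server(config_path, name, entry):
    """Add or replace one MCP server entry, keeping other settings intact."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return config

entry = {
    "command": "python3",
    "args": ["-m", "soccer_server"],
    "cwd": "/path/to/football-data-mcp",
}

# Example call using the macOS location; uncomment to apply:
# add_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "soccer-data",
#     entry,
# )
```

Back up the existing file first if it already lists other servers.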
Optional: set MCP_STDIO_TOOL_HINTS=0 in the server environment if you do not want extra _stdio_note lines on tool errors (HTTP wrappers typically ignore hints and use error / error_code only).
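To make the toggle concrete, here is a hedged sketch of how such a switch could shape an error payload. The `error_payload` function is hypothetical and is not the server's code; it also assumes that hints are on by default and that only the value `0` turns them off.

```python
import os

def error_payload(message, code, hint=None):
    """Build a tool-error dict; attach the hint unless hints are disabled."""
    payload = {"error": message, "error_code": code}
    # Assumption: hints default to on, and "0" is the only value that disables them
    hints_on = os.environ.get("MCP_STDIO_TOOL_HINTS", "1") != "0"
    if hint and hints_on:
        payload["_stdio_note"] = hint
    return payload

# The message, code, and hint strings are placeholders
print(error_payload("player not found", "NOT_FOUND", hint="try a different spelling"))
```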
Data files
Raw files are stored in data/raw/. The merged player table is written to
data/unified_player_stats.parquet (and optionally data/unified_player_stats.csv
if you pass --export-csv). The MCP server reads Parquet first and falls back
to CSV for older installs.
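The Parquet-first lookup can be pictured as follows. This is a minimal sketch under the file names given above; the `pick_unified_file` helper is hypothetical and only selects the path, leaving the actual reading to whichever library you use.

```python
from pathlib import Path

def pick_unified_file(data_dir="data"):
    """Return the Parquet file if present, else the CSV, else None."""
    base = Path(data_dir)
    # Order matters: Parquet is preferred, CSV is the fallback
    for name in ("unified_player_stats.parquet", "unified_player_stats.csv"):
        candidate = base / name
        if candidate.exists():
            return candidate
    return None

print(pick_unified_file())
```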
Storage backends (local vs R2)
- Default: `DATA_BACKEND=local` (or unset). All paths live under `data/` in the repo.
- Cloudflare R2: set `DATA_BACKEND=r2` and install extras: `pip install -e ".[r2]"` (from a clone) or `pip install "football-data-mcp[r2]"` (from PyPI once published). Required environment variables: `R2_BUCKET`, `R2_ENDPOINT_URL`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`. Object keys mirror the local layout (e.g. `raw/foo.parquet`, `unified_player_stats.parquet`).
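Before a long collection run it can help to check that the R2 variables are set. The sketch below is a hypothetical pre-flight check, not part of the package; it assumes an unset or empty `DATA_BACKEND` means `local`.

```python
import os

R2_REQUIRED = (
    "R2_BUCKET",
    "R2_ENDPOINT_URL",
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
)

def missing_backend_vars(env=None):
    """Return required variable names that are unset for the chosen backend."""
    env = os.environ if env is None else env
    # Assumption: unset or empty DATA_BACKEND falls back to "local"
    backend = env.get("DATA_BACKEND") or "local"
    if backend != "r2":
        return []
    return [name for name in R2_REQUIRED if not env.get(name)]

missing = missing_backend_vars()
if missing:
    print("Missing R2 settings:", ", ".join(missing))
```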
Leagues covered
| League | Seasons |
|---|---|
| England Premier League | 2023-24, 2024-25, 2025-26 |
| England EFL Championship | 2023-24, 2024-25, 2025-26 |
| Spain La Liga | 2023-24, 2024-25, 2025-26 |
| Germany Bundesliga | 2023-24, 2024-25, 2025-26 |
| Italy Serie A | 2023-24, 2024-25, 2025-26 |
| France Ligue 1 | 2023-24, 2024-25, 2025-26 |
| Netherlands Eredivisie | 2023-24, 2024-25, 2025-26 |
| Portugal Primeira Liga | 2023-24, 2024-25, 2025-26 |
| UEFA Champions League | 2023-24, 2024-25, 2025-26 |
| UEFA Europa League | 2023-24, 2024-25, 2025-26 |
Transfermarkt financial data (market value, contract, nationality) covers the 8 domestic leagues for 2024-25, with a 99.6% match rate against the player records.
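Matching players across sources depends on comparing names in a consistent form. The sketch below shows one common approach (lowercasing, accent stripping, whitespace collapsing); it is an assumption about the general technique, not the logic in `collect_data/helpers.py`, and the `normalise_name` function is hypothetical.

```python
import unicodedata

def normalise_name(name):
    """Lowercase, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks; letters without a decomposition (e.g. "ø") are kept
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())

print(normalise_name("  Luka   Modrić "))
```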
Project structure
football-data-mcp/
├── collect_data.py # Compatibility CLI wrapper (runs ``python -m collect_data``)
├── soccer_server.py # Compatibility shim (runs ``python -m soccer_server``)
├── collect_data/ # Pipeline package
│ ├── config.py # League lists, rename maps, seasons
│ ├── storage.py # Paths, StorageBackend, save_raw, CheckpointTracker, freshness
│ ├── backends/ # ``local`` + ``r2`` implementations (``DATA_BACKEND``)
│ ├── helpers.py # Name normalisation, retries, season helpers
│ ├── pipeline.py # ``main()`` CLI (argparse + dispatch)
│ ├── collectors/ # One module per source (fbref, understat, …)
│ └── build/ # ``unified.py`` + ``financials.py`` merge layer
├── soccer_server/ # MCP server package (10 tools, stdio transport)
│ ├── tools.py # Tool implementations
│ ├── registry.py # ``TOOLS`` map (schemas + callables)
│ ├── cache.py # Unified-table cache (optional TTL for hosted use)
│ ├── data_loading.py # Filters + ClubElo / SofaScore helpers
│ ├── transport_stdio.py # JSON-RPC stdin/stdout loop
│ └── __main__.py # ``python -m soccer_server``
├── src/ScraperFC/ # ScraperFC scrapers (upstream: oseymour/ScraperFC)
└── data/
├── unified_player_stats.parquet # Main merged dataset (gitignored)
├── unified_player_stats.csv # Optional export (gitignored)
└── raw/ # Per-source parquet files (gitignored)
Contributing
This project builds on ScraperFC. Bug fixes to the underlying scrapers are contributed back upstream — if you find something broken in a scraper, consider opening an issue or PR there too.
For issues specific to the pipeline (collect_data package / collect-data / collect_data.py) or the MCP server (soccer_server package / soccer-mcp / python -m soccer_server), open an issue here.
Credits
- ScraperFC by Owen Seymour — the foundation this project builds on
- Data sources: FBref, SofaScore, Understat, Transfermarkt