Fuzzy matcher for sports team names across data feeds. Jaccard + Containment + kickoff-time bonus.

team-matcher

Fuzzy matcher for sports team names across data feeds. Pure Python, zero dependencies, well-tested.

If you've ever joined data from two sports providers, you've hit this:

Feed A	Feed B
Man Utd	Manchester United FC
Real Madrid CF	Real Madrid
Hearts	Heart of Midlothian
Bayern München	FC Bayern Munich
LDU	Liga Dep. Universitaria

Naive string equality (==) fails. difflib is fragile (Manchester United vs Manchester City are 84% similar). This library uses a Jaccard + Containment hybrid with stop-word filtering, plus an optional kickoff-time proximity bonus when matching whole fixtures, so most cross-feed name variation, abbreviations, and inconsistent league naming work out of the box.
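For reference, this is the character-level comparison that plain difflib performs, using the standard library's SequenceMatcher.ratio(). It only illustrates the failure mode described above. The char_ratio helper is ours, not part of team-matcher. Two different clubs score higher than two spellings of the same club:

```python
from difflib import SequenceMatcher


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] via difflib (not what team-matcher uses)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two *different* clubs share the long prefix "manchester " and the suffix " fc".
print(f"{char_ratio('Manchester United FC', 'Manchester City FC'):.4f}")  # 0.8421

# The *same* club, abbreviated, scores lower because few characters line up.
print(f"{char_ratio('Manchester United', 'Man Utd'):.4f}")
```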

⚙️ Used in production at scorecast.info to link millions of football fixtures across data sources.

Install

pip install team-matcher

Requires Python 3.9+.

Quick start

1. Compare two team names

from team_matcher import similarity

similarity("Manchester United", "Man Utd")        # 1.0
similarity("Manchester United", "Manchester City") # 0.5
similarity("Liverpool", "Chelsea")                 # 0.0

2. Match a fixture against candidates

from datetime import datetime
from team_matcher import Candidate, match_fixture

kickoff = datetime(2026, 4, 27, 19, 45)

candidates = [
    Candidate("Manchester United FC", "Liverpool FC",
              league="Premier League", kickoff=kickoff,
              payload="match_id_123"),
    Candidate("Chelsea", "Arsenal",
              league="Premier League", kickoff=kickoff,
              payload="match_id_124"),
]

match = match_fixture(
    home="Man Utd",
    away="Liverpool",
    league="EPL",
    kickoff=kickoff,
    candidates=candidates,
)

if match:
    print(match.score)              # 1.0
    print(match.candidate.payload)  # match_id_123
    print(match.swapped)            # False

3. Inspect ranking

from team_matcher import rank_candidates

for m in rank_candidates("Man Utd", "Liverpool", "EPL",
                         candidates, kickoff=kickoff):
    print(f"{m.score:.3f}  {m.candidate.home} vs {m.candidate.away}")

How it works

Token-based similarity

Each name is normalized (lowercase, strip accents, drop parentheticals like (W) or (Reserves) and age tags like U21), then split on whitespace and punctuation. Stop-words (fc, sc, cf, real, atletico, language particles…) are filtered out. Common variants are aliased (utd → united, man → manchester, münchen → munich).
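The normalization steps above can be sketched as follows. This is a minimal approximation, not the library's code: the STOP_WORDS and ALIASES tables are small illustrative subsets, the regular expressions are assumptions about what counts as a parenthetical or an age tag, and because accents are stripped before aliasing, the München alias is keyed on "munchen" here.

```python
import re
import unicodedata

# Illustrative subsets only (assumed); the library's real tables are larger.
STOP_WORDS = {"fc", "sc", "cf", "real", "atletico", "de", "del", "la", "of"}
ALIASES = {"utd": "united", "man": "manchester", "munchen": "munich"}

_PARENS = re.compile(r"\([^)]*\)")   # "(W)", "(Reserves)"
_AGE_TAG = re.compile(r"\bu\d{2}\b")  # "u21", "u19" (after lowercasing)
_SPLIT = re.compile(r"[^a-z0-9]+")    # whitespace and punctuation


def strip_accents(text: str) -> str:
    # NFKD splits "ü" into "u" + a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(name: str) -> set[str]:
    text = strip_accents(name.lower())
    text = _PARENS.sub(" ", text)
    text = _AGE_TAG.sub(" ", text)
    tokens = (t for t in _SPLIT.split(text) if t)
    # Filter stop-words first, then map remaining tokens through the alias table.
    return {ALIASES.get(t, t) for t in tokens if t not in STOP_WORDS}


print(sorted(tokenize("Bayern München")))    # ['bayern', 'munich']
print(sorted(tokenize("FC Bayern Munich")))  # ['bayern', 'munich']
print(sorted(tokenize("Man Utd U21 (W)")))   # ['manchester', 'united']
```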

Two token sets are then compared with a hybrid metric:

sim = 0.4 * jaccard(A, B) + 0.6 * containment(A, B)
jaccard(A, B) = |A ∩ B| / |A ∪ B|
containment(A, B) = |A ∩ B| / min(|A|, |B|)

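Containment makes the metric robust to length asymmetry: when one name's tokens are a subset of the other's, containment is 1.0 even though Jaccard is lower. In the simplest case, Olancho vs Olancho FC collapse to the same single-token set after stop-word filtering and score 1.0.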
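A direct transcription of the hybrid metric, operating on token sets such as those produced by normalization. The helper names mirror the formula; how the library handles empty token sets is not documented, so returning 0.0 there is an assumption.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: set[str], b: set[str]) -> float:
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def sim(a: set[str], b: set[str]) -> float:
    return 0.4 * jaccard(a, b) + 0.6 * containment(a, b)


# Subset case: Jaccard is 1/2, containment is 1, so sim = 0.2 + 0.6.
print(round(sim({"bayern"}, {"bayern", "munich"}), 3))                    # 0.8
# One shared token out of two on each side: 0.4 * (1/3) + 0.6 * (1/2).
print(round(sim({"manchester", "united"}, {"manchester", "city"}), 3))    # 0.433
```

The Bayern pair shows the length-asymmetry case: the shorter name is fully contained in the longer one, so containment lifts the score well above Jaccard alone. The library's reported values may differ from this transcription if it applies rules beyond the ones described here.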

Pair scoring

A fixture pair (home + away + league) is scored as

score = 0.4 * sim(home_a, home_b)
      + 0.4 * sim(away_a, away_b)
      + 0.2 * sim(league_a, league_b)

The matcher scores both team orderings (as given, and with home/away reversed), keeps the higher score, and reports which ordering won in a swapped: bool flag.
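A sketch of pair scoring with the ordering check, assuming team and league names are already tokenized. The PairScore type and score_pair function are hypothetical. The README does not spell out which names are compared in the reversed ordering; this sketch pairs home with away and away with home, and ties go to the original ordering.

```python
from typing import NamedTuple


def sim(a: set[str], b: set[str]) -> float:
    """Jaccard/containment hybrid on pre-tokenized names (see the formula above)."""
    if not a or not b:
        return 0.0
    inter = len(a & b)
    return 0.4 * inter / len(a | b) + 0.6 * inter / min(len(a), len(b))


class PairScore(NamedTuple):
    score: float
    swapped: bool


def score_pair(home_a: set[str], away_a: set[str], league_a: set[str],
               home_b: set[str], away_b: set[str], league_b: set[str]) -> PairScore:
    league = 0.2 * sim(league_a, league_b)
    straight = 0.4 * sim(home_a, home_b) + 0.4 * sim(away_a, away_b) + league
    # Assumed swap rule: compare home_a with away_b and away_a with home_b.
    flipped = 0.4 * sim(home_a, away_b) + 0.4 * sim(away_a, home_b) + league
    if flipped > straight:  # strict ">" keeps the original ordering on ties
        return PairScore(flipped, True)
    return PairScore(straight, False)


# Feed B lists the fixture the other way round; the league tokens share nothing.
result = score_pair({"manchester", "united"}, {"liverpool"}, {"epl"},
                    {"liverpool"}, {"manchester", "united"}, {"premier", "league"})
print(result)  # PairScore(score=0.8, swapped=True)
```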

Kickoff-time bonus (the secret sauce)

League names are wildly inconsistent between feeds (POR D1 vs Portuguese Primeira Liga share zero tokens). When the same fixture appears in two feeds, the kickoff time is the strongest available signal. If both the query and the candidate have a kickoff time, an additional bonus is added to the pair score:

time delta	bonus
≤ 30 min	up to +0.20
≤ 90 min	+0.05
> 90 min	0

In our benchmarks, this single rule typically raises the cross-feed match rate from ~10% to over 65%.
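A sketch of the kickoff bonus from the table above. The function names are hypothetical, and two details are assumptions rather than documented behaviour: within 30 minutes the bonus ramps linearly from +0.20 (exact match) down to +0.05, and the combined score is capped at 1.0.

```python
from datetime import datetime, timedelta
from typing import Optional


def kickoff_bonus(query: Optional[datetime], candidate: Optional[datetime]) -> float:
    """Tiered bonus following the time-delta table.

    Assumption: inside 30 minutes the bonus falls linearly from 0.20 to 0.05.
    """
    if query is None or candidate is None:
        return 0.0  # the bonus only applies when both sides have a kickoff
    minutes = abs((query - candidate).total_seconds()) / 60
    if minutes <= 30:
        return 0.20 - 0.15 * (minutes / 30)
    if minutes <= 90:
        return 0.05
    return 0.0


def final_score(pair_score: float, bonus: float) -> float:
    # Assumption: the total is capped at 1.0.
    return min(1.0, pair_score + bonus)


ko = datetime(2026, 4, 27, 19, 45)
print(kickoff_bonus(ko, ko))                          # 0.2
print(kickoff_bonus(ko, ko + timedelta(minutes=60)))  # 0.05
print(kickoff_bonus(ko, ko + timedelta(hours=3)))     # 0.0
print(final_score(0.8, kickoff_bonus(ko, ko)))        # 1.0
```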

Configuration

You can extend the stop-word set and alias map at runtime:

from team_matcher import add_stop_word, add_token_alias

add_stop_word("clube")
add_token_alias("psg", "paris")

You can also tune the threshold:

match_fixture(..., threshold=0.65)   # default 0.55

The default of 0.55 is calibrated for cross-feed football data; raise it for stricter matching.
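A minimal sketch of how a threshold typically gates the result, assuming match_fixture returns the top-ranked candidate only when its score reaches the threshold and None otherwise (which is why the quick start checks if match:). The Scored type, the pick_best helper, and the scores are hypothetical, and whether the library compares with >= or > is not stated.

```python
from typing import NamedTuple, Optional, Sequence


class Scored(NamedTuple):
    payload: str
    score: float


def pick_best(scored: Sequence[Scored], threshold: float = 0.55) -> Optional[Scored]:
    """Return the highest-scoring candidate if it clears the threshold, else None."""
    ranked = sorted(scored, key=lambda s: s.score, reverse=True)
    if ranked and ranked[0].score >= threshold:
        return ranked[0]
    return None


scores = [Scored("match_id_123", 0.62), Scored("match_id_124", 0.31)]
print(pick_best(scores))                  # Scored(payload='match_id_123', score=0.62)
print(pick_best(scores, threshold=0.65))  # None
```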

What this library is not

❌ Not a database, not a service. It's a 200-line pure-Python module.
❌ Not a name canonicalization dictionary. If your feeds use Hearts and Heart of Midlothian, you'll need a small alias dictionary on top — fuzzy alone can't bridge that gap.
❌ Not specific to football. Tokenization rules are sport-agnostic; replace stop-words for basketball, MMA, etc.
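One way to add the small alias dictionary mentioned above is a lookup applied before any fuzzy matching. CANONICAL_NAMES and canonicalize are hypothetical names for your own project code, not part of team-matcher; the entries come from the examples at the top of this page.

```python
# Hypothetical project-specific table: lowercased feed spelling -> canonical spelling.
CANONICAL_NAMES = {
    "hearts": "Heart of Midlothian",
    "ldu": "Liga Dep. Universitaria",
}


def canonicalize(name: str) -> str:
    """Map known nicknames/abbreviations to one spelling; pass others through."""
    key = " ".join(name.lower().split())  # case- and whitespace-insensitive lookup
    return CANONICAL_NAMES.get(key, name)


print(canonicalize("Hearts"))     # Heart of Midlothian
print(canonicalize("  HEARTS "))  # Heart of Midlothian
print(canonicalize("Celtic"))     # Celtic
```

Running names from both feeds through a step like this first leaves the fuzzy matcher to handle only the variation it is designed for.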

Development

git clone https://github.com/scorecast-software/team-matcher
cd team-matcher
pip install -e ".[dev]"
pytest
ruff check src tests
mypy src

License

MIT — see LICENSE.

Built and battle-tested at ScoreCast, a football odds analytics platform tracking value bets across millions of matches. If this library saves you a few hours, consider giving us a ⭐ on GitHub.

Release history

This version: 0.1.0 (Apr 29, 2026)

Download files

Source Distribution

team_matcher-0.1.0.tar.gz (9.5 kB), uploaded Apr 29, 2026

Built Distribution

team_matcher-0.1.0-py3-none-any.whl (9.8 kB, Python 3), uploaded Apr 29, 2026

File details

Details for the file team_matcher-0.1.0.tar.gz.

File metadata

Download URL: team_matcher-0.1.0.tar.gz
Upload date: Apr 29, 2026
Size: 9.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for team_matcher-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6d1fa70e18b94948af23798e6d92eeeac4de3d7d9d26c7bef8d3cc4750391587`
MD5	`0a266f10a6111aaa1db22ada798a5290`
BLAKE2b-256	`9f1db705adf1fc53d3d2d44055a3a3ffd8b0fc7d4332260b8a49f5fc09cf0d77`

Provenance

The following attestation bundles were made for team_matcher-0.1.0.tar.gz:

Publisher: publish.yml on scorecast-software/team-matcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: team_matcher-0.1.0.tar.gz
- Subject digest: 6d1fa70e18b94948af23798e6d92eeeac4de3d7d9d26c7bef8d3cc4750391587
- Sigstore transparency entry: 1399314213
- Sigstore integration time: Apr 29, 2026
Source repository:
- Permalink: scorecast-software/team-matcher@3bc21cc2a527369c74cef92266d39cfdb909ec61
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/scorecast-software
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3bc21cc2a527369c74cef92266d39cfdb909ec61
- Trigger Event: release

File details

Details for the file team_matcher-0.1.0-py3-none-any.whl.

File metadata

Download URL: team_matcher-0.1.0-py3-none-any.whl
Upload date: Apr 29, 2026
Size: 9.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for team_matcher-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`53d46948d4c2b8445bea5df88dffe0ec5216578066122795aa97e552c949335d`
MD5	`e6c103898590a3fa1ae7de6bbd69f84e`
BLAKE2b-256	`b18c97b7488781b83558629a394ff334fff4ea97d6ab5905ecebf8d6670cc3a9`

Provenance

The following attestation bundles were made for team_matcher-0.1.0-py3-none-any.whl:

Publisher: publish.yml on scorecast-software/team-matcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: team_matcher-0.1.0-py3-none-any.whl
- Subject digest: 53d46948d4c2b8445bea5df88dffe0ec5216578066122795aa97e552c949335d
- Sigstore transparency entry: 1399314234
- Sigstore integration time: Apr 29, 2026
Source repository:
- Permalink: scorecast-software/team-matcher@3bc21cc2a527369c74cef92266d39cfdb909ec61
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/scorecast-software
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3bc21cc2a527369c74cef92266d39cfdb909ec61
- Trigger Event: release
