An advanced, open-source cricket intelligence SDK powered by DuckDB, PyArrow, and FastAPI for high-performance analytics.
Midwicket
Cricket Data Infrastructure
20,888+ Matches · 9,148,005+ Deliveries · 25+ Years of Coverage
Ball-by-ball cricket analytics. Local. Fast. No cloud required.
Why Midwicket?
These are real findings — generated in seconds from the IPL corpus using Midwicket's query engine.
| Finding | How Midwicket got there |
|---|---|
| Virat Kohli's 2026 IPL season (155.6 SR) is his fastest ever — at age 37 | Season-by-season strike rate over 19 consecutive IPL seasons |
| Sixes per match nearly doubled — 10.7 (2008) → 19.3 (2026) — while dot ball % fell 5.4 points | 1,239 matches, 18-season trend decomposition |
| Vaibhav Suryavanshi: 211 SR in Powerplay — the highest ever recorded in IPL | 272 powerplay balls, 51 sixes, 35.7% dot rate |
| 85% of IPL batters perform better when chasing than when setting a target | SR uplift (2nd innings – 1st innings) for 200+ batters |
| DW Steyn's 44.65% dot rate at 6.79 economy would be structurally impossible in 2024 IPL | Era-segmented dot ball analysis across all 141 IPL death bowlers |
See all 10 showcase analyses →
Quick Start
30 seconds. No data download. No account.
pip install midwicket
import midwicket.express as px
# Win probability — works instantly, bundled in-memory data
result = px.predict_win(
    venue="Wankhede Stadium",
    target=180,
    current_score=120,
    wickets_down=5,
    overs_done=15.0,
)
print(f"Win probability: {result['win_prob']:.1%}")
# Win probability: 22.5%
That's it. The model runs locally, with no API key and no download.
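To sanity-check the inputs in the example above, it can help to work out the chase arithmetic by hand. The sketch below is not part of Midwicket. It assumes a 20-over innings, that `target` is the total the chasing side must reach, and that `overs_done` uses cricket notation (15.3 means 15 overs and 3 balls). The helper name `required_run_rate` is hypothetical.

```python
def required_run_rate(target, current_score, overs_done, total_overs=20):
    """Return (runs_needed, balls_left, required runs per over).

    Assumes cricket overs notation: the digit after the decimal point
    is a ball count (0-5), not a fraction of an over.
    """
    whole_overs, extra_balls = divmod(round(overs_done * 10), 10)
    balls_bowled = whole_overs * 6 + extra_balls
    balls_left = total_overs * 6 - balls_bowled
    runs_needed = target - current_score
    return runs_needed, balls_left, runs_needed * 6 / balls_left


# Same situation as the predict_win call: 120 on the board after 15 overs, chasing 180.
summary = required_run_rate(target=180, current_score=120, overs_done=15.0)
print(summary)  # (60, 30, 12.0)
```

Sixty runs from thirty balls is twelve an over with five wickets down, so a win probability well below 50% is plausible for this input.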
Open in Colab → (zero-install, browser-based).
Loading Datasets
Midwicket connects to Cricsheet and manages download, extraction, and ingestion automatically.
from midwicket.datasets import load_dataset
# IPL — 1,100+ matches, 2008–present (~50 MB, downloads once)
session = load_dataset("ipl")
# Big Bash League
session = load_dataset("bbl")
# Everything — all formats, all genders, 25+ years
session = load_dataset("all")
Available datasets:
| Key | Competition | Est. Matches |
|---|---|---|
| "ipl" | Indian Premier League | 1,100+ |
| "t20s" | T20 Internationals (M + W) | 3,200+ |
| "bbl" | Big Bash League | 650+ |
| "psl" | Pakistan Super League | 350+ |
| "cpl" | Caribbean Premier League | 380+ |
| "wbbl" | Women's Big Bash League | 550+ |
| "odis" | One Day Internationals | 2,400+ |
| "tests" | Test Matches | 700+ |
| "all_t20" | All T20 globally | 8,500+ |
| "all" | Complete Cricsheet corpus | 16,000+ |
Once loaded, a session gives you a thread-safe DuckDB engine over ball-by-ball events. Query anything:
df = session.engine.execute_sql("""
SELECT batter,
SUM(runs_batter) AS runs,
ROUND(SUM(runs_batter) * 100.0 / COUNT(*), 1) AS strike_rate,
COUNT(DISTINCT match_id) AS matches
FROM ball_events
WHERE over >= 15 -- death overs only
GROUP BY batter
HAVING COUNT(*) >= 100
ORDER BY runs DESC LIMIT 10
""").to_pandas()
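The query above divides runs by every row in `ball_events`. In conventional scoring a wide does not count as a ball faced, so you may prefer to exclude wides from the denominator. The sketch below shows that adjustment on a toy table. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs with no dependencies, and the `is_wide` column is hypothetical: check the actual `ball_events` schema for how extras are recorded.

```python
import sqlite3

# Toy rows: (match_id, over, batter, runs_batter, is_wide).
# `is_wide` is an assumed column, used only for this illustration.
rows = [
    (1, 15, "A", 4, 0),
    (1, 15, "A", 0, 1),  # wide: not a ball faced
    (1, 16, "A", 6, 0),
    (1, 17, "B", 1, 0),
    (1, 18, "B", 0, 0),
    (1, 3, "A", 2, 0),   # outside the death overs, filtered out
]

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE ball_events '
    '(match_id INT, "over" INT, batter TEXT, runs_batter INT, is_wide INT)'
)
con.executemany("INSERT INTO ball_events VALUES (?, ?, ?, ?, ?)", rows)

result = con.execute("""
    SELECT batter,
           SUM(runs_batter) AS runs,
           SUM(1 - is_wide) AS balls_faced,
           ROUND(SUM(runs_batter) * 100.0 / SUM(1 - is_wide), 1) AS strike_rate
    FROM ball_events
    WHERE "over" >= 15
    GROUP BY batter
    ORDER BY runs DESC
""").fetchall()

print(result)  # [('A', 10, 2, 500.0), ('B', 1, 2, 50.0)]
```

Batter A has three rows in the death overs but only two balls faced, which is what moves the strike rate.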
Feature Store
Six production-grade metrics, computed from ball-by-ball data:
from midwicket.features import (
    build_pressure_index,
    build_bowler_quality_rating,
    build_match_context_score,
    build_venue_bias_rating,
    build_batter_intent_score,
    build_expected_runs,
)
# Pressure Index — situational leverage per delivery
pi = build_pressure_index(session)
# Returns DataFrame: match_id, inning, over, ball, batter_id, bowler_id, pressure_index
# Bowler Quality Rating — dot balls + wicket rate combined
bqr = build_bowler_quality_rating(session)
# Returns DataFrame: bowler_id, total_balls, dot_balls, wickets, bowler_quality_rating
# Venue Bias Rating — batter-friendly vs bowler-friendly grounds
vbr = build_venue_bias_rating(session)
# VBR > 1.0 = batter-friendly | VBR < 1.0 = bowler-friendly
# Venues with < 5 matches default to VBR = 1.0 (stabilised)
# Match Context Score — chase pressure at any moment in the 2nd innings
mcs = build_match_context_score(session)
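To make the Venue Bias Rating comments concrete, here is a minimal sketch of one way such a rating could be computed. The formula (a venue's mean first-innings total divided by the overall mean) is an assumption for illustration, not Midwicket's documented method. Only the 5-match threshold and the neutral value of 1.0 come from the comments above. The function name `venue_bias_ratings` is hypothetical.

```python
from statistics import mean

MIN_MATCHES = 5  # venues below this sample size fall back to the neutral rating


def venue_bias_ratings(first_innings_totals, min_matches=MIN_MATCHES):
    """Map venue -> rating, where 1.0 means scoring matches the overall mean.

    first_innings_totals: dict of venue name -> list of first-innings totals.
    """
    all_totals = [t for totals in first_innings_totals.values() for t in totals]
    overall = mean(all_totals)
    ratings = {}
    for venue, totals in first_innings_totals.items():
        if len(totals) < min_matches:
            ratings[venue] = 1.0  # too few matches to trust the ratio
        else:
            ratings[venue] = round(mean(totals) / overall, 3)
    return ratings


totals = {
    "Ground A": [200, 190, 210, 180, 220],  # high-scoring
    "Ground B": [150, 160, 140, 170, 130],  # low-scoring
    "Ground C": [250, 100],                 # only 2 matches
}
print(venue_bias_ratings(totals))
# {'Ground A': 1.143, 'Ground B': 0.857, 'Ground C': 1.0}
```

Ground C has an extreme spread, but with only two matches it is held at 1.0 rather than allowed to skew comparisons.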
All features support date filtering — analyse any historical window without leakage:
bqr_2023 = build_bowler_quality_rating(session, start_date="2023-01-01", end_date="2023-12-31")
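The idea behind a leak-free window can be shown without the library. The sketch below assumes the filter is applied to match dates and that both bounds are inclusive. Neither detail is confirmed by the text above, and `filter_window` is a hypothetical helper.

```python
from datetime import date


def filter_window(events, start_date=None, end_date=None):
    """Keep events whose match_date lies in [start_date, end_date].

    Dates are ISO strings (YYYY-MM-DD); a missing bound leaves that side open.
    """
    start = date.fromisoformat(start_date) if start_date else date.min
    end = date.fromisoformat(end_date) if end_date else date.max
    return [e for e in events if start <= e["match_date"] <= end]


events = [
    {"match_id": 1, "match_date": date(2022, 12, 31)},
    {"match_id": 2, "match_date": date(2023, 1, 1)},
    {"match_id": 3, "match_date": date(2023, 12, 31)},
    {"match_id": 4, "match_date": date(2024, 1, 1)},
]

window = filter_window(events, start_date="2023-01-01", end_date="2023-12-31")
print([e["match_id"] for e in window])  # [2, 3]
```

Anything dated after the cutoff never reaches the feature computation, which is the property you want when building training data for a model evaluated on later matches.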
Scouting Reports
import midwicket as md
session = md.init("./data") # point at your local dataset
report = md.scouting_report("Virat Kohli")
print(report["role"]) # "Batter"
print(report["strengths"]) # ["Powerplay anchor", "Middle-over accelerator", ...]
print(report["phase_batting"]) # {"Powerplay": {...}, "Middle": {...}, "Death": {...}}
print(report["venue_performance"]) # per-venue batting average and SR
print(report["recent_form"]) # last-N-matches rolling stats
The scouting report resolves name aliases automatically — "V Kohli", "Virat Kohli", "kohli" all resolve to the same entity across 17+ seasons.
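Alias resolution of this kind usually comes down to normalising the input and looking it up in a table of known spellings. The sketch below is a simplified illustration, not Midwicket's registry: the `ALIASES` table, the `virat_kohli` identifier, and the `resolve` function are all hypothetical.

```python
def normalise(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


# Hypothetical alias table: normalised spelling -> canonical player id.
ALIASES = {
    "v kohli": "virat_kohli",
    "virat kohli": "virat_kohli",
    "kohli": "virat_kohli",
}


def resolve(name):
    """Return the canonical id for a name, or None if it is unknown."""
    return ALIASES.get(normalise(name))


print(resolve("V Kohli"), resolve("Virat  Kohli"), resolve("kohli"))
# virat_kohli virat_kohli virat_kohli
```

A real registry also has to handle ambiguous surnames, which a flat dictionary like this cannot do.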
Showcase Gallery
Ten analyses built on real IPL data. Click any image to see the full walkthrough.
| Showcase | Highlight |
|---|---|
| All-Time Run Leaders | Kohli's 9,228 runs lead by 1,897. Bars coloured by strike rate; greener hits faster. |
| IPL Scoring: 18 Years of Evolution | Avg 1st innings: 161 (2008) → 192 (2026). Sixes per match nearly doubled. |
| Venue Scoring Atlas (76 grounds) | VBR 0.848 → 1.253 across IPL grounds. 40% swing in expected scoring. |
| Death Over Bowler Landscape | 141 bowlers, economy vs wicket rate. Bumrah: #11, economy 8.07. |
| Chase Specialists | 85%+ of batters hit harder when chasing. Pat Cummins: +46 SR points. |
| Powerplay Kings | Suryavanshi: 211 SR, the highest ever recorded in IPL powerplay. |
View all 10 showcases with charts, queries, and walkthroughs →
Five-Minute Tutorial
New to Midwicket? docs/getting_started.md takes you from install to first insight in under 5 minutes — using real data, real outputs.
Examples
The examples/ directory contains 36 runnable scripts organised by complexity:
| Scripts | Topic |
|---|---|
| 01–05 | Session setup, data ingest, player lookup, venue stats |
| 06–15 | Win prediction, fantasy points, SQL queries, season filters |
| 16–27 | Leaderboards, partnerships, consistency, full pipeline demos |
| 28–36 | Express API, config, debug, full library tour |
| showcase_01–25 | Deep-dive analyses with charts and findings |
| portfolio/ | 14 player and team scouting studies |
Start here: examples/28_express_quickstart.py
Architecture
Midwicket separates concerns across five layers. Data flows from raw JSON through a typed ingestion pipeline into a DuckDB analytical store, with a query planner routing between live scans and pre-built feature tables.
    Cricsheet JSON
           │
           ▼
┌─────────────────────┐
│    Canonicaliser    │  Strict V1 Arrow schema · retirement fix · int32 upcasting
│ (core/canonicalize) │  Deterministic match_id · venue alias resolution
└──────────┬──────────┘
           │ PyArrow Table
           ▼
┌─────────────────────┐
│  Identity Registry  │  Player / venue / team aliases across 25+ years
│ (storage/registry)  │  Temporal-safe: resolves names at match date, not today
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│    DuckDB Engine    │  Thread-safe · snapshot management · temporal filtering
│  (storage/engine)   │  ball_events: 9M+ rows · sub-second aggregations
└──────────┬──────────┘
           │
      ┌────┴────┐
      ▼         ▼
   Feature   Express
    Store      API
(features.py) (express.py)
      │         │
      └────┬────┘
           ▼
  FastAPI + Prometheus
   (midwicket/serve/)
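The "temporal-safe" note on the Identity Registry means a name is resolved as of the match date rather than today. A minimal sketch of that lookup, assuming each entity keeps a date-sorted history of names, is shown below. The `TEAM_NAMES` structure, the `delhi` key, the `name_at` function, and the cut-over date are illustrative only and are not taken from Midwicket's registry.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history: team key -> [(effective_from, name), ...] sorted by date.
# The 2019 cut-over date is illustrative; the rename preceded the 2019 IPL season.
TEAM_NAMES = {
    "delhi": [
        (date(2008, 1, 1), "Delhi Daredevils"),
        (date(2019, 1, 1), "Delhi Capitals"),
    ],
}


def name_at(team_key, match_date):
    """Return the name that was in effect on match_date."""
    history = TEAM_NAMES[team_key]
    effective_dates = [start for start, _ in history]
    idx = bisect_right(effective_dates, match_date) - 1
    if idx < 0:
        raise LookupError(f"no name recorded for {team_key} on {match_date}")
    return history[idx][1]


print(name_at("delhi", date(2012, 5, 1)))   # Delhi Daredevils
print(name_at("delhi", date(2021, 4, 10)))  # Delhi Capitals
```

Both names map to the same key, so aggregations across the rename stay on one entity while reports can still show the name used at the time.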
Data integrity guarantees:
- Schema version-locked (BALL_EVENT_SCHEMA v1.0.0): breaking changes are explicit
- over stored as int16, runs as int32: no silent overflow on aggregation
- Retirements classified correctly: RETIRED_HURT / RETIRED_NOT_OUT → is_wicket=False
- Temporal filters are leak-proof: verified against 4 cutoff dates, 0 leaked rows
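The retirement rule in the list above can be expressed as a small predicate. This is a sketch of the rule as stated, not the canonicaliser's actual code. The `is_wicket` function name and the `BOWLED` label are assumptions, while `RETIRED_HURT` and `RETIRED_NOT_OUT` are the labels named in the list.

```python
# Dismissal labels that end a batter's stay at the crease without costing a wicket.
NON_WICKET_KINDS = {"RETIRED_HURT", "RETIRED_NOT_OUT"}


def is_wicket(dismissal_kind):
    """Return True only when the delivery produced a genuine dismissal.

    dismissal_kind is None when no batter left the crease on the delivery.
    """
    if dismissal_kind is None:
        return False
    return dismissal_kind.upper() not in NON_WICKET_KINDS


print(is_wicket("BOWLED"))           # True
print(is_wicket("RETIRED_HURT"))     # False
print(is_wicket("RETIRED_NOT_OUT"))  # False
print(is_wicket(None))               # False
```

Getting this wrong inflates wicket counts for bowlers and deflates batting averages, which is why it is treated as an integrity guarantee and not a cosmetic detail.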
Enterprise Deployment
git clone https://github.com/CodersAcademy006/Midwicket.git && cd Midwicket
cp .env.example .env # set MIDWICKET_SECRET_KEY, MIDWICKET_API_KEYS
docker-compose up -d # FastAPI + Prometheus + Grafana
The FastAPI service exposes REST endpoints for win probability, player stats, matchups, and scouting reports. Prometheus scrape config and a Grafana dashboard definition are included.
Contributing
Contributions welcome. Areas where help is most needed:
- Additional competition datasets (WBBL scouting, CPL analysis)
- Jupyter notebook tutorials for the showcase analyses
- Performance benchmarks across dataset sizes
- Documentation translations
Read CONTRIBUTING.md before submitting a PR.
MIT License · Built on Cricsheet data · Powered by DuckDB + PyArrow
Getting Started · Showcase Gallery · API Reference · Changelog