
An advanced, open-source cricket intelligence SDK powered by DuckDB, PyArrow, and FastAPI for high-performance analytics.


Midwicket


Cricket Data Infrastructure

20,888+ Matches  ·  9,148,005+ Deliveries  ·  25+ Years of Coverage


Ball-by-ball cricket analytics. Local. Fast. No cloud required.


Why Midwicket?

These are real findings — generated in seconds from the IPL corpus using Midwicket's query engine.

Finding | How Midwicket got there
Virat Kohli's 2026 IPL season (155.6 SR) is his fastest ever, at age 37 | Season-by-season strike rate over 19 consecutive IPL seasons
Sixes per match nearly doubled, 10.7 (2008) → 19.3 (2026), while dot ball % fell 5.4 points | 1,239 matches, 18-season trend decomposition
Vaibhav Suryavanshi: 211 SR in Powerplay, the highest ever recorded in IPL | 272 powerplay balls, 51 sixes, 35.7% dot rate
85% of IPL batters perform better when chasing than when setting a target | SR uplift (2nd innings – 1st innings) for 200+ batters
DW Steyn's 44.65% dot rate at 6.79 economy would be structurally impossible in 2024 IPL | Era-segmented dot ball analysis across all 141 IPL death bowlers

See all 10 showcase analyses →


Quick Start

30 seconds. No data download. No account.

pip install midwicket
import midwicket.express as px

# Win probability — works instantly, bundled in-memory data
result = px.predict_win(
    venue="Wankhede Stadium",
    target=180,
    current_score=120,
    wickets_down=5,
    overs_done=15.0,
)
print(f"Win probability: {result['win_prob']:.1%}")
# Win probability: 22.5%

That's it. The model runs locally, no API key, no download.
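To make the inputs above concrete: the match state passed to predict_win describes a chase of 60 runs from 30 balls. The sketch below works that out with plain arithmetic. It is not Midwicket's model, and `chase_state` is a hypothetical helper; it assumes a 20-over innings, 6-ball overs, and that `overs_done` uses cricket notation (15.2 means 15 overs and 2 balls).

```python
def chase_state(target, current_score, overs_done, total_overs=20):
    """Return (runs_needed, balls_left, required_run_rate) for a chase.

    Hypothetical helper for illustration only, not part of Midwicket.
    """
    whole_overs = int(overs_done)
    # Digits after the point are balls, not a decimal fraction of an over.
    extra_balls = round((overs_done - whole_overs) * 10)
    balls_left = total_overs * 6 - (whole_overs * 6 + extra_balls)
    if balls_left <= 0:
        raise ValueError("no balls remaining in the innings")
    runs_needed = target - current_score
    required_rr = runs_needed * 6 / balls_left  # runs per over
    return runs_needed, balls_left, required_rr


runs_needed, balls_left, required_rr = chase_state(180, 120, 15.0)
print(f"{runs_needed} needed from {balls_left} balls, required rate {required_rr:.1f}")
```

A required rate of 12 an over with five wickets down is a difficult chase, which is consistent with the low win probability shown in the example output.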

Open in Colab → (zero-install, browser-based)


Loading Datasets

Midwicket connects to Cricsheet and manages download, extraction, and ingestion automatically.

from midwicket.datasets import load_dataset

# IPL — 1,100+ matches, 2008–present (~50 MB, downloads once)
session = load_dataset("ipl")

# Big Bash League
session = load_dataset("bbl")

# Everything — all formats, all genders, 25+ years
session = load_dataset("all")

Available datasets:

Key | Competition | Est. Matches
"ipl" | Indian Premier League | 1,100+
"t20s" | T20 Internationals (M + W) | 3,200+
"bbl" | Big Bash League | 650+
"psl" | Pakistan Super League | 350+
"cpl" | Caribbean Premier League | 380+
"wbbl" | Women's Big Bash League | 550+
"odis" | One Day Internationals | 2,400+
"tests" | Test Matches | 700+
"all_t20" | All T20 globally | 8,500+
"all" | Complete Cricsheet corpus | 16,000+

Once loaded, a session gives you a thread-safe DuckDB engine over ball-by-ball events. Query anything:

df = session.engine.execute_sql("""
    SELECT batter,
           SUM(runs_batter) AS runs,
           ROUND(SUM(runs_batter) * 100.0 / COUNT(*), 1) AS strike_rate,
           COUNT(DISTINCT match_id) AS matches
    FROM ball_events
    WHERE over >= 15              -- death overs only
    GROUP BY batter
    HAVING COUNT(*) >= 100
    ORDER BY runs DESC LIMIT 10
""").to_pandas()

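To see what the death-overs query computes without downloading a dataset, the sketch below runs the same SQL shape against a six-row toy table. It uses Python's standard-library sqlite3 rather than DuckDB so that it runs with no dependencies; the table layout is an assumption that mirrors only the columns the query touches, `over` is quoted because it is also a SQL keyword, and the HAVING threshold is lowered from 100 to 2 to suit the toy data.

```python
import sqlite3

# Toy stand-in for ball_events: (match_id, batter, over, runs_batter).
rows = [
    ("m1", "A", 14, 4),  # excluded by the filter: before over 15
    ("m1", "A", 15, 6),
    ("m1", "A", 16, 1),
    ("m2", "A", 19, 4),
    ("m1", "B", 17, 0),
    ("m1", "B", 18, 2),
]

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE ball_events '
    '(match_id TEXT, batter TEXT, "over" INTEGER, runs_batter INTEGER)'
)
con.executemany("INSERT INTO ball_events VALUES (?, ?, ?, ?)", rows)

result = con.execute("""
    SELECT batter,
           SUM(runs_batter) AS runs,
           ROUND(SUM(runs_batter) * 100.0 / COUNT(*), 1) AS strike_rate,
           COUNT(DISTINCT match_id) AS matches
    FROM ball_events
    WHERE "over" >= 15
    GROUP BY batter
    HAVING COUNT(*) >= 2          -- lowered from 100 for the toy data
    ORDER BY runs DESC
""").fetchall()

for batter, runs, strike_rate, matches in result:
    print(batter, runs, strike_rate, matches)
```

Each row in the table is one delivery, so COUNT(*) is balls faced and the strike rate is runs per 100 balls within the filtered overs.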
Feature Store

Six production-grade metrics, computed from ball-by-ball data:

from midwicket.features import (
    build_pressure_index,
    build_bowler_quality_rating,
    build_match_context_score,
    build_venue_bias_rating,
    build_batter_intent_score,
    build_expected_runs,
)

# Pressure Index — situational leverage per delivery
pi = build_pressure_index(session)
# Returns DataFrame: match_id, inning, over, ball, batter_id, bowler_id, pressure_index

# Bowler Quality Rating — dot balls + wicket rate combined
bqr = build_bowler_quality_rating(session)
# Returns DataFrame: bowler_id, total_balls, dot_balls, wickets, bowler_quality_rating

# Venue Bias Rating — batter-friendly vs bowler-friendly grounds
vbr = build_venue_bias_rating(session)
# VBR > 1.0 = batter-friendly  |  VBR < 1.0 = bowler-friendly
# Venues with < 5 matches default to VBR = 1.0 (stabilised)

# Match Context Score — chase pressure at any moment in the 2nd innings
mcs = build_match_context_score(session)
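The comments on Venue Bias Rating suggest a ratio against a baseline with a fallback for thin samples. One plausible reading is sketched below. The formula (a venue's mean first-innings score divided by the overall mean) and the `venue_bias` helper are assumptions for illustration, not Midwicket's implementation; only the 5-match threshold and the 1.0 default come from the text above.

```python
from statistics import mean

MIN_MATCHES = 5  # from the README: venues with fewer matches default to 1.0


def venue_bias(first_innings_by_venue):
    """Map venue -> rating (>1.0 batter-friendly, <1.0 bowler-friendly).

    Hypothetical formula: venue mean score / mean score over all matches.
    """
    all_scores = [s for scores in first_innings_by_venue.values() for s in scores]
    baseline = mean(all_scores)
    ratings = {}
    for venue, scores in first_innings_by_venue.items():
        if len(scores) < MIN_MATCHES:
            ratings[venue] = 1.0  # too few matches to trust the ratio
        else:
            ratings[venue] = mean(scores) / baseline
    return ratings


# Made-up scores for three fictional grounds.
scores = {
    "High Ground": [200, 190, 210, 180, 220],
    "Low Ground": [140, 150, 160, 130, 170],
    "New Ground": [250, 240],  # only 2 matches, so it falls back to 1.0
}
ratings = venue_bias(scores)
for venue, rating in ratings.items():
    print(f"{venue}: {rating:.3f}")
```

The fallback matters because a ground with two high-scoring matches would otherwise look like the most batter-friendly venue in the set.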

All features support date filtering — analyse any historical window without leakage:

bqr_2023 = build_bowler_quality_rating(session, start_date="2023-01-01", end_date="2023-12-31")
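The idea behind a leakage-free window can be sketched in a few lines: filter events by match date first, then compute the metric only from what survives. The `filter_window` helper and the event fields below are hypothetical, and treating both bounds as inclusive is an assumption about how `start_date` and `end_date` behave.

```python
from datetime import date


def filter_window(events, start_date=None, end_date=None):
    """Keep events whose match_date lies in [start_date, end_date] (inclusive)."""
    start = date.fromisoformat(start_date) if start_date else date.min
    end = date.fromisoformat(end_date) if end_date else date.max
    return [e for e in events if start <= e["match_date"] <= end]


# Made-up deliveries for one bowler across three calendar years.
events = [
    {"match_date": date(2022, 5, 29), "bowler_id": "b1", "is_dot": True},
    {"match_date": date(2023, 4, 1), "bowler_id": "b1", "is_dot": False},
    {"match_date": date(2023, 12, 31), "bowler_id": "b1", "is_dot": True},
    {"match_date": date(2024, 3, 22), "bowler_id": "b1", "is_dot": True},
]

window = filter_window(events, "2023-01-01", "2023-12-31")

# Leak check: nothing outside the requested year may reach the metric.
leaked = [e for e in window if e["match_date"].year != 2023]
dot_rate = sum(e["is_dot"] for e in window) / len(window)
print(len(window), len(leaked), dot_rate)
```

Filtering before aggregating is what keeps a 2023 rating from being influenced by deliveries bowled in 2024.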

Scouting Reports

import midwicket as md

session = md.init("./data")          # point at your local dataset
report = md.scouting_report("Virat Kohli")

print(report["role"])                # "Batter"
print(report["strengths"])           # ["Powerplay anchor", "Middle-over accelerator", ...]
print(report["phase_batting"])       # {"Powerplay": {...}, "Middle": {...}, "Death": {...}}
print(report["venue_performance"])   # per-venue batting average and SR
print(report["recent_form"])         # last-N-matches rolling stats

The scouting report resolves name aliases automatically: "V Kohli", "Virat Kohli", and "kohli" all resolve to the same entity across 17+ seasons.
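A minimal sketch of alias resolution, assuming a simple lookup table keyed on normalised names. The `ALIASES` table, the `virat_kohli` identifier, and `resolve_player` are all hypothetical; Midwicket's registry is likely more involved.

```python
# Hypothetical alias table: normalised spelling -> canonical player id.
ALIASES = {
    "v kohli": "virat_kohli",
    "virat kohli": "virat_kohli",
    "kohli": "virat_kohli",
}


def resolve_player(name):
    """Return the canonical id for a player name, ignoring case and spacing."""
    key = " ".join(name.lower().split())
    try:
        return ALIASES[key]
    except KeyError:
        raise LookupError(f"unknown player alias: {name!r}") from None


ids = {resolve_player(n) for n in ["V Kohli", "Virat Kohli", "kohli"]}
print(ids)
```

Normalising case and whitespace before the lookup keeps the table small, since only genuinely different spellings need their own entry.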


Showcase Gallery

Ten analyses built on real IPL data. Click any image to see the full walkthrough.

All-Time Run Leaders


Kohli's 9,228 runs lead by 1,897. Bars coloured by strike rate — greener hits faster.

IPL Scoring: 18 Years of Evolution


Avg 1st innings: 161 (2008) → 192 (2026). Sixes per match nearly doubled.

Venue Scoring Atlas (76 grounds)


VBR 0.848 → 1.253 across IPL grounds. 40% swing in expected scoring.

Death Over Bowler Landscape


141 bowlers, economy vs wicket rate. Bumrah: #11, economy 8.07.

Chase Specialists


85%+ of batters hit harder when chasing. Pat Cummins: +46 SR points.

Powerplay Kings


Suryavanshi: 211 SR — the highest ever recorded in IPL powerplay.

View all 10 showcases with charts, queries, and walkthroughs →


Five-Minute Tutorial

New to Midwicket? docs/getting_started.md takes you from install to first insight in under 5 minutes — using real data, real outputs.


Examples

The examples/ directory contains 36 runnable scripts organised by complexity:

Scripts | Topic
01–05 | Session setup, data ingest, player lookup, venue stats
06–15 | Win prediction, fantasy points, SQL queries, season filters
16–27 | Leaderboards, partnerships, consistency, full pipeline demos
28–36 | Express API, config, debug, full library tour
showcase_01–25 | Deep-dive analyses with charts and findings
portfolio/ | 14 player and team scouting studies

Start here: examples/28_express_quickstart.py


Architecture

Midwicket separates concerns across five layers. Data flows from raw JSON through a typed ingestion pipeline into a DuckDB analytical store, with a query planner routing between live scans and pre-built feature tables.

Cricsheet JSON
      │
      ▼
┌─────────────────────┐
│  Canonicaliser      │  Strict V1 Arrow schema · retirement fix · int32 upcasting
│  (core/canonicalize)│  Deterministic match_id · venue alias resolution
└──────────┬──────────┘
           │ PyArrow Table
           ▼
┌─────────────────────┐
│  Identity Registry  │  Player / venue / team aliases across 25+ years
│  (storage/registry) │  Temporal-safe: resolves names at match date, not today
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  DuckDB Engine      │  Thread-safe · snapshot management · temporal filtering
│  (storage/engine)   │  ball_events: 9M+ rows · sub-second aggregations
└──────────┬──────────┘
           │
      ┌────┴────┐
      ▼         ▼
 Feature     Express
  Store        API
(features.py) (express.py)
      │         │
      └────┬────┘
           ▼
   FastAPI + Prometheus
   (midwicket/serve/)
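The "temporal-safe" behaviour of the Identity Registry can be illustrated with a date-indexed name history: a lookup returns the name that was in force on the match date rather than the current one. The sketch below is an assumption about the approach, not the registry's actual code. `TEAM_NAMES` and `name_at` are hypothetical, and the 2019-01-01 cutoff is an illustrative stand-in for the date of the Delhi franchise's rename.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical registry: canonical team id -> [(valid_from, name)], sorted by date.
TEAM_NAMES = {
    "delhi": [
        (date(2008, 1, 1), "Delhi Daredevils"),
        (date(2019, 1, 1), "Delhi Capitals"),  # illustrative cutoff date
    ],
}


def name_at(team_id, match_date):
    """Return the team name that was in force on match_date."""
    history = TEAM_NAMES[team_id]
    starts = [start for start, _ in history]
    # Index of the last entry whose valid_from is on or before match_date.
    idx = bisect_right(starts, match_date) - 1
    if idx < 0:
        raise LookupError(f"{team_id!r} has no name on {match_date}")
    return history[idx][1]


print(name_at("delhi", date(2012, 4, 10)))
print(name_at("delhi", date(2021, 4, 10)))
```

Resolving at the match date means both names map to one canonical id for aggregation, while historical reports still show the name used at the time.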

Data integrity guarantees:

  • Schema version-locked (BALL_EVENT_SCHEMA v1.0.0) — breaking changes are explicit
  • over stored as int16, runs as int32 — no silent overflow on aggregation
  • Retirements classified correctly: RETIRED_HURT/RETIRED_NOT_OUT → is_wicket=False
  • Temporal filters are leak-proof — verified against 4 cutoff dates, 0 leaked rows
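The retirement rule in the list above can be sketched as a small classifier. The `Dismissal` enum and `is_wicket` function are hypothetical; only the RETIRED_HURT and RETIRED_NOT_OUT names and the is_wicket=False outcome come from the text, and the other members are included purely for contrast.

```python
from enum import Enum


class Dismissal(Enum):
    """Hypothetical dismissal kinds, not Midwicket's actual schema."""
    BOWLED = "bowled"
    CAUGHT = "caught"
    RUN_OUT = "run out"
    RETIRED_HURT = "retired hurt"
    RETIRED_NOT_OUT = "retired not out"


# Kinds that end a batter's stay at the crease without costing a wicket.
NON_WICKET = {Dismissal.RETIRED_HURT, Dismissal.RETIRED_NOT_OUT}


def is_wicket(kind):
    """True only for dismissals that count as a fallen wicket."""
    return kind is not None and kind not in NON_WICKET


for kind in Dismissal:
    print(kind.name, is_wicket(kind))
```

Counting a retirement as a wicket would inflate bowler wicket tallies and distort any wickets-in-hand feature, which is why the rule is called out as a guarantee.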

Enterprise Deployment

git clone https://github.com/CodersAcademy006/Midwicket.git && cd Midwicket
cp .env.example .env           # set MIDWICKET_SECRET_KEY, MIDWICKET_API_KEYS
docker-compose up -d           # FastAPI + Prometheus + Grafana

The FastAPI service exposes REST endpoints for win probability, player stats, matchups, and scouting reports. Prometheus scrape config and a Grafana dashboard definition are included.


Contributing

Contributions welcome. Areas where help is most needed:

  • Additional competition datasets (WBBL scouting, CPL analysis)
  • Jupyter notebook tutorials for the showcase analyses
  • Performance benchmarks across dataset sizes
  • Documentation translations

Read CONTRIBUTING.md before submitting a PR.


MIT License · Built on Cricsheet data · Powered by DuckDB + PyArrow

Getting Started · Showcase Gallery · API Reference · Changelog
