
An advanced, open-source cricket intelligence SDK powered by DuckDB, PyArrow, and FastAPI for high-performance analytics.


Midwicket

The Open-Source Cricket Intelligence SDK


Fast, deterministic cricket analytics powered by PyArrow and DuckDB.


The Problem

Processing unstructured sports telemetry has historically been painful. Traditional APIs are slow, schemas break frequently, and calculating complex metrics such as "venue bias" or "live win probability" across millions of events typically requires an expensive cloud data warehouse.

The Midwicket Solution

Midwicket brings the data warehouse to your laptop. It is a high-performance cricket intelligence SDK built on a structured pipeline architecture: a query planner routes requests between the PyArrow in-memory layer and a materialized DuckDB cache, keeping aggregations fast without cloud costs.

By combining vectorized PyArrow operations with an embedded DuckDB engine, Midwicket processes more than 10 years of ball-by-ball data locally.

Key Capabilities

  • Fast Local Queries: PyArrow and DuckDB power sub-second aggregations on cached, materialized views. Raw event scans remain available for ad hoc queries.
  • Pipeline Architecture: Specialized components (Executor, Planner, Storage Engine, Registry) isolate concerns and route queries along the most efficient path.
  • Predictive Machine Learning: A logistic regression win probability model trained on IPL data (AUC 0.843), running entirely in memory with no external calls.
  • Type-Safe & Deterministic: Immutable V1 schemas enforced via Pydantic. Queries are hashed and cached; identical inputs always produce identical outputs.
  • FastAPI Backend: Production-ready REST API with auth, rate limiting, CORS, and Prometheus metrics.
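
The "hashed and cached" behaviour described in the list above can be illustrated with a minimal sketch. This is not Midwicket's implementation: query_key, QueryCache, and strike_rate are hypothetical names, and the sketch only assumes that a query can be reduced to a name plus a dictionary of JSON-serializable parameters.

```python
import hashlib
import json


def query_key(name: str, params: dict) -> str:
    """Return a stable SHA-256 key for a query name and its parameters."""
    # sort_keys makes the JSON text independent of dict insertion order,
    # so logically identical queries always hash to the same key.
    canonical = json.dumps(
        {"query": name, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class QueryCache:
    """Hypothetical in-memory cache keyed by the query hash."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, name, params, compute):
        key = query_key(name, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # First time this exact query is seen: compute and remember it.
        self.misses += 1
        result = compute(**params)
        self._store[key] = result
        return result


def strike_rate(runs: int, balls: int) -> float:
    """Runs scored per 100 balls faced."""
    return round(100.0 * runs / balls, 2)


cache = QueryCache()
first = cache.get_or_compute("strike_rate", {"runs": 120, "balls": 80}, strike_rate)
# Same parameters in a different order hit the cache instead of recomputing.
second = cache.get_or_compute("strike_rate", {"balls": 80, "runs": 120}, strike_rate)
print(first, second, cache.hits, cache.misses)
```

Canonicalizing the parameters before hashing is what lets two calls with the same inputs share one cache entry regardless of argument order.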

Architecture

The Midwicket engine separates concerns across a structured pipeline: incoming data flows from Cricsheet JSON through a PyArrow ingestion layer into a DuckDB cache, where a query planner decides whether to scan raw events or serve a pre-computed view.

graph LR
    A[Cricsheet JSON] -->|Ingestion| B(PyArrow Pipeline)
    B -->|Parquet| C{DuckDB Cache}
    C -->|SQL Queries| D[Query Planner]
    D -->|Express API| E[Jupyter / Colab]
    D -->|FastAPI| F[Web / Mobile Clients]
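
The routing decision in the diagram above can be sketched in a few lines. This is a toy model under stated assumptions, not the SDK's planner: Query, MATERIALIZED_VIEWS, and plan are hypothetical names, and the rule used here (serve a view only when the metric has one and no extra filters are requested) is an illustrative guess at what "most efficient path" could mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    metric: str            # e.g. "player_runs"
    filters: tuple = ()    # extra filters not baked into any view


# Hypothetical set of metrics that have a pre-computed view.
MATERIALIZED_VIEWS = {"player_runs", "venue_averages"}


def plan(query: Query) -> str:
    """Return "materialized_view" when a cached view can answer the query,
    otherwise "raw_scan"."""
    if query.metric in MATERIALIZED_VIEWS and not query.filters:
        return "materialized_view"
    return "raw_scan"


print(plan(Query("player_runs")))                         # materialized_view
print(plan(Query("player_runs", (("season", 2019),))))    # raw_scan
print(plan(Query("dot_ball_pct")))                        # raw_scan
```

Keeping the decision in one pure function makes it straightforward to unit test, which fits the contribution area "writing tests for the query planner" listed later in this README.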

Quick Start

Try it instantly in your browser — no install required:

Open In Colab


Step 1 — Install

pip install midwicket

Step 2 — Run a prediction (no data download needed)

The win probability model runs entirely in memory. No dataset, no waiting.

import midwicket.express as px

result = px.predict_win(
    venue="Wankhede Stadium",
    target=180,
    current_score=120,
    wickets_down=5,
    overs_done=15.0,
)
print(f"Win Probability: {result['win_prob']:.1%}")
# Win Probability: 22.5%

The result also includes a confidence field — a heuristic certainty indicator (0.1–0.95) that reflects how extreme the prediction is and how much situational information is available. It is not a statistical confidence interval; treat it as a qualitative signal.
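
To put the Step 2 inputs in context, the sketch below works out the chase arithmetic by hand and formats both result fields. It makes several assumptions that the text above does not confirm: a 20-over innings, overs written in cricket notation (15.3 means 15 overs and 3 balls), and a result that behaves like a dict with the keys win_prob and confidence. The helper names are hypothetical, and the confidence value in example_result is a placeholder, not real model output.

```python
def overs_to_balls(overs: float) -> int:
    """Convert cricket overs notation (15.3 = 15 overs, 3 balls) to balls."""
    whole = int(overs)
    balls = round((overs - whole) * 10)
    if not 0 <= balls <= 5:
        raise ValueError(f"invalid overs value: {overs}")
    return whole * 6 + balls


def chase_summary(target: int, current_score: int, overs_done: float,
                  total_overs: int = 20) -> dict:
    """Runs needed, balls left, and required run rate (runs per over)."""
    balls_left = total_overs * 6 - overs_to_balls(overs_done)
    if balls_left <= 0:
        raise ValueError("no balls left in the innings")
    runs_needed = target - current_score
    return {
        "runs_needed": runs_needed,
        "balls_left": balls_left,
        "required_rate": runs_needed * 6 / balls_left,
    }


def format_prediction(result: dict) -> str:
    """Render the two fields described above on one line."""
    return (f"Win Probability: {result['win_prob']:.1%} "
            f"(confidence signal: {result['confidence']:.2f})")


summary = chase_summary(target=180, current_score=120, overs_done=15.0)
print(summary)
# {'runs_needed': 60, 'balls_left': 30, 'required_rate': 12.0}

# Placeholder dict standing in for px.predict_win(...); 0.6 is made up.
example_result = {"win_prob": 0.225, "confidence": 0.6}
print(format_prediction(example_result))
```

A required rate of 12 runs per over with five wickets down is a steep chase, which is consistent with the low win probability shown in the Step 2 output.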


Step 3 — Query player stats and head-to-head matchups

Midwicket ships with a bundled in-memory dataset. Player stats and matchups work out of the box — no download needed:

import midwicket.express as px

stats = px.get_player_stats("Virat Kohli")
print(f"Player: {stats.name} | Runs: {stats.runs} | Strike Rate: {stats.strike_rate}")

matchup = px.get_matchup("V Kohli", "JJ Bumrah")
print(f"Head-to-head | Matches: {matchup.matches} | Average: {matchup.average:.1f}")

How the data layer works:

  1. Bundled data (default): The in-memory ZIP ships with the package. Stats and matchups read from it automatically with no setup.
  2. Download full history (optional): For 10+ years of ball-by-ball IPL data (~50 MB), run this once and it persists to disk:
    px.download_data()          # downloads to ./data by default
    # px.download_data("~/cricket-data")  # or a custom path
    
  3. Registry: Player resolution and matchup stats are indexed in an in-memory IdentityRegistry built from the loaded data. If a player name isn't found, get_player_stats raises EntityNotFoundError with the missing name.
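
The name resolution described in point 3 can be pictured with a toy registry. This is a sketch only: ToyRegistry and PlayerNotFound are hypothetical stand-ins for the IdentityRegistry and EntityNotFoundError named above, and the alias mapping (treating "V Kohli" as the canonical form of "Virat Kohli") is an assumption made for illustration.

```python
class PlayerNotFound(KeyError):
    """Stand-in for the EntityNotFoundError described above."""


class ToyRegistry:
    """Maps normalized aliases to one canonical player name."""

    def __init__(self):
        self._canonical = {}  # normalized alias -> canonical name

    @staticmethod
    def _normalize(name: str) -> str:
        # Lower-case and collapse repeated whitespace.
        return " ".join(name.lower().split())

    def register(self, canonical: str, *aliases: str) -> None:
        for alias in (canonical, *aliases):
            self._canonical[self._normalize(alias)] = canonical

    def resolve(self, name: str) -> str:
        try:
            return self._canonical[self._normalize(name)]
        except KeyError:
            # Carry the missing name so the caller can report it.
            raise PlayerNotFound(name) from None


registry = ToyRegistry()
registry.register("V Kohli", "Virat Kohli")

print(registry.resolve("virat  kohli"))  # V Kohli

try:
    registry.resolve("Unknown Player")
except PlayerNotFound as exc:
    print(f"Not found: {exc.args[0]}")
```

In application code the same pattern applies to the real SDK: wrap the lookup in a try block and handle the not-found error explicitly rather than letting it propagate.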

Enterprise Deployment

Midwicket includes a FastAPI backend, Prometheus scrape config, and a Grafana dashboard definition. The observability stack is provisioned via Docker Compose.

Status: The FastAPI service and Prometheus integration are production-ready. The Grafana dashboard is provided as a starting point and may need metric name adjustments to match your environment.

# Clone the repository
git clone https://github.com/CodersAcademy006/Midwicket.git
cd Midwicket

# Configure environment variables
cp .env.example .env
# Edit .env: set MIDWICKET_SECRET_KEY, MIDWICKET_API_KEYS, GRAFANA_PASSWORD

# Start the FastAPI server + Prometheus + Grafana
docker-compose up -d
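
Before starting the stack, it can help to confirm that the three variables mentioned above are actually set. The script below is a small optional sketch, not part of the repository: it assumes .env uses plain KEY=VALUE lines, and parse_env and missing_keys are hypothetical helper names.

```python
from pathlib import Path

# Variable names taken from the setup comment above.
REQUIRED_KEYS = ("MIDWICKET_SECRET_KEY", "MIDWICKET_API_KEYS", "GRAFANA_PASSWORD")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent or set to an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root after editing .env; an output of "Missing: none" means all three values are present.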

Examples

The examples/ directory contains 36 runnable scripts covering the full SDK:

Range     Topic
01–03     Setup, basic session, data ingest
03b–08    Player lookup, venue stats, win prediction
09–20     Fantasy points, raw SQL, season filters, leaderboards
21–27     Partnership stats, consistency, reports, pipelines
28–36     Express API, config, full library tour

Browse examples/ or start with 28_express_quickstart.py.


Contributing

Contributions are highly encouraged! We are actively looking for help with:

  • Expanding the built-in machine learning models.
  • Optimizing DuckDB materialized views.
  • Writing tests for the query planner.

Before submitting code, please review the component architecture in Agents.md.

License

Midwicket is open-source software released under the MIT License.
