An advanced, open-source cricket intelligence SDK powered by DuckDB, PyArrow, and FastAPI for high-performance analytics.

Midwicket

The Open-Source Cricket Intelligence SDK

Fast, deterministic cricket analytics powered by PyArrow and DuckDB.


The Problem

Processing unstructured sports telemetry has historically been painful. Traditional APIs are slow, schemas break frequently, and calculating complex metrics such as "venue bias" or "live win probability" across millions of events usually requires an expensive cloud data warehouse.

The Midwicket Solution

Midwicket brings the data warehouse to your laptop. It is a high-performance cricket intelligence SDK built on a structured pipeline architecture: a query planner routes requests between the PyArrow in-memory layer and a materialized DuckDB cache, keeping aggregations fast without cloud costs.

By combining vectorized PyArrow operations with an embedded DuckDB engine, Midwicket processes more than 10 years of ball-by-ball data locally.

Key Capabilities

  • Fast Local Queries: PyArrow and DuckDB power sub-second aggregations on cached, materialized views. Raw event scans are available for arbitrary flexibility.
  • Pipeline Architecture: Specialized components (Executor, Planner, Storage Engine, Registry) isolate concerns and route queries along the most efficient path.
  • Predictive Machine Learning: Logistic regression win probability model trained on IPL data (AUC 0.843), running entirely in memory with no external calls.
  • Type-Safe & Deterministic: Immutable V1 schemas enforced via Pydantic. Queries are hashed and cached; identical inputs always produce identical outputs.
  • FastAPI Backend: Production-ready REST API with auth, rate limiting, CORS, and Prometheus metrics.
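
The "hashed and cached" behaviour described in the Type-Safe & Deterministic bullet can be illustrated with a minimal sketch. This is not Midwicket's internal code: `query_key`, `cached_query`, `fake_compute`, and the query dictionary layout are hypothetical names chosen for illustration. The idea is to serialize a query canonically (sorted keys) so that logically identical inputs map to the same cache key.

```python
import hashlib
import json

# Hypothetical in-process cache; a stand-in, not Midwicket's storage layer.
_CACHE = {}


def query_key(query: dict) -> str:
    """Return a stable SHA-256 key for a JSON-serializable query."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_query(query: dict, compute):
    """Run compute(query) once per distinct query and reuse the result."""
    key = query_key(query)
    if key not in _CACHE:
        _CACHE[key] = compute(query)
    return _CACHE[key]


calls = []


def fake_compute(query):
    # Records each real computation so cache hits are observable.
    calls.append(query)
    return {"rows": 1, "metric": query["metric"]}


a = cached_query({"metric": "strike_rate", "season": 2023}, fake_compute)
b = cached_query({"season": 2023, "metric": "strike_rate"}, fake_compute)
print(a == b, len(calls))  # True 1
```

The second call differs only in key order, so it hits the cache and `fake_compute` runs once.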

Architecture

The Midwicket engine separates concerns across a structured pipeline: incoming data flows from Cricsheet JSON through a PyArrow ingestion layer into a DuckDB cache, where a query planner decides whether to scan raw events or serve a pre-computed view.

graph LR
    A[Cricsheet JSON] -->|Ingestion| B(PyArrow Pipeline)
    B -->|Parquet| C{DuckDB Cache}
    C -->|SQL Queries| D[Query Planner]
    D -->|Express API| E[Jupyter / Colab]
    D -->|FastAPI| F[Web / Mobile Clients]
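
The routing decision described above (serve a pre-computed view when one exists, otherwise scan raw events) can be sketched in a few lines. This is a toy model under stated assumptions, not the real planner: the `Planner` class, its `views` mapping, and the event dictionaries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Planner:
    """Toy planner: use a materialized view if present, else scan raw events."""

    # Hypothetical mapping of metric name -> precomputed value.
    views: dict = field(default_factory=dict)

    def plan(self, metric: str) -> str:
        return "materialized_view" if metric in self.views else "raw_scan"

    def execute(self, metric: str, events: list) -> float:
        if self.plan(metric) == "materialized_view":
            return self.views[metric]
        # Fallback path: aggregate directly over the raw event records.
        return float(sum(event[metric] for event in events))


events = [{"runs": 4}, {"runs": 1}, {"runs": 6}]
planner = Planner(views={"wickets": 2.0})

print(planner.plan("wickets"), planner.execute("wickets", events))
print(planner.plan("runs"), planner.execute("runs", events))
```

In this sketch the "wickets" query is answered from the cached view, while "runs" falls through to a scan of the three sample events.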

Quick Start

Try it instantly in your browser — no install required:

Open In Colab


Step 1 — Install

pip install midwicket

Step 2 — Run a prediction (no data download needed)

The win probability model runs entirely in memory. No dataset, no waiting.

import midwicket.express as px

result = px.predict_win(
    venue="Wankhede Stadium",
    target=180,
    current_score=120,
    wickets_down=5,
    overs_done=15.0,
)
print(f"Win Probability: {result['win_prob']:.1%}")
# Win Probability: 22.5%

The result also includes a confidence field — a heuristic certainty indicator (0.1–0.95) that reflects how extreme the prediction is and how much situational information is available. It is not a statistical confidence interval; treat it as a qualitative signal.
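
To make the inputs above concrete, the sketch below derives the usual chase quantities from them and shows the general shape of a logistic model. It assumes that `target` is the score the chasing side must reach, that the match is a 20-over innings, and that overs use cricket notation (15.3 means 15 overs and 3 balls). The helper names and the weights are placeholders for illustration; they are not the trained model's coefficients and will not reproduce the 22.5% shown above.

```python
import math


def overs_to_balls(overs: float) -> int:
    """Convert cricket overs notation (15.3 = 15 overs, 3 balls) to balls."""
    whole = int(overs)
    extra = round((overs - whole) * 10)
    return whole * 6 + extra


def chase_features(target, current_score, wickets_down, overs_done, total_overs=20):
    """Derive basic chase-state features from the raw match situation."""
    balls_left = total_overs * 6 - overs_to_balls(overs_done)
    runs_needed = target - current_score
    return {
        "runs_needed": runs_needed,
        "balls_left": balls_left,
        "wickets_in_hand": 10 - wickets_down,
        "required_run_rate": (
            runs_needed * 6 / balls_left if balls_left > 0 else float("inf")
        ),
    }


def logistic(z: float) -> float:
    """Standard logistic (sigmoid) function mapping a score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


feats = chase_features(target=180, current_score=120, wickets_down=5, overs_done=15.0)
print(feats["runs_needed"], feats["balls_left"], feats["required_run_rate"])  # 60 30 12.0

# Placeholder weights, invented for this example only.
weights = {"required_run_rate": -0.3, "wickets_in_hand": 0.25}
bias = 1.0
z = bias + sum(w * feats[name] for name, w in weights.items())
print(f"Illustrative probability: {logistic(z):.3f}")
```

The example situation therefore needs 60 runs from 30 balls, a required rate of 12 runs per over with five wickets in hand.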


Step 3 — Query player stats and head-to-head matchups

Midwicket ships with a bundled in-memory dataset. Player stats and matchups work out of the box — no download needed:

import midwicket.express as px

stats = px.get_player_stats("Virat Kohli")
print(f"Player: {stats.name} | Runs: {stats.runs} | Strike Rate: {stats.strike_rate}")

matchup = px.get_matchup("V Kohli", "JJ Bumrah")
print(f"Head-to-head | Matches: {matchup.matches} | Average: {matchup.average:.1f}")
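
For readers unfamiliar with the fields printed above, the following sketch shows the conventional cricket definitions: strike rate is runs per 100 balls faced, and average is runs per dismissal. Whether Midwicket computes its fields exactly this way is an assumption. `matchup_summary` and the sample deliveries are invented for illustration, and extras such as wides are ignored for simplicity.

```python
def matchup_summary(balls):
    """Aggregate (runs, dismissed) delivery records into head-to-head figures."""
    runs = sum(r for r, _ in balls)
    dismissals = sum(1 for _, out in balls if out)
    faced = len(balls)
    return {
        "runs": runs,
        "balls": faced,
        "dismissals": dismissals,
        # Strike rate: runs scored per 100 balls faced.
        "strike_rate": 100.0 * runs / faced if faced else 0.0,
        # Average is undefined when the batter was never dismissed.
        "average": runs / dismissals if dismissals else None,
    }


# Made-up deliveries: (runs scored, batter dismissed on this ball?)
sample = [
    (1, False), (4, False), (0, False), (6, False),
    (0, True), (1, False), (0, False), (0, True),
]
summary = matchup_summary(sample)
print(summary["strike_rate"], summary["average"])  # 150.0 6.0
```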

How the data layer works:

  1. Bundled data (default): A compressed (ZIP) dataset ships with the package and is loaded into memory. Stats and matchups read from it automatically with no setup.
  2. Download full history (optional): For 10+ years of ball-by-ball IPL data (~50 MB), run this once and it persists to disk:
    px.download_data()          # downloads to ./data by default
    # px.download_data("~/cricket-data")  # or a custom path
    
  3. Registry: Player resolution and matchup stats are indexed in an in-memory IdentityRegistry built from the loaded data. If a player name isn't found, get_player_stats raises EntityNotFoundError with the missing name.
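
The registry behaviour in step 3 can be pictured with a small stand-in. The classes below are local toys, not the SDK's `IdentityRegistry` or `EntityNotFoundError` (whose import paths are not shown here); `ToyIdentityRegistry`, `PlayerNotFound`, and the alias list are hypothetical. The sketch normalizes names, maps aliases to one canonical spelling, and raises an error carrying the missing name.

```python
class PlayerNotFound(LookupError):
    """Local stand-in for an entity-not-found error; keeps the missing name."""

    def __init__(self, name: str):
        super().__init__(f"Entity not found: {name!r}")
        self.name = name


class ToyIdentityRegistry:
    """Maps normalized names and aliases to a canonical player name."""

    def __init__(self):
        self._canonical = {}

    @staticmethod
    def _norm(name: str) -> str:
        # Case-insensitive, whitespace-insensitive comparison key.
        return " ".join(name.lower().split())

    def register(self, canonical: str, *aliases: str) -> None:
        for name in (canonical, *aliases):
            self._canonical[self._norm(name)] = canonical

    def resolve(self, name: str) -> str:
        try:
            return self._canonical[self._norm(name)]
        except KeyError:
            raise PlayerNotFound(name) from None


registry = ToyIdentityRegistry()
registry.register("V Kohli", "Virat Kohli")

print(registry.resolve("virat  kohli"))  # V Kohli

try:
    registry.resolve("Unknown Player")
except PlayerNotFound as exc:
    print(f"Lookup failed for: {exc.name}")  # Lookup failed for: Unknown Player
```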

Enterprise Deployment

Midwicket includes a FastAPI backend, Prometheus scrape config, and a Grafana dashboard definition. The observability stack is provisioned via Docker Compose.

Status: The FastAPI service and Prometheus integration are production-ready. The Grafana dashboard is provided as a starting point and may need metric name adjustments to match your environment.

# Clone the repository
git clone https://github.com/CodersAcademy006/Midwicket.git
cd Midwicket

# Configure environment variables
cp .env.example .env
# Edit .env: set MIDWICKET_SECRET_KEY, MIDWICKET_API_KEYS, GRAFANA_PASSWORD

# Start the FastAPI server + Prometheus + Grafana
docker-compose up -d
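
Before starting the stack, it can help to confirm that the three variables named above are actually set. The following is an optional sketch, not part of the repository: `parse_env` and `missing_keys` are hypothetical helpers, and the parser handles only simple KEY=VALUE lines.

```python
REQUIRED = ("MIDWICKET_SECRET_KEY", "MIDWICKET_API_KEYS", "GRAFANA_PASSWORD")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED) -> list:
    """Return required keys that are absent or set to an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Sample content for demonstration; real values belong in your own .env file.
sample = """
# example only
MIDWICKET_SECRET_KEY=change-me
MIDWICKET_API_KEYS=
"""
print(missing_keys(sample))  # ['MIDWICKET_API_KEYS', 'GRAFANA_PASSWORD']
```

To check a real file, pass its contents instead of `sample`, for example `missing_keys(open(".env").read())`.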

Examples

The examples/ directory contains 36 runnable scripts covering the full SDK:

Range     Topic
01–03     Setup, basic session, data ingest
03b–08    Player lookup, venue stats, win prediction
09–20     Fantasy points, raw SQL, season filters, leaderboards
21–27     Partnership stats, consistency, reports, pipelines
28–36     Express API, config, full library tour

Browse examples/ or start with 28_express_quickstart.py.


Contributing

Contributions are highly encouraged! We are actively looking for help with:

  • Expanding the built-in machine learning models.
  • Optimizing DuckDB materialized views.
  • Writing tests for the query planner.

Before submitting code, please review the component architecture in Agents.md.

License

Midwicket is open-source software released under the MIT License.
