The Open Source Cricket Intelligence SDK
Project description
Midwicket
The Open Source Cricket Intelligence SDK
Midwicket is a comprehensive Python library for cricket analytics, providing a robust, agent-based architecture for querying, processing, and analyzing cricket data. Built on top of PyArrow, DuckDB, and Pydantic, it offers deterministic, cacheable queries with strict schema enforcement.
Table of Contents
- Introduction
- Features
- Installation
- Quick Start
- Usage
- Examples
- Architecture Overview
- Data Sources
- Documentation
- Performance
- Stability & Compatibility
- Contributing
- License
- Support
- Roadmap
Introduction
Midwicket is a powerful cricket analytics SDK designed for developers, data scientists, and cricket enthusiasts. It provides a complete solution for ingesting, querying, and analyzing cricket data with a focus on performance, reliability, and ease of use.
The library leverages modern data engineering tools (PyArrow, DuckDB) and architectural patterns (agent-based design, deterministic queries) to deliver a professional-grade analytics platform. Whether you're building a fantasy cricket application, conducting statistical research, or creating interactive dashboards, Midwicket provides the tools you need.
Target Audience
- Data Scientists: Perform advanced cricket analytics with Python
- Application Developers: Build cricket-related applications with robust APIs
- Cricket Enthusiasts: Explore and analyze cricket data programmatically
- Researchers: Conduct statistical analysis on cricket matches
Features
- Agent-Based Architecture: Specialized agents (Gatekeeper, Planner, Archivist, Identity Manager, Analyst) handle different aspects of data processing
- Deterministic Queries: All queries are hashed for reproducible results and caching
- Schema V1 Contract: Immutable data schema with evolution rules for backward compatibility
- High Performance: Vectorized operations using PyArrow and analytical queries via DuckDB
- Time-Aware Identity: Consistent player/team/venue resolution across historical data
- Express API: One-liner access to common operations with sensible defaults
- Cricket Data Integration: Works with Cricsheet.org IPL dataset (download required on first run)
- Win Probability Model: ML-powered match outcome predictions
- Rich Visualizations: Charts, reports, and interactive dashboards
Installation
Prerequisites
- Python 3.9 or higher
- pip package manager
Install from PyPI
pip install midwicket
Install from Source
For development or to get the latest features:
git clone https://github.com/CodersAcademy006/Midwicket.git
cd Midwicket
pip install -r requirements.txt
pip install -e .
Verify Installation
import midwicket as mw
print(mw.__version__) # Should print: 0.1.0
Legacy compatibility is preserved:
import midwicket as md # still supported
Deployment
Docker Deployment
Midwicket includes production-ready Docker configuration for easy deployment:
Quick Start with Docker
# Clone the repository
git clone https://github.com/CodersAcademy006/Midwicket.git
cd Midwicket
# Copy environment configuration
cp .env.example .env
# Edit .env with your production values
# Start all services
docker-compose up -d
# Check health (uses unauthenticated internal probe)
curl http://localhost:8000/_internal/health
Services Included
- Midwicket API: FastAPI-based REST API (
http://localhost:8000) - DuckDB (embedded): Analytics and metadata storage inside the API service
- Prometheus: Metrics collection (
http://localhost:9090) - Grafana: Monitoring dashboards (
http://localhost:3000)
API Endpoints
GET /health - Health check
GET /v1/metrics - System and API metrics
GET /matches - List matches
POST /analyze - Custom analysis
GET /win_probability - Win probability predictions
Authentication
Include your API key in requests:
curl -H "X-API-Key: your-api-key" http://localhost:8000/health
Rate Limiting
- 60 requests per minute per API key/IP
- Rate limit headers included in responses
- 429 status code when exceeded
Manual Deployment
For custom deployment scenarios:
# Install dependencies
pip install -r requirements.txt
pip install 'midwicket[serve]'
# Set environment variables (all use MIDWICKET_ prefix)
export MIDWICKET_SECRET_KEY="your-secret-key-at-least-32-chars"
export MIDWICKET_API_KEY_REQUIRED="true"
export MIDWICKET_API_KEYS="your-api-key-here"
export MIDWICKET_CORS_ORIGINS="https://your-frontend.example.com"
export MIDWICKET_ALLOWED_HOSTS="your-domain.example.com,localhost"
# Run the API
python -c "from midwicket import serve; serve()"
Quick Start
Midwicket uses live cricket data from Cricsheet.org. On first run, download the IPL dataset (~50 MB, one-time):
Step 1 — Download Data
from midwicket.data.loader import DataLoader
loader = DataLoader()
loader.download() # Downloads from cricsheet.org — run once
Step 2 — Use the Library
import midwicket.express as px
# Get player statistics (requires data downloaded above)
stats = px.get_player_stats("Virat Kohli")
if stats:
print(f"{stats.name}: {stats.runs} runs in {stats.matches} matches")
# Predict win probability (no data required)
from midwicket.compute.winprob import win_probability
prob = win_probability(target=180, current_runs=120, wickets_down=5, overs_done=15.0)
print(f"Win probability: {prob['win_prob']:.1%}")
Full Setup with Custom Data
For production use or custom datasets:
pip install midwicket
import midwicket as md
# Initialize session with data directory
session = md.api.session.MidwicketSession("./data")
# Download sample data (IPL 2023)
from midwicket.data.loader import DataLoader
loader = DataLoader("./data")
loader.download()
# Analyze player performance
stats = session.get_player_stats("V Kohli")
print(f"Player: {stats.name}")
print(f"Matches: {stats.matches}, Runs: {stats.runs}")
Usage
API Overview
Midwicket provides multiple API levels for different use cases:
Express API (midwicket.express)
- Best for: Quick analysis, prototyping, beginners
- Features: One-liner functions, automatic setup, sensible defaults
- Example:
px.get_player_stats("V Kohli")
Core API (midwicket.api)
- Best for: Production applications, custom workflows
- Features: Full control, session management, advanced features
- Example:
MidwicketSession("./data").get_player_stats("V Kohli")
Direct Engine Access (midwicket.storage, midwicket.compute)
- Best for: Custom analytics, high-performance computing
- Features: Raw data access, custom queries, plugin system
Key Capabilities
Player Analytics
# Career statistics
stats = px.get_player_stats("Steve Smith")
# Head-to-head matchups
matchup = px.get_matchup("V Kohli", "JJ Bumrah")
# Fantasy cheat sheet for a venue
cheat = md.fantasy.cheat_sheet("Wankhede Stadium")
print(cheat.head())
Match Analysis
# Load a specific match into the engine
session.load_match("980959")
# Win probability at a point in the match
from midwicket.compute.winprob import win_probability
prob = win_probability(target=180, current_runs=120,
wickets_down=5, overs_done=15.0, venue=None)
print(f"Chase win probability: {prob['win_prob']:.1%}")
Predictive Modeling
# Win probability via Express API
result = px.predict_win("Eden Gardens", 180, 120, 5, 15.0)
print(f"Win chance: {result['win_prob']:.1%}")
# Venue batting/bowling bias
bias = md.fantasy.venue_bias("Wankhede Stadium")
print(f"Verdict: {bias['verdict']}")
Data Management
# Download IPL data (~50 MB)
from midwicket.data.loader import DataLoader
loader = DataLoader("./data")
loader.download()
# Build the identity registry from raw files
from midwicket.data.pipeline import build_registry_stats
build_registry_stats(loader, session.registry)
# Raw SQL via the query engine
from midwicket.storage.engine import QueryEngine
engine = QueryEngine("./data/midwicket.duckdb")
results = engine.execute_sql("SELECT * FROM ball_events LIMIT 10")
Architecture Overview
Midwicket uses a modular, agent-based architecture with clear separation of concerns:
Data Flow: Cricsheet JSON → Ingestion → DuckDB Cache → PyArrow Table → Pandas
Module Structure
midwicket/
├── api/ # User-Facing APIs (Express, Core, Plugins)
├── schema/ # Immutable Data Definitions (Schema V1)
├── query/ # Explicit Query Objects with Hashing
├── storage/ # I/O & State Management (DuckDB/Parquet)
├── runtime/ # Execution & Planning (Cache, Modes)
├── compute/ # Pure Math & Analytics (PyArrow)
├── core/ # Raw Data Processing (Cricsheet → Arrow)
├── data/ # External Data Fetching & Loading
├── models/ # ML Models (Win Probability, etc.)
├── visuals/ # Charts, Reports, Dashboards
├── report/ # PDF/Interactive Report Generation
├── live/ # Live Broadcasting Overlays
├── serve/ # REST API Server
└── tests/ # Comprehensive Test Suite
For detailed architecture information, see Agents.md.
Data Sources
Midwicket uses Cricsheet as its primary data source, providing comprehensive ball-by-ball data for international and domestic cricket matches. The library also supports:
- Custom Data Ingestion: Import your own cricket data in JSON format
- Cricsheet Data Download: Fetch IPL/international data via
loader.download()(~50 MB, one-time) - Live Data Streaming: Real-time match data (upcoming feature)
Documentation
Core Documentation
- Complete API Reference: Detailed function documentation with examples
- Architecture Guide: Agent-based system design and philosophy
- Win Probability Model: ML model implementation details
- Debug Mode: Troubleshooting and debugging guide
Additional Resources
- Examples: Jupyter notebooks and sample scripts (25+ examples)
- Adapters: Custom data source integration guide
- Impact Player: Player impact analysis documentation
Examples
Midwicket includes a comprehensive collection of examples to help you get started. All examples are located in the examples/ directory.
Basic Analysis
Analyze player statistics across multiple players:
import midwicket.express as px
# Load data first (one-time download from cricsheet.org, ~50 MB)
session = px.quick_load()
# Compare top run scorers
players = ["V Kohli", "S Dhawan", "RG Sharma", "DA Warner", "AB de Villiers"]
for player in players:
stats = px.get_player_stats(player)
if stats:
avg = stats.runs / stats.matches if stats.matches > 0 else 0
print(f"{player}: {stats.runs} runs ({avg:.1f} avg)")
Match Win Prediction
Predict match outcomes using real-time data:
import midwicket.express as px
# Real-time win probability calculation
venue = "Wankhede Stadium"
target = 180
current_score = 120
wickets_down = 5
overs_completed = 15.0
prob = px.predict_win(venue, target, current_score, wickets_down, overs_completed)
print(f"Current win probability: {prob['win_prob']:.1%}")
print(f"Model confidence: {prob['confidence']:.1%}")
Fantasy Cricket Cheat Sheet
Generate a fantasy selection cheat sheet ranked by projected points at a venue:
from midwicket.api.fantasy import cheat_sheet, venue_bias
# Top 20 players by avg fantasy points at this venue
df = cheat_sheet("Wankhede Stadium")
print(df[["player_id", "avg_points"]].head(10))
# Batting-first vs chase advantage at the venue
bias = venue_bias("Eden Gardens")
print(f"Verdict: {bias['verdict']} "
f"(bat-first win%: {bias['win_bat_first_pct']}, "
f"chase win%: {bias['win_chase_pct']})")
Advanced Analytics
Perform custom analytics using direct SQL queries:
from midwicket.storage.engine import QueryEngine
# Initialize query engine
engine = QueryEngine("./data/midwicket.duckdb")
# Custom SQL query for detailed analysis
query = """
SELECT batter_id,
SUM(runs_batter) AS total_runs,
COUNT(*) AS balls_faced
FROM ball_events
WHERE match_id = ?
GROUP BY batter_id
ORDER BY total_runs DESC
LIMIT 10
"""
results = engine.execute_sql(query, ["980959"])
print(results.to_pandas())
For more examples, see the examples/ directory which contains 25+ scripts covering various use cases.
Performance
Midwicket is engineered for high performance with modern data processing technologies:
Performance Features
- Vectorized Operations: Leverages PyArrow for fast columnar data processing
- Analytical Queries: Uses DuckDB for sub-second analytical queries on large datasets
- Smart Caching: Implements deterministic query hashing for efficient result caching
- Memory Efficient: Employs lazy loading and streaming for handling large datasets
- Optimized I/O: Parquet file format for fast reads and minimal storage
Benchmark Results
Performance metrics on sample IPL 2023 dataset:
| Operation | Execution Time |
|---|---|
| Player stats query | ~400μs |
| Match loading | ~6.5ms |
| Registry resolution | ~800μs |
| Win probability prediction | ~50μs |
Note: Benchmarks performed on standard hardware. Actual performance may vary based on dataset size and hardware specifications.
Stability & Compatibility
Versioning
Midwicket follows Semantic Versioning:
- Major (1.x → 2.x): Breaking architecture changes
- Minor (0.1 → 0.2): New features (backward compatible)
- Patch (0.1.1 → 0.1.2): Bug fixes only
API Stability
- Express API: Designed to be stable with backward compatibility maintained in future versions
- Core API: Structurally stable, with parameter additions only in minor versions
- Internal APIs: May change between minor versions (use at your own risk)
Contributing
We welcome contributions from the community! Midwicket is an open-source project and we appreciate help in the following areas:
- Bug fixes and issue reporting
- Feature development and enhancements
- Documentation improvements
- Test coverage expansion
- Performance optimizations
How to Contribute
- Fork the Repository: Create your own fork of the Midwicket repository
- Create a Branch: Make a feature branch for your changes
git checkout -b feature/your-feature-name
- Make Changes: Implement your changes with clear, documented code
- Write Tests: Add tests for new features or bug fixes
- Run Tests: Ensure all tests pass
pytest
- Submit Pull Request: Create a PR with a clear description of your changes
Development Setup
# Clone the repository
git clone https://github.com/CodersAcademy006/Midwicket.git
cd Midwicket
# Install dependencies
pip install -r requirements.txt
# Install in editable mode
pip install -e .
# Run tests
pytest
# Run with coverage
pytest --cov=midwicket
Code Style Guidelines
- Follow PEP 8 Python style guidelines
- Use type hints for function signatures
- Write docstrings for all public functions and classes
- Keep functions focused and modular
- Add comments for complex logic
Reporting Issues
When reporting issues, please include:
- Python version and operating system
- Midwicket version
- Minimal code example to reproduce the issue
- Expected vs. actual behavior
- Error messages and stack traces
License
Midwicket is released under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License.
For full license details, see the LICENSE file in the repository.
Support
Getting Help
If you need assistance or have questions about Midwicket:
- Documentation: Comprehensive guides available in the docs directory
- GitHub Issues: Report bugs or request features
- GitHub Discussions: Ask questions and share ideas
- Examples: Browse 25+ example scripts for common use cases
Community
Join the Midwicket community to connect with other users and contributors:
- Share your cricket analytics projects
- Get help from experienced users
- Contribute to the project's development
- Stay updated on new features and releases
Roadmap
Midwicket is under active development with a clear roadmap for future enhancements.
Current Version: v0.1.0
Completed Features:
- ✅ Express API with one-liner access patterns
- ✅ Data integration with Cricsheet (download via
loader.download()) - ✅ Win probability ML model implementation
- ✅ Comprehensive test suite (87% auth coverage, 70%+ overall target)
- ✅ Performance benchmarks and optimizations
- ✅ PDF report generation capabilities
- ✅ Live broadcasting overlay support
- ✅ Agent-based architecture with clear separation of concerns
Upcoming: v1.1
Planned Features:
- 🔄 Enhanced ML models (player impact analysis, pitch condition predictions)
- 🔄 Real-time data streaming capabilities
- 🔄 Advanced data visualizations and interactive charts
- 🔄 Plugin ecosystem for extensibility
- 🔄 REST API server improvements and optimizations
- 🔄 Expanded test coverage (target: 75%+)
Future Releases
Long-term Goals:
- Multi-sport support (extending beyond cricket)
- Cloud deployment options and scalability
- Mobile SDK for iOS and Android
- Advanced AI-powered analytics and insights
- Enhanced caching strategies
- Support for additional data sources
We welcome community input on the roadmap. Feel free to suggest features or vote on priorities in our GitHub Discussions.
Built with ❤️ for the cricket analytics community
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file midwicket-0.1.0.tar.gz.
File metadata
- Download URL: midwicket-0.1.0.tar.gz
- Upload date:
- Size: 233.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ee7ee026eba017e9eca52400d0bef3ce20fc484782d3b751f5db500b650a1cce
|
|
| MD5 |
131aca0e68fbb0bdf1b138989d94082d
|
|
| BLAKE2b-256 |
983004c975bc0a5e892cba2ac92738dc6f544436f082e72503ca51da2db020a4
|
Provenance
The following attestation bundles were made for midwicket-0.1.0.tar.gz:
Publisher:
publish.yml on CodersAcademy006/Midwicket
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
midwicket-0.1.0.tar.gz -
Subject digest:
ee7ee026eba017e9eca52400d0bef3ce20fc484782d3b751f5db500b650a1cce - Sigstore transparency entry: 1654787245
- Sigstore integration time:
-
Permalink:
CodersAcademy006/Midwicket@f6ba43ae4943584cd772258ac7a9b9e43260f49a -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/CodersAcademy006
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f6ba43ae4943584cd772258ac7a9b9e43260f49a -
Trigger Event:
release
-
Statement type:
File details
Details for the file midwicket-0.1.0-py3-none-any.whl.
File metadata
- Download URL: midwicket-0.1.0-py3-none-any.whl
- Upload date:
- Size: 187.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
05f7388cd3f2e31a32122b1e31d188dff4d4df4823b4e8aa0125d4ba17cc4ed6
|
|
| MD5 |
fe87a452f4ba1f1ef8a3902350c4ff4e
|
|
| BLAKE2b-256 |
e1870ba36f8f7dc406dcf91750c8bc544a29ff22ba9541406457212f7aecc489
|
Provenance
The following attestation bundles were made for midwicket-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on CodersAcademy006/Midwicket
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
midwicket-0.1.0-py3-none-any.whl -
Subject digest:
05f7388cd3f2e31a32122b1e31d188dff4d4df4823b4e8aa0125d4ba17cc4ed6 - Sigstore transparency entry: 1654787486
- Sigstore integration time:
-
Permalink:
CodersAcademy006/Midwicket@f6ba43ae4943584cd772258ac7a9b9e43260f49a -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/CodersAcademy006
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f6ba43ae4943584cd772258ac7a9b9e43260f49a -
Trigger Event:
release
-
Statement type: