EIR Estimation using Machine learning INTerventions - Python port

estiMINT (Python)

Python port of the estiMINT R package for EIR (Entomological Inoculation Rate) estimation using machine learning.

Installation

pip install -e .

Or install dependencies directly:

pip install -r requirements.txt

File Mapping (R → Python)

| R File | Python File | Description |
|---|---|---|
| estiMINT-package.R | __init__.py | Package initialization and exports |
| globals.R | globals.py | Global variables and constants |
| utils.R | utils.py | Utility functions (metrics, QMAP, etc.) |
| data_processing.R | data_processing.py | Data loading and preprocessing |
| models.R | models.py | XGBoost model training |
| train.R | train.py | Main training pipeline with K-fold CV |
| plotting.R | plotting.py | Visualization functions |
| storage.R | storage.py | Model persistence and loading |
| run.R | run.py | Model inference |

API Reference

Training

from estimint import train_xgb_model

model = train_xgb_model(
    in_parquet="data/input.parquet",
    out_dir="output/",
    thr_lo=0.02,           # Lower prevalence threshold
    thr_hi=0.95,           # Upper prevalence threshold
    k_strata=16,           # K-means strata for EIR
    K=10,                  # CV folds
    seed=42,
    save_pkl=True,
    save_plots=True,
    save_artifacts=True
)
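
The k_strata and K parameters suggest that cross-validation folds are balanced across EIR strata. The sketch below illustrates that general idea using only the standard library; it is not the package's implementation. The function name assign_stratified_folds is hypothetical, and the strata labels are toy values rather than K-means clusters of EIR.

```python
import random
from collections import Counter, defaultdict


def assign_stratified_folds(strata, n_folds=10, seed=42):
    """Give every row a fold index so each stratum is spread evenly over the folds."""
    rng = random.Random(seed)

    # Group row positions by their stratum label.
    rows_by_stratum = defaultdict(list)
    for row, label in enumerate(strata):
        rows_by_stratum[label].append(row)

    # Shuffle within each stratum, then deal rows to folds round-robin.
    folds = [0] * len(strata)
    for label in sorted(rows_by_stratum):
        rows = rows_by_stratum[label]
        rng.shuffle(rows)
        for position, row in enumerate(rows):
            folds[row] = position % n_folds
    return folds


# Toy data: 4 strata with 20 rows each (real strata would come from clustering EIR).
strata = [i % 4 for i in range(80)]
folds = assign_stratified_folds(strata, n_folds=5, seed=42)

# Count rows per (stratum, fold) cell to check the balance.
per_cell = Counter(zip(strata, folds))
print(sorted(per_cell.items())[:5])
```

Because each stratum is dealt out round-robin, every fold sees a similar mix of low and high EIR values, which keeps per-fold metrics comparable.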

Inference

from estimint import load_xgb_model, run_xgb_model
import pandas as pd

# Load model
model = load_xgb_model("output/models/estiMINT_model.pkl")

# Prepare input data
new_data = pd.DataFrame({
    "dn0_use": [0.5],
    "Q0": [0.3],
    "phi_bednets": [0.6],
    "seasonal": [1],
    "itn_use": [0.7],
    "irs_use": [0.2],
    "prev_y9": [0.15]  # or "prevalence"
})

# Run prediction
eir_predictions = run_xgb_model(new_data, model)
print(f"Predicted EIR: {eir_predictions[0]:.2f}")

Using Global Model

from estimint import load_xgb_model, run_xgb_model, set_global_model

# Set global model once
model = load_xgb_model("output/models/estiMINT_model.pkl")
set_global_model(model)

# Run predictions without passing the model explicitly
# (new_data is the DataFrame prepared in the Inference example)
predictions = run_xgb_model(new_data)  # Falls back to the global model
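
For readers unfamiliar with the pattern, here is a minimal sketch of how a module-level default model with an explicit override can work. It assumes nothing about estiMINT's internals: the names set_default_model, get_default_model, and run_with_default are hypothetical, and a plain function stands in for a trained model.

```python
_default_model = None  # module-level slot, playing the role of an R .GlobalEnv entry


def set_default_model(model):
    """Register the model that later calls fall back to."""
    global _default_model
    _default_model = model


def get_default_model():
    """Return the registered model, failing loudly if none was set."""
    if _default_model is None:
        raise RuntimeError("No default model set; call set_default_model() first.")
    return _default_model


def run_with_default(data, model=None):
    """Use the explicit model when given, otherwise the registered default."""
    chosen = model if model is not None else get_default_model()
    return chosen(data)


def doubling_model(rows):
    """Stand-in for a trained model: doubles every input value."""
    return [2 * value for value in rows]


set_default_model(doubling_model)
print(run_with_default([1, 2, 3]))
```

An explicit model argument always wins over the default, so code that needs several models side by side can still pass them directly.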

Utility Functions

from estimint import (
    r2, rmse, mse, mae, median_ae, mae_rel, rmsle, smape,
    fit_qmap_w, predict_qmap_w, scale_pos
)

# Calculate metrics
y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 2.2, 2.9, 4.1, 4.8]

print(f"R²: {r2(y_true, y_pred):.4f}")
print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"MAE: {mae(y_true, y_pred):.4f}")

# Quantile mapping calibration
cal = fit_qmap_w(y_pred, y_true)
y_calibrated = predict_qmap_w(y_pred, cal)
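
To make the numbers above easier to interpret, the following sketch computes R², RMSE, and MAE from their textbook definitions, then applies a simple unweighted empirical quantile mapping. These are illustrations only: the function names are hypothetical, and the package's own fit_qmap_w / predict_qmap_w may differ (for example, by using weights).

```python
import math
from bisect import bisect_right


def rmse_value(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae_value(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_value(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_quantile_map(pred, obs):
    """Pair sorted predictions with sorted observations (equal-length samples)."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("pred and obs must have the same length (at least 2)")
    return {"x": sorted(pred), "y": sorted(obs)}


def apply_quantile_map(values, cal):
    """Map values through the fitted pairs by linear interpolation, clamping at the ends."""
    xs, ys = cal["x"], cal["y"]
    out = []
    for v in values:
        if v <= xs[0]:
            out.append(ys[0])
        elif v >= xs[-1]:
            out.append(ys[-1])
        else:
            j = bisect_right(xs, v)  # xs[j - 1] <= v < xs[j]
            t = (v - xs[j - 1]) / (xs[j] - xs[j - 1])
            out.append(ys[j - 1] + t * (ys[j] - ys[j - 1]))
    return out


y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 2.2, 2.9, 4.1, 4.8]
print(r2_value(y_true, y_pred), rmse_value(y_true, y_pred), mae_value(y_true, y_pred))

cal = fit_quantile_map(y_pred, y_true)
print(apply_quantile_map(y_pred, cal))
```

Quantile mapping corrects systematic distortion in the distribution of predictions; it does not change their rank order.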

Data Processing

import numpy as np
from estimint import load_and_filter, make_value_weights, strata_and_split

# Load and filter parquet data
result = load_and_filter("data.parquet", thr_lo=0.02, thr_hi=0.95)
df = result["DT"]
df_excluded = result["DT_excluded"]

# Create inverse-frequency weights
weights = make_value_weights(df["eir"].values, digits=3)

# Stratified split
df["eir_log10"] = np.log10(df["eir"])
df = strata_and_split(df, k_strata=16, seed=42)
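
As a rough illustration of inverse-frequency weighting, the sketch below rounds each value to a fixed number of digits, counts how often each rounded value occurs, and weights every row by one over that count. This is an assumption about the general technique, not a description of make_value_weights itself. The function name inverse_frequency_weights is hypothetical, and rescaling the weights to a mean of 1 is a choice made for this example.

```python
from collections import Counter


def inverse_frequency_weights(values, digits=3):
    """Weight each value by 1 / (number of values sharing its rounded key)."""
    keys = [round(v, digits) for v in values]
    counts = Counter(keys)
    raw = [1.0 / counts[key] for key in keys]

    # Rescale so the weights average to 1 (an assumption for this sketch).
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Three rows share the value 0.1, one row has the rare value 5.0.
eir_values = [0.1, 0.1, 0.1, 5.0]
weights = inverse_frequency_weights(eir_values, digits=3)
print(weights)
```

Rare target values receive larger weights, so a model trained with them is less dominated by the most common EIR levels.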

Key Differences from R Version

  1. File format: Models saved as .pkl (pickle) instead of .rds
  2. Data handling: Uses pandas instead of data.table
  3. Plotting: Uses matplotlib instead of ggplot2
  4. Global model: Use set_global_model() / get_global_model() instead of .GlobalEnv
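
Item 1 above refers to Python's pickle format. The round trip below is a minimal sketch using only the standard library, with a plain dictionary standing in for a trained model; the helper names save_model and load_model and the file name demo_model.pkl are hypothetical and are not the package's storage API.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(obj, path):
    """Serialize obj to path, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(obj, handle)


def load_model(path):
    """Deserialize an object previously written by save_model."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# A dictionary stands in for the trained model object.
stand_in_model = {"features": ["itn_use", "irs_use"], "n_trees": 100}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "demo_model.pkl"
    save_model(stand_in_model, model_path)
    restored = load_model(model_path)

print(restored == stand_in_model)
```

Unpickling can execute arbitrary code, so .pkl files should only be loaded from sources you trust.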

Dependencies

  • numpy >= 1.20.0
  • pandas >= 1.3.0
  • duckdb >= 0.8.0
  • xgboost >= 1.6.0
  • scikit-learn >= 1.0.0
  • matplotlib >= 3.4.0
  • requests >= 2.28.0 (optional, for model download)
  • appdirs >= 1.4.0 (optional, for cache directory)
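
Since requests and appdirs are listed as optional, code that uses them typically needs a fallback when they are missing. The sketch below shows one common way to do that for the cache directory. The function name cache_dir and the ~/.cache fallback path are assumptions for illustration, not the package's actual behavior.

```python
import importlib.util
from pathlib import Path


def cache_dir(app_name="estimint"):
    """Return a per-user cache directory, using appdirs only when it is installed."""
    if importlib.util.find_spec("appdirs") is not None:
        import appdirs  # optional dependency, imported lazily

        return Path(appdirs.user_cache_dir(app_name))
    # Fallback location when appdirs is unavailable (assumed for this sketch).
    return Path.home() / ".cache" / app_name


print(cache_dir())
```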

License

MIT License

Download files

Source Distribution

estimint-1.0.0.tar.gz (4.6 MB)

Uploaded Source

Built Distribution

estimint-1.0.0-py3-none-any.whl (4.7 MB)

Uploaded Python 3

File details

Details for the file estimint-1.0.0.tar.gz.

File metadata

  • Download URL: estimint-1.0.0.tar.gz
  • Upload date:
  • Size: 4.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for estimint-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 437bdd3337a93f78c8e0e766e0ce13726b7c8c04c109d8eadc20853b23587023 |
| MD5 | 1f5ca2c071b4b6ec6cfb40d8dc1f8533 |
| BLAKE2b-256 | b2de5ed87c31bb61aa5e1551c078874879a86101a336feef51c757da632a038e |
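
A downloaded file can be checked against a published digest with the standard library. The sketch below hashes a file in chunks and compares the result with an expected hex string; the helper names are hypothetical, and a temporary file stands in for the real download.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA256 digest listed above for estimint-1.0.0.tar.gz.
EXPECTED_SDIST_SHA256 = "437bdd3337a93f78c8e0e766e0ce13726b7c8c04c109d8eadc20853b23587023"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, expected_hex):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file standing in for a real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    demo_file.write_bytes(b"estiMINT demo payload")
    demo_digest = sha256_of_file(demo_file)
    demo_ok = file_matches(demo_file, demo_digest)
    demo_is_sdist = file_matches(demo_file, EXPECTED_SDIST_SHA256)

print(demo_ok, demo_is_sdist)
```

For the real archive, the call would be file_matches("estimint-1.0.0.tar.gz", EXPECTED_SDIST_SHA256), assuming the file sits in the current directory.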

File details

Details for the file estimint-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: estimint-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 4.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for estimint-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1df3bf63bbf4d4631f2b3b09c0284b71484ed9028fba01d65c3399e173a62ab1 |
| MD5 | 4ef0a67b31cd258c0af185dc52980276 |
| BLAKE2b-256 | c383f0522dfe0414e3ae01f5ae45b6836219c6a87412d3edbdb727dc9eb4f82c |
