End-to-end ML pipeline for vehicle CO2 emissions prediction
CO₂ Emissions Prediction from Vehicle Features
Author: Shashvat Jain
Affiliation: Integrated M.Tech. in Mathematics & Computing, IIT Dhanbad
GitHub: https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features
co2_emissions_ml
End-to-end Python package for analyzing and predicting on-road vehicle CO₂ emissions (g/km) via machine learning.
Features
- Preprocessing & Feature Engineering: scaling, one-hot encoding, target transformation
- Baseline Models: linear, polynomial, ridge/lasso, random forest, XGBoost, LightGBM, CatBoost
- Stacked Ensemble: LightGBM + XGBoost + CatBoost → MLP meta-learner → Ridge residual correction
- Bayesian Hyperparameter Tuning: Optuna pruners, early stopping
- Diagnostics & Explainability: parity plots, residual analysis, learning curves, permutation importance, SHAP
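The preprocessing steps listed above (scaling, one-hot encoding, target transformation) can be sketched with scikit-learn as follows. This is a minimal illustration and not the package's actual code: the column names, the toy data, the log transform of the target, and the Ridge regressor are all assumptions chosen to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, invented for this sketch.
NUMERIC = ["engine_size", "cylinders"]
CATEGORICAL = ["fuel_type"]


def build_model():
    """Scale numeric columns, one-hot encode categoricals, log-transform the target."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # Ignore categories unseen during fitting instead of raising an error.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    regressor = Pipeline([("preprocess", preprocess), ("ridge", Ridge(alpha=1.0))])
    # Fit on log1p(y); predictions are mapped back to g/km with expm1.
    return TransformedTargetRegressor(
        regressor=regressor, func=np.log1p, inverse_func=np.expm1
    )


# Toy rows, invented for illustration only (not from the project dataset).
toy = pd.DataFrame({
    "engine_size": [1.6, 2.0, 3.5, 5.0, 2.4, 1.4],
    "cylinders": [4, 4, 6, 8, 4, 4],
    "fuel_type": ["X", "X", "Z", "Z", "D", "X"],
    "co2": [150.0, 180.0, 250.0, 330.0, 200.0, 140.0],
})
features = toy[NUMERIC + CATEGORICAL]
model = build_model().fit(features, toy["co2"])
preds = model.predict(features)
```

Wrapping the preprocessing and the regressor in one object means the same transformations are applied at fit time and at prediction time, which avoids leaking test-set statistics into the scaler.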
Key result:
Test set: R² = 0.9830, MAE ≈ 3.08 g/km, RMSE ≈ 8.64 g/km
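For readers unfamiliar with the three reported metrics, the sketch below computes them from their standard definitions using only the Python standard library. The function name and the toy values are illustrative assumptions; the package's own evaluation code may differ.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    # ss_tot is zero for a constant target, in which case R^2 is undefined.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Toy values in g/km, invented for illustration (not project results).
metrics = regression_metrics([200.0, 250.0, 300.0], [210.0, 240.0, 300.0])
```

RMSE is never smaller than MAE, and a large gap between the two usually indicates that a small number of predictions carry large errors.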
📦 Repository Structure
.
├── README.md
├── LICENSE
├── CITATION.cff
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DATA_DICTIONARY.md
├── .gitignore
├── environment.yml
├── requirements.txt
├── setup.py
├── Dockerfile
│
├── data/
│ └── New Dataset.csv
│
├── notebooks/
│ └── co2-emissions-predict.ipynb
│
├── src/
│ ├── models
│ └── co2_emissions_ml
│     ├── __init__.py
│     ├── preprocessing.py
│     ├── models.py
│     ├── evaluation.py
│     └── pipeline.py
│
├── tests/
│ └── test_pipeline.py
│
├── scripts/
│ └── train_and_save.py
│
├── Figures/
│ ├── parity_plot.png
│ ├── residual_hist.png
│ ├── qq_plot.png
│ ├── residuals_vs_pred.png
│ ├── mae_decile.png
│ ├── learning_curve.png
│ ├── perm_importance.png
│ ├── shap_summary.png
│ ├── shap_dependence.png
│ └── pipeline_diagram.png
│
├── Slides/
│ └── End Evaluation.pdf
│
└── Reports/
  ├── Split Report
  └── Final Report with plag report.pdf
⚙️ Installation
# From PyPI
pip install co2_emissions_ml
# Or install latest from GitHub
pip install git+https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features.git
Quickstart
- Predict via CLI
run_co2 \
--data path/to/your_new_data.csv \
--model path/to/pretrained_bundle.pkl \
--output path/to/predictions.csv
- --data (required): input CSV with vehicle features
- --model (optional): path to serialized bundle.pkl (default: models/bundle.pkl)
- --output (optional): CSV path for predictions
- --target (optional): dependent variable name in input CSV
- Programmatic API
import pandas as pd
import joblib
from co2_emissions_ml.models import predict_bundle
# Load pre-trained bundle
bundle = joblib.load("models/bundle.pkl")
# Prepare new data
df_new = pd.read_csv("your_new_data.csv")
X_new = df_new.copy()
# Predict
df_new["predicted_CO2"] = predict_bundle(bundle, X_new)
df_new.to_csv("predictions.csv", index=False)
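The CLI flags documented in this Quickstart could be declared with `argparse` roughly as shown below. This is a hypothetical sketch, not the package's real entry point: the `build_parser` helper, the help strings, and the `None` defaults for `--output` and `--target` are assumptions. Only the flag names and the `models/bundle.pkl` default come from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented run_co2 flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="run_co2",
        description="Predict vehicle CO2 emissions (g/km) from a CSV of features.",
    )
    parser.add_argument("--data", required=True,
                        help="input CSV with vehicle features")
    parser.add_argument("--model", default="models/bundle.pkl",
                        help="path to the serialized bundle")
    # Assumed default: no output file unless one is requested.
    parser.add_argument("--output", default=None,
                        help="CSV path for predictions")
    # Assumed default: no target column present in the input CSV.
    parser.add_argument("--target", default=None,
                        help="dependent variable name in input CSV")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["--data", "new_cars.csv", "--output", "preds.csv"])
```

Passing an explicit list to `parse_args` keeps the parser easy to unit test, since no process-level `sys.argv` manipulation is needed.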
🚀 Usage of GitHub Repository
- Prepare data: Place New Dataset.csv under data/.
- Run notebook: Open and execute notebooks/co2_emissions_predict.ipynb to reproduce EDA, model training, and evaluation.
- Diagnostics & plots: Generated in Figures/:
- Parity plot
- Residual histogram & Q-Q plot
- Learning curve
- Permutation & SHAP importance charts
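Of the diagnostics listed above, permutation importance is the simplest to illustrate from scratch: shuffle one feature column at a time and measure how much the error grows. The NumPy sketch below is an assumption-laden illustration (the function name, the MAE scoring choice, and the synthetic data are all invented here) and is not the notebook's implementation.

```python
import numpy as np


def permutation_importance_mae(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MAE when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(np.abs(y - predict(X)))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean(np.abs(y - predict(X_perm))) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic data: the target depends only on the first of two features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0]
importances = permutation_importance_mae(lambda A: 3.0 * A[:, 0], X, y)
```

In this toy setup the second feature is never used by the model, so shuffling it leaves the error unchanged and its importance is zero, while the first feature receives a positive score.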
Note: The notebook co2_emissions_predict.ipynb contains the complete code for the thesis, whereas the src folder contains only the code for the new pipeline presented in this research.
📄 License
This project is licensed under the MIT License. See LICENSE for details.