A simple package for applying Machine Learning to CMAQ and associated obs.
Universal Kriging for CMAQ
author: Barron H. Henderson
original date: 2020-02-01
last updated: 2020-02-05
contributors: <your name here>
Quick Start
Open Working.ipynb for a working example. Review config.json to understand the options that are used.
Status
Under active development. Currently works with ozone and PM for a single test day. Only ready for developers to test and help develop.
Prerequisites
- Python >= 3.6
- numpy >= 2
- pykrige >= 1.5.1
- Optional:
  - sklearn >= 0.24
Overview
Apply Universal Kriging as implemented in pykrige to CMAQ fields. This is an example of regression kriging, where the mean is first removed using another model. The simplest example is a linear model, but cmaqkrig supports multilinear regression and Random Forest as well. The options here will expand over time.
- yhat = m * CMAQ + b
  - CMAQ: CMAQ concentrations in ppb
  - m, b: parameters fit by scipy.stats.linregress where y is AQS 1st maximum 8-hour average ozone
  - yhat: estimate based on CMAQ
  - e: e = obs - yhat; bias that is assumed to have spatial correlation
- UK_ERROR = Krig(e)
- UK_TOTAL = yhat + UK_ERROR
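The three steps above can be sketched with NumPy alone. This is a minimal illustration, not the package's code: the monitor data are synthetic, `np.polyfit` stands in for scipy.stats.linregress, and `interp_residual` uses inverse-distance weighting as a plainly labeled stand-in for Krig(e) so the example runs without pykrige. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monitors: coordinates, CMAQ sampled at each monitor, observations.
n = 30
x_obs = rng.uniform(0.0, 100.0, n)
y_obs = rng.uniform(0.0, 100.0, n)
cmaq_obs = rng.uniform(30.0, 70.0, n)
obs = 0.8 * cmaq_obs + 5.0 + rng.normal(0.0, 2.0, n)

# Step 1: yhat = m * CMAQ + b (least-squares slope and intercept).
m, b = np.polyfit(cmaq_obs, obs, 1)

# Step 2: e = obs - yhat, the residual assumed to be spatially correlated.
e = obs - (m * cmaq_obs + b)


def interp_residual(xq, yq, power=2.0):
    """Stand-in for Krig(e): inverse-distance weighting, NOT kriging."""
    dist = np.hypot(xq[:, None] - x_obs[None, :], yq[:, None] - y_obs[None, :])
    weights = np.maximum(dist, 1e-12) ** -power
    return (weights * e).sum(axis=1) / weights.sum(axis=1)


def total_estimate(xq, yq, cmaq_q):
    """Step 3: UK_TOTAL = yhat + UK_ERROR at the query points."""
    return m * cmaq_q + b + interp_residual(xq, yq)


# At the monitors the interpolated residual equals e, so the total matches obs.
total_at_obs = total_estimate(x_obs, y_obs, cmaq_obs)
```

In a real application the query points would be CMAQ grid-cell centers, and the stand-in interpolator would be replaced by a kriging call.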
Blending
In addition, cmaqkrig provides a mechanism that allows the mean estimation model and the UniversalKriging system to be optimized in subdomains, and then reconstructs a complete surface by blending. Subdomains currently support splitting on latit... In addition to spatial subdomains, I use an urban/rural division as well.
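To make the idea of blending concrete, here is a small sketch of one possible scheme: a linear taper between two subdomain surfaces across a band around a split latitude. The README does not describe the package's blending weights, so the taper, the `split_lat` and `half_width` parameters, and the function names are all assumptions for illustration.

```python
import numpy as np


def blend_weight(lat, split_lat, half_width):
    """Weight of the northern surface: 0 south of the band, 1 north of it."""
    return np.clip((lat - (split_lat - half_width)) / (2.0 * half_width), 0.0, 1.0)


def blend(north, south, lat, split_lat=40.0, half_width=2.0):
    """Linearly blend two subdomain estimates across the transition band."""
    w = blend_weight(lat, split_lat, half_width)
    return w * north + (1.0 - w) * south


# Hypothetical estimates from a northern and a southern fit at five latitudes.
lat = np.array([30.0, 38.0, 40.0, 42.0, 50.0])
north = np.full(lat.shape, 60.0)
south = np.full(lat.shape, 40.0)
blended = blend(north, south, lat)
```

Outside the band each point takes its own subdomain's value, and inside the band the two estimates are averaged with weights that vary smoothly, which avoids a visible seam at the split.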
Mean Estimation Models
Before kriging the residual, the package fits a mean model to the observations using linear regression, multiple linear regression, or Random Forest. In upcoming versions, we are likely to support extended Voronoi neighbor averaging and custom models using the scipy.optimize framework. The model for the mean is selected in the config.json file via the "model" key under "regression_options".
- scipy_linregress: provides access to scipy.stats.linregress for univariate linear regression.
- sklearn_LinearRegression: provides uni- or multi-variate regression via sklearn.linear_model.LinearRegression.
- sklearn_RandomForestRegressor: provides ensemble Random Forest modeling via sklearn.ensemble.RandomForestRegressor.
- cmaqml_evna: provides an enhanced Voronoi Neighbor Averaging scheme. This has been custom built and may need to be made more efficient. At this point, the Voronoi neighbors are calculated for each point independently. Another approach would be to calculate one set of Voronoi diagrams and then find points within the single set of polygons.
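As a rough illustration of how the model choice might be read from the configuration, the sketch below parses a minimal JSON fragment and validates the "model" value against the four names listed above. Only the "regression_options" and "model" keys come from this README; the nesting shown, the `select_model` helper, and the absence of other keys are assumptions.

```python
import json

# Model names taken from the list above.
KNOWN_MODELS = (
    "scipy_linregress",
    "sklearn_LinearRegression",
    "sklearn_RandomForestRegressor",
    "cmaqml_evna",
)


def select_model(config):
    """Return the configured mean-model name, rejecting unknown values."""
    name = config["regression_options"]["model"]
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown model {name!r}; expected one of {KNOWN_MODELS}")
    return name


# Hypothetical minimal config; a real config.json contains more options.
config = json.loads('{"regression_options": {"model": "sklearn_RandomForestRegressor"}}')
model_name = select_model(config)
```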
Any sklearn model can be added. The challenge is finding the right way to export the model as a text representation for metadata. To add a new model from sklearn, follow the templates sklearn_LinearRegression and sklearn_RandomForestRegressor in scripts/models.py.
Please submit any additions back to the project.
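The text-representation problem can be illustrated for the linear case. The sketch below formats a fitted linear model as a one-line equation suitable for a metadata attribute. It reads `coef_` and `intercept_`, which are the attribute names sklearn's LinearRegression exposes after fitting, but it uses a simple stand-in object so the example runs without sklearn. The `describe_linear` helper and the feature names are hypothetical and are not taken from the project's templates.

```python
from types import SimpleNamespace


def describe_linear(model, feature_names):
    """Render a fitted linear model as a text equation for metadata."""
    terms = [f"{c:.4g} * {name}" for c, name in zip(model.coef_, feature_names)]
    return "yhat = " + " + ".join(terms) + f" + {model.intercept_:.4g}"


# Stand-in for a fitted estimator; a real one would come from model.fit(X, y).
fitted = SimpleNamespace(coef_=[0.8, -0.002], intercept_=5.0)
text = describe_linear(fitted, ["CMAQ", "HT"])
```

Tree-based models such as Random Forest have no compact closed form, so their text representation would more likely record hyperparameters and feature importances than an equation.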
Annotated Directory Structure
.
|-- README.md
|-- config.json
|   # Fitting parameters and spatial domain splitting parameters
|-- src/
|   `-- cmaqml
|       |   # CMAQ Machine Learning framework
|       |-- models
|       |   # Module of known Machine Learning models
|       |   # Currently includes regression, Random Forest, eVNA and others
|       `-- obs
|           # Module of known observation readers. Currently only AQS
|-- scripts
|   |-- validate_figs.py
|   |   # Create validation figures including statistics from a single
|   |   # withholding
|   |-- validate_stats.py
|   |   # Create validation statistics from multiple withholdings
|   |-- make_maps.py
|   |   # Script for visualization
|   `-- fitting.py
|       # not complete. Ideally, optimize UK settings for application to domains
`-- examples/
    |-- Working.ipynb
    |   # Working example
    |-- Blend.ipynb
    |   # An example where multiple subset grids are run
    |   # and then blended.
    |-- input/
    |   |-- daily_44201_20160715.zip
    |   |   # subset of AQS; right now not part of repository for testing
    |   |-- daily_88101_20160115.zip
    |   |   # subset of AQS; right now not part of repository for testing
    |   |-- dailyavg.LST.Y_24.2016fh.v531.108US2.01.nc
    |   |   # A single day of PM25_FRM post-processed output from CMAQ
    |   |-- O3_8HRMAX.LST.Y_24.2016fh.v531.108US2.5-9.nc
    |   |   # A single day of O3_8HRMAX post-processed output from CMAQ
    |   |-- gpw_v4_une_atotpopbt_densy_108US2.IOAPI.nc
    |   |   # An IOAPI-like file with population density derived from the SEDAC
    |   |   # Gridded Population World v4
    |   |-- GRIDCRO2D.108US2.35L.160101.nc
    |   |   # A single day file with terrain height
    |   |-- GRIDDESC
    |   |   # An IOAPI text file defining common grids
    |   `-- make_test.py
    |       # subset of CMAQ. right now not part of repository
    `-- output
        |-- UK.<YYYYMMDD>.<querykey>.nc
        |   # outputs from cmaq_uk.py
        |   # template where
        |   # * YYYYMMDD is the date
        |   # * querykey in: (EN|ES|WN|WS|ALL)_(URB|RUR|BOTH)
        `-- UK.<YYYYMMDD>.FUSED.<querykey>.nc
            # outputs from blend.py
            # where querykey in ALL_URB, ALL_RUR, or ...
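The output naming template in the tree above can be expressed as a small helper. This is a sketch under assumptions: the `output_name` function and its `fused` flag are hypothetical, and the same querykey pattern, (EN|ES|WN|WS|ALL)_(URB|RUR|BOTH), is applied to both file types even though the tree only documents it for the non-fused outputs.

```python
import re
from datetime import date

# Pattern from the directory listing: region code, underscore, urban/rural code.
QUERYKEY = re.compile(r"(EN|ES|WN|WS|ALL)_(URB|RUR|BOTH)")


def output_name(day, querykey, fused=False):
    """Build UK.<YYYYMMDD>.<querykey>.nc, or the FUSED variant when fused=True."""
    if QUERYKEY.fullmatch(querykey) is None:
        raise ValueError(f"querykey {querykey!r} does not match the template")
    middle = "FUSED." if fused else ""
    return f"UK.{day:%Y%m%d}.{middle}{querykey}.nc"


subdomain_file = output_name(date(2016, 7, 15), "EN_URB")
blended_file = output_name(date(2016, 7, 15), "ALL_RUR", fused=True)
```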