Predicts soil moisture from cosmic ray neutron sensor data using a random forest model
SOIL MOISTURE PREDICTION
Description
This script performs soil moisture prediction using a Random Forest model based on soil properties. Additionally, it allows for incorporating soil moisture uncertainty in the input file and performs a probabilistic prediction using a Monte Carlo approach.
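The Monte Carlo idea can be sketched as follows: perturb each measurement by its uncertainty, refit the model, predict, and summarize the spread over all runs. This is a minimal sketch and not the package's implementation. The function names, the Gaussian noise model, and the `constant_model` stand-in (used here instead of the random forest to keep the example free of dependencies) are all assumptions made for illustration.

```python
import random
import statistics


def monte_carlo_predict(fit_predict, features, values, value_std, grid,
                        iterations=10, seed=0):
    """Repeat fit and predict on perturbed measurements (hypothetical helper).

    fit_predict(features, values, grid) must return one prediction per grid node.
    Gaussian noise is an assumption; the package may sample differently.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(iterations):
        # Draw one plausible realization of the measurements.
        sampled = [rng.gauss(v, s) for v, s in zip(values, value_std)]
        runs.append(fit_predict(features, sampled, grid))
    # Transpose so that each entry holds all runs for one grid node.
    per_node = list(zip(*runs))
    mean = [statistics.fmean(node) for node in per_node]
    spread = [statistics.pstdev(node) for node in per_node]
    return mean, spread


def constant_model(features, values, grid):
    """Stand-in model: predicts the mean of the training values everywhere."""
    level = statistics.fmean(values)
    return [level for _ in grid]
```

Any callable with the same signature can take the place of `constant_model`, for example a wrapper that fits and applies a random forest.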
Usage
The package provides a command line interface, smp_cli, to run the prediction.
The CLI tool takes the path to a directory containing a JSON file with the input parameters. Input files must either be placed in the same directory as the parameters file or be given as absolute paths.
Directory structure
This is an example of this directory structure:
soil_moisture_prediction/test_data/
├── crn_soil-moisture.csv
├── parameters.json
├── predictor_1.csv
├── predictor_2.csv
├── predictor_3.csv
└── predictor_4.csv
Input parameters
The parameters.json file in this example directory contains the following content:
$ cat soil_moisture_prediction/test_data/parameters.json
{
    "geometry": [
        632612,
        634112,
        5739607,
        5741107,
        250
    ],
    "projection": "EPSG:25832",
    "predictors": {
        "elevation": {
            "file_path": [
                "/abs/path/to/predictor_1.csv"
            ],
            "unit": "m",
            "std_deviation": true,
            "constant": true,
            "nan_value": ""
        },
        "variable predictor": {
            "file_path": "predictor_2.csv",
            "unit": "u",
            "std_deviation": true,
            "constant": false,
            "nan_value": "0.0"
        },
        "pred_3": {
            "file_path": "predictor_3.csv",
            "unit": "u",
            "std_deviation": true,
            "constant": true,
            "nan_value": ""
        },
        "pred_4": {
            "file_path": "predictor_4.csv",
            "unit": "u",
            "std_deviation": true,
            "constant": true,
            "nan_value": "NaN"
        }
    },
    "soil_moisture_data": "crn_soil-moisture.csv",
    "monte_carlo_soil_moisture": false,
    "monte_carlo_predictors": false,
    "monte_carlo_iterations": 10,
    "predictor_qmc_sampling": false,
    "compute_slope": true,
    "compute_aspect": true,
    "past_prediction_as_feature": true,
    "reset_when_rain_occured": false,
    "average_measurements_over_time": true,
    "allow_nan_in_training": false,
    "what_to_plot": {
        "predictors": true,
        "pred_correlation": true,
        "day_measurements": true,
        "day_predictor_importance": true,
        "day_prediction_map": true,
        "alldays_predictor_importance": true
    },
    "save_results": true
}
Input data
There are two ways to provide predictor data: by giving a file path, or by using one of the predefined predictor keys listed below. If a key is used, its value in the parameters.json file must be null. For each predictor key, the data for the selected geometry is retrieved from an external source.
If a relative file path is given for ["predictors"][pred_key]["file_path"] or ["soil_moisture_data"], the file is assumed to be in the same directory as the parameters.json file.
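The path rule above can be sketched in a few lines. The helper name `resolve_input_path` is hypothetical and not part of the package; it only illustrates how a relative entry would be anchored at the directory of the parameters.json file while an absolute entry is used as given.

```python
from pathlib import Path


def resolve_input_path(parameters_file, file_path):
    """Resolve an input file path the way the text above describes it.

    Hypothetical helper: absolute paths are returned unchanged, relative
    paths are interpreted relative to the directory of parameters_file.
    """
    candidate = Path(file_path)
    if candidate.is_absolute():
        return candidate
    return Path(parameters_file).parent / candidate
```

For example, with a parameters file at /data/run/parameters.json, the entry "predictor_2.csv" would resolve to /data/run/predictor_2.csv.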
The predictor keys are:
- elevation_bkg: Elevation data provided by the Bundesamt für Kartographie und Geodäsie (BKG). The resolution of the data is 200 m x 200 m. The covered area is defined by the bounding box: latitude 47.23766056897108 to 54.88593008642519, longitude 6.083977454450403 to 15.57232578151963.
- bdod_x-ycm: Bulk density of the fine earth fraction. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- cec_x-ycm: Cation exchange capacity of the soil. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- cfvo_x-ycm: Volumetric fraction of coarse fragments (> 2 mm). Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- clay_x-ycm: Proportion of clay particles (< 0.002 mm) in the fine earth fraction. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- nitrogen_x-ycm: Total nitrogen (N). Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- phh2o_x-ycm: Soil pH. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- sand_x-ycm: Proportion of sand particles (> 0.05/0.063 mm) in the fine earth fraction. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- silt_x-ycm: Proportion of silt particles (≥ 0.002 mm and ≤ 0.05/0.063 mm) in the fine earth fraction. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- soc_x-ycm: Soil organic carbon content in the fine earth fraction. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- ocd_x-ycm: Organic carbon density. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.
- ocs_x-ycm: Organic carbon stocks. Soil property data provided by SoilGrids. The data is available for the whole world. The resolution of the data is 250 m x 250 m. Measured at a depth of x-y cm.

Available depth levels for SoilGrids data: 0-5cm, 5-15cm, 15-30cm, 30-60cm, 60-100cm, 100-200cm
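The key naming scheme listed above (property name, underscore, depth level) can be enumerated and checked with a short sketch. The constant and function names are hypothetical and not part of the package, and the sketch assumes that every listed property can be combined with every listed depth level, which the text above does not state explicitly.

```python
# Property prefixes and depth levels copied from the list above.
SOILGRIDS_PROPERTIES = (
    "bdod", "cec", "cfvo", "clay", "nitrogen", "phh2o",
    "sand", "silt", "soc", "ocd", "ocs",
)
SOILGRIDS_DEPTHS = (
    "0-5cm", "5-15cm", "15-30cm", "30-60cm", "60-100cm", "100-200cm",
)


def soilgrids_keys():
    """Return all keys of the form <property>_<depth>, e.g. clay_0-5cm."""
    return [
        f"{prop}_{depth}"
        for prop in SOILGRIDS_PROPERTIES
        for depth in SOILGRIDS_DEPTHS
    ]


def is_known_predictor_key(key):
    """Check a key against elevation_bkg and the generated SoilGrids keys."""
    return key == "elevation_bkg" or key in set(soilgrids_keys())
```

A check like this could run before any external source is queried, so that a misspelled key such as clay_0-10cm is rejected early.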
These are the three possible ways to provide the predictor data: as an absolute file path, as a file path relative to the parameters.json directory, or as a predictor key set to null:
"predictors": {
    "elevation": {
        "file_path": [
            "/abs/path/to/predictor_1.csv"
        ],
        "unit": "m",
        "std_deviation": true,
        "constant": true,
        "nan_value": ""
    },
    ...
}
"predictors": {
    "variable predictor": {
        "file_path": "predictor_2.csv",
        "unit": "u",
        "std_deviation": true,
        "constant": false,
        "nan_value": "0.0"
    },
    ...
}
"predictors": {
    "elevation_bkg": null,
    ...
}
The predictor file looks like this:
$ head -n 5 soil_moisture_prediction/test_data/predictor_data.csv
# { "predictor_name": "elevation", "unit": "m", "std_deviation": true, "constant": true, "nan_value": "", "file_path": null }
632200.0,5741600.0,251.3,5.026
632400.0,5741600.0,241.85,4.837
632600.0,5741600.0,235.02,4.7004
632800.0,5741600.0,229.0,4.58
The predictor file can start with a header line beginning with #. After the #, a JSON object must follow that carries the same information as the predictor's entry in the parameters.json file. This redundant way of giving the parameters allows the file to be read programmatically without a parameters.json file.
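Reading such a file could look like the sketch below. The function name `read_predictor_csv` is hypothetical and not the package's reader. Handling of the configured nan_value is left out, so every field is expected to be numeric, and the meaning of the columns (x, y, value, standard deviation) is an assumption based on the example above.

```python
import csv
import json


def read_predictor_csv(text):
    """Split a predictor file into its optional JSON header and data rows.

    Hypothetical helper: returns (metadata, rows), where metadata is None
    when the file has no header line starting with '#'.
    """
    lines = text.splitlines()
    metadata = None
    if lines and lines[0].lstrip().startswith("#"):
        # Everything after the '#' is parsed as one JSON object.
        metadata = json.loads(lines[0].lstrip()[1:])
        lines = lines[1:]
    rows = [[float(value) for value in row] for row in csv.reader(lines) if row]
    return metadata, rows
```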
The soil moisture data looks like this:
$ head -n 5 soil_moisture_prediction/test_data/soil_moisture_data.csv
EPSG_UTM_x,EPSG_UTM_y,Day,soil_moisture,err_low,err_high
633742.2079,5741065.818,20220327,0.26870625,-0.0264875,0.0298375
633694.9659,5741026.54,20220327,0.27261,-0.02075,0.022775
633652.0085,5740981.625,20220327,0.27655625,-0.0171125,0.018425
633613.7622,5740928.489,20220327,0.280341071,-0.01545,0.0165375
The soil moisture data can have a header with the column names.
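A reader that tolerates the optional header line could be sketched as follows. The function name and the dictionary keys are hypothetical. The column order is taken from the example above, and the day is kept as a string because the text does not specify how it is parsed.

```python
import csv


def read_soil_moisture(lines):
    """Parse soil moisture rows, skipping an optional header line.

    Hypothetical helper: 'lines' is any iterable of CSV lines, for example
    an open file. A first row whose coordinates are not numeric is treated
    as the header; a later non-numeric row raises ValueError.
    """
    records = []
    for index, row in enumerate(csv.reader(lines)):
        if not row:
            continue
        try:
            x, y = float(row[0]), float(row[1])
        except ValueError:
            if index == 0:
                continue  # header line with the column names
            raise
        records.append({
            "x": x,
            "y": y,
            "day": row[2],
            "soil_moisture": float(row[3]),
            "err_low": float(row[4]),
            "err_high": float(row[5]),
        })
    return records
```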
Pydantic model
This is a description of the input parameters model:
geometry: A list of five numbers representing the bounding box: [xmin, xmax, ymin, ymax, resolution].
projection: The projection of the bounding box e.g. EPSG:25832
soil_moisture_data: The path to the soil moisture data.
predictors: A dictionary of predictors. Either provide one of the predefined predictors (e.g. 'corine') with None or provide a predictor information model.
monte_carlo_soil_moisture: Whether to use a Monte Carlo Simulation to predict uncertainty for soil moisture.
monte_carlo_predictors: Whether to use a Monte Carlo Simulation to predict uncertainty for the predictors.
monte_carlo_iterations: Number of iterations for the Monte Carlo Simulation.
allow_nan_in_training: Whether to allow NaN values in the training data.
predictor_qmc_sampling: Whether to use Quasi-Monte Carlo sampling for the predictors.
reset_when_rain_occured: Whether to reset the model when rain occurred.
compute_slope: Whether to compute the slope from elevation and use as predictor.
compute_aspect: Whether to compute the aspect from elevation and use as predictor.
past_prediction_as_feature: Whether to use the past prediction as a feature.
average_measurements_over_time: Whether to average the measurements over time.
what_to_plot: List of which plotting functions should be used.
save_results: Dump random forest model as numpy arrays.
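To illustrate the geometry parameter, the sketch below turns [xmin, xmax, ymin, ymax, resolution] into the coordinates of the grid nodes. The function name `grid_axes` is hypothetical, and the sketch assumes that the first node sits at xmin/ymin and that the upper bounds are included when they fall on a node. The package may align its grid differently.

```python
def grid_axes(geometry):
    """Return the x and y node coordinates for a geometry list.

    Hypothetical helper; node alignment and inclusive upper bounds are
    assumptions, not documented behavior of the package.
    """
    xmin, xmax, ymin, ymax, resolution = geometry
    n_x = int((xmax - xmin) // resolution) + 1
    n_y = int((ymax - ymin) // resolution) + 1
    xs = [xmin + i * resolution for i in range(n_x)]
    ys = [ymin + j * resolution for j in range(n_y)]
    return xs, ys
```

Under these assumptions, the example geometry from the parameters.json file above spans 1500 m in each direction at a resolution of 250 m, which gives 7 nodes per axis.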
Algorithm
The algorithm trains a random forest regressor (RandomForestRegressor from scikit-learn) on the soil moisture data and the predictor values at the measurement locations. The trained model is then applied to the whole densely gridded area. The output is a numpy array with the soil moisture values at each grid node.
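The two steps described above, training at the measurement locations and predicting on the grid, can be sketched with scikit-learn directly. The data here is synthetic and the variable names, grid size, and hyperparameters are assumptions chosen for illustration; the package's own settings are not shown in this description.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: two predictor values at 200 measurement locations.
train_features = rng.uniform(0.0, 1.0, size=(200, 2))
# Soil moisture depends mostly on the first predictor, plus a little noise.
train_moisture = (
    0.2 + 0.1 * train_features[:, 0] + rng.normal(0.0, 0.005, size=200)
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train_features, train_moisture)

# Predictor values at every node of a 7 x 7 grid, one row per node.
grid_features = rng.uniform(0.0, 1.0, size=(49, 2))
prediction = model.predict(grid_features).reshape(7, 7)

# Normalized importances, the quantity behind the importance plots.
importances = model.feature_importances_
```

Because each tree predicts an average of training values, the gridded prediction cannot leave the range of the measured soil moisture.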
Visualization
In addition to the resulting array(s) (prediction only, or prediction and coefficient of dispersion), the program can plot some results.
predictors: plot all the predictors as color maps after re-gridding them to the project grid.
pred_correlation: compute the correlation between each pair of predictors and display it as a heatmap. The color intensity indicates the strength and direction of the correlation, ranging from -1 (strong negative correlation) to 1 (strong positive correlation). This can help to remove redundant predictors that are highly correlated with each other.
day_measurements: plot the soil moisture measurements as a scatter plot on an x-y map for each day. The measurements are colored according to their soil moisture values. If Monte Carlo simulations are enabled, error bands representing the standard deviations are overlaid on the scatter plot.
day_predictor_importance: plot histogram of the normalized predictor importances from the random forest model for each day.
If Monte Carlo simulations are enabled, the plot shows the 5th, 50th (median), and 95th quantiles of the importance values.
day_prediction_map: plot the map of the densely modelled soil moisture over the project area. If uncertainties are provided, the coefficient of dispersion map is also plotted.
alldays_predictor_importance: if several days are provided, the predictor importance is computed for each day and a curve of the predictor importance over the days is plotted for each predictor. The x-axis represents the days, and the y-axis represents the importance values.
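The numbers behind the pred_correlation heatmap can be sketched without any plotting. The sketch assumes the Pearson correlation coefficient, which the description above does not name explicitly. The function names and the threshold of 0.9 for flagging redundant predictors are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(predictors):
    """Nested dict with the correlation of every predictor pair."""
    names = list(predictors)
    return {
        row: {col: pearson(predictors[row], predictors[col]) for col in names}
        for row in names
    }


def redundant_pairs(matrix, threshold=0.9):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]
```

A pair returned by `redundant_pairs` is a candidate for dropping one of its two predictors before training.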