Optimize adaptive sampling
Trufl
Trufl was initiated in the context of the IAEA (International Atomic Energy Agency) Coordinated Research Project titled “Monitoring and Predicting Radionuclide Uptake and Dynamics for Optimizing Remediation of Radioactive Contamination in Agriculture”.
While Trufl was originally developed to address the remediation of farmland affected by nuclear accidents, its approach and algorithms are applicable to a wide range of application domains. This includes managing legacy contaminants or monitoring any phenomenon that requires consideration of multiple decision criteria, potentially involving a large set of data.
This package leverages work done by Floris Abrams in the context of his PhD at KU Leuven and by Franck Albinet, International Consultant in Geospatial Data Science and currently a PhD researcher in AI applied to nuclear remediation at KU Leuven.
Install
```
pip install trufl
```
Getting started
Create a vector grid from a given raster
```python
# gridder is assumed to be importable from the trufl package top level;
# adjust the import path to match your trufl version.
from trufl import gridder

fname_raster = '../files/ground-truth-01-4326-simulated.tif'
gdf_grid = gridder(fname_raster, nrows=10, ncols=10)
gdf_grid.head()
```
| loc_id | geometry |
|---|---|
| 0 | POLYGON ((-1.20830 43.26950, -1.20830 43.26042... |
| 1 | POLYGON ((-1.20830 43.27858, -1.20830 43.26950... |
| 2 | POLYGON ((-1.20830 43.28766, -1.20830 43.27858... |
| 3 | POLYGON ((-1.20830 43.29673, -1.20830 43.28766... |
| 4 | POLYGON ((-1.20830 43.30581, -1.20830 43.29673... |
```python
gdf_grid.boundary.plot(color='black', lw=0.5);
```
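Under the hood, `gridder` splits the raster's bounding box into `nrows` × `ncols` cells. A minimal pure-Python sketch of that idea (cell bounds only, no GeoDataFrame; the bounding-box coordinates below are illustrative, not read from the raster):

```python
def make_grid(bounds, nrows, ncols):
    """Split a (minx, miny, maxx, maxy) bounding box into nrows x ncols cell bounds."""
    minx, miny, maxx, maxy = bounds
    dx = (maxx - minx) / ncols
    dy = (maxy - miny) / nrows
    cells = []
    for i in range(nrows):
        for j in range(ncols):
            cells.append((minx + j * dx, miny + i * dy,
                          minx + (j + 1) * dx, miny + (i + 1) * dy))
    return cells

# 10 x 10 grid over an arbitrary lon/lat box (values for illustration only)
cells = make_grid((-1.2083, 43.2604, -1.1175, 43.3512), nrows=10, ncols=10)
print(len(cells))  # 100 cells, one per loc_id
```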
Random sampling in areas of interest
Generate a random set of points given:

- a GeoDataFrame of polygons of interest (in this example, just a grid of cells identified by `loc_id`);
- for each subarea (`loc_id`), the number of measurements to be taken, which we simulate here by generating random numbers.
```python
import numpy as np

# Sampler is assumed to be importable from the trufl package top level;
# adjust the import path to match your trufl version.
from trufl import Sampler

sampler = Sampler(gdf_grid)
n = np.random.randint(1, high=10, size=len(gdf_grid), dtype=int)
sample_locs = sampler.sample(n, method='uniform')
print(sample_locs.head())
sample_locs.plot(markersize=2, color='red');
```
```
                                                 geometry
loc_id
0       MULTIPOINT ((-1.22251 43.26756), (-1.22194 43....
1       MULTIPOINT ((-1.22303 43.27756), (-1.22251 43....
2       MULTIPOINT ((-1.22194 43.28433), (-1.22170 43....
3       MULTIPOINT ((-1.22204 43.29152), (-1.22199 43....
4       MULTIPOINT ((-1.22174 43.30568), (-1.21978 43....
```
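For rectangular grid cells, uniform sampling reduces to drawing independent uniform x and y coordinates. A minimal stand-in for `Sampler.sample(..., method='uniform')` (a hypothetical helper, not trufl's API; arbitrary polygons would additionally need a rejection step against the polygon boundary):

```python
import random

def sample_uniform(bounds, n, seed=0):
    """Draw n uniform random points inside a rectangular (minx, miny, maxx, maxy) cell."""
    rng = random.Random(seed)
    minx, miny, maxx, maxy = bounds
    return [(rng.uniform(minx, maxx), rng.uniform(miny, maxy)) for _ in range(n)]

# five random sampling locations inside one grid cell (illustrative bounds)
pts = sample_uniform((-1.2083, 43.2604, -1.1992, 43.2695), n=5)
print(pts)
```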
Emulating data collection
With the random sampling locations defined, a data collector would go to the field to take measurements. In our case, we “emulate” this process by “extracting” measurements from the provided raster file.
We will emulate data collection from the raster shown below:
```python
import rasterio
import matplotlib.pyplot as plt

with rasterio.open(fname_raster) as src:
    plt.axis('off')
    plt.imshow(src.read(1))
```
“Measuring” the variable of interest from the given raster:

```python
# DataCollector is assumed to be importable from the trufl package top level;
# adjust the import path to match your trufl version.
from trufl import DataCollector

dc_emulator = DataCollector(fname_raster)
samples_t0 = dc_emulator.collect(sample_locs)
print(samples_t0.head())
ax = samples_t0.plot(column='value', markersize=2, legend=True)
gdf_grid.boundary.plot(color='black', ax=ax);
```
```
                         geometry     value
loc_id
0       POINT (-1.22251 43.26756)  0.107457
0       POINT (-1.22194 43.26461)  0.140862
0       POINT (-1.22131 43.26484)  0.145688
0       POINT (-1.22111 43.26411)  0.144795
0       POINT (-1.21696 43.26822)  0.132611
```
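Conceptually, collecting a “measurement” amounts to looking up the raster value at each point's coordinates. A self-contained sketch of that lookup on a toy NumPy array (a simplification of what `DataCollector` does through rasterio; a north-up raster with square pixels is assumed):

```python
import numpy as np

def collect(raster, origin, pixel_size, points):
    """Look up raster values at point coordinates.

    origin = (x of the left edge, y of the top edge); row index grows
    as y decreases, as in a standard north-up raster.
    """
    ox, oy = origin
    values = []
    for x, y in points:
        col = int((x - ox) / pixel_size)
        row = int((oy - y) / pixel_size)
        values.append(raster[row, col])
    return values

raster = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "ground truth"
vals = collect(raster, origin=(0.0, 4.0), pixel_size=1.0,
               points=[(0.5, 3.5), (3.5, 0.5)])
print(vals)  # [0.0, 15.0]
```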
Getting current state
```python
# State and the callbacks are assumed to be importable from the trufl
# package top level; adjust the import path to match your trufl version.
from trufl import State, MaxCB, MinCB, StdCB, CountCB, MoranICB, PriorCB

state = State(samples_t0, gdf_grid, cbs=[
    MaxCB(), MinCB(), StdCB(), CountCB(), MoranICB(k=5), PriorCB(fname_raster)
])

# You have to call the instance to compute the state
state_t0 = state(); state_t0
```
| loc_id | Max | Min | Standard Deviation | Count | Moran.I | Prior |
|---|---|---|---|---|---|---|
| 0 | 0.145688 | 0.054230 | 0.029788 | 9 | 0.786626 | 0.102492 |
| 1 | 0.156101 | 0.000000 | 0.055564 | 6 | 0.348230 | 0.125727 |
| 2 | 0.171939 | 0.153706 | 0.005947 | 6 | 0.200736 | 0.161802 |
| 3 | 0.221299 | 0.161790 | 0.022979 | 8 | 0.882324 | 0.184432 |
| 4 | 0.209163 | 0.175360 | 0.012759 | 6 | 0.756250 | 0.201405 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 0.881591 | 0.806614 | 0.021775 | 8 | 0.498415 | 0.803670 |
| 96 | 0.833478 | 0.753105 | 0.026137 | 8 | 0.789527 | 0.763408 |
| 97 | 0.708564 | 0.668151 | 0.017366 | 4 | NaN | 0.727797 |
| 98 | 0.706323 | 0.674502 | 0.010833 | 8 | 0.818699 | 0.646002 |
| 99 | 0.709104 | 0.674233 | 0.010549 | 8 | 0.804634 | 0.655185 |

100 rows × 6 columns
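Most of these callbacks are simple per-cell summary statistics over the collected samples. A minimal sketch of what `MaxCB`, `MinCB`, `StdCB`, and `CountCB` compute (`MoranICB` and `PriorCB` are omitted here because they need spatial neighbours and the prior raster):

```python
from statistics import pstdev

def summarize(samples):
    """Per-cell summary statistics, mirroring MaxCB/MinCB/StdCB/CountCB.

    samples: mapping of loc_id -> list of measured values in that cell.
    """
    state = {}
    for loc_id, vals in samples.items():
        state[loc_id] = {
            'Max': max(vals),
            'Min': min(vals),
            'Standard Deviation': pstdev(vals),
            'Count': len(vals),
        }
    return state

# toy measurements for two grid cells
state = summarize({0: [0.11, 0.14, 0.15], 1: [0.16, 0.0]})
print(state[0]['Count'])  # 3
```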
Build the ranking of polygons based on several criteria
Criteria
- MaxCB()
- MinCB()
- StdCB()
- CountCB()
- MoranICB(k=5) – gives 2 values (value, p-value)
- PriorCB

Criteria type

- Benefit (high values -> high score -> high rank -> prioritized sampling needed)
- Cost (high values -> low score -> low rank -> less sampling needed)

- MaxCB() – Benefit
- MinCB() – ???
- StdCB() – Benefit
- CountCB() – Cost (low count -> higher priority because more samples are needed)
- MoranICB(k=5) – Cost (high value -> highly spatially correlated -> less need for sampling?)
- PriorCB – Benefit

MCDM techniques

- CP – low values indicate a good alternative
- TOPSIS – high values indicate a good alternative

! Everything is converted to ranks to account for these differences !
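To make the CP convention concrete, here is a generic compromise-programming sketch (not trufl's `Optimizer` implementation): min-max normalise the decision matrix, flip cost criteria so that higher is always better, and rank alternatives by their weighted distance to the ideal point. Columns are assumed non-constant so the normalisation is well defined:

```python
import numpy as np

def cp_rank(X, is_benefit, weights, p=2):
    """Compromise-programming ranks: weighted L_p distance to the ideal
    point after min-max normalisation; smaller distance = better alternative."""
    X = np.asarray(X, dtype=float)
    norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # For cost criteria, flip so that 1 is always "best"
    for j, benefit in enumerate(is_benefit):
        if not benefit:
            norm[:, j] = 1.0 - norm[:, j]
    dist = (np.asarray(weights) * (1.0 - norm) ** p).sum(axis=1) ** (1.0 / p)
    # Rank 1 = smallest distance to the ideal point
    return dist.argsort().argsort() + 1

# toy decision matrix: 3 alternatives x 3 benefit criteria (Max, Min, Std)
X = [[0.9, 0.1, 0.05],
     [0.2, 0.0, 0.01],
     [0.95, 0.2, 0.06]]
ranks = cp_rank(X, is_benefit=[True, True, True], weights=[0.3, 0.3, 0.4])
print(ranks)  # [2 3 1]: the third alternative is closest to the ideal
```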
```python
# Optimizer is assumed to be importable from the trufl package top level;
# adjust the import path to match your trufl version.
from trufl import Optimizer

benefit_criteria = [True, True, True]
state = State(samples_t0, gdf_grid, cbs=[MaxCB(), MinCB(), StdCB()])
optimizer = Optimizer(state=state())
df = optimizer.rank(is_benefit_x=benefit_criteria, w_vector=[0.3, 0.3, 0.4],
                    n_method=None, c_method=None, w_method=None, s_method="CP")
df.head()
```
| loc_id | rank |
|---|---|
| 83 | 1 |
| 91 | 2 |
| 92 | 3 |
| 84 | 4 |
| 93 | 5 |