Optimize adaptive sampling

Trufl

Trufl was initiated in the context of the IAEA (International Atomic Energy Agency) Coordinated Research Project titled “Monitoring and Predicting Radionuclide Uptake and Dynamics for Optimizing Remediation of Radioactive Contamination in Agriculture”.

While Trufl was originally developed to address the remediation of farmland affected by nuclear accidents, its approach and algorithms apply to a much wider range of domains: managing legacy contaminants, for instance, or monitoring any phenomenon whose assessment involves multiple decision criteria and potentially large amounts of data.

This package leverages the work done by Floris Abrams in the context of his PhD at KU Leuven and by Franck Albinet, International Consultant in Geospatial Data Science and currently a PhD researcher in AI applied to nuclear remediation at KU Leuven.

Install

pip install trufl

Getting started

Create a vector grid from a given raster

from trufl.utils import gridder  # import path assumed from the package layout

fname_raster = './files/ground-truth-01-4326-simulated.tif'
gdf_grid = gridder(fname_raster, nrows=10, ncols=10)
gdf_grid.head()
                                                 geometry
loc_id
0       POLYGON ((-1.20830 43.26950, -1.20830 43.26042...
1       POLYGON ((-1.20830 43.27858, -1.20830 43.26950...
2       POLYGON ((-1.20830 43.28766, -1.20830 43.27858...
3       POLYGON ((-1.20830 43.29673, -1.20830 43.28766...
4       POLYGON ((-1.20830 43.30581, -1.20830 43.29673...
gdf_grid.boundary.plot(color='black', lw=0.5);
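
gridder tiles the raster's extent into nrows × ncols rectangular cells and returns them as a GeoDataFrame indexed by loc_id. As a rough illustration of the idea, here is a hypothetical make_grid built on rasterio, shapely and geopandas (a sketch, not trufl's actual implementation):

# Hypothetical sketch of a grid builder in the spirit of `gridder`:
# split the raster's bounding box into nrows x ncols rectangular cells.
import geopandas as gpd
import numpy as np
import rasterio
from shapely.geometry import box

def make_grid(fname_raster, nrows=10, ncols=10):
    with rasterio.open(fname_raster) as src:
        xmin, ymin, xmax, ymax = src.bounds
        crs = src.crs
    xs = np.linspace(xmin, xmax, ncols + 1)
    ys = np.linspace(ymin, ymax, nrows + 1)
    cells = [box(xs[j], ys[i], xs[j + 1], ys[i + 1])
             for j in range(ncols) for i in range(nrows)]
    grid = gpd.GeoDataFrame(geometry=cells, crs=crs)
    grid.index.name = 'loc_id'
    return grid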

Random sampling in areas of interest

Generate a random set of points within areas of interest, given:

  • a geodataframe of polygons of interest (in this example, just a grid with loc_ids);
  • for each subarea (loc_id), the number of measurements to be taken, which we simulate here by generating random numbers.

import numpy as np
from trufl.sampler import Sampler  # import path assumed from the package layout

sampler = Sampler(gdf_grid)
n = np.random.randint(1, high=10, size=len(gdf_grid), dtype=int)
sample_locs = sampler.sample(n, method='uniform')

print(sample_locs.head())
sample_locs.plot(markersize=2, color='red');
                                                 geometry
loc_id                                                   
0       MULTIPOINT ((-1.22182 43.26138), (-1.22158 43....
1       MULTIPOINT ((-1.21672 43.27361), (-1.21541 43....
2                               POINT (-1.21801 43.27919)
3       MULTIPOINT ((-1.22264 43.29411), (-1.22207 43....
4       MULTIPOINT ((-1.21371 43.29700), (-1.21006 43....
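
method='uniform' draws points uniformly at random inside each polygon. A common way to implement this is rejection sampling over the polygon's bounding box; below is a minimal sketch (a hypothetical sample_uniform, not necessarily what Sampler does internally):

# Hypothetical sketch of uniform sampling inside a single polygon
# via rejection sampling over its bounding box.
import numpy as np
from shapely.geometry import Point

def sample_uniform(polygon, n, seed=None):
    rng = np.random.default_rng(seed)
    xmin, ymin, xmax, ymax = polygon.bounds
    pts = []
    while len(pts) < n:
        p = Point(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if polygon.contains(p):  # keep only points that fall inside
            pts.append(p)
    return pts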

Emulating data collection

With the random sampling locations defined, a data collector would normally go into the field to take measurements. In our case, we “emulate” this process by extracting measurements from the provided raster file.

We will emulate data collection from the raster shown below:

import matplotlib.pyplot as plt
import rasterio

with rasterio.open(fname_raster) as src:
    plt.axis('off')
    plt.imshow(src.read(1))

“Measuring” the variable of interest from the given raster:

from trufl.collector import DataCollector  # import path assumed from the package layout

dc_emulator = DataCollector(fname_raster)
samples_t0 = dc_emulator.collect(sample_locs)

print(samples_t0.head())
ax = samples_t0.plot(column='value', markersize=2, legend=True)
gdf_grid.boundary.plot(color='black', ax=ax);
                         geometry     value
loc_id                                     
0       POINT (-1.22182 43.26138)  0.137890
0       POINT (-1.22158 43.26741)  0.124069
0       POINT (-1.21942 43.26181)  0.141608
0       POINT (-1.21880 43.26477)  0.145231
0       POINT (-1.21376 43.26352)  0.104063
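
Under the hood, this kind of emulation boils down to reading the raster band at each sampling location. Here is a minimal sketch using rasterio's sample method (a hypothetical collect_from_raster, not necessarily DataCollector's internals):

# Hypothetical sketch: read band 1 of the raster at each sample point.
import rasterio

def collect_from_raster(fname_raster, sample_locs):
    pts = sample_locs.explode(index_parts=False)  # MULTIPOINT -> one POINT per row
    coords = [(geom.x, geom.y) for geom in pts.geometry]
    with rasterio.open(fname_raster) as src:
        pts['value'] = [v[0] for v in src.sample(coords)]
    return pts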

Getting current state

from trufl.callbacks import (State, MaxCB, MinCB, StdCB, CountCB,
                             MoranICB, PriorCB)  # import path assumed

state = State(samples_t0, gdf_grid, cbs=[
    MaxCB(), MinCB(), StdCB(), CountCB(), MoranICB(k=5), PriorCB(fname_raster)
])

# You have to call the instance
state_t0 = state(); state_t0
             Max       Min  Standard Deviation  Count   Moran.I     Prior
loc_id
0       0.145231  0.074017            0.025631      8  0.915078  0.102492
1       0.152964  0.111763            0.015033      7  0.679907  0.125727
2       0.160229  0.160229            0.000000      1       NaN  0.161802
3       0.188177  0.164264            0.008248      6  0.940746  0.184432
4       0.261005  0.241690            0.009657      2       NaN  0.201405
...          ...       ...                 ...    ...       ...       ...
95      0.877876  0.800729            0.027397      5       NaN  0.803670
96      0.804376  0.795942            0.004217      2       NaN  0.763408
97      0.799064  0.672111            0.047581      8  0.948441  0.727797
98      0.708847  0.661171            0.013101      9  0.731343  0.646002
99      0.704167  0.673213            0.015477      2       NaN  0.655185

[100 rows x 6 columns]
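
Each callback contributes one summary statistic per loc_id, computed from the samples collected so far. For intuition, the first four columns above could be reproduced with a plain pandas groupby (a hand-rolled illustration; details such as the standard-deviation convention for single-sample cells may differ from trufl's callbacks):

# Illustrative re-computation of the Max/Min/Std/Count columns by hand.
summary = samples_t0.groupby('loc_id')['value'].agg(
    Max='max', Min='min', Std='std', Count='count')
summary.head()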

Build the ranking of polygons based on several criteria

Criteria

  • MaxCB()
  • MinCB()
  • StdCB()
  • CountCB()
  • MoranICB(k=5) – gives 2 values (Moran's I and its p-value); see the sketch after this list
  • PriorCB
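
MoranICB looks at the spatial autocorrelation of the samples within each cell. As a rough illustration, Moran's I with k-nearest-neighbour weights can be computed with the libpysal and esda packages (a hypothetical moran_i helper, not necessarily trufl's implementation):

# Hypothetical per-cell Moran's I with k-nearest-neighbour weights.
import numpy as np
from libpysal.weights import KNN
from esda.moran import Moran

def moran_i(gdf_points, k=5):
    if len(gdf_points) <= k:   # too few samples in the cell; the state
        return np.nan, np.nan  # table above shows NaN in that case
    w = KNN.from_dataframe(gdf_points, k=k)
    mi = Moran(gdf_points['value'].values, w)
    return mi.I, mi.p_sim      # (value, p-value)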

Criteria type

  • Benefit (high value -> high score -> high rank -> sampling prioritized)

  • Cost (high value -> low score -> low rank -> less sampling needed)

  • MaxCB() – Benefit

  • MinCB() – ???

  • StdCB() – Benefit

  • CountCB() – Cost (low count – higher priority, because more samples are needed)

  • MoranICB(k=5) – Cost (high value – highly correlated – less need for sampling??)

  • PriorCB – Benefit

MCDM techniques

  • CP – low values indicate a good alternative
  • TOPSIS – high values indicate a good alternative

! Everything is converted to rank to account for these differences !
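
For intuition, here is a minimal sketch of such a scoring rule, assuming CP stands for Compromise Programming with p = 1 (as is common in MCDM): each alternative is scored by its weighted, normalized distance to the per-criterion ideal, and lower scores are better. This is a hypothetical cp_scores, not necessarily what Optimizer implements:

# Hypothetical sketch of Compromise Programming (p = 1).
import numpy as np

def cp_scores(X, weights, is_benefit):
    X = np.asarray(X, dtype=float)        # alternatives x criteria
    w = np.asarray(weights, dtype=float)
    best = np.where(is_benefit, X.max(axis=0), X.min(axis=0))
    worst = np.where(is_benefit, X.min(axis=0), X.max(axis=0))
    span = np.where(best != worst, best - worst, 1.0)  # avoid division by zero
    d = np.abs((best - X) / span)         # normalized regret per criterion
    return (w * d).sum(axis=1)            # low score = close to the ideal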

from trufl.optimizer import Optimizer  # import path assumed from the package layout

benefit_criteria = [True, True, True]
state = State(samples_t0, gdf_grid, cbs=[MaxCB(), MinCB(), StdCB()])
optimizer = Optimizer(state=state())
df = optimizer.rank(is_benefit_x=benefit_criteria, w_vector=[0.3, 0.3, 0.4],
                    n_method=None, c_method=None, w_method=None, s_method="CP")

df.head()
        rank
loc_id
93         1
92         2
84         3
91         4
83         5
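
This ranking is what closes the adaptive sampling loop: the highest-ranked cells are the ones where additional measurements are most valuable. A hypothetical next iteration, reusing the objects created above (the policy of 5 extra points in the top 20 cells is arbitrary, and it assumes Sampler accepts zero counts):

# Hypothetical next sampling round driven by the ranking above.
n_next = np.zeros(len(gdf_grid), dtype=int)
n_next[df.index[:20]] = 5  # arbitrary policy: densify the top-20 cells
next_locs = sampler.sample(n_next, method='uniform')
samples_t1 = dc_emulator.collect(next_locs)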
