WATex: machine learning research in water exploration
Life is much better with potable water
Overview
WATex is a Python library focused mainly on groundwater exploration (GWE). It brings novel approaches
for reducing the numerous losses incurred during hydro-geophysical exploration projects. It encompasses
direct-current (DC) resistivity methods (electrical resistivity profiling (ERP) and vertical electrical sounding (VES)),
short-period electromagnetic (EM) methods, geology, and hydrogeology. Using methodologies based on machine learning,
it allows users to:
- auto-detect the right position for drilling operations, to minimize the rate of unsuccessful drillings and unsustainable boreholes;
- reduce the cost of permeability coefficient (k) data collection during hydro-geophysical engineering projects;
- predict the water content of a well, such as the groundwater flow rate and the level of water inrush;
- recover lost EM signals in areas with strong interference noise;
- etc.
Documentation
Visit the library website for more resources. You can also quickly browse the software API reference and flip through the examples page to see some of the expected results. Furthermore, a step-by-step guide covers real-world engineering problems such as computing DC parameters and predicting the k-parameter.
Licence
WATex is released under the BSD-3-Clause License.
Installation
Python 3.9 or later is preferred.
- from pip
WATex can be installed from PyPI with:
pip install watex
- from conda
Installation from the conda-forge channel can be achieved with:
conda install -c conda-forge watex
To get the latest development version of the code, it is recommended to install it from source using:
git clone https://github.com/WEgeophysics/watex.git
Furthermore, for a step-by-step guide to the installation and to managing the dependencies, visit our installation guide page.
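Before installing, it can help to confirm that the interpreter meets the preferred Python 3.9+ requirement. The snippet below is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and is not part of WATex.

```python
import sys


def meets_minimum(version=None, minimum=(3, 9)):
    """Return True when `version` (major, minor, ...) is at least `minimum`.

    Hypothetical helper: defaults to the running interpreter's version.
    """
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is suitable for WATex")
    else:
        print("Python 3.9 or later is preferred for WATex")
```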
Some demos
1. Drilling location auto-detection
For this example, we randomly generate synthetic ERP resistivity data for 50 stations, with minimum
and maximum resistivity values of 1e1 and 1e4 ohm.m respectively:
import watex as wx
data = wx.make_erp(n_stations=50, max_rho=1e4, min_rho=10., as_frame=True, seed=42)
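To make the shape of such a dataset concrete without installing WATex, here is a minimal standard-library sketch that produces a comparable synthetic profile. It is not how `make_erp` works internally: the log-uniform sampling, the 'S' plus three-digit station labels starting at zero, the `step` spacing, and the function name `make_synthetic_erp` are all assumptions made for illustration.

```python
import math
import random


def make_synthetic_erp(n_stations=50, min_rho=10.0, max_rho=1e4, step=20.0, seed=42):
    """Return a list of dicts with a station label, a position and a resistivity.

    Assumption: resistivities are drawn log-uniformly between min_rho and
    max_rho (ohm.m); stations are `step` metres apart along the profile.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    lo, hi = math.log10(min_rho), math.log10(max_rho)
    rows = []
    for i in range(n_stations):
        rho = 10 ** rng.uniform(lo, hi)
        rho = min(max(rho, min_rho), max_rho)  # guard against float round-off
        rows.append({"station": f"S{i:03d}", "position": i * step, "resistivity": rho})
    return rows


profile = make_synthetic_erp()
print(len(profile), profile[0]["station"], profile[-1]["station"])
```

Sampling in log space spreads the values evenly across the three decades between 1e1 and 1e4 ohm.m, which a plain uniform draw would not do.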
- Naive auto-detection (NAD)
The NAD automatically proposes a suitable location, assuming NO restrictions (constraints) are observed at the survey site
during the GWE. By "suitable", we mean a location expected to give a flow rate of at least 1 m3/hr.
robj = wx.ResistivityProfiling(auto=True).fit(data)
robj.sves_
Out[1]: 'S025'
The suitable drilling location is proposed at station S25 (stored in the attribute sves_).
- Auto-detection with constraints (ADC)
The constraints refer to the restrictions observed in the survey area during the DWSC. This is common
in real-world exploration. For instance, a station close to a heritage site should be discarded
since no drilling operations are authorized at that place. When several restrictions
apply to the survey site, they must be listed in a dictionary, each with a reason, and passed to the parameter
constraints so that these stations are ignored during the automatic detection. Here is an example of constraints
applied to our example:
restrictions = {
'S10': 'Household waste site, avoid contamination',
'S27': 'Municipality site, no authorization to make a drill',
'S29': 'Heritage site, drilling prohibited',
'S42': 'Anthropic polluted place, avoid contamination within a few years',
'S46': 'Marsh zone, borehole will dry up during the dry season'
}
robj = wx.ResistivityProfiling(constraints=restrictions, auto=True).fit(data)
robj.sves_
Out[2]: 'S033'
Notice that station S25 is no longer considered the suitable location; S33 is now proposed as the
priority for drilling operations. However, if the proposed station is close to a restricted area, a warning should be raised to
inform the user of the risk of drilling at that location.
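The idea of discarding restricted stations and warning about nearby ones can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the selection logic of `ResistivityProfiling`: the "lowest resistivity wins" criterion, the `buffer` parameter, the helper names, and the resistivity values are all hypothetical.

```python
import warnings


def station_index(name):
    """Return the integer index of a label such as 'S10' or 'S010'."""
    return int(name.strip().lstrip("Ss"))


def pick_station(resistivity, constraints=None, buffer=1):
    """Pick the unrestricted station with the lowest resistivity.

    Hypothetical rule: warn when the pick lies within `buffer` stations
    of a restricted one.
    """
    restricted = {station_index(s) for s in (constraints or {})}
    candidates = {
        station_index(s): rho
        for s, rho in resistivity.items()
        if station_index(s) not in restricted
    }
    if not candidates:
        raise ValueError("every station is restricted")
    best = min(candidates, key=candidates.get)
    if any(abs(best - r) <= buffer for r in restricted):
        warnings.warn(f"S{best:03d} is within {buffer} station(s) of a restricted site")
    return f"S{best:03d}"


# Made-up resistivity values (ohm.m) for a handful of stations.
profile = {"S024": 120.0, "S025": 35.0, "S026": 90.0, "S027": 20.0, "S033": 48.0}
restrictions = {"S27": "Municipality site, no authorization to make a drill"}

print(pick_station(profile))                # S027
print(pick_station(profile, restrictions))  # S025
```

Comparing stations by integer index lets a constraint written as 'S27' match a label written as 'S027'.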
Before the drilling operations commence, make sure to carry out a DC sounding (VES) at that point. WATex computes
another parameter, called ohmic-area (ohmS), to detect the effectiveness of the existing fracture zone at that point. See more in
the software documentation.
2. Predict permeability coefficient k from logging dataset using MXS approach
MXS stands for mixture learning strategy. It uses upstream unsupervised learning for
k-aquifer similarity label prediction and supervised learning for
the final k-value prediction. For our toy example, we use data from two boreholes
stored in the software and merge them into a single dataset. In addition, we drop the
remark observation, which is subjective data not useful for k prediction:
import watex as wx
h = wx.fetch_data("hlogs", key='h502 h2601', drop_observations=True)  # returns a log data object
h.feature_names
Out[3]: Index(['hole_id', 'depth_top', 'depth_bottom', 'strata_name', 'rock_name',
       'layer_thickness', 'resistivity', 'gamma_gamma', 'natural_gamma', 'sp',
       'short_distance_gamma', 'well_diameter'],
      dtype='object')
hdata = h.frame
k is collected as continuous values (m/darcies) and should be categorized for the
naive group of aquifer (NGA) prediction. The latter is used upstream to predict
the MXS target ymxs. Here, we use the default categorization
provided by the software and assume that there are at least 2
aquifer groups in the area. The code is given as:
mxs = wx.MXS(kname='k', n_groups=2).fit(hdata)
ymxs = mxs.predictNGA().makeyMXS(categorize_k=True, default_func=True)
mxs.yNGA_[62:74]
Out[4]: array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])
ymxs[62:74]
Out[5]: array([ 0, 0, 0, 0, 12, 12, 12, 12, 12, 12, 12, 12])
To understand the transformation from NGA to the MXS target (ymxs), please have a look
at the following paper.
Once the MXS target is predicted, we call the make_naive_pipe function to
impute, scale, and transform the predictor X at once into a compressed sparse
matrix ready for the final prediction, using support vector machines and
random forest as examples. Here we go:
X = hdata[h.feature_names]
Xtransf = wx.make_naive_pipe(X, transform=True)
Xtransf
Out[6]:
<218x46 sparse matrix of type '<class 'numpy.float64'>'
with 2616 stored elements in Compressed Sparse Row format>
Xtrain, Xtest, ytrain, ytest = wx.sklearn.train_test_split(Xtransf, ymxs)
ypred_k_svc = wx.sklearn.SVC().fit(Xtrain, ytrain).predict(Xtest)
ypred_k_rf = wx.sklearn.RandomForestClassifier().fit(Xtrain, ytrain).predict(Xtest)
We can now check the k prediction scores using the accuracy_score function:
wx.sklearn.accuracy_score(ytest, ypred_k_svc)
Out[7]: 0.9272727272727272
wx.sklearn.accuracy_score(ytest, ypred_k_rf)
Out[8]: 0.9636363636363636
As we can see, the results of k prediction are quite satisfactory for our
toy example using data from only two boreholes. Note that things can become more
interesting when using data from many boreholes. For a more in-depth treatment, visit our examples page.
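The two accuracy scores reported in this demo can be read as simple fractions. The sketch below shows what an accuracy score computes and checks the arithmetic, assuming `wx.sklearn.train_test_split` is scikit-learn's function with its default 25% hold-out, so that 218 rows leave 55 test samples (54.5 rounded up). Under that assumption the scores correspond to 51 and 53 correct predictions out of 55. The `accuracy` helper is a plain-Python stand-in, not the library function.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Assumption: a default 25% hold-out of the 218 rows, rounded up.
n_test = math.ceil(0.25 * 218)
print(n_test)                 # 55
print(round(51 / n_test, 4))  # 0.9273, consistent with the SVC score
print(round(53 / n_test, 4))  # 0.9636, consistent with the random forest score

# Tiny made-up example using MXS-style labels (0 and 12).
print(accuracy([0, 0, 12, 12], [0, 12, 12, 12]))  # 0.75
```

With only 55 test samples, each misclassified sample shifts the score by about 1.8 percentage points, so the gap between the two classifiers amounts to two samples.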
3. EM tensors recovering and analyses
For a basic illustration, we fetch 20 audio-frequency magnetotelluric (AMT) data
stored as EDI objects, collected in the Huayuan area (Hunan province, China) and
contaminated by multiple interference noises:
import watex as wx
e = wx.fetch_data('huayuan', samples=20, key='noised')  # returns an EM object
edi_data = e.data  # get the array of EDI objects
Before EM data restoration, we can run a quality control (QC) analysis of the data and show the confidence interval, which indicates how far the data at each station can be trusted. By default, the confidence test uses the errors in the resistivity tensor. Let's get started:
po = wx.EMProcessing().fit(edi_data)  # make an EM processing object
r = po.qc(tol=0.2, return_ratio=True)  # consider good data from 80% significance
r
Out[9]: 0.95
We can then visualize the confidence interval at the 20 AMT stations as:
wx.plot_confidence_in(edi_data)
Alternatively, we can use the qc function (more consistent) to get the valid data and
the interpolated frequencies. For instance, to find out how many frequencies were dropped
during the control analysis:
QCo = wx.qc(edi_data, tol=.2, return_qco=True)  # returns the quality control object
len(e.emo.freqs_)  # number of frequencies in the noised data
Out[10]: 56
len(QCo.freqs_)  # number of frequencies in the valid data after QC
Out[11]: 53
QCo.invalid_freqs_  # frequencies rejected based on the tol parameter, which can be dropped from the EM data
Out[12]: array([8.19200e+04, 4.85294e+01, 5.62500e+00]) # 81920.0, 48.53 and 5.625 Hz
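To give a feel for how a tolerance can split frequencies into valid and invalid sets, here is a toy sketch in plain Python. It is not the algorithm behind `wx.qc`: the rule used (a frequency is rejected when its share of missing samples across stations exceeds `tol`), the ratio definition, the function name `qc_sketch`, and the sample values are all assumptions made for illustration.

```python
def qc_sketch(data, tol=0.5):
    """Toy quality control on a {frequency: [value per station]} table.

    A value of None marks a missing (unusable) sample. Hypothetical rule:
    a frequency is invalid when its share of missing samples exceeds `tol`.
    Returns (ratio of usable samples, valid frequencies, invalid frequencies).
    """
    valid, invalid = [], []
    n_present = n_total = 0
    for freq, samples in data.items():
        missing = sum(s is None for s in samples)
        n_present += len(samples) - missing
        n_total += len(samples)
        (invalid if missing / len(samples) > tol else valid).append(freq)
    return n_present / n_total, valid, invalid


# Made-up table: four frequencies (Hz) measured at four stations.
data = {
    81920.0: [None, None, 1.2, None],
    10240.0: [3.1, 2.9, 3.0, 3.3],
    48.5294: [None, 5.0, None, 5.2],
    5.625: [7.7, None, 7.9, 8.1],
}
ratio, valid, invalid = qc_sketch(data, tol=0.2)
print(ratio)    # 0.625
print(valid)    # [10240.0]
print(invalid)  # [81920.0, 48.5294, 5.625]
```

In the demo itself, the same bookkeeping explains the counts: 56 frequencies in the noised data minus 3 invalid ones leaves the 53 reported after QC.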
The plot_confidence_in function lets us assess whether tensor values can be recovered
for these three frequencies at each station. Note that the threshold for EM data
to be restored is set to 50%. Below this value, data is unrecoverable.
Furthermore, if the QC rate r=95% is not yet satisfactory for our AMT data, we can
proceed to the restoration of the impedance tensor Z:
Z = po.zrestore()  # returns 3D tensors (Nfrequency, 2, 2), 2x2 for XX, XY, YX and YY components
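The (Nfrequency, 2, 2) layout mentioned in the comment can be pictured with plain nested lists standing in for the array. The sketch below assumes the usual row/column convention, in which the 2x2 block at each frequency holds XX and XY on the first row and YX and YY on the second; the complex values and the `component` helper are made up for illustration and are not WATex output or API.

```python
# Position of each component inside a 2x2 block, in XX, XY, YX, YY order.
COMPONENTS = {"xx": (0, 0), "xy": (0, 1), "yx": (1, 0), "yy": (1, 1)}


def component(Z, name):
    """Return one impedance component at every frequency as a flat list."""
    i, j = COMPONENTS[name.lower()]
    return [block[i][j] for block in Z]


# Made-up tensor with two frequencies: shape (2, 2, 2).
Z = [
    [[0.1 + 0.1j, 2.0 + 1.5j], [-2.1 - 1.4j, -0.1 - 0.1j]],
    [[0.2 + 0.1j, 3.0 + 2.5j], [-3.1 - 2.4j, -0.2 - 0.1j]],
]

print(len(Z), len(Z[0]), len(Z[0][0]))  # 2 2 2
print(component(Z, "xy"))
print([abs(z) for z in component(Z, "yx")])  # amplitudes of the YX component
```

With a NumPy array of the same shape, the equivalent selection would be `Z[:, 0, 1]` for XY.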
Now, let's evaluate the new QC ratio to verify the effectiveness of the recovery:
r, = wx.qc(Z)
r
Out[13]: 1.0
Great! As we can see, the tensor is restored at each station with a 100% ratio, and
the confidence line is above 95% along the 20 investigation sites,
compared to the previous plot (rate = 75%). The snippet below visualizes
this improvement with the confidence interval:
wx.plot_confidence_in(Z)
Besides, users can flip through the following links for more examples of EM tensor restoration,
the skewness analysis plots,
the strike plot,
data filtering, and more.
Citations
If the software proved useful to you in any published work, we would appreciate a citation of the paper below:
Kouadio, K.L., Liu, J., Liu, R., 2023. watex: machine learning research in water exploration. SoftwareX, 101367. https://doi.org/10.1016/j.softx.2023.101367
In most situations where WATex is cited, a citation to scikit-learn would also be appropriate.
See also some case history papers using WATex.
Contributions
- Department of Geophysics, School of Geosciences & Info-physics, Central South University, China.
- Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, China.
- Laboratoire de Géologie Ressources Minérales et Energétiques, UFR des Sciences de la Terre et des Ressources Minières, Université Félix Houphouët-Boigny, Côte d'Ivoire.
Developer: L. Kouadio <etanoyau@gmail.com>