WATex: machine learning research in water exploration

Life is much better with potable water

Overview

WATex is a Python library focused mainly on groundwater exploration (GWE). It brings novel approaches for reducing the numerous losses incurred during hydro-geophysical exploration projects. It encompasses direct-current (DC) resistivity methods (electrical resistivity profiling (ERP) and vertical electrical sounding (VES)), short-period electromagnetic (EM) methods, geology, and hydrogeology. Using methodologies based on machine learning, it allows users to:

  • auto-detect the right position for drilling operations, to minimize the rate of unsuccessful drillings and unsustainable boreholes;
  • reduce the cost of permeability coefficient (k) data collection during hydro-geophysical engineering projects;
  • predict the water content of a well, such as the groundwater flow rate and the level of water inrush;
  • recover lost EM signals in areas with strong interference noise;
  • etc.

Documentation

Visit the library website for more resources. You can also quickly browse the software API reference and flip through the examples page to see some of the expected results. Furthermore, a step-by-step guide covers real-world engineering problems such as computing DC parameters and predicting the k parameter.

License

WATex is released under the BSD-3-Clause License.

Installation

Python 3.9 or later is recommended.

  • from pip

WATex can be installed from PyPI with:

pip install watex

  • from conda

WATex can be installed from the conda-forge channel with:

conda install -c conda-forge watex

To get the latest development version of the code, clone the repository and install it from source:

git clone https://github.com/WEgeophysics/watex.git

For a step-by-step guide to the installation and to managing the dependencies, visit our installation guide page.
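
After installing, it can help to confirm that the package is visible to the current interpreter. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and is not part of WATex.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "watex"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"watex {found}" if found else "watex is not installed in this environment")
```

If the function returns None right after an installation, the package was most likely installed into a different environment than the one running the check.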

Some demos

1. Drilling location auto-detection

For this example, we randomly generate synthetic ERP resistivity data for 50 stations, with minimum and maximum resistivity values of 1e1 and 1e4 ohm.m, respectively:

import watex as wx
data = wx.make_erp(n_stations=50, max_rho=1e4, min_rho=10., as_frame=True, seed=42)
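
To picture what such a synthetic profile might contain, here is a minimal standard-library sketch. It is not how `wx.make_erp` works internally: the log-uniform sampling, the zero-based `S000`-style station names, and the function name `synthetic_erp` are all assumptions made for illustration.

```python
import math
import random


def synthetic_erp(n_stations=50, min_rho=10.0, max_rho=1e4, seed=42):
    """Return {station_name: resistivity in ohm.m} with log-uniform random values.

    Hypothetical stand-in for a synthetic ERP generator; sampling scheme
    and station naming are assumptions, not WATex behaviour.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    lo, hi = math.log10(min_rho), math.log10(max_rho)
    return {f"S{i:03d}": 10 ** rng.uniform(lo, hi) for i in range(n_stations)}


data = synthetic_erp()
print(len(data), min(data.values()), max(data.values()))
```

Sampling in log space spreads the values evenly across the three decades between 10 and 10,000 ohm.m, which is a common choice for resistivity data.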

  • Naive auto-detection (NAD)

The NAD automatically proposes a suitable location, assuming that NO restrictions (constraints) are observed at the survey site during the GWE. Here, "suitable" means a location expected to yield a flow rate of at least 1 m3/hr.

robj = wx.ResistivityProfiling(auto=True).fit(data)
robj.sves_
Out[1]: 'S025'

The suitable drilling location is proposed at station S25 (stored in the attribute sves_).

  • Auto-detection with constraints (ADC)

The constraints refer to the restrictions observed in the survey area during the drinking water supply campaign (DWSC). They are common in real-world exploration. For instance, a station close to a heritage site should be discarded, since no drilling operations are authorized there. When several restrictions apply to the survey site, list them in a dictionary that maps each station to a reason, and pass it to the constraints parameter so that these stations are ignored during the automatic detection. Here is the constrained version of our example:

restrictions = {
    'S10': 'Household waste site, avoid contamination',
    'S27': 'Municipality site, no authorization to make a drill',
    'S29': 'Heritage site, drilling prohibited',
    'S42': 'Anthropic polluted place, avoid contamination within a few years',
    'S46': 'Marsh zone, borehole will dry up during the dry season'
}
robj = wx.ResistivityProfiling(constraints=restrictions, auto=True).fit(data)
robj.sves_
Out[2]: 'S033'

Note that station S25 is no longer considered the suitable location; S33 is now proposed as the priority for drilling operations. However, if the proposed station is close to a restricted area, a warning should be raised so that the user can avoid the risk of drilling at that location.
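
The idea of excluding restricted stations can be illustrated with a small standalone sketch. This is not WATex's selection algorithm: picking the lowest-resistivity station is a simplified, hypothetical criterion, and `pick_station` and the toy profile values are invented for the example.

```python
import warnings


def pick_station(resistivity, constraints=None):
    """Pick the lowest-resistivity station that is not restricted.

    Hypothetical criterion for illustration only; WATex's own
    selection logic is more involved.
    """
    constraints = constraints or {}
    names = list(resistivity)  # station names in profile order
    allowed = [n for n in names if n not in constraints]
    if not allowed:
        raise ValueError("every station is restricted")
    best = min(allowed, key=resistivity.get)
    # Warn when a direct neighbour along the profile is restricted.
    i = names.index(best)
    neighbours = names[max(i - 1, 0):i] + names[i + 1:i + 2]
    risky = [n for n in neighbours if n in constraints]
    if risky:
        warnings.warn(f"{best} is adjacent to restricted station(s): {', '.join(risky)}")
    return best


# Toy profile (ohm.m); values are made up for the example.
profile = {"S01": 850.0, "S02": 120.0, "S03": 95.0, "S04": 400.0, "S05": 150.0}
restrictions = {"S03": "Heritage site, drilling prohibited"}

print(pick_station(profile))                # S03
print(pick_station(profile, restrictions))  # S02, with a warning (next to S03)
```

Keeping the reason alongside each restricted station, as the dictionary does, makes it easy to report why a candidate was skipped.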

Before drilling operations commence, make sure to carry out a DC sounding (VES) at that point. WATex computes another parameter, called the ohmic-area (ohmS), to detect the effectiveness of the existing fracture zone at that point. See more in the software documentation.

2. Predict permeability coefficient k from logging dataset using MXS approach

MXS stands for mixture learning strategy. It uses unsupervised learning upstream to predict k-aquifer similarity labels, then supervised learning for the final k-value prediction. For our toy example, we use the data of two boreholes stored in the software and merge them into a single dataset. In addition, we drop the remark observations, which are subjective data that are not useful for k prediction:

import watex as wx
h = wx.fetch_data("hlogs", key='h502 h2601', drop_observations=True)  # returns a log data object
h.feature_names
Out[3]: Index(['hole_id', 'depth_top', 'depth_bottom', 'strata_name', 'rock_name',
           'layer_thickness', 'resistivity', 'gamma_gamma', 'natural_gamma', 'sp',
           'short_distance_gamma', 'well_diameter'],
          dtype='object')
hdata = h.frame

k is collected as continuous values (m/darcies) and should be categorized for the naive group of aquifer (NGA) prediction. The NGA is used upstream to predict the MXS target ymxs. Here, we use the default categorization provided by the software and assume that the area contains at least two aquifer groups. The code is:

mxs = wx.MXS(kname='k', n_groups=2).fit(hdata)
ymxs = mxs.predictNGA().makeyMXS(categorize_k=True, default_func=True)
mxs.yNGA_[62:74]
Out[4]: array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])
ymxs[62:74]
Out[5]: array([ 0,  0,  0,  0, 12, 12, 12, 12, 12, 12, 12, 12])
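
The categorization of continuous k values mentioned above can be pictured with a short sketch. The class edges below are hypothetical and are not the defaults used by WATex; `categorize_k` is an illustrative helper, not a library function.

```python
def categorize_k(k, bounds=(0.01, 0.1)):
    """Map a continuous k value to an integer class.

    `bounds` are hypothetical class edges, not WATex's defaults.
    Returns 0 for k <= bounds[0], 1 for k <= bounds[1], and so on;
    values above the last edge fall in the final class.
    """
    for label, upper in enumerate(bounds):
        if k <= upper:
            return label
    return len(bounds)


k_values = [0.004, 0.01, 0.035, 0.2]        # made-up measurements
print([categorize_k(k) for k in k_values])  # [0, 0, 1, 2]
```

Turning k into a small number of classes is what lets the final step be treated as a classification problem.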

To understand the transformation from the NGA to the MXS target (ymxs), please refer to the following paper. Once the MXS target is predicted, we call the make_naive_pipe function to impute, scale, and transform the predictor X in one step into a compressed sparse matrix, ready for the final prediction. We use support vector machines and a random forest as examples:

X = hdata[h.feature_names]
Xtransf = wx.make_naive_pipe(X, transform=True)
Xtransf
Out[6]: 
<218x46 sparse matrix of type '<class 'numpy.float64'>'
	with 2616 stored elements in Compressed Sparse Row format> 
Xtrain, Xtest, ytrain, ytest = wx.sklearn.train_test_split(Xtransf, ymxs)
ypred_k_svc = wx.sklearn.SVC().fit(Xtrain, ytrain).predict(Xtest)
ypred_k_rf = wx.sklearn.RandomForestClassifier().fit(Xtrain, ytrain).predict(Xtest)

We can now check the k prediction scores using the accuracy_score function:

wx.sklearn.accuracy_score(ytest, ypred_k_svc)
Out[7]: 0.9272727272727272
wx.sklearn.accuracy_score(ytest, ypred_k_rf)
Out[8]: 0.9636363636363636

As we can see, the k prediction results are quite satisfactory for a toy example that uses the data of only two boreholes. Results can become more interesting when many boreholes are used. For more in-depth examples, visit our examples page.
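
To relate these scores to counts of correct predictions, here is a small worked sketch. It assumes that `wx.sklearn.train_test_split` behaves like scikit-learn's function with its default 25% test size, which gives 55 test samples out of 218. The label vectors are placeholders built only to reproduce the ratios; the numbers of misclassified samples are inferred from the scores, not reported by the source.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


n_samples = 218
n_test = math.ceil(0.25 * n_samples)   # 55, assuming a 25% test split
y_true = [0] * n_test                  # placeholder labels
y_svc = [0] * 51 + [1] * 4             # 4 misclassified (inferred from the score)
y_rf = [0] * 53 + [1] * 2              # 2 misclassified (inferred from the score)

print(n_test, round(accuracy(y_true, y_svc), 4), round(accuracy(y_true, y_rf), 4))
# 55 0.9273 0.9636
```

With only 55 test samples, each misclassification moves the score by about 1.8 percentage points, so the gap between the two models corresponds to just two samples.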

3. EM tensors recovering and analyses

For a basic illustration, we fetch 20 audio-frequency magnetotelluric (AMT) datasets, stored as EDI objects, that were collected in the Huayuan area (Hunan Province, China) and are contaminated by multiple interference noises:

import watex as wx
e = wx.fetch_data('huayuan', samples=20, key='noised')  # returns an EM object
edi_data = e.data  # get the array of EDI objects

Before restoring the EM data, we can run a quality control (QC) analysis and show the confidence interval, which indicates how far the data at each station can be trusted. By default, the confidence test uses the errors in the resistivity tensor. Let's get started:

po = wx.EMProcessing().fit(edi_data)  # make an EM processing object
r = po.qc(tol=0.2, return_ratio=True)  # tol=0.2: data are considered good from 80% significance
r
Out[9]: 0.95

We can then visualize the confidence interval at the 20 AMT stations:

wx.plot_confidence_in(edi_data)

Alternatively, we can use the qc function (more consistent) to get the valid data and the interpolated frequencies. For instance, to find out how many frequencies were dropped during the control analysis:

QCo = wx.qc(edi_data, tol=.2, return_qco=True)  # returns the quality control object
len(e.emo.freqs_)   # number of frequencies in the noisy data
Out[10]: 56
len(QCo.freqs_)     # number of frequencies in the valid data after QC
Out[11]: 53
QCo.invalid_freqs_  # invalid frequencies for the given tol; they can be dropped from the EM data
Out[12]: array([8.19200e+04, 4.85294e+01, 5.62500e+00]) #  81920.0, 48.53 and 5.625 Hz 
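
Dropping flagged frequencies from a frequency list can be sketched as follows. This is a standalone illustration, not WATex code: `drop_invalid` is a hypothetical helper, and apart from the three invalid frequencies reported above, the values in `freqs` are made up.

```python
import math


def drop_invalid(freqs, invalid, rel_tol=1e-6):
    """Return the frequencies that do not match any entry of `invalid`."""
    return [f for f in freqs
            if not any(math.isclose(f, bad, rel_tol=rel_tol) for bad in invalid)]


# The three invalid frequencies reported above, plus made-up valid ones.
invalid = [8.19200e+04, 4.85294e+01, 5.62500e+00]
freqs = [81920.0, 10240.0, 1280.0, 48.5294, 20.0, 5.625]

kept = drop_invalid(freqs, invalid)
print(kept)               # [10240.0, 1280.0, 20.0]
print(round(53 / 56, 3))  # 0.946, share of frequencies kept in the demo above
```

The comparison uses a relative tolerance rather than strict equality because frequencies read from files rarely match to the last floating-point digit. The share of kept frequencies is shown for orientation only; the source does not state that the QC ratio is computed this way.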

The plot_confidence_in function makes it possible to assess whether tensor values can be recovered for these three frequencies at each station. Note that the threshold for the EM data to be restored is set to 50%; below this value, the data are unrecoverable. Furthermore, if the QC rate r = 95% is not yet satisfactory for our AMT data, we can proceed to restore the impedance tensor Z:

Z = po.zrestore()  # returns 3D tensors (Nfrequency, 2, 2); the 2x2 block holds the XX, XY, YX and YY components
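
For readers less familiar with the (Nfrequency, 2, 2) layout, the sketch below shows how one impedance component relates to apparent resistivity and phase through the standard magnetotelluric relation rho_a = |Z|^2 / (omega * mu0). It uses plain Python rather than WATex objects, and it assumes the impedance is expressed in SI units (ohm); EDI files often use field units, which require a different scaling. The half-space model and the helper `rho_phase` are illustrative.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # magnetic permeability of free space (H/m)


def rho_phase(z, freq):
    """Apparent resistivity (ohm.m) and phase (degrees) from one impedance
    component `z` given in SI units (ohm) at frequency `freq` (Hz)."""
    omega = 2.0 * math.pi * freq
    rho_a = abs(z) ** 2 / (omega * MU0)
    phase = math.degrees(cmath.phase(z))
    return rho_a, phase


# One frequency slice of a (n_freq, 2, 2) tensor: [[Zxx, Zxy], [Zyx, Zyy]].
freq, rho_true = 100.0, 250.0                     # made-up half-space model
zxy = cmath.sqrt(1j * 2.0 * math.pi * freq * MU0 * rho_true)
z_slice = [[0j, zxy], [-zxy, 0j]]                 # 1D earth: Zxx = Zyy = 0, Zyx = -Zxy

rho_xy, phi_xy = rho_phase(z_slice[0][1], freq)
print(round(rho_xy, 6), round(phi_xy, 6))         # 250.0 45.0
```

For a uniform half-space, the apparent resistivity equals the true resistivity and the phase is 45 degrees, which makes this a convenient sanity check for any restoration or conversion step.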

Now, let's evaluate the new QC ratio to verify how effective the recovery was:

r, = wx.qc(Z)
r
Out[13]: 1.0

Great! As we can see, the tensor is restored at each station with a 100% ratio, and the confidence line is now above 95% across the 20 investigation sites, compared with the previous plot (rate = 75%). The snippet below visualizes this improvement in the confidence interval:

wx.plot_confidence_in(Z)

Users can also flip through the following links for more examples about EM tensor restoration, skewness analysis plots, strike plots, data filtering, and more.

Citations

If you find the software useful in any published work, we would appreciate a citation of the paper below:

Kouadio, K.L., Liu, J., Liu, R., 2023. watex: machine learning research in water exploration. SoftwareX, 101367. https://doi.org/10.1016/j.softx.2023.101367

In most situations where WATex is cited, a citation to scikit-learn would also be appropriate.

See also some case history papers using WATex.

Contributions

  1. Department of Geophysics, School of Geosciences & Info-physics, Central South University, China.
  2. Hunan Key Laboratory of Nonferrous Resources and Geological Hazards Exploration, Changsha, Hunan, China.
  3. Laboratoire de Geologie Ressources Minerales et Energetiques, UFR des Sciences de la Terre et des Ressources Minières, Université Félix Houphouët-Boigny, Cote d'Ivoire.

Developer: L. Kouadio <etanoyau@gmail.com>
