A Python package for Source Apportionment with Sediment Fingerprinting.

PySASF

PySASF was developed to provide computational resources for research aimed at identifying the contributions of various sources to fluvial sediments. More specifically, PySASF implements methods for calculating the proportions contributed by each source from a dataset and its random subsamples, as well as analyzing solution variabilities. Additionally, it includes routines for visualizing confidence regions and other plots from the complete dataset and reduced samples.

This initiative originated from a collaboration between the Department of Soil Science and the Department of Mathematics at the Federal University of Santa Maria (UFSM), with participation from other educational and research institutions. The initial motivation was to reproduce the results published in Clarke and Minella (2016) and to create a package of Python routines to facilitate the replication of the experiment with other data sources.

PySASF was first used and tested by the Interdisciplinary Research Group on Erosion and Surface Hydrology (GIPEHS) at UFSM. New analysis models resulting from research and development efforts will be incorporated in the future as part of this academic collaboration.

Install

Download the package from here and install it using the following command line in the directory where the file was downloaded.

$ pip3 install pysasf-0.0.5.tar.gz

You can download a Python script for testing from here. You also need to download the data file from here and store it in a folder named data in the same directory as the script. Then you can run the script with the terminal command:

$ python3 cm.py

This script runs a command-line version of the usage example from the quick start notebook.

Alternatively, you can download the full project sources here, unzip them, and go to the notebooks directory. Open quick_star.ipynb using Jupyter Notebook or Jupyter Lab.

If you receive a No module named 'pysasf' error message, try including the following lines at the beginning of your notebook:

import sys
sys.path.append('/your_path_to/PySASF-main')

Replace your_path_to with the path to the directory where PySASF-main was extracted.

You will need NumPy, SciPy, Matplotlib and Pandas installed. All dependencies can be satisfied by an Anaconda installation.
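
Before running the examples, it can help to check which of these packages Python can actually find. The sketch below is not part of PySASF; it only uses the standard library, and the helper name missing_modules is hypothetical.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above, plus the package itself.
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "pysasf"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

If pysasf is the only name reported as missing, the sys.path workaround described above is the likely fix.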

Example of usage

1. Loading the data

A good starting point is to import the BasinData object class to store data from a basin's sediment sources. An instance of BasinData should be created, and the data should be loaded from a file. It is common to store data files in the 'data' directory one level above. The import and creation of an instance of BasinData are shown below.

# If you don't have PySASF installed, you need to add its directory to the path:
import sys
sys.path.append('/home/tiagoburiol/PySASF')
from pysasf.basindata import BasinData
arvorezinha = BasinData("../data/arvorezinha_database.xlsx")

Once the file is loaded, some information and statistics can be visualized, as shown in the following examples.

arvorezinha.infos()
Sample Sizes  Fe  Mn  Cu  Zn  Ca   K   P
C              9   9   9   9   9   9   9
E              9   9   9   9   9   9   9
L             20  20  20  20  20  20  20
Y             24  24  24  24  24  24  24

arvorezinha.means()
Means    Fe       Mn     Cu     Zn      Ca        K     P
C      6.21  1470.45  18.23  79.71  165.23  3885.12  0.03
E      6.76   811.95  23.28  86.02   76.10  3182.27  0.01
L      6.63  1854.05  20.05  88.28  159.17  6572.31  0.06
Y      6.16  1119.02  30.92  99.66  276.47  9445.76  0.07

arvorezinha.std()
STD      Fe      Mn     Cu     Zn     Ca        K     P
C      0.48  548.49   2.41   7.84  82.19  1598.45  0.01
E      0.98  399.90   1.98   6.96  26.21   948.95  0.01
L      1.07  399.77   3.86  15.70  79.33  2205.99  0.01
Y      1.01  294.13  10.13   8.40  79.37  2419.21  0.02
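
To make clear what these summaries represent, here is a minimal standard-library sketch that groups measurements by source and reports the sample size, mean and standard deviation of each group. The values are hypothetical, the helper summarise is not a PySASF function, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which definition BasinData.std() uses.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (source, Fe) measurements; not taken from the Arvorezinha file.
rows = [
    ("C", 6.0), ("C", 6.4), ("C", 6.2),
    ("E", 6.5), ("E", 7.1),
]


def summarise(rows):
    """Group values by source and return {source: (size, mean, sample std)}."""
    groups = defaultdict(list)
    for source, value in rows:
        groups[source].append(value)
    return {
        source: (len(values), mean(values), stdev(values))
        for source, values in groups.items()
    }


for source, (size, avg, std) in summarise(rows).items():
    print(f"{source}: n={size} mean={avg:.2f} std={std:.2f}")
```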

2. Using the clarkeminella module

We can easily reproduce the Clarke and Minella (2016) method for measuring the increase in uncertainty when sampling sediment fingerprinting. A full explanation of this method is available in the paper 'Evaluating sampling efficiency when estimating sediment source contributions to suspended sediment in rivers by fingerprinting.' DOI: 10.1002/hyp.10866. The steps required to achieve the same results described in the paper can be executed with a few function calls, as shown below.

First, we need to import the clarkeminella analysis module. We will refer to it as cm.

import pysasf.clarkeminella as cm

Now we will calculate and save to files all the possible combinations of proportions contributed by the sediment sources. The routine calculate_and_save_all_proportions() will create two files: one with the indexes of every possible combination of samples in the database, and another with the corresponding proportions. As the output below shows, a boolean array marking the feasible solutions is saved as well. The default method for calculation is ordinary least squares. Other methods can be chosen using arvorezinha.set_solver_option(option).

Set your output folder using arvorezinha.set_output_folder(path='/yourpath/folder'):

arvorezinha.set_output_folder('../output')
Setting output folder as: ../output
Folder to save output files is: '../output'.
arvorezinha.calculate_and_save_all_proportions(load=False)
Done! Time processing: 1.893726110458374
Total combinations: 38880 , shape of proportions: (38880, 3)
Saving combinations indexes in: ../output/C9E9L20Y24_combstxt
Saving proportions calculated in: ../output/C9E9L20Y24_propstxt
Feasebles boolean array is sabed in: ../output/C9E9L20Y24_feastxt
Time for save files: 0.2960786819458008
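
For intuition about what is solved for each combination, the sketch below estimates three proportions that sum to one by ordinary least squares, using one tracer vector per group. It is an illustration under stated assumptions, not PySASF's solver: it assumes C, E and L are the sources and Y is the sediment sample, the tracer values are hypothetical, and the function solve_proportions is not part of the package. Note that nothing here forces the proportions into the range 0 to 1, which is consistent with unconstrained solutions sometimes being negative.

```python
def solve_proportions(src_c, src_e, src_l, sediment):
    """Least-squares estimate of (P1, P2, P3) subject to P1 + P2 + P3 = 1.

    Substituting P3 = 1 - P1 - P2 into
        sediment = P1*src_c + P2*src_e + P3*src_l
    gives sediment - src_l = P1*(src_c - src_l) + P2*(src_e - src_l),
    a least-squares problem in two unknowns.
    """
    a = [c - l for c, l in zip(src_c, src_l)]
    b = [e - l for e, l in zip(src_e, src_l)]
    r = [y - l for y, l in zip(sediment, src_l)]
    # Entries of the 2x2 normal equations.
    aa = sum(x * x for x in a)
    ab = sum(x * z for x, z in zip(a, b))
    bb = sum(z * z for z in b)
    ar = sum(x * t for x, t in zip(a, r))
    br = sum(z * t for z, t in zip(b, r))
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("source vectors are collinear; no unique solution")
    # Cramer's rule for the 2x2 system.
    p1 = (ar * bb - ab * br) / det
    p2 = (aa * br - ab * ar) / det
    return p1, p2, 1.0 - p1 - p2


# Hypothetical tracer concentrations for one sample of each source.
src_c = [6.2, 14.7, 18.2]
src_e = [6.8, 8.1, 23.3]
src_l = [6.6, 18.5, 20.1]
# Synthetic sediment sample mixed with known proportions 0.5, 0.2 and 0.3.
sediment = [0.5 * c + 0.2 * e + 0.3 * l for c, e, l in zip(src_c, src_e, src_l)]

print(solve_proportions(src_c, src_e, src_l, sediment))
```

Because the synthetic sediment sample is an exact mixture, the estimate recovers the mixing proportions up to floating-point error.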

If you want to keep the proportion solutions and the combination indexes in memory, pass load=True (the default option) when calling the routine above. The proportion solutions and the combination indexes will then be stored in the BasinData object.

To read the files created and load the proportion solutions and the combination indexes, we can use the load_combs_and_props_from_files(combs_file, props_file) function. An example is shown below.

combs, Ps = arvorezinha.load_combs_and_props_from_files(arvorezinha.output_folder+'/C9E9L20Y24_combs.txt',
                                                        arvorezinha.output_folder+'/C9E9L20Y24_props.txt')
Loading combs and props files from: ../output

We can verify the loaded array data as follows:

display(combs, Ps)
array([[ 0,  0,  0,  0],
       [ 0,  0,  0,  1],
       [ 0,  0,  0,  2],
       ...,
       [ 8,  8, 19, 21],
       [ 8,  8, 19, 22],
       [ 8,  8, 19, 23]])



array([[ 0.445 , -0.2977,  0.8526],
       [ 0.3761,  0.128 ,  0.4959],
       [ 0.3454,  0.1248,  0.5298],
       ...,
       [ 0.4963, -0.0081,  0.5118],
       [ 0.4212, -0.6676,  1.2464],
       [-0.0679, -0.138 ,  1.206 ]])
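
The layout of the combs array above matches a Cartesian product of sample indexes, one index per group, with the last index varying fastest. The sketch below reproduces that ordering and the total count with itertools.product; it only illustrates the index structure and makes no claim about how PySASF builds the array internally.

```python
from itertools import product

# Sample sizes reported by arvorezinha.infos(): C=9, E=9, L=20, Y=24.
sizes = {"C": 9, "E": 9, "L": 20, "Y": 24}

# One index per group; the last index varies fastest.
combs = list(product(*(range(n) for n in sizes.values())))

print(len(combs))   # 38880
print(combs[:3])    # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2)]
print(combs[-1])    # (8, 8, 19, 23)
```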

Clarke and Minella's criterion for a feasible solution is that the proportions P1 and P2 contributed by the sources are each greater than 0 and less than 1. We can extract the feasible solutions using the cm_feasebles function of the clarkeminella analysis module, as shown below.

Pfea = cm.cm_feasebles(Ps)
print("The total number of feasible solution is:", len(Pfea))
The total number of feasible solution is: 8132
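
The criterion can be sketched in plain Python as a row filter. This is an illustration, not the body of cm_feasebles: the helper feasible_rows and its columns parameter are hypothetical, and checking only P1 and P2 follows the description above (whether the package also checks P3 is not stated). The rows are the six shown in the Ps array displayed earlier.

```python
# Rows copied from the Ps array displayed above: (P1, P2, P3).
Ps = [
    (0.4450, -0.2977, 0.8526),
    (0.3761, 0.1280, 0.4959),
    (0.3454, 0.1248, 0.5298),
    (0.4963, -0.0081, 0.5118),
    (0.4212, -0.6676, 1.2464),
    (-0.0679, -0.1380, 1.2060),
]


def feasible_rows(props, columns=(0, 1)):
    """Keep rows whose selected proportions lie strictly between 0 and 1."""
    return [row for row in props if all(0.0 < row[j] < 1.0 for j in columns)]


for row in feasible_rows(Ps):
    print(row)
```

Only the second and third rows pass; the others have a negative P1 or P2.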

A confidence region can be calculated in two dimensions by taking the 95% of points closest to the mean of the feasible proportions, with closeness measured by the Mahalanobis distance. A more detailed explanation can be found in Clarke and Minella's paper.

The stats module implements a function to compute a confidence region, as can be seen in the example below.

from pysasf import stats
Pcr = stats.confidence_region(Pfea[:,0:2], space_dist='mahalanobis')
print("The total number of points in 95% confidence region is:", len(Pcr))
The total number of points in 95% confidence region is: 7725
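
The idea can be sketched without any dependencies for the two-dimensional case. The function below is a simplified illustration, not the implementation of stats.confidence_region: it assumes the sample covariance matrix (divisor n - 1) and keeps int(fraction * n) points, which is consistent with 7725 points out of 8132 in the output above but is still an assumption.

```python
def confidence_region_2d(points, fraction=0.95):
    """Return the given fraction of 2-D points closest to the mean (Mahalanobis)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")

    def squared_distance(point):
        dx, dy = point[0] - mean_x, point[1] - mean_y
        # d^2 = (dx, dy) @ inverse(covariance) @ (dx, dy), written out for 2x2.
        return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    keep = int(fraction * n)
    return sorted(points, key=squared_distance)[:keep]


# Hypothetical points: four around the origin and one far away.
pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (10.0, 10.0)]
region = confidence_region_2d(pts, fraction=0.8)
print(len(region), (10.0, 10.0) in region)
```

In this small example the distant point has the largest Mahalanobis distance, so it is the one left out.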

Let's draw the confidence region using the draw_hull(pts) function from the plots module.

from pysasf import plots
plots.draw_hull(Pcr, title = 'Confidence region')
Please, set a path to save the convex hull figure.

[Figure: convex hull of the points in the 95% confidence region, titled 'Confidence region']
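
For readers unfamiliar with convex hulls, the sketch below computes one with Andrew's monotone chain algorithm in plain Python. It is only meant to show what a hull of a point cloud is; how plots.draw_hull obtains its hull is not described here, and the function convex_hull is not part of PySASF.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical points: a unit square with one interior point.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
print(convex_hull(square))
```

The interior point is discarded and only the four corners remain.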

To randomly take a subset of the solutions, with a sample size of 4 for source Y, for example, we can do as shown below.

from pysasf import stats
combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', 4)
print("Subset Ps of size:", Ps.shape[0])
Subset Ps of size: 6480
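
The size 6480 follows from the combination count: keeping 4 of the 24 Y samples leaves 9 × 9 × 20 × 4 combinations. The sketch below draws such a subsample of indexes with the standard library and counts the remaining combinations. It is an illustration only; the helper subsample_combinations is hypothetical, and how randon_props_subsamples selects or recomputes solutions internally is not described here.

```python
import random
from itertools import product

# Sample sizes reported by arvorezinha.infos(): C=9, E=9, L=20, Y=24.
sizes = {"C": 9, "E": 9, "L": 20, "Y": 24}


def subsample_combinations(sizes, key, n, seed=None):
    """Pick n random sample indexes for one group and list the combinations left."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(sizes[key]), n))
    index_sets = [
        chosen if name == key else range(size) for name, size in sizes.items()
    ]
    return chosen, list(product(*index_sets))


chosen, combs = subsample_combinations(sizes, "Y", 4, seed=0)
print(len(combs))  # 6480 = 9 * 9 * 20 * 4
```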

To make the plot of the points and the 95% confidence region and save it to a file, we proceed as follows:

P_cr = cm.cm_feasebles(Ps)
plots.draw_hull(P_cr, savefig = True, path=arvorezinha.output_folder,
                title = 'Confidence region 95% with Y size = 4')
Plot figure saved in: ../output/convex_hull.png

A figure will be saved in the output folder. If we want to create several plots with a sequence of reductions in the number of samples for a given source, we can proceed as follows.

for n in [2,4,8,12,16,20,24]:
    combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', n)
    P_feas = cm.cm_feasebles(Ps)
    P_cr = stats.confidence_region(P_feas,space_dist='mahalanobis2d')
    name = 'confidence_region_Y'+str(n)
    ax = plots.draw_hull(P_cr, savefig = True, 
                         path = arvorezinha.output_folder,filename = name)
    print('Saving figure named:', name)
    
Plot figure saved in: ../output/confidence_region_Y2.png
Saving figure named: confidence_region_Y2
Plot figure saved in: ../output/confidence_region_Y4.png
Saving figure named: confidence_region_Y4
Plot figure saved in: ../output/confidence_region_Y8.png
Saving figure named: confidence_region_Y8
Plot figure saved in: ../output/confidence_region_Y12.png
Saving figure named: confidence_region_Y12
Plot figure saved in: ../output/confidence_region_Y16.png
Saving figure named: confidence_region_Y16
Plot figure saved in: ../output/confidence_region_Y20.png
Saving figure named: confidence_region_Y20
Plot figure saved in: ../output/confidence_region_Y24.png
Saving figure named: confidence_region_Y24

3. Processing data from reductions and repetitions

Clarke and Minella's article presents tables and graphs of average values over 50 repetitions, taking subsamples of different sizes drawn from each sample set. A 95% confidence region is calculated for each sample reduction, and the proportions $P_1$ and $P_2$, along with their standard deviations, are calculated.

The full analysis can be reproduced and customized using the routine run_repetitions_and_reduction(basindata, source_key, list_of_reductions, repetitions=50). The results are saved in a CSV file and can be stored and loaded later. An example is shown below.

cm.run_repetitions_and_reduction(arvorezinha, 'L', [2, 4, 8, 12, 16, 20])
Time for all runs: 7.855192184448242
Saving in C9E9L20Y24_L-2-4-8-12-16-20.csv
   nSamp       CV    Mean     Std  Total  Feas    MeanP1    MeanP2    MeanP3
0      2  13.6022  0.3463  0.0471    162   859  0.371663  0.278888  0.349450
1      4   7.5992  0.3814  0.0290    324  1527  0.308342  0.235412  0.456241
2      8   4.0347  0.3928  0.0158    648  2821  0.369675  0.266656  0.363668
3     12   2.3799  0.4001  0.0095    972  4713  0.334568  0.230881  0.434550
4     16   1.2213  0.4010  0.0049   1296  6539  0.337595  0.243510  0.418894
5     20   0.0000  0.4024  0.0000   1620  8132  0.339917  0.245394  0.414688

cm.run_repetitions_and_reduction(arvorezinha, 'Y', [2, 4, 8, 12, 16, 20, 24])
Time for all runs: 8.775497436523438
Saving in C9E9L20Y24_Y-2-4-8-12-16-20-24.csv
   nSamp       CV    Mean     Std  Total  Feas    MeanP1    MeanP2    MeanP3
0      2  15.1352  0.3603  0.0545   3240   473  0.353225  0.244306  0.402471
1      4   8.1691  0.3817  0.0312   6480  2119  0.403431  0.203006  0.393560
2      8   3.5203  0.3949  0.0139  12960  3584  0.351959  0.223128  0.424913
3     12   2.2865  0.4029  0.0092  19440  3196  0.301662  0.236558  0.461779
4     16   1.9065  0.4004  0.0076  25920  5557  0.361002  0.251664  0.387333
5     20   1.0930  0.4022  0.0044  32400  6984  0.345001  0.251578  0.403419
6     24   0.0000  0.4024  0.0000  38880  8132  0.339917  0.245394  0.414688
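
In the tables above, the CV column appears to be the coefficient of variation in percent, that is, 100 × Std / Mean. This relation is inferred from the printed numbers rather than from the package source, so treat it as an assumption; the small differences come from the rounding of Mean and Std in the printout. A quick check on the first rows of the L table:

```python
def cv_percent(mean, std):
    """Coefficient of variation in percent: 100 * std / mean."""
    return 100.0 * std / mean


# (nSamp, Mean, Std, CV) copied from the first three rows of the L table above.
rows = [
    (2, 0.3463, 0.0471, 13.6022),
    (4, 0.3814, 0.0290, 7.5992),
    (8, 0.3928, 0.0158, 4.0347),
]

for n_samp, mean, std, cv_printed in rows:
    print(n_samp, round(cv_percent(mean, std), 2), cv_printed)
```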
from pysasf import plots
files = [arvorezinha.output_folder+'/'+'C9E9L20Y24_Y-2-4-8-12-16-20-24.csv',
         arvorezinha.output_folder+'/'+'C9E9L20Y24_L-2-4-8-12-16-20.csv']

plots.plot_cm_outputs(files, 'nSamp', 'CV', savefig=False)

[Figure: CV plotted against nSamp for the Y and L reductions]
