Skip to main content

Flexible sliding window analysis

Project description

Weighslide

Weighslide is a python program to calculate sliding windows across of a list of numerical values.
The user sets the window size, and the exact weighting of each value in the window.

What is it used for?

Weighslide was developed for use in bioinformatics. Alpha-helices are common protein secondary structure, and have a periodicity of 3.6 residues per turn. Weighslide allows numerical values to be weighted according to alpha-helical peridicity.

Note that weighslide is not currently optimised for large datasets.

Citation:

Please cite as follows:
"A sliding window analysis was performed using the weighslide package in python (Mark Teese, Technical University of Munich)."

Keywords:

sliding window, rolling window, weighted window, data normalisation, data normalization, 1D array, numerical list

How it works

Weighslide takes as an input a 1D array (list) of numerical data, and applies a user-defined weighting and algorithm in a sliding-window fashion across the data.

For example:  
window = [ 2  5  2 ]  
statistic = "mean"  
dataset = [ 0  0  0  1  1  2  3  5  8  13  21]  
The window length is 3. The array will therefore be sliced as follows:  
........[ NaN  0  0 ]  
.............[ 0  0  1 ]  
................[ 0  1  1 ]  
...................[ 1  1  2 ] and so on until the final slice [ 13  21  NaN ]  
Each array slice will be "weighted" by multiplication with the window, array-style, resulting in:  
........[ NaN  0  0 ]  
.............[ 0  0  2 ]  
................[ 0  5  2 ]  
...................[ 2  5  4 ] and so on.  
If the "statistic" variable is given as "mean", a mean will be calculated for each array slice.  
..........[    0    ]  
.............[   0.66  ]  
................[   2.33  ]  
...................[  3.66  ] and so on.  
The "statistic" can be mean, std, or sum.  
The value (in this case the mean) will replace the central position in the output 1D array.  
output = [ 0.00  0.00  0.66  2.33  3.66  6.00  9.66  15.6  25.3  41.0  65.5  ]  

The first and last array slices always contain flanking "not a number" (Nan) values, which are ignored in all calculations.
The first and last output values therefore do not represent results from true, full-length windows.

Installation

pip install weighslide

Weighslide depends on the following:

  • python (tested for version 3.5)
  • numpy
  • pandas
  • matplotlib

For Windows users, we recommend Anaconda python 3.x. The Anaconda package should contain all required python packages.

To install as a python package from GitHub, download and unzip the latest release and run python setup.py install in the relevant package folder.

Test

To test the package, open the command console and navigate to the folder
containing weighslide.py. Run the following:

python weighslide.py [1,1,"x",1,1] mean -r [1,1,2,3,5,8,13,21,34]

If successful, an output list will be printed on the screen.

Usage

Here is an example of how to run weighslide within python, using an excel input file.

import weighslide  
infile = r"D:\Path\To\Your\File\infile_name.xlsx"  
# for excel files, you will need to input the sheet name containing the data  
excel_kwargs = {"sheet_name":"Sheet1"}  
# if it's an excel file with multiple columns, define which column contains the data  
column = "your data column header"  
# define the window and statistic. The following parameters are used  
# if you want to calculate mean of the four surrounding values in the sequence  
window = [1,1,"x",1,1]  
statistic = "mean"  
name = "your short sample name"  
weighslide.run_weighslide(infile, window, statistic, name=name, column=column, excel_kwargs=excel_kwargs)  

Here is an example of how to run weighslide from the command line, using a csv input file.

python weighslide.py [1,1,'x',1,1] mean -i "D:\Path\To\Your\File\infile_name.xlsx" -c "your data column header"  

In both cases the output files will be created in a subfolder within the same location as the input file.

For more help regarding the command-line options:
python weighslide.py -h

Here is an example designed for jupyter notebook, showing the power of weighslide
to smoothen a repeated element in a noisy dataset.

# create a noisy wave that repeats every 6th position. Save to csv.  
import weighslide  
import numpy as np  
import pandas as pd  
import matplotlib.pyplot as plt  
% matplotlib inline  
plt.rcParams["savefig.dpi"] = 120  
df = pd.DataFrame()  
df['wave'] = [1,1,1,3,3,3]*8  
df["random"] = np.random.random_sample(df.shape[0])  
df["noisy wave"] = df.wave + df.random*5  
df.plot(title="input data: noisy wave")  
df.to_csv("wave.csv")  

Image of input

# run weighslide with a window that averages every 6th position  
window = "9xxxxx9xxxxx9xxxxx9xxxxx9xxxxx9xxxxx9"  
weighslide.run_weighslide("wave.csv", window, "mean", name="wavetest", column="noisy wave", overwrite=True)  

Image of output

Examples of windows:

[1,1,1]

  • if "statistic" is set to "mean", this window returns the average of the central position, and the two neighbouring positions
  • the window size is 3

[1,1,"x",1,1]

  • the central position "x" has no weighting at all
  • the window size is 5, it consists of the central position, two upstream, and two downstream positions
  • the positions upstream (-1, -2) and downstream (1, 2) of the central position are all equally weighted
  • if the statistic is set to "mean", the result for each position will simply be the average of the surrounding 4 positions

[0.5, 1, 0.5, 2, 0.5, 1, 0.5]

  • the central position "2" is highly weighted (2*orig value)
  • the window size is 7, it consists of the central position, three upstream, and three downstream positions
  • the positions upstream (-1, -2, -3) and downstream (1, 2, 3) of the central position are unequally weighted
  • if the statistic is set to "mean", the result for each position will simply be the average of the surrounding 4 positions

Contribute

If you encounter a bug or weighslide doesn't work for any reason, please send me an email (available in an image below, or at my TUM website) or initiate an issue in Github.
Pull requests are welcome.

License

Weighslide is free software distributed under the permissive MIT license.

Contact

contact details

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

weighslide-0.2.5.tar.gz (401.3 kB view details)

Uploaded Source

Built Distribution

weighslide-0.2.5-py3-none-any.whl (12.8 kB view details)

Uploaded Python 3

File details

Details for the file weighslide-0.2.5.tar.gz.

File metadata

  • Download URL: weighslide-0.2.5.tar.gz
  • Upload date:
  • Size: 401.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for weighslide-0.2.5.tar.gz
Algorithm Hash digest
SHA256 65357f9c8dcee482e3b4fddba3197aaf1257312a793c4941c6622d0bc8d5cdd6
MD5 283ff06ea925fa18afa183bc56147383
BLAKE2b-256 ece42f158c1767fff5dfcca5a71332b076a0eff1ef958f5cf08ec84c86acf65c

See more details on using hashes here.

File details

Details for the file weighslide-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: weighslide-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for weighslide-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 6ca01bb4dd2b702c644812db4f99776d07f39c4407b6f9713a196ce3c9640b11
MD5 255b699e774b9b3df7a84f6e67608311
BLAKE2b-256 afeca34358157f31638d1b2882ea0ba2ca0e1b757ed4eaa9b6016f9eed96207e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page