
A program that calculates the skew of two selectable nucleotides for a genome sequence in FASTA or GenBank format.


GenSkew is an application for computing and plotting nucleotide skew data.

For a more detailed description and the online version of GenSkew, see the website: genskew.csb.univie.ac.at

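GenSkew calculates the incremental and the cumulative skew of two selectable nucleotides for a given sequence according to the following formula, in which nucleotide1 and nucleotide2 stand for the number of occurrences of each selected nucleotide: Skew = (nucleotide1 - nucleotide2) / (nucleotide1 + nucleotide2)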
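The formula can be sketched in plain Python as follows. This is a minimal illustration and not GenSkew's own implementation: the function name windowed_skew, the way windows are stepped through the sequence, and the convention that a window containing neither nucleotide gets a skew of 0 are assumptions made for this example.

```python
def windowed_skew(sequence, nuc_1="G", nuc_2="C", stepsize=5, windowsize=5):
    """Return (x, skew, cumulative) lists for a sequence string.

    Assumptions for this sketch: every window starts at a multiple of
    stepsize and spans windowsize characters; a window that contains
    neither nucleotide is given a skew of 0.
    """
    sequence = sequence.upper()
    x, skew, cumulative = [], [], []
    running_total = 0.0
    for start in range(0, len(sequence) - windowsize + 1, stepsize):
        window = sequence[start:start + windowsize]
        count_1 = window.count(nuc_1)
        count_2 = window.count(nuc_2)
        total = count_1 + count_2
        # Skew = (nucleotide1 - nucleotide2) / (nucleotide1 + nucleotide2)
        value = (count_1 - count_2) / total if total else 0.0
        running_total += value
        x.append(start)
        skew.append(value)
        cumulative.append(running_total)
    return x, skew, cumulative


x, skew, cumulative = windowed_skew("GATCCTAGATTAAGC")
print(x, skew, cumulative)
```

The cumulative list is simply the running sum of the per-window values, which is what makes its global minimum and maximum easy to read off a plot.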

The results are provided as a data table and as a graphical plot. The global minimum and maximum are displayed in the cumulative graph. The minimum and maximum of a GC-skew can be used to predict the origin of replication (minimum) and the terminus location (maximum) in prokaryotic genomes.
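As a rough sketch of how the two positions can be read from skew values, the snippet below accumulates the per-window skew and returns the x positions of the global minimum and maximum. The helper name cumulative_extremes and the toy numbers are assumptions for illustration; they are not taken from GenSkew or from a real genome.

```python
from itertools import accumulate


def cumulative_extremes(x, skew):
    """Return the x positions of the minimum and maximum cumulative skew."""
    cumulative = list(accumulate(skew))
    indices = range(len(cumulative))
    min_index = min(indices, key=cumulative.__getitem__)
    max_index = max(indices, key=cumulative.__getitem__)
    return x[min_index], x[max_index]


# Toy window start positions and GC-skew values (made up for this example).
x = [0, 100, 200, 300, 400]
skew = [-0.2, -0.1, 0.3, 0.2, -0.4]

predicted_origin, predicted_terminus = cumulative_extremes(x, skew)
print(predicted_origin, predicted_terminus)
```

If several windows share the same extreme value, this sketch reports the first one.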

There are three versions of this program: Genskew_univiecube (the Python library described below), Genskew_cc (a command-line client) and GUIskew (the graphical version of GenSkew).

Installing the package from PyPI (for example with pip install Genskew-univiecube) installs all three of them. The graphical interface can be started with the command python -m GUIskew. Calling python3 -m genskew -h shows a detailed description of how to use the command-line interface, which can analyze multiple sequences in one command.

To use the library, first create an object for the sequence:

import genskew_univiecube as gs

sequence = "GATCCTAGATTAAGC"

name = gs.Object(sequence, "G", "C", stepsize, windowsize)

In this example the sequence is a string, the first nucleotide is G and the second is C. stepsize and windowsize stand for integer values and are optional; if they are omitted, they are calculated automatically to best fit the graph. This is useful when multiple sequences are processed one after another.

After the object is defined, the results have to be generated:

import genskew_univiecube as gs

sequence = gs.gen_sequence(filelocation)

name = gs.Object(sequence, "G", "C", stepsize, windowsize)

result = gs.Object.gen_results(name)

In this example the sequence is generated by calling gen_sequence, which takes a FASTA or GenBank file and returns the sequence as a string. The results can be retrieved as follows:

import genskew_univiecube as gs

sequence = gs.gen_sequence(filelocation)

name = gs.Object(sequence, "G", "C", stepsize, windowsize)

result = gs.Object.gen_results(name)

print(result.skew)

gs.plot_sequence(result, filelocation, outputfolder, output_filetype, dpi, skewi)

The result object provides several attributes: .skew (the skew as a list of y values), .x (the corresponding x values), .cumulative (the cumulative skew as y values), .max_cm_position and .min_cm_position (the x value of the maximum and minimum of the cumulative skew), .stepsize and .windowsize (integers), and .nuc_1 and .nuc_2 (the first and second nucleotide as strings).
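To show how these attributes fit together as a data table, the sketch below writes them out as CSV text. It does not call GenSkew: the result here is a stand-in object that only mirrors the attribute names listed above, its values are made up, and the helper result_to_csv is a hypothetical name for this example.

```python
import csv
import io
from types import SimpleNamespace

# Stand-in for a GenSkew result object; the values are made up.
result = SimpleNamespace(
    x=[0, 5, 10],
    skew=[-0.5, 1.0, 0.0],
    cumulative=[-0.5, 0.5, 0.5],
    nuc_1="G",
    nuc_2="C",
)


def result_to_csv(result):
    """Return the x, skew and cumulative values as CSV text, one row per window."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position", f"{result.nuc_1}{result.nuc_2}_skew", "cumulative"])
    writer.writerows(zip(result.x, result.skew, result.cumulative))
    return buffer.getvalue()


table = result_to_csv(result)
print(table)
```

With a real result from gen_results, the same function should work as long as the object exposes the attributes described above.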

plot_sequence plots and saves a graph of the skew. The arguments dpi, out_filetype and outputfolder are optional. The default output file type is png, and the default output folder is the folder that contains the sequence file (filelocation). If no dpi is given, it is calculated according to the size of the graph.

The function input_files(file_locations) checks the given paths for .fasta, .gb or .gz files and returns everything it finds in a list. file_locations has to be a list and can contain direct file paths or folders.
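The kind of collection step described here might look like the sketch below. It is not the source of input_files: the function name collect_sequence_files, the exact set of suffixes, and the choice to scan folders without descending into subfolders are all assumptions for this example.

```python
from pathlib import Path

# Suffixes assumed from the description above.
SEQUENCE_SUFFIXES = {".fasta", ".gb", ".gz"}


def collect_sequence_files(file_locations):
    """Return matching file paths from a list of files and folders."""
    found = []
    for location in file_locations:
        path = Path(location)
        if path.is_dir():
            # Folder: take every matching file directly inside it.
            found.extend(
                str(child)
                for child in sorted(path.iterdir())
                if child.is_file() and child.suffix.lower() in SEQUENCE_SUFFIXES
            )
        elif path.suffix.lower() in SEQUENCE_SUFFIXES:
            # Direct path: keep it if the suffix matches.
            found.append(str(path))
    return found


print(collect_sequence_files(["genome.fasta", "notes.txt", "plasmid.gb"]))
```

Direct paths are filtered by suffix only, so a path to a file that does not exist is passed through unchanged in this sketch.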

As of version 0.1.0, the SkewIT¹ index is available. It can be activated by setting the "skewi" parameter to "true".

Note that in this example only one sequence can be analyzed at a time.

Update log: Genskew-univiecube

Version 0.1.3

General bug fix

Version 0.1.2

Fixed the SkewIT graph sometimes not displaying the right nucleotides on the axis label

Version 0.1.1

General bug fix

Updated the README

Version 0.1.0

Added the SkewIT¹ algorithm to GenSkew

Additional parameter for plot_sequence

Version 0.0.9

First finished release of GenSkew

REFERENCES:

1: Jennifer Lu, Steven L. Salzberg. SkewIT: The Skew Index Test for large scale GC Skew analysis of bacterial genomes. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008439 (accessed 06.04.2022)
