A program that calculates the skew of two selectable nucleotides for a genome sequence in FASTA or GenBank format.

Project description

GenSkew is an application for computing and plotting nucleotide skew data.

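GenSkew calculates the incremental and the cumulative skew of two selectable nucleotides for a given sequence according to this formula: Skew = (nucleotide1 - nucleotide2) / (nucleotide1 + nucleotide2), where nucleotide1 and nucleotide2 are the numbers of occurrences of the two selected nucleotides in the analyzed window.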
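To make the formula concrete, here is a minimal sketch of an incremental skew over a sliding window. It is not taken from the GenSkew source: the function name windowed_skew, the choice to report 0.0 for windows that contain neither nucleotide, and the use of the window start as the x value are all assumptions made for illustration.

```python
def windowed_skew(sequence, nuc_1="G", nuc_2="C", stepsize=1, windowsize=5):
    """Return (x, skew) lists for a sliding window (illustrative helper)."""
    sequence = sequence.upper()
    x, skew = [], []
    for start in range(0, len(sequence) - windowsize + 1, stepsize):
        window = sequence[start:start + windowsize]
        n1 = window.count(nuc_1)
        n2 = window.count(nuc_2)
        total = n1 + n2
        # Assumption: a window without either nucleotide gets a skew of 0.0
        # so that the division by zero is avoided.
        value = (n1 - n2) / total if total else 0.0
        x.append(start)  # assumption: x is the window start position
        skew.append(value)
    return x, skew


# Three non-overlapping windows: "GATCC", "TAGAT", "TAAGC"
x, skew = windowed_skew("GATCCTAGATTAAGC", "G", "C", stepsize=5, windowsize=5)
print(x)
print(skew)
```

In the second window G occurs once and C not at all, so the skew there is (1 - 0) / (1 + 0) = 1.0; in the third window both occur once, giving 0.0.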

The results are provided as a data table and as a graphical plot. The global minimum and maximum are displayed in the cumulative graph. The minimum and maximum of the cumulative GC skew can be used to predict the origin of replication (minimum) and the terminus location (maximum) in prokaryotic genomes.
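The cumulative skew and the positions of its global minimum and maximum can be sketched as a running sum over the incremental skew values. This is only an illustration under stated assumptions: cumulative_extremes is a hypothetical helper, and the input values are made up for the example rather than taken from a real genome.

```python
from itertools import accumulate


def cumulative_extremes(x, skew):
    """Return the cumulative skew and the x positions of its min and max.

    Hypothetical helper for illustration, not GenSkew's own code.
    """
    cumulative = list(accumulate(skew))  # running sum of the skew values
    indices = range(len(cumulative))
    min_index = min(indices, key=cumulative.__getitem__)
    max_index = max(indices, key=cumulative.__getitem__)
    return cumulative, x[min_index], x[max_index]


# Made-up example values
x = [0, 5, 10, 15]
skew = [-0.5, -0.25, 0.5, 0.75]
cumulative, min_position, max_position = cumulative_extremes(x, skew)
print(cumulative)
print(min_position, max_position)
```

For a GC skew, min_position would be the candidate origin of replication and max_position the candidate terminus, as described above.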

There are three versions of this program: Genskew_univiecube (the Python library described below), Genskew_cc (a command-line client) and GUIskew (the graphical version of GenSkew).

Installing the package from PyPI installs all three of them. The graphical interface can be started with the command python -m GUIskew. Calling python3 -m genskew -h prints a detailed description of how to use the command-line interface, which can analyze multiple sequences in one command.

To use the library, first define the sequence as an object:

import genskew_univiecube as gs

sequence = "GATCCTAGATTAAGC"

name = gs.Object(sequence, "G", "C", stepsize, windowsize)

In this example the sequence is a string, the first nucleotide is G and the second is C. stepsize and windowsize do not have to be specified; if they are omitted, they are calculated automatically to best fit the graph. This is useful when multiple sequences are processed one after another.

After the object is defined, the results need to be generated:

import genskew_univiecube as gs

sequence = gs.gen_sequence(filelocation)

name = gs.Object(sequence, "G", "C", stepsize, windowsize)

result = gs.Object.gen_results(name)

In this example the sequence is generated by calling gen_sequence, which takes a FASTA or GenBank file and returns the sequence as a string. The results can be retrieved as follows:
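To illustrate what turning a FASTA file into a sequence string involves, here is a small standard-library sketch. It does not reproduce gen_sequence: the helper read_fasta_sequence is hypothetical, it handles only FASTA (not GenBank), and it assumes that only the first record of the file is wanted.

```python
import io


def read_fasta_sequence(handle):
    """Concatenate the sequence lines of the first FASTA record.

    Hypothetical helper for illustration; gen_sequence may behave differently.
    """
    parts = []
    seen_header = False
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header:
                break  # assumption: stop at the second record
            seen_header = True
            continue
        parts.append(line.upper())
    return "".join(parts)


# An in-memory stand-in for a FASTA file with two records
fasta_text = ">example record\nGATCC\ntagat\nTAAGC\n>second record\nAAAA\n"
sequence = read_fasta_sequence(io.StringIO(fasta_text))
print(sequence)
```

The same function works on a real file by passing an open file handle instead of the io.StringIO object.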

import genskew_univiecube as gs

sequence = gs.gen_sequence(filelocation)

name = gs.Object(sequence, "G", "C", stepsize, windowsize)

result = gs.Object.gen_results(name)

print(result.skew)

gs.plot_sequence(result, filelocation, outputfolder, output_filetype, dpi)

The result object provides several attributes: .skew (the skew as a list of y values), .x (the corresponding x values), .cumulative (the cumulative skew as y values), .max_cm_position and .min_cm_position (the x values of the cumulative maximum and minimum), .stepsize and .windowsize (integers), and .nuc_1 and .nuc_2 (the first and second nucleotide as strings).

plot_sequence plots and saves a graph of the skew. The arguments dpi, out_filetype and outputfolder are optional. The default output file type is png, and the default output folder is the folder that contains the sequence file (filelocation). By default, the dpi is calculated according to the size of the graph.

The function input_files(file_locations) checks the given paths for fasta, gb or .gz files and returns all of them in a list. file_locations has to be a list and can contain direct file paths or folders.
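The kind of path expansion described here can be sketched with pathlib. This is not the implementation of input_files: collect_sequence_files is a hypothetical name, the exact suffix set (.fasta, .gb, .gz) is an assumption based on the description, and folders are searched without recursion.

```python
from pathlib import Path
import tempfile

# Assumption: these suffixes stand for the "fasta, gb or .gz" files mentioned above.
SEQUENCE_SUFFIXES = {".fasta", ".gb", ".gz"}


def collect_sequence_files(file_locations):
    """Expand a list of files and folders into a flat list of sequence files.

    Hypothetical helper for illustration, not GenSkew's input_files.
    """
    found = []
    for location in file_locations:
        path = Path(location)
        if path.is_dir():
            # Assumption: only the top level of a folder is searched.
            candidates = sorted(p for p in path.iterdir() if p.is_file())
        else:
            candidates = [path]
        found.extend(
            str(p) for p in candidates if p.suffix.lower() in SEQUENCE_SUFFIXES
        )
    return found


# Demonstration with a temporary folder holding two sequence files and one other file
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.fasta", "b.gb", "notes.txt"):
        (Path(folder) / name).touch()
    names = [Path(p).name for p in collect_sequence_files([folder])]

print(names)
```

Each returned path could then be processed in turn, which is one way to handle several sequences with a library call that accepts one sequence at a time.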

Note that in this example only one sequence can be analyzed at a time.

Download files

Source Distribution

Genskew-univiecube-0.0.6.tar.gz (4.6 kB)

Built Distribution

Genskew_univiecube-0.0.6-py3-none-any.whl (4.7 kB, Python 3)
