A Python package for plotting GC skew from DNA sequences.

gcskewer

Create GC skew plots from DNA sequences in Python.

Installation

The easiest way to install gcskewer is through the Python Package Index.

pip install gcskewer

This will fetch and install the latest version from: https://pypi.org/project/gcskewer/

You can also install gcskewer by cloning this repository.

gcskewer requires Biopython (Bio), matplotlib and plotly. They should be installed automatically.

Usage

Input

gcskewer can take DNA sequences in .fasta or .gbk format. Specify the input with -f/--fasta or -g/--gbk. The two options cannot be combined: define your sequence only once! For example:

gcskewer -s -g example.gbk

or

gcskewer -s -f example.fasta
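
The "one input only" rule can be illustrated with a mutually exclusive argument group. This is a minimal sketch using Python's standard argparse module; it is not gcskewer's actual parser, and the function name build_parser and the program name gcskewer-sketch are hypothetical. Only the flag names are taken from the text above.

```python
import argparse


def build_parser():
    """Build an illustrative parser (hypothetical, not gcskewer's own code)."""
    parser = argparse.ArgumentParser(prog="gcskewer-sketch")

    # Exactly one of -f/--fasta or -g/--gbk must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--fasta", help="input sequence in FASTA format")
    source.add_argument("-g", "--gbk", help="input sequence in GenBank format")

    # One of the output switches described in the Output section.
    parser.add_argument("-s", "--svg", action="store_true", help="write an .svg plot")
    return parser


# Equivalent to the command line: -s -g example.gbk
args = build_parser().parse_args(["-s", "-g", "example.gbk"])
print(args.gbk, args.fasta, args.svg)
```

With a parser built this way, passing both -f and -g makes argparse print an error and exit instead of silently picking one input.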

Output

gcskewer has three output formats: .csv (a comma-separated table of the results), .svg (an editable vector-format graph) and .html (an interactive graph of the results). You can specify which outputs you want with -c/--csv, -s/--svg and -p/--plot (for the .html). If you are unsure, you can just specify all three:

gcskewer -g example.gbk -c -s -p

Window and Step Size

gcskewer will automatically decide the window and step size for the analysis; however, you can set these values yourself. For best results, I recommend using a step size that will result in around 1,000 steps: for a sequence of 50 kb, that is a step size of 50. Ensure that the window size is at least as large as the step size. You can set the window and step size with -ws/--window-size and -ss/--step-size, respectively. For example:

gcskewer -g example.gbk -ss 50 -ws 500
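
The step-size rule of thumb and the sliding-window calculation can be sketched in plain Python. This is an illustration under stated assumptions, not gcskewer's implementation: the function names suggest_step_size and gc_skew_windows are hypothetical, the skew uses the conventional definition (G - C) / (G + C), windows do not wrap around the end of the sequence, and each value is reported at the window midpoint.

```python
def suggest_step_size(sequence_length, target_steps=1000):
    """Return a step size giving roughly target_steps steps (assumed heuristic)."""
    return max(1, sequence_length // target_steps)


def gc_skew_windows(sequence, window_size, step_size):
    """Return (midpoint, skew) pairs for linear, non-wrapping sliding windows."""
    if window_size < step_size:
        raise ValueError("window size should be at least the step size")

    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window_size + 1, step_size):
        window = seq[start:start + window_size]
        g = window.count("G")
        c = window.count("C")
        # Conventional GC skew; defined as 0.0 here when the window has no G or C.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        midpoint = start + window_size / 2
        results.append((midpoint, skew))
    return results


print(suggest_step_size(50_000))          # step size for a 50 kb sequence
print(gc_skew_windows("GGGGCCCC", 4, 4))  # one G-only and one C-only window
```

Using a window larger than the step, as in -ss 50 -ws 500, makes consecutive windows overlap, which smooths the resulting curve.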

Example Data

Example data and output are provided in the example_data directory in this repository. There are two subdirectories, fasta and genbank, to illustrate how gcskewer operates on different input types. Each directory contains the .csv, .svg and .html output, and the command used to generate the data is stored as command.bash.

This script was originally inspired by Nivina et al.'s paper, "GRINS: Genetic elements that recode assembly-line polyketide synthases and accelerate their diversification". As such, I used the tylactone polyketide synthase as a test case. The sequence was obtained from MIBiG.

Figure: gcskewer example output (SVG).

Citation

If you use gcskewer, please cite:

Gomez-Escribano, J. P., Dorai-Raj, S., Baker, D., Lacey, E., Wilkinson, B. and Booth, T. J. Evidence supporting the first secondary chromosome in actinobacteria as a hallmark of the Embleya genus. bioRxiv (2025). DOI: https://doi.org/10.1101/2025.07.03.662523

Versions

  • 1.1.2
    • added the option to write the output to a specific directory with -d or --dir
    • organised arguments into argument groups
    • added matplotlib to setup.py
  • 1.1.1
    • fixed error in midpoint calculation
  • 1.1.0
    • now also plots overall GC content
    • frame plot data is now recorded as a class instead of depending on pandas, which significantly improves runtime
    • better naming of internal variables and functions
    • removed erroneous placeholder text from parser and added example usage
  • 1.0.0
    • initial release
