The Dinucleotide Quantification Python package

DinuQ

The DinuQ (Dinucleotide Quantification) Python3 package provides a range of metrics for quantifying nucleotide representation, dinucleotide representation and synonymous codon usage in a DNA/RNA sequence. These include the recently developed corrected Synonymous Dinucleotide Usage (SDUc) and corrected Relative Synonymous Dinucleotide Usage (RSDUc).

Version 1.1 of DinuQ introduces the new corrected SDU metric (SDUc) along with some other related modules.

Proper documentation and web-based version under construction!

If the preliminary documentation below is unclear, please don't hesitate to contact me!

Usage

Package installation

Using pip, in a Unix terminal do: pip install dinuq

Then in python do: import dinuq

Modules

Important: RSDUc and SDUc can only be calculated for coding sequences! Make sure that your fasta file doesn't contain any non-coding sequences.
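A quick way to check a fasta file before handing it to the SDUc or RSDUc modules is sketched below. This is not part of dinuq: the helper names read_fasta and check_cds are hypothetical, and the checks (unique headers, length divisible by three, no internal stop codons under the standard genetic code) are assumptions about what a clean coding sequence looks like.

```python
import io

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def read_fasta(handle):
    """Parse FASTA text from a file-like object into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Treat RNA and DNA alike by mapping U to T.
            chunks.append(line.upper().replace("U", "T"))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_cds(records):
    """Return (header, problem) tuples; an empty list means no problems were found."""
    problems = []
    seen = set()
    for header, seq in records:
        if header in seen:
            problems.append((header, "duplicate header"))
        seen.add(header)
        if len(seq) % 3 != 0:
            problems.append((header, "length not a multiple of 3"))
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        # A stop codon is only tolerated as the very last codon.
        if any(codon in STOP_CODONS for codon in codons[:-1]):
            problems.append((header, "internal stop codon"))
    return problems


# Made-up records: acc1 is clean, acc2 has an internal stop, acc3 is truncated.
example = io.StringIO(">acc1\nATGGCTTAA\n>acc2\nATGTAAGCT\n>acc3\nATGGC\n")
problems = check_cds(read_fasta(example))
```

To check a real file, pass an open file handle to read_fasta instead of the StringIO object.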

dinuq.SDUc()

The SDUc module will calculate the corrected Synonymous Dinucleotide Usage for all sequences in a given fasta file.

Arguments

Required arguments:

A fasta file with:
- any number of coding sequences (no internal stop codons)
- a different, preferably short, fasta header for each sequence (e.g. an accession)
A list of dinucleotides of interest (this must be a list even for a single dinucleotide, e.g. ['CpG'])

Optional arguments:

A list of dinucleotide frame positions. By default the module will calculate the SDUc for all coding positions (pos1, pos2, bridge), for each specified dinucleotide.
If you want to calculate error intervals for the SDUc values, you can specify a number of iterations for the error measuring method (suggested value between 100 and 1000). Note that this will significantly slow down the calculation.
Finally, you can specify custom single nucleotide compositions on which to base the expected dinucleotide usage for each accession in your fasta file (e.g. if you want to calculate SDUc values for all CDS in a genome, given the nt composition of the entire genome, instead of the nt composition of each CDS). This should be provided as a dictionary like so: {'acc1': {'G': fG, 'C': fC, 'A': fA, 'T': fT}, 'acc2': ...}
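As an illustration of the dictionary layout described above, the sketch below builds a custom_nt dictionary that assigns the same genome-wide composition to every accession. The helper nt_frequencies, the toy genome string and the accession names are all hypothetical, and treating fG, fC, fA and fT as fractions that sum to 1 is an assumption.

```python
from collections import Counter


def nt_frequencies(sequence):
    """Return the fraction of G, C, A and T in a sequence (U is counted as T)."""
    seq = sequence.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "GCAT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no G, C, A or T")
    return {base: counts[base] / total for base in "GCAT"}


# Hypothetical inputs: a genome sequence and the accessions in the CDS fasta file.
genome = "ATGCGCGCTTAA"
accessions = ["acc1", "acc2"]

genome_freqs = nt_frequencies(genome)
# Every accession gets its own copy of the same genome-wide composition.
custom_nt = {acc: dict(genome_freqs) for acc in accessions}
```

The resulting dictionary has the documented shape and could then be passed through the custom_nt argument.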

sduc = dinuq.SDUc(fasta_file, dinucl, position = ['pos1', 'pos2', 'bridge'], boots = 'none', custom_nt = 'none')

fasta_file #required
dinucl = ['CpC', 'CpG', 'CpU', 'CpA', 'GpC', 'GpG', 'GpU', 'GpA', 'UpC', 'UpG', 'UpU', 'UpA', 'ApC', 'ApG', 'ApU', 'ApA'] #required
position = ['pos1', 'pos2', 'bridge'] #default is all three positions
boots = integer #default is none
custom_nt = {'acc1': {'G': fG, 'C': fC, 'A': fA, 'T': fT}, 'acc2': ...} #default is none

Output

The output of the module is a dictionary with accessions as keys and inner dictionaries as values. Each inner dictionary has the dinucleotide positions as keys (e.g. CpGbridge) and a list holding the calculated SDUc value as the value. If error margins are being calculated, the list also holds an inner list of the SDUc values calculated for each random sampling (the number of samplings is set by the boots argument).

sduc = {'accession': {'dinucleotideposition': [sdu_value, [bootstrap_value1, bootstrap_value2, bootstrap_valuen]]}}
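The nested layout above can be walked with two loops. The sketch below uses made-up numbers, not real dinuq results; the helper name flatten is hypothetical, and the key CpGpos1 is assumed by analogy with the CpGbridge example.

```python
# Made-up values in the documented output layout; not real dinuq results.
sduc = {
    "acc1": {
        "CpGpos1": [0.50, [0.40, 0.50, 0.60]],
        "CpGbridge": [1.20, [1.10, 1.30]],
    },
    "acc2": {
        "CpGpos1": [0.80],  # assumed: no inner list when no error margins are calculated
    },
}


def flatten(result):
    """Yield (accession, dinucleotide position, value, bootstrap list or None)."""
    for accession, by_position in result.items():
        for dinuc_position, values in by_position.items():
            point = values[0]
            boots = values[1] if len(values) > 1 else None
            yield accession, dinuc_position, point, boots


rows = list(flatten(sduc))
```

Each element of rows is one table row, which is convenient for plotting or for loading into a data frame.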

dinuq.RSDUc()

The RSDUc module will calculate the corrected Relative Synonymous Dinucleotide Usage for all sequences in a given fasta file.

Arguments

The arguments are the same as those for the SDUc module.

rsduc = dinuq.RSDUc(fasta_file, dinucl, position = ['pos1', 'pos2', 'bridge'], boots = 'none', custom_nt = 'none')

fasta_file #required
dinucl = ['CpC', 'CpG', 'CpU', 'CpA', 'GpC', 'GpG', 'GpU', 'GpA', 'UpC', 'UpG', 'UpU', 'UpA', 'ApC', 'ApG', 'ApU', 'ApA'] #required
position = ['pos1', 'pos2', 'bridge'] #default is all three positions
boots = integer #default is none
custom_nt = {'acc1': {'G': fG, 'C': fC, 'A': fA, 'T': fT}, 'acc2': ...} #default is none

Output

The output format is the same as in the SDUc module.

rsduc = {'accession': {'dinucleotideposition': [rsdu_value, [bootstrap_value1, bootstrap_value2, bootstrap_valuen]]}}

dinuq.dict_to_tsv()

This module creates a tsv file in your working directory with the sduc or rsduc dictionary information in a table format. If the error distribution has been calculated, the user can choose how to summarise it (STDEV, SEM, MIN-MAX).

Arguments

Required arguments:

An sduc or rsduc dictionary produced by the SDUc or RSDUc module respectively
A name for the output tsv file

Optional arguments:

A summary of the error distribution (provided that it has been calculated by the SDUc/RSDUc module). This can be:
- the minimum and maximum value of the distribution (extrema)
- the standard deviation margins around the error distribution's mean (stdev)
- the standard error of the mean margins around the mean (sem)

dinuq.dict_to_tsv(dictionary, output_file, error = 'none')

dictionary = sduc or rsduc #required
output_file #required
error = 'none' #default
- 'extrema' #minimum and maximum of simulated distribution
- 'stdev' #mean plus/minus the distribution's standard deviation
- 'sem' #mean plus/minus the distribution's standard error of the mean
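To make the three error summaries concrete, the sketch below computes each of them for a list of bootstrap values. It is an illustration only: the function name summarise is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the documentation does not say which estimator dinuq uses.

```python
import math
import statistics


def summarise(samples, error):
    """Return a (lower, upper) pair for a list of bootstrap values."""
    if error == "extrema":
        return min(samples), max(samples)
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (assumption)
    if error == "stdev":
        return mean - sd, mean + sd
    if error == "sem":
        sem = sd / math.sqrt(len(samples))
        return mean - sem, mean + sem
    raise ValueError(f"unknown error summary: {error!r}")


boot_values = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up bootstrap values
extrema = summarise(boot_values, "extrema")
stdev_margin = summarise(boot_values, "stdev")
sem_margin = summarise(boot_values, "sem")
```

The extrema summary always gives the widest interval of the three, and the sem interval narrows as the number of iterations grows.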

dinuq.RDA()

The RDA module will calculate the Relative Dinucleotide Abundance for all sequences in a given fasta file, either for the entire sequence or specific dinucleotide frame positions.

Arguments

Required arguments:

A fasta file with:
- any number of coding sequences (no internal stop codons)
- a different, preferably short, fasta header for each sequence (e.g. an accession)
A list of dinucleotides of interest (this must be a list even for a single dinucleotide, e.g. ['CpG'])

Optional arguments:

A list of dinucleotide frame positions. By default the module will calculate the RDA for the entire sequence (no frame position separation).

rda = dinuq.RDA(fasta_file, dinucl, position = ['all'])

fasta_file #required
dinucl = ['CpC', 'CpG', 'CpU', 'CpA', 'GpC', 'GpG', 'GpU', 'GpA', 'UpC', 'UpG', 'UpU', 'UpA', 'ApC', 'ApG', 'ApU', 'ApA'] #required
position = ['pos1', 'pos2', 'bridge', 'all'] #default is all

Output

rda = {'accession': {'dinucleotideposition': [rda_value]}}
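For orientation, a dinucleotide abundance ratio is commonly defined as the observed dinucleotide frequency divided by the product of the two single nucleotide frequencies. The sketch below implements that common definition for one whole sequence. It is not taken from the dinuq source, so dinuq's exact calculation and its handling of frame positions may differ; the function name is hypothetical, and it returns a plain {dinucleotide: value} mapping rather than the nested dictionary shown above.

```python
from collections import Counter


def relative_dinucleotide_abundance(sequence, dinucl):
    """Observed/expected ratio f_xy / (f_x * f_y) over the whole sequence."""
    seq = sequence.upper().replace("T", "U")  # dinucleotide names use U
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    nt_counts = Counter(seq)
    # Count all overlapping dinucleotides.
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_nt = len(seq)
    n_pairs = len(seq) - 1
    result = {}
    for name in dinucl:
        first, second = name.split("p")  # 'CpG' -> 'C', 'G'
        expected = (nt_counts[first] / n_nt) * (nt_counts[second] / n_nt)
        observed = pair_counts[first + second] / n_pairs
        # Leave the value undefined when one of the nucleotides is absent.
        result[name] = observed / expected if expected > 0 else float("nan")
    return result


rda_example = relative_dinucleotide_abundance("ACGCGT", ["CpG", "GpC"])
```

Values above 1 indicate a dinucleotide that occurs more often than its nucleotide composition predicts, values below 1 one that occurs less often.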

dinuq.RDA_to_tsv()

This module creates a tsv file in your working directory with the rda dictionary information in a table format.

Arguments

Required arguments:

An rda dictionary produced by the RDA module
A name for the output tsv file

dinuq.RDA_to_tsv(dictionary, output_file)

dictionary = rda #required

output_file #required

dinuq.RSCU()

The RSCU module will calculate the Relative Synonymous Codon Usage for all sequences in a given fasta file.

Arguments

Required arguments:

A fasta file with:
- any number of coding sequences (no internal stop codons)
- a different, preferably short, fasta header for each sequence (e.g. an accession)

rscu = dinuq.RSCU(fasta_file)

fasta_file #required

Output

The output of the module is a dictionary of accessions as keys and inner dictionaries as values. The inner dictionaries have each codon as keys and the calculated RSCU value as the value.

rscu = {'accession': {'codon': rscu_value}}
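As a reminder of what the metric measures, the sketch below computes RSCU in its textbook form: each codon's count divided by the mean count of all codons that encode the same amino acid. This illustrates the general definition under the standard genetic code and is not dinuq's own code; the function name rscu, the T-based codon keys and the choice to skip stop codons and absent amino acids are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid ('*' marks stop codons).
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(sequence):
    """Codon count divided by the mean count of its synonymous codons."""
    seq = sequence.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*":
            continue  # stop codons are skipped in this sketch
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values


rscu_example = rscu("GCTGCTGCC")  # three alanine codons: GCT, GCT, GCC
```

With equal usage of all synonymous codons every value would be 1, so values above 1 mark preferred codons.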

dinuq.RSCU_to_tsv()

This module creates a tsv file in your working directory with the rscu dictionary information in a table format.

Arguments

Required arguments:

An rscu dictionary produced by the RSCU module
A name for the output tsv file

dinuq.RSCU_to_tsv(dictionary, output_file)

dictionary = rscu #required

output_file #required
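If a different table layout is needed, the rscu dictionary can also be written out by hand. The sketch below is one possible layout (one row per accession, one column per codon) and is not necessarily the layout that dinuq.RSCU_to_tsv produces; the helper name rscu_to_rows, the "acc" column header and the example values are hypothetical.

```python
import csv
import io


def rscu_to_rows(rscu):
    """Turn {'accession': {'codon': value}} into a header row plus one row per accession."""
    codons = sorted({codon for values in rscu.values() for codon in values})
    rows = [["acc"] + codons]
    for accession, values in rscu.items():
        # Codons missing for an accession are left as empty cells.
        rows.append([accession] + [values.get(codon, "") for codon in codons])
    return rows


# Made-up values in the documented layout; not real dinuq results.
rscu_demo = {"acc1": {"GCT": 2.0, "GCC": 0.0}, "acc2": {"GCT": 1.0, "GCC": 1.0}}

buffer = io.StringIO()  # swap for open("out.tsv", "w", newline="") to write a file
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rscu_to_rows(rscu_demo))
tsv_text = buffer.getvalue()
```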

dinuq.ntcont()

The ntcont module will simply calculate the single nucleotide composition of all sequences in a fasta file. The only argument required is the name of the fasta file.

nt = dinuq.ntcont(fasta_file)

Citations:

Release history

1.2.0 (Mar 12, 2022)
1.1.1 (Oct 28, 2020), this version
1.1.0 (Oct 28, 2020)
1.0.1 (Mar 1, 2020)
1.0.0 (Feb 27, 2020)

Download files

Source Distribution

dinuq-1.1.1.tar.gz (14.2 kB), uploaded Oct 28, 2020

Built Distribution

dinuq-1.1.1-py3-none-any.whl (24.8 kB), uploaded Oct 28, 2020, Python 3

Hashes for dinuq-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3bf6d22d44374b3d23bf8434e8688bad141e8a195e08092d331ae32253fe58ad`
MD5	`06a30d6a6a9797c9737fe9741d86110b`
BLAKE2b-256	`31eface0c74505e8f2c1f348461fb1570f50e5c0343cde41251749b360a26a1a`

Hashes for dinuq-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca471769368ce950246525b0ca57cbe0ba16120c8716346a0e8899f254d3e61d`
MD5	`15632d9c1105a4ae2e191f1a51848203`
BLAKE2b-256	`70d44c67b8bdd2a494c6dae40d692547569907bce60d62dc73c90ebe79802419`