smallBixTools

A few small methods for bioinformatics

Project description

# smallBixTools a few small functions for bioinformatics

See readme for full details.

Repo location:

https://bitbucket.org/hivdiversity/small_bix_tools

Docs: https://small-bix-tools.readthedocs.io/en/latest/

List of functions: (INCOMPLETE)

get_regions_from_panel

Slices regions out of a FASTA-formatted file, joins them together, and writes the resulting FASTA file to the given location. An example call might be: get_regions_from_panel("test.fasta", [[0, 10], [20, 30]], "/tmp", "outfile.fasta"), which would, for each sequence in the input file "test.fasta", take the region from 0 to 10 joined with the region from 20 to 30, and write the result to the file "/tmp/outfile.fasta".
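The slice-and-join step described above can be sketched on plain strings. This is only an illustration, not the package's code: the helper name slice_and_join is hypothetical, and the assumption that regions are half-open, Python-style intervals is mine; the description does not say whether the end position is inclusive.

```python
def slice_and_join(sequence, regions):
    """Join the slices of `sequence` given by (start, end) pairs.

    Assumption: each region is half-open, [start, end), as in Python slicing.
    """
    return "".join(sequence[start:end] for start, end in regions)


# A toy panel held in memory instead of a FASTA file.
panel = {"seq1": "ACGTACGTACGTACGTACGTACGTACGTACGTAC"}
regions = [[0, 10], [20, 30]]

# Apply the same regions to every sequence in the panel.
joined = {seq_id: slice_and_join(seq, regions) for seq_id, seq in panel.items()}
print(joined["seq1"])
```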

find_ranges

Find contiguous ranges in a list of numerical values. E.g., with data = [1, 2, 3, 4, 8, 9, 10], find_ranges(data) will return: [[1, 2, 3, 4], [8, 9, 10]]
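One common way to get this behaviour is to group values by the difference between each value and its index, which is constant within a run of consecutive integers. The sketch below uses a hypothetical name, find_ranges_sketch, and assumes the input is a sorted list of integers; it is not taken from the package source.

```python
from itertools import groupby


def find_ranges_sketch(values):
    """Split a sorted list of integers into runs of consecutive values."""
    ranges = []
    # Within a run of consecutive integers, value - index stays constant,
    # so groupby on that difference yields one group per run.
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        ranges.append([value for _, value in group])
    return ranges


data = [1, 2, 3, 4, 8, 9, 10]
print(find_ranges_sketch(data))  # [[1, 2, 3, 4], [8, 9, 10]]
```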

hamdist

Use this after aligning sequences. It counts the number of positions at which the equal-length strings str1 and str2 differ. The order of the input sequences does not matter.
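A Hamming distance over two aligned strings can be sketched as follows. The name hamming_distance and the choice to raise ValueError on unequal lengths are assumptions for illustration; the package's own handling of unequal lengths is not documented here.

```python
def hamming_distance(str1, str2):
    """Count the positions at which two equal-length strings differ."""
    if len(str1) != len(str2):
        # Assumption: unequal lengths are treated as an error.
        raise ValueError("sequences must be the same length; align them first")
    return sum(a != b for a, b in zip(str1, str2))


# Gap characters count as ordinary characters in this sketch.
print(hamming_distance("ACGT-A", "ACCTTA"))  # 2
```

Because the comparison is position by position, swapping the two arguments gives the same result.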

fasta_to_dct

Returns a dictionary of the contents of the file name given. The dictionary is in the format: {sequence_id: sequence_string, id_2: sequence_2, etc.}

dct_to_fasta

param d:: dictionary in the form: {sequence_id: sequence_string, id_2: sequence_2, etc.}
param fn:: The file name to write the fasta formatted file to.
return:: Returns True if successfully wrote to file.
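The dictionary format shared by fasta_to_dct and dct_to_fasta can be illustrated with a small round trip. The functions below are stand-ins with hypothetical names (write_fasta, read_fasta) written in plain Python; they only mirror the documented inputs and outputs and make no claim about how the package implements them.

```python
import os
import tempfile


def write_fasta(records, file_name):
    """Write {sequence_id: sequence_string} to a FASTA file; return True on success."""
    with open(file_name, "w") as handle:
        for seq_id, sequence in records.items():
            handle.write(f">{seq_id}\n{sequence}\n")
    return True


def read_fasta(file_name):
    """Read a FASTA file into {sequence_id: sequence_string}."""
    records = {}
    seq_id = None
    with open(file_name) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                seq_id = line[1:]
                records[seq_id] = ""
            elif seq_id is not None:
                # Sequences may span several lines; concatenate them.
                records[seq_id] += line
    return records


records = {"seq_1": "ACGT", "seq_2": "AC-T"}
path = os.path.join(tempfile.mkdtemp(), "example.fasta")
write_fasta(records, path)
round_tripped = read_fasta(path)
print(round_tripped)
```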

find_duplicate_ids

customdist

hyphen_to_underscore_fasta

auto_duplicate_removal

Attempts to automatically remove duplicate sequences from the specified file and writes the results to the specified output file. Uses BioPython SeqIO to parse the input file. Replaces spaces in the sequence id with underscores. Iterates over all sequences found, checking for each one whether its key already exists in an accumulating collection; if it does, checks whether the two sequences are the same. If they have the same key and the same sequence, the second instance encountered is kept. Once the file has been parsed, writes the sequences found to the specified output file. Will raise an exception if an error occurs during execution.

build_cons_seq

See: https://www.biostars.org/p/14026/

own_cons_maker

split_file_into_timepoints

size_selector

py2_fasta_iter

From Brent Pedersen: https://www.biostars.org/p/710/#1412. Given a FASTA file, yields tuples of (header, sequence).

py3_fasta_iter

Modified from Brent Pedersen: https://www.biostars.org/p/710/#1412. Given a FASTA file, yields tuples of (header, sequence).
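A generator of (header, sequence) tuples can be written for Python 3 with itertools.groupby, alternating between header lines and sequence lines. The sketch below is an illustration under assumptions: the name fasta_iter_sketch is hypothetical, it takes an open handle rather than a file name, and it expects well-formed input in which every header is followed by at least one sequence line.

```python
from io import StringIO
from itertools import groupby


def fasta_iter_sketch(handle):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    # groupby alternates between groups of header lines and groups of sequence lines.
    groups = (group for _, group in groupby(handle, key=lambda line: line.startswith(">")))
    for header_group in groups:
        # Drop the leading ">" and surrounding whitespace from the header.
        header = next(header_group)[1:].strip()
        # The next group holds the (possibly multi-line) sequence for this header.
        sequence = "".join(line.strip() for line in next(groups))
        yield header, sequence


example = StringIO(">s1\nACGT\nAC\n>s2\nTT\n")
records = list(fasta_iter_sketch(example))
print(records)
```

Yielding records one at a time keeps memory use low for large files, at the cost of a single pass over the input.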

convert_count_to_frequency_on_fasta

When running vsearch as follows:

    vsearch --cluster_fast {} --id 0.97 --sizeout --centroids {}

we get a centroids.fasta file with seqid header lines like:

    >ATTCCGGTATCT_9;size=1432;
    >CATCATCGTAAG_14;size=1;

etc. This method converts those count values into frequencies.

Notes: The delimiter between sections in the sequence id must be ";". There must be a section in the sequence id of exactly the form "size=x", where x is an integer. This must be surrounded by ";" characters.
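The count-to-frequency conversion can be sketched on the header strings alone. The names SIZE_PATTERN and size_from_header are hypothetical, and returning a dictionary of frequencies is an assumption made for illustration; how the package writes the frequencies back into the sequence ids is not specified here.

```python
import re

# Matches a ";size=<integer>;" section, as required by the notes above.
SIZE_PATTERN = re.compile(r";size=(\d+);")


def size_from_header(header):
    """Extract the integer count from a vsearch-style header."""
    match = SIZE_PATTERN.search(header)
    if match is None:
        raise ValueError(f"no ';size=x;' section found in: {header}")
    return int(match.group(1))


headers = ["ATTCCGGTATCT_9;size=1432;", "CATCATCGTAAG_14;size=1;"]
counts = {header: size_from_header(header) for header in headers}
total = sum(counts.values())

# Each frequency is the sequence's count divided by the total count.
frequencies = {header: count / total for header, count in counts.items()}
for header, frequency in frequencies.items():
    print(header, frequency)
```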

countNinPrimer

Motifbinner2 requires values to be specified for primer id length and primer length. It's tiresome to have to calculate this for many strings, so I wrote this to help myself. An example of a primer sequence might be: NNNNNNNAAGGGCCAAAGGAACCCTTTAGAGACTATG. We would like to know how many N's there are, how many other characters there are, and what the combined total length is.
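The three numbers described above follow directly from counting characters. The helper below, count_primer_parts, is a hypothetical name used for illustration; the return shape (a tuple) is an assumption and may differ from what countNinPrimer returns.

```python
def count_primer_parts(primer):
    """Return (number of N's, number of other characters, total length)."""
    n_count = primer.count("N")
    other_count = len(primer) - n_count
    return n_count, other_count, len(primer)


primer = "NNNNNNNAAGGGCCAAAGGAACCCTTTAGAGACTATG"
n_count, other_count, total = count_primer_parts(primer)
print(n_count, other_count, total)
```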

compare_fasta_files

Compares two fasta files, to see if they contain the same data. The sequences must be named the same. We check if sequence A from file 1 is the same as sequence A from file 2. The order in the files does not matter. Gaps are considered.
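The comparison rules above (match by name, ignore order, treat gaps as significant) can be sketched on two {sequence_id: sequence_string} dictionaries. The name compare_records and the choice to return a list of differing ids are assumptions for illustration only.

```python
def compare_records(records_1, records_2):
    """Return the sorted ids that are missing from one side or differ in sequence."""
    all_ids = set(records_1) | set(records_2)
    # A missing id yields None from .get(), which never equals a sequence string.
    return sorted(i for i in all_ids if records_1.get(i) != records_2.get(i))


file_1 = {"A": "ACGT", "B": "AC-T"}
file_2 = {"B": "AC-T", "A": "ACGT"}  # same data, different order
file_3 = {"A": "ACGT", "B": "ACT"}   # gap removed from B

same = compare_records(file_1, file_2)
different = compare_records(file_1, file_3)
print(same, different)
```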

unmake_hash_of_seqids

When calling mafft, sequence ids over 253 characters in length are truncated. This can result in non-unique ids if the first 253 characters of the seqid are the same, with a difference following that. To get around this, we can hash the sequence ids and write a new .fasta file for mafft to work on, then translate the sequence ids back afterwards.

This function does the translation back afterwards.

This is a sibling function to: make_hash_of_seqids.

Will raise an exception on error

make_hash_of_seqids

This function does the hashing and writing to file.

This is a sibling function to: unmake_hash_of_seqids.

Will raise an exception on error
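The hash-then-restore round trip described for make_hash_of_seqids and unmake_hash_of_seqids can be sketched in memory. Everything named here is hypothetical: hash_ids and restore_ids are illustrative helpers, and MD5 is an assumed choice of hash, since the description does not say which one the package uses or how it stores the lookup.

```python
import hashlib


def hash_ids(records):
    """Replace each sequence id with a short hash; also return a lookup to undo it."""
    hashed, lookup = {}, {}
    for seq_id, sequence in records.items():
        # Assumption: MD5 hex digests (32 characters) are used as the short ids.
        digest = hashlib.md5(seq_id.encode("utf-8")).hexdigest()
        if digest in lookup:
            raise ValueError(f"hash collision for id: {seq_id}")
        hashed[digest] = sequence
        lookup[digest] = seq_id
    return hashed, lookup


def restore_ids(hashed, lookup):
    """Translate hashed ids back to the original sequence ids."""
    return {lookup[digest]: sequence for digest, sequence in hashed.items()}


# Two ids that share their first 300 characters and differ only at the end.
records = {"x" * 300 + "_a": "ACGT", "x" * 300 + "_b": "AC-T"}
hashed, lookup = hash_ids(records)
restored = restore_ids(hashed, lookup)
print(sorted(hashed))
```

The hashed ids stay well under the 253-character limit, so they survive alignment intact and can be mapped back afterwards.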

Project details

Release history

Version	Release date
0.0.34	Apr 22, 2020
0.0.33	Jul 10, 2019
0.0.32	Feb 15, 2019
0.0.31	Feb 14, 2019
0.0.30	Nov 6, 2018
0.0.29	Oct 26, 2018
0.0.28	Oct 26, 2018
0.0.27	Aug 15, 2018
0.0.26	Aug 3, 2018
0.0.25	May 18, 2018
0.0.24	May 18, 2018
0.0.23	May 18, 2018
0.0.22	May 17, 2018
0.0.21	May 16, 2018
0.0.20	May 16, 2018
0.0.19	Apr 24, 2018
0.0.18	Apr 18, 2018
0.0.17	Apr 16, 2018
0.0.14	Jul 26, 2017
0.0.1	Nov 9, 2015

Download files

Source Distribution

smallBixTools-0.0.34.tar.gz (35.2 kB)

Uploaded Apr 22, 2020 Source

File details

Details for the file smallBixTools-0.0.34.tar.gz.

File metadata

Download URL: smallBixTools-0.0.34.tar.gz
Upload date: Apr 22, 2020
Size: 35.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for smallBixTools-0.0.34.tar.gz
Algorithm	Hash digest
SHA256	`cf5b2fd38b614211a8e15e2c9ec9fee75ca5246d4869d8de3269a53bd8ba4bb1`
MD5	`c42b48a938e466e88c2c8353166cadec`
BLAKE2b-256	`d30c90227b6f5b1055da7b8ab2563f61e48d00d7638dad6748ff34c33a775056`
