Remove outlier sequences from multiple sequence alignment
Project description
seqSieve will try to remove sequences that cause misalignments from a multiple sequence alignment(MSA). It reads a given MSA in multi-fasta format and removes sequences with the highest penalty scores, then builds the next MSA without those sequences. This process is repeated until a user-specified cutt-off is reached or less than three sequences are left to be aligned.
Usage:
###################################### # seqSieve.py ###################################### usage: seqSieve.py -f multifasta alignment options: -f, --fasta=FILE multifasta alignment (eg. "align.fas") OR -F, --fasta_dir=DIR directory with multifasta files (needs -s SUFFIX) -s, --suffix=SUFFIX will try to work with files that end with SUFFIX (eg ".fas") -a, --msa_tool=STR supported: "mafft", prank, prankf (= prank +F) [default:"mafft"] -i, --max_iterations=NUM force stop after NUM iterations -n, --num_threads=NUM max number of threads to be executed in parallel [default: 1] -m, --mode=MODE set strategy to remove outlier sequences [default: "Sites"] available modes (not case sensitive): "Sites", "Gaps", "uGaps","Insertions", "uInsertions","uInsertionsGaps", "custom" -q, --no-realign don't realign with each iteration (not recommended) -l, --log write logfile -h, --help prints this only for mode "custom": -g, --gap_penalty=NUM set gap penalty [default: 1.0] -G, --unique_gap_penalty=NUM set unique gap penalty [default: 10.0] -j, --insertion_penalty=NUM set insertion penalty [default:1.0] -J, --unique_insertion_penalty=NUM set insertion penalty [default:1.0] -M, --mismatch_penalty=NUM set mismatch penalty [default:1.0] -r, --match_reward=NUM set match reward [default: -10.0]
Currently supported multiple sequence aligners:
mafft (Katoh, Standley 2013 (Molecular Biology and Evolution 30:772-780) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. http://mafft.cbrc.jp/alignment/software/)
prank (Loytynoja, Goldman 2005 (PNAS 102:10557-10562) An algorithm for progressive multiple alignment of sequences with insertions. http://www.ebi.ac.uk/goldman-srv/prank/prank/
Requirements
matplotlib
numpy
External Programs
mafft and/or
prank
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.