Package to calculate a distance matrix from a multiple sequence alignment file
FastaDist
This small utility package calculates the number of differences between all pairs of samples in a FASTA alignment file. A position counts as 1 SNV only when both sequences have a G, A, T or C (case insensitive) at that position and the two bases differ.
Output is a square distance matrix in tsv, csv or phylip format. It is fast because it first converts sequences to bit arrays and then uses bit operations to calculate the differences.
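The counting rule and the bit-array idea can be illustrated with a minimal sketch. This is not FastaDist's actual implementation: the function names (encode, snv_distance, distance_matrix) and the choice of plain Python integers as bit masks are assumptions made for illustration, and the encoding loop is written for clarity rather than speed.

```python
from itertools import combinations

BASES = "ACGT"


def encode(seq):
    """Return one integer bit mask per base; bit i is set when seq[i] is that base.

    Hypothetical helper: anything other than A, C, G or T (N, gaps, ...)
    sets no bit, so those positions are never counted.
    """
    masks = dict.fromkeys(BASES, 0)
    for i, ch in enumerate(seq.upper()):
        if ch in masks:
            masks[ch] |= 1 << i
    return masks


def snv_distance(a, b):
    """Count positions where both sequences have A/C/G/T and the bases differ."""
    valid_a = a["A"] | a["C"] | a["G"] | a["T"]
    valid_b = b["A"] | b["C"] | b["G"] | b["T"]
    both_valid = valid_a & valid_b
    same = 0
    for base in BASES:
        same |= a[base] & b[base]
    # Positions valid in both sequences but not identical are the SNVs.
    return bin(both_valid & ~same).count("1")


def distance_matrix(seqs):
    """Build a square, symmetric matrix from a {name: sequence} mapping."""
    names = list(seqs)
    encoded = [encode(seqs[name]) for name in names]
    matrix = [[0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        matrix[i][j] = matrix[j][i] = snv_distance(encoded[i], encoded[j])
    return names, matrix


# Toy alignment (made-up data, not from the FastaDist test set).
seqs = {"s1": "ACGTN-A", "s2": "ACCTAGa", "s3": "TCGT--A"}
names, matrix = distance_matrix(seqs)
for name, row in zip(names, matrix):
    print(name, *row, sep="\t")
```

In the toy alignment, s1 and s2 differ only at the third position: the N and gap positions are skipped, and the lower-case a matches A.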
On a mid-range laptop, a distance matrix for a 764-sequence alignment of length 1,082,859 was produced in 11 minutes using -p 1 and in 4.5 minutes using -p 4.
Installation
FastaDist is available as a PyPI package for Python 3:
pip3 install fastadist
Usage
usage: fastadist [-h] -i ALIGNMENT_FILEPATH [-t TREE_FILEPATH] -o
OUTPUT_FILEPATH [-f FORMAT] [-p PARALLEL_PROCESSES] [-v]
A script to calculate distances by converting sequences to bit arrays.
Specify number of processes as -p N to speed up the calculation
optional arguments:
-h, --help show this help message and exit
-i ALIGNMENT_FILEPATH, --alignment_filepath ALIGNMENT_FILEPATH
path to multiple sequence alignment input file
-t TREE_FILEPATH, --tree_filepath TREE_FILEPATH
path to newick tree for distance matrix ordering
-o OUTPUT_FILEPATH, --output_filepath OUTPUT_FILEPATH
path to distance matrix output file
-f FORMAT, --format FORMAT
output format for distance matrix (one of tsv
[default], csv and phylip
-p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
number of parallel processes to run (default 1)
-v, --version print out software version
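As a rough illustration of how the options above combine, the following sketch assembles a fastadist command line from Python. The helper name build_fastadist_command and the file names are hypothetical; only the flags (-i, -o, -f, -p, -t) come from the help text above. The sketch builds the argument list without running it.

```python
FORMATS = ("tsv", "csv", "phylip")


def build_fastadist_command(alignment, output, fmt="tsv", processes=1, tree=None):
    """Return the argument list for a fastadist run (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    cmd = ["fastadist", "-i", alignment, "-o", output,
           "-f", fmt, "-p", str(processes)]
    if tree is not None:
        # Optional newick tree used to order the distance matrix.
        cmd += ["-t", tree]
    return cmd


# Example file names are placeholders.
cmd = build_fastadist_command("alignment.fasta", "distances.tsv", processes=4)
print(" ".join(cmd))
```

With FastaDist installed, the list could be passed to subprocess.run(cmd, check=True) to start the calculation.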