Package to calculate a distance matrix from a multiple sequence alignment file

Project description

FastaDist

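This small utility package calculates the number of differences between every pair of samples in a FASTA alignment file. A position counts as 1 SNV when both sequences have a G, A, T or C (case-insensitive) at that position and the two bases differ. It follows that a position where either sequence has any other character is not counted.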
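The counting rule can be illustrated with a plain character-by-character comparison. This is a minimal sketch of the rule as described, not FastaDist's own code; the function name `count_snvs` is hypothetical.

```python
VALID_BASES = set("ACGT")


def count_snvs(seq_a: str, seq_b: str) -> int:
    """Count positions where both sequences hold A, C, G or T and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    differences = 0
    # Upper-casing both sequences makes the comparison case-insensitive.
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in VALID_BASES and base_b in VALID_BASES and base_a != base_b:
            differences += 1
    return differences


# Positions 5 and 6 are skipped because the first sequence has N and a gap there.
print(count_snvs("ACGTN-acgt", "ACCTAGacga"))  # 2
```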

The output is a square distance matrix in tsv, csv or phylip format. The calculation is fast because sequences are first converted to bit arrays, and the differences are then computed with bitwise operations.
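One way to picture the bit-array idea is sketched below. It assumes one bit mask per base, stored in ordinary Python integers; the encoding FastaDist actually uses is not described here, and the names `to_bit_masks`, `snv_distance` and `distance_matrix` are hypothetical.

```python
def to_bit_masks(sequence: str) -> dict:
    """Encode a sequence as four integers; bit i of masks[base] is set when position i holds that base."""
    masks = {base: 0 for base in "ACGT"}
    for position, base in enumerate(sequence.upper()):
        if base in masks:  # anything other than A, C, G, T sets no bit
            masks[base] |= 1 << position
    return masks


def popcount(value: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def snv_distance(masks_a: dict, masks_b: dict) -> int:
    """Count positions that are A/C/G/T in both sequences but hold different bases."""
    valid_a = masks_a["A"] | masks_a["C"] | masks_a["G"] | masks_a["T"]
    valid_b = masks_b["A"] | masks_b["C"] | masks_b["G"] | masks_b["T"]
    both_valid = valid_a & valid_b
    same = 0
    for base in "ACGT":
        same |= masks_a[base] & masks_b[base]
    # Positions valid in both sequences, minus positions with identical bases.
    return popcount(both_valid & ~same)


def distance_matrix(sequences: list) -> list:
    """Build the symmetric square matrix of pairwise SNV counts."""
    encoded = [to_bit_masks(seq) for seq in sequences]  # encode each sequence once
    size = len(encoded)
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            matrix[i][j] = matrix[j][i] = snv_distance(encoded[i], encoded[j])
    return matrix


print(distance_matrix(["ACGTN-ACGT", "ACCTAGACGA", "ACGTAGACGT"]))
```

Each sequence is encoded once, so the per-pair work reduces to a handful of AND/OR operations and a bit count over the whole alignment length.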

On a mid-range laptop, a distance matrix for an alignment of 764 sequences of length 1,082,859 was produced in 11 minutes using -p 1 and in 4.5 minutes using -p 4.
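To put these timings in context, the short calculation below works out the number of sequence pairs and the speedup from four processes. It assumes each unordered pair is compared once, which is not stated above.

```python
n_sequences = 764
# Unordered pairs in a symmetric matrix, excluding the diagonal.
n_pairs = n_sequences * (n_sequences - 1) // 2

minutes_one_process = 11.0
minutes_four_processes = 4.5
speedup = minutes_one_process / minutes_four_processes
efficiency = speedup / 4  # fraction of the ideal 4x speedup

print(n_pairs)               # 291466
print(round(speedup, 2))     # 2.44
print(round(efficiency, 2))  # 0.61
```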

Installation

FastaDist is available as a PyPI package for Python 3:

pip3 install fastadist

Usage

usage: fastadist [-h] -i ALIGNMENT_FILEPATH [-t TREE_FILEPATH] -o
                 OUTPUT_FILEPATH [-f FORMAT] [-p PARALLEL_PROCESSES] [-v]

    A script to calculate distances by converting sequences to bit arrays.
    Specify number of processes as -p N to speed up the calculation


optional arguments:
  -h, --help            show this help message and exit
  -i ALIGNMENT_FILEPATH, --alignment_filepath ALIGNMENT_FILEPATH
                        path to multiple sequence alignment input file
  -t TREE_FILEPATH, --tree_filepath TREE_FILEPATH
                        path to newick tree for distance matrix ordering
  -o OUTPUT_FILEPATH, --output_filepath OUTPUT_FILEPATH
                        path to distance matrix output file
  -f FORMAT, --format FORMAT
                        output format for distance matrix (one of tsv
                        [default], csv and phylip)
  -p PARALLEL_PROCESSES, --parallel_processes PARALLEL_PROCESSES
                        number of parallel processes to run (default 1)
  -v, --version         print out software version
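The three output formats selected with -f can be pictured with the sketch below. The exact layout FastaDist writes is not described here, so the header row in tsv/csv and the 10-character name padding in phylip are assumptions, and `format_matrix` is a hypothetical helper.

```python
def format_matrix(names: list, matrix: list, fmt: str = "tsv") -> str:
    """Render a square distance matrix as tsv, csv or phylip text (assumed layouts)."""
    if fmt in ("tsv", "csv"):
        separator = "\t" if fmt == "tsv" else ","
        # Assumed layout: empty corner cell, then sample names as column headers.
        lines = [separator.join([""] + list(names))]
        for name, row in zip(names, matrix):
            lines.append(separator.join([name] + [str(value) for value in row]))
    elif fmt == "phylip":
        # PHYLIP square matrix: sample count first, then one row per sample.
        lines = [str(len(names))]
        for name, row in zip(names, matrix):
            lines.append(name.ljust(10) + " " + " ".join(str(value) for value in row))
    else:
        raise ValueError("format must be one of tsv, csv or phylip")
    return "\n".join(lines) + "\n"


print(format_matrix(["s1", "s2"], [[0, 2], [2, 0]], "csv"))
```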


