allfreqs
Calculate allele frequencies from a sequence multialignment.
Free software: MIT license
Documentation: https://allfreqs.readthedocs.io
GitHub repo: https://github.com/robertopreste/allfreqs
Features
Calculate allele frequencies from a nucleotide multialignment in fasta or csv format.
Allele frequencies are returned as a table in which each row is a nucleotide position (numbered according to the provided reference sequence) and the columns hold the frequencies of A, C, G, T, gaps and other non-canonical nucleotides.
For example, given the following multialignment:
ID | Sequence |
---|---|
ref | ACGTACGT |
seq1 | A-GTAGGN |
seq2 | ACCAGCGT |
the resulting allele frequencies will be:
position | A | C | G | T | gap | oth |
---|---|---|---|---|---|---|
1.0_A | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2.0_C | 0.0 | 0.5 | 0.0 | 0.0 | 0.5 | 0.0 |
3.0_G | 0.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 |
4.0_T | 0.5 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
5.0_A | 0.5 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 |
6.0_C | 0.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 |
7.0_G | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
8.0_T | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.5 |
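The numbers in the table above can be reproduced with a few lines of plain Python. This is only an illustrative sketch, not how allfreqs is implemented: the function name `allele_freqs` and the row layout are hypothetical, and the sketch assumes that the reference sequence is used only to label positions and is not counted in the frequencies, which is what the example values suggest.

```python
from collections import Counter

CANONICAL = "ACGT"


def allele_freqs(reference, sequences):
    """Return one dict per reference position with A, C, G, T, gap and oth frequencies.

    Hypothetical helper for illustration; the reference only labels the rows
    and is not counted (an assumption based on the example table).
    """
    n = len(sequences)
    rows = []
    for i, ref_base in enumerate(reference):
        # Count the symbols found in this alignment column.
        counts = Counter(seq[i] for seq in sequences)
        # Mirror the label style of the example table, e.g. "1.0_A".
        row = {"position": f"{float(i + 1)}_{ref_base}"}
        for base in CANONICAL:
            row[base] = counts[base] / n
        row["gap"] = counts["-"] / n
        # Anything that is neither canonical nor a gap is squashed into "oth".
        other = n - sum(counts[b] for b in CANONICAL) - counts["-"]
        row["oth"] = other / n
        rows.append(row)
    return rows


rows = allele_freqs("ACGTACGT", ["A-GTAGGN", "ACCAGCGT"])
for row in rows:
    print(row)
```

With two aligned sequences every frequency is a multiple of 0.5, so the gap in seq1 at position 2 and the N in seq1 at position 8 each contribute 0.5 to the gap and oth columns respectively.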
Frequencies of non-canonical (ambiguous) nucleotides are by default squashed into the oth column, but they can also be shown separately using a simple flag.
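The difference between the two modes can be sketched for a single alignment column as follows. The function `column_freqs` and its `separate_ambiguous` parameter are hypothetical names chosen for this illustration and are not the actual allfreqs flag or API; the sketch also reports only the ambiguous symbols that actually occur, whereas a real tool would more likely use a fixed set of columns.

```python
from collections import Counter


def column_freqs(sequences, position, separate_ambiguous=False):
    """Frequencies at a 0-based alignment column (hypothetical helper)."""
    n = len(sequences)
    counts = Counter(seq[position] for seq in sequences)
    # Take out canonical bases and gaps; what is left is ambiguous.
    freqs = {base: counts.pop(base, 0) / n for base in "ACGT"}
    freqs["gap"] = counts.pop("-", 0) / n
    if separate_ambiguous:
        # One entry per ambiguous symbol observed in this column (e.g. N).
        for symbol, count in sorted(counts.items()):
            freqs[symbol] = count / n
    else:
        # Default behaviour: all ambiguous symbols share the "oth" entry.
        freqs["oth"] = sum(counts.values()) / n
    return freqs


seqs = ["A-GTAGGN", "ACCAGCGT"]
squashed = column_freqs(seqs, 7)
separate = column_freqs(seqs, 7, separate_ambiguous=True)
print(squashed)
print(separate)
```

For the last column of the example multialignment, the first call reports the N under `oth`, while the second reports it under its own `N` key.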
allfreqs can be used either as a command line tool or through its Python API.
For more information, please refer to the Usage section of the documentation.
Installation
PLEASE NOTE: allfreqs only supports Python >= 3.6!
The preferred installation method for allfreqs is using pip:
$ pip install allfreqs
For more information, please refer to the Installation section of the documentation.
Credits
This package was created with Cookiecutter and the cc-pypackage project template.
History
0.1.0 (2019-07-08)
First release.
0.1.1 (2019-08-08)
Read and process multialignments from fasta and csv files (Python module only).
0.1.2 (2019-10-17)
Add tests with and without reference included in multialignments;
Add tests with real datasets (coming from haplogroup-specific multialignments).
0.1.3 (2019-10-18)
Add more detailed tests for real datasets;
Implement more efficient frequency calculation;
Add dunder methods and sanity checks;
Fix requirements and testing framework;
Clean code.
0.2.0 (2020-03-07)
Remove numpy and pandas from requirements as they are installed by scikit-bio;
Move tests module inside allfreqs;
Add ci module for internal management;
Clean code.
0.3.0 (2020-04-02)
Add option to show ambiguous nucleotides separately.