Counts file library and conversion scripts.

Project description

Counts file library

The Python library cflib provides scripts to convert between fasta, VCF and counts files. Counts files are used by PoMo, an implementation of a polymorphism-aware phylogenetic model. We advise you to use the PoMo implementation in IQ-TREE.

For a reference, please see and cite:

Schrempf, D., Minh, B. Q., De Maio, N., von Haeseler, A., &
Kosiol, C. (2016). Reversible Polymorphism-Aware Phylogenetic
Models and their Application to Tree Inference. Journal of
Theoretical Biology, in press.

Requirements

cflib requires Python (version 3.x). It also depends on the following Python libraries, which are pulled in automatically when installing cflib:

scipy,
numpy and
pysam.

Installation

Install cflib and the conversion scripts with

pip install --user cflib-pomo

Note that the name of cflib on the PyPI repository (which is used by pip) is cflib-pomo, since the name cflib was taken!

If the default Python version of your operating system is still 2.x (e.g., OSX), make sure that you use pip3.

The --user flag is optional. It tells pip to install cflib and the scripts for the current user only, not system-wide.

If you want to uninstall cflib, run

pip uninstall cflib-pomo

The conversion scripts (see the section Conversion scripts) should be directly available if your PATH environment variable is set up correctly. On my Linux installation, the directory ~/.local/bin had to be added to PATH. This may vary for your operating system.
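If it is unclear whether the scripts ended up on your PATH, a short check like the following may help. This is only a sketch: the helper names find_script and user_script_dir are made up for illustration, and the assumption that user-level scripts live in a bin directory below the user base holds for typical Linux installations but not necessarily elsewhere (e.g., Windows uses a different layout).

```python
import os
import shutil
import site


def find_script(name="FastaToCounts.py"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


def user_script_dir():
    """Guess where `pip install --user` places scripts.

    Assumption: scripts go to <user base>/bin, which is ~/.local/bin on
    a typical Linux installation.
    """
    return os.path.join(site.getuserbase(), "bin")


path = find_script()
if path is None:
    print("FastaToCounts.py is not on PATH; consider adding", user_script_dir())
else:
    print("Found", path)
```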

Example

Sample data can be found in examples. Assuming that you have installed cflib, we will now convert example.fasta to a counts file named example_from_fasta.cf. The script that we will use is called FastaToCounts.py. First, we have a look at the help message:

FastaToCounts.py --help

usage: FastaToCounts.py [-h] [-v] [--iupac] fastaFile output

Convert fasta to counts format.

The (aligned) sequences in the fasta file are read in and the data is
written to a counts format file.

Sequence names are stripped at the first dash.  If the stripped
sequence name coincide, individuals are put into the same population.

E.g., homo_sapiens-XXX and homo_sapiens-YYY will be in the same
population homo_sapiens.

Take care with large files, this uses a lot of memory.

The input as well as the output files can additionally be gzipped
(indicated by a .gz file ending).

If heterozygotes are encoded with IUPAC codes (e.g., 'r' for A or G),
homozygotes need to be counted twice so that the level of polymorphism
stays correct.  This can be done with the `--iupac` flag.

positional arguments:
  fastaFile      path to (gzipped) fasta file
  output         name of (gzipped) outputfile in counts format

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  turn on verbosity (-v or -vv)
  --iupac        heteorzygotes are encoded with IUPAC codes

As required, the sequence names in example.fasta follow this naming scheme, e.g., Sheep-1, Sheep-2, and so on. The following command converts the file example.fasta into the counts file example_from_fasta.cf:

FastaToCounts.py example.fasta example_from_fasta.cf
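To make the naming rule from the help message concrete, the following sketch groups sequence names into populations by cutting each name at the first dash. It is not the code used by cflib; the function names population_of and group_by_population are hypothetical and only illustrate the rule.

```python
from collections import defaultdict


def population_of(sequence_name):
    """Return the part of the sequence name before the first dash."""
    return sequence_name.split("-", 1)[0]


def group_by_population(sequence_names):
    """Map each population name to the sequence names that belong to it."""
    groups = defaultdict(list)
    for name in sequence_names:
        groups[population_of(name)].append(name)
    return dict(groups)


names = ["Sheep-1", "Sheep-2", "homo_sapiens-XXX", "homo_sapiens-YYY"]
for population, members in group_by_population(names).items():
    print(population, members)
```

With these names, Sheep-1 and Sheep-2 end up in the population Sheep, and both homo_sapiens sequences end up in the population homo_sapiens.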

Note on IUPAC codes

IUPAC codes are supported and handled adequately.

In particular,

N can be used to denote any base or that the base is unknown; the letter * can also be used in this case, although it is non-standard;
- or . denote a gap or a deletion.

The other IUPAC codes are also supported.
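The reason why homozygotes are counted twice with --iupac can be illustrated with a small sketch. It is not taken from cflib: the name count_site is hypothetical, only the two-base IUPAC codes are handled, and all other symbols (N, *, gaps, three-base codes) are simply skipped here, which may differ from what the scripts do.

```python
# Standard two-base IUPAC ambiguity codes.
IUPAC_HET = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}
BASES = "ACGT"


def count_site(column, iupac=False):
    """Count bases at one alignment site of one population.

    `column` holds one symbol per individual. With iupac=True, each
    symbol stands for a diploid genotype: a plain base contributes two
    copies, a heterozygote code contributes one copy of each base.
    """
    counts = dict.fromkeys(BASES, 0)
    for symbol in column.upper():
        if symbol in BASES:
            counts[symbol] += 2 if iupac else 1
        elif iupac and symbol in IUPAC_HET:
            for base in IUPAC_HET[symbol]:
                counts[base] += 1
        # Everything else is ignored in this sketch.
    return counts


print(count_site("AAr", iupac=True))
```

For two homozygous A individuals and one A/G heterozygote, this yields five copies of A and one copy of G. Counting the homozygotes only once would give 2 A and 1 G and overstate the frequency of G.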

Conversion scripts

CountsToFasta.py: Convert a counts file to a fasta file.
FastaToCounts.py: Convert a fasta file to counts format.
FastaToVCF.py: Convert a fasta file to variant call format.
FastaVCFToCounts.py: Convert a fasta reference with VCF files to counts format.
FilterMSA.py: Filter a multiple sequence alignment file (apply standard filters; cf. libPoMo).
GPToCounts.py: Experimental. Convert gene prediction files with reference to counts format.
MSAToCounts.py: Convert multiple sequence alignments with VCF files to counts format.

Each script comes with its own documentation. Please execute, e.g.,

FastaToCounts.py --help

All conversion scripts can be found in the scripts folder.

Documentation

If you are interested in cflib itself, please refer to the cflib reference manual.

Release history

1.3.0.0 (this version): Feb 11, 2021
1.2.2.1: Jul 24, 2020
1.2.2: Jul 24, 2020
1.2.1: Dec 4, 2018
1.2: Dec 4, 2018
1.1: Dec 4, 2018

Download files

Source Distribution

cflib-pomo-1.3.0.0.tar.gz (680.9 kB), uploaded Feb 11, 2021, Source

Built Distribution

cflib_pomo-1.3.0.0-py3-none-any.whl (47.6 kB), uploaded Feb 11, 2021, Python 3

File details

Details for the file cflib-pomo-1.3.0.0.tar.gz.

File metadata

Download URL: cflib-pomo-1.3.0.0.tar.gz
Upload date: Feb 11, 2021
Size: 680.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.1.post0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.7

File hashes

Hashes for cflib-pomo-1.3.0.0.tar.gz
Algorithm	Hash digest
SHA256	`da5bf473f56c040c0fd36fb590b5944bfcdbc071c9472ad301a8b2df2c82e700`
MD5	`19f613362ab4ac44d99c1c3db75e911c`
BLAKE2b-256	`f348d5042188ec38f299da9b38ef0e9b322c2b799847a0cd1a995ba16df12ad4`

File details

Details for the file cflib_pomo-1.3.0.0-py3-none-any.whl.

File metadata

Download URL: cflib_pomo-1.3.0.0-py3-none-any.whl
Upload date: Feb 11, 2021
Size: 47.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.1.post0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.7

File hashes

Hashes for cflib_pomo-1.3.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a219f4bc73a795824bea25c3d2e9222c8b67319359fe7798ef7b08b62658dd88`
MD5	`efc7d309b546a994699b1443e7347233`
BLAKE2b-256	`0c829a8ef1756667aed6ed6a415c8e7de486ade3e4d4620b183bef622a82756f`
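A downloaded file can be compared against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the function names sha256_of and verify are made up for illustration, and the file is assumed to be in the current working directory under its original name.

```python
import hashlib

# SHA256 digests as listed in the file details above.
EXPECTED_SHA256 = {
    "cflib-pomo-1.3.0.0.tar.gz":
        "da5bf473f56c040c0fd36fb590b5944bfcdbc071c9472ad301a8b2df2c82e700",
    "cflib_pomo-1.3.0.0-py3-none-any.whl":
        "a219f4bc73a795824bea25c3d2e9222c8b67319359fe7798ef7b08b62658dd88",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches `expected_hex`."""
    return sha256_of(path) == expected_hex.lower()
```

For example, verify("cflib-pomo-1.3.0.0.tar.gz", EXPECTED_SHA256["cflib-pomo-1.3.0.0.tar.gz"]) should return True for an intact download of the source distribution.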
