dnacol

Display FASTA/FASTQ/SAM/VCF files with colored DNA/RNA bases and quality scores (`dnacol`) or a protein sequence with colored amino acid codes (`pcol`) in the terminal

Project description

Color DNA/RNA bases, protein amino acid codes and quality scores in terminal output

About

This is a Python script that colors DNA, RNA and protein sequences in the terminal. When called as dnacol, it reads lines from STDIN or from a file and colors every string of DNA/RNA letters it finds. It can also color phred-encoded quality scores in FASTQ/SAM files. When called as pcol, it instead colors protein sequences written as one-letter amino acid codes.
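For readers unfamiliar with phred encoding, the following is a minimal Python 3 sketch of how a quality string maps to numeric scores. It is not taken from the dnacol source; the function names are hypothetical, and the offset of 33 (the Sanger / Illumina 1.8+ convention) is an assumption about the input.

```python
def phred_scores(quality, offset=33):
    """Decode a quality string into a list of integer Phred scores.

    Assumption: offset=33 (Sanger / Illumina 1.8+); older files may use 64.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(score):
    """Convert a Phred score Q into an error probability, 10^(-Q/10)."""
    return 10 ** (-score / 10.0)


if __name__ == "__main__":
    print(phred_scores("!5I"))  # [0, 20, 40]
```

A tool that colors quality scores could then pick a color per character based on the decoded score, for example by binning scores into low, medium and high ranges.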

Format-specific coloring

By default, dnacol will find and color all strings of one or more DNA/RNA letters, and pcol will color all strings of the twenty standard amino acid letters. However, both also recognize a few standard file formats and apply more targeted coloring to them. When reading a file, the format is recognized automatically from the file extension. When reading from STDIN, dnacol and pcol try to identify the format from the data itself (for FASTQ/SAM/VCF files). The format can also be specified using the --format option.

SAM format (--format=sam, automatically enabled when filename ends in .sam or a line matching the SAM format is found)
- Ignore headers, color the SEQ column as DNA and the QUAL column as quality scores
FASTQ format (--format=fastq, automatically enabled when filename ends in .fastq or .fq or the first four lines match the FASTQ format)
- Color the second line of every read as DNA
- Color the fourth line of every read as quality scores
VCF format (--format=vcf, automatically enabled when filename ends in .vcf or a VCF header line is found)
- Ignore comments, only color the REF and ALT columns
FASTA format (--format=fasta, automatically enabled when filename ends in .fasta or .fa)
- Ignore sequence identifiers
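The detection rules above can be sketched roughly as follows. This is an illustrative Python 3 sketch, not dnacol's actual code: the function names are hypothetical, ignoring a trailing .gz is an assumption, and the FASTQ check is a simplified reading of "the first four lines match the FASTQ format".

```python
import re

# Extension-to-format table, taken from the list above.
EXTENSIONS = {
    ".sam": "sam",
    ".fastq": "fastq",
    ".fq": "fastq",
    ".vcf": "vcf",
    ".fasta": "fasta",
    ".fa": "fasta",
}


def guess_format_from_name(filename):
    """Return a format name based on the file extension, or "text"."""
    name = filename
    # Assumption: a trailing .gz is ignored when matching the extension.
    if name.endswith(".gz"):
        name = name[:-3]
    for ext, fmt in EXTENSIONS.items():
        if name.endswith(ext):
            return fmt
    return "text"


def looks_like_fastq(lines):
    """Check whether the first four lines form a single FASTQ record."""
    if len(lines) < 4:
        return False
    header, seq, plus, qual = (line.rstrip("\n") for line in lines[:4])
    return (
        header.startswith("@")
        and re.fullmatch(r"[ACGTUNacgtun]+", seq) is not None
        and plus.startswith("+")
        and len(qual) == len(seq)
    )
```

For example, under these assumptions `guess_format_from_name("examples/phix.fa.gz")` yields `"fasta"`, while a name with an unrecognized extension falls back to `"text"`.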

Colormaps

The script supports different colormaps, each of which specifies a color for every possible letter of the sequence. These are shown in dnacol --help. When called as dnacol, the script uses the dna_brgy colormap by default, while pcol uses the protein colormap. You can change the dnacol colormap using a configuration file (see below).
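To illustrate the idea of a colormap, here is a small Python sketch that wraps each base in an ANSI background-color escape sequence. The letter-to-color assignments and all names are hypothetical and chosen only for illustration; they are not the actual contents of dna_brgy or any other dnacol colormap.

```python
# Hypothetical colormap: one ANSI background color code per base.
# The real dnacol colormaps may assign colors differently.
EXAMPLE_COLORMAP = {
    "A": 44,  # blue background
    "C": 41,  # red background
    "G": 42,  # green background
    "T": 43,  # yellow background
    "U": 43,  # same as T, for RNA
}
RESET = "\033[0m"


def color_sequence(seq, colormap=EXAMPLE_COLORMAP):
    """Wrap each known letter in an ANSI color code; leave others as-is."""
    out = []
    for base in seq:
        code = colormap.get(base.upper())
        if code is None:
            out.append(base)
        else:
            out.append("\033[%dm%s%s" % (code, base, RESET))
    return "".join(out)


if __name__ == "__main__":
    print(color_sequence("GATTACA"))
```

Swapping in a different dictionary changes the colors without touching the coloring logic, which is the main appeal of keeping the colormap as data.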

Options

-w, --wide
    wide output (add spaces around each base)
-f FORMAT, --format FORMAT
    file format (auto|text|sam|vcf|fastq|fasta)

Configuration

You can create a configuration file in YAML format called /etc/dnacol or ~/.dnacol to change the behavior of this script. At the moment, the only setting available is the colormap to use for DNA sequences. See dnacol --help for examples of the available colormaps.

To use the gbyr instead of the brgy colormap, set the dna_colormap option like this:

dna_colormap: gbyr

Download/Install

To install, use pip:

pip install dnacol

If the system-wide directory is not writable, you can install to your home directory with:

pip install dnacol --user

Alternatively, you can clone this git repository and use the provided setup.py script.

git clone https://github.com/koelling/dnacol.git
cd dnacol && python setup.py install

dnacol has been tested with Python 2.7, 3.5 and 3.6.

Examples

#read gzipped file
dnacol examples/phix.fa.gz | head

#pipe from stdin
head examples/reads.txt | dnacol --wide

#use `pcol` for protein sequences
pcol examples/hras.fa

#use `less -R` to display colors in less
dnacol examples/phix.fa.gz | less -R

Release history

- 0.4 (this version): Aug 30, 2017
- 0.3.2: Aug 1, 2017
- 0.3.1: Aug 1, 2017

Download files

Source Distribution

dnacol-0.4.tar.gz (8.1 kB)

Uploaded Aug 30, 2017 Source

File details

Details for the file dnacol-0.4.tar.gz.

File metadata

Download URL: dnacol-0.4.tar.gz
Upload date: Aug 30, 2017
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for dnacol-0.4.tar.gz
Algorithm	Hash digest
SHA256	`3f9e4d22a28993ba40473f6698767a551ccc55479d8b74c85fe4b0247f181bc4`
MD5	`2aecf1c35fef40ba9790e53974c361e0`
BLAKE2b-256	`ee39cece018d186c50f9cf58df3467291b42c5858373371f813218670e03d728`
