decode a one-hot numpy array to biological sequences
Description
onehot2seq is a command-line tool that decodes a one-hot numpy array into DNA/RNA/protein sequences.
To encode sequences to a one-hot numpy array, use seq2onehot: https://github.com/akikuno/seq2onehot
Installation
You can install onehot2seq using pip:
pip install onehot2seq
Usage
onehot2seq [options] -t/--type <dna/rna/protein> -i/--input <in.npy> -o/--output <out.txt/fasta>
Options
-a/--ambiguous: include ambiguous characters
-f/--format <txt/fasta>: output format, plain text or FASTA (default: txt)
The ambiguous characters are:
XBZJ for amino acids
NVHDBMRWSYK for DNA and RNA
Details of the ambiguous characters are described here:
https://meme-suite.org/meme/doc/alphabets.html
The header IDs in FASTA format are sequential numbers (e.g. >seq1, >seq2)
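The sequential header scheme can be illustrated with a minimal Python sketch. The function name `to_fasta` is hypothetical and is not part of onehot2seq; the sketch also assumes one unwrapped sequence line per record, which the description above does not specify.

```python
def to_fasta(sequences):
    """Format sequences as FASTA text with headers >seq1, >seq2, ...

    Assumption: each sequence is written on a single line (no wrapping).
    """
    lines = []
    for number, sequence in enumerate(sequences, start=1):
        lines.append(f">seq{number}")  # sequential header ID
        lines.append(sequence)
    return "\n".join(lines) + "\n"


print(to_fasta(["ACGT", "TTGA"]), end="")
# >seq1
# ACGT
# >seq2
# TTGA
```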
Examples
# Output DNA sequences
onehot2seq -t dna -i example/dna.npy -o dna.txt
# Output DNA sequences in FASTA format
onehot2seq -t dna -f fasta -i example/dna.npy -o dna.fasta
# RNA sequences
onehot2seq -t rna -i example/rna.npy -o rna.txt
# Protein sequences
onehot2seq -t protein -i example/protein.npy -o protein.txt
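Before running commands like these on your own data, it may help to confirm that the .npy file holds a 3D array. The following is a minimal sketch using `numpy.load`; the helper `describe_npy` and the file name `demo.npy` are hypothetical, and the all-zero array only stands in for real data.

```python
import os
import tempfile

import numpy as np


def describe_npy(path):
    """Load a .npy file and return its shape and dtype."""
    array = np.load(path)
    if array.ndim != 3:
        raise ValueError(f"expected a 3D array, got {array.ndim}D")
    return array.shape, array.dtype


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npy")  # hypothetical file name
    np.save(demo_path, np.zeros((2, 4, 4), dtype=np.uint8))  # placeholder data
    shape, dtype = describe_npy(demo_path)
    print(shape, dtype)  # (2, 4, 4) uint8
```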
One-hot array
The input file must contain a 3D one-hot array of shape RxNxL (Read x Nucleotide/Amino acid x Letter).
- The order of nucleotides is ACGT (+NVHDBMRWSYK) for DNA and ACGU (+NVHDBMRWSYK) for RNA
- The order of amino acids is ACDEFGHIKLMNPQRSTVWY (+XBZJ)
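To make the array layout concrete, here is a minimal decoding sketch in Python. It is not the tool's actual implementation. It assumes the axes are (read, position in the sequence, letter of the alphabet), that each position has exactly one entry set to 1, and that the ambiguous letters are appended after the standard ones in the order listed above. The names `ALPHABETS` and `decode_onehot` are hypothetical.

```python
import numpy as np

# Letter orders as listed above: (standard letters, ambiguous extension).
ALPHABETS = {
    "dna": ("ACGT", "NVHDBMRWSYK"),
    "rna": ("ACGU", "NVHDBMRWSYK"),
    "protein": ("ACDEFGHIKLMNPQRSTVWY", "XBZJ"),
}


def decode_onehot(onehot, seq_type, ambiguous=False):
    """Decode an array assumed to be shaped (reads, positions, letters)."""
    base, extra = ALPHABETS[seq_type]
    letters = base + extra if ambiguous else base
    onehot = np.asarray(onehot)
    if onehot.ndim != 3 or onehot.shape[2] != len(letters):
        raise ValueError(
            f"expected shape (reads, positions, {len(letters)}), got {onehot.shape}"
        )
    # The index of the 1 along the last axis selects the letter.
    indices = onehot.argmax(axis=2)
    return ["".join(letters[i] for i in row) for row in indices]


# Build a tiny example: two DNA reads of length 4.
example = np.zeros((2, 4, 4), dtype=np.uint8)
for r, seq in enumerate(["ACGT", "TTGA"]):
    for p, ch in enumerate(seq):
        example[r, p, "ACGT".index(ch)] = 1

print(decode_onehot(example, "dna"))  # ['ACGT', 'TTGA']
```

If the real axis order differs from this assumption, the `argmax` axis and the shape check would need to change accordingly.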