Translate DNA sequences to protein sequences using different genetic codes and translation tables
Genetic Codes
A Python and C library with no external dependencies for translating DNA sequences into protein sequences using different translation tables (aka genetic codes).
The NCBI Genetic Codes are central to working with alternate genetic codes. This Python tool kit includes a library that exposes the genetic codes so you can query a codon and get its variants or query a code and get its table. We also provide fast mechanisms to translate DNA sequences into protein sequences using the translation table of your choice.
Current genetic codes:
- The Standard Code (transl_table=1). By default all transl_table in GenBank flatfiles are equal to id 1, and this is not shown. When transl_table is not equal to id 1, it is shown as a qualifier on the CDS feature.
- The Vertebrate Mitochondrial Code (transl_table=2)
- The Yeast Mitochondrial Code (transl_table=3)
- The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code (transl_table=4)
- The Invertebrate Mitochondrial Code (transl_table=5)
- The Ciliate, Dasycladacean and Hexamita Nuclear Code (transl_table=6)
- The Echinoderm and Flatworm Mitochondrial Code (transl_table=9)
- The Euplotid Nuclear Code (transl_table=10)
- The Bacterial, Archaeal and Plant Plastid Code (transl_table=11)
- The Alternative Yeast Nuclear Code (transl_table=12)
- The Ascidian Mitochondrial Code (transl_table=13)
- The Alternative Flatworm Mitochondrial Code (transl_table=14)
- Blepharisma Nuclear Code (transl_table=15)
- Chlorophycean Mitochondrial Code (transl_table=16)
- Trematode Mitochondrial Code (transl_table=21)
- Scenedesmus obliquus Mitochondrial Code (transl_table=22)
- Thraustochytrium Mitochondrial Code (transl_table=23). It is similar to the bacterial code (transl_table=11), but it contains an additional stop codon (TTA) and has a different set of start codons.
- Rhabdopleuridae Mitochondrial Code (transl_table=24)
- Candidate Division SR1 and Gracilibacteria Code (transl_table=25)
- Pachysolen tannophilus Nuclear Code (transl_table=26)
- Karyorelict Nuclear Code (transl_table=27)
- Condylostoma Nuclear Code (transl_table=28)
- Mesodinium Nuclear Code (transl_table=29)
- Peritrich Nuclear Code (transl_table=30)
- Blastocrithidia Nuclear Code (transl_table=31)
- Cephalodiscidae Mitochondrial UAA-Tyr Code (transl_table=33)
Installation
We recommend installing pygenetic_code with bioconda:
mamba create -n pygenetic_code -c bioconda pygenetic_code
pygenetic_code --version
Alternatively, you can install pygenetic_code with pip.
pip install pygenetic_code
pygenetic_code --version
You can also clone this repository and install it from there:
git clone https://github.com/linsalrob/genetic_codes.git
cd genetic_codes
python -m venv venv
source venv/bin/activate
pip install .
Usage
There is a command line application, Python example code, and a library that you can use. The command line application and examples show you how to use the library.
Example code
These examples show you how to incorporate pygenetic_code into your own Python code.
We have a very simple translate function that you can use if you want to translate one (or more) ORFs. The signature is
translate(dna_sequence, translation_table)
and we have a simple example that translates a sequence:
python examples/translate_a_sequence.py
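To make the idea of a translation table concrete, here is a minimal pure-Python sketch of single-frame translation. It does not use pygenetic_code (the library does this work in C), and the name sketch_translate is hypothetical. The amino acid strings follow the NCBI layout, where codons are ordered with the bases T, C, A, G; only tables 1 and 4 are included here for illustration.

```python
from itertools import product

# Codons in NCBI order: TTT, TTC, TTA, TTG, TCT, ... GGG.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

# Amino acids for the 64 codons, in the same order (illustrative subset of tables).
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def sketch_translate(dna_sequence, translation_table=1):
    """Translate one reading frame; codons not in the table (e.g. NNN) become X."""
    lookup = dict(zip(CODONS, TABLES[translation_table]))
    dna = dna_sequence.upper()
    # Step through complete codons only, ignoring a trailing partial codon.
    return "".join(
        lookup.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


print(sketch_translate("ATGAAATGATGGTAA", 1))  # TGA is a stop in table 1
print(sketch_translate("ATGAAATGATGGTAA", 4))  # TGA is tryptophan in table 4
```

The same DNA gives a different protein depending on the table, which is why the translation table is an explicit argument.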
We can also translate DNA sequences in all six reading frames, and here is an example that reads a fasta file and translates all six frames using the bacterial genetic code (translation table 11):
python examples/translate_sequence_in_all_frames.py -f tests/JQ995537.fna -t 11
or an alternate genetic code (translation table 15):
python examples/translate_sequence_in_all_frames.py -f tests/JQ995537.fna -t 15
You can also translate the complete E. coli K-12 genome, which lets you identify all the ORFs in that genome:
python examples/translate_sequence_in_all_frames.py -f tests/U00096.3.fna.gz -t 11
(yes, you can use gzip files without decompressing them).
The translation itself takes about 0.1 seconds, but starting Python and the other overheads bring the total run time to almost 0.75 seconds.
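If you want to read plain or gzip-compressed FASTA files in your own code, one common approach is to check for the gzip magic bytes and pick the opener accordingly. The sketch below is an assumption about how this can be done with the standard library, not the reader that the example scripts use; read_fasta is a hypothetical name.

```python
import gzip
import os
import tempfile


def read_fasta(path):
    """Yield (identifier, sequence) pairs from a plain or gzipped FASTA file."""
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        seq_id, chunks = None, []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                # Keep only the first word of the header as the identifier.
                seq_id, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
        if seq_id is not None:
            yield seq_id, "".join(chunks)


# Demonstration with a small gzipped file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "demo.fna.gz")
    with gzip.open(gz_path, "wt") as out:
        out.write(">seq1 demo record\nATGAAA\nTGGTAA\n")
    records = list(read_fasta(gz_path))

print(records)
```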
You can also look at the effect of translation tables on the same sequences by running
python examples/average_translation_length.py -f tests/JQ995537.fna # for crassphage
python examples/average_translation_length.py -f tests/U00096.3.fna.gz # for E. coli K-12
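To see why the choice of table changes translation lengths, consider this small sketch. It is not the logic of average_translation_length.py, whose exact calculation is not described here; it simply translates one frame, splits the protein at stop codons, and averages the lengths of the stop-free stretches. The function names are hypothetical, and only three tables are modeled: table 11 (same amino acid assignments as table 1), table 4 (TGA read as Trp), and table 15 (TAG read as Gln).

```python
from itertools import product
from statistics import mean

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def variant(changes):
    """Build a codon lookup from the standard code plus a few reassignments."""
    lookup = dict(zip(CODONS, STANDARD))
    lookup.update(changes)
    return lookup


TABLES = {
    11: variant({}),            # same amino acid assignments as table 1
    4: variant({"TGA": "W"}),   # TGA codes for tryptophan
    15: variant({"TAG": "Q"}),  # TAG codes for glutamine
}


def mean_stop_free_length(dna, table):
    """Average length of the protein stretches between stop codons in frame +1."""
    dna = dna.upper()
    protein = "".join(
        TABLES[table].get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
    fragments = [f for f in protein.split("*") if f]
    return mean(len(f) for f in fragments) if fragments else 0.0


dna = "ATGAAATGAGGGTAGCCCTAA"
for table in (11, 4, 15):
    print(table, mean_stop_free_length(dna, table))
```

Reassigning a stop codon to an amino acid merges neighbouring stretches, so tables 4 and 15 give longer average translations than table 11 on this toy sequence.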
We recommend using our easy Python wrappers to access the translate functions:
from pygenetic_code import translate, six_frame_translation
You can also access our C library directly using the PyGeneticCode module (see below).
Command line applications
pygenetic_code translates DNA sequences either in one reading frame or in all six reading frames using the translation table of your choice.
To translate a sequence in the current reading frame, you can use
pygenetic_code --translate
First, make sure you have a DNA sequence. We provide a few in tests/, including a very short sequence, crAssphage, and E. coli (tests/U00096.3.fna.gz).
Library
Using the C library directly in Python
You can import the C library by importing PyGeneticCode.
There are two main methods that you can call.
The first returns the translation of your DNA sequence in the 5' -> 3' direction. For example, this is the method you might use to translate an ORF.
PyGeneticCode.translate(DNA_sequence, translation_table)
(See examples/translate_a_sequence.py for an example.)
The second method returns the translations of all six reading frames.
PyGeneticCode.translate_six_frames(DNA_sequence, translation_table, verbose)
(See examples/translate_sequence_in_all_frames.py for an example invocation.)
The DNA sequence is the DNA sequence you want to translate. The translation table must be one of the valid translation tables (see pygenetic_code/genetic_code.translation_tables for the valid tables).
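For readers unfamiliar with six-frame translation, the following pure-Python sketch shows the idea: three frames on the forward strand and three on the reverse complement. It is a stand-alone illustration, not the C implementation behind PyGeneticCode.translate_six_frames, and the function names and the returned dictionary layout are assumptions made for this example. It uses the amino acid assignments of translation table 11.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
TABLE_11 = dict(zip(
    CODONS,
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement each base, then reverse the sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate complete codons starting at the given 0-based offset."""
    dna = dna.upper()
    return "".join(
        TABLE_11.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2 and -3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate_frame(dna, offset)
        frames[-(offset + 1)] = translate_frame(rc, offset)
    return frames


for frame, protein in sorted(six_frames("ATGAAATGA").items()):
    print(frame, protein)
```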
Translate a codon
Another way to use the library in your Python application is the translate_codon() function, which has this signature:
amino_acid = translate_codon(codon, translation_table=1, one_letter=False)
The codon is the codon that you want to translate as either an RNA (e.g. AUG) or DNA (e.g. ATG) sequence. The translation_table is your required translation table (see the NCBI website for valid tables), and one_letter is whether to return a three letter amino acid code (e.g. Met or Ter) or a one letter amino acid code (e.g. M or *).
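The behaviour described above (RNA or DNA input, one-letter or three-letter output) can be sketched in a few lines of plain Python. This is an illustration only, not the library's translate_codon(): the name sketch_translate_codon is hypothetical, only tables 1 and 4 are included, and raising ValueError for an unrecognised codon is an assumption of this sketch.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def sketch_translate_codon(codon, translation_table=1, one_letter=False):
    """Translate a single RNA or DNA codon under the chosen table."""
    # Treat RNA and DNA alike by converting U to T before the lookup.
    dna_codon = codon.upper().replace("U", "T")
    lookup = dict(zip(CODONS, TABLES[translation_table]))
    if dna_codon not in lookup:
        raise ValueError(f"Not a valid codon: {codon!r}")
    amino_acid = lookup[dna_codon]
    return amino_acid if one_letter else THREE_LETTER[amino_acid]


print(sketch_translate_codon("AUG"))
print(sketch_translate_codon("TGA", translation_table=4, one_letter=True))
```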
The library provides other ways to access the genetic codes, and those are exemplified in the pytest files in tests/
Viewing translation tables
You can print the translation tables using the pygenetic_code command. There are currently three options:
- json prints the table in machine-readable JSON format.
- difference prints a .tsv file with the differences from the standard (translation table 1) code.
- maxdifference prints a .tsv file with the differences from the most common amino acid. The main difference is that TGA is more frequently tryptophan than a stop.
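As a rough illustration of what a difference listing contains, the sketch below compares two alternative codes with the standard code and prints tab-separated rows. It does not call the pygenetic_code command, and the column layout is an assumption rather than the command's actual output format. Only tables 1, 2, and 23 are included.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
    23: "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def differences(table_id):
    """Return (codon, standard, alternative) for codons that differ from table 1."""
    return [
        (codon, std, alt)
        for codon, std, alt in zip(CODONS, TABLES[1], TABLES[table_id])
        if std != alt
    ]


def as_tsv(table_id):
    """Format the differences as tab-separated text with a header row."""
    rows = [("codon", "table_1", f"table_{table_id}")] + differences(table_id)
    return "\n".join("\t".join(row) for row in rows)


print(as_tsv(2))
print(as_tsv(23))
```

For table 23 the only amino acid difference is TTA, the additional stop codon noted in the list of genetic codes above.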
Citing
Please cite this repository as:
Edwards, Robert A. 2023. pygenetic_code. https://github.com/linsalrob/genetic_codes. DOI: 10.5281/zenodo.10453453