Translates and transcribes an arbitrary genetic sequence, generates FASTA-formatted files, and interfaces with BLAST databases to identify genetic and protein sequences.
Project description
Installation
The following command installs Codons in a command prompt/terminal environment:
pip install codons
__init__
The Codons object is instantiated in a Python environment:
import codons
cd = codons.Codons(sequence = None, codons_table = 'standard', amino_acids_form = 'full_name', hyphenated = None, verbose = False, printing = True)
sequence str: specifies the genetic sequence that will be processed through subsequent functions, which can alternatively be provided in each function ad hoc.
codons_table str: specifies the framework for translating codons into amino acids, where the standard translation table is used by default.
amino_acids_form str: specifies whether the amino acid full_name, three_letter, or one_letter nomenclature will be used in the protein sequence.
hyphenated bool: specifies whether amino acid residues of the protein sequence are delimited by hyphens, where None defaults to True for amino_acids_form = full_name and amino_acids_form = three_letter and to False for amino_acids_form = one_letter.
verbose & printing bool: specify whether troubleshooting information and MW results, respectively, will be printed.
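The default behavior of hyphenated can be illustrated with a short sketch. The helper names resolve_hyphenated and join_residues are hypothetical and are not part of Codons; the block only mirrors the rule stated above.

```python
from typing import List, Optional


def resolve_hyphenated(amino_acids_form: str, hyphenated: Optional[bool] = None) -> bool:
    """Hypothetical helper: return the effective hyphenation setting."""
    if hyphenated is not None:
        return hyphenated  # an explicit True/False always wins
    # None falls back to a default that depends on the nomenclature
    return amino_acids_form in ("full_name", "three_letter")


def join_residues(residues: List[str], amino_acids_form: str,
                  hyphenated: Optional[bool] = None) -> str:
    """Hypothetical helper: join residues with or without hyphens."""
    separator = "-" if resolve_hyphenated(amino_acids_form, hyphenated) else ""
    return separator.join(residues)


print(join_residues(["Met", "Ala", "Gly"], "three_letter"))  # Met-Ala-Gly
print(join_residues(["M", "A", "G"], "one_letter"))          # MAG
```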
read_fasta()
A FASTA file is loaded from a local path or a URL link and is parsed into its sequences and descriptions:
sequences, descriptions, fasta_file = cd.read_fasta(fasta_path = None, fasta_link = None)
fasta_path str: The path to a FASTA file that will be loaded, parsed, and returned.
fasta_link str: The URL link to a FASTA file that will be imported, parsed, and returned.
Returns:
sequences & descriptions list: The sequences and descriptions that are contained within the FASTA file.
fasta_file str: The FASTA file as a string that is specified by the path or URL link argument.
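The parsing step can be sketched as follows. This is a minimal illustration of splitting FASTA text into parallel lists, assuming the conventional ">" header lines; parse_fasta is a hypothetical name and is not the parser that Codons uses internally.

```python
from typing import List, Tuple


def parse_fasta(fasta_file: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split FASTA text into sequences and descriptions."""
    sequences: List[str] = []
    descriptions: List[str] = []
    for line in fasta_file.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a header line opens a new record
            descriptions.append(line[1:].strip())
            sequences.append("")
        elif sequences:
            # sequence lines are appended to the most recent record
            sequences[-1] += line
    return sequences, descriptions


example = ">gene one\nATG\nGCC\n>gene two\nAUGUAA\n"
sequences, descriptions = parse_fasta(example)
print(sequences)     # ['ATGGCC', 'AUGUAA']
print(descriptions)  # ['gene one', 'gene two']
```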
make_fasta()
A simple function that returns, and optionally exports, a FASTA-formatted file from the parameterized description and sequence:
fasta_file = cd.make_fasta(sequence, description = 'sequence', export_path = None)
sequence str: The genetic or protein sequence that will constitute the FASTA file.
description str: A description of the sequence that will be the first line of the FASTA file.
export_path str: The path to which the FASTA file will be exported, where None specifies that the file will not be exported.
Returns:
fasta_file str: The FASTA-formatted file as a string, based upon the parameterized sequence and description.
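A minimal sketch of FASTA formatting is given below. The name format_fasta and the 60-character line width are assumptions for illustration; the line width that Codons itself emits is not stated here.

```python
from typing import Optional


def format_fasta(sequence: str, description: str = "sequence",
                 export_path: Optional[str] = None, width: int = 60) -> str:
    """Hypothetical helper: build FASTA text and optionally write it to disk."""
    # wrap the sequence into fixed-width lines (width is an assumed convention)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    fasta_file = ">" + description + "\n" + "\n".join(lines) + "\n"
    if export_path is not None:
        with open(export_path, "w") as out:
            out.write(fasta_file)
    return fasta_file


print(format_fasta("ATGGCCTAA", "example gene"))
```

Passing export_path = None, as in the call above, returns the string without touching the file system.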
transcribe()
A genetic sequence is converted from DNA -> RNA, or RNA -> DNA, where the directionality of the conversion is automatically listed in the FASTA description:
transcribed_sequence = cd.transcribe(sequence = None, description = '', fasta_path = None, fasta_link = None)
sequence str: The genetic sequence that will be transcribed. The sequence is case-insensitive, and can even possess line numbers or column-spaces, which the code ignores. The absence of a passed sequence executes the sequence that is loaded into the Codons object.
description str: A description of the genetic sequence that will be added to the FASTA-formatted output of the function.
fasta_path & fasta_link str: The path or URL link to a FASTA file that will be transcribed.
Returns:
transcribed_sequence str: The transcribed sequence as a single string.
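The conversion can be sketched in a few lines. This block assumes that transcription is a plain T/U substitution on a cleaned, upper-cased sequence and that the presence of U marks the input as RNA; transcribe_sketch is a hypothetical name, and the actual Codons logic may differ.

```python
import re
from typing import Tuple


def transcribe_sketch(sequence: str) -> Tuple[str, str]:
    """Hypothetical helper: return (converted sequence, direction label)."""
    # drop line numbers, spaces, and newlines; normalize the case
    cleaned = re.sub(r"[^A-Za-z]", "", sequence).upper()
    if "U" in cleaned:
        # uracil is present, so the input is treated as RNA
        return cleaned.replace("U", "T"), "RNA -> DNA"
    return cleaned.replace("T", "U"), "DNA -> RNA"


converted, direction = transcribe_sketch("1 atggcc taa\n61 ggg")
print(direction, converted)  # DNA -> RNA AUGGCCUAAGGG
```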
translate()
A genetic sequence is translated into a FASTA-formatted sequence of amino acids for each protein that is coded by the genetic code:
proteins = cd.translate(sequence = None, fasta_path = None, fasta_link = None)
sequence str: The genetic sequence, of either DNA or RNA, that will be translated into a protein sequence. The sequence is case-insensitive, and can even possess line numbers or column-spaces, which the code ignores. The absence of a passed sequence executes the sequence that is loaded into the Codons object.
fasta_path & fasta_link str: The path or URL link to a FASTA file that will be translated.
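Translation can be illustrated with a deliberately small sketch. The block assumes that each protein begins at an AUG start codon and ends at a stop codon, and it uses only a seven-codon excerpt of the standard table; translate_sketch and CODON_TABLE are hypothetical names, not the Codons implementation.

```python
from typing import List, Tuple

# A small excerpt of the standard codon table (full amino acid names)
CODON_TABLE = {
    "AUG": "methionine", "GCC": "alanine", "GGU": "glycine",
    "UUU": "phenylalanine", "UAA": "stop", "UAG": "stop", "UGA": "stop",
}


def translate_sketch(sequence: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (hyphenated proteins, unmatched codons)."""
    # keep letters only, normalize the case, and read DNA as RNA
    rna = "".join(ch for ch in sequence.upper() if ch.isalpha()).replace("T", "U")
    proteins: List[str] = []
    missed: List[str] = []
    position = rna.find("AUG")
    while position != -1:
        residues = []
        while position + 3 <= len(rna):
            codon = rna[position:position + 3]
            position += 3
            amino_acid = CODON_TABLE.get(codon)
            if amino_acid == "stop":
                break
            if amino_acid is None:
                missed.append(codon)  # codon absent from the table excerpt
            else:
                residues.append(amino_acid)
        proteins.append("-".join(residues))
        position = rna.find("AUG", position)  # look for the next gene
    return proteins, missed


proteins, missed = translate_sketch("atggccggttaa cc atgttttag")
print(proteins)  # ['methionine-alanine-glycine', 'methionine-phenylalanine']
```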
blast_protein()
A protein sequence, or a FASTA-formatted file of protein sequences, is searched through the BLAST database of the NIH for information about the protein(s):
blast_results = cd.blast_protein(sequence = None, database = 'nr', description = 'Protein sequence description', fasta_path = None, fasta_link = None, export_name = None, export_directory = None)
sequence str: The genetic sequence, of either DNA or RNA, that will be parsed and translated into a protein sequence. The sequence is case-insensitive, and can even possess line numbers or column-spaces, which the code ignores. The absence of a passed sequence executes the sequence that is loaded into the Codons object.
database str: The BLAST database that will be searched for the protein sequence. Permissible options include: nr, refseq_select, refseq_protein, landmark, swissprot, pataa, pdb, env_nr, tsa_nr.
description str: A description of the genetic sequence that will be added to the FASTA-formatted output of the function.
fasta_path & fasta_link str: The path or URL link to a protein FASTA or multi-FASTA file that will be systematically searched.
export_name & export_directory str: The name of the folder and directory to which the scraped BLAST data will be saved in a file: protein_blast_results.xml. The None values enable the code to construct a unique folder name that describes the contents and saves it to the current working directory.
Returns:
blast_results Bio.Blast.NCBIXML: The BLAST search results in an API-accessible format that facilitates investigating the search content.
blast_nucleotide()
A genetic sequence, or a FASTA-formatted file of genetic sequences, is searched through the BLAST database of the NIH for information about the nucleotide sequence(s):
blast_results = cd.blast_nucleotide(sequence = None, database = 'nt', description = 'Genetic sequence description', export_name = None, export_directory = None)
sequence str: The genetic sequence, of either DNA or RNA, that will be searched. The sequence is case-insensitive, and can even possess line numbers or column-spaces, which the code ignores. The absence of a passed sequence executes the sequence that is loaded into the Codons object.
database str: The BLAST database that will be searched for the nucleotide sequence. Permissible options include: nr, nt, refseq_select, refseq_rna, refseq_representative_genomes, wgs, refseq_genomes, est, SRA, TSA, HTGS, pat, pdb, RefSeq_Gene, gss, dbsts.
description str: A description of the genetic sequence that will be added to the FASTA-formatted output of the function.
fasta_path & fasta_link str: The path or URL link to a nucleotide FASTA or multi-FASTA file that will be systematically searched.
export_name & export_directory str: The name of the folder and directory to which the scraped BLAST data will be saved in a file: nucleotide_blast_results.xml. The None values enable the code to construct a unique folder name that describes the contents and saves it to the current working directory.
Returns:
blast_results Bio.Blast.NCBIXML: The BLAST search results in an API-accessible format that facilitates investigating the search content.
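Because the two BLAST functions accept different database names, a check before a slow remote search can catch a mismatch early. The sketch below copies the permissible options listed for blast_protein() and blast_nucleotide(); check_database is a hypothetical helper and is not part of Codons.

```python
PROTEIN_DATABASES = {
    "nr", "refseq_select", "refseq_protein", "landmark", "swissprot",
    "pataa", "pdb", "env_nr", "tsa_nr",
}
NUCLEOTIDE_DATABASES = {
    "nr", "nt", "refseq_select", "refseq_rna", "refseq_representative_genomes",
    "wgs", "refseq_genomes", "est", "SRA", "TSA", "HTGS", "pat", "pdb",
    "RefSeq_Gene", "gss", "dbsts",
}


def check_database(database: str, kind: str) -> str:
    """Hypothetical helper: validate a database name for a search kind."""
    options = {"protein": PROTEIN_DATABASES, "nucleotide": NUCLEOTIDE_DATABASES}
    if kind not in options:
        raise ValueError(f"kind must be 'protein' or 'nucleotide', not {kind!r}")
    if database not in options[kind]:
        raise ValueError(f"{database!r} is not a permissible {kind} database")
    return database


print(check_database("swissprot", "protein"))  # swissprot
print(check_database("nt", "nucleotide"))      # nt
```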
export()
The genetic sequence and any corresponding protein or nucleotide content from the aforementioned functions, which reside in the Codons object, are exported:
cd.export(export_name = None, export_directory = None)
export_name str: optionally specifies a name for the folder of exported content, where None enables the code to construct a unique folder name from descriptive tags of the contents.
export_directory str: optionally specifies a path to where the content will be exported, where None selects the current working directory.
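One way to obtain a unique export folder is sketched below. The naming scheme (a base name followed by an incrementing suffix) and the default name codons_export are assumptions for illustration only; the scheme that Codons actually applies is not specified here.

```python
import os
import tempfile
from typing import Optional


def unique_export_path(export_name: Optional[str] = None,
                       export_directory: Optional[str] = None,
                       default_name: str = "codons_export") -> str:
    """Hypothetical helper: return a folder path that does not exist yet."""
    # None selects the current working directory
    directory = export_directory if export_directory is not None else os.getcwd()
    base = export_name if export_name is not None else default_name
    candidate = os.path.join(directory, base)
    count = 1
    # append an increasing suffix until the path is unused (assumed scheme)
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}-{count}")
        count += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    first = unique_export_path("run", tmp)
    os.mkdir(first)
    second = unique_export_path("run", tmp)
    print(os.path.basename(first), os.path.basename(second))  # run run-1
```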
Accessible content
The Codons object retains numerous components that are accessible to the user:
genes dict: A dictionary of all genes in the genetic sequence, with sub-content of a list of all coding codons in the gene and the corresponding protein sequence and mass.
protein_fasta & gene_fasta str: Assembled FASTA-formatted files for the translated proteins from a parameterized genetic sequence and for a genetic sequence, respectively.
transcribed_sequence & sequence str: The transcribed genetic sequence from the transcribe() function, and the genetic sequence that is used in any of the Codons functions.
amino_acid_synonyms dict: The synonyms for each amino acid, with keys of the full amino acid name.
codons_table & changed_codons dict: The translation table between genetic codons and amino acid residues, which is accessed with case-insensitivity, and the translation conversions that were changed based upon the user’s specification.
missed_codons dict: A collection of the codons that were parsed yet never matched with a codons_table key.
paths & parameters dict: Collections of the paths and parameters that are defined for the simulation.
export_path str: The complete export path for the Codons contents.
protein_blast_results & nucleotide_blast_results str: The BLAST search results for the passed proteins and nucleotides, respectively.
Execution
Codons is executed through the following sequence of the aforementioned functions, which is exemplified in the example Notebook of our GitHub repository:
import codons
cd = codons.Codons(sequence = None, codons_table = 'standard', amino_acids_form = 'full_name', hyphenated = None, verbose = False, printing = True)
# < Codons function(s) >
cd.export(export_name = None, export_directory = None)
Source Distribution
File details
Details for the file Codons-0.0.7.tar.gz.
File metadata
- Download URL: Codons-0.0.7.tar.gz
- Upload date:
- Size: 39.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0d0b7cfce3d99910acbae38548f6d7e9642800ccfc9d7f6340132dfac84b7919
MD5 | 83413aaedce10776deafe09556758cba
BLAKE2b-256 | b320fd95d341731d0f8b4604dbb2885bea1b56b0e2e4cbe97d63501a9c75958a