SynGenes is a Python class for standardizing gene nomenclatures. It recognizes the different variations of gene names and converts them into a standardized form.
SynGenes
Welcome to the SynGenes documentation!
Getting Started
1. Install SynGenes
Before installing SynGenes, make sure the following prerequisites are available:
- Python environment
- Dependencies:
  - requests *
  - pandas *
  - openpyxl *
* required
These dependencies are installed automatically by the pip commands below.
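To confirm that the required dependencies can be imported in the active environment, a quick check like the following may help. This is a minimal sketch. The helper name `missing_modules` is hypothetical and not part of SynGenes, and it assumes the import names match the package names listed above.

```python
import importlib.util

# Import names of the required dependencies listed above
REQUIRED_MODULES = ("requests", "pandas", "openpyxl")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found (hypothetical helper)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All required dependencies are installed.")
```

`find_spec` only locates the module and does not import it, so the check is cheap and has no side effects.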
There are three ways to install SynGenes:
1.1. Through pip: You can install SynGenes directly through pip using the following command:
pip install SynGenes
This will install SynGenes and its dependencies in your Python environment.
1.2. By cloning the source code from GitHub: You can clone the source code of SynGenes from GitHub using the following command:
git clone https://github.com/luanrabelo/SynGenes.git
This clones the repository to your local machine. Next, move into the cloned directory and use pip to install the dependencies listed in requirements.txt:
cd SynGenes
pip install -r requirements.txt
1.3. Through conda: You can install SynGenes through conda using the following command:
conda create -n SynGenes -c conda-forge -c bioconda SynGenes
conda activate SynGenes
This creates a new conda environment named SynGenes, installs SynGenes and its dependencies into it, and activates it.
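After installing with pip or conda, you can check which SynGenes version is installed by using the standard library (Python 3.8+). This is a sketch that assumes the distribution is registered under the name used in the pip command, `SynGenes`. The helper `installed_version` is hypothetical. If requirements.txt lists only the dependencies, installing from a clone with it does not register a SynGenes distribution, and in that case the check reports that SynGenes is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="SynGenes"):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"SynGenes {found}" if found else "SynGenes is not installed in this environment")
```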
2. Usage
from SynGenes import SynGenes
geneNames = SynGenes()
To update the SynGenes database on your computer, run:
geneNames.updateSynGenes()
This command deletes the local database and downloads a new copy from the SynGenes repository.
Basic Example
# Mitochondrial
# Convert nomenclature 'cytochrome oxidase subunit I' to 'COI'
# Convert nomenclature 'cytochrome c oxidase subunit I' to 'COI'
FullGeneName1 = "cytochrome oxidase subunit I"
FullGeneName1 = geneNames.FixGeneName(geneName=FullGeneName1, type='mt')
FullGeneName2 = "cytochrome c oxidase subunit I"
FullGeneName2 = geneNames.FixGeneName(geneName=FullGeneName2, type='mt')
print(FullGeneName1)
print(FullGeneName2)
# output
COI
COI
# Chloroplast
# Convert nomenclature 'ATPsynthaseCF1 alpha subunit' to 'atpA'
# Convert nomenclature 'ATP synthase CF1, subunit alpha' to 'atpA'
FullGeneName1 = "ATPsynthaseCF1 alpha subunit"
FullGeneName1 = geneNames.FixGeneName(geneName=FullGeneName1, type='cp')
FullGeneName2 = "ATP synthase CF1, subunit alpha"
FullGeneName2 = geneNames.FixGeneName(geneName=FullGeneName2, type='cp')
print(FullGeneName1)
print(FullGeneName2)
# output
atpA
atpA
Here, FixGeneName takes two string parameters: geneName, the name to standardize, and type, which is 'mt' for mitochondrial genes or 'cp' for chloroplast genes.
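When you have many names to standardize, you can wrap the FixGeneName calls in a small helper that checks the genome type first and returns a lookup table. This is a minimal sketch. `standardize_names` and `GeneNameFixer` are hypothetical names that are not part of SynGenes, and the helper only assumes the `FixGeneName(geneName=..., type=...)` call shown in the Basic Example.

```python
from typing import Dict, Iterable, Protocol

# The two genome types documented for FixGeneName
VALID_TYPES = {"mt", "cp"}


class GeneNameFixer(Protocol):
    """Anything exposing FixGeneName(geneName=..., type=...), such as a SynGenes instance."""

    def FixGeneName(self, geneName: str, type: str) -> str: ...


def standardize_names(fixer: GeneNameFixer, names: Iterable[str], genome_type: str) -> Dict[str, str]:
    """Map each original gene name to its standardized form (hypothetical helper).

    Raises ValueError for a genome type other than 'mt' or 'cp' before any lookup is made.
    """
    if genome_type not in VALID_TYPES:
        raise ValueError(f"genome_type must be one of {sorted(VALID_TYPES)}, got {genome_type!r}")
    return {name: fixer.FixGeneName(geneName=name, type=genome_type) for name in names}
```

With a real instance, `standardize_names(SynGenes(), ["cytochrome oxidase subunit I", "cytochrome c oxidase subunit I"], "mt")` calls FixGeneName once for each name. According to the Basic Example, both names map to `COI`.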
Usage Example with Biopython
from SynGenes import SynGenes
from Bio import SeqIO  # Biopython is not a SynGenes dependency; install it with: pip install biopython
# Create a SynGenes instance
geneNames = SynGenes()
# Update SynGenes database
geneNames.updateSynGenes()
# Read Example_File_1.gb with Biopython's SeqIO
gbFile = SeqIO.read("Example_File_1.gb", "genbank")
for feature in gbFile.features:
    if feature.type == "CDS" or feature.type == "rRNA":
        # Print the original gene names
        print(feature.qualifiers['product'][0])
# Output
12S ribosomal RNA
16S ribosomal RNA
NADH dehydrogenase subunit 1
NADH dehydrogenase subunit 2
cytochrome c oxidase subunit I
cytochrome c oxidase subunit II
ATP synthase F0 subunit 8
ATP synthase F0 subunit 6
cytochrome c oxidase subunit III
NADH dehydrogenase subunit 3
NADH dehydrogenase subunit 4L
NADH dehydrogenase subunit 4
NADH dehydrogenase subunit 5
NADH dehydrogenase subunit 6
cytochrome b
# Print gene names standardized by SynGenes
for feature in gbFile.features:
    if feature.type == "CDS" or feature.type == "rRNA":
        print(geneNames.FixGeneName(geneName=feature.qualifiers['product'][0], type='mt'))
# Output
12S
16S
NADH-1
NADH-2
COI
COII
ATP-8
ATP-6
COIII
NADH-3
NADH-4L
NADH-5
NADH-6
CYTB
# Read Example_File_2.gb with Biopython's SeqIO
gbFile2 = SeqIO.read("Example_File_2.gb", "genbank")
for feature in gbFile2.features:
    if feature.type == "CDS" or feature.type == "rRNA":
        # Print the original gene names
        print(feature.qualifiers['product'][0])
# Output
12S ribosomal RNA subunit
16S ribosomal RNA subunit
NADH dehydrogenase subunit I
NADH dehydrogenase subunit II
cytochrome oxidase subunit I
cytochrome oxidase subunit II
ATPase subunits 8
ATPase subunits 6
cytochrome oxidase subunit III
NADH dehydrogenase subunit III
NADH dehydrogenase subunit IVL
NADH dehydrogenase subunit IV
NADH dehydrogenase subunit V
NADH dehydrogenase subunit VI
cytochrome b protein
# Print gene names standardized by SynGenes
for feature in gbFile2.features:
    if feature.type == "CDS" or feature.type == "rRNA":
        print(geneNames.FixGeneName(geneName=feature.qualifiers['product'][0], type='mt'))
# Output
12S
16S
NADH-1
NADH-2
COI
COII
ATP-8
ATP-6
COIII
NADH-3
NADH-4L
NADH-5
NADH-6
CYTB
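You can combine the two loops in each example into a single pass that keeps every original product name next to its standardized form. This makes it easy to review the mapping for a new genome. This is a minimal sketch. `standardized_products` is a hypothetical helper. Skipping features that have no `product` qualifier is an assumption of this sketch and is not documented SynGenes behavior.

```python
from typing import List, Tuple

# Feature types used in the examples above
DEFAULT_FEATURE_TYPES = ("CDS", "rRNA")


def standardized_products(record, fixer, genome_type="mt",
                          feature_types=DEFAULT_FEATURE_TYPES) -> List[Tuple[str, str]]:
    """Return (original product, standardized name) pairs for a parsed GenBank record.

    `record` is expected to be a Biopython SeqRecord (for example from SeqIO.read) and
    `fixer` a SynGenes instance. Features without a 'product' qualifier are skipped.
    """
    pairs = []
    for feature in record.features:
        if feature.type not in feature_types:
            continue
        products = feature.qualifiers.get("product")
        if not products:
            continue  # e.g. a feature annotated only with a 'gene' qualifier
        original = products[0]
        pairs.append((original, fixer.FixGeneName(geneName=original, type=genome_type)))
    return pairs
```

For example, `for original, standard in standardized_products(gbFile2, geneNames): print(f"{original}\t{standard}")` prints both names side by side for Example_File_2.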
The two genomes above use different names for the same genes, and SynGenes converts both sets of names into the same standardized forms. You can then write these standardized names to FASTA files or to the input files of other tools such as CREx. This keeps gene names consistent and compatible across subsequent analyses.
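One way to write standardized names to a FASTA file is to use a small plain-Python writer that takes (header, sequence) pairs. This is a sketch. `write_fasta` and the output file name are hypothetical, and the 60-character line width is an arbitrary choice. Biopython's `SeqIO.write` with `SeqRecord` objects is an equally valid alternative.

```python
from typing import Iterable, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], path: str, width: int = 60) -> None:
    """Write (header, sequence) pairs to a FASTA file, wrapping sequences at `width` characters."""
    with open(path, "w") as handle:
        for header, sequence in records:
            handle.write(f">{header}\n")
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


# Possible usage with the Biopython example above (requires gbFile and geneNames):
# records = [
#     (geneNames.FixGeneName(geneName=f.qualifiers['product'][0], type='mt'),
#      str(f.extract(gbFile.seq)))
#     for f in gbFile.features if f.type in ("CDS", "rRNA")
# ]
# write_fasta(records, "Example_File_1_standardized.fasta")
```

`SeqFeature.extract` returns the feature's subsequence from the parent record, so each FASTA header carries the standardized gene name instead of the original product text.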
3. Web Form
We also created a web form (https://luanrabelo.github.io/SynGenes) for researchers who want to run individual searches using the different names associated with the same gene. The web form generates a search command that combines these names, enabling precise searches on the National Center for Biotechnology Information (NCBI) GenBank platform.
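The general idea of combining several names for one gene into a single search can be sketched as an OR query. This is a hypothetical illustration and not the command the web form actually produces. `build_entrez_query` is an invented helper, and the default `Gene Name` field tag is an assumption. Full product names may match better with another field, such as `Title` or `All Fields`.

```python
from typing import Iterable


def build_entrez_query(names: Iterable[str], field: str = "Gene Name") -> str:
    """Join alternative gene names into one parenthesized OR query, each quoted and field-tagged."""
    terms = [f'"{name}"[{field}]' for name in names]
    return "(" + " OR ".join(terms) + ")"


if __name__ == "__main__":
    print(build_entrez_query(["COI", "cytochrome c oxidase subunit I", "cytochrome oxidase subunit I"]))
```

You can paste the resulting string into the NCBI search box, optionally followed by an extra restriction such as `AND "Chordata"[Organism]`, or pass it as the `term` argument of Biopython's `Entrez.esearch(db="nucleotide", term=...)`.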
Developers
Citation
Rabelo et al.
Source Distribution
SynGenes-1.0.1.tar.gz (6.9 kB)