SynGenes is a Python class for standardizing gene nomenclatures. It recognizes the different nomenclature variations of a gene and converts them into a standardized form.

SynGenes

Welcome to SynGenes documentation!

Getting Started


1. Install SynGenes


Before installing SynGenes, you need to make sure that you have the following prerequisites installed:

These dependencies are installed automatically by the pip commands below.

* mandatory

There are three ways to install SynGenes:

1.1. Through pip: You can install SynGenes directly through pip using the following command:

pip install SynGenes
This will install SynGenes and its dependencies in your Python environment.

1.2. By cloning the source code from GitHub: You can clone the source code of SynGenes from GitHub using the following command:

git clone https://github.com/luanrabelo/SynGenes.git
This will clone the repository to your local machine. You can then navigate to the cloned directory and install SynGenes and its dependencies using pip:
cd SynGenes
pip install -r requirements.txt

1.3. Through conda: You can install SynGenes through conda using the following command:

conda create -n SynGenes -c conda-forge -c bioconda SynGenes
conda activate SynGenes
This will install SynGenes and its dependencies in your conda environment.

Usage

from SynGenes import SynGenes
geneNames = SynGenes()

To update the SynGenes database on your computer, run:

geneNames.updateSynGenes()
This command will delete the database from your computer and download a new one from the SynGenes repository.

Basic Example

# Mitochondrial
# Convert nomenclature 'cytochrome oxidase subunit I' to 'COI'
# Convert nomenclature 'cytochrome c oxidase subunit I' to 'COI'
FullGeneName1 = "cytochrome oxidase subunit I"
FullGeneName1 = geneNames.FixGeneName(geneName=FullGeneName1, type='mt')
FullGeneName2 = "cytochrome c oxidase subunit I"
FullGeneName2 = geneNames.FixGeneName(geneName=FullGeneName2, type='mt')
print(FullGeneName1)
print(FullGeneName2)
# Output:
# COI
# COI


# Chloroplast 
# Convert nomenclature 'ATPsynthaseCF1 alpha subunit' to 'atpA'
# Convert nomenclature 'ATP synthase CF1, subunit alpha' to 'atpA'
FullGeneName1 = "ATPsynthaseCF1 alpha subunit"
FullGeneName1 = geneNames.FixGeneName(geneName=FullGeneName1, type='cp')
FullGeneName2 = "ATP synthase CF1, subunit alpha"
FullGeneName2 = geneNames.FixGeneName(geneName=FullGeneName2, type='cp')
print(FullGeneName1)
print(FullGeneName2)
# Output:
# atpA
# atpA
Here, the user needs to provide the geneName parameter (str) and the type parameter (str), where type='mt' for mitochondrial genes and type='cp' for chloroplast genes.
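
The parameter rules above can be sketched as a small batch helper. This is only an illustration: `standardize_names`, `VALID_TYPES`, and `StubFixer` are hypothetical names that are not part of SynGenes, and the check assumes 'mt' and 'cp' are the only accepted values, which is all this text describes. `StubFixer` stands in for a real `SynGenes()` instance so the sketch runs without the database.

```python
from typing import Iterable, List

# Assumption: only the two values described in the text are valid.
VALID_TYPES = ("mt", "cp")


def standardize_names(fixer, names: Iterable[str], genome_type: str) -> List[str]:
    """Call fixer.FixGeneName on every name after checking genome_type."""
    if genome_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {genome_type!r}")
    return [fixer.FixGeneName(geneName=name, type=genome_type) for name in names]


class StubFixer:
    """Hypothetical stand-in for SynGenes(); it only knows two names."""

    _table = {
        "cytochrome oxidase subunit I": "COI",
        "cytochrome c oxidase subunit I": "COI",
    }

    def FixGeneName(self, geneName: str, type: str) -> str:
        # Unknown names are returned unchanged in this stub.
        return self._table.get(geneName, geneName)


if __name__ == "__main__":
    names = ["cytochrome oxidase subunit I", "cytochrome c oxidase subunit I"]
    print(standardize_names(StubFixer(), names, "mt"))
```

With the real class, `StubFixer()` would presumably be replaced by the `geneNames` object created in the Usage section.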

Usage Example with Biopython

from SynGenes import SynGenes
from Bio import SeqIO

# Start SynGenes class
geneNames = SynGenes()
# Update SynGenes database 
geneNames.updateSynGenes()
# Read Example_File_1 with Biopython's SeqIO
gbFile = SeqIO.read("Example_File_1.gb", "genbank")
for feature in gbFile.features:
    if feature.type == "CDS" or feature.type == "rRNA":
        # Print the gene name as annotated in the file
        print(feature.qualifiers['product'][0])
        # Output (collected over all features):
        # 12S ribosomal RNA
        # 16S ribosomal RNA
        # NADH dehydrogenase subunit 1
        # NADH dehydrogenase subunit 2
        # cytochrome c oxidase subunit I
        # cytochrome c oxidase subunit II
        # ATP synthase F0 subunit 8
        # ATP synthase F0 subunit 6
        # cytochrome c oxidase subunit III
        # NADH dehydrogenase subunit 3
        # NADH dehydrogenase subunit 4L
        # NADH dehydrogenase subunit 4
        # NADH dehydrogenase subunit 5
        # NADH dehydrogenase subunit 6
        # cytochrome b
        # Print the gene name standardized by SynGenes
        print(geneNames.FixGeneName(geneName=feature.qualifiers['product'][0], type='mt'))
        # Output (collected over all features):
        # 12S
        # 16S
        # NADH-1
        # NADH-2
        # COI
        # COII
        # ATP-8
        # ATP-6
        # COIII
        # NADH-3
        # NADH-4L
        # NADH-5
        # NADH-6
        # CYTB

# Read Example_File_2 with Biopython's SeqIO
gbFile2 = SeqIO.read("Example_File_2.gb", "genbank")
for feature in gbFile2.features:
    if feature.type == "CDS" or feature.type == "rRNA":
        # Print the gene name as annotated in the file
        print(feature.qualifiers['product'][0])
        # Output (collected over all features):
        # 12S ribosomal RNA subunit
        # 16S ribosomal RNA subunit
        # NADH dehydrogenase subunit I
        # NADH dehydrogenase subunit II
        # cytochrome oxidase subunit I
        # cytochrome oxidase subunit II
        # ATPase subunits 8
        # ATPase subunits 6
        # cytochrome oxidase subunit III
        # NADH dehydrogenase subunit III
        # NADH dehydrogenase subunit IVL
        # NADH dehydrogenase subunit IV
        # NADH dehydrogenase subunit V
        # NADH dehydrogenase subunit VI
        # cytochrome b protein
        # Print the gene name standardized by SynGenes
        print(geneNames.FixGeneName(geneName=feature.qualifiers['product'][0], type='mt'))
        # Output (collected over all features):
        # 12S
        # 16S
        # NADH-1
        # NADH-2
        # COI
        # COII
        # ATP-8
        # ATP-6
        # COIII
        # NADH-3
        # NADH-4L
        # NADH-5
        # NADH-6
        # CYTB

The two genomes in this example annotate the same genes with different nomenclatures, and SynGenes maps both to the same standardized names. The standardized forms can then be written, for example, to FASTA files or to the input files of other tools such as CREx. This keeps genomic data consistent and compatible across subsequent analyses.
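
As one illustration of writing standardized names to a FASTA file, the sketch below uses only the Python standard library. `write_fasta` is a hypothetical helper, not part of SynGenes, and the sequences are made-up placeholders. In practice the names would come from `FixGeneName` and the sequences from the GenBank features.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write (name, sequence) pairs as FASTA, wrapping sequences at `width`."""
    for name, sequence in records:
        handle.write(f">{name}\n")
        # Slice the sequence into fixed-width lines.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


if __name__ == "__main__":
    # Placeholder sequences; the headers are standardized gene names.
    records = [("COI", "ATGC" * 20), ("CYTB", "ATGA")]
    buffer = io.StringIO()
    write_fasta(records, buffer)
    print(buffer.getvalue(), end="")
```

Passing an open file object instead of `io.StringIO()` writes the same content to disk.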


2. Web Form

We also created a web form (https://luanrabelo.github.io/SynGenes) for researchers who wish to run individual searches using the different names associated with the same gene. The web form generates a command that combines these names, enabling precise searches on the National Center for Biotechnology Information (NCBI) GenBank platform.
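
The exact command produced by the web form is not shown here, so the following is only a sketch of the general idea: joining several names for one gene into a single search term with `OR`, using NCBI's field-tag syntax. `build_ncbi_query` is a hypothetical function, and the choice of the `[All Fields]` tag is an assumption.

```python
from typing import Iterable


def build_ncbi_query(names: Iterable[str], field: str = "All Fields") -> str:
    """Join gene-name variants into one OR-combined NCBI search term."""
    # Quote each name so multi-word names are searched as phrases.
    terms = [f'"{name}"[{field}]' for name in names]
    return " OR ".join(terms)


if __name__ == "__main__":
    variants = ["COI", "cytochrome oxidase subunit I", "cytochrome c oxidase subunit I"]
    print(build_ncbi_query(variants))
```

The resulting string can be pasted into the NCBI search box. A different field tag can be passed through the `field` argument.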


Authors

Developers

Citation

Rabelo et al.

