SynGenes is a Python class for standardizing mitochondrial/chloroplast gene nomenclatures.
Project description
Contents Overview
- System Overview
- License
- The Hitchhiker's Guide to SynGenes
- SynGenes Development Team
- Citing SynGenes
- Contact
System Overview
:rocket: Go to Contents Overview
SynGenes is a Python class designed to standardize gene nomenclatures for mitochondrial and chloroplast genes. It recognizes various nomenclature variations and converts them into a consistent, standardized format. This tool simplifies the integration and comparison of genetic data from different sources by unifying gene names.
Licence
:rocket: Go to Contents Overview
SynGenes is released under the MIT License. This license permits reuse within proprietary software provided that all copies of the licensed software include a copy of the MIT License terms and the copyright notice.
For more details, please see the MIT License.
The Hitchhiker's Guide to SynGenes
Getting Started
:rocket: Go to Contents Overview
-
Prerequisites
Before you run SynGenes, make sure you have the following prerequisites installed on your system:
- Python Environment
- Python version 3.10 or higher
- conda (optional)
- Dependencies (automatically installed with pip)
requests
pandas
biopython
openpyxl
Installation
:rocket: Go to Contents Overview
There are two ways to install SynGenes:
- Through pip: Install SynGenes dependencies directly using pip:
- 1.1. Open the Terminal or Python Environment
- 1.2. Execute the following command:
pip install SynGenes
or
pip install SynGenes --upgrade
[!NOTE] This command will install SynGenes and its dependencies in your Python environment.
- By cloning the SynGenes GitHub repository:
- 2.1. Open the Terminal or Python Environment
- 2.2. Execute the following command:
git clone https://github.com/luanrabelo/SynGenes.git
cd SynGenes
pip install -r requirements.txt
[!NOTE] This will clone the repository, then you should navigate to the cloned directory to install SynGenes and its dependencies using pip.
Functions
__init__
:rocket: Go to Contents Overview
__init__(self, **kwargs)
Initializes the SynGenes class. This function is the constructor of the class and is called when a new instance of the SynGenes class is created.
When an instance of the SynGenes class is created, the constructor checks if the SynGenes.xlsx
database exists at the specified path.
If it does not exist, it will attempt to create the SynGenes directory and download the database from the GitHub repository.
If verbose is True, status messages will be printed in the terminal to inform the user about the progress of these operations.
Parameters:
verbose (bool)
: IfTrue
, messages will be printed during execution. The default isFalse
.
Returns:
None
Notes:
- This function requires the
requests
library to be imported. - The
SynGenes
database is available at github.com/luanrabelo/SynGenes.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes(verbose=False)
update
:rocket: Go to Contents Overview
update(self, **kwargs)
Updates the SynGenes database by downloading it from the GitHub repository's stable branch. If an existing database is found, it is removed before downloading the new one.
The update
function checks if the SynGenes.xlsx
database file exists in the user’s computer.
If it does, the file is removed.
Then, the function attempts to download the latest version of the database from the specified GitHub repository URL.
If the verbose parameter is set to True, the function will print messages to the console to inform the user of the progress, including the removal of the old database and the download of the new one.
Parameters:
verbose (bool)
: IfTrue
, messages will be printed during execution. The default isFalse
.
Returns:
- The updated
SynGenes
database saved in theSynGenes
folder.
Notes:
- This function requires the
requests
library to be imported. - The
SynGenes
database is available at github.com/luanrabelo/SynGenes.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes()
sg.update()
fix_gene_name
:rocket: Go to Contents Overview
fix_gene_name(self, **kwargs)
Corrects the gene name according to the SynGenes database, ensuring it adheres to the standardized nomenclature.
The fix_gene_name
function takes a gene name and corrects it based on the entries in the SynGenes database.
It supports both mitochondrial (mt) and chloroplast (cp) genes.
If the provided gene name is found in the database, it is replaced with the standardized short name.
If not found, the original name is returned, and a log entry is created.
The function provides verbose output if the verbose parameter is set to True.
Parameters:
geneName (str)
: The gene name to be corrected.type (str)
: The type of gene (mt
for Mitochondrial,cp
for Chloroplast). The default ismt
.verbose (bool)
: If set toTrue
, messages will be printed during execution. The default isFalse
.
Returns:
ShortName (str)
: The corrected gene name.
Notes:
- This function requires the
pandas
library to be imported. - The
SynGenes
database can be found at github.com/luanrabelo/SynGenes.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes()
# Mitocondrial
_geneName = sg.fix_gene_name(geneName='cytochrome c oxidase subunit I', type='mt')
print(_geneName)
# Output: 'COI'
# Chloroplast
_geneName = sg.fix_gene_name(geneName='ATPsynthaseCF1 alpha subunit', type='cp')
print(_geneName)
# Output: 'atpA'
build_query
:rocket: Go to Contents Overview
build_query(self, **kwargs)
Builds a query for Entrez search in GenBank or PubMed using the SynGenes database.
The build_query
function constructs a query string that can be used for searching specific gene information in GenBank or PubMed databases.
It ensures that the gene name is in the correct format by referencing the predefined lists for mitochondrial and chloroplast genes.
The search type is also validated against a list of acceptable formats.
If the verbose parameter is True, the function will print informative messages during the query construction process.
Parameters:
geneName (str)
: The gene name to search. The gene name must be in the correct format; use thefix_gene_name()
function to correct the gene name.type (str)
: The type of gene (mt
for Mitochondrial,cp
for Chloroplast). The default ismt
.searchType (str)
: The type of search (Title, Abstract, All Fields, MeSH Terms). The default is All Fields.verbose (bool)
: IfTrue
, messages will be printed during execution. The default isFalse
.
Returns:
query (str)
: The query for Entrez search in GenBank or PubMed.
Notes:
- This function requires the
pandas
library to be imported. - The
SynGenes
database is available at github.com/luanrabelo/SynGenes. - Predefined lists
_listGenes_mt
and_listGenes_cp
contain the correct formats for mitochondrial and chloroplast genes, respectively. - The
_listTypes
contains the valid formats for the search type.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes()
query = sg.build_query(geneName='COI', type='mt', searchType='Title')
print(query)
# Output: '"COI"[Title] OR "cytochrome c oxidase subunit I"[Title] OR "cytochrome c oxidase subunit 1"[Title] OR "chytochrome c oxidase subunit I"[Title]...'
build_json
:rocket: Go to Contents Overview
build_json(self, **kwargs)
Creates a JSON file containing the data from the SynGenes database.
The build_json
function generates a JSON file that encapsulates the SynGenes database’s data.
It takes the name of the file and the path where it should be saved as parameters.
If the file already exists, it is removed, and a new one is created.
The function provides verbose output if the verbose parameter is set to True, informing the user about the file creation process.
During the creation of the JSON file, the function writes the data for mitochondrial and chloroplast genes into separate objects within the file. It also records the date when the file was updated. The verbose output will notify the user when the JSON file is being created and once it has been successfully created.
Parameters:
fileName (str)
: The name of the JSON file. The default isSynGenes.js
.pathSaveFile (str)
: The path where the JSON file will be saved. The default is theSynGenes
folder in the current working directory.verbose (bool)
: If set toTrue
, messages will be printed during execution. The default isFalse
.
Returns:
- A
SynGenes.js
file in theSynGenes
folder.
Notes:
- This function requires the
pandas
library to be imported. - The
SynGenes
database is available at github.com/luanrabelo/SynGenes. - The function checks if the specified JSON file already exists and removes it before creating a new one.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes()
sg.build_json()
version_syngenes
version_syngenes(self)
:rocket: Go to Contents Overview
Displays the current version of the SynGenes database.
The version_syngenes
function outputs the version number of the SynGenes database. It does not take any parameters and does not return any value. Instead, it prints the version number directly to the console.
Parameters:
- None
Returns:
- None
Notes:
- The
SynGenes
database is available at github.com/luanrabelo/SynGenes.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes()
version = sg.version_syngenes()
print(version)
# Output: '1.0'
cite_syngenes
cite_syngenes(self)
:rocket: Go to Contents Overview
Provides the citation format for the SynGenes database.
The cite_syngenes
function outputs the correct citation format for referencing the SynGenes database in academic work or publications.
It does not take any parameters and does not return any value. Instead, it prints the citation instructions directly to the console.
Parameters:
- None
Returns:
- None
Notes:
- The
SynGenes
database is available at github.com/luanrabelo/SynGenes.
Usage Example:
from SynGenes import SynGenes
sg = SynGenes()
howCite = sg.cite_syngenes()
print(howCite)
# Output: Please, cite the SynGenes database as: ...
Web Form for SynGenes
:rocket: Go to Contents Overview
We have developed a user-friendly web form available at (https://luanrabelo.github.io/SynGenes) for researchers who wish to perform individual searches using various names associated with the same gene. This web form generates a command that incorporates multiple names, enabling precise searches on platforms such as the National Center for Biotechnology Information (NCBI) - GenBank and PubMed Central.
SynGenes Development Team
:rocket: Go to Contents Overview
- Luan Rabelo
- Clayton Sodré
- Rodrigo Sousa
- Luciana Watanabe
- Grazielle Gomes
- Iracilda Sampaio
- Marcelo Vallinoto
Citing SynGenes
:rocket: Go to Contents Overview
When referencing the SynGenes class, please cite it appropriately in your academic or professional work.
Rabelo, L.P., Sodré, D., de Sousa, R.P.C. et al. SynGenes: a Python class for standardizing nomenclatures of mitochondrial and chloroplast genes and a web form for enhancing searches for evolutionary analyses. BMC Bioinformatics 25, 160 (2024). https://doi.org/10.1186/s12859-024-05781-y
Contact
:rocket: Go to Contents Overview
For reporting bugs, requesting assistance, or providing feedback, please reach out to Luan Rabelo:
luanrabelo@outlook.com
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file SynGenes-1.1.0.tar.gz
.
File metadata
- Download URL: SynGenes-1.1.0.tar.gz
- Upload date:
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 39046ecbc6676f456202d8c4ee27b90f8b1c8dc27b5a5a194b8e5a2dde9c7d7f |
|
MD5 | 741ea6baaee8d65b559057629981c9ec |
|
BLAKE2b-256 | 9bf8c20145a909fa455e93ea17f11712ef64ac0b9697bbdc8ff31b163d881468 |
File details
Details for the file SynGenes-1.1.0-py3-none-any.whl
.
File metadata
- Download URL: SynGenes-1.1.0-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1e7727bf13e6b424516b36028d4e8cf55c6e7a3ef6323aaf363f91611a098671 |
|
MD5 | bd040d0877c84b6195b23494d37908c1 |
|
BLAKE2b-256 | 23dce19f0696f39389514752131261522ce49e677b416103ddce94cebaffa850 |