This package aids in the analysis of orthologous genes.

Project description

OrthoEvolution

OrthoEvolution is an easy-to-use, comprehensive Python package that aids in the analysis and visualization of comparative evolutionary genetics projects, such as the inference of orthologs.

Current Version: 1.0.0b2

Overview

This package focuses on inferring orthologs using NCBI's BLAST, various sequence alignment strategies, and phylogenetic analyses with tools such as PAML, PhyML, and ete3.
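Reciprocal best hits (RBH) is one common way to turn BLAST results into ortholog candidates: gene A in one genome and gene B in another are paired only if each is the other's best hit. The sketch below illustrates that general technique in plain Python. It is not necessarily the exact criterion OrthoEvol applies, and the function name and toy gene mappings are hypothetical.

```python
def reciprocal_best_hits(
    a_to_b: dict[str, str],
    b_to_a: dict[str, str],
) -> set[tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that are each other's best BLAST hit.

    a_to_b maps each query gene from genome A to its best hit in genome B;
    b_to_a maps each query gene from genome B to its best hit in genome A.
    """
    pairs = set()
    for gene_a, gene_b in a_to_b.items():
        # Keep the pair only if the reverse search points back to gene_a.
        if b_to_a.get(gene_b) == gene_a:
            pairs.add((gene_a, gene_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical best-hit tables, for illustration only.
    human_to_mouse = {"HTR1A": "Htr1a", "DRD2": "Drd2", "ADRB1": "Adrb2"}
    mouse_to_human = {"Htr1a": "HTR1A", "Drd2": "DRD2", "Adrb2": "ADRB2"}
    print(sorted(reciprocal_best_hits(human_to_mouse, mouse_to_human)))
    # [('DRD2', 'Drd2'), ('HTR1A', 'Htr1a')]
```

ADRB1 is dropped because its best mouse hit (Adrb2) points back to a different human gene, which is exactly the one-directional match RBH is designed to filter out.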

Ultimately, the goal of this project is to create a reusable pipeline for inferring orthologs that ensures reproducibility and improves the management and analysis of potentially large datasets. The Cookies, Manager, Pipeline, and Tools modules act as a framework for our workflow, while the Orthologs module provides functions specific to our various ortholog inference projects.

See our Read the Docs documentation and this related paper for more insight into this project and Python package.

Installation

Use one of the methods below to install this package. Python 3.9 or higher is required.
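To confirm that an interpreter meets this requirement before installing, a small standard-library check is enough. This is a sketch; the helper name python_is_supported is our own and is not part of OrthoEvol.

```python
import sys

# Minimum interpreter version stated in the installation notes.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if (major, minor) of version_info is at least minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK:", sys.version.split()[0])
    else:
        sys.exit("OrthoEvol requires Python 3.9 or higher")
```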

PyPI

pip install --upgrade pip
pip install OrthoEvol

GitHub

git clone https://github.com/datasnakes/OrthoEvolution.git
cd OrthoEvolution
pip install --upgrade pip
pip install .

Development Code

WARNING: This code is under active development and may not be reliable. Please create an issue if you have questions about development.

git clone -b dev https://github.com/datasnakes/OrthoEvolution.git
cd OrthoEvolution
pip install --upgrade pip
pip install .
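After any of the installation methods above, you can check which version ended up in the active environment with importlib.metadata from the standard library. This sketch assumes the installed distribution name matches the PyPI project name, OrthoEvol; for the release described on this page you would expect to see 1.0.0b2. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "OrthoEvol") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "OrthoEvol is not installed in this environment")
```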

Examples

See the examples directory for working examples and scripts that demonstrate how to use this package.

The examples include:

  • Standalone scripts for common workflows
  • Example data files
  • GUI implementations (Tkinter and PyWebView)
  • Pipeline demonstrations

Running a preconfigured local BLAST search

from OrthoEvol.Orthologs.Blast import OrthoBlastN

# Use an existing list of gpcr genes
gpcr_blastn = OrthoBlastN(project="orthology-gpcr", method=1,
                         save_data=True, acc_file="gpcr.csv", 
                         copy_from_package=True)

# Run blast
gpcr_blastn.run()
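The format of the files OrthoBlastN writes is not described here. As a hedged sketch, if you have BLAST+ results in the standard tabular format (-outfmt 6, twelve tab-separated columns), selecting the best hit per query by bit score needs only the standard library. The function name best_hits and the sample rows are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the subject ID with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank rows and comment rows (as in -outfmt 7)
        record = dict(zip(OUTFMT6_COLUMNS, row))
        query, subject = record["qseqid"], record["sseqid"]
        score = float(record["bitscore"])
        # Strictly greater: on ties, keep the hit that appeared first.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    sample = (
        "q1\ts1\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540\n"
        "q1\ts2\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-90\t350\n"
        "q2\ts3\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-80\t310\n"
    )
    print(best_hits(sample.splitlines()))  # {'q1': 's1', 'q2': 's3'}
```

Running this over a forward search and a reverse search yields the two best-hit tables that a reciprocal-best-hit comparison needs.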

Simple project creation

from OrthoEvol.Manager.management import ProjectManagement

ProjectManagement(repo="test-repo", user=None,
                  project="test-project",
                  research=None,
                  research_type='comparative_genetics',
                  new_repo=False, new_user=False, new_project=True,
                  new_research=False)

Simple BLAST database download

from OrthoEvol.Tools.ftp import NcbiFTPClient

ncbiftp = NcbiFTPClient(email='somebody@gmail.com')
ncbiftp.getblastdb(database_name='refseq_rna', v5=True)
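NCBI publishes an .md5 checksum file alongside each BLAST database archive on its FTP site. Whether getblastdb verifies these for you is not stated here, so the sketch below shows one way to check downloaded archives yourself with hashlib. The helper names and the glob pattern are hypothetical, and the code assumes the archives and their .md5 files sit together in the current directory, with each .md5 file in the usual md5sum format.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_md5_file(archive: Path) -> bool:
    """Compare an archive with its sidecar '<archive>.md5' file.

    Assumes the sidecar uses md5sum format: '<hexdigest>  <filename>'.
    """
    sidecar = archive.with_name(archive.name + ".md5")
    expected = sidecar.read_text().split()[0].lower()
    return md5_of(archive) == expected


if __name__ == "__main__":
    for archive in sorted(Path(".").glob("refseq_rna.*.tar.gz")):
        status = "OK" if verify_against_md5_file(archive) else "MISMATCH"
        print(f"{archive.name}: {status}")
```

Reading in fixed-size chunks keeps memory use flat, which matters because BLAST database volumes can be several gigabytes each.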

Creating projects and databases dynamically

from OrthoEvol.Manager.management import ProjectManagement
from OrthoEvol.Manager.database_dispatcher import DatabaseDispatcher
from OrthoEvol.Manager.config import yml
from importlib.resources import files
from pathlib import Path
import yaml
import getpass
from datetime import datetime as d
import os

# Define job name
job_name = "jobname"

# Load a YAML configuration file bundled in the OrthoEvol.Manager.config.yml package
# (importlib.resources replaces the deprecated pkg_resources.resource_filename)
def load_config(file_name):
    config_path = files(yml.__name__).joinpath(file_name)
    with config_path.open('r') as file:
        return yaml.load(file, Loader=yaml.FullLoader)

# Load project management configuration
pm_config = load_config("initialize_new.yml")
project_manager = ProjectManagement(**pm_config["Management_config"])

# Load and update database management configuration
db_config = load_config("databases.yml")
db_config.update(pm_config)

# Configure NCBI RefSeq release settings
ncbi_config = db_config['Database_config']['Full']['NCBI']['NCBI_refseq_release']
ncbi_config['upload_number'] = 12
ncbi_config['pbs_dict'] = {
    'author': getpass.getuser(),
    'description': 'This is a default pbs job.',
    'date': d.now().strftime('%a %b %d %I:%M:%S %p %Y'),
    'proj_name': 'OrthoEvol',
    'select': '1',
    'memgb': '6gb',
    'cput': '72:00:00',
    'wt': '2000:00:00',
    'job_name': job_name,
    'outfile': job_name + '.o',
    'errfile': job_name + '.e',
    'script': job_name,
    'log_name': job_name,
    'pbsworkdir': os.getcwd(),
    'cmd': f'python3 {os.path.join(os.getcwd(), job_name + ".py")}',
    'email': 'n/a'
}

# Save the updated configuration to a YAML file
config_file_path = project_manager.user_log / Path("upload_config.yml")
with open(str(config_file_path), 'w') as config_file:
    yaml.dump(db_config, config_file, default_flow_style=False)

# Initialize database dispatcher and execute dispatch functions
db_dispatcher = DatabaseDispatcher(config_file_path, project_manager)
db_dispatcher.dispatch(db_dispatcher.strategies, db_dispatcher.dispatcher, db_dispatcher.configuration)
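The pbs_dict above collects settings for a PBS batch job. How OrthoEvol turns it into a job script is not documented here; the sketch below only illustrates how such keys conventionally map onto PBS Pro directives. The mapping of wt to walltime, memgb to mem, and select to a select chunk is our assumption, and render_pbs_script is a hypothetical helper.

```python
def render_pbs_script(pbs: dict) -> str:
    """Build a PBS job script from a dict shaped like pbs_dict (hypothetical mapping)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {pbs['job_name']}",
        f"#PBS -o {pbs['outfile']}",
        f"#PBS -e {pbs['errfile']}",
        # Assumed: 'select' is the chunk count and 'memgb' the memory per chunk.
        f"#PBS -l select={pbs['select']}:mem={pbs['memgb']}",
        f"#PBS -l cput={pbs['cput']}",
        # Assumed: 'wt' is the wall-clock limit.
        f"#PBS -l walltime={pbs['wt']}",
    ]
    # Only request e-mail notifications when a real address is given.
    if pbs.get("email") and pbs["email"] != "n/a":
        lines += [f"#PBS -M {pbs['email']}", "#PBS -m abe"]
    lines += ["", f"cd {pbs['pbsworkdir']}", pbs["cmd"]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "job_name": "jobname", "outfile": "jobname.o", "errfile": "jobname.e",
        "select": "1", "memgb": "6gb", "cput": "72:00:00", "wt": "2000:00:00",
        "pbsworkdir": "/path/to/workdir",
        "cmd": "python3 /path/to/workdir/jobname.py", "email": "n/a",
    }
    print(render_pbs_script(example))
```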

Tests

To run tests, first install the test dependencies:

pip install pytest pytest-cov

Then run the test suite:

pytest tests

Contributors

This package was created by the Datasnakes.

If you would like to contribute to this package, install the package in development mode:

pip install -e .

Check out our contributing guidelines for more information.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citations

We're thankful to have a resource such as Biopython, which inspired this package.

Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. http://dx.doi.org/10.1093/bioinformatics/btp163. PMID: 19304878.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orthoevol-1.0.0b2.tar.gz (23.8 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

orthoevol-1.0.0b2-py3-none-any.whl (24.0 MB)

Uploaded Python 3

File details

Details for the file orthoevol-1.0.0b2.tar.gz.

File metadata

  • Download URL: orthoevol-1.0.0b2.tar.gz
  • Upload date:
  • Size: 23.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for orthoevol-1.0.0b2.tar.gz
Algorithm Hash digest
SHA256 565800b222ac283726817af386e75f6a722c475c4b4d5c35188cde0ef57570d8
MD5 8928bef2d37c9202118ee6a999715c24
BLAKE2b-256 18f4294f2a73442f2ba038558df08c4aa3587b40a42ece4a5aa37eefe425a2ce

See more details on using hashes here.

File details

Details for the file orthoevol-1.0.0b2-py3-none-any.whl.

File metadata

  • Download URL: orthoevol-1.0.0b2-py3-none-any.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for orthoevol-1.0.0b2-py3-none-any.whl
Algorithm Hash digest
SHA256 2193cbf5223c7b99049fb6ffbb822ccfd3e7e6c5ee2184e92a123b7803db79d6
MD5 4376f07efd4622d2fdec352ae83061a6
BLAKE2b-256 cf0fa3cc5109629664428da91fc01ac682739f00c377ad85c80594a657c1a003

See more details on using hashes here.