hoatzin
A Python package for DNA-based evolution of protein sequences. Holistic apprOach for SequAnces To underZtand evolutIoN.
Last updated August 2023
Current version: v0.12
Installation
The current stable version of hoatzin is available through GitHub or the Python Package Index (PyPI).
To install from PyPI, run:
pip install idptools-hoatzin
You can also install the current development version from GitHub:
pip install git+https://git@github.com/idptools/hoatzin
To clone the GitHub repository and install an editable local copy that you can modify, run:
git clone https://github.com/idptools/hoatzin.git
cd hoatzin
pip install -e .
Usage
First, import hoatzin:
from hoatzin import evolve
Evolving Sequences
The evolve.sequence() function lets you evolve protein or DNA sequences. If you input a protein sequence, it is first converted into a DNA sequence using human codon usage frequencies. Each mutation is then drawn using nucleotide mutation probabilities taken from COSMIC 2023, which examined the frequencies of non-synonymous mutations in the human genome. The function requires the sequence as the first argument and the number of generations as the second argument. One DNA mutation per generation is assumed unless you specify otherwise.
sequence = 'QQQGSRGSGSGRRRGSGSGQGS'
evolved_sequence = evolve.sequence(sequence, number_generations=10)
print(evolved_sequence)
This would print something like:
QQQGPSGSRNGRRRGFSGGLDS
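To make the steps described above concrete, here is a minimal, self-contained sketch of the general idea: back-translate a protein into DNA, apply one point mutation per generation, and translate the result. It is not hoatzin's implementation. The function names (back_translate, mutate_once, toy_evolve) are hypothetical, and for simplicity it assumes uniform codon choice and uniform substitution probabilities rather than human codon usage frequencies and COSMIC-derived mutation probabilities.

```python
import random
from itertools import product

# Standard genetic code, listed in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for _codon, _aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(_aa, []).append(_codon)


def back_translate(protein, rng):
    """Pick one codon per residue (uniformly here; an assumption of this sketch)."""
    return "".join(rng.choice(CODONS_FOR[aa]) for aa in protein)


def translate(dna):
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mutate_once(dna, rng):
    """Replace one randomly chosen base with a different base (uniform weights)."""
    pos = rng.randrange(len(dna))
    new_base = rng.choice([b for b in BASES if b != dna[pos]])
    return dna[:pos] + new_base + dna[pos + 1:]


def toy_evolve(protein, number_generations, seed=None):
    """Toy analogue of the workflow: one DNA point mutation per generation."""
    rng = random.Random(seed)
    dna = back_translate(protein, rng)
    for _ in range(number_generations):
        dna = mutate_once(dna, rng)
    return translate(dna)


print(toy_evolve("QQQGSRGSGSGRRRGSGSGQGS", number_generations=10, seed=0))
```

Because mutations act on the DNA, some generations leave the protein unchanged (synonymous mutations), and a mutation can also create a stop codon, which appears as '*' in the translated sequence.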
Optional Arguments:
The evolve.sequence() function also accepts the following optional parameters:
mutations_per_generation - The number of DNA mutations introduced in each generation.
mutation_probs - The probabilities of each mutation. You can supply your own dictionary of mutation probabilities. See NUCLEOTIDE_MUTATION_PROBS in hoatzin_parameters for the expected format.
sequence_type - Specifies whether you are mutating a DNA sequence or a protein sequence. Set this to 'nucleotide_sequence' if you are inputting a nucleotide sequence.
codon_probs - The probabilities of each codon used when converting a protein sequence to a DNA sequence.
return_all_seqs - Whether to return all sequences generated (one sequence per generation) or only the single final sequence.
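As an illustration of what a dictionary of mutation probabilities does conceptually, the sketch below draws a replacement base using per-base weights. The nested layout and the numbers are hypothetical and chosen only for illustration; they are not COSMIC values and not necessarily the format hoatzin expects, so check NUCLEOTIDE_MUTATION_PROBS in hoatzin_parameters before building your own mutation_probs.

```python
import random

# Hypothetical layout: original base -> {replacement base: relative weight}.
# The weights are arbitrary illustrative numbers, not COSMIC-derived values.
EXAMPLE_MUTATION_WEIGHTS = {
    "A": {"C": 1.0, "G": 3.0, "T": 1.0},
    "C": {"A": 1.0, "G": 1.0, "T": 3.0},
    "G": {"A": 3.0, "C": 1.0, "T": 1.0},
    "T": {"A": 1.0, "C": 3.0, "G": 1.0},
}


def weighted_substitution(base, weights, rng):
    """Draw a replacement for `base` in proportion to its weights."""
    options = weights[base]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


rng = random.Random(42)
print([weighted_substitution("A", EXAMPLE_MUTATION_WEIGHTS, rng) for _ in range(5)])
```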
Example
sequence = 'QQQGSRGSGSGRRRGSGSGQGS'
evolved_sequence = evolve.sequence(sequence, number_generations=10, return_all_seqs=True)
print(evolved_sequence)
This would print something like:
{'original': 'QQQGSRGSGSGRRRGSGSGQGS', 1: 'QQQGSRGSGSGHRRGSGSGQGS', 2: 'QQQGSRGSGSGHRRGSGSGQGS', 3: 'QQQGSRGSGSGHKRGSGSGQGS', 4: 'QQQGSRGSGSGDKRGSGSGQGS', 5: 'QRQGSRGSGSGDKRGSGSGQGS', 6: 'QRQGSRGSGSGDKRGSGSGQGL', 7: 'QRQGSRGFGSGDKRGSGSGQGL', 8: 'QRQGSRGFRSGDKRGSGSGQGL', 9: 'QREGSRGFRSGDKRGSGSGQGL', 10: 'QREG*RGFRSGDKRGSGSGQGL'}
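The returned dictionary can be post-processed with plain Python. The sketch below works on the example output shown above and assumes that '*' denotes a stop codon, following the usual convention for translated sequences. The helper names (count_differences, first_stop_generation) are hypothetical and not part of hoatzin.

```python
# Example history copied from the output above.
history = {
    'original': 'QQQGSRGSGSGRRRGSGSGQGS',
    1: 'QQQGSRGSGSGHRRGSGSGQGS',
    2: 'QQQGSRGSGSGHRRGSGSGQGS',
    3: 'QQQGSRGSGSGHKRGSGSGQGS',
    4: 'QQQGSRGSGSGDKRGSGSGQGS',
    5: 'QRQGSRGSGSGDKRGSGSGQGS',
    6: 'QRQGSRGSGSGDKRGSGSGQGL',
    7: 'QRQGSRGFGSGDKRGSGSGQGL',
    8: 'QRQGSRGFRSGDKRGSGSGQGL',
    9: 'QREGSRGFRSGDKRGSGSGQGL',
    10: 'QREG*RGFRSGDKRGSGSGQGL',
}


def count_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def first_stop_generation(seqs):
    """Return the first numbered generation containing '*', or None."""
    generations = sorted(key for key in seqs if key != 'original')
    for generation in generations:
        if '*' in seqs[generation]:
            return generation
    return None


original = history['original']
for generation in sorted(key for key in history if key != 'original'):
    changed = count_differences(original, history[generation])
    print(generation, history[generation], changed)

print('first stop codon at generation:', first_stop_generation(history))
```

In this example, generations 1 and 2 are identical at the protein level, which is what you would expect when a DNA mutation is synonymous.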
Copyright
Copyright (c) 2023, Ryan Emenecker - Holehouse Lab
Acknowledgements
Project based on the Computational Molecular Science Python Cookiecutter version 1.1.