Prediction of Genomic Islands
TreasureIsland
TreasureIsland is a Python package for machine learning-based genomic island prediction. It uses an unsupervised representation of DNA for its predictions.
TreasureIsland was constructed from the Benbow dataset.
Dependency:
Python >= 3.7
Installation:
Option 1 - Install the package with pip:
TreasureIsland can be installed with pip, the Python package manager:
pip install treasureisland
Option 2 - Install the package locally:
git clone https://github.com/priyamayur/GenomicIslandPrediction.git
python -m pip install -e GenomicIslandPrediction
Usage:
The treasureisland package predicts genomic islands. The predictions can be saved as csv, xlsx, or txt files, as demonstrated in the "TreasureIsland package" section.
Alternatively, run the script locally to get predictions quickly.
Clone the GitHub repository if you have not already done so:
git clone https://github.com/priyamayur/GenomicIslandPrediction.git
cd GenomicIslandPrediction
python run_treasureisland.py <DNA file>
Input file:
A DNA sequence file in fasta format whose header line starts with a sequence ID.
example:
>NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence
CACATAGCAAAACACTCAAAGTTTTTCAGCAAAAAAGCTTGTTGAAAAAATTGTTGACCGCCTGTTCACA....
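As a quick sanity check before running a prediction, the sketch below reads fasta lines and confirms that each record has a header with a sequence ID. It is an illustration only and not part of TreasureIsland; the function name `read_fasta_records` is hypothetical, and the sequence is a shortened version of the example above.

```python
def read_fasta_records(lines):
    """Yield (sequence_id, description, sequence) tuples from fasta lines.

    Hypothetical helper for illustration; not part of the treasureisland API.
    """
    seq_id, description, chunks = None, "", []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, description, "".join(chunks)
            parts = line[1:].split(None, 1)
            if not parts:
                raise ValueError("fasta header has no sequence ID")
            seq_id = parts[0]
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        else:
            if seq_id is None:
                raise ValueError("sequence data found before any '>' header")
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, description, "".join(chunks)


example = [
    ">NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence",
    "CACATAGCAAAACACTCAAAGTTTTTCAGC",
    "AAAAAAGCTTGTTGAAAAAATTGTTGACCG",
]
records = list(read_fasta_records(example))
```

To check a real file, pass an open file handle instead of the list, for example `list(read_fasta_records(open(path)))`.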
Performance:
TreasureIsland takes 2-5 minutes to run, depending on the size of the input.
Output:
Predictions can be downloaded in csv, xlsx, and txt formats. Each genomic island is reported in the following format:
example: NC_002620.2 1.0 130000.0 0.95597
Sample outputs can be found in the repository: output_NC_002620.2.txt, output_NC_002620.2.csv, output_NC_002620.2.xlsx
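For downstream processing, a result line like the example can be split into typed fields. The sketch below makes two assumptions that this document does not confirm: the fields are separated by whitespace, and they stand for the sequence ID, start position, end position, and a prediction score, in that order. The names `IslandPrediction` and `parse_prediction_line` are hypothetical.

```python
from typing import NamedTuple


class IslandPrediction(NamedTuple):
    # Field meanings are assumed from the example line, not documented.
    sequence_id: str
    start: int
    end: int
    score: float


def parse_prediction_line(line):
    """Parse one whitespace-separated result line (assumed layout)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    seq_id, start, end, score = fields
    # Positions are written as floats ("130000.0"), so convert via float.
    return IslandPrediction(seq_id, int(float(start)), int(float(end)), float(score))


island = parse_prediction_line("NC_002620.2 1.0 130000.0 0.95597")
```

Check the sample output files in the repository for the actual delimiter and column order before relying on this layout.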
Testing:
The repository contains sample DNA files that can be used to test TreasureIsland.
example:
cd GenomicIslandPrediction
python run_treasureisland.py genome/ecoli.fasta
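To run the script over several sample files in one go, the command above can be wrapped in a small loop. This is a sketch under assumptions: it must be started from the repository root so that `run_treasureisland.py` resolves, the `*.fasta` pattern is a guess at how the sample files are named, and the helper names `build_command` and `run_on_directory` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(dna_file, script="run_treasureisland.py"):
    """Return the command line that runs the script on one DNA file."""
    # sys.executable keeps the run inside the current Python environment.
    return [sys.executable, script, str(dna_file)]


def run_on_directory(genome_dir, pattern="*.fasta"):
    """Run the script on every matching file; return the files processed."""
    processed = []
    for dna_file in sorted(Path(genome_dir).glob(pattern)):
        # check=True stops the loop if a run exits with an error.
        subprocess.run(build_command(dna_file), check=True)
        processed.append(dna_file)
    return processed
```

Calling `run_on_directory("genome")` from inside GenomicIslandPrediction would then process each matching file in turn. With the 2-5 minute run time quoted above, expect the loop to take several minutes per file.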
TreasureIsland package:
Import the sequence class from the treasureisland package:
from treasureisland.dna_sequence import sequence
Instantiate sequence with the path to the DNA sequence file as its argument. The DNA file can be in fasta or genbank format.
seq = sequence("C:/Users/USER/GenomicIslandPrediction/genome/bsub.fasta") # enter local path for sequence file
Get the prediction data frame by calling the predict method.
pred = seq.predict()
The predictions can be saved in text, csv, or excel format.
seq.predictions_to_csv(pred)
seq.predictions_to_excel(pred)
seq.predictions_to_text(pred)
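The steps above can be combined into one function. The sketch below uses only the calls shown in this section (`sequence`, `predict`, `predictions_to_csv`) and adds a file-existence check up front, so a mistyped path fails before the model runs. The function name `predict_and_save` and the `sequence_cls` parameter are hypothetical; the parameter exists only so the function can be exercised without the package installed.

```python
from pathlib import Path


def predict_and_save(dna_path, sequence_cls=None):
    """Predict genomic islands for one file and save them as csv.

    Hypothetical wrapper around the calls shown above.
    """
    path = Path(dna_path)
    if not path.is_file():
        raise FileNotFoundError(f"DNA file not found: {path}")
    if sequence_cls is None:
        # Imported lazily so that the path check does not need the package.
        from treasureisland.dna_sequence import sequence as sequence_cls
    seq = sequence_cls(str(path))
    pred = seq.predict()
    seq.predictions_to_csv(pred)
    return pred
```

With the package installed, `predict_and_save("genome/bsub.fasta")` run from the repository root would return the prediction data frame and write the csv output.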
Contact:
Feel free to get in touch at banerjee.p1104@gmail.com
File details
Details for the file treasureisland-1.0.2.tar.gz.
File metadata
- Download URL: treasureisland-1.0.2.tar.gz
- Upload date:
- Size: 5.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 624a95de72589067d0a1d9c4e2b43dc1de1ed2b5d5ea29be17d5dcc638382436
MD5 | 661694163301a8c741733521c0832509
BLAKE2b-256 | 79b7228e3a5c0c3445db1508f0f03d4df1e85e9c45e53ee38370c3e7217a3575
File details
Details for the file treasureisland-1.0.2-py3-none-any.whl.
File metadata
- Download URL: treasureisland-1.0.2-py3-none-any.whl
- Upload date:
- Size: 5.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e69bf2e32742991b2635d29db276ff1d8eff965efa54d927ce1581861851cf4a
MD5 | d8422d6db98cb69f28fa9e1f5cf6d15f
BLAKE2b-256 | 2412292c2b0a93a38816fbf6950ededfc4e1123e5dd70797b89c34d60984fbad