Prediction of Genomic Islands

TreasureIsland

TreasureIsland is a Python package for machine learning-based genomic island prediction. It uses an unsupervised representation of DNA to make its predictions.

TreasureIsland was constructed from the Benbow dataset.

Dependency:

Python >= 3.7

Installation:

Option 1 - Install the package with pip:

TreasureIsland can be installed with pip, the Python package manager:

pip install treasureisland

Option 2 - Install the package locally:

git clone https://github.com/priyamayur/GenomicIslandPrediction.git
python -m pip install -e GenomicIslandPrediction

Usage:

The treasureisland package predicts genomic islands. The predictions can be saved as csv, xlsx, or txt files, as demonstrated in the "TreasureIsland package" section.

Or, run the script locally to get predictions quickly:

Clone the GitHub repository if you have not already done so:

git clone https://github.com/priyamayur/GenomicIslandPrediction.git
cd GenomicIslandPrediction
python run_treasureisland.py <DNA file>     

Input file:

DNA sequence files in fasta format with a sequenceID.

example:

>NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence
CACATAGCAAAACACTCAAAGTTTTTCAGCAAAAAAGCTTGTTGAAAAAATTGTTGACCGCCTGTTCACA....
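Before running a prediction, it can help to confirm that the input really has a header line carrying a sequence ID. The following is a minimal sketch, not part of TreasureIsland: the function name `fasta_ids` is hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after `>`.

```python
def fasta_ids(lines):
    """Return the sequence ID of each record in an iterable of FASTA lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    Raises ValueError if a header has no ID, if sequence data appears
    before the first header, or if no record is found at all.
    """
    ids = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header line has no sequence ID")
            ids.append(fields[0])
        elif not ids:
            raise ValueError("sequence data appears before any '>' header line")
    if not ids:
        raise ValueError("no FASTA records found")
    return ids


example = [
    ">NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence",
    "CACATAGCAAAACACTCAAAGTTTTTCAGCAAAAAAGCTTGTTGAAAAAATTGTTGACCGCCTGTTCACA",
]
print(fasta_ids(example))
```

To check a file on disk, pass an open file handle, for example `with open(path) as handle: fasta_ids(handle)`.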

Performance:

TreasureIsland takes 2-5 minutes to run, depending on the size of the input.

Output:

Results can be downloaded in csv, xlsx, or txt format. Each genomic island is reported in the following format:

example: NC_002620.2 1.0 130000.0 0.95597

Sample outputs can be found in the repository: output_NC_002620.2.txt, output_NC_002620.2.csv, output_NC_002620.2.xlsx
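A result line like the example above can be split into typed fields for further processing. This is a minimal sketch under stated assumptions: the column meanings (sequence ID, start, end, score) are inferred from the example line and are not documented here, the fields are assumed to be whitespace-separated, and the names `IslandPrediction` and `parse_prediction_line` are hypothetical.

```python
from typing import NamedTuple


class IslandPrediction(NamedTuple):
    # Column meanings are assumptions inferred from the example line.
    sequence_id: str  # assumed: first column
    start: float      # assumed: second column
    end: float        # assumed: third column
    score: float      # assumed: fourth column


def parse_prediction_line(line):
    """Split one whitespace-separated result line into typed fields."""
    sequence_id, start, end, score = line.split()
    return IslandPrediction(sequence_id, float(start), float(end), float(score))


row = parse_prediction_line("NC_002620.2 1.0 130000.0 0.95597")
print(row.sequence_id, row.start, row.end, row.score)
```

If the actual text output uses a different delimiter or column order, the `split()` call and the field order would need to be adjusted accordingly.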

Testing:

The repository contains sample DNA files that can be used to test TreasureIsland.

example:

cd GenomicIslandPrediction
python run_treasureisland.py genome/ecoli.fasta    
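To test several sample files in one go, the same command can be generated for every FASTA file in a directory. The sketch below only builds the command lines and does not run them; the helper name `build_commands` is hypothetical, and it assumes the sample files end in `.fasta` and that the commands will be run from inside the cloned repository.

```python
import sys
from pathlib import Path


def build_commands(genome_dir, script="run_treasureisland.py"):
    """Return one argument list per .fasta file in genome_dir, sorted by name.

    Assumption: each command will be run from the repository root,
    where run_treasureisland.py lives.
    """
    fasta_files = sorted(Path(genome_dir).glob("*.fasta"))
    return [[sys.executable, script, str(path)] for path in fasta_files]


# Print the commands for the repository's sample directory.
# Nothing is printed if the directory does not exist or has no .fasta files.
for command in build_commands("genome"):
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` to execute the prediction for that file.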

TreasureIsland package:

Import the sequence class from the treasureisland package:

from treasureisland.dna_sequence import sequence 

Instantiate the sequence with the DNA sequence file path as the argument. The DNA file used can be a fasta or genbank file.

seq = sequence("C:/Users/USER/GenomicIslandPrediction/genome/bsub.fasta") # enter local path for sequence file

Get the prediction data frame for the sequence by running the predict method.

pred = seq.predict()

The predictions can be saved in text, csv, or excel format.

seq.predictions_to_csv(pred)
seq.predictions_to_excel(pred)
seq.predictions_to_text(pred)

Contact:

Feel free to contact the author at banerjee.p1104@gmail.com
