Prediction of Genomic Islands

Project description

TreasureIsland

TreasureIsland is a Python package for machine learning-based genomic island prediction. It uses an unsupervised representation of DNA to make its predictions.

TreasureIsland is constructed from the Benbow dataset.

Dependencies:

Python >= 3.7
Tested on Linux and macOS.
On macOS, make sure to run:

python3 -m ensurepip --upgrade

Installation:

python3 -m venv venv
source venv/bin/activate

Option1 - Use pip:

python3 -m pip install treasureisland

If treasureisland is already installed:

python3 -m pip install treasureisland --upgrade

Option2 - Locally install package:

git clone https://github.com/priyamayur/GenomicIslandPrediction.git
python3 -m pip install -e GenomicIslandPrediction

Usage:

Option1 - Run TreasureIsland directly from commandline :

Run TreasureIsland from the command line on your DNA fasta file (sample DNA files are described under Testing). The output is written in csv format:

treasureisland mypath/<DNA file>.fasta [-o <output_file_path>] [-ut <upper threshold value>] 
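For scripted runs, the command line above can be assembled from Python. This is a minimal sketch: the helper name `build_command` is hypothetical, and it relies only on the positional fasta path and the `-o` and `-ut` flags shown above.

```python
from typing import List, Optional


def build_command(fasta_path: str,
                  output_path: Optional[str] = None,
                  upper_threshold: Optional[float] = None) -> List[str]:
    """Assemble the treasureisland command line as an argument list."""
    cmd = ["treasureisland", fasta_path]
    if output_path is not None:
        cmd += ["-o", output_path]
    if upper_threshold is not None:
        cmd += ["-ut", str(upper_threshold)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once treasureisland is installed in the active environment.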

Option2 - Run TreasureIsland from python :

The TreasureIsland package produces genomic island predictions, which can be saved in csv, xlsx, or txt format, as shown in the section "Running TreasureIsland package from python".

Input file:

DNA sequence files in fasta format with a sequenceID.

example:

>NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence
CACATAGCAAAACACTCAAAGTTTTTCAGCAAAAAAGCTTGTTGAAAAAATTGTTGACCGCCTGTTCACA....
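Before running a prediction, it can help to confirm that a file really has a `>` header line carrying a sequence ID. The following is a minimal sketch of such a check; `read_fasta` is a hypothetical helper written for illustration and is not part of the treasureisland package.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (sequence_id, sequence) pairs from FASTA-formatted lines."""
    records: List[Tuple[str, str]] = []
    seq_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header has no sequence ID")
            seq_id = header[0]  # the ID is the first token after '>'
            chunks = []
        else:
            if seq_id is None:
                raise ValueError("sequence data found before any '>' header")
            chunks.append(line.upper())
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records
```

For a file on disk, `read_fasta(open(path))` yields one pair per record; an empty result or a `ValueError` suggests the file is not in the expected format.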

Performance:

TreasureIsland takes 2-5 mins to run depending on the size of the input.

Output :

The results are shown in the following format for each genomic island:

example : NC_002620.2 1.0 130000.0 0.95597
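A row like the example can be split into typed fields for downstream filtering. The column meanings are not labeled in the example, so the field names below (`seq_id`, `start`, `end`, `score`) are assumptions made for this sketch, as is the whitespace-separated layout copied from the example row; `parse_row` is a hypothetical helper.

```python
from typing import NamedTuple


class IslandRow(NamedTuple):
    # Field names are assumptions; the example row does not label its columns.
    seq_id: str
    start: int
    end: int
    score: float


def parse_row(row: str) -> IslandRow:
    """Split one whitespace-separated result row into typed fields."""
    seq_id, start, end, score = row.split()
    # Coordinates appear as floats such as '130000.0'; convert via float first.
    return IslandRow(seq_id, int(float(start)), int(float(end)), float(score))
```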

Upper Threshold:

Users can change the upper threshold value to adjust the precision-recall tradeoff. The upper threshold is set to 0.80 by default.

Example :

treasureisland ecoli.fasta -o gei_output -ut 0.95

Setting the upper threshold to 0.95 increases precision and decreases recall.
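The general effect of raising a score cutoff can be seen on a toy example. This sketch does not reproduce how TreasureIsland applies its upper threshold internally; the scores and true/false labels are made up purely to illustrate the tradeoff.

```python
from typing import List, Tuple

# Hypothetical (score, is_true_island) pairs, invented for illustration.
TOY: List[Tuple[float, bool]] = [
    (0.99, True), (0.97, True), (0.93, True), (0.90, False),
    (0.85, True), (0.82, False), (0.70, True), (0.60, False),
]


def precision_recall(pairs: List[Tuple[float, bool]],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is called positive."""
    called = [label for score, label in pairs if score >= threshold]
    true_pos = sum(called)
    total_pos = sum(label for _, label in pairs)
    precision = true_pos / len(called) if called else 1.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


for cutoff in (0.80, 0.95):
    p, r = precision_recall(TOY, cutoff)
    print(f"threshold={cutoff:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, moving the cutoff from 0.80 to 0.95 keeps only the two highest-scoring calls, so precision rises while recall falls.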

Testing:

Sample DNA files for testing TreasureIsland can be downloaded from a repository. Note: GitHub downloads fasta files in txt format (filename.fasta.txt).
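If a downloaded sample arrives as filename.fasta.txt, the extra extension can be dropped before use. A minimal sketch follows; `strip_txt_suffix` is a hypothetical helper, and whether TreasureIsland requires the `.fasta` extension is not stated here, so treat the rename as a convenience.

```python
from pathlib import Path


def strip_txt_suffix(path: Path) -> Path:
    """Rename 'name.fasta.txt' to 'name.fasta'; leave other files untouched."""
    if path.name.endswith(".fasta.txt"):
        target = path.with_suffix("")  # drops only the trailing '.txt'
        path.rename(target)
        return target
    return path
```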

example :

treasureisland ecoli.fasta -o gei_output -ut 0.95 

Running TreasureIsland package from python:

Import the Predictor class from the treasureisland package:

from treasureisland.Predictor import Predictor

Instantiate a Predictor with the DNA sequence file path and the output file path as arguments. The DNA file should be a fasta file.

seq = Predictor("<Path to DNA fasta file>/ecoli.fasta", "<output_file_path>") 

Optionally, change the upper threshold value.

seq.change_upper_threshold(0.9)

Get prediction data frame from sequence by running the predict method.

pred = seq.predict()

The predictions can be downloaded in text, csv, excel formats.

seq.predictions_to_csv(pred)
seq.predictions_to_excel(pred)
seq.predictions_to_text(pred)
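The steps above can be collected into one function for reuse. This is a sketch that uses only the calls shown in this section; the wrapper name `run_treasureisland` and its parameters are hypothetical.

```python
def run_treasureisland(fasta_path, output_path, upper_threshold=None):
    """Run the documented steps in order and write the predictions to csv."""
    # Imported here so this module loads even if treasureisland is missing.
    from treasureisland.Predictor import Predictor

    seq = Predictor(fasta_path, output_path)
    if upper_threshold is not None:
        seq.change_upper_threshold(upper_threshold)
    pred = seq.predict()
    seq.predictions_to_csv(pred)
    return pred
```

Swapping `predictions_to_csv` for `predictions_to_excel` or `predictions_to_text` changes the saved format.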

Contact:

For questions, feel free to contact banerjee.p1104@gmail.com

Download files

Source Distribution

treasureisland-1.1.4.tar.gz (20.0 MB)

Built Distribution

treasureisland-1.1.4-py3-none-any.whl (20.1 MB)
