
Prediction of Genomic Islands

Project description

TreasureIsland

TreasureIsland is a Python package for machine learning-based genomic island prediction. It uses an unsupervised representation of DNA to make its predictions.

TreasureIsland was constructed from the Benbow dataset.

Dependency:

Python >= 3.7

Installation:

Option 1 - Use pip to install the package:

TreasureIsland can be installed with the Python package manager "pip":

python -m pip install treasureisland

If treasureisland is already installed, upgrade it:

python -m pip install treasureisland --upgrade

Option 2 - Install the package locally:

git clone https://github.com/priyamayur/GenomicIslandPrediction.git
python -m pip install -e GenomicIslandPrediction
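
After either installation option, a quick sanity check can confirm that the interpreter meets the stated Python >= 3.7 requirement and that the package is importable. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `is_installed` are illustrative and not part of TreasureIsland.

```python
import importlib.util
import sys

# Minimum version taken from the "Dependency" section of this README.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def is_installed(package_name="treasureisland"):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


print("Python supported:", python_is_supported())
print("treasureisland installed:", is_installed())
```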

Usage:

Option 1 - Run TreasureIsland directly from the command line:

Run TreasureIsland from the command line on your DNA fasta file (example DNA files are provided in the repository). The output is written in csv format:

treasureisland mypath/<DNA file>.fasta [-o <output_file_path>]     

Option 2 - Run TreasureIsland from Python:

The TreasureIsland package can also be used from Python to compute genomic island predictions, which can be saved in csv, xlsx, or txt format, as demonstrated under "Running TreasureIsland package from python" below.

Input file:

A DNA sequence file in fasta format whose header line contains a sequence ID.

example:

>NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence
CACATAGCAAAACACTCAAAGTTTTTCAGCAAAAAAGCTTGTTGAAAAAATTGTTGACCGCCTGTTCACA....
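
Since the input must carry a sequence ID in its header, it can help to check a file before running a prediction. The sketch below is not part of TreasureIsland; `read_fasta_ids` is a hypothetical helper that treats the first whitespace-delimited token after `>` as the sequence ID, which is the usual fasta convention.

```python
def read_fasta_ids(lines):
    """Return the sequence IDs found in an iterable of fasta lines.

    The ID is taken to be the first token after '>' on each header line.
    Raises ValueError if a header is empty or no header is present.
    """
    ids = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("fasta header has no sequence ID")
            ids.append(tokens[0])
    if not ids:
        raise ValueError("no fasta header line found")
    return ids


example = [
    ">NC_002620.2 Chlamydia muridarum str. Nigg, complete sequence",
    "CACATAGCAAAACACTCAAAGTTTTTCAGCAAAAAAGCTTGTTGAAAAAATTGTTGACCGCCTGTTCACA",
]
print(read_fasta_ids(example))
```

To check a file on disk, pass an open file handle, for example `read_fasta_ids(open("ecoli.fasta"))`.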

Performance:

TreasureIsland takes about 2-5 minutes to run, depending on the size of the input.

Output:

Each predicted genomic island is reported as one row in the following format:

example: NC_002620.2 1.0 130000.0 0.95597
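
The README does not name the columns of an output row. The sketch below parses the example row under the assumption that the four fields are the sequence ID, the island start, the island end, and a prediction score; the field names and the `parse_row` helper are hypothetical and should be checked against the actual output file.

```python
from typing import NamedTuple


class IslandRow(NamedTuple):
    # Field names are assumptions; the README only shows the raw values.
    sequence_id: str
    start: int
    end: int
    score: float


def parse_row(text):
    """Parse one whitespace- or comma-separated output row into an IslandRow."""
    parts = text.replace(",", " ").split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    seq_id, start, end, score = parts
    # Coordinates appear as floats such as "1.0", so convert via float first.
    return IslandRow(seq_id, int(float(start)), int(float(end)), float(score))


row = parse_row("NC_002620.2 1.0 130000.0 0.95597")
print(row)
```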

Testing:

The repository contains some sample DNA files that can be downloaded to test TreasureIsland. Note: GitHub downloads a fasta file in txt format (filename.fasta.txt).
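
If a sample file was saved as filename.fasta.txt, it can be renamed back to filename.fasta before use. The following is a small sketch with the standard library; `strip_txt_suffix` is an illustrative helper, not something shipped with TreasureIsland.

```python
from pathlib import Path


def strip_txt_suffix(path):
    """Rename 'name.fasta.txt' to 'name.fasta' and return the resulting path.

    Files that do not end in '.fasta.txt' are returned unchanged.
    """
    path = Path(path)
    if not path.name.endswith(".fasta.txt"):
        return path
    target = path.with_suffix("")  # drops only the trailing ".txt"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    path.rename(target)
    return target
```

For example, `strip_txt_suffix("ecoli.fasta.txt")` would leave a file named `ecoli.fasta` in the same directory.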

example:

treasureisland ecoli.fasta -o gei_output   
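
To process several files, the same command can be driven from a short script. The sketch below only assumes the command-line form shown in this README (`treasureisland <file> [-o <output_file_path>]`) and that the `treasureisland` executable is on the PATH; the helper names and the choice of one output path per input file stem are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def build_command(fasta_path, output_path=None):
    """Build the argument list for one treasureisland run."""
    cmd = ["treasureisland", str(fasta_path)]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    return cmd


def run_all(fasta_dir, output_dir):
    """Run the CLI on every .fasta file in fasta_dir (needs treasureisland installed)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        # Assumption: one output path per input, named after the file stem.
        subprocess.run(build_command(fasta, output_dir / fasta.stem), check=True)


print(build_command("ecoli.fasta", "gei_output"))
```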

Running TreasureIsland package from python:

Import the Predictor class from the treasureisland package:

from treasureisland.Predictor import Predictor

Instantiate a Predictor with the DNA sequence file path and the output file path as arguments. The DNA file used can be a fasta or genbank file.

seq = Predictor("<Path to DNA fasta file>/ecoli.fasta", "<output_file_path>") 

Get the prediction data frame by running the predict method.

pred = seq.predict()

The predictions can be saved in csv, Excel, or text format.

seq.predictions_to_csv(pred)
seq.predictions_to_excel(pred)
seq.predictions_to_text(pred)
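
The steps above can be collected into one function. This is a sketch that uses only the calls shown in this README (`Predictor(path, output_path)`, `predict()`, `predictions_to_csv(pred)`); the wrapper `run_prediction` and its `predictor_cls` parameter are hypothetical additions that allow the flow to be exercised without the real package.

```python
def run_prediction(fasta_path, output_path, predictor_cls=None):
    """Run a prediction and save it as csv, returning the prediction object.

    predictor_cls defaults to treasureisland's Predictor; passing another
    class with the same methods is useful for testing.
    """
    if predictor_cls is None:
        # Imported lazily so this module loads even without treasureisland.
        from treasureisland.Predictor import Predictor
        predictor_cls = Predictor
    seq = predictor_cls(fasta_path, output_path)
    pred = seq.predict()
    seq.predictions_to_csv(pred)
    return pred


# Example call (requires treasureisland to be installed):
# pred = run_prediction("ecoli.fasta", "gei_output")
```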

Contact:

Feel free to get in touch at banerjee.p1104@gmail.com
