Classification of plasmid sequences

PlasClass

This module allows for easy classification of sequences as either plasmid or chromosomal. For example, it can be used to classify the contigs in a (metagenomic) assembly.

Installation

plasclass is written in Python 3 and requires NumPy, scikit-learn, and their dependencies, which the setup.py script installs.

We recommend using a virtual environment. For example, on Linux, before running setup.py:

python -m venv classification-env
source classification-env/bin/activate

On Windows:

pip install virtualenv
virtualenv classification-env
classification-env\Scripts\activate

To install, clone the repository and run setup.py:

git clone https://github.com/Shamir-Lab/PlasClass.git
cd PlasClass
python setup.py install

It is possible to install as a user without root permissions:

python setup.py install --user

After installing, run the tests:

python test/test.py

Usage

The script classify_fasta.py classifies the sequences in a FASTA file:

python classify_fasta.py -f <fasta file> [-o <output file> default: <fasta file>.probs.out] [-p <num processes> default: 8]

The command line options for this script are:

-f/--fasta: The FASTA file to be classified.

-o/--outfile: The name of the output file. Default=<fasta file>.probs.out

-p/--num_processes: The number of processes to use. Default=8

The output file is tab-separated: each line contains a sequence header and the corresponding score. The sequences appear in the same order as in the input FASTA file.
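As a rough illustration of consuming this output, the sketch below reads such a file into (header, score) pairs. The function name read_scores and the sample contents are hypothetical. It assumes one header, one tab, and one numeric score per line, as described above, rather than anything checked against the tool's source.

```python
import os
import tempfile


def read_scores(path):
    """Read a tab-separated scores file into a list of (header, score) pairs.

    Assumes each non-empty line holds a header, a tab, then a numeric score.
    """
    results = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so headers containing tabs stay intact.
            header, score = line.rsplit("\t", 1)
            results.append((header, float(score)))
    return results


# Hypothetical sample data, written to a temporary file for demonstration.
sample = "contig_1\t0.93\ncontig_2\t0.12\n"
with tempfile.NamedTemporaryFile("w", suffix=".probs.out", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

scores = read_scores(sample_path)
os.remove(sample_path)
print(scores)
```

Because the order matches the input file, the resulting list can be paired positionally with the original sequences if needed.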

The classifier can also be imported and used directly in your own Python code. For example, once the plasclass module has been installed, you can use the following lines:

from plasclass import plasclass
my_classifier = plasclass()  # optional arguments: n_procs, scales, ks
scores = my_classifier.classify(seqs)  # seqs: an uppercase string or a list of uppercase strings

The plasclass() constructor takes optional parameters:

n_procs - number of processes to use for classification. Default=1.

scales - array of the scales for the sequence lengths. Default=[1000,10000,100000,500000]

ks - array of the k-mer lengths. Default=[3,4,5,6,7]

The sequence(s) to classify, seqs, can be either a single string or a list of strings. The strings must be uppercase.

The function plasclass.classify(seqs) returns a list of plasmid scores, one per input sequence, in the same order as the input.
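Because classify expects uppercase strings, sequences read from a FASTA file may need normalizing first. The following is a minimal sketch, assuming a plain-text FASTA input. read_fasta_upper is a hypothetical helper, not part of plasclass, and the final call is shown only as a comment because it needs the installed package.

```python
import io


def read_fasta_upper(handle):
    """Parse FASTA text from a file-like object.

    Returns (headers, seqs), with every sequence joined into one string
    and converted to uppercase. Hypothetical helper, not part of plasclass.
    """
    headers, seqs, chunks = [], [], []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if headers:
                # Close the previous record before starting a new one.
                seqs.append("".join(chunks).upper())
            headers.append(line[1:])
            chunks = []
        elif headers:
            chunks.append(line)
    if headers:
        # Close the final record.
        seqs.append("".join(chunks).upper())
    return headers, seqs


# Made-up two-record example; real input would come from an opened FASTA file.
example = io.StringIO(">seq1\nacgtac\nGTTA\n>seq2\nttgaca\n")
headers, seqs = read_fasta_upper(example)
print(list(zip(headers, seqs)))

# With plasclass installed, the list could then be scored as in the snippet above:
#   scores = my_classifier.classify(seqs)
```

Keeping the headers in a parallel list makes it straightforward to match each returned score to its sequence, since the scores come back in input order.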

Training new models

The script train.py can be used to train new models:

python train.py -p <plasmid file> -c <chromosome file> -o <output directory> [-n <num processes> default: 16] [-k <kmer lengths> default: 3,4,5,6,7] [-l <sequence lengths> default: 1000,10000,100000,500000]

The command line options for this script are:

-p/--plasmid: The fasta file of the plasmid references.

-c/--chromosome: The fasta file of the chromosome references.

-n/--num_processes: Number of processes to use. Default=16.

-o/--outdir: The path of the output directory. Default=bin.

-k/--kmers: Comma-separated list of the k-mer sizes to use. Default=3,4,5,6,7.

-l/--lengths: Comma-separated list of the sequence lengths to use. Default=1000,10000,100000,500000.

The models should be put into the data directory.

Note that if k-mer and sequence lengths other than the default are used, then these must be specified when calling the plasclass() constructor.
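One way to keep training and classification settings in sync is to derive the constructor arguments from the same comma-separated strings passed to train.py. This is a sketch under two assumptions: that the constructor accepts ks and scales as keyword arguments, as listed earlier, and that the -l lengths correspond to scales. parse_int_list and constructor_kwargs are hypothetical helper names, and the values are illustrative.

```python
def parse_int_list(text):
    """Turn a comma-separated string such as '3,4,5' into [3, 4, 5]."""
    return [int(part) for part in text.split(",") if part.strip()]


def constructor_kwargs(kmers, lengths):
    """Build keyword arguments matching the -k and -l values given to train.py.

    Assumes -k maps to `ks` and -l maps to `scales`.
    """
    return {"ks": parse_int_list(kmers), "scales": parse_int_list(lengths)}


# Example: models trained with `-k 4,5,6 -l 1000,10000` (illustrative values).
kwargs = constructor_kwargs("4,5,6", "1000,10000")
print(kwargs)

# With plasclass installed and the new models in place:
#   my_classifier = plasclass(**kwargs)
```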
