
Sequence Identification using Decision tRees: a tool to classify DNA reads with machine learning models.

Project description

SIDR (pronounced: cider) is a tool to filter Next Generation Sequencing (NGS) data based on a chosen target organism. SIDR uses data from BLAST (or similar classifiers) to train a decision tree model that classifies sequence data as belonging either to the target organism or to something else. This classification can be used to filter the data before assembly.
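One way BLAST results could be turned into training labels is sketched below in Python. It assumes each contig's best BLAST hit has already been resolved to a phylum name (resolving taxonomy IDs through the NCBI taxdump is not shown), and that contigs without any hit are the ones left for the model to classify. The function label_contigs and the label strings "target" and "nontarget" are hypothetical, not part of SIDR's interface.

```python
from typing import Dict, List, Optional, Tuple


def label_contigs(
    contig_ids: List[str],
    best_hit_phylum: Dict[str, str],
    target_phylum: str,
) -> Tuple[Dict[str, str], List[str]]:
    """Split contigs into labeled training data and unlabeled contigs.

    Hypothetical helper: best_hit_phylum maps a contig ID to the phylum of
    its best BLAST hit; contigs missing from the dict had no hit.
    """
    labeled: Dict[str, str] = {}
    unlabeled: List[str] = []
    for cid in contig_ids:
        phylum: Optional[str] = best_hit_phylum.get(cid)
        if phylum is None:
            unlabeled.append(cid)  # no BLAST hit: leave for the model to classify
        elif phylum == target_phylum:
            labeled[cid] = "target"
        else:
            labeled[cid] = "nontarget"
    return labeled, unlabeled


labeled, unlabeled = label_contigs(
    ["c1", "c2", "c3"],
    {"c1": "Nematoda", "c2": "Proteobacteria"},
    "Nematoda",
)
print(labeled)    # {'c1': 'target', 'c2': 'nontarget'}
print(unlabeled)  # ['c3']
```

The labeled contigs would serve as training examples, and the unlabeled ones are what the trained tree is asked to classify.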

Note: SIDR is alpha software. Features are currently incomplete and subject to major change.

Installation

To install SIDR, clone this repository and run setup.py, or install it with pip:

pip install sidr

See the documentation for more details.

Usage

SIDR has two main modes. The default mode takes several bioinformatics files as input and builds a decision tree from percent GC content and per-base sequencing coverage. To run it, use:

sidr default -d [taxdump path] -b [bamfile] -f [assembly FASTA] -r [BLAST results] -k tokeep.contigids -x toremove.contigids -t [target phylum]
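The idea behind default mode, a decision tree over GC content and coverage, can be illustrated with a small pure-Python sketch. It is not SIDR's own code: it hand-rolls a depth-limited, CART-style tree that greedily picks the split with the lowest weighted Gini impurity, and it assumes per-contig coverage has already been computed (for example from the BAM file). The names gc_percent, build_tree, and predict are hypothetical, and gc_percent ignoring ambiguous bases such as N is an assumption about how GC content should be counted.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A tree is either a leaf label or (feature_index, threshold, left, right).
Tree = Union[str, Tuple[int, float, "Tree", "Tree"]]


def gc_percent(seq: str) -> float:
    """Percent G+C among A/C/G/T bases; ambiguous symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels: Sequence[str]) -> str:
    """Most common label; ties go to the alphabetically first label."""
    return max(sorted(set(labels)), key=labels.count)


def best_split(rows: List[Tuple[float, ...]], labels: List[str]) -> Optional[Tuple[float, int, float]]:
    """Return (weighted_gini, feature, threshold) for the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # degenerate split, skip it
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows: List[Tuple[float, ...]], labels: List[str], depth: int = 0, max_depth: int = 3) -> Tree:
    """Grow a depth-limited tree by greedy Gini splits."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    split = best_split(rows, labels)
    if split is None:  # no usable split (e.g. all rows identical)
        return majority(labels)
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    )


def predict(tree: Tree, row: Tuple[float, ...]) -> str:
    """Follow splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Toy training data: (GC %, mean coverage) for contigs already labeled via BLAST.
rows = [(35.0, 48.0), (37.5, 55.0), (33.0, 51.0), (61.0, 9.0), (58.5, 12.0), (63.0, 8.0)]
labels = ["target"] * 3 + ["nontarget"] * 3
tree = build_tree(rows, labels)
print(tree)                                         # (0, 48.0, 'target', 'nontarget')
print(predict(tree, (gc_percent("GGCANN"), 10.0)))  # nontarget
```

A production tool would more likely rely on an existing library implementation of decision trees; the hand-rolled version only makes the splitting rule visible.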

Runfile mode takes as input a tab-delimited file listing contigs, their variables, and their classifications. To run it, use:

sidr runfile -i [runfile] -k tokeep.contigids -x toremove.contigids -t [target phylum]
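The runfile format is only described loosely here, so the sketch below assumes a layout: a header row, then one row per contig with the contig ID in the first column, numeric variables in the middle columns, and the classification in the last column. It also assumes the -k and -x outputs list one contig ID per line. read_runfile and write_id_lists are hypothetical helpers, and the classifications written out here are taken straight from the file rather than predicted by a model.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_runfile(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]], Dict[str, str]]:
    """Parse a tab-delimited runfile (assumed layout: header, ID, variables..., class)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    variables: Dict[str, List[float]] = {}
    classes: Dict[str, str] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        contig = row[0]
        variables[contig] = [float(x) for x in row[1:-1]]
        classes[contig] = row[-1]
    return header[1:-1], variables, classes


def write_id_lists(predictions: Dict[str, str], target: str, keep_handle: TextIO, remove_handle: TextIO) -> None:
    """Write one contig ID per line: target contigs to keep, all others to remove."""
    for contig, label in predictions.items():
        out = keep_handle if label == target else remove_handle
        out.write(contig + "\n")


runfile = io.StringIO(
    "contig\tgc\tcoverage\tclass\n"
    "c1\t35.0\t48.0\tNematoda\n"
    "c2\t61.0\t9.0\tProteobacteria\n"
)
names, variables, classes = read_runfile(runfile)
keep, remove = io.StringIO(), io.StringIO()
write_id_lists(classes, "Nematoda", keep, remove)
print(names)                  # ['gc', 'coverage']
print(repr(keep.getvalue()))  # 'c1\n'
print(repr(remove.getvalue()))  # 'c2\n'
```

When reading a real file, opening it with open(path, newline="") lets the csv module handle line endings itself.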

See the documentation for more details.

TODO

  • More complete documentation

  • More unit tests

Download files

Source Distribution

SIDR-0.0.2a2.tar.gz (20.4 kB)

Built Distribution

SIDR-0.0.2a2-py2.py3-none-any.whl (13.7 kB)

File details

Details for the file SIDR-0.0.2a2.tar.gz.

File metadata

  • Download URL: SIDR-0.0.2a2.tar.gz
  • Upload date:
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.13.0 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/2.7.15

File hashes

Hashes for SIDR-0.0.2a2.tar.gz
Algorithm     Hash digest
SHA256        23a1da88d3d531db4140e7a3eac829ab29c987cdd11ddc8d5411006cdd8d0550
MD5           f2e5a56d884f449ec39fd9a1c6f92e8d
BLAKE2b-256   85beee925b7b4cd266d888ac823a5b1997f86e61c27810124ef8edab2710719b
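To check a downloaded file against the SHA256 digest listed above, a minimal sketch using only Python's standard hashlib module follows; the helper names sha256_of_stream and verify_file are illustrative. It reads the file in chunks so a large download is never loaded into memory all at once.

```python
import hashlib
import io
from typing import BinaryIO

# SHA256 digest published above for SIDR-0.0.2a2.tar.gz.
EXPECTED_SDIST_SHA256 = "23a1da88d3d531db4140e7a3eac829ab29c987cdd11ddc8d5411006cdd8d0550"


def sha256_of_stream(fh: BinaryIO, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a binary stream, read in fixed-size chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Return True if the file at path has the expected SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected.lower()


# Self-check on an in-memory stream (standard SHA256 test vector for b"abc").
print(sha256_of_stream(io.BytesIO(b"abc")))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

For example, verify_file("SIDR-0.0.2a2.tar.gz") should return True for an intact download. pip can run a comparable check at install time in its hash-checking mode (--require-hashes), with the digest supplied through the --hash option in a requirements file.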

File details

Details for the file SIDR-0.0.2a2-py2.py3-none-any.whl.

File metadata

  • Download URL: SIDR-0.0.2a2-py2.py3-none-any.whl
  • Upload date:
  • Size: 13.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.13.0 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/2.7.15

File hashes

Hashes for SIDR-0.0.2a2-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        56c67bad18661170caa780261e1b2262117d86db2e76c1122ad146cc1b433349
MD5           b52f0fa333fdaca8c5b8ecbf7e6529f7
BLAKE2b-256   efbffa0f5db0599004814d44750b15e1b5431aad201c4ea0152434c990c2dfb6
