Sequence Identification using Decision tRees: a tool to classify DNA reads using machine learning models.
Project description
SIDR (pronounced "cider") is a tool to filter Next Generation Sequencing (NGS) data based on a chosen target organism. SIDR uses data from BLAST (or similar classifiers) to train a decision tree model that classifies sequence data as either belonging to the target organism or belonging to something else. This classification can be used to filter the data for later assembly.
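As a rough illustration of the idea described above, the sketch below turns BLAST-derived taxonomy into training labels and sets aside contigs without a usable hit for the model to classify later. It is a minimal sketch under stated assumptions, not SIDR's own code: the function name `label_contigs`, the dictionary layout, and the contig IDs are all hypothetical.

```python
# Hypothetical sketch: the names and data layout below are assumptions,
# not part of SIDR's actual implementation.

def label_contigs(blast_phyla, target_phylum):
    """Split contigs into labeled training data and unlabeled contigs.

    blast_phyla maps a contig ID to the phylum of its BLAST hit,
    or to None when the contig had no hit.
    """
    labeled = {}    # contig ID -> True if it belongs to the target phylum
    unlabeled = []  # contig IDs left for the trained model to classify
    for contig_id, phylum in blast_phyla.items():
        if phylum is None:
            unlabeled.append(contig_id)
        else:
            labeled[contig_id] = (phylum == target_phylum)
    return labeled, unlabeled


# Made-up example input.
blast_phyla = {
    "contig_1": "Nematoda",
    "contig_2": "Proteobacteria",
    "contig_3": None,
}
labeled, unlabeled = label_contigs(blast_phyla, "Nematoda")
print(labeled)
print(unlabeled)
```

The labeled contigs would serve as training data, and the trained model would then assign the remaining contigs to "target" or "something else".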
Note: SIDR is alpha software. Features are currently incomplete and subject to major change.
Installation
To install SIDR, clone this repository and run setup.py, or install it with pip:
pip install sidr
See the documentation for more details.
Usage
SIDR has two main modes. Default mode takes several bioinformatics files as input, and computes a decision tree based on percentage GC content and per-base sequencing coverage. To run it, use:
sidr default -d [taxdump path] -b [bamfile] -f [assembly FASTA] -r [BLAST results] -k tokeep.contigids -x toremove.contigids -t [target phylum]
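To make the two variables used in default mode concrete, here is a small sketch that computes percentage GC content and mean per-base coverage for a contig, then fits a single-threshold classifier (a "decision stump") on one of them. This is a simplified stand-in for a full decision tree, which would chain several such splits. The helper names `gc_content`, `mean_coverage`, and `fit_stump` are hypothetical, and the numbers are made up.

```python
# Hypothetical sketch: a one-split "decision stump" standing in for a
# full decision tree. None of these helpers are part of SIDR's API.

def gc_content(sequence):
    """Return the percentage of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


def mean_coverage(per_base_depths):
    """Return the average read depth over a contig's bases."""
    if not per_base_depths:
        return 0.0
    return sum(per_base_depths) / float(len(per_base_depths))


def fit_stump(values, labels):
    """Find the threshold on one feature with the fewest training errors.

    Returns (threshold, above_is_target, errors), or None when there are
    fewer than two distinct values to split between.
    """
    ordered = sorted(set(values))
    best = None
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2.0
        for above_is_target in (True, False):
            errors = sum(
                ((value > threshold) == above_is_target) != label
                for value, label in zip(values, labels)
            )
            if best is None or errors < best[2]:
                best = (threshold, above_is_target, errors)
    return best


# Made-up training data: GC percentages and whether each contig is target.
gc_values = [35.0, 38.0, 62.0, 65.0]
is_target = [True, True, False, False]
stump = fit_stump(gc_values, is_target)
print(gc_content("ATGC"), mean_coverage([10, 20, 30]), stump)
```

In this toy data the low-GC contigs are the target, so the best split falls between the two groups and treats values below the threshold as target.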
Runfile mode takes a tab-delimited file of contigs, variables, and classification as input. To run it, use:
sidr runfile -i [runfile] -k tokeep.contigids -x toremove.contigids -t [target phylum]
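The exact runfile layout is not specified here, so the following is only a sketch of how a tab-delimited file of contigs, variables, and classification might be read. The header names `contig`, `gc`, `coverage`, and `classification` are assumptions made for illustration, as is the `read_runfile` helper; check the documentation for the real column layout.

```python
import csv
import io

# Hypothetical runfile contents. The header names are assumptions,
# not SIDR's documented format.
RUNFILE_TEXT = (
    "contig\tgc\tcoverage\tclassification\n"
    "contig_1\t36.2\t41.0\tNematoda\n"
    "contig_2\t63.8\t7.5\tProteobacteria\n"
)


def read_runfile(handle):
    """Parse a tab-delimited runfile into (contig ID, features, label) rows."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        contig_id = record.pop("contig")
        label = record.pop("classification")
        # Every remaining column is treated as a numeric variable.
        features = {name: float(value) for name, value in record.items()}
        rows.append((contig_id, features, label))
    return rows


rows = read_runfile(io.StringIO(RUNFILE_TEXT))
for contig_id, features, label in rows:
    print(contig_id, features, label)
```

Treating every column other than the ID and the classification as a variable keeps the reader independent of which variables a given runfile carries.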
See the documentation for more details.
TODO
More complete documentation
More unit tests
File details
Details for the file SIDR-0.0.2a2.tar.gz.
File metadata
- Download URL: SIDR-0.0.2a2.tar.gz
- Upload date:
- Size: 20.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.13.0 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 23a1da88d3d531db4140e7a3eac829ab29c987cdd11ddc8d5411006cdd8d0550
MD5 | f2e5a56d884f449ec39fd9a1c6f92e8d
BLAKE2b-256 | 85beee925b7b4cd266d888ac823a5b1997f86e61c27810124ef8edab2710719b
File details
Details for the file SIDR-0.0.2a2-py2.py3-none-any.whl.
File metadata
- Download URL: SIDR-0.0.2a2-py2.py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.13.0 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56c67bad18661170caa780261e1b2262117d86db2e76c1122ad146cc1b433349
MD5 | b52f0fa333fdaca8c5b8ecbf7e6529f7
BLAKE2b-256 | efbffa0f5db0599004814d44750b15e1b5431aad201c4ea0152434c990c2dfb6