CFIA OLC Genome Quality Assessment with Machine Learning
# GenomeQAML: Genome Quality Assessment with Machine Learning
GenomeQAML is a script that uses a pre-computed ExtraTreesClassifier model to
classify FASTA-formatted _de novo_ assemblies as bad, good, or very good. It is easy to use
and has minimal dependencies: beyond the Python package itself, it needs only the two external tools listed below.
## External Dependencies
- [Mash (v2.0 or greater)](https://github.com/marbl/mash)
- [Prodigal (>=2.6.2)](https://github.com/hyattpd/Prodigal)
Both tools must be installed and available on your `$PATH`.
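Before running GenomeQAML, it can help to confirm that both tools are reachable on your `$PATH`. The sketch below uses only Python's standard library and assumes the installed executables are named `mash` and `prodigal`; it checks presence only and does not verify the minimum versions listed above. The function name is illustrative, not part of GenomeQAML.

```python
import shutil

# Executable names assumed for Mash and Prodigal; adjust if your installs differ.
REQUIRED_TOOLS = ("mash", "prodigal")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on $PATH."""
    # shutil.which returns None when no matching executable is on $PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from $PATH: " + ", ".join(missing))
    else:
        print("Mash and Prodigal were found on $PATH.")
```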
## Installation
Install with pip: `pip install genomeqaml`.
Installing into a virtualenv is highly recommended.
## Usage
GenomeQAML takes as input a directory of uncompressed FASTA files, classifies each one, and writes a
CSV-formatted report for your inspection.
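Because GenomeQAML expects uncompressed FASTA files, a gzipped assembly left in the input folder is an easy mistake to make. A minimal pre-flight check, assuming gzip is the compression format you are likely to encounter, is to look for the two-byte gzip magic number (`1f 8b`) at the start of each file. The helper below is hypothetical and not part of GenomeQAML; call it as `find_gzipped_files("/path/to/fasta/folder")` and decompress anything it reports before classifying.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def find_gzipped_files(folder):
    """Return sorted names of files in `folder` that start with the gzip magic number."""
    gzipped = []
    for entry in os.scandir(folder):
        if not entry.is_file():
            continue  # skip subdirectories and other non-regular entries
        with open(entry.path, "rb") as handle:
            # Reading two bytes is enough; short or empty files never match.
            if handle.read(2) == GZIP_MAGIC:
                gzipped.append(entry.name)
    return sorted(gzipped)
```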
To run, type `classify.py -t /path/to/fasta/folder`.
This creates a report, called `QAMLreport.csv` by default. You can change the report name
with the `-r` argument.
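If you want to drive GenomeQAML from another Python script, one option is to call the `classify.py` command through `subprocess` and then load the CSV report. This sketch uses only the `-t` and `-r` options documented here. It assumes `classify.py` is on your `$PATH` after installation and that the report is written to the path given with `-r` (relative to the current working directory). Because the report's columns are not documented in this README, it reads rows with `csv.DictReader` without assuming any column names. All function names are hypothetical.

```python
import csv
import subprocess

DEFAULT_REPORT = "QAMLreport.csv"  # default report name documented for -r


def build_command(fasta_folder, report_file=DEFAULT_REPORT):
    """Build the classify.py argument list using the documented -t and -r options."""
    return ["classify.py", "-t", fasta_folder, "-r", report_file]


def read_report(report_file):
    """Load the CSV report as a list of dicts keyed by whatever header row it has."""
    with open(report_file, newline="") as handle:
        return list(csv.DictReader(handle))


def run_genomeqaml(fasta_folder, report_file=DEFAULT_REPORT):
    """Run classify.py (raising CalledProcessError on failure), then read its report.

    Assumes the report lands at `report_file` relative to the current directory.
    """
    subprocess.run(build_command(fasta_folder, report_file), check=True)
    return read_report(report_file)
```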
```
usage: classify.py [-h] -t TEST_FOLDER [-r REPORT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -t TEST_FOLDER, --test_folder TEST_FOLDER
                        Path to folder containing FASTA files you want to
                        test.
  -r REPORT_FILE, --report_file REPORT_FILE
                        Name of output file. Default is QAMLreport.csv.
```
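The help text above has the shape of Python `argparse` output. If you want to mirror this interface in your own wrappers or tests, the hypothetical parser below accepts the same options. It is reconstructed from the help text alone and is not GenomeQAML's own source.

```python
import argparse


def make_parser():
    """Hypothetical parser mirroring the documented classify.py options."""
    parser = argparse.ArgumentParser(prog="classify.py")
    # -t is required: the usage line shows it without square brackets.
    parser.add_argument("-t", "--test_folder", required=True,
                        help="Path to folder containing FASTA files you want to test.")
    # -r is optional and falls back to the documented default report name.
    parser.add_argument("-r", "--report_file", default="QAMLreport.csv",
                        help="Name of output file. Default is QAMLreport.csv.")
    return parser
```

For example, `make_parser().parse_args(["-t", "genomes"])` yields a namespace whose `report_file` is `QAMLreport.csv`, while omitting `-t` makes `argparse` print an error and exit.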
Source Distribution: GenomeQAML-0.0.9.tar.gz (8.1 MB)

File metadata for GenomeQAML-0.0.9.tar.gz:
- Download URL: GenomeQAML-0.0.9.tar.gz
- Size: 8.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 706a4b4c019073d66e79689100950099c3c3fe9bddc35386cbb6091d492efd0e |
| MD5 | 94f02f52b2107c346863bdca827f915b |
| BLAKE2b-256 | 14e8515546ce52ce1a1eda80e6f5106225fd0104e3f1b2f8ff967b0f02c1098d |
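To check that a downloaded copy of the source distribution matches the published SHA256 digest, you can hash it with Python's standard `hashlib` module. This is a minimal sketch with illustrative function names; it reads the file in chunks so large archives need not fit in memory.

```python
import hashlib

# SHA256 digest published for GenomeQAML-0.0.9.tar.gz.
EXPECTED_SHA256 = "706a4b4c019073d66e79689100950099c3c3fe9bddc35386cbb6091d492efd0e"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For an intact download, `matches_published_hash("GenomeQAML-0.0.9.tar.gz")` should return `True`.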