CFIA OLC Genome Quality Assessment with Machine Learning

[![Build status](https://travis-ci.org/OLC-LOC-Bioinformatics/GenomeQAML.svg?master)](https://travis-ci.org/OLC-LOC-Bioinformatics)
# GenomeQAML: Genome Quality Assessment with Machine Learning

GenomeQAML is a script that uses a pre-computed ExtraTreesClassifier model to
classify FASTA-formatted _de novo_ assemblies as bad, good, or very good. It's easy to use
and has minimal dependencies.

## External Dependencies

- [Mash (v2.0 or greater)](https://github.com/marbl/mash)
- [Prodigal (>=2.6.2)](https://github.com/hyattpd/Prodigal)

Both must be installed and available on your `$PATH`.
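
As a quick pre-flight check, the sketch below reports which external tools cannot be found on `$PATH`. The executable names `mash` and `prodigal` are assumptions based on how these tools are usually installed, and the helper `missing_tools` is a hypothetical name, not part of GenomeQAML. The check does not verify the minimum versions listed above.

```python
import shutil

# Assumed executable names: Mash and Prodigal normally install as `mash` and
# `prodigal`, but a local build may use different names.
REQUIRED_TOOLS = ("mash", "prodigal")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on $PATH:", ", ".join(missing))
else:
    print("All external dependencies found.")
```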

## Installation

Install with pip: `pip install genomeqaml`.

Using a virtualenv is highly recommended.

## Usage

GenomeQAML takes a directory of uncompressed FASTA files as input. Each file is classified, and the
results are written to a CSV-formatted report for your inspection.
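
Because the input must be uncompressed, it can help to scan the folder before running. The sketch below is a minimal example that separates likely FASTA files from compressed ones by file suffix. Both suffix sets and the helper name `scan_folder` are assumptions for illustration; this README does not state which extensions GenomeQAML recognizes.

```python
from pathlib import Path

# Assumed suffixes: the README does not list which extensions are accepted.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}
COMPRESSED_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def scan_folder(folder):
    """Split files in `folder` into (uncompressed FASTA, compressed) name lists."""
    fasta, compressed = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in COMPRESSED_SUFFIXES:
            # e.g. assembly.fasta.gz: needs decompressing before classification
            compressed.append(path.name)
        elif suffix in FASTA_SUFFIXES:
            fasta.append(path.name)
    return fasta, compressed
```

Any names in the second list would need to be decompressed before the folder is passed to `classify.py`.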

To run, type `classify.py -t /path/to/fasta/folder`.

This will create a report, by default called `QAMLreport.csv`. You can change the name
of the report with the `-r` argument.

```
usage: classify.py [-h] -t TEST_FOLDER [-r REPORT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -t TEST_FOLDER, --test_folder TEST_FOLDER
                        Path to folder containing FASTA files you want to
                        test.
  -r REPORT_FILE, --report_file REPORT_FILE
                        Name of output file. Default is QAMLreport.csv.
```
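
To drive the script from Python, one option is to wrap the command line shown above with `subprocess` and load the resulting report with the `csv` module. This is a sketch under stated assumptions, not an API that GenomeQAML provides: the helper names `build_command`, `read_report`, and `classify` are hypothetical, and the report is assumed to begin with a header row, since its columns are not documented here.

```python
import csv
import subprocess


def build_command(test_folder, report_file=None):
    """Build the classify.py argument list from the documented -t and -r options."""
    command = ["classify.py", "-t", str(test_folder)]
    if report_file is not None:
        command += ["-r", str(report_file)]
    return command


def read_report(report_file="QAMLreport.csv"):
    """Load the report as a list of dicts keyed by its header row (assumed present)."""
    with open(report_file, newline="") as handle:
        return list(csv.DictReader(handle))


def classify(test_folder, report_file="QAMLreport.csv"):
    """Run classify.py, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(test_folder, report_file), check=True)
    return read_report(report_file)
```

Calling `classify("/path/to/fasta/folder")` would run the tool and return one dictionary per report row, keyed by whatever column names the report actually contains.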


