
OMArk - Proteome quality assessment based on OMAmer placements


OMArk

OMArk is a software tool for proteome (protein-coding gene repertoire) quality assessment. It measures proteome completeness, characterizes all protein-coding genes in the light of existing homologs, and identifies the presence of contamination from other species. OMArk relies on the OMA orthology database, from which it exploits orthology relationships, and on the OMAmer software for fast placement of all proteins into gene families.

Installation

You can use OMArk by installing the package from PyPI:

pip install omark

Alternatively, clone this repository and install it manually with your Python installer.

For example, from the repository directory: python setup.py install or pip install .

Once installed, OMArk can be called as a command-line tool from your Python environment. OMArk relies on an OMAmer database to run. For all OMArk features to work correctly, this database should cover a wide range of species; we recommend using one built from the whole OMA database. You can download one manually from this link: DOI - File: OMAmerDB.tar.gz

Usage

Required arguments: -f (--file), -d (--database)

usage: omark [-h] -f FILE -d DATABASE [-o OUTPUTFOLDER] [-t TAXID] [-of OG_FASTA] [-i ISOFORM_FILE] [-v]

Arguments

Flag, long form (default): description

-f, --file (required): Path to an OMAmer search output file.
-d, --db (required): Path to an OMAmer database.
-o, --outputFolder (default: ./omark_output/): Path to the folder into which OMArk results will be written. OMArk will create it if it does not exist.
-t, --taxid (default: None): NCBI taxid corresponding to the input proteome (optional).
-of, --og_fasta (default: None): The original proteome FASTA file. Provide it if you want OMArk to output optional FASTA files (sequences by category, sequences by detected species, etc.).
-i, --isoform_file (default: None): A semicolon-separated file listing all isoforms of each gene, with one gene per line. Use it if your input proteome includes more than one protein per gene.
-v, --verbose (default: False): Turn on logging of OMArk progress.
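The isoform file expected by -i can be produced with a few lines of Python. This is a minimal sketch, assuming each line contains only the protein identifiers of one gene (as they appear in the FASTA headers and OMAmer output) with no separate gene-ID column; check the OMArk repository before relying on that assumption. The functions write_isoform_file and read_isoform_file are hypothetical helpers, not part of OMArk.

```python
from __future__ import annotations

from pathlib import Path


def write_isoform_file(isoforms_by_gene: dict[str, list[str]], path: str | Path) -> None:
    """Write one line per gene, listing that gene's protein IDs separated by ';'.

    Assumption: only protein IDs are written; the gene keys are used for grouping only.
    """
    with open(path, "w") as handle:
        for isoforms in isoforms_by_gene.values():
            if isoforms:  # skip genes that have no protein to list
                handle.write(";".join(isoforms) + "\n")


def read_isoform_file(path: str | Path) -> list[list[str]]:
    """Read the file back as one list of protein IDs per gene, ignoring blank lines."""
    with open(path) as handle:
        return [line.strip().split(";") for line in handle if line.strip()]
```

For example, write_isoform_file({"gene1": ["gene1.t1", "gene1.t2"], "gene2": ["gene2.t1"]}, "isoforms.txt") writes two lines, "gene1.t1;gene1.t2" and "gene2.t1", and the resulting file can then be passed to OMArk with -i isoforms.txt (the identifiers here are illustrative).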

Output

A default OMArk output consists of several files that share the same name but have different extensions.

OMArk writes the main results of the analysis to two complementary files: a machine-readable one, identified by its .sum extension, and a human-readable summary ending with _detailed_summary.txt. These commented files report:

  • The reference lineage that was used for quality assessment
  • The number of conserved Hierarchical Orthologous Groups (HOGs) used for completeness assessment
  • The completeness assessment results (Single, Duplicated, Missing)
  • The whole proteome quality assessment results (Consistent placements, Inconsistent Placements, Contaminants, Missing genes)
  • The species and contaminants detected in the proteome

The file with the .pdf extension is a graphical representation of the completeness and whole proteome quality assessment.

The file with the .tax extension indicates the closest taxonomic lineage in the OMA database and the selected reference lineage.

The file with the .omq extension lists the identifiers of the HOGs used in the completeness analysis, and the category to which each was attributed.

The file with the .ump extension lists the identifiers of all proteins that were not mapped by OMAmer.
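To check that a run produced every expected file, you can group the contents of the output folder by suffix. This is a minimal sketch, assuming all result files sit directly in the output folder; the suffixes come from the list above, while classify_outputs and the category labels are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# File-name suffixes described in the Output section, mapped to a short label.
OUTPUT_KINDS = {
    "_detailed_summary.txt": "human-readable summary",
    ".sum": "machine-readable summary",
    ".pdf": "graphical summary",
    ".tax": "closest and reference lineages",
    ".omq": "HOGs used for completeness and their category",
    ".ump": "proteins not mapped by OMAmer",
}


def classify_outputs(folder: str | Path) -> dict[str, list[str]]:
    """Group the file names in an OMArk output folder by the kind of result they hold.

    Kinds with an empty list were not found; unrecognized files go under "other".
    """
    found: dict[str, list[str]] = {kind: [] for kind in OUTPUT_KINDS.values()}
    found["other"] = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        for suffix, kind in OUTPUT_KINDS.items():
            if path.name.endswith(suffix):
                found[kind].append(path.name)
                break
        else:
            found["other"].append(path.name)
    return found
```

Calling classify_outputs("example_data/omark_output") after the example below returns a dictionary in which any empty list points to an output that is missing.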

Example

You can run OMArk on the example files stored in the example_data folder. Remember to download an OMAmer database as indicated in the Installation section.

First, run OMAmer on the proteome FASTA file (for more documentation about installing OMAmer, see its GitHub repository). This step should take less than 15 minutes.

omamer search --db  LUCA.h5 --query example_data/UP000005640_9606.fasta  --score sensitive --out example_data/UP000005640_9606.omamer

Then, create an empty output folder and run OMArk (this should take less than 10 minutes):

mkdir example_data/omark_output

omark -f example_data/UP000005640_9606.omamer -d LUCA.h5 -o example_data/omark_output

You can now explore the OMArk results in the example_data/omark_output folder.
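The two example steps can also be chained from a Python script. This is a minimal sketch using the standard subprocess module, assuming omamer and omark are on your PATH and the database is at LUCA.h5; the helper names omamer_search_cmd, omark_cmd and run_example are hypothetical, and only the flags shown in this README are used.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def omamer_search_cmd(db: str, fasta: str, out: str) -> list[str]:
    """The `omamer search` call from the example, as an argument list."""
    return ["omamer", "search", "--db", db, "--query", fasta,
            "--score", "sensitive", "--out", out]


def omark_cmd(omamer_file: str, db: str, out_dir: str, taxid: int | None = None,
              og_fasta: str | None = None, isoform_file: str | None = None,
              verbose: bool = False) -> list[str]:
    """The `omark` call, adding the optional flags from the usage line when given."""
    cmd = ["omark", "-f", omamer_file, "-d", db, "-o", out_dir]
    if taxid is not None:
        cmd += ["-t", str(taxid)]
    if og_fasta is not None:
        cmd += ["-of", og_fasta]
    if isoform_file is not None:
        cmd += ["-i", isoform_file]
    if verbose:
        cmd.append("-v")
    return cmd


def run_example(db: str = "LUCA.h5", data_dir: str = "example_data") -> None:
    """Run OMAmer, then OMArk, on the example proteome; raises if either step fails."""
    fasta = f"{data_dir}/UP000005640_9606.fasta"
    omamer_out = f"{data_dir}/UP000005640_9606.omamer"
    out_dir = f"{data_dir}/omark_output"
    subprocess.run(omamer_search_cmd(db, fasta, omamer_out), check=True)
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # same effect as `mkdir`
    subprocess.run(omark_cmd(omamer_out, db, out_dir), check=True)
```

Calling run_example() from the repository directory reproduces the two commands above. Building argument lists rather than shell strings avoids quoting problems with paths that contain spaces.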

Download files

Source Distribution

omark-0.2.1.tar.gz (36.8 kB)

Built Distribution

omark-0.2.1-py3-none-any.whl (46.2 kB, Python 3)
