OMArk - Proteome quality assessment based on OMAmer placements

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

OMArk

OMArk is a software tool for proteome (protein-coding gene repertoire) quality assessment. It provides measures of proteome completeness, characterizes all protein-coding genes in the light of existing homologs, and identifies the presence of contamination from other species. OMArk relies on the OMA orthology database, from which it exploits orthology relationships, and on the OMAmer software for fast placement of all proteins into gene families.

Installation

You can use OMArk by installing the package from PyPI:

pip install omark

Alternatively, clone this repository and install it manually with your Python installer.

Example commands, run from the root of the cloned repository: `python setup.py install` or `pip install .`

You can then use it in your Python environment by calling it as a command-line tool. OMArk relies on an OMAmer database to run. For all OMArk features to work correctly, this database should cover a wide range of species; we recommend using one constructed from the whole OMA database. You can download one manually; the file is OMAmerDB.tar.gz.

Usage

Required arguments: -f (--file), -d (--database)

usage: omark [-h] -f FILE -d DATABASE [-o OUTPUTFOLDER] [-t TAXID] [-of OG_FASTA] [-i ISOFORM_FILE] [-v]

Arguments

Flag	Default	Description
`-f` `--file`		Path to an OMAmer search output file
`-d` `--database`		Path to an OMAmer database
`-o` `--outputFolder`	./omark_output/	Path to the folder into which OMArk results will be output. OMArk will create it if it does not exist.
`-t` `--taxid`	None	NCBI taxid corresponding to the input proteome (Optional).
`-of` `--og_fasta`	None	The original proteome file. Provide it if you want OMArk to output optional FASTA files (sequences by category, sequences by detected species, etc.).
`-i` `--isoform_file`	None	A semicolon-separated file listing all isoforms of each gene, with one gene per line. Use it if your input proteome includes more than one protein per gene.
`-v` `--verbose`	False	Turn on logging information about OMArk progress.
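
To illustrate the layout expected by `-i`, here is a minimal sketch that writes an isoform file from a dictionary. The helper name `write_isoform_file` and the gene and protein identifiers are hypothetical; only the layout (one gene per line, isoforms separated by semicolons) comes from the table above.

```python
from pathlib import Path


def write_isoform_file(isoforms_by_gene, path):
    """Write one line per gene, with that gene's isoform IDs joined by ';'."""
    lines = [";".join(protein_ids) for protein_ids in isoforms_by_gene.values()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical identifiers, for illustration only.
example = {
    "geneA": ["protA.1", "protA.2"],
    "geneB": ["protB.1"],
}
```

Calling `write_isoform_file(example, "isoforms.txt")` would produce a two-line file that can then be passed to OMArk with `-i isoforms.txt`. The protein identifiers presumably need to match those in the input proteome.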

Output

A default OMArk run produces a set of files that share the same base name and differ only in their extension or suffix.

OMArk outputs the main results of the analysis in two complementary files: a machine-readable one, identified by its .sum extension, and a human-readable summary ending with _detailed_summary.txt. These commented files report:

The reference lineage that was used for quality assessment
The number of conserved Hierarchical Orthologous Groups (HOGs) used for completeness assessment
The completeness assessment results (Single, Duplicated, Missing)
The whole proteome quality assessment results (Consistent placements, Inconsistent Placements, Contaminants, Missing genes)
The species and contaminants detected in the proteome

The file with the .pdf extension is a graphical representation of the completeness and whole proteome quality assessment.

The file with the .tax extension indicates the closest taxonomic lineage in the OMA database and the selected reference lineage.

The file with the .omq extension lists the identifiers of the HOGs used in the completeness analysis and the category to which each was attributed.

The file with the .ump extension lists the identifiers of all proteins that were not mapped by OMAmer.
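
As a small sketch of how the files described above could be located after a run, the following groups the contents of an output folder by suffix. The function name `find_omark_outputs` is hypothetical, and the code makes no assumption about the content of the files; it only matches the extensions and suffixes named in this section.

```python
from pathlib import Path

# Suffixes taken from the file descriptions above.
OMARK_SUFFIXES = (".sum", "_detailed_summary.txt", ".pdf", ".tax", ".omq", ".ump")


def find_omark_outputs(output_folder):
    """Map each known suffix to the sorted names of matching files in output_folder."""
    folder = Path(output_folder)
    found = {}
    for suffix in OMARK_SUFFIXES:
        found[suffix] = sorted(p.name for p in folder.glob(f"*{suffix}"))
    return found
```

An empty list for a given suffix simply means that no such file was found in the folder.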

Example

You can run OMArk on the example files stored in the example_data folder. Remember to download an OMAmer database as indicated in the Installation section.

First, run OMAmer on the proteome FASTA file (for documentation on installing OMAmer, see its GitHub repository). This step should take less than 15 minutes.

omamer search --db LUCA.h5 --query example_data/UP000005640_9606.fasta --score sensitive --out example_data/UP000005640_9606.omamer

Then create an empty output folder and run OMArk (this should take less than 10 minutes):

mkdir example_data/omark_output

omark -f example_data/UP000005640_9606.omamer -d LUCA.h5 -o example_data/omark_output

You can now explore the OMArk results in the omark_output folder.
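
The two steps of this example can also be scripted. The sketch below is one possible wrapper, not part of OMArk itself: the function names `build_commands` and `run_pipeline` are hypothetical, and the only flags used are the ones shown in the commands above. It assumes that both `omamer` and `omark` are available on your PATH.

```python
import subprocess
from pathlib import Path


def build_commands(fasta, database, omamer_out, output_folder):
    """Return the OMAmer and OMArk command lines used in the example above."""
    omamer_cmd = [
        "omamer", "search",
        "--db", database,
        "--query", fasta,
        "--score", "sensitive",
        "--out", omamer_out,
    ]
    omark_cmd = ["omark", "-f", omamer_out, "-d", database, "-o", output_folder]
    return omamer_cmd, omark_cmd


def run_pipeline(fasta, database, omamer_out, output_folder):
    """Create the output folder, then run both steps, stopping at the first failure."""
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(fasta, database, omamer_out, output_folder):
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(cmd, check=True)
```

For the example data, this would be called as `run_pipeline("example_data/UP000005640_9606.fasta", "LUCA.h5", "example_data/UP000005640_9606.omamer", "example_data/omark_output")`.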

Project details

Release history

0.3.0

Oct 18, 2023

0.2.5

Feb 28, 2023

0.2.4

Feb 23, 2023

0.2.3

Feb 10, 2023

0.2.2

Oct 12, 2022

This version

0.2.1

Aug 16, 2022

0.2.0

Jul 6, 2022

Download files

Source Distribution

omark-0.2.1.tar.gz (36.8 kB)

Uploaded Aug 16, 2022 Source

Built Distribution

omark-0.2.1-py3-none-any.whl (46.2 kB)

Uploaded Aug 16, 2022 Python 3

Hashes for omark-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`e82c4d13b9f6ba877ffbff86257e7fd171a88de49b11aef3d3b110c98ca7b00f`
MD5	`de2c6e31164de7eed8fb88177189493d`
BLAKE2b-256	`35d8cae1ab63f0ed4ea495683d511db105873e82bed2d46b7176157bf0c36c9d`

Hashes for omark-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f1af4d94e5ffa5d9b24446182719b68bf18e0eff411de032d2146608301b1015`
MD5	`1ddb40e86fec80e12c073078b939c4dc`
BLAKE2b-256	`94ed0dd5652abb4f241f798d27ed4647cf0c1351fdcf9c9d887ab65dcc362e03`