
A tool to assess protein coding gene annotation


PSAURON


PSAURON is a machine learning model for rapid assessment of protein coding gene annotation.

M. J. Sommer, A. V. Zimin, S. L. Salzberg, PSAURON: a tool for assessing protein annotation across a broad range of species. NAR Genom. Bioinform. 7, lqae189 (2025). https://academic.oup.com/nargab/article/7/1/lqae189/7944703

Installation

$ pip install psauron

PSAURON can run on GPU or CPU and depends on PyTorch, which can be annoying to install :disappointed:

It may help to install PSAURON in a virtual environment :slightly_smiling_face:

$ python3 -m venv /path/to/new/virtual/environment
$ source /path/to/new/virtual/environment/bin/activate
$ pip install psauron

Quickstart

PSAURON takes as input a single multi-fasta file and outputs a .csv with scores for all reading frames.

By default, PSAURON uses all six frames of the nucleotide coding sequences (CDS).

$ psauron -i path_to_your_CDS.fa -o path_to_output.csv
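To make "all six frames" concrete, here is a minimal sketch of how the six reading frames of a nucleotide sequence can be enumerated: three offsets on the given strand and three on its reverse complement. This is an illustration only, not PSAURON's internal code. The function names and the frame labels (`+1` to `-3`) are assumptions made for this example.

```python
# Illustrative only: enumerate the six reading frames of a nucleotide sequence.
# This is NOT PSAURON's implementation; names and labels are hypothetical.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, str]:
    """Return the six frames, each trimmed to a whole number of codons."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            sub = strand_seq[offset:]
            # Drop trailing bases that do not form a complete codon.
            sub = sub[: len(sub) - len(sub) % 3]
            frames[f"{strand}{offset + 1}"] = sub
    return frames


frames = six_frames("ATGGCCTAA")
for label, frame_seq in frames.items():
    print(label, frame_seq)
```

In this labeling, `+1` is the annotated (in-frame) CDS, and the other five are the alternate frames.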

You may also provide a multi-fasta with protein (amino acid) sequence.

$ psauron -i path_to_your_protein.faa -o path_to_output.csv -p 

...or request that PSAURON score only the in-frame nucleotide sequence.

$ psauron -i path_to_your_CDS.fa -o path_to_output.csv -s

Note: internal stop codons are ignored by PSAURON. A high PSAURON score does not guarantee a sequence contains a valid ORF. This is intended behavior, as alternate frame scores are used by default to boost the power of the model.
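Because PSAURON ignores internal stop codons, you may want to check for them separately. The sketch below is one possible way to do that and is not part of PSAURON. It assumes the standard genetic code (stop codons TAA, TAG, TGA) and that the sequence starts in frame. The function name is hypothetical.

```python
# Illustrative only: find in-frame stop codons before the final codon.
# Assumption: standard genetic code; the CDS starts at codon position 0.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> list[int]:
    """Return 0-based codon indices of stop codons that precede the last complete codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        i
        for i in range(n_codons - 1)  # skip the final codon, where a stop is expected
        if cds[3 * i : 3 * i + 3] in STOP_CODONS
    ]


print(internal_stop_positions("ATGTAAGCCTGA"))  # codon 1 (TAA) is an internal stop
print(internal_stop_positions("ATGGCCTGA"))     # no internal stops
```

A sequence with a non-empty result does not contain a single uninterrupted ORF in that frame, regardless of its PSAURON score.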

Usage

psauron [-h] -i INPUT_FASTA [-o OUTPUT_PATH] [-m MINIMUM_LENGTH] [-e EXCLUDE] [--inframe INFRAME] [--outframe OUTFRAME] [-c] [-s] [-p] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FASTA, --input-fasta INPUT_FASTA
                        REQUIRED path to FASTA with spliced CDS sequence or protein sequence. A spliced CDS fasta can be created from a GTF/GFF and a reference FASTA by using gffread.
  -o OUTPUT_PATH, --output-path OUTPUT_PATH
                        OPTIONAL path to output results file, default=./psauron_score.csv
  -m MINIMUM_LENGTH, --minimum-length MINIMUM_LENGTH
                        OPTIONAL exclude all proteins shorter than m amino acids, default=5
  -e EXCLUDE, --exclude EXCLUDE
                        OPTIONAL exclude any CDS where FASTA description contains given text (case invariant), e.g. "hypothetical", default=None
  --inframe INFRAME     OPTIONAL probability threshold used to determine final psauron score, in-frame, higher number decreases sensitivity and increases specificity, default=0.5, range=[0,1]
  --outframe OUTFRAME   OPTIONAL probability threshold used to determine final psauron score, out-of-frame, higher number increases sensitivity and decreases specificity, default=0.5, range=[0,1]
  -c, --use-cpu         OPTIONAL set -c to force usage of CPU instead of GPU, default=False
  -s, --single-frame    OPTIONAL set -s to score only the in-frame CDS, which may lower accuracy of the model, default=False
  -p, --protein         OPTIONAL set -p if your FASTA contains amino acid protein sequence, which may lower accuracy of the model, default=False
  -v, --verbose         OPTIONAL set -v for verbose output with progress bars etc., default=False
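If you call PSAURON from a Python pipeline, a thin wrapper around the command line is one option. The sketch below only assembles the flags documented in the help text above and runs them with `subprocess`. The function names and keyword arguments are hypothetical, and it assumes the `psauron` executable is on your `PATH`.

```python
# Illustrative only: build and run a psauron command from Python.
# Uses only the CLI flags listed in the usage text; the wrapper itself is hypothetical.
import subprocess


def build_psauron_command(input_fasta, output_csv, minimum_length=None,
                          exclude=None, protein=False, single_frame=False,
                          use_cpu=False, verbose=False):
    """Return the argument list for a psauron invocation."""
    cmd = ["psauron", "-i", str(input_fasta), "-o", str(output_csv)]
    if minimum_length is not None:
        cmd += ["-m", str(minimum_length)]
    if exclude is not None:
        cmd += ["-e", exclude]
    if protein:
        cmd.append("-p")
    if single_frame:
        cmd.append("-s")
    if use_cpu:
        cmd.append("-c")
    if verbose:
        cmd.append("-v")
    return cmd


def run_psauron(*args, **kwargs):
    """Run psauron and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_psauron_command(*args, **kwargs), check=True)


print(build_psauron_command("cds.fa", "scores.csv", exclude="hypothetical", use_cpu=True))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.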

The FASTA passed to -i should contain spliced CDS sequence. It can be created from a GTF/GFF and a reference FASTA by using gffread.

Example gffread commands to get CDS FASTA:

gffread -x CDS_FASTA.fa -g genome.fa input.gff
gffread -x CDS_FASTA.fa -g genome.fa input.gtf
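After extracting the CDS FASTA, an optional sanity check is to list records whose length is not a multiple of three, which can indicate partial or mis-annotated CDS. This check is not required by PSAURON and is only a sketch. The function names are hypothetical, and the FASTA reader is deliberately minimal.

```python
# Illustrative only: flag CDS records whose length is not a multiple of 3.
# Not part of PSAURON; a minimal FASTA reader written for this example.
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def partial_codon_records(handle):
    """Return headers of records whose sequence length is not divisible by 3."""
    return [header for header, seq in read_fasta(handle) if len(seq) % 3 != 0]


# Small in-memory example; the second record has 5 bases.
example = io.StringIO(">tx1 example\nATGGCC\nTAA\n>tx2 example\nATGGC\n")
flagged = partial_codon_records(example)
print(flagged)
```

For a real file, open it first, for example `with open("CDS_FASTA.fa") as handle: print(partial_codon_records(handle))`.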
