annotate-e

An annotator for enzymes that uses an ensemble of tools to assign functional annotations to genes.

overview

Install:

pip install annotatee

Note: annotatee requires enzymetk to be installed as well.

Run:

Setup:

Download sequences for your database or use ours, e.g. the FASTA file from UniProt SwissProt with annotations.

Arguments:

Pass the query file first (a FASTA or CSV file with the sequences you want to annotate), followed by the FASTA file to use as the database.

The database is searched first for existing annotations; otherwise, ML methods are used.
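
The search order described above can be pictured as a small cascade. The following is only an illustrative sketch in plain Python, not annotatee's implementation: `annotate_with_fallback`, `lookup` and `predict` are hypothetical names, and the exact-match dictionary lookup merely stands in for a real similarity search.

```python
from typing import Callable, Dict, Optional


def annotate_with_fallback(
    queries: Dict[str, str],
    lookup: Callable[[str], Optional[str]],
    predict: Callable[[str], str],
) -> Dict[str, Dict[str, str]]:
    """Use a database hit when one exists, otherwise fall back to a predictor.

    `lookup` returns an annotation or None; `predict` always returns one.
    Both are hypothetical stand-ins for the real tools.
    """
    results: Dict[str, Dict[str, str]] = {}
    for seq_id, seq in queries.items():
        hit = lookup(seq)
        if hit is not None:
            # An existing annotation was found in the database.
            results[seq_id] = {"annotation": hit, "source": "database"}
        else:
            # No hit: fall back to the ML-style predictor.
            results[seq_id] = {"annotation": predict(seq), "source": "ml"}
    return results


# Toy data: placeholder sequences and annotations, not real results.
known = {"MKV": "annotation-from-db"}
demo = annotate_with_fallback(
    {"q1": "MKV", "q2": "MAA"},
    lookup=known.get,
    predict=lambda seq: "annotation-from-ml",
)
print(demo)
```

The `source` field records which step produced each annotation, so the two kinds of result can be told apart later.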

Example command:

annotatee input_df.csv Uniprot_reviewed_catalytic_activity_06032025.fasta --methods blast --output-folder output/ --run-name omgprot50
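
If you call the tool from a script, the same command can be assembled as an argument list. This is a minimal sketch: `build_command` is a hypothetical helper, not part of annotatee, and it uses only the positional arguments and options that appear in the help output.

```python
import shlex
from typing import List, Sequence


def build_command(
    query: str,
    database: str,
    methods: Sequence[str],
    output_folder: str,
    run_name: str,
) -> List[str]:
    """Assemble the annotatee call as a list of arguments."""
    return [
        "annotatee",
        query,
        database,
        # --methods expects a comma separated list with no spaces.
        "--methods", ",".join(methods),
        "--output-folder", output_folder,
        "--run-name", run_name,
    ]


cmd = build_command(
    "input_df.csv",
    "Uniprot_reviewed_catalytic_activity_06032025.fasta",
    ["blast"],
    "output/",
    "omgprot50",
)
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.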

Help

annotatee --help

 Usage: annotatee [OPTIONS] QUERY_FASTA DATABASE                                                                                                                                                       
                                                                                                                                                                                                       
 Find similar proteins based on sequence or structural identity in order to annotate these using  BLAST and FoldSeek. Also annotate with ProteInfer and CLEAN.                                         
                                                                                                                                                                                                       
╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    query_fasta      TEXT  Full path to query fasta or csv (note have simple IDs otherwise we'll remove all funky characters.) [default: None] [required]                                          │
│ *    database         TEXT  Full path to database fasta (for BLAST and FoldSeek) [default: None] [required]                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --output-folder                           TEXT  Where to store results (full path!) [default: Current Directory]                                                                                    │
│ --run-name                                TEXT  Name of the run [default: annotatee]                                                                                                                │
│ --clean-dir                               TEXT  Directory (full path) to CLEAN - if not using the default)                                                                                          │
│ --proteinfer-dir                          TEXT  Directory (full path) to proteiner - if not using the default)                                                                                      │
│ --run-method                              TEXT  Run method (filter or complete) i.e. filter = only annotates with the next tool those that couldn't be found. [default: complete]                   │
│ --keep-dups             --no-keep-dups          Whether or not to keep multiple predicted values if False only the top result is retained. [default: no-keep-dups]                                  │
│ --args-blast                              TEXT  comma separated list (no spaces) of arguments to pass to Diamond BLAST                                                                              │
│ --args-foldseek                           TEXT  comma separated list (no spaces) of arguments to pass to foldseek                                                                                   │
│ --args-proteinfer                         TEXT  comma separated list (no spaces) of arguments to pass to ProteInfer                                                                                 │
│ --args-clean                              TEXT  comma separated list (no spaces) of arguments to pass to CLEAN                                                                                      │
│ --methods                                 TEXT  comma separated list (no spaces) of methods to run (e.g. could just pass ['foldseek', 'proteinfer']) to pass to CLEAN                               │
│ --foldseek-db                             TEXT  Database for foldseek to override fasta before (e.g. path to all pdbs as per foldseek docs.)                                                        │
│ --id-col                                  TEXT  id column in df if df passed (csv) rather than fasta [default: id]                                                                                  │
│ --seq-col                                 TEXT  Database for foldseek to override fasta before (e.g. path to all pdbs as per foldseek docs.) [default: seq]                                         │
│ --install-completion                            Install completion for the current shell.                                                                                                           │
│ --show-completion                               Show completion for the current shell, to copy it or customize the installation.                                                                    │
│ --help                                          Show this message and exit.                                                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
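
The help text notes that the query may be a CSV with an ID column and a sequence column (defaults `id` and `seq`), and that IDs should be simple. Below is a hedged sketch of preparing such a file with the Python standard library. `simplify_id` and `write_query_csv` are hypothetical helpers, and the cleaning rule used here (replace anything other than letters, digits and underscores with an underscore) is an assumption, not necessarily what annotatee does internally.

```python
import csv
import io
import re
from typing import Iterable, TextIO, Tuple


def simplify_id(raw_id: str) -> str:
    """Replace every character that is not a letter, digit or underscore.

    Assumed rule for illustration only.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", raw_id)


def write_query_csv(
    records: Iterable[Tuple[str, str]],
    handle: TextIO,
    id_col: str = "id",
    seq_col: str = "seq",
) -> None:
    """Write (id, sequence) pairs as a CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([id_col, seq_col])
    for raw_id, seq in records:
        writer.writerow([simplify_id(raw_id), seq])


# Placeholder record, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
write_query_csv([("my protein #1", "MKV")], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass `open("input_df.csv", "w", newline="")` as `handle`. If you choose different column names, pass them to the tool with `--id-col` and `--seq-col`.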

Download files

Source Distribution

annotatee-0.0.1.tar.gz (10.7 kB view details)

Uploaded Source

Built Distribution

annotatee-0.0.1-py3-none-any.whl (12.6 kB view details)

Uploaded Python 3

File details

Details for the file annotatee-0.0.1.tar.gz.

File metadata

  • Download URL: annotatee-0.0.1.tar.gz
  • Upload date:
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for annotatee-0.0.1.tar.gz
Algorithm Hash digest
SHA256 21168512cb80e180d48bb69818869bcea8f1671ced106db32dc48a1c0c3a2b9c
MD5 06c724edae9880bb6623a8d46625254b
BLAKE2b-256 3e05a75c1c0f2f450c3600d9403c08f089e2b595b2d435e95ac4c98cbfd4d3cf

File details

Details for the file annotatee-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: annotatee-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for annotatee-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e082f46cb49bc24e62185aa965bc12a1dba04aaec15defbd22c3c847682455e0
MD5 d45506cd08d8428ccb661397048945ba
BLAKE2b-256 eaefe891f466dde3a8f1b3c8d94c4f5c4a7ca547de413f1d1974fca0eb616b7e
