A tool to discover and annotate tandem protein kinases

tkp-finder

tkp-finder is a CLI tool to discover and annotate tandem protein kinases.

It is built on lXtractor, a general-purpose library for mining data from sequences and structures. lXtractor is under active development, so bugs are possible.


Installation

pip install tkp-finder

License

tkp-finder is distributed under the terms of the MIT license.

Usage

Installation should make the tkp-finder script globally available. The interface has two commands: setup and find.

The setup command downloads and prepares the HMM models used for annotation.

→ tkp-finder setup --help

Usage: tkp-finder setup [OPTIONS]

  Command to initialize the HMM data needed for TKPs' annotation.

Options:
  -H, --hmm_dir DIRECTORY  Path to a directory to store hmm-related data.
                           [required]
  -d, --download           If True, download the Pfam data from interpro.
  -q, --quiet              Disable verbose output.
  --path_pfam_a FILE       A path to downloaded Pfam-A HMM profiles. By
                           default, if `download` is ``False``,will try to
                           find it within the `hmm_dir`.
  --path_pfam_dat FILE     A path to downloaded Pfam-A (meta)data file. By
                           default, if `download` is ``False``,will try to
                           find it within the `hmm_dir`.
  -h, --help               Show this message and exit.

For first-time use, run:

→ tkp-finder setup -H hmm -d

This downloads the Pfam-A HMMs and accompanying metadata, then splits the models into per-category folders under profiles. The resulting directory:

→ tree -L 2 hmm

hmm
├── PF00069.hmm
├── Pfam-A.hmm
├── Pfam-A.hmm.dat
├── pfam_entries.tsv
└── profiles
    ├── Coiled-coil
    ├── Disordered
    ├── Domain
    ├── Family
    ├── Motif
    ├── Repeat
    └── unknown
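
Before running the find command, it can help to confirm that the directory holds what find expects: the PF00069.hmm profile and a profiles folder with one subfolder per annotation type. The following is a minimal sketch, not part of tkp-finder. The function name check_hmm_dir is hypothetical, and the sketch assumes that the annotation types (Family, Domain and Motif by default) live under profiles, as the tree above suggests.

```python
from pathlib import Path


def check_hmm_dir(hmm_dir, hmm_types=("Family", "Domain", "Motif")):
    """Return the expected paths that are missing from hmm_dir.

    Hypothetical helper, not part of tkp-finder. Assumes the layout
    produced by `tkp-finder setup -H <hmm_dir> -d`.
    """
    root = Path(hmm_dir)
    missing = []
    # The target protein kinase profile sits at the top level.
    if not (root / "PF00069.hmm").is_file():
        missing.append("PF00069.hmm")
    # Each requested HMM type is assumed to be a folder under `profiles`.
    for hmm_type in hmm_types:
        if not (root / "profiles" / hmm_type).is_dir():
            missing.append(f"profiles/{hmm_type}")
    return missing


if __name__ == "__main__":
    problems = check_hmm_dir("hmm")
    print("hmm directory looks complete" if not problems else f"missing: {problems}")
```

An empty list means every expected path was found. It does not validate the contents of the HMM files themselves.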

To discover and annotate TKPs, use the tkp-finder find command:

→ tkp-finder find --help

Usage: tkp-finder find [OPTIONS] [FASTA]...

Options:
  -H, --hmm_dir DIRECTORY    Directory with HMM profiles. Expected to contain
                             `profiles` dir and target PK profile
                             (PF00069.hmm). See `tkp-finder setup` on how to
                             prepare this dir.
  -t, --hmm_type TEXT        Which HMM types to use for annotating the
                             discovered TKPs. The names must correspond to
                             folders within he `hmm_dir`.  [default: Family,
                             Domain, Motif]
  -p, --pk_profile FILE      A path to the PK HMM profile. By default, will
                             try to find it within the `hmm_dir`.
  -m, --motif TEXT           A motif to discriminate between PKs and pseudo
                             PKs. This corresponds to the following conserved
                             elements::  (1) b3-Lys(2) aC-helix Glu(3-4-5) HRD
                             motif(6-7-8) DFG motif  [default: KEXXDDXX]
  -o, --output DIRECTORY     Output directory to store the results. Be
                             default, will store within `./tkp-finder`.
  -n, --num_proc INTEGER     The number of cpus for data parallelism: each
                             input fasta will be annotated within separate
                             process. HINT: one may split large fasta files
                             for faster processing.
  -q, --quiet                Disable logging and progress bar
  --pk_map_name TEXT         Use this name for the protein kinase domain.
                             [default: PK]
  --ppk_map_name TEXT        Use this name for pseudo protein kinases.
                             [default: PPK]
  --min_domain_size INTEGER  The minimum number of amino acid residues within
                             a PK domain.  [default: 150]
  --min_domains INTEGER      The number of domains to classify a protein as
                             TKP.
  --timeout INTEGER          For parallel processing, indicate timeout for
                             getting results of a single process.
  -h, --help                 Show this message and exit.
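
The hint under --num_proc suggests splitting large FASTA files, since each input file is annotated in its own process. The following is one possible way to do that in plain Python, a sketch rather than anything shipped with tkp-finder. The function name split_fasta and the output naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def split_fasta(path, n_chunks, out_dir):
    """Distribute FASTA records round-robin into up to n_chunks files.

    Hypothetical helper, not part of tkp-finder. Returns the list of
    written chunk paths.
    """
    records, current = [], []
    with open(path) as handle:
        for line in handle:
            # A header line closes the previous record and starts a new one.
            if line.startswith(">") and current:
                records.append(current)
                current = []
            if line.strip():
                current.append(line if line.endswith("\n") else line + "\n")
    if current:
        records.append(current)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    written = []
    # Never create more files than there are records.
    for i in range(min(n_chunks, len(records))):
        chunk = out_dir / f"{stem}_{i + 1}.fasta"
        with open(chunk, "w") as out:
            for record in records[i::n_chunks]:
                out.writelines(record)
        written.append(chunk)
    return written


if __name__ == "__main__":
    for chunk in split_fasta("proteins.fasta", 4, "chunks"):
        print(chunk)
```

The resulting files can then be passed together as positional FASTA arguments, for example tkp-finder find -H hmm -n 4 chunks/*.fasta. Round-robin assignment keeps the chunks close in record count, although not necessarily in total sequence length.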

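The --motif option describes eight conserved positions: the b3-Lys, the aC-helix Glu, the three residues of the HRD motif and the three residues of the DFG motif. The sketch below illustrates how the default pattern KEXXDDXX could be read against those eight residues. It assumes that X accepts any residue and that every other letter must match exactly. This is an interpretation of the help text, not tkp-finder's actual implementation, and the names matches_motif and label_domain are hypothetical. How the eight residues are extracted from a domain is outside the scope of the sketch.

```python
def matches_motif(residues, motif="KEXXDDXX"):
    """Compare residues at the conserved positions with a motif pattern.

    Assumption: 'X' in the motif accepts any residue, while any other
    letter must match exactly. Hypothetical helper, not tkp-finder code.
    """
    if len(residues) != len(motif):
        return False
    pairs = zip(motif.upper(), residues.upper())
    return all(m == "X" or m == r for m, r in pairs)


def label_domain(residues, motif="KEXXDDXX", pk_name="PK", ppk_name="PPK"):
    """Label a domain as a kinase or a pseudokinase based on the motif."""
    return pk_name if matches_motif(residues, motif) else ppk_name


if __name__ == "__main__":
    # Positions: b3-Lys, aC-helix Glu, H-R-D, D-F-G.
    print(label_domain("KEHRDDFG"))  # PK: all four fixed positions match
    print(label_domain("KEHRNDFG"))  # PPK: the Asp of HRD is replaced by Asn
```

Under this reading, the default pattern checks only four residues (Lys, Glu, the Asp of HRD and the Asp of DFG) and leaves the rest unconstrained. The default labels PK and PPK mirror the --pk_map_name and --ppk_map_name options.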