Skip to main content

A simple python package for annotating protein sequences

Project description

AnnoPRO

AUR python pypi keras PMID

AnnoPRO generation

  • step 1: input proteins sequeces
  • step 2: features extraction by Profeat
  • step 3: Feature pairwise distance calculation --> cosine, correlation, jaccard
  • Step4: Feature 2D embedding --> umap, tsne, mds
  • Step5: Feature grid arrangement --> grid, scatter
  • Step5: Transform --> minmax, standard image

AnnoPRO architecture

  • Encoding layers: Protein features was learned by CNNs and Protein similarity was learned by FCs.
  • Decoding layers: LSTMs image

Installation

  1. install compilers

dependency lapjv requires g++ or other Cpp compiler, and annopro contains fortran extensional module and require gfortran or other fortran compiler. Here is an example of installing them on Ubuntu.

sudo apt install gcc g++ gfortran
# or you can install by conda in your virtual env
# command name is like 
# gcc: x86_64-conda_cos6-linux-gnu-cc
# g++: x86_64-conda_cos6-linux-gnu-c++
# gfortran: x86_64-conda_cos6-linux-gnu-gfortran
conda install gcc_linux-64 gxx_linux-64 gfortran_linux-64
  1. install annopro

You can install it directly by pip install annopro or install from source code as following steps. But you should install numpy first if you install it from source code because we need numpy.f2py to help us build fortran extension submodule.

git clone https://github.com/idrblab/AnnoPRO.git
cd AnnoPRO
conda create -n annopro python=3.8
conda activate annopro
pip install .

Usage

  • Use it as a terminal command. For all parameters, type annopro -h.
annopro -i test_proteins.fasta -o output
  • Use it as a python executable package
python -m annopro -i test_proteins.fasta -o output
  • Use it as a library to integrated with your project.
from annopro import main
main("test_proteins.fasta", "output")

The result is displayed in the ./output/bp(cc,mf)_result.csv.

Notice: if you use annopro for the first time, annopro will automatically download required resources when they are used (lazy download mechanism)

Possible problems

  1. pip is looking at multiple versions of XXX to determine which version is compatible with other requirements. this could take a while.

Your pip is latest, back to old version such as 20.2, or just add --use-deprecated=legacy-resolver param.

  1. Argument mismatch when building source code.

Because your gfortran is latest and imcompatible, edit setup.py and uncomment -fallow-argument-mismatch or just use a earlier version of gfortran such as 4.8.5, 8.4

Contact

If any questions, please create an issue on this repo, we will deal with it as soon as possible.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

annopro-0.2.tar.gz (23.8 kB view hashes)

Uploaded Source

Built Distribution

annopro-0.2-py3-none-any.whl (26.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page