A simple Python package for annotating protein sequences
AnnoPRO
AnnoPRO generation
- Step 1: Input protein sequences
- Step 2: Feature extraction by Profeat
- Step 3: Feature pairwise distance calculation --> cosine, correlation, jaccard
- Step 4: Feature 2D embedding --> umap, tsne, mds
- Step 5: Feature grid arrangement --> grid, scatter
- Step 6: Transform --> minmax, standard
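As a rough illustration of two of the steps above (the cosine option of the pairwise distance step and the minmax option of the transform step), here is a minimal standard-library sketch. It is not AnnoPRO's implementation: the function names and the toy matrix are hypothetical, and it assumes distances are computed between feature columns of a proteins-by-features matrix.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen for this sketch when a vector is all zeros
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_feature_distances(matrix):
    """Symmetric distance matrix between all pairs of feature columns.

    `matrix` is a list of rows: one row per protein, one column per feature.
    """
    columns = list(zip(*matrix))
    n = len(columns)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(columns[i], columns[j])
            dist[i][j] = dist[j][i] = d
    return dist


def minmax_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Toy data (made up): 3 proteins x 3 features.
features = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
]
distances = pairwise_feature_distances(features)
scaled_first_feature = minmax_scale([row[0] for row in features])
```

In the toy data the first two feature columns are proportional, so their cosine distance is close to zero, while the third column is clearly separated from both.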
AnnoPRO architecture
- Encoding layers: Protein features are learned by CNNs, and protein similarity is learned by fully connected (FC) layers.
- Decoding layers: LSTMs
Installation
- install compilers
The dependency lapjv requires g++ or another C++ compiler, and annopro contains a Fortran extension module that requires gfortran or another Fortran compiler. Here is an example of installing them on Ubuntu:
sudo apt install gcc g++ gfortran
# or you can install by conda in your virtual env
# command name is like
# gcc: x86_64-conda_cos6-linux-gnu-cc
# g++: x86_64-conda_cos6-linux-gnu-c++
# gfortran: x86_64-conda_cos6-linux-gnu-gfortran
conda install gcc_linux-64 gxx_linux-64 gfortran_linux-64
- install annopro
You can install it directly with pip install annopro, or install it from source with the following steps.
If you install from source, install numpy first, because numpy.f2py is needed to build the Fortran extension submodule.
git clone https://github.com/idrblab/AnnoPRO.git
cd AnnoPRO
conda create -n annopro python=3.8
conda activate annopro
# numpy.f2py is needed to build the Fortran extension
pip install numpy
pip install .
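If the source build fails, it can help to confirm that the compilers mentioned above are actually visible on PATH. The following is a small sketch using only the standard library; the helper name find_compilers is hypothetical, and the default names are the Ubuntu ones (conda-provided compilers use the longer names listed in the comments above, which can be passed in instead).

```python
import shutil


def find_compilers(names=("gcc", "g++", "gfortran")):
    """Map each compiler name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_compilers().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

For a conda environment, a call such as find_compilers(("x86_64-conda_cos6-linux-gnu-gfortran",)) checks the conda-specific command name.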
Usage
- Use it as a terminal command. For all parameters, type annopro -h.
annopro -i test_proteins.fasta -o output
- Run it as a Python module
python -m annopro -i test_proteins.fasta -o output
- Use it as a library integrated into your project.
from annopro import main
main("test_proteins.fasta", "output")
The results are written to ./output/bp(cc,mf)_result.csv (one file each for bp, cc, and mf).
Notice: the first time you use annopro, it automatically downloads the resources it needs at the moment they are first used (lazy download mechanism).
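After a run, the result files can be located programmatically. The sketch below assumes that the notation bp(cc,mf)_result.csv expands to bp_result.csv, cc_result.csv, and mf_result.csv inside the output directory; the helper names are hypothetical and not part of annopro.

```python
from pathlib import Path


def expected_result_files(output_dir, branches=("bp", "cc", "mf")):
    """Map each branch name to the path where its result CSV is assumed to be."""
    out = Path(output_dir)
    return {branch: out / f"{branch}_result.csv" for branch in branches}


def existing_result_files(output_dir):
    """Keep only the assumed result files that are actually present on disk."""
    return {
        branch: path
        for branch, path in expected_result_files(output_dir).items()
        if path.is_file()
    }


if __name__ == "__main__":
    for branch, path in existing_result_files("output").items():
        print(f"{branch}: {path}")
```

An empty dictionary from existing_result_files means the run has not produced output yet, or the file naming assumption does not hold for your version.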
Possible problems
- pip is looking at multiple versions of XXX to determine which version is compatible with other requirements. this could take a while.
This happens with recent pip versions. Downgrade pip to an older version such as 20.2, or add the --use-deprecated=legacy-resolver option.
- Argument mismatch when building source code.
This happens because recent gfortran versions are incompatible with the source code. Either edit setup.py and uncomment -fallow-argument-mismatch, or use an earlier version of gfortran such as 4.8.5 or 8.4.
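To decide which of the two workarounds applies, you can check the installed gfortran version. The sketch below is an illustration under one assumption: that the argument mismatch error starts with GCC 10, the release in which gfortran began rejecting such mismatches and introduced -fallow-argument-mismatch. The helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_major_version(version_text):
    """Return the major version from text such as '4.8.5' or '11', else None."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_text)
    return int(match.group(1)) if match else None


def needs_allow_argument_mismatch(major):
    """True if this major version is assumed to need -fallow-argument-mismatch."""
    # Assumption: GCC 10 and later treat argument mismatches as errors.
    return major is not None and major >= 10


def gfortran_major_version(executable="gfortran"):
    """Ask the compiler for its version; None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_major_version(result.stdout)


if __name__ == "__main__":
    major = gfortran_major_version()
    print("gfortran major version:", major)
    print("uncomment -fallow-argument-mismatch:", needs_allow_argument_mismatch(major))
```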
Contact
If you have any questions, please create an issue on this repo, and we will deal with it as soon as possible.
File details
Details for the file annopro-0.2.tar.gz.
File metadata
- Download URL: annopro-0.2.tar.gz
- Upload date:
- Size: 23.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | d92cfb008be38e4b0778c75b28beaedd02d17073eeb116a6142caa287b90700c
MD5 | 8dcbacd762a00621d5a04127301fda95
BLAKE2b-256 | 4e461d906b9171573bec04cdccb92d5ade222135240a297d5396a40862d7f108
File details
Details for the file annopro-0.2-py3-none-any.whl.
File metadata
- Download URL: annopro-0.2-py3-none-any.whl
- Upload date:
- Size: 26.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | bdf2b3f02e16bd50a4966c75b13903c689abe5adb99bbf5d5facc7d5ee1ec391
MD5 | a747b08b19cb93ff8c6f98d911de8e2e
BLAKE2b-256 | 884580a17409579a5f87bed12602e5a80097de2695af3ad675c3a54c411468fb