PICNIC (Proteins Involved in CoNdensates In Cells)

PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based model that predicts proteins involved in biomolecular condensates. The first model (PICNIC) uses sequence-based features and structure-based features derived from AlphaFold2 models. A second model (PICNIC-GO) adds an extended set of features based on Gene Ontology terms. Although PICNIC-GO is biased by the annotations already available for a protein, it provides useful insights into the specific protein properties that are enriched in proteins of biomolecular condensates. Overall, we recommend PICNIC, which is an unbiased predictor, for general use, and PICNIC-GO for specific cases such as experimental hypothesis generation.

External software

IUPred2A

IUPred2A is a tool that predicts disordered protein regions. It is available for download at https://iupred2a.elte.hu/download_new. The downloaded archive should be unpacked into the "src/files/" directory.

STRIDE

STRIDE is a program for protein secondary structure assignment. An installation guide can be found at https://webclu.bio.wzw.tum.de/stride/.

Installation instructions

A binary installer for the latest released version is available at the Python Package Index (PyPI).

Requirements

  • Python versions >=3.9,<3.13
  • Download and unpack IUPred2A
    • Add IUPred2A to PYTHONPATH
  • Download and unpack STRIDE
    • Add STRIDE binary to your system PATH
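The requirements above can be checked before installing PICNIC. The following is a minimal sketch, not part of PICNIC itself: the binary name `stride` and the marker file `iupred2a.py` are assumptions about what the STRIDE build and the IUPred2A archive produce, so adjust them if your copies differ.

```python
import shutil
import sys
from pathlib import Path


def python_version_supported(version=None):
    """Return True if the (major, minor) version satisfies >=3.9,<3.13."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 9) <= major_minor < (3, 13)


def stride_on_path(binary_name="stride"):
    """Return the resolved path of the STRIDE binary, or None if it is not on PATH.

    The name "stride" is an assumption about the binary produced by `make`.
    """
    return shutil.which(binary_name)


def iupred2a_dirs(search_path=None, marker="iupred2a.py"):
    """List import-path entries that contain the assumed IUPred2A marker file.

    Entries from PYTHONPATH show up in sys.path, which is searched by default.
    """
    entries = sys.path if search_path is None else search_path
    return [p for p in entries if p and (Path(p) / marker).is_file()]


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    print("STRIDE binary:", stride_on_path() or "not found on PATH")
    print("IUPred2A directories:", iupred2a_dirs() or "none found on the import path")
```

Running the script inside the environment you intend to use for PICNIC shows which of the three prerequisites still needs attention.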

Install external requirements

How to install STRIDE?

A complete installation guide can be found at https://webclu.bio.wzw.tum.de/stride/, or simply run the following commands:

mkdir stride
cd stride
curl -OL https://webclu.bio.wzw.tum.de/stride/stride.tar.gz
tar -zxf stride.tar.gz
make
export PATH="$PATH:$PWD"

How to install IUPred2A?

IUPred2A is free for academic users only and cannot be used for commercial purposes. If you are an academic user, you can download IUPred2A by filling out the form at https://iupred2a.elte.hu/download_new.

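# Fill out the form above and download the IUPred2A tarball, then unpack it
tar -zxf iupred2a.tar.gz
cd iupred2a
# Prepend the IUPred2A directory, keeping any existing PYTHONPATH entries
export PYTHONPATH="$PWD${PYTHONPATH:+:$PYTHONPATH}"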

PICNIC is available on PyPI

PICNIC officially supports Python versions >=3.9,<3.13.

python3 --version
Python 3.11.5

python3 -m venv picnic-env
source picnic-env/bin/activate
(picnic-env) % python -m pip install --upgrade pip
(picnic-env) % python -m pip install picnic_bio
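To confirm that the installation succeeded, the installed version can be queried from inside the activated environment. This is a small sketch using the standard library; the distribution name `picnic_bio` is taken from the pip command above, and the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("picnic_bio")
    print("picnic_bio version:", found if found is not None else "not installed")
```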

PICNIC is also installable from source

git clone git@git.mpi-cbg.de:atplab/picnic.git

Once you have a copy of the source, you can embed it in your own Python package or install it into your site-packages:

cd picnic
python3 -m venv picnic-env
source picnic-env/bin/activate
(picnic-env) % python -m pip install --upgrade pip
(picnic-env) % python -m pip install .

How to install PICNIC using Conda?

No binary installer is available on Conda yet, but PICNIC can be installed inside a Conda environment.

Note that in a Conda environment catboost must be installed before picnic-bio itself; otherwise the installation fails while compiling the catboost package from source. We also recommend setting up conda-forge so that a pre-compiled version of catboost is fetched.

The following commands work around the catboost installation issue:

conda config --add channels conda-forge
conda config --set channel_priority strict

# Choose one of the supported Python versions, when creating the Conda environment: >=3.9,<3.13
# conda create -n myenv python=[3.9, 3.10, 3.11, 3.12] catboost
# e.g.
conda create -n myenv python=3.11 catboost
conda activate myenv
(myenv) % python -m pip install picnic_bio

How to use?

Usage - Using PICNIC from command line

picnic <is_automated> <path_af> <protein_id> <is_go> --path_fasta_file <file>

usage: PICNIC [-h] [--path_fasta_file PATH_FASTA_FILE]
              is_automated path_af protein_id is_go

PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based
model that predicts proteins involved in biomolecular condensates.

positional arguments:
  is_automated          True if automated pipeline (works for proteins with
                        length < 1400 aa, with precalculated Alphafold2 model,
                        deposited to UniprotKB), else manual pipeline
                        (protein_id, Alphafold2 model(s) and fasta file are
                        needed to be provided as input)
  path_af               directory with pdb files, created by Alphafold2 for
                        the protein in the format. For smaller proteins ( <
                        1400 aa length) AlphaFold2 provides one model, that
                        should be named: AF-protein_id-F1-v{j}.pdb, where j is
                        a version number. In case of large proteins Alphafold2
                        provides more than one file, and all of them should be
                        stored in one directory and named: 'AF-
                        protein_id-F{i}-v{j}.pdb', where i is a number of
                        model, j is a version number.
  protein_id            protein identifier in UniprotKB (should correspond to
                        the name 'protein_id' for Alphafold2 models, stored in
                        directory_af_models)
  is_go                 boolean flag; if 'True', picnic_go score (picnic
                        version with Gene Ontology features) will be
                        calculated, Gene Ontology terms are retrieved in this
                        case from UniprotKB by protein_id identifier;
                        otherwise default picnic score will be printed
                        (without Gene Ontology annotation)

options:
  -h, --help            show this help message and exit
  --path_fasta_file PATH_FASTA_FILE
                        directory with sequence file in fasta format
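The file naming convention described for path_af can be checked before a run. The sketch below is not part of PICNIC; it only follows the pattern quoted in the help text, AF-protein_id-F{i}-v{j}.pdb, and the function name `find_af_models` is hypothetical. If your AlphaFold2 files are named differently, they would need to be renamed or the pattern adjusted.

```python
import re
from pathlib import Path


def find_af_models(path_af, protein_id):
    """Return (model_number, version, file_name) tuples for one protein, sorted by model number.

    Only files matching AF-<protein_id>-F<i>-v<j>.pdb are reported.
    """
    pattern = re.compile(rf"^AF-{re.escape(protein_id)}-F(\d+)-v(\d+)\.pdb$")
    found = []
    for entry in Path(path_af).iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_file():
            found.append((int(match.group(1)), int(match.group(2)), entry.name))
    return sorted(found)
```

An empty result for a given protein_id suggests that the directory or the file names do not match what the command-line tool expects.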

Examples

Run automated pipeline for a given UniProt Id:

picnic True notebooks/test_files/Q99720/ Q99720 True

Run manual pipeline for a given UniProt Id:

picnic False 'notebooks/test_files/O95613/' 'O95613' False --path_fasta_file 'notebooks/test_files/O95613/O95613.fasta.txt'

Run manual pipeline for your own protein sequence called MY_PROTEIN, which has no reference to UniProt:

picnic False 'notebooks/test_files/MY_PROTEIN/' 'MY_PROTEIN' False --path_fasta_file 'notebooks/test_files/MY_PROTEIN/my_protein.fasta'
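When several proteins need to be scored, the same commands can be assembled from Python. The following is a sketch that wraps the command-line interface shown above with the standard subprocess module; it does not use any PICNIC Python API. The helper names are made up for illustration, and the check that the manual pipeline receives a FASTA file is an addition of this sketch, based on the help text.

```python
import subprocess


def build_picnic_command(is_automated, path_af, protein_id, is_go, path_fasta_file=None):
    """Assemble the argument list for the picnic command-line tool."""
    if not is_automated and path_fasta_file is None:
        # Validation added for this sketch: the manual pipeline needs a FASTA file.
        raise ValueError("the manual pipeline requires path_fasta_file")
    # Booleans are passed as the strings 'True' / 'False', as in the examples above.
    command = ["picnic", str(bool(is_automated)), str(path_af), protein_id, str(bool(is_go))]
    if path_fasta_file is not None:
        command += ["--path_fasta_file", str(path_fasta_file)]
    return command


def run_picnic(*args, **kwargs):
    """Run picnic and return its standard output as text; raises if the command fails."""
    result = subprocess.run(
        build_picnic_command(*args, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `run_picnic(True, "notebooks/test_files/Q99720/", "Q99720", True)` corresponds to the automated example above, provided picnic is installed in the active environment.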

Further examples of using PICNIC are shown in a Jupyter notebook in the notebooks folder.

How to run the provided Jupyter notebook?

Examples of how to use and run PICNIC are shown in a provided Jupyter notebook. The notebook can be found under the notebooks folder.

What is Jupyter Notebook?

Please read the official Jupyter Notebook documentation.

How to create a virtual environment and install all required Python packages.

Create a virtual environment with the venv module:

python -m venv /path/to/new/virtual/environment
# e.g.
python -m venv my_jupyter_env

Then activate the environment and install the classic Jupyter Notebook:

source my_jupyter_env/bin/activate

pip install notebook

Also install picnic-bio from source in the same virtual environment (run this from the root of the cloned picnic repository):

pip install .

How to Launch Jupyter Notebook from Your Terminal?

In your terminal, activate the previously created virtual environment:

source my_jupyter_env/bin/activate

Launch Jupyter Notebook...

jupyter notebook

Open the example notebook called 'picnic_examples.ipynb' under the notebooks folder.

Publication

PICNIC accurately predicts condensate-forming proteins regardless of their structural disorder across organisms. Anna Hadarovich, Hari Raj Singh, Soumyadeep Ghosh, Maxim Scheremetjew, Nadia Rostam, Anthony A. Hyman & Agnes Toth-Petroczy. Nature Communications volume 15, Article number: 10668 (2024). doi: 10.1038/s41467-024-55089-x. PMID: 39663388.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Download files

Source Distribution

picnic_bio-1.0.1.tar.gz (2.3 MB)

Uploaded Source

Built Distribution

picnic_bio-1.0.1-py3-none-any.whl (2.3 MB)

Uploaded Python 3

File details

Details for the file picnic_bio-1.0.1.tar.gz.

File metadata

  • Download URL: picnic_bio-1.0.1.tar.gz
  • Upload date:
  • Size: 2.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for picnic_bio-1.0.1.tar.gz
Algorithm    Hash digest
SHA256       0268710bb90c5fec7975429d928f3c536bc326000802ea1562a4d2b0d40568f1
MD5          77293e7fa05e2c0ffbcaff6509a91c9b
BLAKE2b-256  520f2564076dbaab63a0a303d23c461cc51ef53fed1a88c77ff85b15efc3feb8

File details

Details for the file picnic_bio-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: picnic_bio-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 2.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for picnic_bio-1.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       2bbea2bbab7f0a525d939b5c1dcc7ef0de3855ef6386645a2ed27ffe4f85ac63
MD5          7a98120f02d84350d89e99ca697ed971
BLAKE2b-256  0734c27715c3f10448b09a889afb708f2da8b40e8276af5f977d6742f921cca3
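A manually downloaded file can be compared against the SHA256 digests listed above. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the path passed in is whatever location you saved the file to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, `verify_sha256("picnic_bio-1.0.1-py3-none-any.whl", "2bbea2bbab7f0a525d939b5c1dcc7ef0de3855ef6386645a2ed27ffe4f85ac63")` should return True for an intact download.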
