PICNIC

PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based model that predicts proteins involved in biomolecular condensates. The first model (PICNIC) is based on sequence-based features and structure-based features derived from AlphaFold2 models. A second model (PICNIC-GO) includes an extended set of features based on Gene Ontology terms. Although PICNIC-GO is biased by the annotations already available for proteins, it provides useful insights into specific protein properties that are enriched in proteins of biomolecular condensates. Overall, we recommend using PICNIC, which is an unbiased predictor, and using PICNIC-GO for specific cases, for example for experimental hypothesis generation.

External software

IUPred2A

IUPred2A is a tool that predicts disordered protein regions. It is available for download at https://iupred2a.elte.hu/download_new. The downloaded archive should be unpacked into the "src/files/" directory.

STRIDE

STRIDE is software for protein secondary structure assignment. An installation guide can be found at https://webclu.bio.wzw.tum.de/stride/

Installation instructions

Requirements

  • Python version 3.10+
  • Download and unpack IUPred2A
    • Add IUPred2A to PYTHONPATH
  • Download and unpack STRIDE
    • Add STRIDE binary to your system PATH
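The requirements above can be checked from Python before running PICNIC. The helper below is a hypothetical sketch, not part of PICNIC: it assumes the STRIDE executable is named `stride` and that the unpacked IUPred2A archive provides a module named `iupred2a_lib`; adjust both names if your installation differs.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def check_prerequisites(stride_binary: str = "stride",
                        iupred_module: str = "iupred2a_lib") -> dict[str, bool]:
    """Report which PICNIC prerequisites appear to be satisfied.

    Hypothetical helper: the binary and module names are assumptions.
    """
    return {
        # PICNIC requires Python 3.10 or newer.
        "python>=3.10": sys.version_info >= (3, 10),
        # shutil.which searches the directories listed in PATH.
        "stride_on_path": shutil.which(stride_binary) is not None,
        # find_spec searches sys.path, which includes PYTHONPATH entries.
        "iupred2a_importable": importlib.util.find_spec(iupred_module) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

A `MISSING` entry points at the requirement to revisit: the PATH export for STRIDE or the PYTHONPATH export for IUPred2A.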

Install external requirements

How to install STRIDE?

A complete installation guide is available at https://webclu.bio.wzw.tum.de/stride/, or you can simply run the following commands:

$ mkdir stride
$ cd stride
$ curl -OL https://webclu.bio.wzw.tum.de/stride/stride.tar.gz
$ tar -zxf stride.tar.gz
$ make
$ export PATH="$PATH:$PWD"

How to install IUPred2A?

IUPred2A is free only for academic users and cannot be used for commercial purposes. If you are an academic user, you can download IUPred2A by filling out the form at https://iupred2a.elte.hu/download_new.

# Step 1: Fill out the form above and download the IUPred2A tarball
$ tar -zxf iupred2a.tar.gz
$ cd iupred2a
$ export PYTHONPATH="$PWD${PYTHONPATH:+:$PYTHONPATH}"

PICNIC is available on PyPI

$ python -m pip install picnic_bio

PICNIC officially supports Python 3.10+.

PICNIC is also installable from source

$ git clone git@git.mpi-cbg.de:atplab/picnic.git

Once you have a copy of the source, you can embed it in your own Python package or install it into your site-packages:

$ cd picnic
$ python3 -m venv picnic-env
$ source picnic-env/bin/activate
(picnic-env) $ python -m pip install .
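After either installation route, a quick way to confirm that the package landed in the active environment is to query its installed metadata. This is a minimal sketch using the standard-library `importlib.metadata` module; the distribution name `picnic_bio` is taken from the pip command above, and the helper itself is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_picnic_version(dist_name: str = "picnic_bio") -> str | None:
    """Return the installed version of the PICNIC distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed in the interpreter/virtual environment running this code.
        return None


if __name__ == "__main__":
    v = installed_picnic_version()
    print(v if v is not None else "picnic_bio is not installed in this environment")
```

Run it with the same interpreter you installed into (for example inside the activated `picnic-env`); a `None` result usually means a different environment is active.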

How to use?

Usage - Using PICNIC from command line

$ picnic <is_automated> <directory_af_models> <uniprot_id> <is_go> [--path_fasta_file <file>]

usage: PICNIC [-h] [--path_fasta_file PATH_FASTA_FILE]
              is_automated path_af uniprot_id is_go

PICNIC (Proteins Involved in CoNdensates In Cells) is a machine learning-based
model that predicts proteins involved in biomolecular condensates.

positional arguments:
  is_automated          True if automated pipeline (works for proteins with
                        length < 1400 aa, with precalculated Alphafold2 model,
                        deposited to UniprotKB), else manual pipeline
                        (uniprot_id, Alphafold2 model(s) and fasta file are
                        needed to be provided as input)
  path_af               directory with pdb files, created by Alphafold2 for
                        the protein in the format. For smaller proteins ( <
                        1400 aa length) AlphaFold2 provides one model, that
                        should be named: AF-uniprot_id-F1-v{j}.pdb, where j is
                        a version number. In case of large proteins Alphafold2
                        provides more than one file, and all of them should be
                        stored in one directory and named: 'AF-
                        uniprot_id-F{i}-v{j}.pdb', where i is a number of
                        model, j is a version number.
  uniprot_id            protein identifier in UniprotKB (should correspond to
                        the name 'uniprot_id' for Alphafold2 models, stored in
                        directory_af_models)
  is_go                 boolean flag; if 'True', picnic_go score (picnic
                        version with Gene Ontology features) will be
                        calculated, Gene Ontology terms are retrieved in this
                        case from UniprotKB by uniprot_id identifier;
                        otherwise default picnic score will be printed
                        (without Gene Ontology annotation)

options:
  -h, --help            show this help message and exit
  --path_fasta_file PATH_FASTA_FILE
                        directory with sequence file in fasta format
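The `path_af` naming convention above is easy to get wrong, especially for large proteins split into several models. As a hypothetical pre-flight check (not part of PICNIC), the sketch below lists the files in a directory that match `AF-<uniprot_id>-F<i>-v<j>.pdb` and orders them by model number; the regular expression is our reading of the help text, not PICNIC's own parser.

```python
from __future__ import annotations

import os
import re
from collections.abc import Iterable

# Naming convention from the path_af help text: AF-<uniprot_id>-F<i>-v<j>.pdb,
# where i is the model number and j is the version number.
AF_NAME = re.compile(r"^AF-(?P<uid>[^-]+)-F(?P<model>\d+)-v(?P<ver>\d+)\.pdb$")


def order_af_models(names: Iterable[str], uniprot_id: str) -> list[str]:
    """Keep the file names that belong to uniprot_id, sorted by model number i."""
    matches = []
    for name in names:
        m = AF_NAME.match(name)
        if m and m.group("uid") == uniprot_id:
            # Sort numerically so that F10 comes after F2, not before it.
            matches.append((int(m.group("model")), int(m.group("ver")), name))
    return [name for _, _, name in sorted(matches)]


def collect_af_models(directory: str, uniprot_id: str) -> list[str]:
    """Return matching model paths in directory; fail early if there are none."""
    names = order_af_models(os.listdir(directory), uniprot_id)
    if not names:
        raise FileNotFoundError(
            f"no AF-{uniprot_id}-F<i>-v<j>.pdb files found in {directory}")
    return [os.path.join(directory, name) for name in names]
```

Raising early gives a clearer message than a failure deep inside the pipeline when a directory contains no correctly named model.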

Examples

Run automated pipeline:

$ picnic True notebooks/test_files/Q99720/ Q99720 True

Run manual pipeline:

$ picnic False 'notebooks/test_files/O95613/' 'O95613' False --path_fasta_file 'notebooks/test_files/O95613/O95613.fasta.txt'
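To drive PICNIC from a Python script instead of the shell, the positional arguments can be assembled into a command list and passed to `subprocess.run`. This is a hypothetical wrapper around the `picnic` command shown above, not a PICNIC API; it assumes `picnic` is on PATH and that the boolean flags are passed as the strings `True`/`False`, as in the examples.

```python
from __future__ import annotations

import subprocess


def build_picnic_command(is_automated: bool, path_af: str, uniprot_id: str,
                         is_go: bool, path_fasta_file: str | None = None) -> list[str]:
    """Assemble a picnic command line following the documented usage.

    The manual pipeline (is_automated=False) needs a FASTA file, so a missing
    path_fasta_file is rejected before anything is run.
    """
    if not is_automated and path_fasta_file is None:
        raise ValueError("the manual pipeline requires path_fasta_file")
    # str(True) == "True", matching the CLI examples.
    cmd = ["picnic", str(is_automated), path_af, uniprot_id, str(is_go)]
    if path_fasta_file is not None:
        cmd += ["--path_fasta_file", path_fasta_file]
    return cmd


def run_picnic(**kwargs) -> str:
    """Run the picnic CLI and return the text it prints (the score)."""
    result = subprocess.run(build_picnic_command(**kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `build_picnic_command(False, "notebooks/test_files/O95613/", "O95613", False, "notebooks/test_files/O95613/O95613.fasta.txt")` reproduces the manual-pipeline command above. `check=True` makes a failed run raise `subprocess.CalledProcessError` instead of returning silently.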

More examples of using PICNIC are shown in a Jupyter notebook in the notebooks folder. Run the examples from the project root folder, because the paths to the test files are relative to it.

Link to paper

DOI: 10.1101/2023.06.01.543229


Download files

Source distribution: picnic-bio-1.0.0b1.tar.gz (2.3 MB)

Built distribution (Python 3): picnic_bio-1.0.0b1-py3-none-any.whl (2.3 MB)
