Generate a missingness-free protein quantification matrix from multiple diaPASEF runs through deep learning-based scoring.

Full-DIA

Full-DIA is freely available software for single-cell diaPASEF data analysis that leverages deep learning to improve proteome coverage, quantitative accuracy, and analysis speed. Most notably, Full-DIA is the first tool to automatically generate a missing-value-free protein matrix under global FDR control, which may offer better biological interpretability of single-cell proteomics data than conventional matrices with missing values.


Contents

Installation
Usage
Output


Installation

We recommend using Conda to create a Python environment for using Full-DIA, whether on Windows or Linux.

  1. Create a Python environment with version 3.9.18.

    conda create -n full_env python=3.9.18
    conda activate full_env
    
  2. Install the corresponding PyTorch and CuPy packages based on your CUDA version (which can be checked using the nvidia-smi command). Full-DIA requires an NVIDIA GPU with more than 10 GB of VRAM, a minimum of 64 GB RAM, and a high-performance Intel CPU.

  • CUDA-12
    pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121
    conda install cudatoolkit
    
  • CUDA-11
    pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu118
    conda install cudatoolkit
    
  3. Install Full-DIA with the extra that matches your CUDA version. Run one of the following:
    pip install "full_dia[cuda11]"
    pip install "full_dia[cuda12]"
    
  • Alternatively, you can create a Conda environment with Full-DIA in one command:
    conda env create -f https://raw.githubusercontent.com/JianSong2018/full_dia/main/requirements/fulldia_cuda12.yml
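
Choosing between the cuda11 and cuda12 extras depends on the CUDA version that nvidia-smi reports. The sketch below shows one way to automate that choice. The helper name pick_full_dia_extra is hypothetical and not part of Full-DIA; it only assumes that the nvidia-smi header contains a "CUDA Version: X.Y" field.

```python
import re


def pick_full_dia_extra(nvidia_smi_output: str) -> str:
    """Return 'cuda11' or 'cuda12' from the text printed by nvidia-smi.

    Hypothetical helper for illustration; not part of Full-DIA.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    major = int(match.group(1))
    if major not in (11, 12):
        # The installation steps only list extras for CUDA 11 and CUDA 12.
        raise ValueError(f"no documented Full-DIA extra for CUDA {major}")
    return f"cuda{major}"


# Typical use on a machine with an NVIDIA driver installed:
#   import subprocess
#   text = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
#   print(f'pip install "full_dia[{pick_full_dia_extra(text)}]"')
```

Note that nvidia-smi reports the highest CUDA version the installed driver supports, so treat the result as a starting point rather than a guarantee.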
    

Usage

full_dia -lib "Absolute path of the spectral library" -ws "Absolute path of the .d folder or a folder containing multiple .d folders"

(Please note that the path needs to be enclosed in quotes if running on a Windows platform.)

  • -lib
    This parameter specifies the absolute path of the spectral library. Full-DIA currently supports spectral libraries with the .speclib or .parquet suffix, provided that their column names are consistent with those of the DIA-NN (> v1.9) predicted spectral library. We recommend generating the predicted spectral library with DIA-NN and then converting it to the .parquet format. Refer to this for instructions on how to generate predicted spectral libraries and convert them to .parquet format using DIA-NN. Full-DIA supports methionine (M) oxidation but not other modifications such as phosphorylation or acetylation. A built-in predictor for peptide retention time, ion mobility, and fragmentation pattern is planned, and support for other spectral library formats may be added on request.

  • -ws
    This parameter specifies the absolute path of either a single .d folder or a folder that contains multiple .d folders to be analyzed.
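
Before starting a long analysis, it can help to confirm which .d folders a given -ws path would cover. The following is a minimal sketch using only the Python standard library. The function name find_d_folders is hypothetical, and the sketch assumes that only .d folders directly inside the workspace are relevant; how Full-DIA itself scans the folder is not documented here.

```python
from pathlib import Path


def find_d_folders(ws: str) -> list[Path]:
    """List the .d folders covered by a -ws path (hypothetical helper).

    If ws is itself a .d folder, it is returned alone; otherwise the
    .d folders directly inside ws are returned in sorted order.
    """
    root = Path(ws)
    if root.is_dir() and root.suffix == ".d":
        return [root]
    return sorted(p for p in root.iterdir() if p.is_dir() and p.suffix == ".d")
```

An empty result means the path contains no .d folders at its top level, which is worth checking before launching full_dia.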

Other optional parameters can be listed by running full_dia -h:

       ******************
       * Full-DIA x.y.z *
       ******************
Usage: full_dia -ws WS -lib LIB

optional arguments for users:
  -h, --help           Show this help message and exit.
  -ws WS               Specify the folder that is .d or contains .d files.
  -lib LIB             Specify the absolute path of a .speclib or .parquet spectra library.
  -out_name OUT_NAME   Specify the folder name of outputs. Default: full_dia.
  -gpu_id GPU_ID       Specify the GPU-ID (e.g. 0, 1, 2) which will be used. Default: 0.
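
When Full-DIA is launched from a script, passing the arguments as a list avoids the quoting issues mentioned above. The sketch below assembles the command from the flags shown in the help text (-lib, -ws, -out_name, -gpu_id). The function name build_full_dia_command and the example paths are hypothetical.

```python
from typing import Optional


def build_full_dia_command(lib: str, ws: str,
                           out_name: Optional[str] = None,
                           gpu_id: Optional[int] = None) -> list[str]:
    """Assemble a full_dia command line as an argument list (hypothetical helper)."""
    cmd = ["full_dia", "-lib", str(lib), "-ws", str(ws)]
    if out_name is not None:
        cmd += ["-out_name", out_name]   # folder name of outputs
    if gpu_id is not None:
        cmd += ["-gpu_id", str(gpu_id)]  # which GPU to use
    return cmd


# Example with placeholder paths; run it with subprocess once the paths exist:
#   import subprocess
#   cmd = build_full_dia_command("/data/lib.parquet", "/data/runs", gpu_id=1)
#   subprocess.run(cmd, check=True)
```

Omitted options fall back to the defaults listed in the help text (full_dia for the output folder name and 0 for the GPU ID).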

Output

Full-DIA generates report.log.txt and report.parquet in the output folder. report.parquet contains precursor and protein IDs along with associated information. Most column names are consistent with DIA-NN and are self-explanatory.

  • Protein.Group - inferred proteins. Full-DIA uses the IDPicker algorithm to infer proteins.
  • Protein.Ids - all proteins matched to the precursor in the library.
  • Protein.Names - names (UniProt names) of the proteins in the Protein.Group.
  • PG.Quantity.Raw - raw quantity of the Protein.Group.
  • PG.Quantity.Deep - corrected quantity of the Protein.Group.
  • Precursor.Id - peptide sequence plus precursor charge.
  • Precursor.Charge - the charge of the precursor.
  • Q.Value - run-specific precursor q-value.
  • Global.Q.Value - global precursor q-value.
  • PG.Q.Value - run-specific q-value for the protein group.
  • Global.PG.Q.Value - global q-value for the protein group.
  • Proteotypic - indicates the peptide is specific to a protein.
  • Precursor.Quantity.Raw - raw quantity of the precursor.
  • Precursor.Quantity.Deep - corrected quantity of the precursor.
  • RT - the retention time of the precursor.
  • IM - the ion mobility of the precursor.
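
The columns above can be reshaped into a protein-by-run matrix with pandas. The sketch below is one possible approach, not the method Full-DIA uses internally. It assumes the report has a run identifier column named Run, as in DIA-NN reports; that column is not listed above, so check the actual column names in your report.parquet. The function name protein_matrix and the 0.01 cutoff are illustrative choices.

```python
import pandas as pd


def protein_matrix(report: pd.DataFrame, q_cutoff: float = 0.01,
                   run_col: str = "Run") -> pd.DataFrame:
    """Pivot a Full-DIA report into a Protein.Group x run matrix.

    run_col is an assumed column name (DIA-NN convention); adjust as needed.
    """
    # Keep protein groups that pass the global protein-group q-value cutoff.
    kept = report[report["Global.PG.Q.Value"] <= q_cutoff]
    # The report has one row per precursor, so keep a single row per
    # (run, protein group) before pivoting.
    per_protein = kept.drop_duplicates(subset=[run_col, "Protein.Group"])
    return per_protein.pivot(index="Protein.Group", columns=run_col,
                             values="PG.Quantity.Deep")


# Typical use (the path is a placeholder for your output folder):
#   report = pd.read_parquet("full_dia/report.parquet")
#   matrix = protein_matrix(report)
```

Swapping PG.Quantity.Deep for PG.Quantity.Raw gives the uncorrected quantities in the same layout.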

Troubleshooting

