
Generate a missingness-free protein quantification matrix from multiple diaPASEF runs through deep learning-based scoring.

Project description

Full-DIA

Full-DIA is freely available software for single-cell diaPASEF data analysis that leverages deep learning to improve proteome coverage, quantitative accuracy, and analysis speed. Most notably, Full-DIA is the first tool to automatically generate a missing-value-free protein matrix under global FDR control, which may offer better biological interpretability and insight into single-cell proteomics data than conventional matrices with missing values.


Contents

Installation
Usage
Output


Installation

We recommend using Conda to create a Python environment for Full-DIA on both Windows and Linux.

  1. Create a Python environment with Python 3.9.18.

    conda create -n full_env python=3.9.18
    conda activate full_env
    
  2. Install the PyTorch and CuPy packages that match your CUDA version (check it with the nvidia-smi command). Full-DIA requires an NVIDIA GPU with more than 10 GB of VRAM, at least 64 GB of RAM, and a high-performance Intel CPU.

  • CUDA-12
    pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121
    conda install cudatoolkit
    
  • CUDA-11
    pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu118
    conda install cudatoolkit
    
  3. Install Full-DIA with the extra that matches your CUDA version:
    pip install "full_dia[cuda12]"   # CUDA 12
    pip install "full_dia[cuda11]"   # CUDA 11
    
  • Alternatively, on a CUDA 12 system you can create a Conda environment with Full-DIA in a single command:
    conda env create -f https://raw.githubusercontent.com/JianSong2018/full_dia/main/requirements/fulldia_cuda12.yml
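The choice between the CUDA 12 and CUDA 11 commands above depends only on the CUDA major version. As a minimal sketch, assuming the usual nvidia-smi banner that contains a field such as `CUDA Version: 12.2`, the hypothetical helper `install_commands` below maps that field to the matching commands from steps 2 and 3. It is not part of Full-DIA.

```python
import re

# Assumption: nvidia-smi prints a banner field like "CUDA Version: 12.2".
CUDA_PATTERN = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")

# PyTorch wheel index and Full-DIA extra per CUDA major version,
# copied from the installation steps (torch 2.3.1 with cu121 / cu118 wheels).
INSTALL_PLAN = {
    12: ("https://download.pytorch.org/whl/cu121", "full_dia[cuda12]"),
    11: ("https://download.pytorch.org/whl/cu118", "full_dia[cuda11]"),
}


def install_commands(nvidia_smi_output: str) -> list[str]:
    """Return the install commands for the CUDA version reported by nvidia-smi."""
    match = CUDA_PATTERN.search(nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    major = int(match.group(1))
    if major not in INSTALL_PLAN:
        raise ValueError(f"CUDA {major}.x is not covered by the instructions")
    index_url, extra = INSTALL_PLAN[major]
    return [
        f"pip install torch==2.3.1 --index-url {index_url}",
        "conda install cudatoolkit",
        f'pip install "{extra}"',  # quotes stop shells such as zsh from globbing [ ]
    ]
```

On a GPU machine you could pass it `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`. nvidia-smi reports the highest CUDA version supported by the installed driver; the helper looks only at the major version because the instructions distinguish only CUDA 11 and CUDA 12.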
    

Usage

full_dia -lib "Absolute path of the spectral library" -ws "Absolute path of the .d folder or a folder containing multiple .d folders"

(Note: on Windows, the paths must be enclosed in quotes.)
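Before launching a long analysis, it can help to confirm which runs the -ws path will cover. The sketch below assumes the behaviour described above (either a single .d folder, or a folder whose immediate subfolders end in .d); `resolve_runs` is a hypothetical helper, not part of Full-DIA, and it does not search nested subfolders.

```python
from pathlib import Path


def resolve_runs(ws: str) -> list[Path]:
    """List the .d run folders implied by a -ws argument (illustrative only)."""
    root = Path(ws).resolve()  # Full-DIA expects absolute paths
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # Case 1: -ws points directly at a single .d acquisition folder.
    if root.suffix.lower() == ".d":
        return [root]
    # Case 2: -ws is a parent folder; collect its immediate .d subfolders.
    runs = sorted(
        p for p in root.iterdir() if p.is_dir() and p.suffix.lower() == ".d"
    )
    if not runs:
        raise FileNotFoundError(f"no .d folders found in {root}")
    return runs
```

Files whose names merely contain ".d" (for example `notes.d.txt`) are ignored, because only directories with a final `.d` suffix count as runs.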

  • -lib
    This parameter specifies the absolute path of the spectral library. Full-DIA currently supports spectral libraries with the .parquet or .tsv suffix, provided that their column names match those of spectral libraries predicted by DIA-NN (> v1.9). We recommend generating a predicted spectral library with DIA-NN and converting it to .parquet format; refer to this for instructions on generating predicted spectral libraries and converting them to .parquet with DIA-NN. Full-DIA supports methionine (M) oxidation but not modifications such as phosphorylation or acetylation. A future version of Full-DIA will include its own predictor for peptide retention time, ion mobility, and fragmentation patterns, and support for other spectral library formats may be added on request.

  • -ws
    This parameter specifies either a single .d folder or a folder containing multiple .d folders to be analyzed.
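To screen a library for modifications Full-DIA does not handle, the hypothetical helper below parses DIA-NN-style modified sequences, in which a modification follows its residue as `(UniMod:N)` (UniMod:35 is oxidation) and an N-terminal modification appears at the start. It also assumes that cysteine carbamidomethylation (UniMod:4), a fixed modification in typical predicted libraries, is acceptable; the text above does not state this explicitly.

```python
import re

# A modification is written after its residue, e.g. "PEPM(UniMod:35)K";
# an N-terminal modification has no preceding residue, e.g. "(UniMod:1)PEPK".
MOD_PATTERN = re.compile(r"([A-Z]?)\((UniMod:\d+)\)")

# Methionine oxidation is documented as supported; UniMod:4 on C is an assumption.
SUPPORTED = {("M", "UniMod:35"), ("C", "UniMod:4")}


def unsupported_mods(modified_sequence: str) -> list[str]:
    """Return the modifications in a sequence that fall outside SUPPORTED."""
    return [
        f"{residue or 'N-term'}({mod})"
        for residue, mod in MOD_PATTERN.findall(modified_sequence)
        if (residue, mod) not in SUPPORTED
    ]
```

You could apply it to the library's modified-sequence column after loading the library with pandas; the exact column name depends on the library format.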

Other optional parameters are listed by running full_dia -h:

       ******************
       * Full-DIA x.y.z *
       ******************
Usage: full_dia -ws WS -lib LIB

optional arguments for users:
  -h, --help           Show this help message and exit.
  -ws WS               Specify the folder that is .d or contains .d files.
  -lib LIB             Specify the absolute path of a .speclib or .parquet spectra library.
  -out_name OUT_NAME   Specify the folder name of outputs. Default: full_dia.
  -gpu_id GPU_ID       Specify the GPU-ID (e.g. 0, 1, 2) which will be used. Default: 0.
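When scripting Full-DIA from Python, passing the arguments as a list to `subprocess.run` sidesteps the Windows path quoting issue noted earlier, because each path reaches the program as a single argument without going through a shell. This sketch uses only the flags shown in the help output; `full_dia_command` and `run_full_dia` are hypothetical names, and their defaults mirror the documented ones (output folder full_dia, GPU 0).

```python
import subprocess
from pathlib import Path


def full_dia_command(lib: str, ws: str, out_name: str = "full_dia",
                     gpu_id: int = 0) -> list[str]:
    """Build a full_dia argument list from the documented flags."""
    return [
        "full_dia",
        "-lib", str(Path(lib).resolve()),  # absolute paths, as the usage line asks
        "-ws", str(Path(ws).resolve()),
        "-out_name", out_name,
        "-gpu_id", str(gpu_id),
    ]


def run_full_dia(lib: str, ws: str, **options) -> int:
    """Run Full-DIA and return its exit code (full_dia must be on PATH)."""
    # A list argument with no shell: spaces in paths need no manual quoting.
    completed = subprocess.run(full_dia_command(lib, ws, **options))
    return completed.returncode
```

For example, `run_full_dia("lib.parquet", "runs", gpu_id=1)` would analyze the runs on GPU 1, provided it is called from the activated full_env environment.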

Output

Full-DIA writes report.log.txt and report.parquet to the output folder. report.parquet contains precursor and protein identifications along with a wide range of associated information. Most column names are consistent with DIA-NN and are self-explanatory; the main ones are:

  • Protein.Group - inferred proteins. Full-DIA uses the IDPicker algorithm for protein inference.
  • Protein.Ids - all proteins matched to the precursor in the library.
  • Protein.Names - names (UniProt names) of the proteins in the Protein.Group.
  • PG.Quantity.Raw - raw quantity of the Protein.Group.
  • PG.Quantity.Deep - corrected quantity of the Protein.Group.
  • Precursor.Id - peptide sequence plus precursor charge.
  • Precursor.Charge - the charge of the precursor.
  • Q.Value - run-specific precursor q-value.
  • Global.Q.Value - global precursor q-value.
  • PG.Q.Value - run-specific q-value for the protein group.
  • Global.PG.Q.Value - global q-value for the protein group.
  • Proteotypic - indicates whether the peptide is specific to a single protein.
  • Precursor.Quantity.Raw - raw quantity of the precursor.
  • Precursor.Quantity.Deep - corrected quantity of the precursor.
  • RT - the retention time of the precursor.
  • IM - the ion mobility of the precursor.
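A common next step is to pivot the long report into a protein-by-run matrix and check it for missing values. The sketch below works on rows given as dictionaries and makes several assumptions: the run identifier column is called `Run` (as in DIA-NN reports; it is not listed above), every precursor row repeats its protein group's quantity within a run, and protein groups are kept at 1% global FDR via Global.PG.Q.Value. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def protein_matrix(records: Iterable[Mapping[str, Any]], q_cutoff: float = 0.01,
                   run_key: str = "Run", quant_key: str = "PG.Quantity.Deep"):
    """Pivot report rows into {protein_group: {run: quantity}}."""
    matrix = defaultdict(dict)
    for row in records:
        if row["Global.PG.Q.Value"] > q_cutoff:
            continue  # protein group fails the global FDR threshold
        # Assumed: precursor rows of one group share the same per-run quantity,
        # so repeated assignments simply write the same value again.
        matrix[row["Protein.Group"]][row[run_key]] = row[quant_key]
    return dict(matrix)


def missing_cells(matrix) -> int:
    """Count absent, None, or NaN cells across all runs seen in the matrix."""
    runs = {run for quants in matrix.values() for run in quants}
    missing = 0
    for quants in matrix.values():
        for run in runs:
            value = quants.get(run)
            if value is None or value != value:  # NaN is not equal to itself
                missing += 1
    return missing
```

With pandas and pyarrow installed, the rows could come from `pd.read_parquet("report.parquet").to_dict("records")`, with the path pointing at your output folder. Setting `quant_key="PG.Quantity.Raw"` lets you compare the raw and corrected quantities.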

Troubleshooting



Download files


Source Distribution

full_dia-1.0.0a1.tar.gz (8.3 MB)

Uploaded Source

Built Distribution


full_dia-1.0.0a1-py3-none-any.whl (8.4 MB)

Uploaded Python 3

File details

Details for the file full_dia-1.0.0a1.tar.gz.

File metadata

  • Download URL: full_dia-1.0.0a1.tar.gz
  • Upload date:
  • Size: 8.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for full_dia-1.0.0a1.tar.gz
Algorithm Hash digest
SHA256 04c7b43c9cf504dbcd70a7472c1784f91e2d153ab0fb8ba40253a10481dbb163
MD5 7a2f4da90c823bea2c7f8c52acd1c650
BLAKE2b-256 55146af1629a2a7e2560402492e1a114815c445c2a13ee4ea3eaab5f17e7cc66
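To check a downloaded file against the SHA256 digest listed above, the following sketch streams the file through Python's `hashlib` in chunks so that large archives need not fit in memory. `sha256_of` and `verify` are illustrative names; the default digest is the one listed for full_dia-1.0.0a1.tar.gz.

```python
import hashlib

# SHA256 digest listed above for full_dia-1.0.0a1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "04c7b43c9cf504dbcd70a7472c1784f91e2d153ab0fb8ba40253a10481dbb163"
)


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()
```

Alternatively, pip can enforce the digest itself in hash-checking mode: add a line such as `full_dia==1.0.0a1 --hash=sha256:<digest>` to a requirements file and install with `pip install --require-hashes -r requirements.txt`. In that mode every dependency must also be pinned with a hash.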


File details

Details for the file full_dia-1.0.0a1-py3-none-any.whl.

File metadata

  • Download URL: full_dia-1.0.0a1-py3-none-any.whl
  • Upload date:
  • Size: 8.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for full_dia-1.0.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 4e82a32ad162c6ed3b0e2345fed9aa7887db196ff4e2cc8e95d6b6fe3498e2f4
MD5 4a5779e276d1ff77622e43ceaea74f4e
BLAKE2b-256 ce7bc617b64028642adcaa00fa65de878e4ef00e83e088da9be33361a4dfaf0e

