
pflex is a benchmarking toolkit for evaluating CRISPR screen results against biological functional standards. The toolkit computes gene-level and complex-level performance metrics, helping researchers systematically assess the biological relevance and resolution of their CRISPR screening data.


pFLEX


Abstract

Genetic networks derived from omics data are a powerful tool for systematic gene function prediction. Evaluating the performance of such predictions is crucial for judging both the data and the computational pipeline used to derive the networks, but functional diversity within protein complex or pathway standards often causes hidden evaluation biases. To visualize and mitigate such biases, we recently developed the R package FLEX. Here, we present the FLEX genetic network benchmarking tool as a Python library with new and improved functionality. pFLEX improves overall runtime by 4.1- to 15.8-fold. It offers additional evaluation metrics that allow easy comparison of precision-recall performance between genetic networks at the complex or pathway level. We demonstrate the utility of pFLEX by evaluating tissue-specific co-essentiality networks and data normalization strategies from the Cancer Dependency Map. This illustrates how the biological module-resolved precision-recall metrics in pFLEX enable sensitive and fast evaluation of genetic networks.


Features

  • Precision-recall curve generation for ranked gene lists
  • Evaluation using CORUM complexes, GO terms, and pathways
  • Complex-level resolution analysis and visualization
  • Easy integration into CRISPR screen workflows
  • Packaged DepMap example inputs filtered to CORUM genes
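Complex-level resolution asks which complexes the recovered true positives come from, which is one way the hidden biases caused by functional diversity within a standard become visible. The sketch below illustrates the idea by tallying the complexes that explain the top-ranked gene pairs. It is a conceptual illustration on toy data, not pFLEX's algorithm; the function name, gene names, and complexes are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def tally_complex_hits(
    ranked_pairs: list[tuple[str, str]],
    complexes: dict[str, set[str]],
    top_n: int,
) -> Counter:
    """Count how many of the top_n ranked gene pairs fall inside each complex.

    Hypothetical helper for illustration only; not part of pFLEX.
    """
    counts: Counter = Counter()
    for a, b in ranked_pairs[:top_n]:
        for name, members in complexes.items():
            if a in members and b in members:
                # A pair shared by several complexes is credited to each of them.
                counts[name] += 1
    return counts


# Toy standard: complex_3 overlaps both other complexes.
complexes = {
    "complex_1": {"A", "B"},
    "complex_2": {"C", "D", "E"},
    "complex_3": {"A", "B", "C"},
}
# Gene pairs ordered from strongest to weakest predicted relationship.
ranked = [("A", "B"), ("C", "D"), ("B", "C"), ("A", "D")]
counts = tally_complex_hits(ranked, complexes, top_n=3)
print(dict(counts))
```

If a single complex dominates such a tally, a good global precision-recall curve may reflect that complex rather than broad functional coverage.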

Installation

pFLEX is developed and tested with Python 3.10. We recommend installing it in a dedicated Python 3.10 environment to keep the package and its scientific Python dependencies separate from other projects.

Create a conda environment:

conda create -n p310 python=3.10
conda activate p310
pip install uv

Install pFLEX with uv:

uv pip install pflex

or:

pip install pflex

Or clone the repository and install pFLEX in editable mode for local development:

git clone https://github.com/tyasird/pFLEX.git
cd pFLEX
uv pip install -e .

Usage

Full documentation is available at https://tyasird.github.io/pFLEX/.

Input Data

pFLEX expects each input dataset as a matrix with genes in rows and screens, samples, or cell lines in columns.

Gene ACH-000014 ACH-000219 ACH-000274
A2M -0.125 -0.215 0.065
AATF 0.042 -0.088 -0.016
BCL6 -0.019 0.112 -0.074

CSV, Excel, Parquet, and .p files are supported; .p files are read as Parquet. Parquet is recommended for larger matrices.
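If your own screen data start out elsewhere, pandas can produce the genes-in-rows matrix described above. The reader below is a hypothetical helper, not pFLEX's loader: it only mirrors the documented extension handling (CSV, Excel, Parquet, and .p read as Parquet) and assumes gene names sit in the first column. Reading Excel or Parquet additionally needs optional pandas dependencies such as openpyxl and pyarrow.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import pandas as pd


def read_matrix(path: str | Path) -> pd.DataFrame:
    """Hypothetical reader (not part of pFLEX) for a genes x screens matrix.

    Dispatches on file extension: CSV, Excel, Parquet, and .p (read as Parquet).
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumption: gene names are in the first column.
        return pd.read_csv(path, index_col=0)
    if suffix in {".xlsx", ".xls"}:
        return pd.read_excel(path, index_col=0)  # requires openpyxl (xlrd for .xls)
    if suffix in {".parquet", ".p"}:
        return pd.read_parquet(path)  # requires pyarrow or fastparquet
    raise ValueError(f"Unsupported file type: {suffix!r}")


# The small example matrix from this section: genes in rows, cell lines in columns.
example = pd.DataFrame(
    {
        "ACH-000014": [-0.125, 0.042, -0.019],
        "ACH-000219": [-0.215, -0.088, 0.112],
        "ACH-000274": [0.065, -0.016, -0.074],
    },
    index=pd.Index(["A2M", "AATF", "BCL6"], name="Gene"),
)

# Round-trip through CSV in a temporary directory.
csv_path = Path(tempfile.mkdtemp()) / "my_screen.csv"
example.to_csv(csv_path)
loaded = read_matrix(csv_path)
print(loaded.shape)
```

Writing with example.to_parquet(...) instead produces the Parquet format recommended above for larger matrices.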

The packaged example inputs are real DepMap 25Q2 tissue subsets filtered to genes present in CORUM:

  • skin_cell_lines_corum_genes.parquet: 3,465 genes x 75 cell lines
  • soft_tissue_cell_lines_corum_genes.parquet: 3,465 genes x 46 cell lines

Use flex.example_input_path() to resolve packaged example inputs:

import pflex as flex

inputs = {
    "Skin": {
        "path": flex.example_input_path("skin_cell_lines_corum_genes.parquet"),
        "sort": "high",
        "color": "#4E79A7",
    },
    "Soft Tissue": {
        "path": flex.example_input_path("soft_tissue_cell_lines_corum_genes.parquet"),
        "sort": "high",
        "color": "#F28E2B",
    },
}
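Before passing an inputs dictionary to flex.load_datasets, a quick pre-flight check can catch typos in paths or sort values. The helper below is hypothetical and not part of pFLEX; it only checks the path and sort fields used in this README, accepts either strings or Path objects for path, and does not assume whether pFLEX applies a default when sort is omitted.

```python
from __future__ import annotations

from pathlib import Path

# Values allowed for "sort", as listed under the configuration choices.
VALID_SORTS = {"high", "low"}


def check_inputs(inputs: dict[str, dict]) -> list[str]:
    """Hypothetical pre-flight check for a pFLEX-style inputs dict."""
    problems: list[str] = []
    for name, spec in inputs.items():
        path = spec.get("path")
        if path is None:
            problems.append(f"{name}: missing 'path'")
        elif not Path(path).exists():
            problems.append(f"{name}: file not found: {path}")
        # Only validate "sort" when it is present.
        if "sort" in spec and spec["sort"] not in VALID_SORTS:
            problems.append(f"{name}: 'sort' must be 'high' or 'low', got {spec['sort']!r}")
    return problems


# A deliberately broken entry: nonexistent file and a misspelled sort value.
bad_inputs = {"Mine": {"path": "does_not_exist.parquet", "sort": "hihg", "color": "#59A14F"}}
for problem in check_inputs(bad_inputs):
    print(problem)
```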

Configuration

config = {
    "functional_standard": "CORUM",
    "min_genes_in_complex": 2,
    "min_genes_per_complex_analysis": 2,
    "output_folder": "output",
    "analysis_genes": "shared",
    "jaccard": True,
    "preprocessing": {
        "fill_na": True,
    },
    "corr_function": "numpy_without_mask",
    "per_complex": {
        "n_jobs": 8,
    },
    "plotting": {
        "save_plot": True,
        "output_type": "png",
    },
}

Common choices:

  • functional_standard: "CORUM", "GOBP", "PATHWAY", or a custom .csv path
  • analysis_genes: "shared" or "dataset_specific"
  • sort: "high" or "low" per input dataset
  • preprocessing.fill_na: fill missing values with gene means
  • corr_function: "numpy", "numpy_without_mask", "numba", or "pandas"
  • per_complex.n_jobs: worker count for per-complex analysis
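preprocessing.fill_na is described as filling missing values with gene means, and corr_function selects the correlation backend. The sketch below shows what these two steps amount to with pandas and NumPy, assuming row-wise gene means and Pearson correlation between genes across cell lines. The helper names and toy data are hypothetical, and this is not pFLEX's code.

```python
import numpy as np
import pandas as pd


def fill_na_with_gene_means(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each gene's missing values with that gene's mean (row-wise)."""
    # After transposing, genes are columns, so fillna with a Series indexed
    # by gene fills every gene column with its own mean.
    return df.T.fillna(df.mean(axis=1)).T


def gene_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between genes (rows) across screens (columns)."""
    corr = np.corrcoef(df.to_numpy(dtype=float))  # rows are treated as variables
    return pd.DataFrame(corr, index=df.index, columns=df.index)


scores = pd.DataFrame(
    {"s1": [1.0, 2.0, np.nan], "s2": [2.0, 4.0, 1.0], "s3": [3.0, 6.0, 3.0]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
filled = fill_na_with_gene_means(scores)  # GENE_C's missing value becomes 2.0
corr = gene_correlation(filled)
print(corr.round(3))
```

np.corrcoef treats each row as a variable by default, which is why the genes-in-rows layout maps directly onto a gene x gene matrix.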

Analysis Flow

flex.initialize(config)
data, common_genes = flex.load_datasets(inputs)
terms, _ = flex.load_functional_standard()

for name, dataset in data.items():
    corr = flex.perform_corr(dataset, config["corr_function"])
    flex.pra(name, corr, is_corr=True)
    flex.pra_percomplex(name, corr, is_corr=True)
    flex.complex_contributions(name)
    flex.mpr_prepare(name)

flex.plot_precision_recall_curve()
flex.plot_auc_scores()
flex.plot_significant_complexes()
flex.plot_percomplex_scatter(n_top=10)
flex.plot_percomplex_scatter_bysize(n_top=10)
flex.plot_complex_contributions()
flex.plot_mpr_summary()
flex.save_results_to_csv()

See the User Guide for a detailed explanation of every input field, configuration key, function, return value, and output.
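The flow above computes a gene-gene correlation matrix and passes it to flex.pra. Conceptually, a precision-recall evaluation of this kind ranks gene pairs and checks them against the functional standard. The minimal sketch below assumes that a pair is a true positive when both genes share at least one complex, that pairs are ranked from highest to lowest correlation (the analogue of sort "high"), and that recall is measured against all co-complex pairs among the measured genes. It is not pFLEX's implementation: the helper names are hypothetical, and pFLEX's exact definitions (for example the jaccard option or the axes used in its plots) may differ.

```python
from __future__ import annotations

from itertools import combinations

import numpy as np
import pandas as pd


def co_complex_pairs(complexes: dict[str, set[str]], genes: set[str]) -> set[frozenset[str]]:
    """Unordered pairs of measured genes that share at least one complex."""
    pairs: set[frozenset[str]] = set()
    for members in complexes.values():
        present = sorted(members & genes)  # ignore complex members absent from the data
        for a, b in combinations(present, 2):
            pairs.add(frozenset((a, b)))
    return pairs


def precision_recall(corr: pd.DataFrame, complexes: dict[str, set[str]]) -> pd.DataFrame:
    """Rank gene pairs by correlation (highest first) and accumulate true positives."""
    genes = list(corr.index)
    positives = co_complex_pairs(complexes, set(genes))
    i, j = np.triu_indices(len(genes), k=1)  # each unordered pair once, no diagonal
    scores = corr.to_numpy()[i, j]
    order = np.argsort(-scores, kind="stable")  # descending: high scores ranked first
    is_tp = np.array([frozenset((genes[i[k]], genes[j[k]])) in positives for k in order])
    tp = np.cumsum(is_tp)
    precision = tp / np.arange(1, len(is_tp) + 1)
    recall = tp / max(len(positives), 1)
    return pd.DataFrame({"tp": tp, "precision": precision, "recall": recall})


genes = ["A", "B", "C", "D"]
corr = pd.DataFrame(
    [[1.0, 0.9, 0.1, 0.2],
     [0.9, 1.0, 0.3, 0.0],
     [0.1, 0.3, 1.0, 0.8],
     [0.2, 0.0, 0.8, 1.0]],
    index=genes,
    columns=genes,
)
complexes = {"complex_1": {"A", "B"}, "complex_2": {"C", "D", "E"}}  # E is not measured
curve = precision_recall(corr, complexes)
print(curve)
```

Restricting the standard to measured genes keeps recall from being penalized for pairs the dataset could never score.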


Quickstart

import pflex as flex

inputs = {
    "Skin": {
        "path": flex.example_input_path("skin_cell_lines_corum_genes.parquet"),
        "sort": "high",
        "color": "#4E79A7",
    },
    "Soft Tissue": {
        "path": flex.example_input_path("soft_tissue_cell_lines_corum_genes.parquet"),
        "sort": "high",
        "color": "#F28E2B",
    },
}

config = {
    "functional_standard": "CORUM",
    "output_folder": "output",
    "analysis_genes": "shared",
    "jaccard": True,
    "preprocessing": {
        "fill_na": True,
    },
    "corr_function": "numpy_without_mask",
}

flex.initialize(config)
data, _ = flex.load_datasets(inputs)

for name, dataset in data.items():
    corr = flex.perform_corr(dataset, config["corr_function"])
    flex.pra(name, corr, is_corr=True)

flex.plot_precision_recall_curve()
flex.plot_auc_scores()

For a runnable full workflow, see src/pflex/examples/basic_usage.py.


License

MIT



Download files


Source Distribution

pflex-1.1.tar.gz (6.1 MB)

Built Distribution


pflex-1.1-py3-none-any.whl (5.9 MB)

File details

Details for the file pflex-1.1.tar.gz.

File metadata

  • Download URL: pflex-1.1.tar.gz
  • Upload date:
  • Size: 6.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.5

File hashes

Hashes for pflex-1.1.tar.gz
Algorithm Hash digest
SHA256 9b14699fd13b12de8475d432bfe443c693292e4e03ad2e2cf462e5fec214df80
MD5 6736763232754a742d75831753076468
BLAKE2b-256 11f98c6172b7d1f868ecb57be24c0a95a5bec257363102e618b374bc90742cb6


File details

Details for the file pflex-1.1-py3-none-any.whl.

File metadata

  • Download URL: pflex-1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.5

File hashes

Hashes for pflex-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3f8e14feba53201b27785b3fa2539b7a5abd9136baf27105aa532908e3d76647
MD5 32be120bf18273e1d32ae8a1df9e1de4
BLAKE2b-256 02d49da49a863b7620e890e3fd28242bb6dae125e6d81d05b00a18d2f2d90cec

