
FlowSets is a Python package for analyzing and visualizing time-dependent data using fuzzy sets.


FlowSets

Analysis and Visualization of Expression Patterns with Fuzzy Sets as FlowSets



FlowSets won the best poster award at ISMB/ECCB 2023 in the BioVis-Track!


Contact


Overview

FlowSets is a Python package for visualizing and analyzing gene expression patterns using fuzzy set theory. It enables the identification and visualization of gene expression flows across experimental conditions or clusters, and supports pathway enrichment analysis based on the genes' membership in flows that follow specific expression patterns.


Install

You can install FlowSets using pip:

pip install flowsets

Quick Start Example

import polars as pl
from flowsets import *

# Read the DESeq2 results into a polars DataFrame (one row per gene and comparison)
data = pl.read_csv(
    'small_example/deseq2_results_25deg_all_comparisons_cleaned.csv',
    null_values=['NA'],
    schema={
        "baseMean": pl.Float32,
        "log2FoldChange": pl.Float32,
        "lfcSE": pl.Float32,
        "stat": pl.Float32,
        "pvalue": pl.Float32,
        "padj": pl.Float32,
        "comparison": pl.Utf8,
        "gene_id": pl.Utf8
    }
)

# Fuzzify the log2FoldChange values for each gene and comparison
# Here all states are fuzzified with the same membership functions
explDFWide, mfFuzzy = LegacyFuzzifier.fuzzify(
    data,  # input DataFrame
    stepsize=0.01,
    symbol_column="gene_id",  # column with the feature (gene) identifiers
    meancolName="log2FoldChange",  # column with the signal to fuzzify
    clusterColName="comparison",  # column with the state (comparison) names
    mfLevels=["strong_down", "down", "neutral", "up", "strong_up"],  # linguistic terms (fuzzy sets) to create
    centers=[-2, -1, 0, 1, 2],  # centers of the fuzzy sets, one per level
    sdcolName=None, exprcolName=None,  # not used here; intended for single-cell data
)
# Create a FlowAnalysis (FlowSets) object from the fuzzified data
# Each series entry is a tuple of (state name in the clusterColName column, label shown in the FlowSets plots)
def_series = (
    ("HSF1.KD vs Wildtype",'KO1 vs WT'), 
    ("Double.KDKO vs Wildtype",'KO1+2 vs WT'),
    ("MSN24.KO vs Wildtype",'KO2 vs WT')
)
fa = FlowAnalysis(explDFWide, "gene_id", def_series, mfFuzzy)

# Plot the flow memberships for all genes
fa.plot_flows(figsize=(15, 10), outfile="./small_example/plots/complete_flow.png")
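The fuzzification step in the Quick Start turns each log2FoldChange into graded memberships in the five linguistic levels. The sketch below illustrates the idea for a single value using only the standard library. It assumes triangular membership functions peaking at the given centers; LegacyFuzzifier may use other shapes and also discretizes with stepsize, which is not modeled here. The name fuzzify_value is hypothetical and not part of the FlowSets API.

```python
from bisect import bisect_right


def fuzzify_value(value, centers, levels):
    """Return {level: membership} for one crisp value (e.g. a log2FoldChange).

    Assumption: triangular membership functions. Each level peaks (membership 1)
    at its center and falls linearly to 0 at the neighboring centers; the
    outermost levels act as shoulders, so values beyond them keep membership 1.
    """
    memberships = {level: 0.0 for level in levels}
    if value <= centers[0]:
        memberships[levels[0]] = 1.0
    elif value >= centers[-1]:
        memberships[levels[-1]] = 1.0
    else:
        # Locate the two surrounding centers: centers[i - 1] <= value < centers[i]
        i = bisect_right(centers, value)
        left, right = centers[i - 1], centers[i]
        weight_right = (value - left) / (right - left)
        memberships[levels[i - 1]] = 1.0 - weight_right
        memberships[levels[i]] = weight_right
    return memberships


LEVELS = ["strong_down", "down", "neutral", "up", "strong_up"]
CENTERS = [-2, -1, 0, 1, 2]

print(fuzzify_value(0.5, CENTERS, LEVELS))
# {'strong_down': 0.0, 'down': 0.0, 'neutral': 0.5, 'up': 0.5, 'strong_up': 0.0}
```

Because neighboring sets overlap only pairwise in this sketch, the memberships of every value sum to 1.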

Visualize Only Specific Gene Sets

solis_genes = ["YAL005C", "YBR101C", "YDR171W", "YDR214W", "YDR258C", "YFL016C", "YGR142W", "YLL024C", "YLL026W", "YLR216C", "YMR186W", "YNL007C", "YNL064C", "YNL281W", "YOR027W", "YOR298C-A", "YPL240C", "YPR158W"]

fa.plot_flows(
    genes=solis_genes,
    title="Solis et al. 2016 - KO1 dependent genes",
    figsize=(10, 8),
    outfile="./small_example/plots/geneset_flow.png"
)
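Restricting plot_flows to a gene set means looking only at the flow memberships of those genes. The sketch below shows one way such memberships could be obtained and aggregated: a gene's membership in a flow (one level per state) is taken as the product of its per-state memberships, and these values are then summed over a gene subset. The product t-norm is an assumption (the minimum is a common alternative), and flow_memberships and geneset_flow_mass are hypothetical helpers, not FlowSets functions.

```python
from collections import defaultdict
from itertools import product
from math import prod


def flow_memberships(state_memberships):
    """Combine the per-state memberships of one gene into flow memberships.

    state_memberships: one dict {level: membership} per state, in series order.
    Returns {flow: membership}, where a flow is a tuple with one level per state.
    Assumption: product t-norm; only flows with non-zero membership are kept.
    """
    nonzero = [[(lvl, v) for lvl, v in m.items() if v > 0] for m in state_memberships]
    flows = {}
    for combo in product(*nonzero):
        flow = tuple(lvl for lvl, _ in combo)
        flows[flow] = prod(v for _, v in combo)
    return flows


def geneset_flow_mass(gene_flows, genes):
    """Sum flow memberships over a subset of genes (hypothetical helper)."""
    totals = defaultdict(float)
    for gene in genes:
        for flow, value in gene_flows.get(gene, {}).items():
            totals[flow] += value
    return dict(totals)


# Toy memberships for two genes across three states (illustrative values only)
gene_flows = {
    "geneA": flow_memberships([{"down": 1.0}, {"down": 0.5, "neutral": 0.5}, {"neutral": 1.0}]),
    "geneB": flow_memberships([{"down": 1.0}, {"down": 1.0}, {"neutral": 1.0}]),
}
print(geneset_flow_mass(gene_flows, ["geneA", "geneB"]))
# {('down', 'down', 'neutral'): 1.5, ('down', 'neutral', 'neutral'): 0.5}
```

With the product t-norm and per-state memberships that sum to 1, each gene's flow memberships also sum to 1, so the summed values can be read as gene counts distributed over flows.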

Pattern Search and Pathway Analysis

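# Find flows that follow a specific pattern; they are used below for plotting and pathway analysis
relFlow = fa.flow_finder(
    ["?", "?"],  # transition pattern between consecutive states ("?" = any change)
    minLevels=[None, None, "down"],  # lowest allowed level per state (None = no lower bound)
    maxLevels=["down", "down", "up"],  # highest allowed level per state
    verbose=False
    )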

fa.plot_flow_memberships(
    use_edges=relFlow, 
    color_genes=solis_genes, 
    outfile="./small_example/plots/pattern_memberships.png"
    )

pw_file = "small_example/goslim.gmt"

pwScores = fa.analyse_pathways(
    use_edges=relFlow, 
    genesets_file=pw_file, 
    additional_genesets=[("solis annotated genes", solis_genes)]
    )

pwScores_signif = pwScores.sort_values("pw_coverage_pval", ascending=True).head(20)
print(pwScores_signif)  # in a Jupyter notebook, display(pwScores_signif) renders a table

# Show the results as an over-representation analysis (ORA) plot
fa.plotORAresult(
    pwScores_signif,
    "GOslim",
    numResults=10,
    figsize=(6, 6),
    outfile="./small_example/plots/goslim_pathway_analysis.png"
)
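The flow_finder call above can be illustrated with a small, self-contained sketch that enumerates every flow whose level in each state lies between the given minimum and maximum. It assumes the levels are ordered as in mfLevels and that "?" places no constraint on a transition. FlowSets' actual transition grammar is not reproduced here: the integer "exact step" constraint and the function name find_flows are hypothetical.

```python
from itertools import product

LEVELS = ["strong_down", "down", "neutral", "up", "strong_up"]


def find_flows(transitions, min_levels, max_levels, levels=LEVELS):
    """Enumerate flows (one level per state) that satisfy the given constraints.

    transitions: one entry per pair of consecutive states. "?" allows any change;
        an int requires exactly that difference in level index (a hypothetical
        stand-in for FlowSets' own grammar).
    min_levels / max_levels: one entry per state; None means unbounded.
    """
    rank = {level: i for i, level in enumerate(levels)}
    n_states = len(min_levels)
    if len(max_levels) != n_states or len(transitions) != n_states - 1:
        raise ValueError("expected one bound per state and one transition per step")

    # Allowed level indices for each state, from the per-state bounds
    allowed = []
    for low, high in zip(min_levels, max_levels):
        low_i = 0 if low is None else rank[low]
        high_i = len(levels) - 1 if high is None else rank[high]
        allowed.append(range(low_i, high_i + 1))

    matches = []
    for combo in product(*allowed):
        if all(t == "?" or combo[k + 1] - combo[k] == t for k, t in enumerate(transitions)):
            matches.append(tuple(levels[i] for i in combo))
    return matches


flows = find_flows(["?", "?"], [None, None, "down"], ["down", "down", "up"])
print(len(flows))  # 12
print(find_flows([0, 1], [None, None, "down"], ["down", "down", "up"]))
# [('strong_down', 'strong_down', 'down'), ('down', 'down', 'neutral')]
```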

Paper Examples

Other Examples


Method Summary

  • (Differential) Expression data are read in for each gene and each cluster (or state).
  • Values are fuzzified by user-defined membership classes, min-max scaling, or quantiles.
  • Relevant flows are defined using a simple grammar with flow_finder, specifying desired differences between levels.
  • For each flow or group of flows, a gene set enrichment analysis is performed. Gene sets are binned by size, and within each bin the flow membership of every gene set is calculated. Each gene set receives a z-score relative to the other gene sets in its bin, and the z-scores of all overrepresented gene sets (positive z-score) are transformed into p-values.
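The size-binned z-score enrichment described in the last bullet can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the FlowSets implementation: a gene set's score is taken to be the summed membership of its genes in the selected flows, the size bins come from hypothetical bin_edges, the spread within a bin is the population standard deviation, and p-values come from the upper tail of a standard normal distribution. The function name binned_enrichment is hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def binned_enrichment(gene_scores, genesets, bin_edges):
    """Score each gene set against other gene sets of similar size.

    gene_scores: {gene: membership in the selected flow(s)}
    genesets: {gene set name: list of genes}
    bin_edges: ascending size thresholds defining the size bins (hypothetical)
    Returns {gene set name: (z_score, p_value)} for gene sets with z > 0.
    """
    bins = {}
    for name, genes in genesets.items():
        # Bin index = number of edges that the gene set size exceeds
        bin_id = sum(len(genes) > edge for edge in bin_edges)
        # Assumption: gene set score = summed flow membership of its genes
        score = sum(gene_scores.get(gene, 0.0) for gene in genes)
        bins.setdefault(bin_id, []).append((name, score))

    normal = NormalDist()  # standard normal (mean 0, sd 1)
    results = {}
    for members in bins.values():
        scores = [score for _, score in members]
        if len(scores) < 2:
            continue  # no reference gene sets in this bin
        mu, sigma = mean(scores), pstdev(scores)
        if sigma == 0:
            continue  # all gene sets in this bin have the same score
        for name, score in members:
            z = (score - mu) / sigma
            if z > 0:  # only overrepresented gene sets receive a p-value
                results[name] = (z, 1.0 - normal.cdf(z))  # one-sided upper-tail p-value
    return results


gene_scores = {"g1": 1.0, "g2": 1.0, "g3": 0.0, "g4": 0.0, "g5": 0.0, "g6": 0.0}
genesets = {"A": ["g1", "g2"], "B": ["g3", "g4"], "C": ["g5", "g6"]}
result = binned_enrichment(gene_scores, genesets, bin_edges=[10])
print(result)  # only gene set "A" is reported (z ≈ 1.41, p ≈ 0.079)
```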

A more detailed description is available in the working version of our manuscript.


License

This project is licensed under the MIT License.


Citation

If you use FlowSets in your research, please cite our manuscript (see WorkingVersionFlowsets.pdf).



Download files


Source Distribution

flowsets-0.0.9.tar.gz (40.4 kB)

Uploaded Source

Built Distribution


flowsets-0.0.9-py3-none-any.whl (33.0 kB)

Uploaded Python 3

File details

Details for the file flowsets-0.0.9.tar.gz.

File metadata

  • Download URL: flowsets-0.0.9.tar.gz
  • Upload date:
  • Size: 40.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for flowsets-0.0.9.tar.gz
Algorithm Hash digest
SHA256 702f73122ca7086f87f76be06bc6a425002061da19211fc74d30b235f62d2895
MD5 868710bee351874d3058b087b3ff3798
BLAKE2b-256 3697d845a6eb03b1b29ba1e706b5973e013312992ba074eae2b8bc1b74f21176


File details

Details for the file flowsets-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: flowsets-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 33.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for flowsets-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 93be9974425d63dce66880de56dcb76beef7dd9e458636ecf7f7e127b62c4cbb
MD5 1c9ea124fe5d15f24ac9e5b317d2db33
BLAKE2b-256 25a7fdea52f9f2554e0d6823b1181dc4aa8484c7d35cd16d268bb9fd9ebbf71a

