FeatureCloud Visualization

FeatureCloud Cluster Visualization app

This is an interactive cluster visualization app implemented in Dash and Plotly.

App usage

This app is intended to be used in a FeatureCloud environment. It requires input data to generate the interactive visualization interface. The data must be in the location and format specified by this documentation. The app has a tabbed interface consisting of:

  • Confounders
  • Distances
  • Clustering Quality
  • Scree plot
  • Volcano plot
  • Help

Confounders tab

Main features:

  • Cluster or client id field based clustering display
  • K number selector
  • Cluster selector
  • X and Y axes selector
  • Pie or bar chart selector for visualizing discrete data
  • Confounding factors filter
  • Scatter plot with confidence ellipses
  • Linear or logarithmic scale
  • Point/Lasso/Box selection
  • Export diagrams to png
  • Confounding factors diagrams
  • Visualize and download selected points

Distances tab

Main features:

  • K number selector
  • Cluster selector
  • Confounding factors filter
  • Clustergram

Clustering Quality tab

Main features:

  • K number selector
  • Silhouette plot

Scree plot tab

Main features:

  • Displays the eigenvalue of each component

Volcano plot tab

Main features:

  • Set the effect size thresholds (vertical lines)
  • Set the genome-wide line threshold (horizontal line)

Help

It displays this documentation.

Input data requirements

Expected folder structure for visual representation

data
└───results
│   └───K2
│       │   clustering.csv
│       │   silhouette.csv
│   └───K3
│       │   clustering.csv
│       │   silhouette.csv
│   └───...
│   └───K<n>
│       │   clustering.csv
│       │   silhouette.csv
│   confoundingData.csv
│   confoundingData.meta
│   localData.csv
│   distanceMatrix.csv
│   varianceExplained.csv
│   volcano_data.csv

Download

For a better understanding, an example data set can be downloaded by clicking here.

Tip for running

When running the app in a workflow, you can upload a zip file containing the config file and the data files. It is automatically unzipped and copied to the input directory of the app. To test this, upload the example data set mentioned above.

You can also trigger the app to finish by clicking the Finished button in the top right corner. This makes it possible to start the next app in the workflow, if any, or to stop the workflow.

Notes:

  • The localData.csv file is mandatory. All other files are optional.
  • If a K folder exists, all files in it are mandatory.
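
The two rules above can be checked before starting the app. The following is a minimal sketch, not part of the app itself: the function name check_layout is hypothetical, and it assumes the folder and file names from the structure shown earlier.

```python
from pathlib import Path


def check_layout(data_dir):
    """Return a list of problems found in the data directory (empty if none)."""
    data_dir = Path(data_dir)
    problems = []
    # localData.csv is the only file that is always required.
    if not (data_dir / "localData.csv").is_file():
        problems.append("missing localData.csv")
    results = data_dir / "results"
    if results.is_dir():
        # Every K folder that exists must hold both result files.
        for k_dir in sorted(p for p in results.iterdir() if p.is_dir()):
            for name in ("clustering.csv", "silhouette.csv"):
                if not (k_dir / name).is_file():
                    problems.append(f"missing {k_dir.name}/{name}")
    return problems
```

An empty list means the directory satisfies both rules. Optional files are deliberately not checked.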

Delimiter

The default delimiter is the ";" character. It can be overridden in the config.yml file.

Expected file structure

confoundingData.csv

This file contains all confounding factors related to the local data. The first column is the id (mandatory), followed by at most 5 confounder columns. The confounder column names are arbitrary but must not match the reserved column names: id, cluster, client_id.

Example
id;age;sex;race;height;sugar-level
1;38;F;Caucasian;159;low
2;17;F;Asian;175;low
3;40;F;African-American;162;medium
4;32;F;Indian;183;high
5;18;F;Indian;193;low

confoundingData.meta

This file contains meta information about the confounding factors.

The file has the following columns, with the supported data and value types:

  • name: the name of the confounding factor
  • data_type:
    • continuous: arbitrary values
    • discrete: values from a predefined value set
    • ordinal: values from a predefined, ordered value set
  • value_type:
    • integer
    • string
    • enumeration values listed in order, separated by commas (for ordinal data)
Example
name;data_type;value_type
age;continuous;integer
sex;discrete;string
race;discrete;string
height;continuous;integer
sugar-level;ordinal;low,medium,high
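
To illustrate how the ordinal value_type can be used, here is a minimal sketch that parses the meta file with Python's standard csv module. The function name parse_meta and the returned dictionary layout are assumptions made for this example, not the app's internal representation.

```python
import csv
import io


def parse_meta(text, delimiter=";"):
    """Map each confounder name to its data type and value type."""
    meta = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        value_type = row["value_type"]
        if row["data_type"] == "ordinal":
            # For ordinal factors the value type lists the levels in order.
            value_type = [level.strip() for level in value_type.split(",")]
        meta[row["name"]] = {"data_type": row["data_type"], "value_type": value_type}
    return meta


example = """name;data_type;value_type
age;continuous;integer
sugar-level;ordinal;low,medium,high
"""
meta = parse_meta(example)
levels = meta["sugar-level"]["value_type"]
# Sort observed values by their position in the declared order.
print(sorted(["high", "low", "medium"], key=levels.index))
# prints ['low', 'medium', 'high']
```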

localData.csv

This file contains the base values. Columns:

  • id: sample id (mandatory)
  • client_id: optional field, the app supports display of clustering on this field as well
  • data columns: at least 2 data columns need to be present. More than 2 data columns are supported. The column names are arbitrary and must not match reserved column names: id, cluster, client_id
Example
id;client_id;x;y;z
1;1;-0.115257648318211;0.289555823437292;0.333954194475931
2;1;-0.226069897739012;0.293898393621215;0.130668954544708
3;1;0.0606059327164007;0.0297344961039227;0.112959671444335
4;1;0.0398616396572761;-0.37563056412847;-0.35560909629883
5;1;-0.21084222999711;0.592948181336414;-0.368794747648271

distanceMatrix.csv

This file contains the distances between samples. It is an n x n matrix, where n is the number of samples.

Example
1;2;3;4;5
1;0;0.53851648071345;0.509901951359278;0.648074069840786;0.141421356237309
2;0.53851648071345;0;0.3;0.331662479035541;0.608276253029822
3;0.509901951359278;0.3;0;0.244948974278318;0.509901951359278
4;0.648074069840786;0.331662479035541;0.244948974278318;0;0.648074069840786
5;0.141421356237309;0.608276253029822;0.509901951359278;0.648074069840786;0
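
A quick sanity check for such a file could look like the sketch below. It assumes, based on the example above, that the first line lists the column ids and that every following row starts with a sample id. The symmetry and zero-diagonal checks are properties normally expected of a distance matrix. This documentation does not say whether the app enforces them. Both function names are hypothetical.

```python
import csv
import io


def read_distance_matrix(text, delimiter=";"):
    """Return (ids, matrix), assuming each data row starts with a sample id."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    body = rows[1:]  # skip the header row of column ids
    ids = [row[0] for row in body]
    matrix = [[float(value) for value in row[1:]] for row in body]
    return ids, matrix


def looks_like_distance_matrix(matrix, tol=1e-9):
    """Check that the matrix is square, symmetric and zero on the diagonal."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    for i in range(n):
        if abs(matrix[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
    return True
```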

varianceExplained.csv

This file contains the eigenvalues for components. Columns:

  • component: mandatory field, it contains the name of the component
  • eigenvalue: mandatory field, it contains the eigenvalue of the component
Example
component;eigenvalue
x;0.729624454
y;0.408507618
z;0.228507618
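
As a rough illustration of how this file relates to a scree plot, the sketch below reads the eigenvalues, ranks them in descending order, and adds each component's share of the total. The function name scree_table is hypothetical. The ranking and the share column are conventions assumed for this example, since the documentation only states that the app displays the eigenvalues.

```python
import csv
import io


def scree_table(text, delimiter=";"):
    """Return (component, eigenvalue, share) tuples, largest eigenvalue first."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    eigenvalues = [(r["component"], float(r["eigenvalue"])) for r in rows]
    total = sum(value for _, value in eigenvalues)
    ranked = sorted(eigenvalues, key=lambda item: item[1], reverse=True)
    return [(name, value, value / total) for name, value in ranked]
```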

clustering.csv

This file contains the cluster distribution of the samples. Columns:

  • id: mandatory, sample id
  • cluster: mandatory, cluster id
Example
id;cluster
1;1
2;1
3;1
4;1
5;1

silhouette.csv

This file contains the data used to display the silhouette plot of the clusters. Columns:

  • index column: mandatory (named x in the example below)
  • y: mandatory, contains the value to be plotted
  • cluster: mandatory, contains the cluster id
Example
x;y;cluster
1;0.369499266613275;1
2;0.783307729521766;1
3;0.0627545099705458;1
4;0.205028521828353;1
5;0.915254552382976;1
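
One way to summarize such a file is the mean silhouette value per cluster, sketched below with the standard library only. The function name is hypothetical, and the summary is an illustration added here. The app itself is only documented as plotting the y values.

```python
import csv
import io
from collections import defaultdict


def mean_silhouette_by_cluster(text, delimiter=";"):
    """Average the y column separately for every cluster id."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        values[row["cluster"]].append(float(row["y"]))
    return {cluster: sum(v) / len(v) for cluster, v in values.items()}
```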

volcano_data.csv

This file contains the data used to display the volcano plot. The columns are the default columns used by the Dash Bio VolcanoPlot component.

Example
CHR;BP;P;SNP;ZSCORE;EFFECTSIZE;GENE;DISTANCE
1;937641;0.335343792801723;rs9697358;0.9634;-0.0946;ISG15;1068
1;1136887;0.245857131900266;rs34945898;1.1605;-0.0947;TNFRSF4;0
1;2116240;0.823285880265757;rs12034613;0.2233;-0.0741;FP7162;0
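
The two thresholds of the Volcano plot tab can be thought of as a filter on this file. The sketch below is an assumption-laden illustration and not the app's code. It treats the effect size threshold as symmetric around zero and the genome-wide line as a value on the -log10(P) scale. The function and parameter names are hypothetical.

```python
import csv
import io
import math


def significant_snps(text, effect_size_threshold, genomewide_line, delimiter=";"):
    """Return the SNP ids that lie beyond both thresholds."""
    hits = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # Volcano plots show -log10(P) on the vertical axis.
        neg_log_p = -math.log10(float(row["P"]))
        effect = abs(float(row["EFFECTSIZE"]))
        if effect >= effect_size_threshold and neg_log_p >= genomewide_line:
            hits.append(row["SNP"])
    return hits
```

With the three example rows above, no SNP would pass an effect size threshold of 1 because all effect sizes are below 0.1 in absolute value.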

General requirements for input data

  • the number of samples must be the same in all files
  • the sample ids must be consistent across all files
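
These requirements can be verified with a few lines of Python. The following sketch compares the id column of several files. It assumes each file has a column named id, as localData.csv, confoundingData.csv and clustering.csv do, and the function names are hypothetical.

```python
import csv
import io


def read_ids(text, delimiter=";"):
    """Return the values of the id column in file order."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row["id"] for row in reader]


def ids_match(*texts, delimiter=";"):
    """True if all files hold the same number of samples with the same ids."""
    id_lists = [read_ids(text, delimiter) for text in texts]
    first = id_lists[0]
    return all(
        len(ids) == len(first) and set(ids) == set(first) for ids in id_lists
    )
```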

Config file support

The app supports setting all data file and directory paths from a config file. The config.yml file should be placed in the default data directory (/mnt/input/data). Example:

fc-cluster-visualization-app:
  delimiter: ';'
  data-dir: 'data/exampleData'
  local-data-path: 'data/exampleData/localData.csv'
  distance-matrix-path: 'data/exampleData/distanceMatrix.csv'
  confounding-meta-path: 'data/exampleData/confoundingData.meta'
  confounding-data-path: 'data/exampleData/confoundingData.csv'
  variance-explained-path: 'data/exampleData/varianceExplained.csv'
  k-values-clustering-result-dir: 'data/exampleData/results'
  k-values-clustering-file-name: 'clustering.csv'
  k-values-silhouette-file-name: 'silhouette.csv'
  volcano-data-path: 'data/exampleData/volcano_data.csv'
  # all files downloaded from the browser will end up here too
  download-dir: 'data/exampleData/downloads'

If the config file is not present, the app searches for data in the default folder (/mnt/input/data). Any key can be omitted from the config file, in which case the app searches for the corresponding data in the default data directory. Keys must not be left with blank values.
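
The fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions and not the app's implementation. The config is assumed to be already parsed into a dictionary, the default file names are taken from the folder structure shown earlier, and resolve_path is a hypothetical helper.

```python
from pathlib import Path

DEFAULT_DATA_DIR = Path("/mnt/input/data")

# Assumed fallback file names, taken from the documented folder structure.
DEFAULT_NAMES = {
    "local-data-path": "localData.csv",
    "distance-matrix-path": "distanceMatrix.csv",
    "confounding-data-path": "confoundingData.csv",
    "confounding-meta-path": "confoundingData.meta",
    "variance-explained-path": "varianceExplained.csv",
    "volcano-data-path": "volcano_data.csv",
}


def resolve_path(config, key):
    """Use the configured path if the key is set, else the default location."""
    section = (config or {}).get("fc-cluster-visualization-app", {})
    value = section.get(key)
    if value:
        return Path(value)
    # Key omitted (or no config file at all): fall back to the default directory.
    return DEFAULT_DATA_DIR / DEFAULT_NAMES[key]
```

For simplicity the sketch treats a blank value like an omitted key. The documentation asks that blank values be avoided, so the app may behave differently in that case.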

Limitations

  • the app can display at most 5 confounding factors simultaneously
  • if more than 5 confounding factors are present in the confoundingData.meta file, only the first 5 are displayed

Workflow

When the app runs in a FeatureCloud workflow, a Finished button is displayed in the upper right corner. Clicking the button terminates the application, and the controller shuts down the Docker container. The content of the input folder is also copied to the output folder.

Screenshots

Confounders tab

  • Confounding factors filter with scatter plot
  • Scatter plot with confounding factors diagrams
  • View selected data from scatter plot

Distances tab

  • Clustergram

Clustering Quality tab

  • Silhouette diagram

Scree plot

  • Scree plot tab

Volcano plot

  • Volcano plot tab

Download files

Source Distribution

fcvisualization-0.0.0.4.tar.gz (1.7 MB)
