Transcription Factor Enrichment Analysis
Project description
Transcription Factor Enrichment Analysis (TFEA)
Table of Contents
- Pipeline
- Installation and Requirements
- Basic Usage
- Advanced Usage
- Example Output
- Contact Information
TFEA Pipeline
Installation and Requirements
To install, this package and all python3 dependencies:
git clone https://github.com/Dowell-Lab/TFEA
cd /full/path/to/TFEA/
pip3 install --user .
This should take no longer than several minutes.
Note: If you plan to run TFEA only on FIJI using the --sbatch flag, then you only need to install DESeq and DESeq2. Otherwise, follow the instructions below for installing all TFEA dependencies.
DESeq
TFEA uses DESeq or DESeq2 (depending on replicate number) to rank inputted bed files based on fold change significance. If on FIJI, make sure all gcc modules are unloaded before installing DESeq or DESeq2. This can be accomplished with:module unload gcc
or
module purge
To install DESeq and DESeq2 type in your terminal:
R
> if (!requireNamespace("BiocManager", quietly = TRUE))
> install.packages("BiocManager")
> BiocManager::install("DESeq")
> BiocManager::install("DESeq2")
Bedtools
TFEA uses Bedtools to do several genomic computations. Instructions for installing bedtools can be found here:If you are on FIJI compute cluster, bedtools is available as a module:
module load bedtools/2.25.0
Samtools
TFEA uses samtools to index bam files. Samtools download and install instructions can be found here: Samtools Download and InstallIf you are on FIJI compute cluster, bedtools is available as a module:
module load samtools/1.8
MEME Suite
TFEA uses the MEME suite to scan sequences from inputted bed files for motif hits using the background atcg distribution form inputted bed file regions. TFEA also uses the MEME suite to generate motif logos for html display. Instructions for downloading and installing the MEME suite can be found here:MEME Download and Installation
If you are on FIJI compute cluster, the meme suite is available as a module:
module load meme/5.0.3
Image Magick
TFEA uses the meme2images script within MEME to produce motif logo figures. This requires Image Magick, which is a common linux utility package sometimes pre-installed on machines. To check if you have Image Magick installed try:identify -version
If you do not have Image Magick installed, follow these instructions:
Image Magick Download and Installation
FIJI Modules
Below is a summary of all FIJI modules needed to run TFEA.module load python/3.6.3
module load python/3.6.3/matplotlib/1.5.1
module load python/3.6.3/scipy/0.17.1
module load python/3.6.3/numpy/1.14.1
module load python/3.6.3/htseq/0.9.1
module load python/3.6.3/pybedtools/0.7.10
module load samtools/1.8
module load bedtools/2.25.0
module load meme/5.0.3
BasicUsage
Once installed, TFEA can be run from anywhere, try:
TFEA --help
Testing TFEA
To make sure TFEA is installed properly, run the following tests:Note: If you chose to skip installations because you were going to run TFEA using the --sbatch flag, make sure you load the appropriate modules on FIJI or these tests will fail.
TFEA --test-install
TFEA --test-full
These should each take no longer than several minutes to run
If on a compute cluster with slurm the --sbatch flag is compatible with --test-full and is recommended on FIJI. Execute like so:
TFEA --test-full --sbatch your_email@address.com
Running TFEA
Once you've run the above tests successfully, you should be ready to run the full version of TFEA. Below are the minimum required inputs to run the full TFEA pipeline. Test files are provided in './TFEA/test/test_files' within this repository.TFEA --output ./TFEA/test/test_files/test_output \
--bed1 ./TFEA/test/test_files/SRR1105736.tfit_bidirs.chr22.bed ./TFEA/test/test_files/SRR1105737.tfit_bidirs.chr22.bed \
--bed2 ./TFEA/test/test_files/SRR1105738.tfit_bidirs.chr22.bed ./TFEA/test/test_files/SRR1105739.tfit_bidirs.chr22.bed \
--bam1 ./TFEA/test/test_files/SRR1105736.sorted.chr22.subsample.bam ./TFEA/test/test_files/SRR1105737.sorted.chr22.subsample.bam \
--bam2 ./TFEA/test/test_files/SRR1105738.sorted.chr22.subsample.bam ./TFEA/test/test_files/SRR1105739.sorted.chr22.subsample.bam \
--label1 DMSO --label2 Nutlin \
--genomefasta ./TFEA/test/test_files/chr22.fa \
--fimo_motifs ./TFEA/test/test_files/test_database.meme
Advanced Usage
Configuration File
TFEA can be run exclusively through the command line using flags. Alternatively, TFEA can be run using a configuration file (.ini) that takes in flags as variables. For example:TFEA --config ./TFEA/test/test_files/test_config.ini
This can be helpful to keep track of different TFEA runs and because you can use variables within the config file to clean up your input. For documentation on config files and what you can do with them see Supported INI File Structure and Interpolation of values (ExtendedInterpolation)
Notes:
- Section headers (ex:
[OUTPUT]
) don't matter but you need to have at least ONE section header to be a viable .ini file. - Capitalization of variables doesn't matter.
- Feel free to specify any additional variables you like (variables are bash-like), TFEA will only parse variables that match a flag input.
- If an input is provided both as a flag and in a configuration file, TFEA prioritizes the command line flag input.
Below is an example of a configuration file (./test_files/test_config.ini):
[OUTPUT]
OUTPUT='./TFEA/test/test_files/test_output/'
LABEL1='Condition 1'
LABEL2='Condition 2'
[DATA]
TEST_FILES='./TFEA/test/test_files/'
BED1=[${TEST_FILES}+'SRR1105736.tfit_bidirs.chr22.bed',${TEST_FILES}+'SRR1105737.tfit_bidirs.chr22.bed']
BED2=[${TEST_FILES}+'SRR1105738.tfit_bidirs.chr22.bed',${TEST_FILES}+'SRR1105739.tfit_bidirs.chr22.bed']
BAM1=[${TEST_FILES}+'SRR1105736.sorted.chr22.subsample.bam', ${TEST_FILES}+'SRR1105737.sorted.chr22.subsample.bam']
BAM2=[${TEST_FILES}+'SRR1105738.sorted.chr22.subsample.bam', ${TEST_FILES}+'SRR1105739.sorted.chr22.subsample.bam']
[MODULES]
TEST_FILES='./TFEA/test/test_files/' #You need to re-initialize variables within each [MODULE]
FIMO_MOTIFS=${TEST_FILES}+'test_database.meme'
GENOMEFASTA=${TEST_FILES}+'chr22.fa'
[OPTIONS]
OUTPUT_TYPE='html'
PLOTALL=True
Using SBATCH
Specifying the `--sbatch` flag will submit TFEA to a compute cluster assuming you are logged into one. If the `--sbatch` flag is specified, it MUST be followed by an e-mail address to send job information to. For example:TFEA --config ./TFEA/test/test_files/test_config.ini --sbatch your_email@address.com
Additionally, the following flags can be used to change some of the job parameters:
--cpus CPUS Number of processes to run in parallel. Warning:
Increasing this value will significantly increase
memory footprint. Default: 1
--mem MEM Amount of memory to request forsbatch script. Default:
50gb
Note: --cpus
also works without the --sbatch
flag
Using Pre-processed Inputs
TFEA has several pipeline elements to it that a user may bypass by providing downstream pre-processed files. These files can be generated by TFEA if running the full pipeline and may also be used to speed up reruns of TFEA. Below are the three types of pre-processed inputs, short descriptions, an example of the file, and a usage example with TFEA (in some cases there are other inputs needed to go along with the pre-processed file). If multiple pre-processed inputs specified, TFEA will use the most downstream one.combined_file
A sorted (by chrom, start, stop) bed file containing regions of interest
Example (./test_files/test_combined_file.bed)
#chrom start stop
chr22 10683195 10683999
chr22 16609343 16609405
chr22 16901069 16902599
chr22 17036962 17037636
chr22 17158022 17160214
...
Usage with TFEA
TFEA --output ./TFEA/test/test_files/test_output \
--combined_file ./TFEA/test/test_files/test_combined_file.bed \
--bam1 ./TFEA/test/test_files/SRR1105736.sorted.chr22.subsample.bam ./test_files/SRR1105737.sorted.chr22.subsample.bam \
--bam2 ./TFEA/test/test_files/SRR1105738.sorted.chr22.subsample.bam ./test_files/SRR1105739.sorted.chr22.subsample.bam \
--label1 condition1 --label2 condition2 \
--genomefasta ./TFEA/test/test_files/chr22.fa \
--fimo_motifs ./TFEA/test/test_files/test_database.meme
ranked_file
A ranked bed file with regions of interest.
Note: Specifying a ranked_file turns off some plotting functionality
Example (./test_files/test_ranked_file.bed)
#chrom start stop
chr22 50794870 50797870
chr22 21554591 21557591
chr22 50304644 50307644
chr22 39096295 39099295
chr22 31176104 31179104
...
Usage with TFEA
TFEA --output ./TFEA/test/test_files/test_output \
--ranked_file ./TFEA/test/test_files/test_ranked_file.bed \
--label1 condition1 --label2 condition2 \
--genomefasta ./TFEA/test/test_files/chr22.fa \
--fimo_motifs ./TFEA/test/test_files/test_database.meme
fasta_file
A ranked fasta file with regions of interest (sequences must have unique names but these names aren't used for anything).
Note: Specifying a fasta_file turns off some plotting functionality
Example (./test_files/test_fasta_file.bed)
>chr22:50794870-50797870
ccgccccacactgacgcagt...ccgcctcagcctcctaaa
>chr22:21554591-21557591
cttggggagagcagaagcca...gtgcagtggtgcaatctt
>chr22:50304644-50307644
CTGAGCACCCCCCACCAGCCA...GGAGACGGGGCCTTTGT
...
Usage with TFEA
TFEA --output ./TFEA/test/test_files/test_output \
--fasta_file ./TFEA/test/test_files/test_fasta_file.fa \
--label1 condition1 --label2 condition2 \
--genomefasta ./TFEA/test/test_files/chr22.fa \
--fimo_motifs ./TFEA/test/test_files/test_database.meme
Secondary Analysis
TFEA can also perform MD-Score analysis and differential MD-Score analysis. This can be switched on easily if running the full pipeline:TFEA --output ./TFEA/test/test_files/test_output \
--combined_file ./TFEA/test/test_files/test_combined_file.bed \
--bam1 ./TFEA/test/test_files/SRR1105736.sorted.chr22.subsample.bam ./TFEA/test/test_files/SRR1105737.sorted.chr22.subsample.bam \
--bam2 ./TFEA/test/test_files/SRR1105738.sorted.chr22.subsample.bam ./TFEA/test/test_files/SRR1105739.sorted.chr22.subsample.bam \
--label1 condition1 --label2 condition2 \
--genomefasta ./TFEA/test/test_files/chr22.fa \
--fimo_motifs ./TFEA/test/test_files/test_database.meme \
--md --mdd
These secondary analyses can also take pre-processed input similar to TFEA. See the 'Secondary Analysis Inputs' section in the help message for more information.
Measuring TF FPKM
TFEA will also measure the FPKM of TF genes within your data if desired. This requires input into the `--motif_annotations` flag which is a bed file with motif names as the 4th column. Example:chr1 3698045 3733079 P73_HUMAN.H11MO.0.A 0 +
chr1 6579990 6589212 ZBT48_HUMAN.H11MO.0.C 0 +
chr1 15941868 15948495 ZBT17_HUMAN.H11MO.0.A 0 -
chr1 23359447 23368005 ZN436_HUMAN.H11MO.0.C 0 -
This special bed file can be generated from a .meme database file using a tab-separated 2-column file containing motif names to gene names and a gene annotation file:
Example of a motif_to_gene.tsv (this was generated on the HOCOMOCO v11 website):
Model Transcription factor
ANDR_HUMAN.H11MO.0.A AR
AP2A_HUMAN.H11MO.0.A TFAP2A
AP2C_HUMAN.H11MO.0.A TFAP2C
ASCL1_HUMAN.H11MO.0.A ASCL1
Example of gene annotations:
chr1 11873 14409 DDX11L1;NR_046018;chr1:11873-14409 0 +
chr1 14361 29370 WASH7P;NR_024540;chr1:14361-29370 0 -
chr1 17368 17436 MIR6859-1;NR_106918;chr1:17368-17436 0 -
chr1 17368 17436 MIR6859-4;NR_128720;chr1:17368-17436 0 -
chr1 17368 17436 MIR6859-3;NR_107063;chr1:17368-17436 0 -
The script works by looking for gene names that correspond to each motif within the 4th column of the gene annotation file. It expects the 4th column to be ';' delimited.
Already generated motif_annotation.bed files (and also intermediate .tsv files) are located within './motif_files/'
Generating Simulated Data
TFEA has a subpackage that is capable of generating simulated data for testing. If you have installed TFEA, it can be invoked with:TFEA-simulate --help
The purpose of this subpackage is to embed motif instances into fasta sequences that can be generated randomly or be from an experimental dataset (e.g. untreated control sample). There are several key flags that control this process (each of these may be a comma-delimited list of values that would indicate multiple instances of motif adding):
--distance_mu
: This flag controls where the center of the distribution is located (note: only normal distributions are supported at this point)
--distance_sigma
: Controls the standard deviation of the normal distribution
--rank_range
: Controls the range of sequences in which to add a motif
--motif_number
: Controls the number of motifs to add to your range of sequences
Rerunning TFEA
TFEA can be easily rerun given one or multiple TFEA output folders. This works simply by rerunning the rerun.sh script which contains all command-line flag inputs. TFEA also automatically creates a copy of your configuration file (if used) within the output folder which is then also used when rerunning (so no need to worry about editing the original configuration file). To rerun a single TFEA output folder:TFEA --rerun ./TFEA/test/test_files/test_output
The --rerun
flag also supports patterns containing wildcards to rerun all TFEA output folders that match. For example:
TFEA --rerun ./TFEA/test/test_files/test*
This works by looking recursively into all folders and subfolders for rerun.sh scripts and then executing sh rerun.sh
, so use with caution!
Help Message
Below are all the possible flags that can be provided to TFEA with a short description and default values.usage: TFEA [-h] [--output DIR]
[--bed1 [FILE1 FILE2 ... FILEN [FILE1 FILE2 ... FILEN ...]]]
[--bed2 [FILE1 FILE2 ... FILEN [FILE1 FILE2 ... FILEN ...]]]
[--bam1 [BAM1 [BAM1 ...]]] [--bam2 [BAM2 [BAM2 ...]]]
[--label1 LABEL1] [--label2 LABEL2] [--config CONFIG]
[--sbatch SBATCH] [--test-install] [--test-full]
[--combined_file COMBINED_FILE] [--ranked_file RANKED_FILE]
[--fasta_file FASTA_FILE] [--md] [--mdd]
[--md_bedfile1 MD_BEDFILE1] [--md_bedfile2 MD_BEDFILE2]
[--mdd_bedfile1 MDD_BEDFILE1] [--mdd_bedfile2 MDD_BEDFILE2]
[--md_fasta1 MD_FASTA1] [--md_fasta2 MD_FASTA2]
[--mdd_fasta1 MDD_FASTA1] [--mdd_fasta2 MDD_FASTA2]
[--mdd_pval MDD_PVAL] [--mdd_percent MDD_PERCENT]
[--combine {intersect/merge,merge all,tfit clean,tfit remove small,False}]
[--rank {deseq,fc,False}] [--scanner {fimo,genome hits}]
[--enrichment {auc,auc_bgcorrect}] [--debug]
[--genomefasta GENOMEFASTA] [--fimo_motifs FIMO_MOTIFS]
[--fimo_thresh FIMO_THRESH] [--fimo_background FIMO_BACKGROUND]
[--genomehits GENOMEHITS] [--singlemotif SINGLEMOTIF]
[--permutations PERMUTATIONS] [--largewindow LARGEWINDOW]
[--smallwindow SMALLWINDOW] [--dpi DPI] [--padjcutoff PADJCUTOFF]
[--plotall] [--output_type {txt,html}] [--cpus CPUS] [--mem MEM]
[--motif_annotations MOTIF_ANNOTATIONS] [--bootstrap BOOTSTRAP]
[--basemean_cut BASEMEAN_CUT] [--rerun [RERUN [RERUN ...]]]
[--gc GC]
Transcription Factor Enrichment Analysis (TFEA)
optional arguments:
-h, --help show this help message and exit
Main Inputs:
Inputs required for full pipeline
--output DIR, -o DIR Full path to output directory. If it exists, overwrite
its contents.
--bed1 [FILE1 FILE2 ... FILEN [FILE1 FILE2 ... FILEN ...]]
Bed files associated with condition 1
--bed2 [FILE1 FILE2 ... FILEN [FILE1 FILE2 ... FILEN ...]]
Bed files associated with condition 2
--bam1 [BAM1 [BAM1 ...]]
Sorted bam files associated with condition 1. Must be
indexed.
--bam2 [BAM2 [BAM2 ...]]
Sorted bam files associated with condition 2. Must be
indexed.
--label1 LABEL1 An informative label for condition 1
--label2 LABEL2 An informative label for condition 2
--config CONFIG, -c CONFIG
A configuration file that a user may use instead of
specifying flags. Command line flags will overwrite
options within the config file. See examples in the
config_files folder.
--sbatch SBATCH, -s SBATCH
Submits an sbatch (slurm) job. If specified, input an
e-mail address.
--test-install, -ti Checks whether all requirements are installed and
command-line runnable.
--test-full, -t Performs unit testing on full TFEA pipeline.
Processed Inputs:
Input options for pre-processed data
--combined_file COMBINED_FILE
A single bed file combining regions of interest.
--ranked_file RANKED_FILE
A bed file containing each regions rank as the 4th
column.
--fasta_file FASTA_FILE
A fasta file containing sequences to be analyzed,
ranked by the user.
Secondary Analysis Inputs:
Input options for performing MD-Score and Differential MD-Score analysis
--md Switch that controls whether to perform MD analysis.
--mdd Switch that controls whether to perform differential
MD analysis.
--md_bedfile1 MD_BEDFILE1
A bed file for MD-Score analysis associated with
condition 1.
--md_bedfile2 MD_BEDFILE2
A bed file for MD-Score analysis associated with
condition 2.
--mdd_bedfile1 MDD_BEDFILE1
A bed file for Differential MD-Score analysis
associated with condition 1.
--mdd_bedfile2 MDD_BEDFILE2
A bed file for Differential MD-Score analysis
associated with condition 2.
--md_fasta1 MD_FASTA1
A fasta file for MD-Score analysis associated with
condition 1.
--md_fasta2 MD_FASTA2
A fasta file for MD-Score analysis associated with
condition 2.
--mdd_fasta1 MDD_FASTA1
A fasta file for Differential MD-Score analysis
associated with condition 1.
--mdd_fasta2 MDD_FASTA2
A fasta file for Differential MD-Score analysis
associated with condition 2.
--mdd_pval MDD_PVAL P-value cutoff for retaining differential regions.
Default: 0.2
--mdd_percent MDD_PERCENT
Percentage cutoff for retaining differential regions.
Default: False
Module Switches:
Switches for different modules
--combine {intersect/merge,merge all,tfit clean,tfit remove small,False}
Method for combining input bed files
--rank {deseq,fc,False}
Method for ranking combined bed file
--scanner {fimo,genome hits}
Method for scanning fasta files for motifs
--enrichment {auc,auc_bgcorrect}
Method for calculating enrichment
--debug Print memory and CPU usage to stderr
Scanner Options:
Options for performing motif scanning
--genomefasta GENOMEFASTA
Genomic fasta file
--fimo_motifs FIMO_MOTIFS
Full path to a .meme formatted motif databse file.
Some databases included in motif_databases folder.
--fimo_thresh FIMO_THRESH
P-value threshold for calling FIMO motif hits.
Default: 1e-6
--fimo_background FIMO_BACKGROUND
Options for choosing mononucleotide background
distribution to use with FIMO. {'largewindow',
'smallwindow', int, file}
--genomehits GENOMEHITS
A folder containing bed files with pre-calculated
motif hits to a genome. For use with 'genome hits'
scanner option.
--singlemotif SINGLEMOTIF
Option to run analysis on a subset of motifs within
specified motif database or genome hits. Can be a
single motif or a comma-separated list of motifs.
Enrichment Options:
Options for performing enrichment analysis
--permutations PERMUTATIONS
Number of permutations to perfrom for calculating
p-value. Default: 1000
--largewindow LARGEWINDOW
The size (bp) of a large window around input regions
that captures background. Default: 1500
--smallwindow SMALLWINDOW
The size (bp) of a small window arount input regions
that captures signal. Default: 150
Output Options:
Options for the output.
--dpi DPI Resolution of output figures. Default: 100
--padjcutoff PADJCUTOFF
A p-adjusted cutoff value that determines some
plotting output.
--plotall Plot graphs for all motifs.Warning: This will make
TFEA run much slower andwill result in a large output
folder.
--output_type {txt,html}
Specify output type. Selecting html will increase
processing time and memory usage. Default: txt
Miscellaneous Options:
Other options.
--cpus CPUS Number of processes to run in parallel. Note:
Increasing this value will significantly increase
memory footprint. Default: 1
--mem MEM Amount of memory to request forsbatch script. Default:
50gb
--motif_annotations MOTIF_ANNOTATIONS
A bed file specifying genomic coordinates for genes
corresponding to motifs. Motif name must be in the 4th
column and match what is in the database.
--bootstrap BOOTSTRAP
Amount to subsample motifhits to. Set to False to turn
off. Default: False
--basemean_cut BASEMEAN_CUT
Basemean cutoff value for inputted regions. Default: 0
--rerun [RERUN [RERUN ...]]
Rerun TFEA in all folders of aspecified directory.
Default: False
--gc GC Perform GC-correction. Default: True
Example Output
TFEA will output all files and folders into the directory specified by the `--output` flag. The output directory structure is as follows:./TFEA/test/test_output
│ rerun.sh
│ test_config.ini
│ inputs.txt
│ results.txt
│ md_results.txt
│ mdd_results.txt
│ results.html
│
└───e_and_o
│ TFEA_test_output.err
│ TFEA_test_output.out
│
└───plots
│ logo_rcMOTIF.eps
│ logo_rcMOTIF.png
│ logoMOTIF.eps
│ logoMOTIF.png
│ MOTIF_enrichment_plot.png
│ MOTIF_simulation_plot.png
│ MOTIF.results.html
│
└───temp_files
combined_file.mergeall.bed
count_file.bed
count_file.header.bed
DESeq.R
DESeq.Rout
DESeq.res.txt
markov_background.txt
ranked_file.bed
ranked_file.fa
A brief description of the files contained within this output directory are below:
rerun.sh
This bash script can be used at any time to regenerate a TFEA output folder in its entirety, run it using:sh ./TFEA/test/test_output/rerun.sh
test_config.ini
TFEA copies the config file you are using into the output directoy. This file is then referenced by rerun.sh.
inputs.txt
A .txt file that contains all user-provided inputs into TFEA
results.txt
Contains TFEA results tab-delimited in .txt format. For example:#TF AUC Events p-val p-adj
P53_HUMAN.H11MO.0.A 0.2795355012578686 5 0.02464223619276762 0.04928447238553524
SP2_HUMAN.H11MO.0.A -0.04994116666412335 114 0.027169601555608307 0.054339203111216615
TF = The name of the motif analyzed
AUC = Area under the curve
Events = Number of motif hits within smallwindow
p-val = P-value
p-adj = adjusted p-value (Bonferroni)
md_results.txt and mdd_results.txt
Contains tab-delimited results for secondary MD-Score (MDS) and Differential MD-Score (MDD) analysis if these flags were specifiedresults.html
The main results html (if --output_type 'html'
specified). For example:
-
TFEA MA-plot - An MA-like plot with each dot representing a TF motif analyzed (red=significant below specified p-adj cutoff). On the y-axis is the area under the curve (AUC) which is the main TFEA statistic. On the x-axis is the log10 of the number of motif hits within the largewindow input
-
TFEA Volcano Plot - Similar to the MA-Plot, each dot is a TF motif (red=significant below specified p-adj cutoff). X-axis = area under the curve (AUC). Y-axis = -log10 of the p-adjusted value. Dashed line is the specified p-adjusted cutoff.
-
DE-Seq MA-plot - An actual MA-plot where each dot represents a region specified by the user. On the x-axis is the log10 of the average expression across conditions and replicates. On the y-axis is the log2 fold change between both conditions. The dots on this plot are colored based on how they are ranked.
-
Link to inputs.txt file, MD MA-plot and volcano (if
--md
specified), MDD MA-plot and volcano (if--mdd
specified), and a table of time to perform each module in TFEA and the total time to run TFEA. -
A list of TF motifs analyzed that have positive AUC values (headers correspond to the same headers as in results.txt). If red, then this TF motif was significant below the p-adj cutoff. Clickable links will direct to a separate MOTIF.results.html file contained within the plots/ directory in output.
-
A list of TF motifs analyzed that have negative AUC values (headers correspond to the same headers as in results.txt). If red, then this TF motif was significant below the p-adj cutoff. Clickable links will direct to a separate MOTIF.results.html file contained within the plots/ directory in output.
MOTIF.results.html
Each signficant TF motif (or all motifs if --plotall
specified) will produce its own MOTIF.results.html file contained within the plots/ directory in the specified output directory. All images are also self-contained within the plots/ folder. For example:
-
Results for this specific motif (identical to what's reported in results.html)
-
Enrichment line plot - The running sum statistic (green) for the specified TF motif. The blue dashed line indicates the random background expectation. The area under the curve (AUC) is calculated as the area between the green and dashed blue line (directional).
-
Score bar plot - Quantification of the amount added to the running sum statistic at any given point.
-
Motif hit scatter plot - The actual motif hits for each region centered on the region and bounded by the largewindow input.
-
Rank metric fill plot - A visual representation of the ranking metric used (log10(DE-Seq p-value) with direction dependent on fold change)
-
Meta plot - A meta plot of read coverage over regions separated by quartiles.
-
Heat map - A heatmap of motif hits for each quartile
-
The forward motif logo
-
The reverse complement motif logo
-
Simulation plot - The distribution of simulated AUC values (number of simulations specified with
--permutations
flag). The observed AUC is the red bar.
Contact Information
Jonathan.Rubin@colorado.eduProject details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.