Peak Identifier for Nascent Transcripts Starts (PINTS)
Installation
PINTS is available on PyPI, which means you can install it with the following command:
pip install pyPINTS
Alternatively, you can clone this repo to a local directory, then in the directory, run the following command:
python setup.py install
Prerequisites
Python packages
- biopython
- matplotlib
- numpy
- pandas
- pybedtools
- pyBigWig
- pysam
- requests
- scipy
- statsmodels
Get started
PINTS can call peaks directly from BAM files. To do so,
you need to give the tool the path to the BAM file and the type of experiment it came from.
If it's from a standard protocol, like PROcap, then you can set --exp-type PROcap.
Other supported experiments include GROcap, CoPRO, csRNAseq, NETCAGE, CAGE, RAMPAGE, and STRIPEseq. For a comprehensive list of directly supported assays, please run
pints_caller --help
If the data was generated by other methods, you need to tell the tool where it can find the ends of the RNAs you are interested in.
For example, --exp-type R_5 tells the tool that:
- this alignment is from a single-end library;
- the tool should look at the 5' end of reads.
Other supported values are R_3, R1_5, R1_3, R2_5, and R2_3.
If reads represent the reverse complement of the original RNAs, as in PROseq, then you need to use --reverse-complement (not necessary for standard protocols).
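The naming pattern of the custom --exp-type values can be illustrated with a small decoder. This is only a sketch of how the codes read according to the description above; the function name `decode_read_end` and the returned dictionary keys are hypothetical and are not part of PINTS.

```python
import re

# Pattern for the custom codes: R_5, R_3, R1_5, R1_3, R2_5, R2_3
_CODE = re.compile(r"R([12]?)_([35])")

def decode_read_end(exp_type):
    """Describe which read and which end a custom --exp-type code refers to."""
    match = _CODE.fullmatch(exp_type)
    if match is None:
        raise ValueError(f"{exp_type!r} is not one of the R_x / Rx_x codes")
    mate, end = match.groups()
    return {
        # No read number means a single-end library
        "layout": "paired-end" if mate else "single-end",
        "read": int(mate) if mate else None,
        "end": f"{end}'",
    }

print(decode_read_end("R_5"))
print(decode_read_end("R2_3"))
```

Protocol names such as PROcap are not decoded by this sketch; they raise a ValueError here, whereas PINTS handles them with its own built-in defaults.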
An example of calling peaks from a BAM file:
pints_caller --bam-file input.bam --save-to output_dir --file-prefix output_prefix --thread 16 --exp-type PROcap
Or you can call peaks from BigWig files:
pints_caller --save-to output_dir --file-prefix output_prefix --bw-pl path_to_pl.bw --bw-mn path_to_mn.bw --thread 16
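If you prefer to launch the BAM-based call from a Python script, the command can be assembled as an argument list. The sketch below uses only the flags shown above; the helper name `build_pints_command` is hypothetical, and the actual `subprocess.run` call is left commented out because it requires pyPINTS and a real BAM file.

```python
import shlex

def build_pints_command(bam_files, save_to, file_prefix, exp_type,
                        threads=1, reverse_complement=False):
    """Assemble a pints_caller argument list for BAM input."""
    cmd = [
        "pints_caller",
        "--bam-file", *bam_files,
        "--save-to", save_to,
        "--file-prefix", file_prefix,
        "--thread", str(threads),
        "--exp-type", exp_type,
    ]
    if reverse_complement:
        # Only needed for custom R_x / Rx_x experiment types
        cmd.append("--reverse-complement")
    return cmd

cmd = build_pints_command(["input.bam"], "output_dir", "output_prefix",
                          "PROcap", threads=16)
print(shlex.join(cmd))

# To run it for real:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when file paths contain spaces.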
Outputs
- prefix+_{SID}_divergent_peaks.bed: Divergent TREs;
- prefix+_{SID}_bidirectional_peaks.bed: Bidirectional TREs (divergent + convergent);
- prefix+_{SID}_unidirectional_peaks.bed: Unidirectional TREs, maybe lncRNAs transcribed from enhancers (e-lncRNAs) as suggested here.
{SID} will be replaced with the number of the sample that the peaks were called from.
If you provide PINTS with only one sample, then {SID} will be replaced with 1.
If you use PINTS with three replicates (--bam-file A.bam B.bam C.bam), then {SID} for peaks identified from A.bam will be replaced with 1.
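To check that a run produced everything you expect, the file names can be generated from the prefix and the number of samples. This is a sketch: the helper name `expected_outputs` is hypothetical, and it assumes that samples after the first are numbered 2, 3, ... in the order they were given, which the text above only confirms for the first sample.

```python
def expected_outputs(prefix, n_samples):
    """Map each sample ID to the three peak files expected for it."""
    kinds = ("divergent", "bidirectional", "unidirectional")
    return {
        sid: [f"{prefix}_{sid}_{kind}_peaks.bed" for kind in kinds]
        # Assumption: sample IDs run from 1 to n_samples in input order
        for sid in range(1, n_samples + 1)
    }

for sid, names in expected_outputs("output_prefix", 3).items():
    print(sid, names)
```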
For divergent or bidirectional TREs, there will be 6 columns in the outputs:
- Chromosome
- Start site: 0-based
- End site: 0-based
- Confidence about the peak pair. Can be:
  - Stringent(qval), which means the two peaks on both forward and reverse strands are significant based on their q-values;
  - Stringent(pval), which means one peak is significant according to q-value while the other one is significant according to p-value;
  - Relaxed, which means only one peak is significant in the pair.
  - A combination of the three types above, because of overlap for nearby elements.
  - If epigenomic annotation is enabled by --epig-annotation <biosample>, then peaks that are less significant (--relaxed-fdr-target, default is 2*fdr_target), but overlap with epigenomic annotations from the PINTS web server, will be listed with the confidence level: Marginal.
- Major TSSs on the forward strand; if there are multiple major TSSs, they will be separated by commas (,)
- Major TSSs on the reverse strand; if there are multiple major TSSs, they will be separated by commas (,)
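The six-column layout described above can be read with the standard library. The sketch below assumes the files are tab-delimited, as is usual for BED, and that TSS positions are integers. The dictionary keys and the function name `parse_paired_tre` are hypothetical labels, and the example line is made up for illustration rather than taken from real PINTS output.

```python
import csv
import io

def parse_paired_tre(row):
    """Turn one row of a divergent/bidirectional peak file into a dict."""
    chrom, start, end, confidence, fwd_tss, rev_tss = row[:6]
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "confidence": confidence,
        # Multiple major TSSs are comma-separated within one column
        "fwd_major_tss": [int(p) for p in fwd_tss.split(",") if p],
        "rev_major_tss": [int(p) for p in rev_tss.split(",") if p],
        # Anything beyond column 6 (e.g. an annotation column) is kept as-is
        "extra": row[6:],
    }

# Made-up example line, not real output
example = "chr1\t1000\t1300\tStringent(qval)\t1180,1195\t1100\n"
records = [parse_paired_tre(r)
           for r in csv.reader(io.StringIO(example), delimiter="\t")]
print(records[0])
```

To read a real file, replace the `io.StringIO(example)` object with an open file handle.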
For unidirectional TREs, there will be 9 columns in the output:
- Chromosome
- Start
- End
- Peak ID
- Q-value
- Strand
- Read counts
- Position of the summit TSS
- Height of the summit
For all three types of TREs, if a valid biosample name for --epig-annotation is provided, then an additional column with epigenomic annotation for each TRE will show up in the final output.
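A unidirectional peak line, with or without the extra annotation column, could be parsed along the following lines. This is a sketch under assumptions: tab-delimited input, numeric types chosen for convenience, and hypothetical field names. The example lines, including the peak ID format and the annotation value, are invented.

```python
from typing import NamedTuple, Optional

class UnidirectionalTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    peak_id: str
    qvalue: float
    strand: str
    read_count: float
    summit_pos: int
    summit_height: float
    annotation: Optional[str] = None  # present only with --epig-annotation

def parse_unidirectional(line):
    """Parse one line with 9 columns, plus an optional 10th annotation column."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(f)}")
    return UnidirectionalTRE(
        f[0], int(f[1]), int(f[2]), f[3], float(f[4]), f[5],
        float(f[6]), int(f[7]), float(f[8]),
        f[9] if len(f) > 9 else None,
    )

# Made-up example lines, not real output
plain = parse_unidirectional("chr1\t5000\t5120\tpeak_1\t0.003\t+\t42\t5060\t17")
annotated = parse_unidirectional("chr1\t5000\t5120\tpeak_1\t0.003\t+\t42\t5060\t17\texample_label")
print(plain)
print(annotated.annotation)
```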
Parameters
Input & Output
- If you want to use BAM files as inputs:
  - --bam-file: input bam file(s);
  - --exp-type: Type of experiment. If the experiment is not listed as a choice, or you know the position of RNA ends on the reads and you want to override the defaults, you can specify:
    - R_5 (5' of the read for single-end lib),
    - R_3 (3' of the read for single-end lib),
    - R1_5 (5' of the read1 for paired-end lib),
    - R1_3 (3' of the read1 for paired-end lib),
    - R2_5 (5' of the read2 for paired-end lib),
    - or R2_3 (3' of the read2 for paired-end lib)
  - --reverse-complement: Set this switch if 1) exp-type is Rx_x and 2) reads in this library represent the reverse complement of RNAs, like PROseq;
  - --ct-bam: Bam file for input/control (optional);
- If you want to use bigwig files as inputs:
  - --bw-pl: Bigwig for signals on the forward strand;
  - --bw-mn: Bigwig for signals on the reverse strand;
  - --ct-bw-pl: Bigwig for input/control signals on the forward strand (optional);
  - --ct-bw-mn: Bigwig for input/control signals on the reverse strand (optional);
- --save-to: save peaks to this path (a folder), by default, current folder
- --file-prefix: prefix to all outputs
Optional parameters
- --epig-annotation <biosample>: Use this option together with the name of the biosample that the library was derived from, for example K562; then epigenomic annotations will be downloaded from the PINTS web server and used for annotating and augmenting TREs identified by PINTS (for hg38 only);
- --relaxed-fdr-target <relaxed fdr>: In the presence of --epig-annotation, peaks that do not pass the original FDR cutoff but pass this relaxed cutoff and have support from DNase-seq and H3K27ac ChIP-seq will also be included in final outputs. By default, 2*fdr;
- --mapq-threshold <min mapq>: Minimum mapping quality, by default: 30 or None;
- --close-threshold <close distance>: Distance threshold for two peaks (on opposite strands) to be merged, by default: 300;
- --fdr-target <fdr>: FDR target for multiple testing, by default: 0.1;
- --chromosome-start-with <chromosome prefix>: Only keep reads mapped to chromosomes with this prefix, if it's set to None, then all reads will be analyzed;
- --thread <n thread>: Max number of threads the tool can create;
- --borrow-info-reps: Borrow information from reps to refine calling of divergent elements;
- --output-diagnostic-plot: Save diagnostic plots (independent filtering and pval dist) to local folder
More parameters can be seen by running pints_caller -h.
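The way --fdr-target, --relaxed-fdr-target, and --epig-annotation interact can be pictured as a small decision rule. The sketch below is an illustration of the description above, not PINTS's actual code: the function name `classify_peak`, the "passes" label, and the use of strict comparisons are all assumptions.

```python
def classify_peak(qval, has_epig_support, fdr_target=0.1, relaxed_fdr_target=None):
    """Illustrative decision rule for the regular and relaxed FDR cutoffs."""
    if relaxed_fdr_target is None:
        # Documented default: twice the regular FDR target
        relaxed_fdr_target = 2 * fdr_target
    if qval < fdr_target:
        return "passes"  # hypothetical label for peaks meeting the regular cutoff
    if has_epig_support and qval < relaxed_fdr_target:
        return "Marginal"  # rescued by epigenomic support
    return None  # not reported

print(classify_peak(0.05, has_epig_support=False))
print(classify_peak(0.15, has_epig_support=True))
print(classify_peak(0.15, has_epig_support=False))
```

With the default --fdr-target of 0.1, the relaxed cutoff in this sketch is 0.2, so a peak with a q-value of 0.15 is kept only when it has epigenomic support.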
Other tools
- pints_boundary_extender: Extend peaks from summits.
- pints_visualizer: Generate bigwig files for the inputs.
- pints_normalizer: Normalize inputs.
Tips
- Be cautious about reads mapped to scaffolds instead of main chromosomes (for example, the notorious chrUn_gl000220 in hg19); they may be rRNA contamination!
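One way to spot such scaffolds is to separate reference names into main chromosomes and everything else before looking at where peaks fall. The sketch below assumes UCSC-style names (chr1 to chr22, chrX, chrY, chrM); the pattern and the function name `split_references` are illustrative assumptions, and other naming schemes would need a different pattern.

```python
import re

# Assumption: UCSC-style main chromosome names
MAIN_CHROM = re.compile(r"chr(\d+|X|Y|M)")

def split_references(names):
    """Separate main chromosomes from scaffolds and other contigs."""
    main = [n for n in names if MAIN_CHROM.fullmatch(n)]
    other = [n for n in names if not MAIN_CHROM.fullmatch(n)]
    return main, other

main, other = split_references(
    ["chr1", "chrX", "chrUn_gl000220", "chr6_apd_hap1", "chrM"]
)
print("main:", main)
print("check these:", other)
```

Note that a plain prefix such as chr would not exclude chrUn_gl000220, since the scaffold name starts with the same prefix.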
Contact
Please submit an issue with any questions or if you experience any issues/bugs. If you use PINTS in your work, please cite: https://www.nature.com/articles/s41587-022-01211-7.