Method for structural probing signal calculation that eliminates read distribution bias and prevents reactivity underestimation.

Welcome to probNORM

probNORM is a new method for structural probing signal calculation that eliminates read distribution bias and prevents reactivity underestimation. It is based on the analysis of background RT stops in the treated and control samples of a single replicate, and it enables statistical discrimination of the probing-sensitive nucleotides. The reactivities obtained by probNORM are highly consistent with the structural models, allowing the separation of single- and double-stranded nucleotides.

For detailed documentation please see: https://zywicki-lab.github.io/probNORM/


Required

Python: version 3.6 or greater (Python 3 is supported). If you’re setting up Python for the first time,
the Anaconda Python distribution is highly recommended.
Libraries: pysam, numpy, scipy
BEDTools: The version is not important, but later versions will have more features so it’s a good idea
to get the latest. Follow the instructions at https://github.com/arq5x/bedtools2 to install,
and make sure the programs are on your path. That is, you should be able to call bedtools
from any directory.

BEDTools installation

  • via conda:

      conda install -c bioconda bedtools
    
  • via apt-get for Debian-like systems:

      sudo apt-get install bedtools
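
Before running probNORM, it can help to confirm that the requirements listed above are visible from the current environment. The following is a minimal sketch using only the Python standard library; the function name check_requirements is hypothetical and is not part of probNORM.

```python
import importlib.util
import shutil
import sys


def check_requirements(modules=("pysam", "numpy", "scipy"), tools=("bedtools",)):
    """Return a dict mapping each requirement to True (found) or False (missing).

    Hypothetical helper, not part of probNORM.
    """
    status = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'found' if ok else 'MISSING'}")
```

Any requirement reported as MISSING needs to be installed, or added to the path, before probnorm is called.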
    

Quick start

The main executable of the probNORM program is probnorm. To quickly run probNORM on the provided example files, type:

probnorm bam -t example/treated.sorted.bam -c example/control.sorted.bam -o output.txt

for BAM format input, and:

probnorm counts -i example/counts-input.txt -o output.txt

for count format input.

These commands run probNORM with the default parameters.

probNORM accepts two input formats: BAM files or a custom counts file. The available options vary depending on the input type.

The example files are provided at https://github.com/zywicki-lab/probNORM
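
The two quick-start commands can also be launched from a Python script. The sketch below only assembles the arguments shown above (bam with -t, -c, -o, and counts with -i, -o); the helper names build_command and run_probnorm are hypothetical, and it assumes probnorm is on the path.

```python
import subprocess


def build_command(mode, output, treated=None, control=None, counts=None):
    """Build the probnorm argument list for 'bam' or 'counts' input.

    Hypothetical helper; it uses only the options shown in the quick start.
    """
    if mode == "bam":
        if not (treated and control):
            raise ValueError("bam mode needs both treated and control BAM files")
        return ["probnorm", "bam", "-t", treated, "-c", control, "-o", output]
    if mode == "counts":
        if not counts:
            raise ValueError("counts mode needs a counts input file")
        return ["probnorm", "counts", "-i", counts, "-o", output]
    raise ValueError(f"unknown input mode: {mode!r}")


def run_probnorm(**kwargs):
    """Run probnorm and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(**kwargs), check=True)
```

For example, run_probnorm(mode="counts", counts="example/counts-input.txt", output="output.txt") corresponds to the second quick-start command.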


Output file

Format


The file contains full information about the normalized transcript(s). It consists of nine tab-separated columns:

Column name              Description
transcript_id            ID of the normalized transcript, the same as in the input file
position                 Position in the transcript
stops_treated            Stops count in the treated sample, taken from the input counts file or calculated from the BAM file
stops_control            Stops count in the control sample, taken from the input counts file or calculated from the BAM file
stops_norm_control       Normalized stops count in the control sample, obtained by applying the normalization factor (nf)
reactivity               Reactivity, calculated from the normalized control stops
fold_change              Fold change between the stops counts in the treated and control samples; in the example below the values are consistent with log2(stops_treated / stops_norm_control)
p_value                  Probability that the nucleotide at the given position is part of the background, i.e. not statistically significant
passed_quality_filter    Quality filter (Y - yes / N - no). Positions that pass the filter have a stops count greater than zero in both the control and treated samples, no missing parameters, and sufficient coverage (when a local script is determining the stops counts).
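
The nine columns described above can be read with the standard csv module. The following is a minimal sketch, not part of probNORM: the names Position, parse_output and significant are hypothetical, the 0.05 threshold is only an illustrative choice, and skipping blank lines and lines starting with # is a defensive assumption about the file layout.

```python
import csv
from typing import Iterable, Iterator, List, NamedTuple


class Position(NamedTuple):
    transcript_id: str
    position: int
    stops_treated: float
    stops_control: float
    stops_norm_control: float
    reactivity: float
    fold_change: float
    p_value: float
    passed_quality_filter: bool


def parse_output(lines: Iterable[str]) -> Iterator[Position]:
    """Yield one Position per data row of a probNORM output file."""
    # Assumption: blank lines and '#' comment lines are not data.
    rows = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for rec in csv.DictReader(rows, delimiter="\t"):
        yield Position(
            rec["transcript_id"],
            int(rec["position"]),
            float(rec["stops_treated"]),
            float(rec["stops_control"]),
            float(rec["stops_norm_control"]),
            float(rec["reactivity"]),
            float(rec["fold_change"]),
            float(rec["p_value"]),
            rec["passed_quality_filter"] == "Y",
        )


def significant(positions: Iterable[Position], alpha: float = 0.05) -> List[Position]:
    """Keep positions that passed the quality filter and have p_value <= alpha."""
    return [p for p in positions if p.passed_quality_filter and p.p_value <= alpha]
```

A typical use would be to open output.txt, pass the file object to parse_output, and hand the result to significant.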

#probnorm counts -i example/counts-input.txt -o output.txt

transcript_id	position	stops_treated	stops_control	stops_norm_control	reactivity	fold_change	p_value	passed_quality_filter
RDN18-1	1	3095.0	3472.0	2669.1000000000004	1.0632124544542494	0.2135860512052699	0.37310634695017253	Y
RDN18-1	2	2029.0	1148.0	882.5250000000001	2.5274855472882036	1.2010598126290937	0.03438625350046609	Y
RDN18-1	3	315.0	360.0	276.75	0.09548691331973486	0.18676851160572655	0.38858771448505425	Y
RDN18-1	4	264.0	405.0	311.34375	0.0	-0.23797038886541122	0.6407954148840493	Y
RDN18-1	5	139.0	171.0	131.45625	0.018832141238058788	0.08050214738573189	0.45145693582080115	Y
...
RDN18-1	1776	0	0	0.0	0.0	0	0.5	N
RDN18-1	1777	0	0	0.0	0.0	0	0.5	N
RDN18-1	1778	0	0	0.0	0.0	0	0.5	N
RDN18-1	1779	0	0	0.0	0.0	0	0.5	N
RDN18-1	1780	25.0	9.0	6.91875	0.04513784971143676	1.8533447778805348	0.002490274610317811	Y
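
Two relationships can be read off the example rows above: stops_norm_control divided by stops_control gives the same factor (about 0.76875) at every position, and fold_change agrees with log2(stops_treated / stops_norm_control). The sketch below checks both on the listed rows. It is an observation about this example only, not a description of probNORM's internal code, and the helper names are hypothetical.

```python
import math

# (stops_treated, stops_control, stops_norm_control, fold_change)
# copied from the example output rows with non-zero counts
EXAMPLE_ROWS = [
    (3095.0, 3472.0, 2669.1000000000004, 0.2135860512052699),
    (2029.0, 1148.0, 882.5250000000001, 1.2010598126290937),
    (315.0, 360.0, 276.75, 0.18676851160572655),
    (264.0, 405.0, 311.34375, -0.23797038886541122),
    (139.0, 171.0, 131.45625, 0.08050214738573189),
    (25.0, 9.0, 6.91875, 1.8533447778805348),
]


def implied_nf(stops_control, stops_norm_control):
    """Normalization factor implied by one row of the output."""
    return stops_norm_control / stops_control


def log2_ratio(stops_treated, stops_norm_control):
    """log2 of treated stops over normalized control stops."""
    return math.log2(stops_treated / stops_norm_control)


for treated, control, norm_control, fold_change in EXAMPLE_ROWS:
    nf = implied_nf(control, norm_control)
    agrees = math.isclose(log2_ratio(treated, norm_control), fold_change, abs_tol=1e-6)
    print(f"nf={nf:.5f}  fold_change matches log2 ratio: {agrees}")
```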

Summary information

After each run, probNORM prints a summary. It contains information such as:

  • input file type
  • input and output file names
  • parameter thresholds: coverage, p-value, reactive positions
  • statistics about the normalized transcripts

See the examples below.

  • BAM input

      ***** SUMMARY *****
    
          input mode: BAM
          input file/s: control: example/control.sorted.bam treated: example/treated.sorted.bam
          output file: test.output
          min coverage: 0
          max p-value: 1.0
          min reactive positions per transcript: 20%
          selected transcripts:  all
          total number of input transcripts: 3
          transcripts omitted due to low reactivity: 0
          transcripts normalized: 3
    
      *******************
    
  • COUNTS input

      ***** SUMMARY *****
    
          input mode: COUNTS
          input file/s: data/counts-input.txt
          output file: test.output
          max p-value: 1.0
          min reactive positions per transcript: 20%
          total number of input transcripts: 5
          transcripts omitted due to low reactivity: 0
          transcripts normalized: 5
    
      *******************
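
For logging or batch runs, a summary block like the ones above can be turned into a dictionary. This is a minimal sketch that assumes the summary text has been captured as a string (for example from the program's standard output) and that every entry has the form "name: value"; parse_summary is a hypothetical helper, not part of probNORM.

```python
def parse_summary(text):
    """Turn a probNORM summary block into a dict of strings."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the asterisk banner lines
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep:
            summary[key.strip()] = value.strip()
    return summary
```

All values are kept as strings; converting counts such as "transcripts normalized" to integers is left to the caller.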
    

Contribution

If you notice any errors or mistakes, or would like to suggest new features, please report them using GitHub's issue tracking system at the probNORM repository (https://github.com/zywicki-lab/probNORM). You are also welcome to send a pull request with your corrections and suggestions.


License

This project is licensed under the terms of the GNU General Public License v3.0.
