Skip to main content

Determining 3' termini of transcripts from RNAtag-seq data in bacteria

Project description

TRS

Table of Contents
  1. Description
  2. Getting Started
  3. Scheme file
  4. Usage
  5. Contact
  6. License
  7. Acknowledgements
  8. References

Description

This algorithm is intended to determine 3' termini from RNATag-seq [1] sequencing data. We observed that in RNATag-seq data, sequencing reads accumulate at the 3' termini of transcripts that enables their identification. To do so, we compute for each position in the genome the ..... The computed ratio is acomparable measure between libraries and it signifies the positions where drastic loss in expression occur. We expect positions subjected to random shearing of the RNA in the experimental procedure to resolve in less reproducible signal. To retrieve the reproducible signal, we follow Adams et al. 2021, [2] method to analyse Term-seq [3] data, which uses peak calling followed by the irreproducibility discovery rate (IDR) procedure [4]. The pipeline is fully described in [].

Getting Started

This section briefly describes how to setup the package and shows how test it works.

Prerequisites

This package requires the installation of the following packages:

  • numpy
  • pandas
  • scipy
  • statsmodels
  • pysam
  • pyaml
  • intervaltree

Installation

Either pip install TRSalgorithm or conda install TRSalgorithm

Files required for running the program

The package requires the following files to determine 3' termini:

  • BAM files containing the mapped reads of the sequencing libraries.
  • A scheme file describing how to read the BAM files, assigning them to groups, which enables processing sequencing libraries of multiple conditions in a single run (see here for further detail).

Running an example file

COMPLETE

Scheme File

COMPLETE

Usage

usage: peakcaller.py [-h] [-w WORKDIR] [-t THRESHOLD] [-b] [-c MIN_COUNT] [--min_height MIN_HEIGHT] [--window_margin WINDOW_MARGIN] [--merge_distance MERGE_DISTANCE] [--rel_height REL_HEIGHT]
                     [--chr_list CHR_LIST] [-d DS_DISTANCE] [--signif_min_lib_count SIGNIF_MIN_LIB_COUNT] [--insignif_min_ratio INSIGNIF_MIN_RATIO] [-l LOG_LEVEL]
                     scheme

Uses read starts to determine 3' termini of transcripts.

positional arguments:
  scheme                Path for file containing info of the libs to process.

optional arguments:
  -h, --help            show this help message and exit
  -w WORKDIR, --workdir WORKDIR
                        Working directory. (default: ./runs)
  -t THRESHOLD, --threshold THRESHOLD
                        The signficance level used by the model. (default: 0.01)
  -b, --force_bam       Forces the program to reprocess the bam files. (default: False)
  -c MIN_COUNT, --min_count MIN_COUNT
                        The minimal coverage for a region to be considered. (default: 10)
  --min_height MIN_HEIGHT
                        The minimal ratio to consider as a peak. (default: None)
  --window_margin WINDOW_MARGIN
                        Defines the region which will be used to count the local number of reads starts. The margin is the number of nucleotides (upstream/downstream) that will be added to the region
                        around the considered site. (default: 3)
  --merge_distance MERGE_DISTANCE
                        The distance which below it, peaks will be merged together. (default: 0)
  --rel_height REL_HEIGHT
                        The relative height of peaks to use in the scipy find_peaks function. (default: 0.75)
  --chr_list CHR_LIST   Conversion of chromosome files from the bam to new name in the following format: bam1:new1,bam2:new2,... (default: )
  -d DS_DISTANCE, --ds_distance DS_DISTANCE
                        The distance (downstream) used to compute the read starts ratio. (default: 70)
  --signif_min_lib_count SIGNIF_MIN_LIB_COUNT
                        The minimal number of libraries in which the 3' terminus should be found as significant to report it depending on the threshold. (default: 2)
  --insignif_min_ratio INSIGNIF_MIN_RATIO
                        The minimal mean ratio to accept if the 3' terminus wasn't significant in all repeats. (default: 0.5)
  -l LOG_LEVEL, --log_level LOG_LEVEL
                        The logging level to report to the log file. (default: debug)

Contact

Amir Bar - amir.bar@mail.huji.ac.il

Acknowledgements

We are thankful to:

License

TODO: add MIT license (and a copy of Adams et al)

References

  • [1] Shishkin, Alexander A., et al. "Simultaneous generation of many RNA-seq libraries in a single reaction." Nature methods 12.4 (2015): 323-325.
  • [2] Adams, Philip P., et al. "Regulatory roles of Escherichia coli 5'UTR and ORF-internal RNAs detected by 3'end mapping." Elife 10 (2021): e62438.
  • [3] Dar, Daniel, et al. "Term-seq reveals abundant ribo-regulation of antibiotics resistance in bacteria." Science 352.6282 (2016).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

TRSalgorithm-0.0.1.tar.gz (16.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

TRSalgorithm-0.0.1-py3-none-any.whl (17.5 kB view details)

Uploaded Python 3

File details

Details for the file TRSalgorithm-0.0.1.tar.gz.

File metadata

  • Download URL: TRSalgorithm-0.0.1.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for TRSalgorithm-0.0.1.tar.gz
Algorithm Hash digest
SHA256 9ee408483fa31b41476bbdb29de8ff4ef8757e6d6777a9129b20c83e01a3dfc4
MD5 1e50dbdb1e779e57603fe66fd2a39781
BLAKE2b-256 138ed305a3b506a7217e69c78365d2d044c1d55ed3b2b8fd7891f4e893dbb293

See more details on using hashes here.

File details

Details for the file TRSalgorithm-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: TRSalgorithm-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for TRSalgorithm-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 24a3b399a6517f22daf15696cb2039f3114525ef9a5a43abeb221eab8832598d
MD5 1a083c5e7f77d960634099fc78731af7
BLAKE2b-256 111e4b7643f474bd873b3d5322479daa09c08fe164afd1c069f65b23f512211b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page