peaks2utr: a robust, parallelized Python CLI for annotating 3' UTR

License: GPL v3

peaks2utr is a Python command-line tool that annotates 3' untranslated regions (UTRs) given a set of aligned sequencing reads in BAM format and a canonical annotation in GFF or GTF format. peaks2utr uses MACS (https://pypi.org/project/MACS3/) to call broad "peaks" of significant read coverage in the BAM file, and uses the peaks that pass a set of criteria as the basis for annotating novel 3' UTRs. This favours BAM files from protocols such as 10x Chromium, where signal is inherently concentrated at the distal ends of the 3' or 5' UTRs. Reads containing soft-clipped bases and polyA tails of a given length are detected, and their end bases are tallied as "truncation points". When piled up, each co-occurring truncation point is used to determine the precise end base of a given UTR. peaks2utr can be tuned to extend, override or ignore any pre-existing 3' UTR annotations in the input GFF file.
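The truncation-point idea described above can be illustrated with a small sketch. This is not peaks2utr's implementation: it assumes reads are given as plain tuples rather than read from a BAM file, it skips the polyA-tail test entirely, and the names `truncation_point`, `tally_truncation_points` and `min_clip` are hypothetical. Which end of a reverse-strand read carries the clipped tail depends on the library protocol; the choice made here is a simplifying assumption.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")  # CIGAR operations that advance along the reference


def truncation_point(ref_start, cigar, reverse, min_clip=1):
    """Return the 0-based reference position of the aligned base adjacent to a
    soft clip at the read's presumed 3' end, or None if there is no such clip.

    min_clip is a hypothetical threshold, not a peaks2utr parameter.
    """
    # Hard clips (H) carry no sequence, so ignore them when inspecting read ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return None
    if reverse:
        # Assumption: for reverse-strand alignments the clipped tail is the leading clip.
        n, op = ops[0]
        return ref_start if op == "S" and n >= min_clip else None
    n, op = ops[-1]
    if op == "S" and n >= min_clip:
        aligned_len = sum(k for k, o in ops if o in REF_CONSUMING)
        return ref_start + aligned_len - 1
    return None


def tally_truncation_points(reads):
    """Count how many reads support each (chrom, position, strand) truncation point."""
    counts = Counter()
    for chrom, ref_start, cigar, reverse in reads:
        pos = truncation_point(ref_start, cigar, reverse)
        if pos is not None:
            counts[(chrom, pos, "-" if reverse else "+")] += 1
    return counts


# Made-up reads: (chromosome, 0-based alignment start, CIGAR string, is_reverse)
reads = [
    ("chr1", 100, "50M10S", False),
    ("chr1", 110, "40M5S", False),
    ("chr1", 120, "30M", False),   # no soft clip, so it contributes nothing
    ("chr1", 200, "8S42M", True),
]
counts = tally_truncation_points(reads)
best, support = counts.most_common(1)[0]
print(best, support)
```

In this toy input, two forward-strand reads end at the same base, so that position has the most support and would be the candidate UTR end.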

Installation

Install latest release with:

pip install peaks2utr

Alternatively, to install from source:

git clone https://github.com/haessar/peaks2utr.git
cd peaks2utr
python3 -m build
python3 -m pip install dist/*.tar.gz

Dependencies

Installation instructions assume a Debian / Ubuntu system with root privileges. Follow the links for instructions for other systems.

Required

bedtools

apt-get install bedtools

Optional

GenomeTools (for post-processing of output gff3)

apt-get install genometools
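As a quick sanity check before running peaks2utr, the presence of these external tools can be tested from Python. This is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical, and it assumes the GenomeTools executable is installed under its usual name `gt`.

```python
import shutil


def missing_tools(tools):
    """Return the subset of executable names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


required = ["bedtools"]
optional = ["gt"]  # assumption: GenomeTools is installed as the `gt` executable

print("missing required tools:", missing_tools(required))
print("missing optional tools:", missing_tools(optional))
```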

Verify installation

To check that peaks2utr has installed correctly, run the following in your terminal to start a short run with default parameters:

peaks2utr-check

This uses a small demo set of input files contained in the repository: Tb927_01_v5.1.gff and Tb927_01_v5.1.slice.bam. When the run completes, you should see a file Tb927_01_v5.1.new.gff containing the original annotations as well as 3' UTRs with source "peaks2utr".
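One way to confirm that the output contains new annotations is to count features by their source column. The sketch below relies only on the standard tab-separated, nine-column GFF layout (source in column 2, feature type in column 3); the helper `count_by_source` and the two demo feature lines are made up for illustration and are not taken from the real demo output.

```python
from collections import Counter


def count_by_source(gff_lines):
    """Count GFF features per (source, type), skipping comments and malformed lines."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        counts[(cols[1], cols[2])] += 1
    return counts


# Illustrative lines only; coordinates, IDs and the first source name are invented.
demo = [
    "##gff-version 3",
    "Tb927_01_v5.1\tExampleDB\tgene\t1000\t2000\t.\t+\t.\tID=gene1",
    "Tb927_01_v5.1\tpeaks2utr\tthree_prime_UTR\t2001\t2300\t.\t+\t.\tID=utr1;Parent=gene1",
]
demo_counts = count_by_source(demo)
print(demo_counts[("peaks2utr", "three_prime_UTR")])
```

To apply it to the real file, pass an open file handle, for example `count_by_source(open("Tb927_01_v5.1.new.gff"))`.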

Quick start

peaks2utr is called from the command line as:

peaks2utr <GFF_IN> <BAM_IN> [options]

Inputs

  • GFF_IN - gene models in either GFF3 or GTF format (existing 3' UTRs optional).
  • BAM_IN - aligned reads in BAM format.
  • [options] - run peaks2utr --help for the full set of optional arguments.

Outputs

peaks2utr writes a GFF3 annotation file (or GTF with the option --gtf) containing the original features plus 3' UTR features with source=peaks2utr. The output file name can be specified with -o or --output; by default, output is written to the original file name with a *.new.<ext> suffix.
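The default naming rule can be mimicked in a downstream script that needs to locate the output. This is a sketch of the pattern as described here, not code from peaks2utr: the helper `default_output_name` is hypothetical, and it assumes the extension of the input file is kept unchanged (how --gtf affects the extension is not covered).

```python
from pathlib import Path


def default_output_name(gff_in):
    """Insert '.new' before the final extension, e.g. genes.gff -> genes.new.gff."""
    path = Path(gff_in)
    return str(path.with_name(f"{path.stem}.new{path.suffix}"))


print(default_output_name("Tb927_01_v5.1.gff"))
```

For the demo input this yields Tb927_01_v5.1.new.gff, matching the file name produced by peaks2utr-check.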

Example call

peaks2utr Tb927_01_v5.1.gff Tb927_01_v5.1.slice.bam -p 4 -o output.gff3
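When peaks2utr is one step in a larger Python workflow, the same call can be assembled and launched with `subprocess`. The sketch below uses only the options that appear in this README (-p, -o and --gtf); the wrapper `build_command` and its parameter names are hypothetical, and the value passed to -p is simply forwarded as in the example call.

```python
import shlex
import subprocess


def build_command(gff_in, bam_in, p=None, output=None, gtf=False):
    """Assemble the peaks2utr argument list from the options shown in this README."""
    cmd = ["peaks2utr", str(gff_in), str(bam_in)]
    if p is not None:
        cmd += ["-p", str(p)]
    if output is not None:
        cmd += ["-o", str(output)]
    if gtf:
        cmd.append("--gtf")
    return cmd


def run_peaks2utr(*args, **kwargs):
    """Run peaks2utr and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


cmd = build_command("Tb927_01_v5.1.gff", "Tb927_01_v5.1.slice.bam", p=4, output="output.gff3")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names; call `run_peaks2utr(...)` with the same arguments to actually start the run.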
