Tool for emending alignments of spuriously spliced transcript reads

Emending Alignments of Spliced Transcript Reads (EASTR)

(\(\
(-.-)
o('')('')

EASTR is a tool for detecting and eliminating spuriously spliced alignments in RNA-seq datasets. Removing these misaligned spliced reads improves the accuracy of downstream transcriptome assembly. The tool accepts GTF, BED, or BAM files as input and can be applied to any RNA-seq dataset, regardless of the alignment software used.

Dependencies

Required:

Optional for testing:

Getting Started

The installation steps for running EASTR are outlined below.

Installing from source

  1. Clone the repository

    git clone --recurse-submodules https://github.com/ishinder/EASTR.git
    
  2. Install the package

    cd EASTR
    # python3 -m venv .venv # (OPTIONAL)
    # source .venv/bin/activate # (OPTIONAL)
    pip install -U pip setuptools
    pip install .
    

Installing from PyPI

  • Type the following in the terminal:

    # python3 -m venv .venv # (OPTIONAL)
    # source .venv/bin/activate # (OPTIONAL)
    pip install -U pip setuptools
    pip install eastr==1.1.1
    
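
After installing, a quick sanity check is to confirm that the eastr entry point and any external tools are on your PATH. The sketch below is a generic helper; require_tools is a hypothetical name, not part of EASTR, and the assumption that bowtie2 must be on PATH is ours, inferred from the required --bowtie2_index argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EASTR): report every command that is not
# found on PATH. Returns 0 if all commands are found, 1 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the tool list is an assumption; adjust to your setup):
# require_tools eastr bowtie2
```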

Required Arguments

NOTE: Provide exactly one of the input options below (GTF, BED, or BAM).

  • --gtf : Input GTF file containing transcript annotations
  • --bed : Input BED file with intron coordinates
  • --bam : Input BAM file or a TXT file containing a list of BAM files with read alignments

Additionally, the following arguments are required:

  • -r, --reference : Reference FASTA genome used in alignment
  • -i, --bowtie2_index : Path to Bowtie2 index for the reference genome
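
The exact BED layout EASTR expects for --bed (or for the --trusted_bed option described below) is not specified here. Assuming a standard 6-column BED (chrom, 0-based start, end, name, score, strand), one way to derive intron coordinates from the exons of a GTF is sketched below. gtf_to_intron_bed is a hypothetical helper, not part of EASTR; it writes one line per transcript intron, so an intron shared by several transcripts appears more than once.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EASTR): read a GTF on stdin and print one
# BED6 line per intron: chrom, 0-based start, end, transcript_id, 0, strand.
# Assumes every exon line carries a transcript_id attribute.
# Steps: (1) keep exons as transcript_id, chrom, start, end, strand;
# (2) group exons by transcript and order them by start;
# (3) the gap between consecutive exons is an intron. A GTF exon end E
#     (1-based, inclusive) becomes BED start E, and the next exon start S
#     gives BED end S-1 (0-based, half-open).
gtf_to_intron_bed() {
  awk -F'\t' -v OFS='\t' '$3 == "exon" && match($9, /transcript_id "[^"]+"/) {
      print substr($9, RSTART + 15, RLENGTH - 16), $1, $4, $5, $7
    }' |
  LC_ALL=C sort -t $'\t' -k1,1 -k3,3n |
  awk -F'\t' -v OFS='\t' '{
      if ($1 == prev_tid) print $2, prev_end, $3 - 1, $1, 0, $5
      prev_tid = $1; prev_end = $4
    }'
}

# Usage (paths are placeholders):
# gtf_to_intron_bed < /path/to/gtf_file > introns.bed
```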

Optional Arguments

  • --bt2_k : Minimum number of distinct alignments found by bowtie2 for a junction to be considered spurious. Default: 10
  • -o : Length of the overhang on either side of the splice junction. Default: 50
  • -a : Minimum required anchor length in each of the two exons. Default: 7
  • --min_duplicate_exon_length : Minimum length that a one-anchor alignment shift must meet or exceed to be considered as representing duplicated exons. It is used to distinguish exon duplications from spurious splice alignments. Default: 27
  • --min_junc_score : Minimum number of supporting spliced reads required per junction. Default: 1
  • --trusted_bed : Path to a BED file of trusted junctions; EASTR will not remove these junctions.
  • --verbose : Display additional information during BAM filtering, including the total number of spliced alignments and the number of removed alignments
  • --removed_alignments_bam : Write removed alignments to a BAM file
  • -p : Number of parallel processes. Default: 1
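
The -o overhang sets how much exon sequence is considered on each side of a splice junction. As an illustration only (not EASTR's code), the hypothetical helper flank_regions below computes the two flanking windows for a junction given in BED coordinates (0-based start, exclusive end) and prints them as 1-based, inclusive region strings. That EASTR takes exactly these windows, directly adjacent to the intron, is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EASTR): print the upstream and downstream
# exon windows of OVERHANG bases around an intron [START, END) given in BED
# coordinates, as 1-based inclusive "chrom:from-to" region strings.
# Assumes the windows sit directly against the intron; the upstream window is
# clamped at position 1, the downstream window is not checked against the
# chromosome length.
flank_regions() {
  local chrom="$1" start="$2" end="$3" o="${4:-50}"   # 50 mirrors the -o default
  local up_start=$(( start - o + 1 ))
  (( up_start < 1 )) && up_start=1
  printf '%s:%d-%d\n' "$chrom" "$up_start" "$start"           # last o bases of upstream exon
  printf '%s:%d-%d\n' "$chrom" "$(( end + 1 ))" "$(( end + o ))" # first o bases of downstream exon
}
```

The region strings can be passed to samtools faidx, which accepts several regions at once, for example `samtools faidx /path/to/reference_fasta $(flank_regions chr1 200 299 50)`.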

Minimap2 Parameters

  • -A : Matching score. Default: 3
  • -B : Mismatching penalty. Default: 4
  • -O : Gap open penalties. Default: [12, 32]
  • -E : Gap extension penalties. Default: [2, 1]
  • -k : K-mer length used for minimizers. Default: 3
  • --scoreN : Score of a mismatch involving ambiguous bases. Default: 1
  • -w : Minimizer window size. Default: 2
  • -m : Discard chains with a chaining score below this value. Default: 25

Output Options

  • --out_original_junctions : Write original junctions to the output file or directory
  • --out_removed_junctions : Write removed junctions to the output file or directory; if omitted, they are printed to the terminal
  • --out_filtered_bam : Write filtered BAM files to the output file or directory
  • --filtered_bam_suffix : Suffix added to the name of the output BAM files. Default: '_EASTR_filtered'

Usage

The run_eastr.sh script in the tests directory demonstrates two different ways to run the EASTR pipeline: on a bamlist and on a GTF file. Below, we provide instructions for each use case.

Running EASTR on a bamlist

  1. Ensure you are in the appropriate directory containing the BAM/original folder and reference files.

  2. Create a list of BAM files (make sure the list contains the full paths to the BAM files):

    # "$PWD" makes the listed paths absolute
    ls "$PWD"/path/to/BAM/original/*.bam > bamlist.txt
    
  3. Run the EASTR pipeline on the bamlist with the following command:

    # --bam, --reference and --bowtie2_index are required; the remaining options are optional.
    eastr \
        --bam bamlist.txt \
        --reference /path/to/reference_fasta \
        --bowtie2_index /path/to/bowtie2_index \
        --out_filtered_bam /path/to/output/BAM/filtered \
        --out_original_junctions /path/to/output/original_junctions \
        --out_removed_junctions /path/to/output/removed_junctions \
        --removed_alignments_bam \
        --verbose \
        -p 12
    
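
Because the bamlist must contain full paths, it can help to validate it before running EASTR. check_bamlist is a hypothetical helper, not part of EASTR; it only checks that each entry is an absolute path to an existing file, not that the file is a valid BAM.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EASTR): check that every non-empty line of
# a bamlist is an absolute path to an existing file. Returns 1 on any problem.
check_bamlist() {
  local list="$1" bad=0 path
  if [ ! -r "$list" ]; then
    echo "cannot read bamlist: $list" >&2
    return 1
  fi
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue                # skip blank lines
    case "$path" in
      /*) ;;                                  # absolute path: OK
      *) echo "not an absolute path: $path" >&2; bad=1; continue ;;
    esac
    [ -f "$path" ] || { echo "missing file: $path" >&2; bad=1; }
  done < "$list"
  return "$bad"
}

# Usage:
# check_bamlist bamlist.txt && echo "bamlist looks OK"
```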

Running EASTR on a GTF

Run the EASTR pipeline on the GTF file with the following command:

    eastr \
        --gtf /path/to/gtf_file \
        --reference /path/to/reference_fasta \
        --bowtie2_index /path/to/bowtie2_index \
        --out_removed_junctions /path/to/output/outfile.bed  # optional

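The removed-junctions output is a BED file. Assuming the chromosome name is in the first column (standard BED), a quick per-chromosome tally of questionable junctions can be produced as below. count_junctions_per_chrom is a hypothetical helper, not part of EASTR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EASTR): print "<chrom>\t<count>" for each
# chromosome in a BED file, skipping blank, comment, "track" and "browser"
# lines. Assumes column 1 holds the chromosome name.
count_junctions_per_chrom() {
  awk -F'\t' -v OFS='\t' '
    NF == 0 || /^(#|track|browser)/ { next }
    { n[$1]++ }
    END { for (c in n) print c, n[c] }' "$1" | LC_ALL=C sort -k1,1
}

# Usage:
# count_junctions_per_chrom /path/to/output/outfile.bed
```
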
Analyzing an example dataset

Note 1: Downloading FASTQ files with the get_fastq.sh script requires the SRA Toolkit.

Note 2: Converting the GFF reference annotation to GTF in the get_ref.sh script requires gffread.

We have included a script that demonstrates the application of the EASTR pipeline to an Arabidopsis dataset featured in our study. The sra_list_arabidopsis.txt file, located in the tests directory, lists the accession IDs of the samples analyzed.

In this example, EASTR takes BAM files as input. The run_all.sh script downloads the FASTQ files, the FASTA reference, and the annotation, then aligns the reads with HISAT2 to produce BAM files, which are passed to EASTR. EASTR is also run on the GTF annotation to output a BED file of questionable junctions (the last command in the run_eastr.sh script).

To execute the entire EASTR pipeline, which filters BAM files and identifies reference annotation errors, use the run_all.sh script found in the tests directory. This script ensures all necessary steps and subscripts are carried out in the correct order. To analyze the example dataset, follow these steps:

  1. Navigate to the tests directory within the EASTR package (cd tests from the repository root).
  2. Make sure all scripts are executable (chmod +x *.sh).
  3. Run the run_all.sh script (./run_all.sh).

The script downloads the necessary FASTQ files and reference genome, then performs the alignment and the EASTR analysis. The output files are written to their respective directories within the tests folder.

When run on 4 CPUs, the EASTR command that filters the 6 BAM files completes in approximately 35 minutes, most of which is spent filtering the BAM files (a single BAM file typically takes 15 to 20 minutes to filter on one CPU). On 1 CPU, the EASTR command that identifies questionable introns in the annotation takes about 30 seconds.
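
As a rough consistency check of these timings, assume (our assumption, not stated above) that with 4 processes each process filters one BAM file at a time. The 6 files then need two rounds of 15 to 20 minutes each:

```latex
\left\lceil \tfrac{6}{4} \right\rceil \times (15 \text{ to } 20)\ \text{min}
  = 2 \times (15 \text{ to } 20)\ \text{min}
  = 30 \text{ to } 40\ \text{min}
```

This range brackets the reported 35 minutes.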

Citation

To cite EASTR in publications, please use the following reference:

Shinder I, Hu R, Ji HJ, Chao KH, Pertea M. EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes. Nat Commun. 2023 Nov 9;14(1):7223. doi: 10.1038/s41467-023-43017-4. PMID: 37940654; PMCID: PMC10632439.

Download files

Source Distribution

eastr-1.1.1.tar.gz (300.5 kB)

Uploaded Source

Built Distributions

eastr-1.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.13, manylinux: glibc 2.17+ x86-64

eastr-1.1.1-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

eastr-1.1.1-cp313-cp313-macosx_10_13_x86_64.whl (1.2 MB)

Uploaded CPython 3.13, macOS 10.13+ x86-64

eastr-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

eastr-1.1.1-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

eastr-1.1.1-cp312-cp312-macosx_10_13_x86_64.whl (1.2 MB)

Uploaded CPython 3.12, macOS 10.13+ x86-64

eastr-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

eastr-1.1.1-cp311-cp311-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

eastr-1.1.1-cp311-cp311-macosx_10_9_x86_64.whl (1.2 MB)

Uploaded CPython 3.11, macOS 10.9+ x86-64

eastr-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

eastr-1.1.1-cp310-cp310-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.10, macOS 11.0+ ARM64

eastr-1.1.1-cp310-cp310-macosx_10_9_x86_64.whl (1.2 MB)

Uploaded CPython 3.10, macOS 10.9+ x86-64

File details

Details for the file eastr-1.1.1.tar.gz.

File metadata

  • Download URL: eastr-1.1.1.tar.gz
  • Upload date:
  • Size: 300.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for eastr-1.1.1.tar.gz
Algorithm Hash digest
SHA256 83941dbd193266d3a5c7bbcb0190705f7bcaec482eddd9bdd53356536db3b0ea
MD5 98c2c0df10e86197c994d96992b4c116
BLAKE2b-256 d5d17f6f911fc8a4c2fd007c2e884e6a571df095ba415663645d62ed0cc248a1

See more details on using hashes here.

Provenance

The following attestation bundles were made for eastr-1.1.1.tar.gz:

Publisher: build_wheels.yml on ishinder/EASTR

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file eastr-1.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a8364c47d7e4f61f718c921e3a88a2d059930d97eb1701c75b211a25eccab914
MD5 21af575b3e52aa2968747bf31d1879bb
BLAKE2b-256 39b571f2adb42d4195e6064e1781b907ea65b6cdc6b0b5f55d21d1aead909dbf

Provenance

The following attestation bundles were made for eastr-1.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 78d872f7ce8a5abc7f1b2230a38369450c88d852d9a84faa7ce50b9a324867d7
MD5 e92f758297f458a3d7f52c2d82735a94
BLAKE2b-256 d508896ad7f64d9a2441f8f7e56d626b723770305c0f0ebc446d1d81dc192cfe

Provenance

The following attestation bundles were made for eastr-1.1.1-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp313-cp313-macosx_10_13_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp313-cp313-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 e3bca9ed913ebe1cb87c2a3683ad64c807c2bc89922439a9bd25f8f910955238
MD5 443a54c36953894ebaf510628f03eb43
BLAKE2b-256 672f7d7239eef09a892b67f433eba5c6b1ebeef57116db43897feb0b8148eff3

Provenance

The following attestation bundles were made for eastr-1.1.1-cp313-cp313-macosx_10_13_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d27a4b5744b1f0f2e945cae6664e8ef5f6fd571f06ea9a7223e090ba4cf80b6a
MD5 bd80684fd8a7681a957da17c7d6c5377
BLAKE2b-256 ce2c749a41c2c929becd2587ef3f5b90a6a49fa1abd6b47e3ed89c417ef08c5c

Provenance

The following attestation bundles were made for eastr-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ce79d55ca01781f1acaa10aba81e5f0ee39483c70141c7bc1076017348246623
MD5 cff844cbc3c718d7f59d809bc870efee
BLAKE2b-256 c9308a3c6afe82252a11d681330064b959c6a3dfb4e92aa01653257ab2d5fd29

Provenance

The following attestation bundles were made for eastr-1.1.1-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp312-cp312-macosx_10_13_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 3f94da1574d20b220c576f6ec75e3d30cc2a39845f99ccce9050b5844ee2ec12
MD5 6d0791f1ec79d3681f0282e5dd579b20
BLAKE2b-256 5667a30eee82db0e508cca2c44d06d18ac0bde969ae8e62d4bd8ca34b98221ee

Provenance

The following attestation bundles were made for eastr-1.1.1-cp312-cp312-macosx_10_13_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1281d2112dd726d9d8f7a683ab735f692191a0090b779c0657883453f9bfe6fb
MD5 ffe9ffd7e367d3735565f48f3ddeeb3f
BLAKE2b-256 179c72869aaa88a0f571a0c8ef09450639edd7a0aba42762b5455e2ca878b3a5

Provenance

The following attestation bundles were made for eastr-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bccfba11b78007b4ff1643cd546ee8bc13bbbb80138a15bdf57548832ac4b638
MD5 b7ec621a28e536358da68a7645b0434b
BLAKE2b-256 f631daf4de44ffab0cb1472be3b4e4c8f5dc64b40421a9b3220990e1fcb312cf

Provenance

The following attestation bundles were made for eastr-1.1.1-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 47f863cfde55c658271858f65228b7f4fec25720b0249523c3cb25cbf986812a
MD5 1c922b043a6e9534e1cad1c5bf4fd68d
BLAKE2b-256 1f0ccfee844f52311b7c3f9191a87d259cca916696dc55963490eaea868fd9b9

Provenance

The following attestation bundles were made for eastr-1.1.1-cp311-cp311-macosx_10_9_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5aa1baae29b638126c2c4381435edff0fc6e4b0d8705af8ff5888468bf8d61fd
MD5 6c0cd1a8d91df317989a05761c79d5c8
BLAKE2b-256 9adf447c0e7e4e72c78f5c508cf0b11c1fe22ea5b9976ecc291e5cd708640943

Provenance

The following attestation bundles were made for eastr-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3b939261565b08a0a78b0a8ed5b9068d8ce96eea2c27853aaa0237a90cd5760f
MD5 0e63defbb04692b6b5e7f2c1558628fe
BLAKE2b-256 0b12ed789628d8b31bbb9f51e592b67109d81d53e65439886ce8a920e510cc3b

Provenance

The following attestation bundles were made for eastr-1.1.1-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.1-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for eastr-1.1.1-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f137efadf6a489bf879391d3c2356013dda5b98b3b408fb2ed9a590089cacb6a
MD5 80541c2fa47aedce08de56275657161f
BLAKE2b-256 84f6d4a0034b446304f59ccd063b9e3b13d9a5bc917d689137aeb3eb15b51cc9

Provenance

The following attestation bundles were made for eastr-1.1.1-cp310-cp310-macosx_10_9_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR
