Tool for emending alignments of spuriously spliced transcript reads

Project description

Emending Alignments of Spliced Transcript Reads (EASTR)

Documentation: https://ccb.jhu.edu/eastr

(\(\
(-.-)
o('')('')


EASTR is a tool for detecting and eliminating spuriously spliced alignments in RNA-seq datasets. By identifying and removing misaligned spliced alignments, it improves the accuracy of transcriptome assembly. EASTR accepts GTF, BED, or BAM files as input and can be applied to any RNA-seq dataset, regardless of the alignment software used.

Dependencies

Required:

  • bowtie2
  • samtools

Optional for testing:

  • gffread
  • sra-tools

Getting Started

Install using conda (Recommended)

Installing with conda gives you eastr and all of the required (bowtie2, samtools) and optional (gffread, sra-tools) dependencies.

To install from the bioconda channel use the following command:

conda install bioconda::eastr

Install using pip

You can also install EASTR directly from PyPI, but you will need to make sure that the required dependencies (bowtie2 and samtools) are installed and on your $PATH.

To install from pip use the following command:

# python3 -m venv .venv # (OPTIONAL)
# source .venv/bin/activate # (OPTIONAL)
pip install eastr
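Because a pip install does not bring in bowtie2 or samtools, it can be worth confirming that both are reachable before running EASTR. The snippet below is a minimal sketch using the shell's command -v builtin; check_tools is a hypothetical helper, not part of EASTR.

```shell
# check_tools: hypothetical helper (not part of EASTR) that reports any
# command not found on $PATH and returns non-zero if one is missing.
check_tools() {
    local missing=0 tool
    # With no arguments, check EASTR's required external dependencies.
    if [ "$#" -eq 0 ]; then
        set -- bowtie2 samtools
    fi
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from \$PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: only install EASTR once both required tools are available.
# check_tools && pip install eastr
```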

Required Arguments

NOTE: Only one of the below input options (GTF, BED, or BAM) should be provided.

  • --gtf : Input GTF file containing transcript annotations
  • --bed : Input BED file with intron coordinates
  • --bam : Input BAM file or a TXT file containing a list of BAM files with read alignments

Additionally, the following arguments are required:

  • -r, --reference : Reference FASTA genome used in alignment
  • -i, --bowtie2_index : Path to Bowtie2 index for the reference genome
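Because exactly one of --gtf, --bed, or --bam should be given, a wrapper script could pick the flag from the input file name. The sketch below assumes inputs end in .gtf, .bed, .bam, or .txt (for a list of BAM files); eastr_input_flag and eastr_cmd are hypothetical helpers, and eastr_cmd only prints the command it would run.

```shell
# eastr_input_flag: hypothetical helper mapping an input file name to the
# single EASTR input option it should be passed with.
eastr_input_flag() {
    case "$1" in
        *.gtf)       printf '%s\n' --gtf ;;
        *.bed)       printf '%s\n' --bed ;;
        *.bam|*.txt) printf '%s\n' --bam ;;  # one BAM, or a text list of BAMs
        *) echo "unsupported input: $1" >&2; return 1 ;;
    esac
}

# eastr_cmd: print (not execute) an EASTR command with exactly one input
# option plus the two arguments that are always required.
eastr_cmd() {
    local input=$1 ref=$2 index=$3 flag
    flag=$(eastr_input_flag "$input") || return 1
    printf 'eastr %s %s --reference %s --bowtie2_index %s\n' \
        "$flag" "$input" "$ref" "$index"
}
```

For example, eastr_cmd bamlist.txt ref.fa idx/genome prints a command that uses --bam; the reference and index paths here are placeholders.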

Optional Arguments

  • --bt2_k : Minimum number of distinct alignments found by bowtie2 for a junction to be considered spurious. Default: 10
  • -o : Length of the overhang on either side of the splice junction. Default: 50
  • -a : Minimum required anchor length in each of the two exons. Default: 7
  • --min_duplicate_exon_length : Minimum length that a one-anchor alignment shift must meet or exceed to be considered as representing duplicated exons; this threshold distinguishes exon duplications from spurious splice alignments. Default: 27
  • --min_junc_score : Minimum number of supporting spliced reads required per junction. Default: 1
  • --trusted_bed : Path to a BED file of trusted junctions; EASTR will not remove these junctions.
  • --verbose : Display additional information during BAM filtering, including the count of total spliced alignments and removed alignments
  • --removed_alignments_bam : Write removed alignments to a BAM file
  • -p : Number of parallel processes. Default: 1
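As an illustration of the coordinate arithmetic behind -o, the sketch below takes one intron in BED coordinates (0-based start, exclusive end) and prints the -o exonic bases immediately upstream of the donor site and immediately downstream of the acceptor site, in the chrom:start-end form that samtools faidx accepts. Which windows EASTR actually extracts internally is an assumption here, and junction_flanks is a hypothetical helper.

```shell
# junction_flanks: hypothetical helper. Given a BED-style intron
# (0-based start, exclusive end) and an overhang length, print the
# 1-based, inclusive regions of the overhang just before the donor and
# just after the acceptor, formatted as chrom:start-end.
junction_flanks() {
    local chrom=$1 start=$2 end=$3 overhang=${4:-50}
    # In 1-based coordinates the intron spans start+1 .. end, so the
    # upstream exon flank ends at base "start".
    local up_start=$(( start - overhang + 1 ))
    # Clamp to the first base of the chromosome.
    if [ "$up_start" -lt 1 ]; then
        up_start=1
    fi
    printf '%s:%d-%d\n' "$chrom" "$up_start" "$start"
    # The downstream exon flank starts at the first base after the intron.
    printf '%s:%d-%d\n' "$chrom" "$(( end + 1 ))" "$(( end + overhang ))"
}
```

For the intron chr1 1000 2000 with the default overhang of 50, the regions are chr1:951-1000 and chr1:2001-2050; with an indexed FASTA they could be fetched with samtools faidx ref.fa $(junction_flanks chr1 1000 2000 50).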

Minimap2 Parameters

  • -A : Matching score. Default: 3
  • -B : Mismatching penalty. Default: 4
  • -O : Gap open penalty. Default: [12, 32]
  • -E : Gap extension penalty. Default: [2, 1]
  • -k : K-mer length for alignment. Default: 3
  • --scoreN : Score of a mismatch involving ambiguous bases. Default: 1
  • -w : Minimizer window size. Default: 2
  • -m : Discard chains with a chaining score below this value. Default: 25

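The two values given for -O and -E describe minimap2's two-piece affine gap cost: according to the minimap2 manual, a gap of length k costs min(O1 + k*E1, O2 + k*E2). With the defaults listed above (-O 12,32 and -E 2,1), the first piece is cheaper for gaps shorter than 20 bases and the second for gaps longer than 20. The sketch below only evaluates that formula; gap_cost is a hypothetical helper.

```shell
# gap_cost: hypothetical helper evaluating minimap2's two-piece affine
# gap penalty, min(O1 + k*E1, O2 + k*E2), for a gap of length k.
# Defaults mirror the EASTR minimap2 defaults listed above.
gap_cost() {
    local k=$1 o1=${2:-12} o2=${3:-32} e1=${4:-2} e2=${5:-1}
    local short=$(( o1 + k * e1 ))
    local long=$(( o2 + k * e2 ))
    if [ "$short" -le "$long" ]; then
        echo "$short"
    else
        echo "$long"
    fi
}
```

For example, gap_cost 10 gives 32, while gap_cost 30 gives 62 because 32 + 30 is cheaper than 12 + 60.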
Output Options

  • --out_original_junctions : Write original junctions to the output file or directory
  • --out_removed_junctions : Write removed junctions to the output file or directory; if this option is omitted, removed junctions are printed to the terminal
  • --out_filtered_bam : Write filtered BAM files to the output file or directory
  • --filtered_bam_suffix : Suffix added to the name of the output BAM files. Default: '_EASTR_filtered'
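The exact output naming rule is not spelled out above. Assuming the suffix is inserted before the .bam extension, sample1.bam would be written as sample1_EASTR_filtered.bam. The hypothetical filtered_bam_name helper below encodes that assumed rule, which may be convenient when scripting downstream steps; check it against the files EASTR actually writes.

```shell
# filtered_bam_name: hypothetical helper. Assumes EASTR names each output
# as <input basename without .bam><suffix>.bam; verify against real output.
filtered_bam_name() {
    local bam=$1 suffix=${2:-_EASTR_filtered}
    local base
    # basename strips the directory and the trailing .bam extension.
    base=$(basename "$bam" .bam)
    printf '%s%s.bam\n' "$base" "$suffix"
}
```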

Usage

The run_eastr.sh script in the tests directory demonstrates two different ways to run the EASTR pipeline: on a bamlist and on a GTF file. Below, we provide instructions for each use case.

Running EASTR on a bamlist

  1. Ensure you are in the appropriate directory containing the BAM/original folder and reference files.

  2. Create a list of BAM files (make sure the list contains the full paths to the BAM files):

    ls /full/path/to/BAM/original/*.bam > bamlist.txt
    
  3. Run the EASTR pipeline on the bamlist with the following command. Only --bam, --reference, and --bowtie2_index are required; the remaining options are optional:

    eastr \
        --bam bamlist.txt \
        --reference /path/to/reference_fasta \
        --bowtie2_index /path/to/bowtie2_index \
        --out_filtered_bam /path/to/output/BAM/filtered \
        --out_original_junctions /path/to/output/original_junctions \
        --out_removed_junctions /path/to/output/removed_junctions \
        --removed_alignments_bam \
        --verbose \
        -p 12
    

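A plain ls on a relative path writes relative paths into the list. As a minimal sketch, the hypothetical make_bamlist function below resolves the BAM directory first, so every line of the list is an absolute path, and it fails if no BAM files are found.

```shell
# make_bamlist: hypothetical helper that writes the absolute path of every
# *.bam file in a directory to a list file, one per line.
make_bamlist() {
    local bam_dir=$1 out=$2 abs_dir f
    # Resolve the directory to an absolute path in a subshell.
    abs_dir=$(cd "$bam_dir" && pwd) || return 1
    : > "$out"  # start with an empty list
    for f in "$abs_dir"/*.bam; do
        # With no matches the unexpanded pattern is returned; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f" >> "$out"
    done
    # Succeed only if at least one BAM path was written.
    [ -s "$out" ]
}

# Example:
# make_bamlist path/to/BAM/original bamlist.txt
```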
Running EASTR on a GTF

Run the EASTR pipeline on the GTF file with the following command (--out_removed_junctions is optional):

  eastr \
    --gtf /path/to/gtf_file \
    --reference /path/to/reference_fasta \
    --bowtie2_index /path/to/bowtie2_index \
    --out_removed_junctions /path/to/output/outfile.bed

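For a quick overview of the questionable junctions written by --out_removed_junctions, the records can be tallied per chromosome. This sketch assumes the output is a standard BED file with the chromosome in the first column and skips track, browser, and # header lines; count_junctions_per_chrom is a hypothetical helper.

```shell
# count_junctions_per_chrom: hypothetical helper that prints
# "<chrom><TAB><count>" for each chromosome in a BED file, sorted by name.
count_junctions_per_chrom() {
    awk -F '\t' '
        # Skip blank lines and BED track/browser/comment header lines.
        NF == 0 { next }
        /^track/ || /^browser/ || substr($0, 1, 1) == "#" { next }
        # Tally one junction per remaining record, keyed by chromosome.
        { count[$1]++ }
        END { for (c in count) printf "%s\t%d\n", c, count[c] }
    ' "$1" | LC_ALL=C sort -k1,1
}
```

Usage: count_junctions_per_chrom /path/to/output/outfile.bed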
Analyzing an example dataset

Note 1: Downloading FASTQ files with the get_fastq.sh script requires the SRA Toolkit (sra-tools).

Note 2: Converting the GFF reference annotation to GTF in the get_ref.sh script requires gffread.

We have included a script that demonstrates the application of the EASTR pipeline to an Arabidopsis dataset featured in our study. The sra_list_arabidopsis.txt file, located in the tests directory, lists the accession IDs of the samples analyzed.
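If only some of the samples are needed, the accession list can also be processed directly. The sketch below assumes sra_list_arabidopsis.txt holds one accession per line and prints, without running, one fasterq-dump command (from sra-tools) per accession. It is not a copy of what get_fastq.sh does, and print_fastq_downloads is a hypothetical helper.

```shell
# print_fastq_downloads: hypothetical helper that reads an accession list
# (assumed one ID per line) and prints, without running, a fasterq-dump
# command for each non-empty, non-comment line.
print_fastq_downloads() {
    local acc
    # "|| [ -n ... ]" also handles a last line with no trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Keep only the first token (drops trailing spaces or a CR).
        acc=${acc%%[[:space:]]*}
        case "$acc" in
            ''|'#'*) continue ;;
        esac
        printf 'fasterq-dump --split-files %s\n' "$acc"
    done < "$1"
}
```

Once the printed commands look right, they could be executed with print_fastq_downloads sra_list_arabidopsis.txt | sh.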

In this example, the EASTR pipeline takes BAM files as input. The run_all.sh script downloads the FASTQ files, the reference FASTA, and the annotation, then aligns the FASTQ files with HISAT2 to generate BAM files, which are subsequently used as input to EASTR. EASTR can also take a GTF annotation file and output a BED file of questionable junctions (this is done by the last command of the run_eastr.sh script).

To execute the entire EASTR pipeline, which filters BAM files and identifies reference annotation errors, use the run_all.sh script found in the tests directory. This script ensures all necessary steps and subscripts are carried out in the correct order. To analyze the example dataset, follow these steps:

  1. Navigate to the tests directory within the EASTR package (cd tests).
  2. Make sure all scripts are executable (chmod +x *sh).
  3. Run the run_all.sh script (./run_all.sh).

The script will download the necessary FASTQ files, reference genome, and then perform the alignment and EASTR analysis. The output files will be generated in their respective directories within the tests folder.

When run on 4 CPUs, the EASTR command that filters the 6 BAM files completes in approximately 35 minutes, most of which is spent filtering the BAM files (a single BAM file typically takes 15 to 20 minutes to filter on one CPU). On 1 CPU, the EASTR command that identifies questionable introns in the annotation takes about 30 seconds.

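These timings are roughly consistent with a simple model in which each process filters one BAM file at a time: 6 files on 4 processes take two rounds, and two rounds of 15 to 20 minutes give 30 to 40 minutes, bracketing the reported 35. Both the model and the assumption that -p parallelizes across BAM files are illustrative only; estimate_filter_minutes is a hypothetical helper.

```shell
# estimate_filter_minutes: hypothetical back-of-the-envelope estimate of
# wall-clock filtering time, assuming each process filters one BAM at a
# time and ignoring junction extraction and alignment overhead.
estimate_filter_minutes() {
    local n_bams=$1 procs=$2 minutes_per_bam=$3
    # Number of rounds = ceil(n_bams / procs), using integer arithmetic.
    local rounds=$(( (n_bams + procs - 1) / procs ))
    echo $(( rounds * minutes_per_bam ))
}
```

For example, estimate_filter_minutes 6 4 15 gives 30 and estimate_filter_minutes 6 4 20 gives 40.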
Citation

To cite EASTR in publications, please use the following reference:

Shinder I, Hu R, Ji HJ, Chao KH, Pertea M. EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes. Nat Commun. 2023 Nov 9;14(1):7223. doi: 10.1038/s41467-023-43017-4. PMID: 37940654; PMCID: PMC10632439.

Download files

Source Distribution

eastr-1.1.2.tar.gz (303.9 kB)

Uploaded Source

Built Distributions

eastr-1.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.13, manylinux: glibc 2.17+ x86-64

eastr-1.1.2-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

eastr-1.1.2-cp313-cp313-macosx_10_13_x86_64.whl (1.2 MB)

Uploaded CPython 3.13, macOS 10.13+ x86-64

eastr-1.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

eastr-1.1.2-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

eastr-1.1.2-cp312-cp312-macosx_10_13_x86_64.whl (1.2 MB)

Uploaded CPython 3.12, macOS 10.13+ x86-64

eastr-1.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

eastr-1.1.2-cp311-cp311-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

eastr-1.1.2-cp311-cp311-macosx_10_9_x86_64.whl (1.2 MB)

Uploaded CPython 3.11, macOS 10.9+ x86-64

eastr-1.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

eastr-1.1.2-cp310-cp310-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.10, macOS 11.0+ ARM64

eastr-1.1.2-cp310-cp310-macosx_10_9_x86_64.whl (1.2 MB)

Uploaded CPython 3.10, macOS 10.9+ x86-64

File details

Details for the file eastr-1.1.2.tar.gz.

File metadata

  • Download URL: eastr-1.1.2.tar.gz
  • Upload date:
  • Size: 303.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for eastr-1.1.2.tar.gz
Algorithm Hash digest
SHA256 9ddfa1f00e6840b1bccecc450e06817baf5e2b32c2fd34927ce7a4410e9e1240
MD5 9570ef5ebe1bdf4fae7393fa9e0d2af2
BLAKE2b-256 9a11f07218126636880dad255f5d529d00802efa70b0799f42316bbcba5df48d

Provenance

The following attestation bundles were made for eastr-1.1.2.tar.gz:

Publisher: build_wheels.yml on ishinder/EASTR

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file eastr-1.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 afe5bfb819f37117a82e138dd98048897fbc428c2fa5c3582f4919bf715251d3
MD5 8ae1d3d318fb92fa9c912f8d67b19d43
BLAKE2b-256 aa0273dfc47a61fa5676db947a69cef86acade97aec9b8222b70b5e5b64aa863

Provenance

The following attestation bundles were made for eastr-1.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 63b5a4ddd1be88dd4e1a64ab7c98e008191d9b3f94d03b9ce6212dde3fd4c2f2
MD5 5e3409fe35c04adfe8fc31b5549f6f6d
BLAKE2b-256 e393a0d1cdd62ad0a8cdde073d60df113355da7d2f917626ef52829c518e5ec0

Provenance

The following attestation bundles were made for eastr-1.1.2-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp313-cp313-macosx_10_13_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp313-cp313-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 f42ab7e88acd03e68790cb44863837eea3c88d1b8f4fd009834fac0c5381ed4f
MD5 6a040d0e311387aa248a6d9f741049c6
BLAKE2b-256 fbf951bc466b46d59787fc856dea03a72d5bb5fe75a53426ee117368ae5c51e9

Provenance

The following attestation bundles were made for eastr-1.1.2-cp313-cp313-macosx_10_13_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 71100fd76e8090f8d0cef6199654dd6704b6341d5e125ac2e9b74628339916ca
MD5 75659df99f7fff80a8de70b08c977076
BLAKE2b-256 4a97553a07967d369802fb8330be7f08e039d38f6f9eec1ba0d8f94cb791f4ca

Provenance

The following attestation bundles were made for eastr-1.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6cc3481e9931d382c85fc10cb29f67969e1adf200f8c84db6493ba04e1ab0985
MD5 4c7bf0b94682803dfc93b8d538b9193c
BLAKE2b-256 0c70c64767af00727fb65166ef4a18b2b119bda7f559f73006dd75ac41b12124

Provenance

The following attestation bundles were made for eastr-1.1.2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp312-cp312-macosx_10_13_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 e52f0c765347c53948847baa1a9d364414aad05d899a88c4c024eabd8d838a2c
MD5 7f786e17c5eec079c7ab177ed1f01e8d
BLAKE2b-256 dd3e59064c20a3ec5fbcbf3fdc8ec8debd1031391dd152d9ef5d0bd9063bbdf6

Provenance

The following attestation bundles were made for eastr-1.1.2-cp312-cp312-macosx_10_13_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 047fee72d5a8189d0f83e0c1fbe01031d70989e829dd009e0b2caf6b6cf33149
MD5 37af71fdfaa7190ee56f8ad6f2cb0928
BLAKE2b-256 f12601e44a754c1edad677a472e373961a8320b95404dc13802b6bc553a45b94

Provenance

The following attestation bundles were made for eastr-1.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 cffc00d9ee4846cf7fc3a3f8911861ae48cc1c0fe5a827f5db3cfc291c6f106c
MD5 687c8f8fde5495f52e4cd9fde424f182
BLAKE2b-256 995a57a587ba5c3fb64e680dca61733e9a0640e79ed1f0580e58abb908427c0f

Provenance

The following attestation bundles were made for eastr-1.1.2-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e03fda9a34f6f28fbec6f71062110d88be2552b9beb6cb7215f88eb6e668a218
MD5 ecc0922cbf477856455c3a879e36b2c1
BLAKE2b-256 d0b06a8e6ce410a4e49a0c090973610bb9203311c42ae22c50f43f8174425978

Provenance

The following attestation bundles were made for eastr-1.1.2-cp311-cp311-macosx_10_9_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 65268d82badc0eb5f2d4a421832cb17eab6d6c43035edc60b488508438e01cf2
MD5 7584fecd4d3f923f1c5608777b0ef1ce
BLAKE2b-256 31601023f0df3e1882d9ed8299096f73db9e5aff06ff13d9516cc407fa8f07e6

Provenance

The following attestation bundles were made for eastr-1.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for eastr-1.1.2-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1d631446009035eba4ffb31dcd385133ea08b55e1c616c9e6c88b600964e350b
MD5 33b9edf31d1ead928662f8deb37ccb66
BLAKE2b-256 dc538bc4391a5b8fbdf39615ca1b275ee708f5f5455a4a5deb41f9a9100caef0

Provenance

The following attestation bundles were made for eastr-1.1.2-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on ishinder/EASTR

File details

Details for the file eastr-1.1.2-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for eastr-1.1.2-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ad76024add1eba6780279952016abe203148b479b902ae17b890197cc40a6631
MD5 ef5ca7587a69f323c358474c032b57d1
BLAKE2b-256 60e966fef587bc024280c876caa8c5df4019c8656882e70963f39fddcce5fe6d

Provenance

The following attestation bundles were made for eastr-1.1.2-cp310-cp310-macosx_10_9_x86_64.whl:

Publisher: build_wheels.yml on ishinder/EASTR
