Automatically design sgRNAs for exon skipping with many base editors

AltEx-BE: Alternate Exon Skipping by Base Editing

Overview

AltEx-BE is a command-line bioinformatics tool that designs sgRNAs (single guide RNAs) to induce targeted exon skipping using Base Editing technology.

Manipulating alternative splicing is key to understanding diseases like cancer and neurodegenerative disorders, but designing the right tools for the job is a major bottleneck. The manual process of identifying targetable exons, designing sgRNAs for specific base editors, and assessing off-target risks is complex, tedious, and slows down critical research.

AltEx-BE automates this entire workflow. It parses transcript annotations to find candidate exon targets, designs sgRNA candidates for multiple base editors, and evaluates their off-target risk to produce a ranked list of sgRNAs.

By transforming a complex, multi-step design process into a single command, AltEx-BE bridges the gap between your scientific question and a successful wet lab experiment, significantly accelerating research into splicing-related diseases and therapies.

Key Features

  • 🧬 Automated Target Exon Annotation:

    • Automatically parses transcript structures from refFlat files to identify and classify potential targets for exon skipping. This includes Skipped Exons (SE) and exons with Alternative 3'/5' Splice Sites (A3SS/A5SS), eliminating the need for tedious manual searches.
  • ⚙️ Universal Base Editor Compatibility:

    • Supports virtually any ABE or CBE. You can use built-in presets or define any custom editor by specifying its PAM sequence and editing window, allowing immediate use of the latest editors from new publications.
    • AltEx-BE can design sgRNAs for multiple base editors in a single run.
  • 🚀 Streamlined End-to-End Workflow:

    • Seamlessly moves from data input to candidate selection. The design command generates sgRNAs, while the visualize command creates comprehensive reports to help you evaluate and rank the best candidates for your experiment.

Workflow Diagram

[Figure: simplified diagram of the AltEx-BE workflow]

Installation

To get started with AltEx-BE, install it from Bioconda or PyPI.

# 1. install via bioconda
conda install -c conda-forge -c bioconda altex-be

# 2. install via pypi
pip install AltEx-BE

[!CAUTION] AltEx-BE requires Python 3.10 to 3.12. If you run into installation errors, check for a Python version conflict first.

Required dataset

To use AltEx-BE, prepare the following input files on your computer (two required, one optional):

  • A refFlat or GTF file for your species of interest
    • A refFlat file contains RefSeq information: explanation of refFlat format is here
    • You can download refFlat files from UCSC goldenPath: the refFlat file for mm39 is here
    • You can also use a GTF file as input
      • If you use GTF, AltEx-BE automatically converts it to refFlat format and writes the converted file to the output directory
  • FASTA file(s) containing all chromosome sequences of your species of interest
    • You can also download FASTA files from UCSC goldenPath
    • Please confirm that your .fa files contain every chromosome; if not, the AltEx-BE run will fail
  • (optional) A CSV, TXT, or TSV file containing gene symbols or RefSeq IDs
    • AltEx-BE can process many genes in one run. To design sgRNAs for many genes, pass a gene list with the --gene-file option.
    • The input file should have a single column of gene symbols or RefSeq IDs (no header row is needed)
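
As an illustration of the gene list format described above, the following minimal sketch writes a one-column, header-free text file. The helper name `write_gene_list` and the output file name are hypothetical, and the identifiers are the placeholder names used elsewhere in this README.

```python
from pathlib import Path
import tempfile


def write_gene_list(identifiers, path):
    """Write one gene symbol or RefSeq ID per line, with no header row."""
    # Drop empty entries and surrounding whitespace before writing.
    cleaned = [item.strip() for item in identifiers if item.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Hypothetical identifiers and file name, for illustration only.
out_path = Path(tempfile.gettempdir()) / "genes_of_interest.txt"
genes = write_gene_list(["MYGENE", " NM_0012345 ", ""], out_path)
print(out_path.read_text(encoding="utf-8"))
```

The resulting file could then be passed to AltEx-BE with the --gene-file option.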

[!NOTE] Notes on gene symbol and RefSeq ID input

  • When providing a gene symbol (e.g., MYGENE), AltEx-BE will analyze all known transcripts of that gene to identify alternative splicing events.
  • When providing a RefSeq ID (e.g., NM_0012345), AltEx-BE will automatically identify the corresponding gene and analyze all of its transcripts. This ensures a comprehensive analysis even when starting from a single transcript identifier.

Usage

AltEx-BE can be run via a graphical user interface (UI) or directly from the command line.

Graphical User Interface (UI)

For users who prefer a graphical interface, AltEx-BE includes a web-based UI built with Streamlit. It allows you to configure and run the pipeline without using the command line.

To launch the UI, run:

altex-be --ui

The UI helps you:

  • 📥 Select input files (FASTA, transcript annotation)
  • ⚙️ Configure all options for the analysis
  • ▶️ Run the AltEx-BE pipeline locally
  • 📝 Monitor execution logs in real time
  • 📊 Preview and browse output files

Command-Line Interface (CLI)

For command-line usage, AltEx-BE is operated via the altex-be command. Here are a few examples:

1. Using a Preset Editor:

  • By default, AltEx-BE designs sgRNAs for the six preset base editors listed below.

[!NOTE] Preset Base Editors:

| base_editor_name | pam_sequence | editing_window_start | editing_window_end | base_editor_type |
|---|---|---|---|---|
| target_aid_ngg | NGG | 17 | 19 | cbe |
| be4max_ngg | NGG | 12 | 17 | cbe |
| abe8e_ngg | NGG | 12 | 17 | abe |
| target_aid_ng | NG | 17 | 19 | cbe |
| be4max_ng | NG | 12 | 17 | cbe |
| abe8e_ng | NG | 12 | 17 | abe |

altex-be \
    --refflat-path /path/to/your/refFlat.txt \
    --fasta-path /path/to/your/genome.fa \
    --output-dir /path/to/output_directory \
    --gene-symbols MYGENE \
    --assembly-name hg38
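
The PAM sequences of the preset editors (NGG, NG) use the IUPAC code N, which stands for any base. The sketch below shows one way such a pattern could be matched against a genomic site with a regular expression. It is an illustration only, not AltEx-BE's own implementation, and the function names `pam_to_regex` and `pam_matches` are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Compile an IUPAC PAM pattern such as 'NGG' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam.upper()))


def pam_matches(pam, site):
    """Return True if the genomic site matches the PAM pattern over its full length."""
    return pam_to_regex(pam).fullmatch(site.upper()) is not None


print(pam_matches("NGG", "TGG"))
print(pam_matches("NGG", "TGA"))
print(pam_matches("NG", "AG"))
```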

2. Input Base Editor Information in the Command Line:

altex-be \
    --refflat-path /path/to/your/refFlat.txt \
    --fasta-path /path/to/your/genome.fa \
    --output-dir /path/to/output_directory \
    --gene-symbols MYGENE \
    --assembly-name hg38 \
    --be-name target-aid \
    --be-type cbe \
    --be-pam NGG \
    --be-start 17 \
    --be-end 19

[!CAUTION] --be-start and --be-end specify the editing window of your base editor. The location of the editing window is counted from the base next to the PAM (1-indexed).
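
To make the counting convention concrete, here is a minimal sketch that converts an editing window into 0-based indices of a protospacer string. It assumes a protospacer written 5' to 3' with the PAM immediately at its 3' end, so that position 1 is the base adjacent to the PAM. That layout and the function names are assumptions made for illustration, not a description of AltEx-BE's internals.

```python
def window_to_indices(be_start, be_end, protospacer_length=20):
    """Map a PAM-proximal, 1-indexed window to 0-based indices in a 5'->3' protospacer."""
    if not 1 <= be_start <= be_end <= protospacer_length:
        raise ValueError("window must satisfy 1 <= start <= end <= protospacer length")
    # Position 1 (next to the PAM) is the last character of the string.
    first = protospacer_length - be_end
    last = protospacer_length - be_start
    return list(range(first, last + 1))


def bases_in_window(protospacer, be_start, be_end):
    """Return the bases of the protospacer that fall inside the editing window."""
    indices = window_to_indices(be_start, be_end, len(protospacer))
    return "".join(protospacer[i] for i in indices)


# Hypothetical 20-nt protospacer, for illustration only.
example = "ACGTACGTACGTACGTACGT"
print(window_to_indices(17, 19))
print(bases_in_window(example, 17, 19))
```

Under these assumptions, a window of 17 to 19 covers the second to fourth bases from the 5' end of a 20-nt protospacer.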

3. Input a CSV/TSV/TXT File Containing Information about Your Base Editors:

You can provide a file containing the information for one or more base editors. This is useful when you want to design sgRNAs for multiple editors at once.

[!CAUTION] The input file should have the following columns: base_editor_name, pam_sequence, editing_window_start, editing_window_end, base_editor_type.

altex-be \
    --refflat-path /path/to/your/refFlat.txt \
    --fasta-path /path/to/your/genome.fa \
    --output-dir /path/to/output_directory \
    --gene-symbols MYGENE \
    --assembly-name hg38 \
    --be-files /path/to/your/base_editor_info.csv
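
A base editor file with the required columns could be produced as in the sketch below. It assumes a comma-delimited file with a header row that uses the column names listed above, and it reuses two rows from the preset table as sample data. The function name `editors_to_csv` and the validation checks are illustrative assumptions.

```python
import csv
import io

COLUMNS = [
    "base_editor_name",
    "pam_sequence",
    "editing_window_start",
    "editing_window_end",
    "base_editor_type",
]

# Two editors taken from the preset table, used here as sample rows.
editors = [
    dict(zip(COLUMNS, ["target_aid_ngg", "NGG", 17, 19, "cbe"])),
    dict(zip(COLUMNS, ["abe8e_ng", "NG", 12, 17, "abe"])),
]


def editors_to_csv(rows):
    """Serialize base editor definitions to CSV text after basic sanity checks."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["base_editor_type"] not in {"abe", "cbe"}:
            raise ValueError("base_editor_type must be 'abe' or 'cbe'")
        if int(row["editing_window_start"]) > int(row["editing_window_end"]):
            raise ValueError("editing window start must not exceed its end")
        writer.writerow(row)
    return buffer.getvalue()


csv_text = editors_to_csv(editors)
print(csv_text)
```

Saving `csv_text` to a file gives something that could be passed to --be-files, provided the assumed layout matches what AltEx-BE expects.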

List of command line options

| Short Option | Long Option | Argument | Explanation |
|---|---|---|---|
| -h | --help | | Show the help message and exit. |
| -v | --version | | Show the version of AltEx-BE. |
| | --ui | | Launch the Streamlit web UI for AltEx-BE. |
| -r | --refflat-path | FILE | (Required: either -r or -g) Path to the refFlat file. |
| -g | --gtf-path | FILE | (Required: either -r or -g) Path to the GTF file. |
| -f | --fasta-path | FILE | (Required) Path to the FASTA file. |
| -o | --output-dir | DIR | (Required) Directory for the output files. |
| | --gene-symbols | SYMBOL [SYMBOL ...] | A space-separated list of gene symbols of interest. |
| | --refseq-ids | ID [ID ...] | A space-separated list of RefSeq IDs of interest. |
| | --gene-file | FILE | Path to a CSV or TXT file containing your gene symbols or RefSeq IDs of interest. |
| | --run-all_genes | | Flag (store true). When given, AltEx-BE designs sgRNAs for all genes. |
| -a | --assembly-name | ASSEMBLY | (Required) The name of the genome assembly to use (e.g., hg38, mm39). |
| -n | --be-name | NAME | The name of the base editor to use. |
| -p | --be-pam | SEQUENCE | The PAM sequence for the base editor. |
| -s | --be-start | INTEGER | The start of the editing window for the base editor (1-indexed from the base next to the PAM). |
| -e | --be-end | INTEGER | The end of the editing window for the base editor (1-indexed from the base next to the PAM). |
| -t | --be-type | TYPE | The type of base editor (ABE or CBE). |
| | --be-files | FILE | Path to a CSV or TXT file containing information about one or more base editors. |

Format of AltEx-BE output

altex-be writes two output files to the directory you specified with the --output-dir option.

  • Summary sgRNA table (.csv)
    • This table contains information on the sgRNAs designed and ranked by AltEx-BE
    • The meaning of each column is:

| column name | meaning | remark |
|---|---|---|
| geneName | gene symbol of the target gene | |
| chrom | chromosome on which the target gene is located | |
| strand | strand of the target gene | |
| exonstart, exonend, exonlength | general information on the target exon | |
| coding | whether the target gene is a protein-coding or non-coding gene | |
| frame | length of the target exon modulo 3 | 0 = in-frame; 1 or 2 = out-frame |
| exon_position | relative location of the target exon in the target gene | "first", "internal", or "last" |
| uuid | unique ID for each sgRNA | changes in every run |
| exon_intron_boundary+-25bp_sequence | sequence around the splice acceptor (SA) or splice donor (SD) site | |
| sgrna_sequence | sgRNA sequence | thymine is not replaced by uracil |
| sgrna_target_pos_in_seq | position of the target A or C in the sgRNA | relative location within the sgRNA |
| sgrna_overlap_between_cds_and_editing_window | number of CDS bases that overlap the editing window | |
| sgrna_unintended_edited_base_count | number of bases (A or C) in the CDS that could be edited unintentionally | |
| sgrna_start/end_in_genome | genomic location of the sgRNA | |
| site type | splice site targeted by the sgRNA | acceptor or donor |
| sgrna_strand | strand of the sgRNA | |
| base_editor_name / pam_sequence / window_start or end / base editor type | information on the base editor used to design the sgRNA | |
| crispr_direct_url | link to CRISPRdirect | |
| pam+20bp exact match | number of exact matches of PAM+20bp (23-mer) across all chromosomes | |
| pam+12bp exact match | number of exact matches of PAM+12bp (12-mer) across all chromosomes | |
| sgrna_priority | rank of the sgRNA for each target exon | ranked by off-target specificity and GC content |
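
The relationship between the exonlength and frame columns can be checked with a few lines of Python. The sketch below parses a small in-memory CSV whose rows are invented for illustration; only the column names come from the table above, and the helper `frame_label` is hypothetical.

```python
import csv
import io

# Invented sample rows; only the column names follow the summary table.
SAMPLE = """geneName,exonlength,frame
MYGENE,120,0
MYGENE,97,1
"""


def frame_label(exon_length):
    """Classify an exon as in-frame when its length is a multiple of 3."""
    return "in-frame" if exon_length % 3 == 0 else "out-frame"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labels = [frame_label(int(row["exonlength"])) for row in rows]

for row, label in zip(rows, labels):
    print(row["geneName"], row["exonlength"], row["frame"], label)
```

Skipping an out-frame exon shifts the downstream reading frame, so this column is worth checking when the aim is to disrupt or preserve the coding sequence.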

sgRNA Prioritization

To facilitate the selection of optimal sgRNAs for experimental validation, AltEx-BE ranks sgRNAs for each target splice site based on predicted off-target binding specificity. The tool prioritizes sgRNAs primarily by the number of exact 20-nucleotide matches (PAM + 20bp) across the genome, selecting those with the fewest potential off-target sites. For sgRNAs with equivalent off-target profiles, GC content is considered as a secondary criterion, favoring sgRNAs within the optimal range of 40-60%. In rare cases where multiple sgRNAs remain equivalent, extended off-target matches (PAM + 12bp) and the number of editable bases within the CDS region are used as final tiebreakers.
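
The ranking described above can be approximated with a tuple sort key, as in the following minimal sketch. It is not AltEx-BE's actual code: the `Candidate` fields are hypothetical names, the sample values are invented, and the order of the two final tiebreakers (PAM+12bp matches first, then editable CDS bases) is an assumption based on the order in which they are listed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str          # sgRNA sequence (DNA alphabet)
    pam20_hits: int        # exact PAM+20bp matches in the genome
    pam12_hits: int        # exact PAM+12bp matches in the genome
    unintended_edits: int  # editable A or C bases within the CDS


def gc_fraction(sequence):
    """Fraction of G and C bases in the sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def priority_key(candidate):
    """Lower tuples rank first: fewest PAM+20bp hits, GC in 40-60%, then tiebreakers."""
    gc_penalty = 0 if 0.40 <= gc_fraction(candidate.sequence) <= 0.60 else 1
    return (
        candidate.pam20_hits,
        gc_penalty,
        candidate.pam12_hits,
        candidate.unintended_edits,
    )


# Invented candidates, for illustration only.
candidates = [
    Candidate("ATGCATGCATGCATGCATGC", pam20_hits=2, pam12_hits=5, unintended_edits=1),
    Candidate("GCGCGCGCGCGCGCGCGCGC", pam20_hits=1, pam12_hits=3, unintended_edits=0),
    Candidate("ACGTACGTACGTACGTACGT", pam20_hits=1, pam12_hits=9, unintended_edits=3),
]

ranked = sorted(candidates, key=priority_key)
for rank, candidate in enumerate(ranked, start=1):
    print(rank, candidate.sequence)
```

In this example the third candidate ranks first: it ties with the second on PAM+20bp matches but has a GC content inside the preferred range, which outweighs its higher PAM+12bp count.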

  • BED file for UCSC custom track (.bed)
    • This BED file can be loaded as a UCSC custom track; you can upload it on this webpage
    • Colored boxes (red, blue) are sgRNA sequences: red marks sgRNAs for ABE, blue marks sgRNAs for CBE
    • The score column in the BED file is the off-target count for 20bp+PAM
    • When you upload the BED file, choose the correct assembly name on the above website

License
