
Python package to annotate and visualize gene fusions.

Project description


Annotate Gene Fusion (AGFusion)

Check out the web app: https://www.agfusion.app

AGFusion (pronounced 'A G Fusion') is a Python package for annotating gene fusions in the human or mouse genome. As input, AGFusion needs only the reference genome, the two gene partners, and the fusion junction coordinates. It outputs the following:

  • FASTA files of cDNA, CDS, and protein sequences.
  • Visualizations of the protein domain and exon architectures of the fusion transcripts.
  • Tables listing the coordinates of the protein features and exons included in the fusion.
  • Optional exon structure and protein domain visualizations of the wild-type fusion gene partners.

Some other things to know:

  • AGFusion automatically predicts the functional effect of the gene fusion (e.g. in-frame, out-of-frame, etc.).
  • By default, only canonical gene isoforms are annotated, but there is an option to annotate all combinations of non-canonical isoforms.
  • All gene and protein annotation comes from Ensembl.
  • Ensembl releases up to 115 are supported.
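The in-frame versus out-of-frame call rests on codon phase arithmetic. The following is a minimal conceptual sketch, not AGFusion's implementation: it assumes you already know how many coding nucleotides of the 5' partner are kept before the junction and how many coding nucleotides of the 3' partner lie upstream of the junction (and are therefore dropped). The function name predict_frame is hypothetical, and the additional categories AGFusion reports are not modeled here.

```python
def predict_frame(kept_cds_nt_5prime: int, skipped_cds_nt_3prime: int) -> str:
    """Classify a fusion junction by codon phase (hypothetical helper).

    kept_cds_nt_5prime: coding nucleotides of the 5' partner retained
        upstream of the junction, counted from its start codon.
    skipped_cds_nt_3prime: coding nucleotides of the 3' partner that lie
        upstream of the junction and are therefore absent from the fusion.
    """
    if kept_cds_nt_5prime < 0 or skipped_cds_nt_3prime < 0:
        raise ValueError("nucleotide counts must be non-negative")
    # The 3' partner's codons are read correctly only if the 5' portion stops
    # at the same position within a codon (0, 1 or 2) at which the retained
    # 3' portion starts, so the two partial codons complete each other.
    if kept_cds_nt_5prime % 3 == skipped_cds_nt_3prime % 3:
        return "in-frame"
    return "out-of-frame"


print(predict_frame(300, 150))  # in-frame: both junctions fall on codon boundaries
print(predict_frame(301, 151))  # in-frame: 1 nt + 2 nt form one complete codon
print(predict_frame(300, 151))  # out-of-frame: the phases differ
```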


Installation

Step 1: Install AGFusion.

pip install agfusion

Step 2: Download your desired pyensembl reference genome database. For example:

For GRCh38/hg38:
pyensembl install --species homo_sapiens --release 115

For GRCh37/hg19:
pyensembl install --species homo_sapiens --release 75

For GRCm38/mm10:
pyensembl install --species mus_musculus --release 87

Step 3: Finally, download your desired AGFusion database.

For GRCh38/hg38:
agfusion download -g hg38

For GRCh37/hg19:
agfusion download -g hg19

For GRCm38/mm10:
agfusion download -g mm10

You can view all supported species and Ensembl releases with agfusion download -a.
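If you set up several builds or automate installation, Steps 2 and 3 can be generated from one table. This is a minimal sketch: the BUILDS mapping simply restates the species and releases listed above, setup_commands and install are hypothetical helper names, and setup_agfusion.py is a hypothetical script name. It only calls the pyensembl install and agfusion download commands shown in the steps.

```python
import shlex
import subprocess
import sys
from typing import List

# Species and Ensembl release per build, as listed in Steps 2 and 3 above.
BUILDS = {
    "hg38": ("homo_sapiens", 115),
    "hg19": ("homo_sapiens", 75),
    "mm10": ("mus_musculus", 87),
}


def setup_commands(build: str) -> List[List[str]]:
    """Return the pyensembl and AGFusion download commands for one build."""
    if build not in BUILDS:
        raise ValueError(f"unknown build {build!r}; expected one of {sorted(BUILDS)}")
    species, release = BUILDS[build]
    return [
        ["pyensembl", "install", "--species", species, "--release", str(release)],
        ["agfusion", "download", "-g", build],
    ]


def install(build: str) -> None:
    """Print and run each command, stopping at the first failure."""
    for cmd in setup_commands(build):
        print(" ".join(shlex.quote(part) for part in cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    install(sys.argv[1])  # e.g. python setup_agfusion.py hg38
```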

Dependencies

  • Python 3.7 or higher
  • Python package dependencies are listed in requirements.txt.

Examples

Basic Usage

You just need to provide the two fusion gene partners (gene symbol, Ensembl ID, or Entrez gene ID), their predicted fusion junctions in genomic coordinates, and the AGFusion database for your genome build (-db). You can also specify particular transcripts by Ensembl transcript ID or RefSeq ID.

Example usage from the command line:

agfusion annotate \
  --gene5prime DLG1 \
  --gene3prime BRAF \
  --junction5prime 31684294 \
  --junction3prime 39648486 \
  -db agfusion.mus_musculus.87.db \
  -o DLG1-BRAF

[Figure: protein domain structure of the DLG1-BRAF fusion]

[Figure: exon structure of the DLG1-BRAF fusion]
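To annotate several fusions from a script, you can build the same agfusion annotate call programmatically. A minimal sketch, assuming agfusion is on your PATH: annotate_command and annotate_all are hypothetical helpers that use only the flags shown in the example above. Passing the arguments as a list to subprocess.run avoids shell quoting issues, and optional flags such as --WT or --noncanonical can be passed through extra.

```python
import subprocess
from typing import List, Sequence, Tuple

Fusion = Tuple[str, str, int, int]  # (5' gene, 3' gene, 5' junction, 3' junction)


def annotate_command(fusion: Fusion, db: str, extra: Sequence[str] = ()) -> List[str]:
    """Build an `agfusion annotate` argument list for one fusion."""
    gene5, gene3, junction5, junction3 = fusion
    cmd = [
        "agfusion", "annotate",
        "--gene5prime", gene5,
        "--gene3prime", gene3,
        "--junction5prime", str(junction5),
        "--junction3prime", str(junction3),
        "-db", db,
        "-o", f"{gene5}-{gene3}",  # one output directory per fusion
    ]
    cmd.extend(extra)
    return cmd


def annotate_all(fusions: Sequence[Fusion], db: str, extra: Sequence[str] = (),
                 dry_run: bool = True) -> List[List[str]]:
    """Return the commands and, unless dry_run is set, execute them in order."""
    commands = [annotate_command(f, db, extra) for f in fusions]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


cmds = annotate_all([("DLG1", "BRAF", 31684294, 39648486)], "agfusion.mus_musculus.87.db")
print(cmds[0])
```

Set dry_run=False once the printed commands look right.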

Plotting wild-type protein and exon structure

You can additionally plot the wild-type protein and exon structures of each gene with the --WT flag.

agfusion annotate \
   -g5 ENSMUSG00000022770 \
   -g3 ENSMUSG00000002413 \
   -j5 31684294 \
   -j3 39648486 \
   -db agfusion.mus_musculus.87.db \
   -o DLG1-BRAF \
   --WT

Canonical gene isoforms

By default, AGFusion only plots the canonical gene isoforms, but you can include non-canonical isoforms with the --noncanonical flag.

agfusion annotate \
  -g5 ENSMUSG00000022770 \
  -g3 ENSMUSG00000002413 \
  -j5 31684294 \
  -j3 39648486 \
  -db agfusion.mus_musculus.87.db \
  -o DLG1-BRAF \
  --noncanonical
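With --noncanonical, the number of annotated transcript pairs grows multiplicatively, because every eligible 5' isoform can pair with every eligible 3' isoform. The sketch below illustrates that counting under one stated assumption: an isoform is treated as eligible only when its genomic span contains the junction. AGFusion's actual filtering may differ. The transcript IDs and coordinates are made-up placeholders, and candidate_pairs is a hypothetical name.

```python
from itertools import product
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (genomic start, genomic end), inclusive


def candidate_pairs(tx5: Dict[str, Span], junction5: int,
                    tx3: Dict[str, Span], junction3: int) -> List[Tuple[str, str]]:
    """Pair every eligible 5' isoform with every eligible 3' isoform."""
    # Assumption for illustration: an isoform is usable only if the junction
    # lies within its genomic span.
    keep5 = [tx for tx, (start, end) in tx5.items() if start <= junction5 <= end]
    keep3 = [tx for tx, (start, end) in tx3.items() if start <= junction3 <= end]
    return list(product(keep5, keep3))


# Placeholder isoforms and coordinates, not real Ensembl data.
gene5_isoforms = {"tx5_A": (100, 500), "tx5_B": (100, 200), "tx5_C": (50, 900)}
gene3_isoforms = {"tx3_A": (1000, 2000), "tx3_B": (1500, 3000)}
pairs = candidate_pairs(gene5_isoforms, 300, gene3_isoforms, 1600)
print(len(pairs), pairs)
```

Here two 5' isoforms and two 3' isoforms survive the filter, giving four combinations to annotate instead of one.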

Input from fusion-finding algorithms

You can also use the output files of fusion-finding algorithms as input. Currently supported algorithms are:

Below is an example for FusionCatcher.

agfusion batch \
  -f final-list_candidate-fusion-genes.txt \
  -a fusioncatcher \
  -o test \
  -db agfusion.mus_musculus.87.db

Graphical parameters

You can change domain colors and names with the --recolor and --rename options. Each takes a "DOMAIN;VALUE" pair and can be repeated:

agfusion annotate \
  -g5 ENSMUSG00000022770 \
  -g3 ENSMUSG00000002413 \
  -j5 31684294 \
  -j3 39648486 \
  -db agfusion.mus_musculus.87.db \
  -o DLG1-BRAF \
  --recolor "Pkinase_Tyr;red" --recolor "L27_1;blue" \
  --rename "Pkinase_Tyr;Kinase" --rename "L27_1;L27"

[Figure: DLG1-BRAF fusion with recolored and renamed domains]
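Each --recolor or --rename value pairs a domain name with a new color or label, separated by a semicolon. That semicolon is why the values are quoted in the shell example. The following sketch converts Python dictionaries into those arguments and parses a single value back. domain_style_args and parse_domain_option are hypothetical helpers written for this example, not part of AGFusion's API.

```python
from typing import Dict, List, Tuple


def domain_style_args(recolor: Dict[str, str], rename: Dict[str, str]) -> List[str]:
    """Turn {domain: value} mappings into repeated --recolor/--rename arguments."""
    args: List[str] = []
    for domain, color in recolor.items():
        args += ["--recolor", f"{domain};{color}"]
    for domain, label in rename.items():
        args += ["--rename", f"{domain};{label}"]
    return args


def parse_domain_option(value: str) -> Tuple[str, str]:
    """Split one 'DOMAIN;VALUE' string, rejecting malformed input."""
    domain, sep, new_value = value.partition(";")
    if not sep or not domain or not new_value:
        raise ValueError(f"expected 'DOMAIN;VALUE', got {value!r}")
    return domain, new_value


args = domain_style_args(
    recolor={"Pkinase_Tyr": "red", "L27_1": "blue"},
    rename={"Pkinase_Tyr": "Kinase", "L27_1": "L27"},
)
print(args)
```

The resulting list can be appended to an agfusion annotate argument list passed to subprocess.run, where no shell quoting is needed.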

You can rescale the protein length with --scale so that images of two different fusions have appropriate relative lengths when plotted side by side:

agfusion annotate \
  -g5 ENSMUSG00000022770 \
  -g3 ENSMUSG00000002413 \
  -j5 31684294 \
  -j3 39648486 \
  -db agfusion.mus_musculus.87.db \
  -o DLG1-BRAF \
  --recolor "Pkinase_Tyr;red" --recolor "L27_1;blue" \
  --rename "Pkinase_Tyr;Kinase" --rename "L27_1;L27" \
  --scale 2000

agfusion annotate \
  -g5 FGFR2 \
  -g3 DNM3 \
  -j5 130167703 \
  -j3 162019992 \
  -db agfusion.mus_musculus.87.db \
  -o FGFR2-DNM3 \
  --recolor "Pkinase_Tyr;red" \
  --rename "Pkinase_Tyr;Kinase" \
  --scale 2000

[Figures: DLG1-BRAF and FGFR2-DNM3 fusions plotted with --scale 2000]
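One way to think about --scale is sketched below. It assumes that the option sets the protein length, in amino acids, spanning the full width of the domain plot; this reading is inferred from the description above and not confirmed against AGFusion's plotting code. Under that model each fusion occupies length / scale of the width, so two plots made with the same --scale are directly comparable. plot_fraction is a hypothetical helper and the protein lengths are placeholders.

```python
from typing import Optional


def plot_fraction(protein_length_aa: int, scale_aa: Optional[int] = None) -> float:
    """Fraction of the plot width a protein occupies (illustrative model only).

    Without a scale, each protein is stretched to the full width, which is
    why unscaled plots of different fusions are not comparable.
    """
    reference = protein_length_aa if scale_aa is None else scale_aa
    if protein_length_aa > reference:
        # This model simply rejects a scale shorter than the protein.
        raise ValueError("scale should be at least as long as the protein")
    return protein_length_aa / reference


# Placeholder lengths for two hypothetical fusion proteins drawn with --scale 2000.
for name, length in [("fusion_1", 1200), ("fusion_2", 800)]:
    print(name, plot_fraction(length), plot_fraction(length, 2000))
```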

Building your own database

In addition to data from pyensembl, AGFusion uses a pre-built SQLite database to annotate gene fusions. The pre-built SQLite databases are stored on AWS S3.

Follow the steps below if you want to build your own SQLite database:

(1) Install mysqlclient.

(2) Download and unzip the PFAM reference file: https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz

(3) Install your desired pyensembl reference genome. For example: pyensembl install --species homo_sapiens --release 111.

(4) Build the AGFusion database: agfusion build -d . -s homo_sapiens -r 111 --pfam Pfam-A.clans.tsv
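After downloading or building a database, you can check what it contains with Python's built-in sqlite3 module. This sketch makes no assumptions about AGFusion's table layout: it lists whatever tables exist and counts their rows. summarize_db is a hypothetical helper. The file is opened read-only, so a mistyped path raises an error instead of silently creating an empty database.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict


def summarize_db(db_path: str) -> Dict[str, int]:
    """Return {table name: row count} for every table in a SQLite file."""
    # mode=ro opens the file read-only and fails if it does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, summarize_db("agfusion.mus_musculus.87.db") would report the tables and row counts of the downloaded mouse database.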

Troubleshooting

(1) Problem: I get a warning message like the following:

2017-08-28 15:02:51,377 - AGFusion - WARNING - No cDNA sequence available for AC073283.4! Will not print cDNA sequence for the AC073283.4-MSH2 fusion. You might be working with an outdated pyensembl. Update the package and rerun 'pyensembl install'

Solution: Run the following to update the pyensembl package and its database:

git clone git@github.com:hammerlab/pyensembl.git
cd pyensembl
sudo pip install .
pyensembl install --release (your-release) --species (your-species)

(2) Problem: Cannot run agfusion download due to URLError. When downloading the database you may run into this error:

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>

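Solution: A potential solution for macOS users is to run the Install Certificates script that ships with the python.org installer (adjust the version in the path to match your Python installation):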

/Applications/Python\ 3.9/Install\ Certificates.command
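To check whether your Python can find a CA certificate store at all before retrying agfusion download, you can inspect the default verification paths with the standard ssl module. This is a diagnostic sketch: has_default_ca_store is a hypothetical helper, and a True result does not guarantee that the store contains the certificate the download server needs.

```python
import ssl


def has_default_ca_store() -> bool:
    """True if OpenSSL's default CA file or directory exists for this Python."""
    paths = ssl.get_default_verify_paths()
    # cafile/capath are None when the compiled-in default locations are missing,
    # which is typically the case for a python.org macOS install before the
    # Install Certificates script has been run.
    return paths.cafile is not None or paths.capath is not None


paths = ssl.get_default_verify_paths()
print("OpenSSL expects a CA file at:", paths.openssl_cafile)
print("CA store found:", has_default_ca_store())
```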

Releasing a New Version

AGFusion uses bump2version to manage versioning. It automatically updates the version in setup.py, commits the change, and creates a git tag. Pushing a tag triggers the PyPI deployment workflow automatically.

Install bump2version:

pip install bump2version

Stable releases

# Patch release (e.g. 1.4.3 -> 1.4.4)
bump2version patch

# Minor release (e.g. 1.4.3 -> 1.5.0)
bump2version minor

# Major release (e.g. 1.4.3 -> 2.0.0)
bump2version major
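For plain MAJOR.MINOR.PATCH versions, the three bump commands follow simple arithmetic: the chosen part is incremented and every lower part resets to zero. The sketch below reproduces the examples above. bump is a hypothetical function, not bump2version's code. Note that this simple parser rejects a pre-release string such as 1.5.0a1; the pre-release workflow below instead sets the exact version with --new-version.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version string."""
    # Raises ValueError for anything that is not three integer components.
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part {part!r}; use major, minor or patch")


for part in ("patch", "minor", "major"):
    print(part, "1.4.3 ->", bump("1.4.3", part))
```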

Pre-releases (alpha / beta / release candidate)

Use --new-version to set the exact pre-release version following PEP 440 conventions (a = alpha, b = beta, rc = release candidate):

# Start an alpha (e.g. next minor alpha)
bump2version --new-version 1.6.0a1 minor

# Increment alpha number
bump2version --new-version 1.6.0a2 minor

# Promote to beta
bump2version --new-version 1.6.0b1 minor

# Promote to release candidate
bump2version --new-version 1.6.0rc1 minor

# Final stable release
bump2version --new-version 1.6.0 minor

Pre-releases are published to PyPI and are installable with:

pip install --pre agfusion
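PEP 440 orders pre-releases before the final release they lead up to (a < b < rc < final), and pip skips pre-releases unless --pre is given. The sketch below models only the version shapes used in this section (X.Y.Z with an optional aN, bN or rcN suffix); sort_key and is_prerelease are hypothetical helpers. For full PEP 440 handling, the packaging library's packaging.version.Version class is the usual choice.

```python
import re
from typing import Tuple

# Only X.Y.Z[aN|bN|rcN]; full PEP 440 also allows epochs, post and dev releases.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")
_PHASE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after all of its pre-releases


def sort_key(version: str) -> Tuple[int, int, int, int, int]:
    """Key that orders versions the way PEP 440 does for this subset."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string {version!r}")
    major, minor, patch, phase, number = match.groups()
    base = (int(major), int(minor), int(patch))
    if phase is None:
        return base + (_FINAL_RANK, 0)
    return base + (_PHASE_RANK[phase], int(number))


def is_prerelease(version: str) -> bool:
    return sort_key(version)[3] != _FINAL_RANK


releases = ["1.6.0", "1.6.0rc1", "1.6.0a2", "1.6.0b1", "1.6.0a1", "1.5.0a1"]
print(sorted(releases, key=sort_key))
```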

Push to trigger deployment

After bumping, push the commit and tag to GitHub; the CI workflow then publishes to PyPI automatically:

git push && git push --tags

License

MIT license

Citing AGFusion

You can cite the bioRxiv preprint: http://dx.doi.org/10.1101/080903



Download files

Download the file for your platform.

Source Distribution

agfusion-1.5.0a1.tar.gz (38.9 kB)

Uploaded Source

Built Distribution


agfusion-1.5.0a1-py3-none-any.whl (33.6 kB)

Uploaded Python 3

File details

Details for the file agfusion-1.5.0a1.tar.gz.

File metadata

  • Download URL: agfusion-1.5.0a1.tar.gz
  • Upload date:
  • Size: 38.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agfusion-1.5.0a1.tar.gz
Algorithm Hash digest
SHA256 6b5c1c2eb8f32843c29be47af60bd2debab46b686aa9ac175ba314975c8f8393
MD5 959268f6256f301402b9c889b119bcf6
BLAKE2b-256 07bdb70762860b0897dbd3214677f6d3951079a74644b991e9459f79eea5bc38


File details

Details for the file agfusion-1.5.0a1-py3-none-any.whl.

File metadata

  • Download URL: agfusion-1.5.0a1-py3-none-any.whl
  • Upload date:
  • Size: 33.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agfusion-1.5.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 49e87c1f72db210ca3d507d2cd5f0b17b116fb8bde89c9c5999f8ab530b12aec
MD5 75f93833b00fa8ee14f0634ae6301170
BLAKE2b-256 7a1a17890ca96e6969de6c24f307ce84bd32c36759d35288924a435187195949
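To confirm that a downloaded file matches the SHA256 digests listed above, you can hash it locally with Python's hashlib. A minimal sketch: EXPECTED_SHA256 copies the two digests from this page, and sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "agfusion-1.5.0a1.tar.gz":
        "6b5c1c2eb8f32843c29be47af60bd2debab46b686aa9ac175ba314975c8f8393",
    "agfusion-1.5.0a1-py3-none-any.whl":
        "49e87c1f72db210ca3d507d2cd5f0b17b116fb8bde89c9c5999f8ab530b12aec",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Checks any of the two files present in the current directory.
    for name, expected in EXPECTED_SHA256.items():
        if Path(name).is_file():
            print(name, "OK" if verify(Path(name), expected) else "MISMATCH")
```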

