Intron classification tool for identifying U2-type and U12-type introns using a support vector machine (SVM)

intronIC (intron Interrogator and Classifier)

intronIC is a bioinformatics tool for extracting and classifying intron sequences as U12-type (minor) or U2-type (major) using a support vector machine trained on position-weight matrix scores.


Quick Start

Installation

pip install intronIC

Basic Usage

# Classify introns (default model loaded automatically)
intronIC -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8

# Extract sequences only (no classification)
intronIC extract -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8

# Train a custom model (optional - most users don't need this)
intronIC train -n my_model -p 8
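
For scripted pipelines, the default classification command above can be assembled from Python. This is a minimal sketch, not part of intronIC itself: the helper name `build_classify_command` is hypothetical, and it only reuses the `-g`, `-a`, `-n`, and `-p` flags shown in the commands above.

```python
import shlex


def build_classify_command(genome, annotation, name, processes=8):
    """Assemble the argument list for a default classification run.

    Hypothetical helper: it only mirrors the flags shown in Basic Usage.
    """
    return [
        "intronIC",
        "-g", genome,
        "-a", annotation,
        "-n", name,
        "-p", str(processes),
    ]


cmd = build_classify_command("genome.fa.gz", "annotation.gff3.gz", "species_name")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when file paths contain spaces.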

Test Run

# Quick installation test using bundled test data
intronIC test -p 4

# Or show where test data is located
intronIC test --show-only

Documentation

  • Changelog - Release notes and version history

For complete documentation, see the intronIC Wiki.


What's New in v2.2

  • New 8D RBF SVM default model trained on expanded reference data (472 U12 + 30,155 U2 introns)
  • Five new classification features: branch point offset, BPS motif sharpness, polypyrimidine tract metrics, and multi-site support scoring
  • Reduced false positives: 0 confident false calls in C. elegans (previously 2) and 1 in Ascaris (previously 47)
  • See CHANGELOG.md for full release history

Key Features

  • RBF SVM classification with probability scores (0-100%) using 8 sequence-derived features
  • Default pretrained model loaded automatically; works for virtually all species
  • Streaming mode (default) for ~85% memory reduction on large genomes
  • Parallel processing for improved performance (-p 8 recommended)
  • Fast runtimes: ~6-10 minutes for the human genome with default settings
  • Comprehensive metadata including phase, position, parent gene/transcript

Scientific Background

Most eukaryotic introns (~99.5%) are spliced by the major (U2-type) spliceosome, while a small fraction (~0.5%) are spliced by the minor (U12-type) spliceosome. U12-type introns have:

  • Highly conserved TCCTTAAC branch point motif
  • Terminal dinucleotides: AT-AC (~25%) or GT-AG (~75%)
  • Functional importance and evolutionary conservation
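
The sequence features listed above can be illustrated with a short sketch. It is a deliberate simplification, not how intronIC works: it uses exact string matching where intronIC scores position-weight matrices, and the function names, the `search_window` parameter, and the toy intron are all hypothetical.

```python
BP_MOTIF = "TCCTTAAC"  # branch point consensus quoted in the list above


def terminal_dinucleotides(intron):
    """Return the first and last two bases of the intron, e.g. 'AT-AC'."""
    seq = intron.upper()
    return f"{seq[:2]}-{seq[-2:]}"


def find_bp_motif(intron, search_window=40):
    """Distance from the motif start to the 3' end, or None if absent.

    Only the last `search_window` bases are searched; this window size
    is an illustrative assumption, not an intronIC setting.
    """
    seq = intron.upper()
    start = max(0, len(seq) - search_window)
    idx = seq.rfind(BP_MOTIF, start)
    return None if idx == -1 else len(seq) - idx


# Toy AT-AC intron with the consensus motif near its 3' end.
toy = "ATATCCTT" + "GCA" * 5 + "TCCTTAAC" + "TTCTCTTTCAC"
print(terminal_dinucleotides(toy))  # AT-AC
print(find_bp_motif(toy))           # 19
```

Real U12-type introns often carry near-matches to the consensus rather than the exact string, which is one reason a PWM-based score is more informative than this kind of exact search.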

intronIC identifies U12-type introns using:

  1. PWM Scoring: Apply position-weight matrices to the 5' splice site, branch point, and 3' splice site regions
  2. Normalization: Convert raw scores to z-scores via robust scaling
  3. Feature Engineering: Compute composite features (multi-site corroboration, BP position, PPT metrics, BPS motif sharpness)
  4. SVM Classification: RBF SVM ensemble with balanced class weights outputs probability scores
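
Steps 1 and 2 can be sketched in a few lines of Python. This is a sketch under stated assumptions, not intronIC's implementation: the log-odds form of the PWM score, the uniform background frequency, and the median/interquartile-range form of robust scaling are assumptions, and the toy matrix is invented for illustration.

```python
import math
from statistics import median, quantiles


def pwm_log_odds(seq, pwm, background=0.25):
    """Sum of log2(p / background) over aligned positions.

    `pwm` is a list of per-position dicts mapping base -> frequency.
    The uniform background of 0.25 is an assumption.
    """
    if len(seq) != len(pwm):
        raise ValueError("sequence and PWM must have the same length")
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))


def robust_z(values):
    """Center on the median and scale by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    med = median(values)
    return [(v - med) / iqr for v in values]


# Toy two-position matrix favouring the dinucleotide "AT" (invented values).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
raw = [pwm_log_odds(s, toy_pwm) for s in ("AT", "AG", "CT", "CG", "GA")]
print(robust_z(raw))
```

Median and interquartile range are less sensitive to outliers than mean and standard deviation, which matters when the rare U12-type introns form a long tail in the raw score distribution.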

For a detailed description of the algorithm, see the Technical Details wiki page.
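
To give a sense of what "balanced class weights" and "RBF" mean in step 4, the sketch below computes the common balanced-weight heuristic (total samples divided by the number of classes times the class count) for the training set sizes quoted in What's New, together with a plain RBF kernel. Whether intronIC uses exactly this weighting formula is an assumption, and the `gamma` value is a placeholder rather than a tuned model parameter.

```python
import math


def balanced_class_weights(counts):
    """weight_c = n_samples / (n_classes * n_c), a common 'balanced' heuristic."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Reference set sizes quoted in "What's New in v2.2".
weights = balanced_class_weights({"U12": 472, "U2": 30155})
print(weights)  # roughly 32.4 for U12 and 0.51 for U2

# Two identical 8-feature vectors have kernel similarity 1.0.
print(rbf_kernel([0.0] * 8, [0.0] * 8, gamma=0.5))
```

With these weights each class contributes the same total weight to the training objective, so the roughly 64-fold excess of U2-type examples does not swamp the U12-type class.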


Citation

If you use intronIC in your research, please cite:

Devlin C Moyer, Graham E Larue, Courtney E Hershberger, Scott W Roy, Richard A Padgett. Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research, Volume 48, Issue 13, 27 July 2020, Pages 7066–7078. https://doi.org/10.1093/nar/gkaa464


Support


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

git clone https://github.com/glarue/intronIC.git
cd intronIC
make install    # Set up development environment
make test       # Run tests

License

intronIC is released under the GNU General Public License v3.0.
