Intron classification tool for identifying U2-type and U12-type introns using SVM
intronIC - (intron Interrogator and Classifier)
intronIC is a bioinformatics tool for extracting and classifying intron sequences as U12-type (minor) or U2-type (major) using a support vector machine trained on position-weight matrix scores.
Quick Start
Installation
pip install intronIC
Basic Usage
# Classify introns (default model loaded automatically)
intronIC -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8
# Extract sequences only (no classification)
intronIC extract -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8
# Train a custom model (optional - most users don't need this)
intronIC train -n my_model -p 8
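When the classifier is driven from a Python pipeline rather than a shell, the default command above can be assembled as an argument list. This is a minimal sketch: the helper names `build_classify_command` and `run_classification` are hypothetical, and only the flags shown above (`-g`, `-a`, `-n`, `-p`) are used.

```python
import shlex
import subprocess


def build_classify_command(genome, annotation, species, processes=8):
    """Build the argv list for the default classification run shown above.

    Hypothetical helper: it only mirrors the documented flags.
    """
    return [
        "intronIC",
        "-g", str(genome),
        "-a", str(annotation),
        "-n", str(species),
        "-p", str(processes),
    ]


def run_classification(genome, annotation, species, processes=8):
    """Run intronIC and raise CalledProcessError on a non-zero exit status."""
    cmd = build_classify_command(genome, annotation, species, processes)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.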
Test Run
# Quick installation test using bundled test data
intronIC test -p 4
# Or show where test data is located
intronIC test --show-only
Documentation
- Changelog - Release notes and version history
For complete documentation, see the intronIC Wiki:
- Quick Start Guide - Installation, dependencies, resource usage
- Overview - Classification approach and scientific background
- Usage Info - Complete CLI reference
- Output Files - File formats and interpretation
- Technical Details - Algorithm and ML architecture
- Example Usage - Common workflows
- About - Background and motivation
What's New in v2.2
- New 8D RBF SVM default model trained on expanded reference data (472 U12 + 30,155 U2 introns)
- Five new classification features: branch point offset, BPS motif sharpness, polypyrimidine tract metrics, and multi-site support scoring
- Reduced false positives: 0 confident false calls in C. elegans (was 2), 1 in Ascaris (was 47)
- See CHANGELOG.md for full release history
Key Features
- RBF SVM classification with probability scores (0-100%) using 8 sequence-derived features
- Default pretrained model loaded automatically — works for virtually all species
- Streaming mode (default) for ~85% memory reduction on large genomes
- Parallel processing for improved performance (-p 8 recommended)
- Fast runtimes: ~6-10 minutes for human genome with default settings
- Comprehensive metadata including phase, position, parent gene/transcript
Scientific Background
Most eukaryotic introns (~99.5%) are spliced by the major (U2-type) spliceosome, while a small fraction (~0.5%) are spliced by the minor (U12-type) spliceosome. U12-type introns have:
- Highly conserved TCCTTAAC branch point motif
- Terminal dinucleotides: AT-AC (~25%) or GT-AG (~75%)
- Functional importance and evolutionary conservation
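The sequence hallmarks listed above can be illustrated with a naive exact-match check. This is only a sketch and not how intronIC scores introns (it uses PWMs and an SVM rather than exact matching). The function names and the 40-base search window are assumptions made for the example.

```python
# Illustrative only: exact-match checks, not intronIC's scoring method.
BP_MOTIF = "TCCTTAAC"  # branch point consensus quoted above


def terminal_dinucleotides(intron):
    """Return the intron's boundary dinucleotides, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 bases long")
    return f"{seq[:2]}-{seq[-2:]}"


def find_branch_point_motif(intron, window=40):
    """Return the 0-based index of the consensus motif, or -1 if absent.

    Only the last `window` bases are searched; the window size is an
    arbitrary assumption for this example.
    """
    seq = intron.upper()
    start = max(0, len(seq) - window)
    return seq.find(BP_MOTIF, start)


# Made-up intron with AT-AC termini and the consensus near its 3' end.
example = "ATATCCTTT" + "G" * 20 + "TCCTTAAC" + "TTTTTTTTTTAC"
print(terminal_dinucleotides(example), find_branch_point_motif(example))
```

Real U12-type branch points often deviate from the consensus, which is one reason a scored, probabilistic approach is more useful than exact matching.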
intronIC identifies U12-type introns using:
- PWM Scoring: Apply position-weight matrices to 5' splice site, branch point, and 3' splice site regions
- Normalization: Convert raw scores to z-scores via robust scaling
- Feature Engineering: Compute composite features (multi-site corroboration, BP position, PPT metrics, BPS motif sharpness)
- SVM Classification: RBF SVM ensemble with balanced class weights outputs probability scores
For detailed algorithm description, see the Technical Details wiki page.
Citation
If you use intronIC in your research, please cite:
Devlin C Moyer, Graham E Larue, Courtney E Hershberger, Scott W Roy, Richard A Padgett. Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research, Volume 48, Issue 13, 27 July 2020, Pages 7066–7078. https://doi.org/10.1093/nar/gkaa464
Support
- Documentation: intronIC Wiki
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
git clone https://github.com/glarue/intronIC.git
cd intronIC
make install # Set up development environment
make test # Run tests
License
intronIC is released under the GNU General Public License v3.0.