A lightweight memory first, model agnostic version of SuPreMo
Project description
supremo_lite
A lightweight memory-first, model-agnostic version of SuPreMo.
Key Features
- 🧬 Personalized Genome Generation: Apply variants from VCF files to reference genomes
- 🎯 Variant-Centered Sequences: Generate sequence windows around variants
- ✂️ PAM Site Analysis: Identify variants that disrupt CRISPR PAM sites
- 🧪 Saturation Mutagenesis: Systematic single-nucleotide mutations at every position for predictive modeling
- 🔧 Memory Efficient: Chunked processing for large VCF files
- 🗺️ Chromosome Matching: Optional handling of chromosome naming differences (chr1 ↔ 1, chrM ↔ MT) via
auto_map_chromosomes=True - ⚡ PyTorch Integration: Automatic tensor support when PyTorch is available
Installation
Install from GitHub (Recommended)
For the latest features and bug fixes:
# Install directly latest release
pip install supremo_lite
# Or install a specific version/tag
pip install git+https://github.com/gladstone-institutes/supremo_lite.git@v0.5.0
# Or install from a specific branch
pip install git+https://github.com/gladstone-institutes/supremo_lite.git@main
Dependencies
Required dependencies will be installed automatically:
pandas- For VCF data handlingnumpy- For numerical operationspyfaidx- For FASTA file reading
Optional dependencies:
torch- For PyTorch tensor support (automatically detected)- https://github.com/gladstone-institutes/brisket - Cython powered faster 1 hot encoding for DNA sequences (automatically detected)
Quick Start
import supremo_lite as sl
from pyfaidx import Fasta
# Load reference genome and variants
reference = Fasta('hg38.fa')
variants = sl.read_vcf('variants.vcf')
DNA Sequence Encoding
supremo_lite uses one-hot encoding by default:
A=[1,0,0,0],C=[0,1,0,0],G=[0,0,1,0],T=[0,0,0,1]- Ambiguous bases =
[0,0,0,0] - Returns PyTorch tensors when available, otherwise NumPy arrays
Personalized Genome Generation
# Apply variants to create personalized genome
personal_genome = sl.get_personal_genome(
reference_fn=reference,
variants_fn=variants,
encode=True, # One-hot encoded (or False for strings)
chunk_size=10000, # Process 10k variants at a time
verbose=True # Show progress
)
# If your VCF uses 'chr1' and reference uses '1', enable chromosome mapping
personal_genome = sl.get_personal_genome(
reference_fn=reference,
variants_fn=variants,
auto_map_chromosomes=True # Handle chromosome name differences
)
📖 Full Guide: Personalized Genomes | Tutorial Notebook
Variant-Centered Sequences
# Generate reference and alternate sequences around variants
# Note: get_alt_ref_sequences is a generator that yields chunks
results = list(sl.get_alt_ref_sequences(
reference_fn=reference,
variants_fn=variants,
seq_len=1000,
encode=True
))
# Unpack from the first chunk
alt_seqs, ref_seqs, metadata = results[0]
# Returns: (n_variants, seq_len, 4) shaped arrays
📖 Full Guide: Variant-Centered Sequences | Getting Started Notebook
Prediction Alignment
# Align model predictions accounting for variant coordinate changes
from supremo_lite.mock_models import TestModel
model = TestModel(n_targets=2, bin_size=8, crop_length=10)
ref_preds = model(ref_seqs)
alt_preds = model(alt_seqs)
ref_aligned, alt_aligned = sl.align_predictions_by_coordinate(
ref_pred=ref_preds[0],
alt_pred=alt_preds[0],
metadata=metadata[0],
prediction_type="1D",
bin_size=8,
crop_length=10
)
📖 Full Guide: Prediction Alignment | Tutorial with Visualizations
Saturation Mutagenesis
# Mutate every position in a region
ref_seq, alt_seqs, metadata = sl.get_sm_sequences(
chrom='chr1',
start=1000,
end=1100, # 100 bp → 300 mutations (3 per position)
reference_fasta=reference
)
Documentation
📚 User Guides
Detailed documentation for each major feature:
- Personalized Genomes - Apply variants to genomes
- Variant-Centered Sequences - Extract sequence windows around variants
- Prediction Alignment - Align model predictions for variant effect analysis
- Saturation Mutagenesis - In-silico mutagenesis workflows
- Variant Classification - Flow chart showing automatic variant classification logic
📓 Interactive Tutorials
Hands-on Jupyter notebooks with visualizations:
- Getting Started - Installation and basic concepts
- Personalized Genomes - Genome personalization workflows
- Prediction Alignment - Complete prediction workflow with visualizations ⭐
🔍 API Reference
Core Functions:
get_personal_genome()- Generate personalized genomesget_alt_ref_sequences()- Generate variant-centered sequencesalign_predictions_by_coordinate()- Align model predictionsget_sm_sequences()- Saturation mutagenesisread_vcf()- Read VCF files
For complete API documentation with all parameters, see the docs/ directory.
Issues and Support
We welcome feedback, bug reports, and feature requests! If you encounter any issues or have suggestions for improvements, please:
- Check existing issues first to see if your problem has already been reported
- File a new issue on our GitHub Issues page
- Provide detailed information including:
- Python version and operating system
- Package version (
supremo_lite.__version__) - Complete error messages and stack traces
- Minimal reproducible example
- Expected vs. actual behavior
Common Issues to Report
- Performance problems with large genomes or variant files
- Unexpected behavior with edge cases
- Documentation gaps or unclear examples
- Feature requests for new functionality
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
supremo_lite was created by Natalie Gill and Sean Whalen, based on Sequence Mutator for Predictive Models (SuPreMo) by Katie Gjoni. It is licensed under the terms of the MIT license.
Credits
supremo_lite was created with cookiecutter and the py-pkgs-cookiecutter template.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file supremo_lite-0.5.4.tar.gz.
File metadata
- Download URL: supremo_lite-0.5.4.tar.gz
- Upload date:
- Size: 62.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
321efe67fe5d6625192cba5043163b4bbfa59fca02025eff1032669fefa20e48
|
|
| MD5 |
89250e36d51bd4448e5aeef8e9cd482f
|
|
| BLAKE2b-256 |
9650299240fe2d55247be727c9aa0c0e9c9a34ea50da4715f2144e1589e0840d
|
Provenance
The following attestation bundles were made for supremo_lite-0.5.4.tar.gz:
Publisher:
test.yml on gladstone-institutes/supremo_lite
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
supremo_lite-0.5.4.tar.gz -
Subject digest:
321efe67fe5d6625192cba5043163b4bbfa59fca02025eff1032669fefa20e48 - Sigstore transparency entry: 653784714
- Sigstore integration time:
-
Permalink:
gladstone-institutes/supremo_lite@1cf406cc566bbf39067e0d44be0c1df4eaba33df -
Branch / Tag:
refs/heads/main - Owner: https://github.com/gladstone-institutes
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
test.yml@1cf406cc566bbf39067e0d44be0c1df4eaba33df -
Trigger Event:
push
-
Statement type:
File details
Details for the file supremo_lite-0.5.4-py3-none-any.whl.
File metadata
- Download URL: supremo_lite-0.5.4-py3-none-any.whl
- Upload date:
- Size: 67.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
31a41e6c342a25ac3841165935b30ef9e422ad0a5ce6d8a72146d0496c37c25e
|
|
| MD5 |
4614858addc6bab8fcab256d6c5a2ba5
|
|
| BLAKE2b-256 |
f1dc690d4c268477201574f6a13e7cd1dd83b7ea963640ee6d25a13228b23499
|
Provenance
The following attestation bundles were made for supremo_lite-0.5.4-py3-none-any.whl:
Publisher:
test.yml on gladstone-institutes/supremo_lite
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
supremo_lite-0.5.4-py3-none-any.whl -
Subject digest:
31a41e6c342a25ac3841165935b30ef9e422ad0a5ce6d8a72146d0496c37c25e - Sigstore transparency entry: 653784726
- Sigstore integration time:
-
Permalink:
gladstone-institutes/supremo_lite@1cf406cc566bbf39067e0d44be0c1df4eaba33df -
Branch / Tag:
refs/heads/main - Owner: https://github.com/gladstone-institutes
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
test.yml@1cf406cc566bbf39067e0d44be0c1df4eaba33df -
Trigger Event:
push
-
Statement type: