supremo_lite

A lightweight memory-first, model-agnostic version of SuPreMo.

Key Features

🧬 Personalized Genome Generation: Apply variants from VCF files to reference genomes
🎯 Variant-Centered Sequences: Generate sequence windows around variants
✂️ PAM Site Analysis: Identify variants that disrupt CRISPR PAM sites
🧪 Saturation Mutagenesis: Systematic single-nucleotide mutations at every position for predictive modeling
🔧 Memory Efficient: Chunked processing for large VCF files
🗺️ Chromosome Matching: Optional handling of chromosome naming differences (chr1 ↔ 1, chrM ↔ MT) via auto_map_chromosomes=True
⚡ PyTorch Integration: Automatic tensor support when PyTorch is available
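The chromosome-matching feature comes down to a small lookup rule. The sketch below uses a hypothetical helper (`map_chromosome` is not part of supremo_lite) and assumes only the two naming differences listed above: a `chr` prefix that may or may not be present, and `chrM` versus `MT` for the mitochondrial contig. The library's own `auto_map_chromosomes` logic may cover more cases.

```python
from __future__ import annotations


def map_chromosome(name: str, reference_names: set[str]) -> str | None:
    """Return the reference contig matching a VCF chromosome name, or None.

    Hypothetical helper for illustration; not the supremo_lite implementation.
    """
    # An exact match always wins.
    if name in reference_names:
        return name
    if name.startswith("chr"):
        # 'chr1' -> '1'; 'chrM' -> 'M', then also try 'MT'.
        bare = name[3:]
        candidates = [bare, "MT"] if bare == "M" else [bare]
    else:
        # '1' -> 'chr1'; 'MT' -> 'chrMT', then also try 'chrM'.
        candidates = ["chr" + name, "chrM"] if name == "MT" else ["chr" + name]
    for candidate in candidates:
        if candidate in reference_names:
            return candidate
    return None


# Example: a UCSC-style VCF against an Ensembl-style reference.
reference_names = {"1", "2", "X", "MT"}
print(map_chromosome("chr1", reference_names))  # 1
print(map_chromosome("chrM", reference_names))  # MT
print(map_chromosome("chrY", reference_names))  # None
```

Returning `None` for an unmatched name lets the caller decide whether to skip the variant or raise an error.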

Installation

Install from PyPI or GitHub

Install the latest release from PyPI, or install from GitHub to pin a specific tag or get the latest features and bug fixes:

# Install the latest release from PyPI
pip install supremo_lite

# Or install a specific version/tag
pip install git+https://github.com/gladstone-institutes/supremo_lite.git@v0.5.0

# Or install from a specific branch
pip install git+https://github.com/gladstone-institutes/supremo_lite.git@main

Dependencies

Required dependencies will be installed automatically:

pandas - For VCF data handling
numpy - For numerical operations
pyfaidx - For FASTA file reading

Optional dependencies:

torch - For PyTorch tensor support (automatically detected)
brisket (https://github.com/gladstone-institutes/brisket) - Cython-powered, faster one-hot encoding for DNA sequences (automatically detected)

Quick Start

import supremo_lite as sl
from pyfaidx import Fasta

# Load reference genome and variants
reference = Fasta('hg38.fa')
variants = sl.read_vcf('variants.vcf')

DNA Sequence Encoding

supremo_lite uses one-hot encoding by default:

A = [1,0,0,0], C = [0,1,0,0], G = [0,0,1,0], T = [0,0,0,1]
Ambiguous bases = [0,0,0,0]
Returns PyTorch tensors when available, otherwise NumPy arrays
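A minimal NumPy sketch of the encoding scheme above. `one_hot_encode` is a hypothetical name, and treating lowercase (soft-masked) bases like uppercase ones is an assumption; the (seq_len, 4) layout follows the array shapes quoted in the Variant-Centered Sequences example.

```python
import numpy as np

# Column order follows the scheme above: A, C, G, T.
BASE_TO_COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seq: str) -> np.ndarray:
    """One-hot encode a DNA string into a (len(seq), 4) float32 array.

    Hypothetical helper; ambiguous bases such as N stay all-zero.
    """
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):  # assumption: lowercase == uppercase
        column = BASE_TO_COLUMN.get(base)
        if column is not None:
            encoded[i, column] = 1.0
    return encoded


print(one_hot_encode("ACGTN").astype(int))
```

Because ambiguous bases encode as all zeros, a model sees them as "no information" rather than as any particular nucleotide.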

Personalized Genome Generation

# Apply variants to create personalized genome
personal_genome = sl.get_personal_genome(
    reference_fn=reference,
    variants_fn=variants,
    encode=True,      # One-hot encoded (or False for strings)
    chunk_size=10000, # Process 10k variants at a time
    verbose=True      # Show progress
)

# If your VCF uses 'chr1' and reference uses '1', enable chromosome mapping
personal_genome = sl.get_personal_genome(
    reference_fn=reference,
    variants_fn=variants,
    auto_map_chromosomes=True  # Handle chromosome name differences
)

📖 Full Guide: Personalized Genomes | Tutorial Notebook
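To show why applying variants needs care with coordinates, here is a pure-Python sketch of the general idea on a single sequence. `apply_variants` is hypothetical and is not how supremo_lite is implemented; it assumes 1-based VCF-style positions, non-overlapping variants, and REF/ALT alleles written as in a VCF (indels include the anchor base).

```python
from __future__ import annotations


def apply_variants(seq: str, variants: list[tuple[int, str, str]]) -> str:
    """Apply non-overlapping (pos, ref, alt) variants to seq.

    Positions are 1-based, as in a VCF POS column. Hypothetical helper for
    illustration only.
    """
    # Work from the rightmost variant leftward: an indel changes the length
    # of everything to its right, but never the positions to its left.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1  # 1-based VCF POS -> 0-based string index
        observed = seq[start:start + len(ref)]
        if observed.upper() != ref.upper():
            raise ValueError(f"REF mismatch at {pos}: expected {ref!r}, found {observed!r}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


reference_seq = "ACGTACGTAC"
variants = [
    (2, "C", "T"),    # SNV
    (5, "A", "AGG"),  # insertion of GG after position 5
    (8, "TA", "T"),   # deletion of the A at position 9
]
print(apply_variants(reference_seq, variants))  # ATGTAGGCGTC
```

Sorting in reverse position order is one simple way to keep coordinates valid without tracking a running offset. Overlapping variants and multi-allelic records need extra handling that this sketch does not attempt.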

Variant-Centered Sequences

# Generate reference and alternate sequences around variants
# Note: get_alt_ref_sequences is a generator that yields chunks
results = list(sl.get_alt_ref_sequences(
    reference_fn=reference,
    variants_fn=variants,
    seq_len=1000,
    encode=True
))
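# Unpack the first chunk (for large VCFs, iterate over the generator instead of collecting every chunk with list())
alt_seqs, ref_seqs, metadata = results[0]
# Returns: (n_variants, seq_len, 4) shaped arrays for the variants in this chunk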

📖 Full Guide: Variant-Centered Sequences | Getting Started Notebook
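The core of a variant-centered window is picking `seq_len` bases around the variant and padding where the window runs off a chromosome end. `centered_window` below is a hypothetical sketch; the padding character (N, which one-hot encodes to all zeros) and placing the variant at index `seq_len // 2` are assumptions, and `get_alt_ref_sequences` may center even-length windows or pad differently.

```python
def centered_window(seq: str, pos: int, seq_len: int, pad: str = "N") -> str:
    """Return a seq_len window of seq with 1-based position pos at index seq_len // 2.

    Hypothetical helper; positions outside the sequence are filled with pad.
    """
    center = pos - 1                  # 0-based index of the variant
    start = center - seq_len // 2     # may be negative near the chromosome start
    end = start + seq_len             # may run past the chromosome end
    left_pad = max(0, -start)
    right_pad = max(0, end - len(seq))
    core = seq[max(0, start):min(len(seq), end)]
    return pad * left_pad + core + pad * right_pad


chrom = "ACGTACGTAC"
print(centered_window(chrom, pos=5, seq_len=4))  # GTAC
print(centered_window(chrom, pos=1, seq_len=4))  # NNAC
```

Padding rather than shifting the window keeps the variant at a fixed index, so every row of a batch has the variant in the same place.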

Prediction Alignment

# Align model predictions accounting for variant coordinate changes
from supremo_lite.mock_models import TestModel

model = TestModel(n_targets=2, bin_size=8, crop_length=10)
ref_preds = model(ref_seqs)
alt_preds = model(alt_seqs)

ref_aligned, alt_aligned = sl.align_predictions_by_coordinate(
    ref_pred=ref_preds[0],
    alt_pred=alt_preds[0],
    metadata=metadata[0],
    prediction_type="1D",
    bin_size=8,
    crop_length=10
)

📖 Full Guide: Prediction Alignment | Tutorial with Visualizations
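Alignment is needed because, once an indel changes sequence length, bin j in the alternate prediction no longer covers the same genomic interval as bin j in the reference prediction. The hypothetical helper below (not the library's `align_predictions_by_coordinate`) handles one simple case under stated assumptions: both windows start at the same genomic coordinate, and the alternate allele inserts a whole number of bins, `n_ins_bins`, beginning at bin `ins_bin`. Alt bins inside the insertion have no reference counterpart, and the last `n_ins_bins` reference bins were pushed out of the alt window, so both are dropped.

```python
import numpy as np


def align_insertion_1d(ref_pred, alt_pred, ins_bin: int, n_ins_bins: int):
    """Trim left-anchored 1D predictions so bin j covers the same locus in both.

    Hypothetical helper. ref_pred and alt_pred have shape (n_bins, ...) and
    the alt allele inserts n_ins_bins whole bins starting at ins_bin.
    """
    ref_pred = np.asarray(ref_pred)
    alt_pred = np.asarray(alt_pred)
    if ref_pred.shape != alt_pred.shape:
        raise ValueError("ref and alt predictions must have the same shape")
    n_bins = ref_pred.shape[0]
    # Drop the inserted bins, which have no reference counterpart.
    alt_aligned = np.concatenate([alt_pred[:ins_bin], alt_pred[ins_bin + n_ins_bins:]], axis=0)
    # Drop the reference bins that the insertion pushed out of the alt window.
    ref_aligned = ref_pred[: n_bins - n_ins_bins]
    return ref_aligned, alt_aligned


ref = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
alt = np.array([0.0, 1.0, 9.0, 9.0, 2.0, 3.0])  # two inserted bins at index 2
ref_aligned, alt_aligned = align_insertion_1d(ref, alt, ins_bin=2, n_ins_bins=2)
print(ref_aligned, alt_aligned)
```

A deletion is the mirror case. Centered windows, cropping (`crop_length`), and indels that do not span a whole number of bins all need extra bookkeeping that this sketch omits.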

Saturation Mutagenesis

# Mutate every position in a region
ref_seq, alt_seqs, metadata = sl.get_sm_sequences(
    chrom='chr1',
    start=1000,
    end=1100,  # 100 bp → 300 mutations (3 per position)
    reference_fasta=reference
)

📖 Full Guide: Mutagenesis
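The count in the comment above (three alternate bases per position, so 100 bp gives 300 mutants) follows from enumerating every single-nucleotide substitution. `saturation_mutants` is a hypothetical pure-Python sketch of that enumeration, assuming the region contains only A, C, G, and T (an N would yield four alternatives); it is not the `get_sm_sequences` implementation and returns strings rather than encoded arrays.

```python
from __future__ import annotations


def saturation_mutants(seq: str) -> list[tuple[int, str, str, str]]:
    """Return (index, ref_base, alt_base, mutant_seq) for every SNV of seq.

    Hypothetical helper; indices are 0-based offsets into seq.
    """
    mutants = []
    for i, ref_base in enumerate(seq.upper()):
        for alt_base in "ACGT":
            if alt_base != ref_base:  # skip the reference base itself
                mutants.append((i, ref_base, alt_base, seq[:i] + alt_base + seq[i + 1:]))
    return mutants


region = "ACGT" * 25  # 100 bp
mutants = saturation_mutants(region)
print(len(mutants))  # 300
print(mutants[0][:3])  # (0, 'A', 'C')
```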

Documentation

📚 User Guides

Detailed documentation for each major feature:

Personalized Genomes - Apply variants to genomes
Variant-Centered Sequences - Extract sequence windows around variants
Prediction Alignment - Align model predictions for variant effect analysis
Saturation Mutagenesis - In-silico mutagenesis workflows
Variant Classification - Flow chart showing automatic variant classification logic

📓 Interactive Tutorials

Hands-on Jupyter notebooks with visualizations:

Getting Started - Installation and basic concepts
Personalized Genomes - Genome personalization workflows
Prediction Alignment - Complete prediction workflow with visualizations ⭐

🔍 API Reference

Core Functions:

get_personal_genome() - Generate personalized genomes
get_alt_ref_sequences() - Generate variant-centered sequences
align_predictions_by_coordinate() - Align model predictions
get_sm_sequences() - Saturation mutagenesis
read_vcf() - Read VCF files

For complete API documentation with all parameters, see the docs/ directory.

Issues and Support

We welcome feedback, bug reports, and feature requests! If you encounter any issues or have suggestions for improvements, please:

Check existing issues first to see if your problem has already been reported
File a new issue on our GitHub Issues page
Provide detailed information including:
- Python version and operating system
- Package version (supremo_lite.__version__)
- Complete error messages and stack traces
- Minimal reproducible example
- Expected vs. actual behavior

Common Issues to Report

Performance problems with large genomes or variant files
Unexpected behavior with edge cases
Documentation gaps or unclear examples
Feature requests for new functionality

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

supremo_lite was created by Natalie Gill and Sean Whalen, based on Sequence Mutator for Predictive Models (SuPreMo) by Katie Gjoni. It is licensed under the terms of the MIT license.

Credits

supremo_lite was created with cookiecutter and the py-pkgs-cookiecutter template.

Project details

Release history

1.0.0 - Feb 3, 2026
0.5.5 - Nov 18, 2025
0.5.4 - Oct 30, 2025 (this version)

Download files

Source Distribution

supremo_lite-0.5.4.tar.gz (62.9 kB)

Uploaded Oct 30, 2025 Source

Built Distribution

supremo_lite-0.5.4-py3-none-any.whl (67.9 kB)

Uploaded Oct 30, 2025 Python 3

File details

Details for the file supremo_lite-0.5.4.tar.gz.

File metadata

Download URL: supremo_lite-0.5.4.tar.gz
Upload date: Oct 30, 2025
Size: 62.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for supremo_lite-0.5.4.tar.gz
Algorithm	Hash digest
SHA256	`321efe67fe5d6625192cba5043163b4bbfa59fca02025eff1032669fefa20e48`
MD5	`89250e36d51bd4448e5aeef8e9cd482f`
BLAKE2b-256	`9650299240fe2d55247be727c9aa0c0e9c9a34ea50da4715f2144e1589e0840d`

Provenance

The following attestation bundles were made for supremo_lite-0.5.4.tar.gz:

Publisher: test.yml on gladstone-institutes/supremo_lite

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: supremo_lite-0.5.4.tar.gz
- Subject digest: 321efe67fe5d6625192cba5043163b4bbfa59fca02025eff1032669fefa20e48
- Sigstore transparency entry: 653784714
- Sigstore integration time: Oct 30, 2025
Source repository:
- Permalink: gladstone-institutes/supremo_lite@1cf406cc566bbf39067e0d44be0c1df4eaba33df
- Branch / Tag: refs/heads/main
- Owner: https://github.com/gladstone-institutes
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: test.yml@1cf406cc566bbf39067e0d44be0c1df4eaba33df
- Trigger Event: push

File details

Details for the file supremo_lite-0.5.4-py3-none-any.whl.

File metadata

Download URL: supremo_lite-0.5.4-py3-none-any.whl
Upload date: Oct 30, 2025
Size: 67.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for supremo_lite-0.5.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31a41e6c342a25ac3841165935b30ef9e422ad0a5ce6d8a72146d0496c37c25e`
MD5	`4614858addc6bab8fcab256d6c5a2ba5`
BLAKE2b-256	`f1dc690d4c268477201574f6a13e7cd1dd83b7ea963640ee6d25a13228b23499`

Provenance

The following attestation bundles were made for supremo_lite-0.5.4-py3-none-any.whl:

Publisher: test.yml on gladstone-institutes/supremo_lite

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: supremo_lite-0.5.4-py3-none-any.whl
- Subject digest: 31a41e6c342a25ac3841165935b30ef9e422ad0a5ce6d8a72146d0496c37c25e
- Sigstore transparency entry: 653784726
- Sigstore integration time: Oct 30, 2025
Source repository:
- Permalink: gladstone-institutes/supremo_lite@1cf406cc566bbf39067e0d44be0c1df4eaba33df
- Branch / Tag: refs/heads/main
- Owner: https://github.com/gladstone-institutes
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: test.yml@1cf406cc566bbf39067e0d44be0c1df4eaba33df
- Trigger Event: push