Metrics of immunological foreignness for candidate T-cell epitopes

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

WEIRDO

Widely Estimated Immunological Recognition and Detection of Outliers

A Python library for computing peptide foreignness scores: estimates of whether a peptide sequence is more likely to come from a pathogen (bacteria, viruses) or from self (human, other mammals).

Overview

WEIRDO trains a multi-layer perceptron (MLP) on k-mer presence data from SwissProt to predict organism category membership. Given any peptide, it outputs:

- Category probabilities: the likelihood that the peptide appears in human, bacteria, viruses, mammals, etc.
- Foreignness score: max(pathogens) / (max(pathogens) + max(self)), which ranges from 0 (self-like) to 1 (pathogen-like)
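
The foreignness formula can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the library's implementation: the function name, the default category groupings, and the zero-denominator fallback are assumptions made for this example.

```python
def foreignness(probs,
                pathogen_categories=("bacteria", "viruses"),
                self_categories=("human", "mammals", "rodents")):
    """Return max(pathogens) / (max(pathogens) + max(self)) for one peptide."""
    max_pathogen = max(probs[c] for c in pathogen_categories)
    max_self = max(probs[c] for c in self_categories)
    total = max_pathogen + max_self
    # Guard against a zero denominator; returning 0.5 here is an assumption.
    return max_pathogen / total if total > 0 else 0.5


# Probabilities copied from the SIINFEKL row of the Quick Start output.
# The 'rodents' value is not shown there, so self is restricted to two categories.
siinfekl = {"human": 0.15, "viruses": 0.73, "bacteria": 0.21, "mammals": 0.18}
score = foreignness(siinfekl, self_categories=("human", "mammals"))
print(round(score, 3))  # 0.802
```

Because only the largest pathogen probability and the largest self probability enter the ratio, a peptide that looks strongly viral and strongly human at the same time lands near 0.5.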

Quick Start

from weirdo.scorers import SwissProtReference, MLPScorer

# Load reference data (SwissProt 8-mers with organism labels)
ref = SwissProtReference().load()

# Define organism categories
categories = [
    'archaea', 'bacteria', 'fungi', 'human', 'invertebrates',
    'mammals', 'plants', 'rodents', 'vertebrates', 'viruses'
]

# Get training data: each 8-mer labeled with organism presence
peptides, labels = ref.get_training_data(
    target_categories=categories,
    multi_label=True,
    max_samples=200000  # Optional: sample for faster training
)

# Train the MLP
scorer = MLPScorer(k=8, hidden_layer_sizes=(256, 128, 64))
scorer.train(peptides, labels, target_categories=categories, epochs=200)

# Score new peptides (any length)
df = scorer.predict_dataframe(['MTMDKSEL', 'SIINFEKL', 'NLVPMVATV'])
print(df)

Output:

    peptide  human  viruses  bacteria  mammals  ...  foreignness
   MTMDKSEL   0.82     0.12      0.08     0.79  ...        0.127
   SIINFEKL   0.15     0.73      0.21     0.18  ...        0.802
  NLVPMVATV   0.31     0.68      0.15     0.35  ...        0.660

Installation

pip install weirdo

Download reference data (~2.5 GB compressed / ~7.5 GB uncompressed) for training:

weirdo data download

Training Data

WEIRDO uses precomputed 8-mer data from SwissProt (~100M unique k-mers), with each 8-mer labeled against the following organism categories:

Category	Description
human	Homo sapiens proteins
rodents	Mouse, rat proteins
mammals	Other mammals (dog, cow, primates, etc.)
vertebrates	Fish, birds, reptiles, amphibians
invertebrates	Insects, worms, mollusks
bacteria	Bacterial proteins
viruses	Viral proteins
archaea	Archaeal proteins
fungi	Fungal proteins
plants	Plant proteins

Each 8-mer has True/False labels for each category, indicating whether it appears in proteins from that organism group.
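
To make the labeling concrete, here is a minimal sketch of how multi-label presence rows could be built from labeled protein sequences. The helper names and the toy sequences are hypothetical; the real reference data is precomputed and loaded through SwissProtReference.

```python
from collections import defaultdict


def kmer_presence(proteins, k=8):
    """Map each k-mer to the set of categories whose proteins contain it."""
    presence = defaultdict(set)
    for category, sequence in proteins:
        for i in range(len(sequence) - k + 1):
            presence[sequence[i:i + k]].add(category)
    return presence


def to_label_rows(presence, categories):
    """Turn the presence map into (k-mer, [True/False per category]) rows."""
    return [(kmer, [c in cats for c in categories])
            for kmer, cats in sorted(presence.items())]


# Toy sequences invented for illustration (not real SwissProt entries).
toy_proteins = [
    ("human", "MTMDKSELVQ"),
    ("viruses", "TMDKSELVQA"),
]
categories = ["human", "viruses"]
rows = to_label_rows(kmer_presence(toy_proteins), categories)
for kmer, labels in rows:
    print(kmer, labels)
```

An 8-mer shared by both toy sequences gets True in both columns, which is why the task is multi-label rather than a single-class choice.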

Feature Extraction

The MLP uses 592 features extracted from each peptide:

Amino Acid Properties (48 features)

12 physicochemical properties × 4 statistics (mean, std, min, max):

- Hydropathy, hydrophilicity
- Mass, volume
- Polarity, pK side chain
- Accessible surface area (folded/unfolded)
- Local flexibility, refractivity
- Solvent exposed area, % exposed residues
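
As a sketch of the "property × 4 statistics" idea, the snippet below summarizes one property over a peptide. It uses the Kyte-Doolittle hydropathy scale; whether WEIRDO uses this exact scale, and whether it uses population or sample standard deviation, are assumptions here, and `property_stats` is a hypothetical helper.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index (assumed scale for this illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_stats(peptide, scale):
    """Summarize one per-residue property as mean, std, min, and max."""
    values = [scale[aa] for aa in peptide if aa in scale]
    if not values:
        raise ValueError("peptide has no residues covered by the scale")
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population std; an assumption
        "min": min(values),
        "max": max(values),
    }


stats = property_stats("SIINFEKL", KYTE_DOOLITTLE)
print(stats)
```

Repeating this for 12 property scales yields the 48 features counted in this group.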

Structural Features (27 features)

- Secondary structure propensities (12): helix, sheet, turn × 4 stats
- Category fractions (9): positive/negative charged, hydrophobic, aromatic, aliphatic, polar, tiny, small, cysteine
- Charge features (4): net charge, charge transitions, max cluster, R/(R+K) ratio
- Disorder features (2): disorder/order promoting fractions
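
A few of the charge-related quantities can be sketched as follows. The definitions are assumptions chosen for illustration (K and R counted as positive, D and E as negative, histidine ignored), and the function name is hypothetical; the library's exact definitions may differ.

```python
def charge_features(peptide):
    """Compute simple charge descriptors for a non-empty peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    n = len(peptide)
    positive = sum(peptide.count(aa) for aa in "KR")  # assumed positive set
    negative = sum(peptide.count(aa) for aa in "DE")  # assumed negative set
    r, k = peptide.count("R"), peptide.count("K")
    return {
        "positive_fraction": positive / n,
        "negative_fraction": negative / n,
        "net_charge": positive - negative,
        # Fall back to 0.0 when the peptide has neither R nor K (assumption).
        "r_over_r_plus_k": r / (r + k) if (r + k) else 0.0,
    }


features = charge_features("MTMDKSEL")
print(features)
```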

Composition Features (420 features)

- Amino acid frequencies (20): fraction of each amino acid
- Dipeptide frequencies (400): fraction of each amino acid pair
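
The 20 + 400 layout can be sketched with a pair of counters. The ordering of the amino acids and dipeptides below is an assumption, and `composition_features` is a hypothetical helper rather than part of the WEIRDO API.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(peptide):
    """Return 20 amino acid fractions followed by 400 dipeptide fractions."""
    aa_counts = Counter(peptide)
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    pair_counts = Counter(pairs)
    n_aa = max(len(peptide), 1)    # avoid division by zero for empty input
    n_pairs = max(len(pairs), 1)   # same guard for single-residue input
    aa_freqs = [aa_counts[a] / n_aa for a in AMINO_ACIDS]
    dipep_freqs = [pair_counts[d] / n_pairs for d in DIPEPTIDES]
    return aa_freqs + dipep_freqs


features = composition_features("SIINFEKL")
print(len(features))  # 420
```

For short peptides most of the 400 dipeptide entries are zero: an 8-mer has only seven overlapping pairs.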

Sequence Statistics (12 features)

- Length, log-length, sqrt-length
- Unknown fraction, unique AA fraction
- Max run length, repeat fraction
- Entropy and complexity: entropy, effective number of amino acids, Gini coefficient, top-2 frequency, maximum frequency
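
Several of these statistics are simple to compute from residue counts. The sketch below covers a subset; the choice of log base, the definition of "effective AAs" as 2 raised to the entropy, and the function name are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import groupby


def sequence_statistics(peptide):
    """Compute a subset of length and complexity statistics for a peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    n = len(peptide)
    freqs = [count / n for count in Counter(peptide).values()]
    entropy = -sum(p * math.log2(p) for p in freqs)  # Shannon entropy in bits
    return {
        "length": n,
        "log_length": math.log(n),
        "sqrt_length": math.sqrt(n),
        "unique_fraction": len(freqs) / n,
        # Longest stretch of one repeated residue, e.g. 3 for "AAAB".
        "max_run": max(len(list(run)) for _, run in groupby(peptide)),
        "entropy": entropy,
        "effective_aas": 2 ** entropy,  # assumed definition
        "max_frequency": max(freqs),
    }


stats = sequence_statistics("AAAB")
print(stats)
```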

Reduced Alphabet Frequencies (80 features)

Composition across common reduced alphabets (Murphy, GBMR, SDM, etc.)
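
A reduced alphabet merges similar amino acids into groups and then measures composition over the groups. The sketch below uses an eight-group alphabet in the style of Murphy et al.; treat the exact groupings, and the helper name, as assumptions rather than the mappings WEIRDO ships.

```python
# Eight-group reduced alphabet (assumed groupings, for illustration only).
MURPHY_8_GROUPS = ["LVIMC", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]
AA_TO_GROUP = {aa: index
               for index, group in enumerate(MURPHY_8_GROUPS)
               for aa in group}


def reduced_alphabet_frequencies(peptide):
    """Return the fraction of residues falling in each reduced-alphabet group."""
    counts = [0] * len(MURPHY_8_GROUPS)
    known = 0
    for aa in peptide:
        group = AA_TO_GROUP.get(aa)
        if group is not None:  # skip unknown residues such as 'X'
            counts[group] += 1
            known += 1
    return [c / known if known else 0.0 for c in counts]


freqs = reduced_alphabet_frequencies("SIINFEKL")
print(freqs)  # [0.375, 0.0, 0.125, 0.0, 0.125, 0.25, 0.125, 0.0]
```

Concatenating the frequency vectors from several such alphabets gives a compact view of composition that is less sparse than the full 20-letter counts.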

Dipeptide Summary (5 features)

Entropy, Gini coefficient, maximum frequency, top-2 frequency, homodipeptide fraction
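
The dipeptide summary condenses the 400-dimensional dipeptide distribution into a handful of numbers. This sketch computes four of them (the Gini coefficient is omitted); defining "top-2 frequency" as the sum of the two largest fractions is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def dipeptide_summary(peptide):
    """Summarize the distribution of overlapping dipeptides in a peptide."""
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    if not pairs:
        raise ValueError("peptide needs at least two residues")
    n = len(pairs)
    freqs = sorted((c / n for c in Counter(pairs).values()), reverse=True)
    return {
        "entropy": -sum(p * math.log2(p) for p in freqs),
        "max_frequency": freqs[0],
        "top2_frequency": sum(freqs[:2]),  # assumed definition
        # Fraction of pairs made of the same residue twice, such as "AA".
        "homodipeptide_fraction": sum(1 for p in pairs if p[0] == p[1]) / n,
    }


summary = dipeptide_summary("AAAB")
print(summary)
```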

API Reference

Training

from weirdo.scorers import SwissProtReference, MLPScorer

# Load reference
ref = SwissProtReference().load()

# Get training data
peptides, labels = ref.get_training_data(
    target_categories=['human', 'viruses', 'bacteria', 'mammals'],
    multi_label=True,
    max_samples=100000  # Optional: limit for memory
)

# Train
scorer = MLPScorer(
    k=8,
    hidden_layer_sizes=(256, 128, 64),
    activation='relu',
    alpha=0.0001,  # L2 regularization
)
scorer.train(
    peptides, labels,
    target_categories=['human', 'viruses', 'bacteria', 'mammals'],
    epochs=200,
    learning_rate=0.001
)

Prediction

# Category probabilities (sigmoid-activated)
probs = scorer.predict_proba(['MTMDKSEL'])
# Shape: (1, n_categories)

# Foreignness score
foreign = scorer.foreignness(
    ['MTMDKSEL'],
    pathogen_categories=['bacteria', 'viruses'],
    self_categories=['human', 'mammals', 'rodents']
)
# Returns: max(pathogens) / (max(pathogens) + max(self))

# Full DataFrame output (handles variable-length peptides)
df = scorer.predict_dataframe(['MTMDKSEL', 'SIINFEKL', 'NLVPMVATV'])

Feature Extraction

# Extract features as DataFrame
df = scorer.features_dataframe(['MTMDKSEL', 'SIINFEKL'])
# Shape: (2, 593) - 592 features + peptide column

# Feature names
names = scorer.get_feature_names()
# ['hydropathy_mean', 'hydropathy_std', ..., 'dipep_YY']

Model Persistence

from weirdo import save_model, load_model, list_models

# Save trained model
save_model(scorer, 'my-foreignness-model')

# List saved models
for model in list_models():
    print(f"{model.name}: {model.scorer_type}")

# Load model
scorer = load_model('my-foreignness-model')

CLI

# Data management
weirdo data download        # Download SwissProt reference
weirdo data list            # Show data status

# Model management
weirdo models list          # List trained models
weirdo models train --data train.csv --name my-model
weirdo models info my-model # Show model details
weirdo models available     # List built-in downloadable pretrained models
weirdo models download NAME # Download pretrained weights by name
weirdo models download --url https://.../model.tar.gz --save-as my-model

# Scoring
weirdo score --model my-model MTMDKSEL SIINFEKL

Long-Run Training on Modal

Use scripts/train_modal_long_run.py to run long remote training and export weights as a .tar.gz archive:

# One-time setup: seed full SwissProt CSV into Modal data volume
modal volume put weirdo-data-cache data/swissprot-8mers.csv downloads/swissprot-8mers.csv --force

modal run scripts/train_modal_long_run.py \
  --model-name swissprot-mlp-modal \
  --epochs 1000 \
  --max-samples 2000000 \
  --output-archive ./swissprot-mlp-modal.tar.gz

The script:

- trains MLPScorer remotely
- saves model files to a Modal volume
- packages weights into MODEL_NAME.tar.gz
- can return archive bytes to your local machine (--output-archive)
- reads SwissProt from --swissprot-path (default: /root/.weirdo/downloads/swissprot-8mers.csv)
- treats --max-samples 0 (default) as "use all available rows"
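
The packaging step can be approximated with the standard library. This is a generic sketch of writing a model directory to a .tar.gz archive, not the script's actual code: `package_model`, the directory name, and the placeholder file are all hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model(model_dir, archive_path):
    """Write model_dir into a gzip-compressed tar archive and return its path."""
    model_dir = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store files under the model's directory name, not the full local path.
        archive.add(model_dir, arcname=model_dir.name)
    return Path(archive_path)


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my-model"
    model_dir.mkdir()
    (model_dir / "weights.bin").write_bytes(b"\x00" * 16)  # placeholder file
    archive = package_model(model_dir, Path(tmp) / "my-model.tar.gz")
    with tarfile.open(archive, "r:gz") as check:
        members = sorted(check.getnames())

print(members)  # ['my-model', 'my-model/weights.bin']
```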

Distributing Model Weights

Recommended flow:

1. Train and export MODEL_NAME.tar.gz (Modal script above).
2. Upload the archive to a GitHub Release asset (or other HTTPS hosting).
3. Download after install via the CLI:

# Direct URL (no code change needed)
weirdo models download --url https://github.com/ORG/REPO/releases/download/vX.Y.Z/MODEL_NAME.tar.gz --save-as MODEL_NAME

If you want named built-in downloads (weirdo models download MODEL_NAME), add an entry to PRETRAINED_MODELS in weirdo/model_manager.py and release a new package version.

Architecture

weirdo/
├── scorers/
│   ├── mlp.py          # MLPScorer with feature extraction
│   ├── swissprot.py    # SwissProtReference (training data)
│   ├── config.py       # Presets and configuration
│   ├── registry.py     # Scorer registry
│   └── trainable.py    # TrainableScorer base class
├── model_manager.py    # Save/load trained models
├── amino_acid_properties.py  # 12 AA property dictionaries
└── api.py              # High-level functions

Citation

@software{weirdo,
  title = {WEIRDO: Widely Estimated Immunological Recognition and Detection of Outliers},
  author = {PIRL-UNC},
  url = {https://github.com/pirl-unc/weirdo}
}

License

Apache License 2.0. See LICENSE for details.

Project details


Release history

This version

2.1.3

Apr 6, 2026

2.1.2

Feb 6, 2026

2.1.1

Feb 5, 2026

2.1.0

Feb 5, 2026

1.0.0

Oct 14, 2022

Download files


Source Distribution

weirdo-2.1.3.tar.gz (91.7 kB view details)

Uploaded Apr 6, 2026 Source

Built Distribution


weirdo-2.1.3-py3-none-any.whl (87.8 kB view details)

Uploaded Apr 6, 2026 Python 3

File details

Details for the file weirdo-2.1.3.tar.gz.

File metadata

Download URL: weirdo-2.1.3.tar.gz
Upload date: Apr 6, 2026
Size: 91.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for weirdo-2.1.3.tar.gz
Algorithm	Hash digest
SHA256	`b0e17a9e78ff6e0b737504692c206af9b2fb5b714d3b7b3125efb22f151c59a3`
MD5	`3253418139aea1624bd572d9730260bc`
BLAKE2b-256	`ddf252cb382568fe4a7124a7c3a3e8e706c582d3dbd0a1a70e3ed0b64de98b89`


File details

Details for the file weirdo-2.1.3-py3-none-any.whl.

File metadata

Download URL: weirdo-2.1.3-py3-none-any.whl
Upload date: Apr 6, 2026
Size: 87.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for weirdo-2.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d38d52fc8484a950d9c95376ea93ac0121d9feb476a625feadf2ff3f25872526`
MD5	`1e89b5f96e1c07a47eebadf228e0ef28`
BLAKE2b-256	`d4be002b2c7ee2e53f6f78ab6aca37359252d941634ffbb560e44c1e728b5fa7`

