Paraplume is a sequence-based paratope prediction method. It predicts which amino acids in an antibody sequence are likely to interact with an antigen during binding. Concretely, given an amino acid sequence, the model returns a probability for each residue indicating the likelihood of antigen interaction.

📖 HOW IT WORKS

Paraplume uses supervised learning and involves three main steps:

Labelling: Antibody sequences are annotated with paratope labels using structural data from SAbDab.
Sequence representation: Each amino acid is embedded into a high-dimensional vector using Protein Language Model (PLM) embeddings.
Model training: A Multi-Layer Perceptron (MLP) is trained to minimize Binary Cross-Entropy Loss, using PLM embeddings as inputs and paratope labels as targets.
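To make the training objective in the last step concrete, here is a minimal pure-Python sketch of a mean Binary Cross-Entropy over the residues of one sequence. The function name and the `pos_weight` argument are illustrative assumptions (the weighting follows the common convention of scaling only the positive term); this is not Paraplume's actual training code.

```python
import math


def binary_cross_entropy(probs, labels, pos_weight=1.0):
    """Mean binary cross-entropy over residues.

    probs:      predicted probability per residue (floats in [0, 1]).
    labels:     1 if the residue belongs to the paratope, else 0.
    pos_weight: assumed multiplier applied to the positive-label term only.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)
```

A model that outputs 0.5 for every residue gets a loss of log(2) per residue, which is a useful baseline when reading training curves.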

The full workflow of Paraplume is summarized in Figure B below:

[Figure B: summary of the Paraplume workflow]

⚙️ INSTALLATION

Paraplume is available on PyPI and can be installed with pip.

pip install paraplume

We recommend installing it in a virtual environment with Python >= 3.10.

💻 COMMAND LINE

We provide several commands to run inference with the default weights or to retrain the model on a custom dataset. All commands can run on CPU or, if available, on GPU (see the --gpu option).

paraplume-infer provides two commands: seq-to-paratope predicts the paratope of a single sequence, and file-to-paratope predicts paratopes for a batch of sequences stored in a CSV file.

paraplume-infer COMMAND [OPTIONS][ARGS] ...

By default, the model is trained on the 'expanded' dataset from the Paragraph paper, which we split into 1000 sequences for the training set and 85 sequences for the validation set; both are available in ./datasets/. PDB 4FQI was excluded from the training and validation sets because we analyze variants of this antibody in our paper using the trained model.

You can also use a custom model for inference. To train a custom model, run two commands: paraplume-create-dataset, which generates labels and PLM embeddings for your training dataset, and paraplume-train, which trains the model.

After training, the model is saved in a folder whose path can be passed to the inference commands via the --custom-model option.

📋 Commands

1. paraplume-infer seq-to-paratope

Predict paratope directly from amino acid sequences provided as command line arguments.

Usage

paraplume-infer seq-to-paratope [OPTIONS]

Options

Option	Type	Default	Description
`-h, --heavy-chain`	TEXT	-	Heavy chain amino acid sequence
`-l, --light-chain`	TEXT	-	Light chain amino acid sequence
`--custom-model`	PATH	None	Path to custom trained model folder
`--gpu`	INT	0	Choose index of GPU device to use if multiple GPUs available. By default it's the first one (index 0). -1 forces cpu usage. If no GPU is available, CPU is used
`--large/--small`	flag	--large	Use the default Paraplume model, which uses the 6 PLMs AbLang2, Antiberty, ESM, ProtT5, IgT5 and IgBert (--large), or the smallest version, which uses only ESM-2 embeddings (--small)
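The --gpu rule described in the table can be read as the small decision function below. This is only a sketch of the documented behavior: the function name, the "cuda:N" device strings and the error raised for an out-of-range index are assumptions, since the table does not say what happens in that last case.

```python
def resolve_device(gpu_index: int, n_gpus_available: int) -> str:
    """Map a --gpu value to a device string.

    -1 forces CPU; if no GPU is available, CPU is used; otherwise the
    requested GPU index is selected (0 is the first GPU).
    """
    if gpu_index == -1 or n_gpus_available == 0:
        return "cpu"
    if not 0 <= gpu_index < n_gpus_available:
        # Assumption: the documentation does not cover this case.
        raise ValueError(
            f"GPU index {gpu_index} is invalid with {n_gpus_available} GPU(s)"
        )
    return f"cuda:{gpu_index}"
```

For example, `resolve_device(0, 0)` falls back to "cpu", matching the default behavior on a machine without a GPU.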

Examples

Both chains:

paraplume-infer seq-to-paratope \
  -h QAYLQQSGAELVKPGASVKMSCKASDYTFTNYNMHWIKQTPGQGLEWIGAIYPGNGDTSYNQKFKGKATLTADKSSSTAYMQLSSLTSEDSAVYYCASLGSSYFDYWGQGTTLTVSS \
  -l EIVLTQSPTTMAASPGEKITITCSARSSISSNYLHWYQQKPGFSPKLLIYRTSNLASGVPSRFSGSGSGTSYSLTIGTMEAEDVATYYCHQGSNLPFTFGSGTKLEIK

Heavy chain only:

paraplume-infer seq-to-paratope \
  -h QAYLQQSGAELVKPGASVKMSCKASDYTFTNYNMHWIKQTPGQGLEWIGAIYPGNGDTSYNQKFKGKATLTADKSSSTAYMQLSSLTSEDSAVYYCASLGSSYFDYWGQGTTLTVSS

Light chain only:

paraplume-infer seq-to-paratope \
  -l EIVLTQSPTTMAASPGEKITITCSARSSISSNYLHWYQQKPGFSPKLLIYRTSNLASGVPSRFSGSGSGTSYSLTIGTMEAEDVATYYCHQGSNLPFTFGSGTKLEIK
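If you prefer to call these commands from a Python script, one option is to build the argument list and hand it to `subprocess.run`. The helper below is a hypothetical convenience wrapper, not part of the Paraplume package; it only uses the flags listed in the options table above.

```python
def build_seq_to_paratope_cmd(heavy=None, light=None, gpu=0,
                              custom_model=None, small=False):
    """Build the argument list for `paraplume-infer seq-to-paratope`.

    At least one of `heavy` or `light` must be given, mirroring the
    both-chains, heavy-only and light-only examples.
    """
    if not heavy and not light:
        raise ValueError("Provide a heavy chain, a light chain, or both")
    cmd = ["paraplume-infer", "seq-to-paratope"]
    if heavy:
        cmd += ["-h", heavy]
    if light:
        cmd += ["-l", light]
    if custom_model:
        cmd += ["--custom-model", str(custom_model)]
    cmd += ["--gpu", str(gpu)]
    if small:
        cmd.append("--small")
    return cmd


# Usage (requires paraplume to be installed):
#   import subprocess
#   subprocess.run(build_seq_to_paratope_cmd(heavy="QAYLQQSG..."), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with long sequences.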

2. paraplume-infer file-to-paratope

Predict paratope from sequences stored in a CSV file.

Usage

paraplume-infer file-to-paratope [OPTIONS] FILE_PATH

Arguments

Argument	Type	Required	Description
`FILE_PATH`	PATH	✓	Path to input CSV file

Options

Option	Type	Default	Description
`--custom-model`	PATH	None	Path to custom trained model folder
`--name`	TEXT	paratope_	Prefix for output file
`--gpu`	INT	0	Choose index of GPU device to use if multiple GPUs available. By default it's the first one (index 0). -1 forces cpu usage. If no GPU is available, CPU is used
`--result-folder, -r`	PATH	None	Folder path where to save the results. If not passed the result is saved in the input data folder
`--emb-proc-size`	INT	100	Embedding batch size for memory management
`--compute-sequence-embeddings`	flag	False	Compute both paratope and classical sequence embeddings for each sequence and each of the 6 PLMs AbLang2, Antiberty, ESM, ProtT5, IgT5 and IgBert. Only possible when using the default trained_models/large
`--single-chain`	flag	False	Process single chain sequences
`--large/--small`	flag	--large	Use the default Paraplume model, which uses the 6 PLMs AbLang2, Antiberty, ESM, ProtT5, IgT5 and IgBert (--large), or the smallest version, which uses only ESM-2 embeddings (--small)

Examples

Paired chains:

paraplume-infer file-to-paratope ./tutorial/paired.csv

Heavy chain only:

paraplume-infer file-to-paratope ./tutorial/heavy.csv --single-chain

Light chain only:

paraplume-infer file-to-paratope ./tutorial/light.csv --single-chain

Sample input files are available in the tutorial folder.
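To locate the result file from a script, the sketch below guesses the output path from the --name prefix and the --result-folder option. The naming rule (prefix + input file stem + .pkl) is inferred from the paratope_paired.pkl example in this README and is an assumption, as is the helper name.

```python
from pathlib import Path


def expected_output_path(input_csv, name="paratope_", result_folder=None):
    """Guess where file-to-paratope writes its pickle file.

    Assumed rule: <result folder or input folder>/<name><input stem>.pkl
    """
    input_csv = Path(input_csv)
    folder = Path(result_folder) if result_folder is not None else input_csv.parent
    return folder / f"{name}{input_csv.stem}.pkl"
```

With the defaults, `expected_output_path("./tutorial/paired.csv")` points to `tutorial/paratope_paired.pkl`.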

Input

Your CSV file must include these columns (any additional column is fine):

For paired chains (default):

sequence_heavy	sequence_light
QAYLQQSGAELVKPGASVKMSCKASDYTFTNYNMHWIKQTPGQGLEWIGAIYPGNGDTSYNQKFKGKATLTADKSSSTAYMQLSSLTSEDSAVYYCASLGSSYFDYWGQGTTLTVSS	EIVLTQSPTTMAASPGEKITITCSARSSISSNYLHWYQQKPGFSPKLLIYRTSNLASGVPSRFSGSGSGTSYSLTIGTMEAEDVATYYCHQGSNLPFTFGSGTKLEIK
EVQLVESGGGLVQPGGSLRLSCAASGFTFSRYAMSWVRQAPGKGLEWVSVISSGGSYTYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAKDREYRYYYYGMDVWGQGTTVTVSS	DIQMTQSPSSLSASVGDRVTITCRASQGISSWLAWYQQKPGKAPKLLIYDASSLESGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQYGSSPPYTFGQGTKLEIK

For single heavy chain (use --single-chain):

sequence_heavy	sequence_light
QAYLQQSGAELVKPGASVKMSCKASDYTFTNYNMHWIKQTPGQGLEWIGAIYPGNGDTSYNQKFKGKATLTADKSSSTAYMQLSSLTSEDSAVYYCASLGSSYFDYWGQGTTLTVSS
EVQLVESGGGLVQPGGSLRLSCAASGFTFSRYAMSWVRQAPGKGLEWVSVISSGGSYTYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAKDREYRYYYYGMDVWGQGTTVTVSS

For single light chain (use --single-chain):

sequence_heavy	sequence_light
	EIVLTQSPTTMAASPGEKITITCSARSSISSNYLHWYQQKPGFSPKLLIYRTSNLASGVPSRFSGSGSGTSYSLTIGTMEAEDVATYYCHQGSNLPFTFGSGTKLEIK
	DIQMTQSPSSLSASVGDRVTITCRASQGISSWLAWYQQKPGKAPKLLIYDASSLESGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQYGSSPPYTFGQGTKLEIK
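An input file in this layout can be produced with the standard `csv` module, as sketched below. The helper name is hypothetical, and the sketch assumes a comma-delimited file with an empty cell for a missing chain.

```python
import csv


def write_paraplume_csv(path, pairs):
    """Write (heavy, light) sequence pairs to a CSV file.

    `pairs` is an iterable of (heavy, light) tuples; use None or "" for a
    missing chain when preparing a --single-chain input file.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sequence_heavy", "sequence_light"])
        for heavy, light in pairs:
            writer.writerow([heavy or "", light or ""])
```

Additional columns (for example a sequence identifier) can be appended to each row, since any extra column is accepted.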

Output

Creates a pickle file (e.g., ./tutorial/paratope_paired.pkl) containing:

model_prediction_heavy - Paratope predictions for heavy chains
model_prediction_light - Paratope predictions for light chains

Reading results:

import pandas as pd
predictions = pd.read_pickle("./tutorial/paratope_paired.pkl")
print(predictions.head())
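Once the predictions are loaded, a common next step is to list the residues above a probability cutoff. The sketch below assumes each prediction is a sequence of per-residue probabilities aligned with the amino acid sequence; the helper name and the 0.5 threshold are illustrative choices, not values prescribed by Paraplume.

```python
def paratope_positions(sequence, probabilities, threshold=0.5):
    """Return (index, amino_acid, probability) for residues at or above threshold.

    Indices are 0-based positions in `sequence`.
    """
    if len(sequence) != len(probabilities):
        raise ValueError("sequence and probabilities must have the same length")
    return [
        (index, residue, prob)
        for index, (residue, prob) in enumerate(zip(sequence, probabilities))
        if prob >= threshold
    ]
```

For instance, it could be applied row by row to the sequence_heavy and model_prediction_heavy columns of the loaded table.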

3. paraplume-create-dataset

Create the dataset used to train the neural network. Sequences and labels are saved in a .json file, and PLM embeddings are saved in a .pt file.

Usage

paraplume-create-dataset [OPTIONS] CSV_FILE_PATH PDB_FOLDER_PATH

Arguments

Argument	Type	Required	Description
`CSV_FILE_PATH`	PATH	✓	Path of csv file to use for pdb list
`PDB_FOLDER_PATH`	PATH	✓	Pdb folder path for ground truth labeling

Options

Option	Type	Default	Description
`--result-folder, -r`	PATH	result	Where to save results
`--emb-proc-size`	INTEGER	100	We create embeddings chunk by chunk to avoid memory explosion. This is the chunk size. Optimal value depends on your computer
`--gpu`	INTEGER	0	Choose index of GPU device to use if multiple GPUs available. By default it's the first one (index 0). -1 forces cpu usage. If no GPU is available, CPU is used
`--single-chain`	flag	False	Generate embeddings using the PLMs in single-chain mode, which slightly increases performance
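The chunk-by-chunk processing controlled by --emb-proc-size can be pictured with the generic generator below. It is a sketch of the general idea only (the name `iter_chunks` is hypothetical) and does not reproduce how Paraplume actually batches its embedding computation.

```python
def iter_chunks(items, chunk_size=100):
    """Yield consecutive slices of `items` holding at most `chunk_size` elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]
```

Only one chunk of sequences needs to be embedded at a time, so a smaller chunk size lowers peak memory use at the cost of more iterations.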

Example

paraplume-create-dataset ./tutorial/custom_train_set.csv pdb_folder \
  -r training_data \
  --gpu 0 \
  --emb-proc-size 50 \
  --single-chain

Input

custom_train_set.csv contains information about the PDB files used for training and has the following format:

pdb	Lchain	Hchain	antigen_chain
1ahw	D	E	F
1bj1	L	H	W
1ce1	L	H	P

Column descriptions:

pdb: PDB code of the antibody-antigen complex (should be available in pdb_folder as pdb_folder/pdb_code.pdb)
Lchain: Light chain identifier used to label the paratope
Hchain: Heavy chain identifier used to label the paratope
antigen_chain: Antigen chain identifier used to label the paratope
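Before launching dataset creation, it can help to check that the CSV has the four expected columns and that every listed structure is present as pdb_folder/pdb_code.pdb. The helper below is a hypothetical pre-flight check, assuming a comma-delimited file; it is not part of the Paraplume package.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("pdb", "Lchain", "Hchain", "antigen_chain")


def missing_pdb_files(csv_path, pdb_folder):
    """Return the PDB codes in the CSV that have no <pdb_folder>/<code>.pdb file."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        absent = [column for column in REQUIRED_COLUMNS if column not in header]
        if absent:
            raise ValueError(f"Missing columns: {absent}")
        codes = [row["pdb"] for row in reader]
    folder = Path(pdb_folder)
    return [code for code in codes if not (folder / f"{code}.pdb").is_file()]
```

An empty list means every structure referenced in the CSV was found in the folder.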

Output

Creates a folder named after the CSV file (custom_train_set) inside training_data. It contains two files: dict.json with the sequences and labels, and embeddings.pt with the PLM embeddings.

4. paraplume-train

Train the model given provided parameters and data.

Usage

paraplume-train [OPTIONS] TRAIN_FOLDER_PATH VAL_FOLDER_PATH

Arguments

Argument	Type	Required	Description
`TRAIN_FOLDER_PATH`	PATH	✓	Path of train folder
`VAL_FOLDER_PATH`	PATH	✓	Path of val folder

Options

Option	Type	Default	Description
`--lr`	FLOAT	0.001	Learning rate to use for training
`--n_epochs, -n`	INTEGER	1	Number of epochs to use for training
`--result-folder, -r`	PATH	result	Where to save results
`--pos-weight`	FLOAT	1	Weight to give to positive labels
`--batch-size, -bs`	INTEGER	10	Batch size
`--mask-prob`	FLOAT	0	Probability with which to mask each embedding coefficient
`--dropouts`	TEXT	0	Dropout probabilities for each hidden layer, separated by commas. Example '0.3,0.3'
`--dims`	TEXT	1000	Dimensions of hidden layers. Separated by commas. Example '100,100'
`--override`	flag	False	Override results
`--seed`	INTEGER	0	Seed to use for training
`--l2-pen`	FLOAT	0	L2 penalty to use for the model weights
`--alphas`	TEXT	-	Whether to use different alphas labels to help main label
`--patience`	INTEGER	0	Patience to use for early stopping. 0 means no early stopping
`--emb-models`	TEXT	all	LLM embedding models to use, separated by commas. LLMs should be in 'ablang2','igbert','igT5','esm','antiberty','prot-t5','all'. Example 'igT5,esm'
`--gpu`	INTEGER	0	Choose index of GPU device to use if multiple GPUs available. By default it's the first one (index 0). -1 forces cpu usage. If no GPU is available, CPU is used
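The --dims and --dropouts options are comma-separated strings with one entry per hidden layer. The sketch below shows one way such strings could be parsed and checked before training; the function name and the requirement that both lists have the same length are assumptions, not documented behavior of paraplume-train.

```python
def parse_hidden_layers(dims="1000", dropouts="0"):
    """Parse --dims / --dropouts style strings into (dimension, dropout) pairs."""
    layer_dims = [int(value) for value in dims.split(",")]
    layer_dropouts = [float(value) for value in dropouts.split(",")]
    if len(layer_dims) != len(layer_dropouts):
        # Assumption: one dropout probability is expected per hidden layer.
        raise ValueError("dims and dropouts must list the same number of layers")
    if any(not 0.0 <= prob < 1.0 for prob in layer_dropouts):
        raise ValueError("dropout probabilities must be in [0, 1)")
    return list(zip(layer_dims, layer_dropouts))
```

With the defaults from the table, this yields a single hidden layer of 1000 units and no dropout.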

Example

paraplume-train training_data/custom_train_set training_data/custom_val_set \
  --lr 0.001 \
  -n 50 \
  -r training_results \
  --batch-size 32 \
  --dims 512,256 \
  --dropouts 0.2,0.1 \
  --patience 5 \
  --emb-models igT5,esm \
  --gpu 0

Input

The two arguments (training_data/custom_train_set and training_data/custom_val_set in the example) are paths of folders created by the previous paraplume-create-dataset command.

Output

Model weights and training parameters are saved in the folder specified by the -r option (training_results in the example, result by default).

The resulting trained model can then be used at inference by passing the output folder path as the --custom-model argument of the inference commands (see inference command lines).

🚀 TUTORIALS

Command Line Tutorial

To use the default model with the pretrained weights, install the package and run paraplume-infer file-to-paratope ./tutorial/paired.csv. The result is saved as paratope_paired.pkl in the same tutorial folder.

To train and use a custom model via the command line, follow the 4 steps below.

Step 0: Set up

Clone the repository.
Make sure your working directory is the Paraplume repository root.
Install the package in your favorite virtual environment with pip install paraplume.
Download PDB files from SAbDab in IMGT format and save them in ./all_structures/imgt.

Step 1: Create training and validation datasets from CSVs

paraplume-create-dataset ./tutorial/custom_train_set.csv ./all_structures/imgt -r custom_folder

This creates the folder custom_folder and, inside it, the folder custom_train_set, which contains two files: dict.json for the sequences and labels, and embeddings.pt for the PLM embeddings. Repeat for the validation set (used for early stopping):

paraplume-create-dataset ./tutorial/custom_val_set.csv ./all_structures/imgt -r custom_folder
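The validation set created here is what early stopping monitors during training. As a rough illustration of patience-based early stopping, the sketch below returns the epoch at which training would stop given a list of validation losses. It assumes the monitored quantity is a validation loss that should decrease; the function name is hypothetical and Paraplume's own stopping criterion may differ.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. patience == 0 disables early stopping,
    so the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if patience > 0 and epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1
```

With a patience of 2, two epochs in a row without a new best validation loss end the run.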

Step 2: Train the model

paraplume-train ./custom_folder/custom_train_set ./custom_folder/custom_val_set \
  --lr 0.001 \
  -n 50 \
  --batch-size 8 \
  --dims 512,256 \
  --dropouts 0.2,0.1 \
  --patience 5 \
  --emb-models igT5,esm \
  --gpu 0 \
  -r ./custom_folder

This saves the training results in custom_folder: checkpoint.pt contains the model weights, summary_dict.json contains the parameters used for training, and summary_plot.png contains plots of the training process.

Step 3: Use the trained custom model for inference

After training, your custom model will be saved in the results folder and can be used with inference commands using the --custom-model option.

paraplume-infer file-to-paratope ./tutorial/paired.csv --custom-model ./custom_folder

The result is available as paratope_paired.pkl in the tutorial folder.

Python Tutorial

A comprehensive Python tutorial for default inference usage (using the already trained weights) with examples is available in the tutorial folder.

If you want to train and use a custom model, follow the command line tutorial, or use the code available in paraplume/create_dataset.py and paraplume/train.py (function main in both files). If you need help, don't hesitate to contact me at gabrielathenes@gmail.com.

⚡ QUICK START

Install: pip install paraplume
Single sequence: paraplume-infer seq-to-paratope -h YOUR_HEAVY_CHAIN -l YOUR_LIGHT_CHAIN
File batch: paraplume-infer file-to-paratope your_file.csv

For detailed usage, see the sections above.

📧 Contact

Any issues or questions should be addressed to us at gabrielathenes@gmail.com.
