A Python package for working with protein sequence and PTM
Project description
SEQUAL / seq=
Sequal is a Python package for in-silico generation of modified sequences from a sequence input and modifications. It is designed to assist in protein engineering, mass spectrometry analysis, drug design, and other bioinformatics research.
Features
- Full support for ProForma 2.1 standard for proteoform notation
- Generate all possible sequences with static and variable modifications
- Flexible modification handling with support for:
- Formula validation for chemical modifications
- Charged formulas (positive and negative charges)
- Glycan structure validation
- Custom monosaccharides in glycan compositions
- Info tags and metadata
- Observed mass recording
- Named entities (peptidoform, peptidoform ion, and compound ion names)
- Terminal global modifications (N-term, C-term, and terminal-specific)
- Placement controls for unknown position modifications (Position, Limit, CoMKP, CoMUP)
- Ion notation with semantic detection (a-type, b-type, c-type, x-type, y-type, z-type)
- Indexing and slicing for convenient access to modification values
- Support for custom modification annotations
- Sequence ambiguity representation
- Utilities for mass spectrometry fragment generation
- Labile and non-labile ion simulation
Installation
To install Sequal, use pip:
pip install sequal
Usage
ProForma 2.1 Support
Sequal supports the ProForma 2.1 standard for proteoform notation, which provides a standardized way to represent protein sequences with their modifications.
Parsing ProForma notation
from sequal.sequence import Sequence
# Basic ProForma notation with modification
seq = Sequence.from_proforma("ELVIS[Phospho]K")
print(seq.seq[4].value) # S
print(seq.seq[4].mods[0].synonyms[0]) # Phospho
# ProForma with terminal modifications
seq = Sequence.from_proforma("[Acetyl]-PEPTIDE-[Amidated]")
print(seq.mods[-1][0].synonyms[0]) # Acetyl (N-terminal)
print(seq.mods[-2][0].synonyms[0]) # Amidated (C-terminal)
# ProForma with global modifications
seq = Sequence.from_proforma("<[Carbamidomethyl]@C>PEPTCDE")
print(seq.global_mods[0].synonyms[0]) # Carbamidomethyl
print(seq.global_mods[0].target_residues) # ['C']
# ProForma with sequence ambiguity
seq = Sequence.from_proforma("PEPT(?DE|ID)E")
print(seq.sequence_ambiguities[0].sequence) # DE|ID
print(seq.sequence_ambiguities[0].position) # 4
Working with information tags
from sequal.sequence import Sequence
# ProForma with info tags
seq = Sequence.from_proforma("ELVIS[Phospho|INFO:newly discovered]K")
mod = seq.seq[4].mods[0]
print(mod.synonyms[0]) # Phospho
print(mod.info_tags[0]) # newly discovered
# Multiple info tags
seq = Sequence.from_proforma("PEPTIDE-[Amidated|INFO:Common C-terminal mod|INFO:Added manually]")
mod = seq.mods[-2][0] # C-terminal modification
print(mod.synonyms[0]) # Amidated
print(mod.info_tags) # ['Common C-terminal mod', 'Added manually']
Joint representation of experimental data and interpretation
from sequal.sequence import Sequence
# ProForma with joint interpretation and mass
seq = Sequence.from_proforma("ELVIS[U:Phospho|+79.966331]K")
mod = seq.seq[4].mods[0]
print(mod.mod_value.pipe_values[0].value) # Phospho
print(mod.mod_value.pipe_values[0].source) # U
print(mod.mod_value.pipe_values[1].mass) # 79.966331
# ProForma with observed mass
seq = Sequence.from_proforma("ELVIS[Phospho|Obs:+79.978]K")
mod = seq.seq[4].mods[0]
print(mod.synonyms[0]) # Phospho
print(mod.mod_value.pipe_values[1].observed_mass) # 79.978
# Complex case with synonyms, observed mass and info tags
seq = Sequence.from_proforma("ELVIS[Phospho|O-phospho-L-serine|Obs:+79.966|INFO:Validated]K")
mod = seq.seq[4].mods[0]
print(mod.synonyms[0]) # Phospho
print(mod.synonyms[1]) # O-phospho-L-serine
print(mod.mod_value.pipe_values[3].observed_mass) # 79.966
print(mod.info_tags[0]) # Validated
Accessing mod_value with indexing and slicing
from sequal.sequence import Sequence
# Using indexing to access pipe values
seq = Sequence.from_proforma("ELVIS[Unimod:21|Phospho|INFO:Validated]K")
mod = seq.seq[4].mods[0]
print(mod.mod_value[0].value) # Primary pipe value - Unimod:21
print(mod.mod_value[1].value) # Second pipe value - Phospho
print(mod.mod_value[2].type) # PipeValue.INFO_TAG
# Using slicing to access multiple pipe values
pipe_values = mod.mod_value[1:3] # Get second and third pipe values
for pv in pipe_values:
print(f"{pv.type}: {pv.value}")
# Iterating through all pipe values
for i, pv in enumerate(mod.mod_value):
print(f"Pipe value {i}: {pv.value}")
# Getting the length of pipe values
print(f"Number of pipe values: {len(mod.mod_value)}")
Working with formula and glycan modifications
from sequal.sequence import Sequence
# Working with formula modifications
seq = Sequence.from_proforma("PEPTIDE[Formula:C2H3NO]")
mod = seq.seq[6].mods[0]
print(f"Formula: {mod.mod_value[0].value}")
print(f"Is valid formula: {mod.mod_value[0].is_valid}")
# Invalid formula - still parsed but marked invalid
seq = Sequence.from_proforma("PEPTIDE[Formula:123]")
mod = seq.seq[6].mods[0]
print(f"Invalid formula: {mod.mod_value[0].value}")
print(f"Is valid formula: {mod.mod_value[0].is_valid}") # False
# Working with glycan modifications
seq = Sequence.from_proforma("PEPTID[Glycan:HexNAc1Hex3]E")
mod = seq.seq[5].mods[0]
print(f"Glycan: {mod.mod_value[0].value}")
print(f"Is valid glycan: {mod.mod_value[0].is_valid}")
print(f"Glycan pipe value type: {mod.mod_value[0].type}") # PipeValue.GLYCAN
# Invalid glycan - still parsed but marked invalid and stored as SYNONYM type
seq = Sequence.from_proforma("PEPTID[Glycan:Invalid123]E")
mod = seq.seq[5].mods[0]
print(f"Invalid glycan: {mod.mod_value[0].value}")
print(f"Is valid glycan: {mod.mod_value[0].is_valid}") # False
print(f"Invalid glycan pipe value type: {mod.mod_value[0].type}") # PipeValue.SYNONYM
Named entities (ProForma 2.1)
from sequal.sequence import Sequence
# Peptidoform name
seq = Sequence.from_proforma("(>Tryptic peptide)SEQUEN[Phospho]CE")
print(seq.peptidoform_name) # Tryptic peptide
# Peptidoform ion name
seq = Sequence.from_proforma("(>>Precursor ion z=2)SEQUENCE")
print(seq.peptidoform_ion_name) # Precursor ion z=2
# Compound ion name (for chimeric spectra)
seq = Sequence.from_proforma("(>>>MS2 Scan 1234)PEPTIDE/2+SEQUENCE/3")
print(seq.compound_ion_name) # MS2 Scan 1234
# All three naming levels together
seq = Sequence.from_proforma("(>>>MS2)(>>Precursor)(>Albumin)PEPTIDE")
print(seq.compound_ion_name) # MS2
print(seq.peptidoform_ion_name) # Precursor
print(seq.peptidoform_name) # Albumin
Charged formulas (ProForma 2.1)
from sequal.sequence import Sequence
# Positive charge
seq = Sequence.from_proforma("SEQUEN[Formula:Zn1:z+2]CE")
mod = seq.seq[5].mods[0]
print(mod.mod_value[0].value) # Zn1
print(mod.mod_value[0].charge) # z+2
print(mod.mod_value[0].charge_value) # 2
# Negative charge
seq = Sequence.from_proforma("PEPTIDE[Formula:C2H3NO:z-1]")
mod = seq.seq[6].mods[0]
print(mod.mod_value[0].charge) # z-1
print(mod.mod_value[0].charge_value) # -1
Custom monosaccharides (ProForma 2.1)
from sequal.sequence import Sequence
# Custom monosaccharide in glycan composition
seq = Sequence.from_proforma("SEQUEN[Glycan:{C8H13N1O5}1Hex2]CE")
mod = seq.seq[5].mods[0]
print(mod.mod_value[0].value) # {C8H13N1O5}1Hex2
print(mod.mod_value[0].is_valid) # True
# Charged custom monosaccharide
seq = Sequence.from_proforma("N[Glycan:{C8H13N1O5Na1:z+1}1Hex2HexNAc2]")
mod = seq.seq[0].mods[0]
print(mod.mod_value[0].value) # {C8H13N1O5Na1:z+1}1Hex2HexNAc2
# Multiple custom monosaccharides
seq = Sequence.from_proforma("N[Glycan:{C8H13N1O5}2{C6H10O5}1Hex3]")
mod = seq.seq[0].mods[0]
print(mod.mod_value[0].is_valid) # True
Terminal global modifications (ProForma 2.1)
from sequal.sequence import Sequence
# N-terminal global modification
seq = Sequence.from_proforma("<[Acetyl]@N-term>PEPTIDE")
print(seq.global_mods[0].synonyms[0]) # Acetyl
print(seq.global_mods[0].target_residues) # ['N-term']
# C-terminal global modification
seq = Sequence.from_proforma("<[Amidated]@C-term>PEPTIDE")
print(seq.global_mods[0].target_residues) # ['C-term']
# Terminal-specific: only N-terminal glutamine
seq = Sequence.from_proforma("<[Gln->pyro-Glu]@N-term:Q>QATPEILMCNSIGCLMG")
print(seq.global_mods[0].target_residues) # {'N-term': ['Q']}
# Multiple terminal global modifications
seq = Sequence.from_proforma("<[TMT6plex]@K,N-term><[Oxidation]@M,C-term:G>MTPEILTCNSIGCLK")
print(len(seq.global_mods)) # 2
Placement controls (ProForma 2.1)
from sequal.sequence import Sequence
# Position constraint: limit where modifications can occur
seq = Sequence.from_proforma("[Oxidation|Position:M]^4?PEPTIDEMETCM")
print(seq.to_proforma())
# Limit per position: allow multiple modifications at same position
seq = Sequence.from_proforma("[Oxidation|Limit:2]^4?PEPTIDE")
print(seq.to_proforma())
# CoMKP: allow colocalization with known position modifications
seq = Sequence.from_proforma("[Oxidation|CoMKP]?PEPT[Phospho]IDE")
print(seq.to_proforma())
# CoMUP: allow colocalization with unknown position modifications
seq = Sequence.from_proforma("PEPTIDE[Dioxidation|CoMUP][Oxidation|CoMUP]")
mod1 = seq.seq[6].mods[0]
mod2 = seq.seq[6].mods[1]
print(mod1.colocalize_unknown) # True
print(mod2.colocalize_unknown) # True
# Combined placement controls
seq = Sequence.from_proforma("[Oxidation|Position:M,C|Limit:2|CoMKP]^4?PEPTIDE")
print(seq.to_proforma())
Ion notation (ProForma 2.1)
from sequal.sequence import Sequence
# b-type ion
seq = Sequence.from_proforma("PEPTIDE-[b-type-ion]")
c_term_mods = seq.mods[-2]
print(c_term_mods[0].is_ion_type) # True
print(c_term_mods[0].value) # b-type-ion
# a-type ion
seq = Sequence.from_proforma("PEPTIDE-[a-type-ion]")
c_term_mods = seq.mods[-2]
print(c_term_mods[0].is_ion_type) # True
# Unimod ion type reference
seq = Sequence.from_proforma("PEPTIDE-[UNIMOD:2132]") # b-type-ion
c_term_mods = seq.mods[-2]
print(c_term_mods[0].is_ion_type) # True
# Complex ion notation with formula
seq = Sequence.from_proforma("PEPTID[Formula:H-1C-1O-2|Info:d-ion]-[a-type-ion]")
print(seq.to_proforma())
# Ion notation with charge
seq = Sequence.from_proforma("SFFLYSKLTV-[b-type-ion]/2")
print(seq.charge) # 2
print(seq.mods[-2][0].is_ion_type) # True
Converting to ProForma format
from sequal.sequence import Sequence
# Parse and convert back to ProForma
proforma = "ELVIS[Phospho|INFO:newly discovered]K"
seq = Sequence.from_proforma(proforma)
print(seq.to_proforma()) # ELVIS[Phospho|INFO:newly discovered]K
# Complex example with multiple modification types
proforma = "<[Carbamidomethyl]@C>[Acetyl]-PEPTCDE-[Amidated]"
seq = Sequence.from_proforma(proforma)
print(seq.to_proforma()) # <[Carbamidomethyl]@C>[Acetyl]-PEPTCDE-[Amidated]
# ProForma 2.1 features combined
proforma = "(>Tryptic peptide)<[TMT6plex]@K,N-term>SEQUEN[Formula:Zn1:z+2]CE-[b-type-ion]/2"
seq = Sequence.from_proforma(proforma)
print(seq.to_proforma()) # Perfect roundtrip preservation
Sequence comprehension
Using Sequence Object with Unmodified Protein Sequence
from sequal.sequence import Sequence
#Using Sequence object with unmodified protein sequence
seq = Sequence("TESTEST")
print(seq.seq) #should print "TESTEST"
print(seq[0:2]) #should print "TE"
Using Sequence Object with Modified Protein Sequence
from sequal.sequence import Sequence
#Using Sequence object with modified protein sequence. []{}() could all be used as modification annotation.
seq = Sequence("TEN[HexNAc]ST")
for i in seq.seq:
print(i, i.mods) #should print N [HexNAc] on the 3rd amino acid
seq = Sequence("TEN[HexNAc][HexNAc]ST")
for i in seq.seq:
print(i, i.mods) #should print N [HexNAc, HexNAc] on the 3rd amino acid
# .mods property provides access to an arrays of all modifications at this amino acid
seq = Sequence("TE[HexNAc]NST", mod_position="left") #mod_position left indicate that the modification should be on the left of the amino acid instead of default which is right
for i in seq.seq:
print(i, i.mods) #should print N [HexNAc] on the 3rd amino acid
Custom Annotation Formatting
from sequal.sequence import Sequence
#Format sequence with custom annotation
seq = Sequence("TENST")
a = {1:"tes", 2:["1", "200"]}
print(seq.to_string_customize(a, individual_annotation_enclose=False, individual_annotation_separator="."))
# By supplying .to_string_customize with a dictionary of position on the sequence that you wish to annotate
# The above would print out TE[tes]N[1.200]ST
Modification
Creating a Modification Object
from sequal.modification import Modification
# Create a modification object and try to find all its possible positions using regex
mod = Modification("HexNAc", regex_pattern="N[^P][S|T]")
for ps, pe in mod.find_positions("TESNEST"):
print(ps, pe)
# this should print out the position 3 on the sequence as the start of the match and position 6 as the end of the match
Generating Modified Sequences
Static Modification
from sequal.sequence import ModdedSequenceGenerator
from sequal.modification import Modification
propiona = Modification("Propionamide", regex_pattern="C", mod_type="static")
seq = "TECSNTT"
mods = [propiona]
g = ModdedSequenceGenerator(seq, static_mods=mods)
for i in g.generate():
print(i) # should print {2: [Propionamide]}
Variable Modification
from sequal.sequence import ModdedSequenceGenerator
from sequal.modification import Modification
nsequon = Modification("HexNAc", regex_pattern="N[^P][S|T]", mod_type="variable", labile=True)
osequon = Modification("Mannose", regex_pattern="[S|T]", mod_type="variable", labile=True)
carbox = Modification("Carboxylation", regex_pattern="E", mod_type="variable", labile=True)
seq = "TECSNTT"
mods = [nsequon, osequon, carbox]
g = ModdedSequenceGenerator(seq, mods, [])
print(g.variable_map.mod_position_dict)
# should print {'HexNAc0': [3], 'Mannose0': [0, 2, 4, 5, 6], 'Carboxylation0': [1]}
for i in g.generate():
print(i)
# should print all possible combinations of variable modifications
Mass spectrometry utilities
Generating Non-Labile and Labile Ions
from sequal.mass_spectrometry import fragment_non_labile, fragment_labile
from sequal.modification import Modification
from sequal.sequence import ModdedSequenceGenerator, Sequence
nsequon = Modification("HexNAc", regex_pattern="N[^P][S|T]", mod_type="variable", labile=True, labile_number=1, mass=203)
propiona = Modification("Propionamide", regex_pattern="C", mod_type="static", mass=71)
seq = "TECSNTT"
static_mods = [propiona]
variable_mods = [nsequon]
g = ModdedSequenceGenerator(seq, variable_mods, static_mods)
for i in g.generate():
print(i)
s = Sequence(seq, mods=i)
for b, y in fragment_non_labile(s, "by"):
print(b, "b{}".format(b.fragment_number))
print(y, "y{}".format(y.fragment_number))
g = ModdedSequenceGenerator(seq, variable_mods, static_mods)
for i in g.generate():
s = Sequence(seq, mods=i)
ion = fragment_labile(s)
if ion.has_labile:
print(ion, "Y{}".format(ion.fragment_number))
print(ion.mz_calculate(1))
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sequal-2.1.2.tar.gz.
File metadata
- Download URL: sequal-2.1.2.tar.gz
- Upload date:
- Size: 39.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.10.12 Linux/6.6.87.2-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fde0af3b1b05246d3cf93af3968d30f662830289285522c3a7fd2030a763d8fd
|
|
| MD5 |
c4806f28d5e14ee71072f98c4a0f37f4
|
|
| BLAKE2b-256 |
8448406a89dc03258420d34af6df586d0083e1203192cc74ed812cacef7f62ad
|
File details
Details for the file sequal-2.1.2-py3-none-any.whl.
File metadata
- Download URL: sequal-2.1.2-py3-none-any.whl
- Upload date:
- Size: 39.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.10.12 Linux/6.6.87.2-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e506857714096b6cc3eaf7824b656691d79c7b6b13d93342f7c46f7e3f21c86e
|
|
| MD5 |
cca65208c49e07db8c644079389835d2
|
|
| BLAKE2b-256 |
93112d46a9ffb1371d02fdd19807f05671bb1c524c00a1d6d43b5e3321f68e42
|