Peptide <-> SMILES utilities for monomer peptides with non-canonical residues, terminal modifications, and selected cyclizations.
PepLink
PepLink is a Python package for peptide-to-structure conversion.
It currently focuses on one reliable v1 scope:
- `aa_seqs_to_smiles(...)`: monomer peptide definition -> `SMILES` or `SELFIES`
- `smiles_to_aa_seqs(...)`: standard-amino-acid `SMILES` or `SELFIES` -> peptide sequence
- `list_supported_noncanonical_aas(...)`: inspect bundled and user-registered non-canonical amino-acid mappings
- `register_noncanonical_aa(...)` / `register_noncanonical_aas(...)`: register custom non-canonical amino acids for the current Python process
- `load_noncanonical_aas_from_csv(...)` / `register_noncanonical_aas_from_csv(...)`: read custom non-canonical amino acids from a user CSV file
Installation
pip install PepLink
Runtime dependencies:
- `rdkit`
- `selfies`
Quick Start
For an interactive version of the examples in this README, open examples/quick_start.ipynb.
aa_seqs_to_smiles(...)
from PepLink import aa_seqs_to_smiles
smiles = aa_seqs_to_smiles(
"RRXXRF",
unusual_amino_acids=[
{"position": 3, "name": "1-NAL"},
{"position": 4, "name": "1-NAL"},
],
n_terminal="ACT",
c_terminal="AMD",
)
print(smiles)
smiles_to_aa_seqs(...)
from PepLink import smiles_to_aa_seqs
result = smiles_to_aa_seqs("C[C@H](N)C(=O)N[C@@H](CS)C(=O)O")
print(result.sequence) # AC
print(result.is_cyclic) # False
print(result.cyclization) # linear
print(result.unsupported_reason) # None
Non-canonical amino-acid registry
from PepLink import (
aa_seqs_to_smiles,
list_supported_noncanonical_aas,
register_noncanonical_aa,
)
supported = list_supported_noncanonical_aas()
print(supported["1-NAL"])
register_noncanonical_aa("MyAA", "N[C@@H](CC)C(=O)O")
smiles = aa_seqs_to_smiles(
"AXA",
unusual_amino_acids=[{"position": 2, "name": "MyAA"}],
)
print(smiles)
from PepLink import register_noncanonical_aas_from_csv
register_noncanonical_aas_from_csv("examples/example_custom_noncanonical_aas.csv")
Supported Scope
aa_seqs_to_smiles(...)
PepLink v1 supports monomer peptides with:
- 20 canonical amino acids plus D-forms represented by lowercase one-letter codes
- 420 bundled non-canonical amino-acid mappings
- all 241 N-terminal modifications found in `all_peptides_data.json`
- all 55 C-terminal modifications found in `all_peptides_data.json`
- 11 implemented intrachain bond types

Supported intrachain bond types:
- `DSB`
- `AMD`
- `TIE`
- `DCB`
- `EST`
- `AMN`
- `p-XylB`
- `TRZB`
- `(E)-but-2-enyl-B`
- `BisMeBn-B`
- `but-2-ynyl-B`
Meaning of the supported intrachain bond abbreviations:
| Bond | Full name | Meaning |
|---|---|---|
| DSB | Disulfide Bond | A covalent S-S linkage between two cysteine sulfur atoms. |
| AMD | Amide Bond | An amide linkage formed between a carboxyl group and nitrogen; in peptides this bond has partial double-bond character, so the C-N bond is not freely rotatable. |
| TIE | Thioether Bond | A thioether linkage with the general form R-S-R'. |
| DCB | Dicarbon Bond (C=C) | A carbon-carbon double-bond crosslink. |
| EST | Ester Bond | An ester linkage formed from a carboxyl group and a hydroxyl group. |
| AMN | Amine Bond | A bond involving an amino or amine group such as -NH2, -NH-, or -N-. |
| p-XylB | para-Xylene thioether bridge | A para-xylene-based thioether bridge that connects two residues through sulfur atoms. |
| TRZB | Triazole bridge | A sidechain-sidechain linkage formed through a triazole ring bridge. |
| (E)-but-2-enyl-B | (E)-but-2-enyl bridge | A sidechain-sidechain crosslink bridged by an (E)-but-2-enyl group containing a C=C unit. |
| BisMeBn-B | Bismethylenebenzene bridge | A sidechain-sidechain crosslink bridged by a benzene ring with two methylene linkers. |
| but-2-ynyl-B | but-2-ynyl bridge | A sidechain-sidechain crosslink bridged by a but-2-ynyl group containing a carbon-carbon triple bond. |
Common chain_participating abbreviations used in examples:
- `SSB`: Sidechain-Sidechain Bond
- `MMB`: Mainchain-Mainchain Bond
- `SMB`: Sidechain-Mainchain Bond
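The bond-type and chain_participating abbreviations can be checked before calling the forward converter. The following is a minimal sketch, assuming the lightweight dict form with `type` and `chain_participating` keys used in this README's examples. `check_bond` is a hypothetical helper, not part of PepLink, and PepLink's own validation may behave differently.

```python
# Hypothetical pre-flight check; not part of the PepLink API.
SUPPORTED_BOND_TYPES = {
    "DSB", "AMD", "TIE", "DCB", "EST", "AMN", "p-XylB", "TRZB",
    "(E)-but-2-enyl-B", "BisMeBn-B", "but-2-ynyl-B",
}
CHAIN_PARTICIPATING = {"SSB", "MMB", "SMB"}


def check_bond(bond: dict) -> list:
    """Return a list of problems found in one lightweight bond dict."""
    problems = []
    if bond.get("type") not in SUPPORTED_BOND_TYPES:
        problems.append(f"unsupported bond type: {bond.get('type')!r}")
    if bond.get("chain_participating") not in CHAIN_PARTICIPATING:
        problems.append(
            f"unknown chain_participating: {bond.get('chain_participating')!r}"
        )
    return problems


ok = {"position1": 1, "position2": 8, "type": "AMD", "chain_participating": "MMB"}
bad = {"position1": 2, "position2": 5, "type": "ETH", "chain_participating": "SSB"}
print(check_bond(ok))   # []
print(check_bond(bad))  # ["unsupported bond type: 'ETH'"]
```

Positions are deliberately not range-checked here, because the README does not spell out the position rules.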
smiles_to_aa_seqs(...)
PepLink v1 intentionally keeps reverse parsing conservative.
It officially supports:
- standard amino acids only
- L/D configuration
- linear peptides
- head-to-tail cyclic peptides
- `SMILES` input
- `SELFIES` input
It does not promise reverse parsing for:
- non-canonical amino acids
- sidechain-crosslinked cyclic peptides
- terminally modified peptides
- coordination complexes
When a molecule is outside this reliable scope, smiles_to_aa_seqs(...) returns a PeptideParseResult with unsupported_reason.
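Callers can branch on `unsupported_reason` before trusting the other fields. This is a minimal sketch, assuming `unsupported_reason` is `None` for a successful parse (as the Quick Start output suggests) and a descriptive value otherwise. `summarize` is a hypothetical helper, and `SimpleNamespace` stands in for `PeptideParseResult` so the snippet runs without PepLink installed.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Describe a parse result; works with any object exposing these fields."""
    if result.unsupported_reason is not None:
        return f"unsupported: {result.unsupported_reason}"
    kind = "cyclic" if result.is_cyclic else "linear"
    return f"{kind} peptide {result.sequence}"


# Stand-ins for PeptideParseResult objects; the field values are illustrative.
parsed = SimpleNamespace(sequence="AC", is_cyclic=False, unsupported_reason=None)
rejected = SimpleNamespace(
    sequence=None, is_cyclic=False, unsupported_reason="example reason text"
)
print(summarize(parsed))    # linear peptide AC
print(summarize(rejected))  # unsupported: example reason text
```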
Public API
aa_seqs_to_smiles(...)
aa_seqs_to_smiles(
sequence,
*,
unusual_amino_acids=None,
intrachain_bonds=None,
n_terminal=None,
c_terminal=None,
output_format="smiles",
aa_overrides=None,
n_terminal_overrides=None,
c_terminal_overrides=None,
) -> str
Key conventions:
- `sequence` uses one-letter amino-acid codes
- non-canonical residues are represented by `X` or `x` placeholders
- `unusual_amino_acids` must match the placeholder positions exactly
- `intrachain_bonds` can use either lightweight dicts or DBAASP-like nested dicts
- `output_format` is either `"smiles"` or `"selfies"`
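The placeholder rule can be checked up front. The sketch below assumes 1-based positions, which is inferred from the README examples (for instance `"RRXXRF"` with positions 3 and 4). Both helpers are hypothetical and not part of PepLink.

```python
def placeholder_positions(sequence: str) -> list:
    """1-based positions of X/x placeholders in a one-letter sequence."""
    return [i for i, aa in enumerate(sequence, start=1) if aa in "Xx"]


def placeholders_match(sequence: str, unusual_amino_acids: list) -> bool:
    """True when the declared positions equal the placeholder positions."""
    declared = sorted(item["position"] for item in unusual_amino_acids)
    return declared == placeholder_positions(sequence)


print(placeholder_positions("RRXXRF"))  # [3, 4]
print(
    placeholders_match(
        "RRXXRF",
        [{"position": 3, "name": "1-NAL"}, {"position": 4, "name": "1-NAL"}],
    )
)  # True
print(placeholders_match("AXA", [{"position": 1, "name": "MyAA"}]))  # False
```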
Minimal direct examples:
Linear peptide
Dataset example: id=11
from PepLink import aa_seqs_to_smiles
smiles = aa_seqs_to_smiles("RVKRVWPLVIRTVIAGYNLYRAIKKK")
Single non-canonical residue
Dataset example: id=151
smiles = aa_seqs_to_smiles(
"GIKEXKRIVQRIKDFLRNLV",
unusual_amino_acids=[
{"position": 5, "name": "Phg"},
],
)
Multiple non-canonical residues
Dataset example: id=157
smiles = aa_seqs_to_smiles(
"GRFKRXRKKXKKLFKKIS",
unusual_amino_acids=[
{"position": 6, "name": "Phg"},
{"position": 10, "name": "Phg"},
],
)
Terminal modifications
Dataset example: id=10360
smiles = aa_seqs_to_smiles(
"K",
n_terminal="C16",
c_terminal="AMD",
)
Another real example with D-amino acids is id=8:
smiles = aa_seqs_to_smiles(
"KVvvKWVvKvVK",
n_terminal="C16",
c_terminal="AMD",
)
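Lowercase one-letter codes mark D-forms, so the D-residue positions can be read straight from the sequence string. A minimal sketch with a hypothetical helper: positions are 1-based, and the lowercase `x` placeholder is excluded because the README does not say whether it implies a D configuration.

```python
def d_residue_positions(sequence: str) -> list:
    """1-based positions of lowercase (D-form) canonical residues."""
    return [
        i
        for i, aa in enumerate(sequence, start=1)
        if aa.islower() and aa != "x"  # skip the lowercase placeholder
    ]


print(d_residue_positions("KVvvKWVvKvVK"))  # [3, 4, 8, 10]
```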
Intrachain bond examples
Each bond type below is backed by a real record from all_peptides_data.json.
DSB
Dataset example: id=57
smiles = aa_seqs_to_smiles(
"VTCDILSVEAKGVKLNDAACAAHCLFRGRSGGYCNGKRVCVCR",
intrachain_bonds=[
{"position1": 3, "position2": 34, "type": "DSB", "chain_participating": "SSB"},
{"position1": 20, "position2": 40, "type": "DSB", "chain_participating": "SSB"},
{"position1": 24, "position2": 42, "type": "DSB", "chain_participating": "SSB"},
],
)
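Because a DSB is defined here as an S-S linkage between two cysteine sulfur atoms, a quick sanity check is that both positions point at `C` (or `c` for D-Cys). This is a hedged sketch with a hypothetical helper that assumes 1-based positions. It also flags `X` placeholders, which may be legitimate if the non-canonical residue carries a thiol.

```python
def non_cys_disulfides(sequence: str, bonds: list) -> list:
    """Return DSB bonds whose endpoints are not both C/c (1-based positions)."""
    flagged = []
    for bond in bonds:
        if bond["type"] != "DSB":
            continue
        ends = (sequence[bond["position1"] - 1], sequence[bond["position2"] - 1])
        if any(aa not in "Cc" for aa in ends):
            flagged.append(bond)
    return flagged


seq = "VTCDILSVEAKGVKLNDAACAAHCLFRGRSGGYCNGKRVCVCR"
bonds = [
    {"position1": 3, "position2": 34, "type": "DSB", "chain_participating": "SSB"},
    {"position1": 20, "position2": 40, "type": "DSB", "chain_participating": "SSB"},
    {"position1": 24, "position2": 42, "type": "DSB", "chain_participating": "SSB"},
]
print(non_cys_disulfides(seq, bonds))  # []
```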
AMD head-to-tail cyclization
Dataset example: id=105
smiles = aa_seqs_to_smiles(
"SwFkTkSk",
intrachain_bonds=[
{"position1": 1, "position2": 8, "type": "AMD", "chain_participating": "MMB"},
],
)
TIE
Dataset example: id=1079
smiles = aa_seqs_to_smiles(
"IXSIXLCTPGCKTGALMGCNMKTATCHCSIHVXK",
unusual_amino_acids=[
{"position": 2, "name": "DHB"},
{"position": 5, "name": "DHA"},
{"position": 33, "name": "DHA"},
],
intrachain_bonds=[
{"position1": 3, "position2": 7, "type": "TIE", "chain_participating": "SSB"},
{"position1": 8, "position2": 11, "type": "TIE", "chain_participating": "SSB"},
{"position1": 13, "position2": 19, "type": "TIE", "chain_participating": "SSB"},
{"position1": 23, "position2": 26, "type": "TIE", "chain_participating": "SSB"},
{"position1": 25, "position2": 28, "type": "TIE", "chain_participating": "SSB"},
],
)
DCB
Dataset example: id=4419
smiles = aa_seqs_to_smiles(
"FLPILASLAAKFGPKLFXLVTKKX",
unusual_amino_acids=[
{"position": 18, "name": "AGL"},
{"position": 24, "name": "AGL"},
],
intrachain_bonds=[
{"position1": 18, "position2": 24, "type": "DCB", "chain_participating": "SSB"},
],
)
EST
Dataset example: id=6917
smiles = aa_seqs_to_smiles(
"SadAssX",
unusual_amino_acids=[
{"position": 7, "name": "D-Allo-Thr"},
],
n_terminal="3,4-OH-4-Me-C16",
intrachain_bonds=[
{"position1": 0, "position2": 7, "type": "EST", "chain_participating": "MMB"},
],
)
AMN
Dataset example: id=19104
smiles = aa_seqs_to_smiles(
"CANSCXYGPLTWSCXGNTK",
unusual_amino_acids=[
{"position": 6, "name": "DHA"},
{"position": 15, "name": "3-OH-Asp"},
],
intrachain_bonds=[
{"position1": 1, "position2": 18, "type": "TIE", "chain_participating": "SSB"},
{"position1": 5, "position2": 11, "type": "TIE", "chain_participating": "SSB"},
{"position1": 4, "position2": 14, "type": "TIE", "chain_participating": "SSB"},
{"position1": 6, "position2": 19, "type": "AMN", "chain_participating": "SSB"},
],
)
p-XylB
Dataset example: id=11913
smiles = aa_seqs_to_smiles(
"cWkKkC",
c_terminal="AMD",
intrachain_bonds=[
{"position1": 1, "position2": 6, "type": "p-XylB", "chain_participating": "SSB"},
],
)
TRZB
Dataset example: id=14660
smiles = aa_seqs_to_smiles(
"FKXRRWQWRMKKLGAPSITXVRRAF",
unusual_amino_acids=[
{"position": 3, "name": "BisHomo-Pra"},
{"position": 20, "name": "Lys(N3)"},
],
intrachain_bonds=[
{"position1": 3, "position2": 20, "type": "TRZB", "chain_participating": "SSB"},
],
)
(E)-but-2-enyl-B
Dataset example: id=17263
smiles = aa_seqs_to_smiles(
"KFFKKLKKAVKKGFKKFAKV",
intrachain_bonds=[
{"position1": 4, "position2": 8, "type": "(E)-but-2-enyl-B", "chain_participating": "SSB"},
],
)
BisMeBn-B
Dataset example: id=17273
smiles = aa_seqs_to_smiles(
"KFFKKLKKAVKKGFKKFAKV",
intrachain_bonds=[
{"position1": 12, "position2": 16, "type": "BisMeBn-B", "chain_participating": "SSB"},
],
)
but-2-ynyl-B
Dataset example: id=19191
smiles = aa_seqs_to_smiles(
"VKRFKKFFRKFKKFV",
c_terminal="AMD",
intrachain_bonds=[
{"position1": 6, "position2": 10, "type": "but-2-ynyl-B", "chain_participating": "SSB"},
],
)
smiles_to_aa_seqs(...)
smiles_to_aa_seqs(text, *, input_format="auto") -> PeptideParseResult
Returned fields:
- `sequence`
- `is_cyclic`
- `cyclization`
- `normalized_smiles`
- `input_format`
- `unsupported_reason`
Examples:
from PepLink import aa_seqs_to_smiles, smiles_to_aa_seqs
linear_smiles = aa_seqs_to_smiles("AC")
print(smiles_to_aa_seqs(linear_smiles))
head_to_tail_smiles = aa_seqs_to_smiles(
"SwFkTkSk",
intrachain_bonds=[
{"position1": 1, "position2": 8, "type": "AMD", "chain_participating": "MMB"},
],
)
print(smiles_to_aa_seqs(head_to_tail_smiles))
For head-to-tail cyclic peptides, the returned sequence is normalized to a canonical rotation, because a ring has no unique start residue.
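One common way to pick a canonical rotation is to take the lexicographically smallest one. The README does not state which rule PepLink uses, so treat the following as an illustrative sketch of the idea rather than the library's behavior. `canonical_rotation` is a hypothetical name.

```python
def canonical_rotation(sequence: str) -> str:
    """Return the lexicographically smallest rotation of a ring sequence."""
    rotations = [sequence[i:] + sequence[:i] for i in range(len(sequence))]
    return min(rotations) if rotations else sequence


# Two spellings of the same ring map to one representative.
print(canonical_rotation("SwFkTkSk"))  # FkTkSkSw
print(canonical_rotation("TkSkSwFk"))  # FkTkSkSw
```

With Python's default string ordering, uppercase letters sort before lowercase ones, so under this rule the representative starts at an L-residue whenever one is present.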
Non-canonical amino-acid registry
list_supported_noncanonical_aas(*, include_custom=True) -> dict[str, str]
load_noncanonical_aas_from_csv(csv_path) -> dict[str, str]
register_noncanonical_aa(name, smiles) -> str
register_noncanonical_aas(mapping) -> dict[str, str]
register_noncanonical_aas_from_csv(csv_path) -> dict[str, str]
clear_registered_noncanonical_aas() -> None
Key conventions:
- `list_supported_noncanonical_aas(...)` returns `name -> SMILES` mappings only for non-canonical residues
- bundled mappings contribute 420 non-canonical residue names by default
- CSV helpers expect columns `name` (or `aa`) and `SMILES`
- `register_noncanonical_aa(...)` validates and canonicalizes the input `SMILES`
- registered mappings are process-local and are picked up automatically by `aa_seqs_to_smiles(...)`
- `aa_overrides` is still available when you want a per-call override instead of mutating the process-wide registry
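To illustrate the expected CSV layout, here is a standard-library sketch that reads a `name` (or `aa`) column and a `SMILES` column into a dict. `read_noncanonical_csv` is a hypothetical stand-in, not PepLink's loader. It does no SMILES validation or canonicalization, and the real helpers may accept other variations.

```python
import csv
import io


def read_noncanonical_csv(handle) -> dict:
    """Read name -> SMILES rows; the name column may be 'name' or 'aa'."""
    mapping = {}
    for row in csv.DictReader(handle):
        name = row.get("name") or row.get("aa")
        smiles = row.get("SMILES")
        if not name or not smiles:
            raise ValueError(f"row needs name (or aa) and SMILES: {row}")
        mapping[name.strip()] = smiles.strip()
    return mapping


# In-memory file so the example runs without touching disk.
sample = io.StringIO(
    "name,SMILES\nMyAA,N[C@@H](CC)C(=O)O\nMyAA2,N[C@@H](CO)C(=O)O\n"
)
print(read_noncanonical_csv(sample))
```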
DBAASP Helper
If your source data already follows the DBAASP-style structure used in all_peptides_data.json, use from_dbaasp_record(...).
import json
from pathlib import Path
from PepLink import aa_seqs_to_smiles, from_dbaasp_record
records = json.loads(Path("all_peptides_data.json").read_text())
record = next(item for item in records if item["id"] == 57)
inputs = from_dbaasp_record(record)
smiles = aa_seqs_to_smiles(**inputs.to_api_kwargs())
Dataset Compatibility
all_peptides_data.json is the reference dataset used in this repository.
Current coverage:
- N-terminal modifications in dataset: `241 / 241` bundled
- C-terminal modifications in dataset: `55 / 55` bundled
- unusual amino-acid names in dataset: `420 / 545` bundled
- missing unusual amino-acid names: `125`
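The coverage figures are internally consistent, as this short arithmetic check shows, using only the numbers quoted in the list:

```python
bundled, total = 420, 545

missing = total - bundled
coverage = f"{bundled / total:.1%}"

print(missing)   # 125
print(coverage)  # 77.1%
```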
Unsupported Cases
PepLink v1 intentionally rejects several categories.
- multimer peptides and interchain bonds
- coordination bonds
- reverse parsing of non-canonical / terminally modified / sidechain-crosslinked peptides
- intrachain bond types not yet implemented: `ETH`, `CAR`, `IMN`
Real dataset examples:
- multimer / interchain bond: `id=1`
- coordination bond: `id=15`
- unsupported bond types appear in records such as `id=17389` and `id=21130`
- a known forward edge case that still fails in v1: `id=5779`
Extending Mappings
You can extend the bundled mappings without modifying PepLink source code.
Register custom unusual amino acids for the current process
from PepLink import register_noncanonical_aas
register_noncanonical_aas(
{
"MyAA": "N[C@@H](CC)C(=O)O",
"MyAA2": "N[C@@H](CO)C(=O)O",
}
)
Register custom unusual amino acids from a CSV file
Example file: examples/example_custom_noncanonical_aas.csv
from PepLink import register_noncanonical_aas_from_csv
register_noncanonical_aas_from_csv("examples/example_custom_noncanonical_aas.csv")
Add missing unusual amino acids per call
smiles = aa_seqs_to_smiles(
"AXA",
unusual_amino_acids=[{"position": 2, "name": "MyAA"}],
aa_overrides={"MyAA": "N[C@@H](CC)C(=O)O"},
)
Add terminal modifications
smiles = aa_seqs_to_smiles(
"AK",
n_terminal="MyNCap",
c_terminal="MyCTail",
n_terminal_overrides={"MyNCap": "CC(=O)O"},
c_terminal_overrides={"MyCTail": "N"},
)
Notes
- Forward `SELFIES` output is now implemented through the public API.
- Reverse parsing remains intentionally narrower than forward generation.
- The supported runtime implementation now lives entirely inside the `PepLink/` package.
- Custom non-canonical amino-acid registrations are process-local runtime state.