
A toolkit enabling SMILES generation and property analysis for noncanonical and cyclized peptides.


p2smi: Peptide FASTA-to-SMILES Conversion and Molecular Property Tools

p2smi is a Python package for converting peptide sequences in FASTA format into SMILES strings, modifying those SMILES strings, and computing their molecular properties. It supports linear and cyclic peptides, noncanonical amino acids, and common chemical modifications (e.g., N-methylation, PEGylation).
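To illustrate what a linear versus a head-to-tail cyclic peptide SMILES looks like, here is a minimal sketch that chains residue backbone fragments N-C(alpha)-C(=O) together. This is not p2smi's implementation: the side-chain table and the function names `residue_fragment` and `peptide_smiles` are hypothetical, the table covers only a few canonical L-amino acids, and other cyclization types (disulfide, side-chain links) are left out.

```python
# Hypothetical side-chain table for a few canonical L-residues (not p2smi's data).
# None of these side chains use ring-closure digits, so digit 1 is free for the backbone ring.
SIDE_CHAINS = {
    "A": "C",       # alanine
    "S": "CO",      # serine
    "C": "CS",      # cysteine
    "V": "C(C)C",   # valine
    "L": "CC(C)C",  # leucine
}


def residue_fragment(aa: str) -> str:
    """Backbone fragment N-C(alpha)-C(=O) for one residue, without the C-terminal OH."""
    if aa == "G":
        return "NCC(=O)"  # glycine: no side chain, no stereocentre
    if aa not in SIDE_CHAINS:
        raise ValueError(f"residue {aa!r} is not covered by this sketch")
    # With the side chain written as the branch, [C@@H] encodes the L-configuration.
    return f"N[C@@H]({SIDE_CHAINS[aa]})C(=O)"


def peptide_smiles(seq: str, head_to_tail: bool = False) -> str:
    """Assemble a linear or head-to-tail cyclic peptide SMILES from one-letter codes."""
    if not seq or (head_to_tail and len(seq) < 2):
        raise ValueError("need at least one residue (two for a head-to-tail ring)")
    frags = [residue_fragment(aa) for aa in seq]
    if not head_to_tail:
        # Each fragment ends in C(=O), so the next N forms the amide bond;
        # the trailing "O" caps the C-terminus as a free acid.
        return "".join(frags) + "O"
    # Ring-closure digit 1 bonds the N-terminal nitrogen to the C-terminal carbonyl carbon.
    frags[0] = "N1" + frags[0][1:]
    frags[-1] = frags[-1][: -len("C(=O)")] + "C1=O"
    return "".join(frags)


if __name__ == "__main__":
    print(peptide_smiles("GA"))
    print(peptide_smiles("GA", head_to_tail=True))
```

For Gly-Ala this prints `NCC(=O)N[C@@H](C)C(=O)O` for the linear dipeptide and `N1CC(=O)N[C@@H](C)C1=O` for the cyclic one, which closes into a six-membered diketopiperazine ring.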

This package was released in its current form to support work on the PeptideCLM model, described in our publication.

If you use this tool, please cite the PeptideCLM paper. A publication for this specific toolkit is forthcoming.


Features

  • Convert peptide FASTA files into valid SMILES strings
  • Automatically handle peptide cyclizations (disulfide, head-to-tail, side-chain to N-term, side-chain to C-term, side-chain to side-chain)
  • Modify peptide SMILES with customizable N-methylation and PEGylation
  • Evaluate synthesis feasibility with defined synthesis rules
  • Compute molecular properties: logP, TPSA, molecular formula, and Lipinski rule evaluation
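Lipinski's rule of five flags a molecule when its molecular weight exceeds 500 Da, its logP exceeds 5, it has more than 5 hydrogen-bond donors, or it has more than 10 hydrogen-bond acceptors. The sketch below shows how such an evaluation can be expressed once the descriptors are known. It assumes the caller supplies the descriptor values (for example from RDKit's `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`). The `Descriptors` dataclass and both function names are hypothetical, and the "at most one violation" pass criterion is a common convention, not necessarily the one p2smi applies.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (values supplied by the caller)."""
    mol_weight: float  # g/mol
    logp: float        # octanol-water partition coefficient estimate
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that this molecule breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` broken criteria."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    small = Descriptors(mol_weight=300.0, logp=1.5, h_donors=2, h_acceptors=4)
    print(lipinski_violations(small), passes_lipinski(small))
```

Returning the list of broken rules, rather than only a boolean, keeps the reason for a failure visible when screening many peptides.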

Installation

pip install p2smi

For development:

git clone https://github.com/AaronFeller/p2smi.git
cd p2smi
pip install -e ".[dev]"

Command-Line Tools

  • generate-peptides: Generate random peptide sequences based on user-defined constraints and modifications
  • fasta2smi: Convert a FASTA file of peptide sequences into SMILES format
  • modify-smiles: Apply modifications (N-methylation, PEGylation) to existing SMILES strings
  • smiles-props: Compute molecular properties (logP, TPSA, formula, Lipinski rules) from SMILES
  • synthesis-check: Check synthesis constraints for peptides (currently only functional for natural amino acids)

Run each command with --help to view usage and options:

generate-peptides --help
fasta2smi --help
modify-smiles --help
smiles-props --help
synthesis-check --help

Example Usage

Generate random peptides (here, 10 sequences of 10 to 20 residues):

generate-peptides \
    --max_seq_len 20 \
    --min_seq_len 10 \
    --noncanonical_percent 0.1 \
    --lowercase_percent 0.1 \
    --num_sequences 10 \
    --constraints all \
    --outfile outputfile.smi
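The length-range and count options above can be pictured with the following minimal sketch, which draws sequence lengths uniformly between the two bounds and samples residues from the 20 canonical one-letter codes. It is an assumption-laden illustration, not generate-peptides itself: the function `random_peptides` is hypothetical, noncanonical residues and `--constraints` are not modelled, and treating `lowercase_percent` as an independent per-residue probability of writing a lowercase letter is our guess at its meaning.

```python
import random
from typing import List, Optional

# Only the 20 canonical one-letter codes; noncanonical residues are out of scope here.
CANONICAL = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(
    num_sequences: int,
    min_seq_len: int,
    max_seq_len: int,
    lowercase_percent: float = 0.0,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical generator mirroring the CLI option names (semantics assumed)."""
    if not 1 <= min_seq_len <= max_seq_len:
        raise ValueError("require 1 <= min_seq_len <= max_seq_len")
    if not 0.0 <= lowercase_percent <= 1.0:
        raise ValueError("lowercase_percent must be a fraction between 0 and 1")
    rng = random.Random(seed)  # seeded generator for reproducible output
    peptides = []
    for _ in range(num_sequences):
        length = rng.randint(min_seq_len, max_seq_len)  # both bounds inclusive
        residues = []
        for _ in range(length):
            aa = rng.choice(CANONICAL)
            # Assumed semantics: each residue is independently lowercased with this probability.
            if rng.random() < lowercase_percent:
                aa = aa.lower()
            residues.append(aa)
        peptides.append("".join(residues))
    return peptides


if __name__ == "__main__":
    for pep in random_peptides(10, 10, 20, lowercase_percent=0.1, seed=0):
        print(pep)
```

Using a dedicated `random.Random(seed)` instance, rather than the module-level functions, keeps runs reproducible without touching global random state.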

Convert a FASTA file to SMILES:

fasta2smi -i peptides.fasta -o output.smi
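Before any SMILES can be built, the FASTA input has to be split into records. A minimal, standard-library reader is sketched below; the name `read_fasta` is hypothetical and this is not p2smi's parser. It joins sequences that wrap over several lines, skips blank lines, and preserves letter case, since generate-peptides can emit lowercase residues. Any meaning p2smi attaches to header contents (for example, cyclization annotations) is not interpreted here.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined, case is preserved."""
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # flush the final record


if __name__ == "__main__":
    demo = io.StringIO(">pep1\nGAS\nCV\n>pep2\nLLG\n")
    for name, seq in read_fasta(demo):
        print(name, seq)
```

Because it is a generator, large FASTA files are streamed one record at a time instead of being loaded into memory.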

Modify existing SMILES strings (N-methylation/PEGylation):

modify-smiles -i input.smi -o modified.smi --peg_rate 0.3 --nmeth_rate 0.2 --nmeth_residues 0.25
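Backbone N-methylation amounts to adding a methyl branch to selected amide nitrogens. The sketch below shows the idea on a list of per-residue backbone fragments such as `N[C@@H](C)C(=O)`, methylating a random fraction of them. It is a hypothetical illustration, not modify-smiles: the function `n_methylate` and its single `fraction` parameter are assumptions (the exact meaning of `--nmeth_rate` versus `--nmeth_residues` is not spelled out here), PEGylation is not covered, and residues whose backbone nitrogen is already tertiary (e.g., proline) must be excluded by the caller.

```python
import random
import re
from typing import List, Optional

_BACKBONE_N = re.compile(r"^N(\d*)")              # leading backbone N plus any ring-closure digits
_ALREADY_METHYLATED = re.compile(r"^N\d*\(C\)")   # fragment already carries an N-methyl branch


def n_methylate(fragments: List[str], fraction: float, seed: Optional[int] = None) -> List[str]:
    """Methylate the backbone N of a random subset of residue fragments.

    Each fragment is assumed to start with its backbone nitrogen, e.g. "N[C@@H](C)C(=O)".
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    candidates = [
        i for i, frag in enumerate(fragments)
        if frag.startswith("N") and not _ALREADY_METHYLATED.match(frag)
    ]
    k = int(fraction * len(candidates))  # number of residues to methylate, rounded down
    chosen = set(random.Random(seed).sample(candidates, k))
    # Insert "(C)" after the N and its ring-closure digits, because SMILES
    # requires ring-bond digits to come before branches on the same atom.
    return [
        _BACKBONE_N.sub(r"N\1(C)", frag, count=1) if i in chosen else frag
        for i, frag in enumerate(fragments)
    ]


if __name__ == "__main__":
    frags = ["NCC(=O)", "N[C@@H](C)C(=O)", "N[C@@H](CO)C(=O)", "N[C@@H](CS)C(=O)"]
    print("".join(n_methylate(frags, 0.5, seed=0)) + "O")
```

Placing the methyl as a branch on the nitrogen leaves the neighbour order around each `[C@@H]` unchanged, so the stereochemistry encoded in the fragments is preserved.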

Compute properties of a SMILES string:

smiles-props "C1CC(NC(=O)C2CC2)C1"
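smiles-props derives the molecular formula from the SMILES string. For peptides built from canonical residues, the result can be cross-checked by residue arithmetic: sum the formulas of the free amino acids, subtract one H2O per amide bond (n - 1 for a linear chain of n residues, n for a head-to-tail ring), and subtract H2 per disulfide bond. For example, Gly (C2H5NO2) + Ala (C3H7NO2) - H2O = C5H10N2O3. The sketch below is an independent check, not how p2smi computes formulas; the table covers only four residues and the function names are hypothetical. Output uses Hill order (C, then H, then other elements alphabetically).

```python
from collections import Counter

# Free amino acid formulas for a few canonical residues (illustrative subset).
AMINO_ACID_FORMULAS = {
    "G": {"C": 2, "H": 5, "N": 1, "O": 2},          # glycine C2H5NO2
    "A": {"C": 3, "H": 7, "N": 1, "O": 2},          # alanine C3H7NO2
    "S": {"C": 3, "H": 7, "N": 1, "O": 3},          # serine C3H7NO3
    "C": {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1},  # cysteine C3H7NO2S
}


def hill_formula(counts: Counter) -> str:
    """Format element counts in Hill order: C, then H, then the rest alphabetically."""
    present = [el for el, n in counts.items() if n > 0]
    if "C" in present:
        order = ["C"] + (["H"] if "H" in present else []) + sorted(
            el for el in present if el not in ("C", "H")
        )
    else:
        order = sorted(present)  # without carbon, every element is alphabetical
    return "".join(el if counts[el] == 1 else f"{el}{counts[el]}" for el in order)


def peptide_formula(seq: str, cyclic: bool = False, disulfide_bonds: int = 0) -> str:
    """Molecular formula of a peptide from residue formulas minus condensation losses."""
    if not seq or (cyclic and len(seq) < 2):
        raise ValueError("need at least one residue (two for a head-to-tail ring)")
    total = Counter()
    for aa in seq:
        total.update(AMINO_ACID_FORMULAS[aa])
    # Each amide bond releases one H2O: n - 1 bonds if linear, n if cyclized head-to-tail.
    amide_bonds = len(seq) - 1 + (1 if cyclic else 0)
    total["H"] -= 2 * amide_bonds
    total["O"] -= amide_bonds
    # Each disulfide (S-S) bond removes the two thiol hydrogens.
    total["H"] -= 2 * disulfide_bonds
    return hill_formula(total)


if __name__ == "__main__":
    print(peptide_formula("GA"), peptide_formula("GA", cyclic=True))
```

Here `peptide_formula("GA")` gives C5H10N2O3 and the head-to-tail cyclized form gives C5H8N2O2, one further H2O lighter.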

Check synthetic feasibility:

synthesis-check output.smi  # only works for natural amino acids

Future Work

  • Expand support for additional post-translational modifications (build importer)
  • Enhance synthesis-check with rules for noncanonical amino acid and modified peptides
  • Expand use of RDKit Mol objects (created with RDKit's Chem.MolFromSmiles() function)
  • Include alternative encodings (HELM, SELFIES, etc.)
  • Enable batch processing/threading for high-throughput analysis
  • Incorporate predictive models for synthesis of unnatural amino acids

For Contributors

There are several ways you can contribute to this project:

  • Reporting Bugs: If you encounter any issues or unexpected behavior, please let us know by opening an issue.
  • Suggesting Enhancements: Have ideas to improve the project? We’d love to hear them! Share your suggestions by opening an issue.
  • Submitting Pull Requests: If you’d like to fix a bug or implement a new feature, you can submit a pull request.
  • Improving Documentation: Clear and comprehensive documentation helps everyone.

License

MIT License

Citation

If you use this tool, please cite:



Download files

Download the file for your platform.

Source Distribution

p2smi-1.1.0.tar.gz (39.0 kB)

Uploaded Source

Built Distribution


p2smi-1.1.0-py3-none-any.whl (36.1 kB)

Uploaded Python 3

File details

Details for the file p2smi-1.1.0.tar.gz.

File metadata

  • Download URL: p2smi-1.1.0.tar.gz
  • Upload date:
  • Size: 39.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for p2smi-1.1.0.tar.gz
Algorithm Hash digest
SHA256 9bc7d44f7fbc6dc0974acd7b25b95b5a9b8448ab723f4101c1ee722b0f100c24
MD5 95025307c9b3bac4f969a87ff534cea1
BLAKE2b-256 9160acdf963578414609e1263b96efa421c9106a8d183bbfbcc408c88cc2134b


File details

Details for the file p2smi-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: p2smi-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 36.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for p2smi-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 116b544b1260874d5ac2452738a58b6cb05a51423a2eaa6fd3cf16d9081ebcba
MD5 f1ca991e5de00cb2b9227b6da43e2794
BLAKE2b-256 c258c9dcb961f000ad7b61fbdff094ca9c62c4c9c6bfd500aad2a0cff656a651

