A package for converting peptide FASTA to SMILES strings and calculating molecular properties.

p2smi: Peptide FASTA-to-SMILES Conversion and Molecular Property Tools

p2smi is a Python package for generating and modifying peptide SMILES strings from FASTA input and computing molecular properties. It supports cyclic and linear peptides, noncanonical amino acids, and common chemical modifications (e.g., N-methylation, PEGylation).

This package was released in its current form to support work on the PeptideCLM model, described in our Publication.

If you use this tool, please cite the PeptideCLM paper. A JOSS publication is forthcoming.

Features

  • Convert peptide FASTA files into valid SMILES strings
  • Automatically handle peptide cyclizations (disulfide, head-to-tail, side-chain to N-term, side-chain to C-term, side-chain to side-chain)
  • Modify peptide SMILES with customizable N-methylation and PEGylation
  • Evaluate synthesis feasibility with defined synthesis rules
  • Compute molecular properties: logP, TPSA, molecular formula, and Lipinski rule evaluation

Installation

pip install p2smi

For development:

git clone https://github.com/AaronFeller/p2smi.git
cd p2smi
pip install -e ".[dev]"

Command-Line Tools

Command            Description
generate-peptides  Generate random peptide sequences based on user-defined constraints and modifications
fasta2smi          Convert a FASTA file of peptide sequences into SMILES format
modify-smiles      Apply modifications (N-methylation, PEGylation) to existing SMILES strings
smiles-props       Compute molecular properties (logP, TPSA, formula, Lipinski rules) from SMILES
synthesis-check    Check synthesis constraints for peptides (currently only functional for natural amino acids)

Run each command with --help to view usage and options:

generate-peptides --help
fasta2smi --help
modify-smiles --help
smiles-props --help
synthesis-check --help
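
To confirm that the console scripts are on your PATH after installation, a small check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name `find_commands` is hypothetical and not part of p2smi.

```python
import shutil

# Console scripts listed in the Command-Line Tools table.
P2SMI_COMMANDS = (
    "generate-peptides",
    "fasta2smi",
    "modify-smiles",
    "smiles-props",
    "synthesis-check",
)


def find_commands(commands=P2SMI_COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


# Report which commands were found in the current environment.
for name, path in find_commands().items():
    status = path if path is not None else "not found (is p2smi installed?)"
    print(f"{name}: {status}")
```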

Example Usage

Generate random peptides:

generate-peptides \
    --max_seq_len 20 \
    --min_seq_len 10 \
    --noncanonical_percent 0.1 \
    --lowercase_percent 0.1 \
    --num_sequences 10 \
    --constraints all \
    --outfile outputfile.smi
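
If you want to script this call from Python, one option is to build the argument list and hand it to `subprocess.run`. The sketch below uses only the flags shown in the example above; the helper `build_generate_command` and its default values are illustrative assumptions, not part of the p2smi API. Check `generate-peptides --help` for the authoritative option list.

```python
import shlex


def build_generate_command(
    outfile,
    num_sequences=10,
    min_seq_len=10,
    max_seq_len=20,
    noncanonical_percent=0.1,
    lowercase_percent=0.1,
    constraints="all",
):
    """Return the generate-peptides argument list (hypothetical helper)."""
    if min_seq_len > max_seq_len:
        raise ValueError("min_seq_len must not exceed max_seq_len")
    return [
        "generate-peptides",
        "--max_seq_len", str(max_seq_len),
        "--min_seq_len", str(min_seq_len),
        "--noncanonical_percent", str(noncanonical_percent),
        "--lowercase_percent", str(lowercase_percent),
        "--num_sequences", str(num_sequences),
        "--constraints", constraints,
        "--outfile", outfile,
    ]


cmd = build_generate_command("outputfile.smi")
# Print the shell-quoted command so it can be inspected before running.
print(shlex.join(cmd))
# To execute it (requires p2smi to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.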

Convert a FASTA file to SMILES:

fasta2smi -i peptides.fasta -o output.smi
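
The input is a plain FASTA file. As a minimal sketch, the helpers below write and read standard FASTA records with the Python standard library. The function names and the example sequences are hypothetical, and this README does not say whether fasta2smi expects extra header annotations (for example, to request a cyclization type), so consult `fasta2smi --help` before relying on a particular header format.

```python
import tempfile
from pathlib import Path


def write_fasta(records, path):
    """Write (identifier, sequence) pairs as a FASTA file, one sequence per line."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        lines.append(sequence)
    Path(path).write_text("\n".join(lines) + "\n")


def read_fasta(path):
    """Read a FASTA file into a list of (identifier, sequence) pairs."""
    records = []
    identifier, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                records.append((identifier, "".join(chunks)))
            identifier, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped; join them per record.
            chunks.append(line)
    if identifier is not None:
        records.append((identifier, "".join(chunks)))
    return records


# Hypothetical example peptides, written to a temporary directory.
peptides = [("pep1", "ACDEFGHIK"), ("pep2", "GLSKWCR")]
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "peptides.fasta"
    write_fasta(peptides, fasta_path)
    print(read_fasta(fasta_path))
```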

Modify existing SMILES strings (N-methylation/PEGylation):

modify-smiles -i input.smi -o modified.smi --peg_rate 0.3 --nmeth_rate 0.2 --nmeth_residues 0.25

Compute properties of a SMILES string:

smiles-props "C1CC(NC(=O)C2CC2)C1"
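
For readers unfamiliar with the Lipinski evaluation mentioned above, the sketch below applies the standard rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) to descriptor values that are already computed. It illustrates the rule itself, not p2smi's implementation: the class and function names are hypothetical, the descriptor values are made up for the example, and the "at most one violation" cutoff is a common convention that p2smi may or may not use. In practice the descriptors would come from a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass


@dataclass
class MolDescriptors:
    """Precomputed descriptor values for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Return the list of rule-of-five criteria that the molecule violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_lipinski(d, max_violations=1):
    """Common convention: a molecule passes with at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Made-up descriptor values for illustration only.
small_molecule = MolDescriptors(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5)
large_peptide = MolDescriptors(mol_weight=1200.0, logp=-1.5, h_donors=14, h_acceptors=18)

for label, d in [("small molecule", small_molecule), ("large peptide", large_peptide)]:
    print(label, passes_lipinski(d), lipinski_violations(d))
```

Many peptides exceed these thresholds, so a failed evaluation is expected for larger sequences and is not by itself a sign of an error.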

Check synthetic feasibility:

synthesis-check output.smi  # only works for natural amino acids
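
Because synthesis-check currently handles only natural amino acids, it can be useful to separate sequences before converting them with fasta2smi. The sketch below keeps sequences made up solely of the 20 canonical one-letter codes. The helper names are hypothetical, and treating lowercase letters as non-canonical is an assumption made here; this README does not specify how p2smi interprets them.

```python
# One-letter codes of the 20 canonical amino acids.
CANONICAL = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_canonical(sequence):
    """True if the sequence is non-empty and uses only canonical uppercase codes."""
    return bool(sequence) and set(sequence) <= CANONICAL


def split_by_canonical(records):
    """Split (identifier, sequence) pairs into canonical and other records."""
    canonical, other = [], []
    for identifier, sequence in records:
        (canonical if is_canonical(sequence) else other).append((identifier, sequence))
    return canonical, other


# Hypothetical example records.
records = [("pep1", "ACDEFGHIK"), ("pep2", "ACxEFG"), ("pep3", "GLSKWBR")]
canonical, other = split_by_canonical(records)
print("canonical:", canonical)
print("other:", other)
```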

Future Work

  • Expand support for additional post-translational modifications (build importer)
  • Enhance synthesis-check with rules for noncanonical amino acids and modified peptides
  • Expand usage of mol files (applying RDKit's Chem.MolFromSmiles() function)
  • Include alternative encodings (HELM, SELFIES, etc.)
  • Enable batch processing/threading for high-throughput analysis
  • Incorporate predictive models for synthesis of unnatural amino acids

For Contributors

There are several ways you can contribute to this project:

  • Reporting Bugs: If you encounter any issues or unexpected behavior, please let us know by opening an issue.
  • Suggesting Enhancements: Have ideas to improve the project? We’d love to hear them! Share your suggestions by opening an issue.
  • Submitting Pull Requests: If you’d like to fix a bug or implement a new feature, you can submit a pull request.
  • Improving Documentation: Clear and comprehensive documentation helps everyone.

License

MIT License

Citation

If you use this tool, please cite the PeptideCLM paper.
