A package for converting peptide FASTA to SMILES strings and calculating molecular properties.
p2smi: Peptide FASTA-to-SMILES Conversion and Molecular Property Tools
p2smi is a Python package for generating and modifying peptide SMILES strings from FASTA input and computing molecular properties. It supports cyclic and linear peptides, noncanonical amino acids, and common chemical modifications (e.g., N-methylation, PEGylation).
This package was released in its current form to support work on the PeptideCLM model, described in our Publication.
If you use this tool, please cite the PeptideCLM paper. A JOSS publication is forthcoming.
Manuscript
Directory
Features
- Convert peptide FASTA files into valid SMILES strings
- Automatically handle peptide cyclizations (disulfide, head-to-tail, side-chain to N-term, side-chain to C-term, side-chain to side-chain)
- Modify peptide SMILES with customizable N-methylation and PEGylation
- Evaluate synthesis feasibility with defined synthesis rules
- Compute molecular properties: logP, TPSA, molecular formula, and Lipinski rule evaluation
Installation
pip install p2smi
For development:
git clone https://github.com/AaronFeller/p2smi.git
cd p2smi
pip install -e .[dev]
Command-Line Tools
| Command | Description |
|---|---|
| generate-peptides | Generate random peptide sequences based on user-defined constraints and modifications |
| fasta2smi | Convert a FASTA file of peptide sequences into SMILES format |
| modify-smiles | Apply modifications (N-methylation, PEGylation) to existing SMILES strings |
| smiles-props | Compute molecular properties (logP, TPSA, formula, Lipinski rules) from SMILES |
| synthesis-check | Check synthesis constraints for peptides (currently only functional for natural amino acids) |
Run each command with --help to view usage and options:
generate-peptides --help
fasta2smi --help
modify-smiles --help
smiles-props --help
synthesis-check --help
Example Usage
Generate a random peptide:
generate-peptides \
--max_seq_len 20 \
--min_seq_len 10 \
--noncanonical_percent 0.1 \
--lowercase_percent 0.1 \
--num_sequences 10 \
--constraints all \
--outfile outputfile.smi
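To make the length and lowercase options above more concrete, the following Python sketch mimics the kind of sampling a random peptide generator might perform. It is not p2smi's implementation: the function name `random_peptides`, the uniform draw from the 20 canonical residues, and the reading of the lowercase fraction as a per-residue probability are all assumptions for illustration. Noncanonical residues and cyclization constraints are left out.

```python
import random

# The 20 canonical amino acids in one-letter code.
CANONICAL = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(num_sequences, min_seq_len, max_seq_len,
                    lowercase_percent=0.0, seed=None):
    """Return a list of random peptide strings (hypothetical helper).

    Each residue is drawn uniformly from CANONICAL and lowercased with
    probability `lowercase_percent`. This only illustrates the idea and
    does not reproduce generate-peptides.
    """
    if not 0 < min_seq_len <= max_seq_len:
        raise ValueError("need 0 < min_seq_len <= max_seq_len")
    rng = random.Random(seed)
    peptides = []
    for _ in range(num_sequences):
        length = rng.randint(min_seq_len, max_seq_len)  # inclusive bounds
        residues = []
        for _ in range(length):
            aa = rng.choice(CANONICAL)
            if rng.random() < lowercase_percent:
                aa = aa.lower()
            residues.append(aa)
        peptides.append("".join(residues))
    return peptides


if __name__ == "__main__":
    for seq in random_peptides(3, 10, 20, lowercase_percent=0.1, seed=0):
        print(seq)
```

Passing a fixed `seed` makes the output reproducible, which is convenient when comparing runs.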
Convert a FASTA file to SMILES:
fasta2smi -i peptides.fasta -o output.smi
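The input here is a FASTA file: each record starts with a `>` header line followed by one or more sequence lines. The sketch below reads that generic layout using only the Python standard library, which can help when inspecting a file before conversion. The function name `read_fasta` is hypothetical, and any p2smi-specific header conventions are not modeled; consult `fasta2smi --help` for those.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">pep1\nACDEFGHIK\n>pep2\nLMNPQ\nRSTVWY\n"
    for name, seq in read_fasta(example):
        print(name, seq)
```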
Modify existing SMILES strings (N-methylation/PEGylation):
modify-smiles -i input.smi -o modified.smi --peg_rate 0.3 --nmeth_rate 0.2 --nmeth_residues 0.25
Compute properties of a SMILES string:
smiles-props "C1CC(NC(=O)C2CC2)C1"
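The Lipinski evaluation listed among the computed properties refers to the rule of five: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. A minimal Python sketch of such a check, given descriptor values that have already been computed, could look like the following. The function name `lipinski_violations` and the input values are hypothetical; how smiles-props derives the descriptors and reports the result is not shown here.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the list of rule-of-five criteria that a molecule violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if logp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


if __name__ == "__main__":
    # Made-up descriptor values, chosen only to trigger two violations.
    failed = lipinski_violations(mol_weight=1202.6, logp=3.0,
                                 h_donors=5, h_acceptors=12)
    print(failed)
```

Returning the violated criteria, rather than a single pass/fail flag, leaves the caller free to decide how many violations are acceptable.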
Check synthetic feasibility:
synthesis-check output.smi # only works for natural amino acids
Future Work
- Expand support for additional post-translational modifications (build importer)
- Enhance synthesis-check with rules for noncanonical amino acids and modified peptides
- Expand usage of mol files (applying RDKit's Chem.MolFromSmiles() function)
- Include alternative encodings (HELM, SELFIES, etc.)
- Enable batch processing/threading for high-throughput analysis
- Incorporate predictive models for synthesis of unnatural amino acids
For Contributors
There are several ways you can contribute to this project:
- Reporting Bugs: If you encounter any issues or unexpected behavior, please let us know by opening an issue.
- Suggesting Enhancements: Have ideas to improve the project? We’d love to hear them! Share your suggestions by opening an issue.
- Submitting Pull Requests: If you’d like to fix a bug or implement a new feature, you can submit a pull request.
- Improving Documentation: Clear and comprehensive documentation helps everyone.
License
Citation
If you use this tool, please cite:
- Peptide-Aware Chemical Language Model Successfully Predicts Membrane Diffusion of Cyclic Peptides (JCIM)
A JOSS paper will follow.