
A Python package that automatically generates derived variables from a column with SMILES (Simplified Molecular-Input Line-Entry System).


Development Status :: 3 - Alpha

SMILES featurizer



SMILES Featurizer is a Python package for quickly exploring baseline and key features in projects that use SMILES strings. It is still in development, and some SMILES strings raise errors because of issues in the package's dependencies. No regular updates are scheduled, and pull requests are welcome at any time. The package is intentionally kept as a set of functions rather than wrapped in classes, because it operates on a single DataFrame and the interface is likely to change.


Install

$ pip install smilesfeaturizer

Alternatively, install from the GitHub repository:

$ pip install git+https://github.com/dsdanielpark/SMILES-featurizer.git

Usage

The functions assume that the input DataFrame has a column named SMILES containing the SMILES strings. See the tutorial notebook.
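
Because the column name matters, it can help to check a dataset before loading it. The following is a minimal sketch using only the standard library. The helper `has_smiles_column` and the sample rows are hypothetical and are not part of smilesfeaturizer; the sketch also assumes the column name must match `SMILES` exactly, including case.

```python
import csv
import io

# Column name the package expects, per the usage note above.
REQUIRED_COLUMN = "SMILES"


def has_smiles_column(csv_text: str) -> bool:
    """Return True if the CSV header has a column named exactly 'SMILES'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return REQUIRED_COLUMN in header


# Illustrative data only; the values are made up for this sketch.
sample = "SMILES,pIC50\nCCO,5.1\nc1ccccc1,4.3\n"
print(has_smiles_column(sample))
print(has_smiles_column("smiles,pIC50\nCCO,5.1\n"))
```

If the check fails, renaming the column to `SMILES` before calling the package functions is the simplest fix.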

Feature generation

  • Create fingerprint columns from SMILES representations using several packages: RDKit, Mol2Vec, DataMol, MolFeat, and scikit-learn.

    from smilesfeaturizer import generate_smiles_feature

    # df is a DataFrame with a column named "SMILES"
    df = generate_smiles_feature(df)  # default: method="simple"

    # Alternatively, pass method="specific"
    df = generate_smiles_feature(df, method="specific")
    

Create dashboard

  • The dashboard lets you see how well the predictions perform for each compound.

    from smilesfeaturizer import create_inline_dash_dashboard
    
    # Load your DataFrame and specify the columns
    true_col = 'pIC50'
    predicted_col = 'predicted_pIC50'
    
    # Create and run the Dash dashboard
    create_inline_dash_dashboard(df, true_col, predicted_col)
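
To make "prediction performance per compound" concrete, here is a minimal standard-library sketch that ranks compounds by absolute prediction error. It is not how the dashboard is implemented; the helper `rank_by_abs_error` is hypothetical, and the records are made-up values that reuse the `pIC50` and `predicted_pIC50` column names from the example above.

```python
# Illustrative records only; the pIC50 values are made up for this sketch.
records = [
    {"SMILES": "CCO", "pIC50": 5.0, "predicted_pIC50": 5.5},
    {"SMILES": "c1ccccc1", "pIC50": 4.0, "predicted_pIC50": 6.0},
    {"SMILES": "CC(=O)O", "pIC50": 6.0, "predicted_pIC50": 5.75},
]


def rank_by_abs_error(rows, true_col, predicted_col):
    """Return copies of rows with an 'abs_error' field, worst prediction first."""
    scored = [
        {**row, "abs_error": abs(row[true_col] - row[predicted_col])}
        for row in rows
    ]
    return sorted(scored, key=lambda row: row["abs_error"], reverse=True)


ranked = rank_by_abs_error(records, "pIC50", "predicted_pIC50")
for row in ranked:
    print(row["SMILES"], row["abs_error"])
```

Compounds at the top of this ranking are the ones worth inspecting first.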
    

Save reporting images

  • Saves report images that show each molecule's structure and basic information, with predicted versus actual values drawn as bar graphs.

    from smilesfeaturizer import smiles_insight_plot
    
    selected_metric = 'RMSE'  # Choose the error metric you want to display
    true_col = 'pIC50'  # Replace with your true column name
    predicted_col = 'predicted_pIC50'  # Replace with your predicted column name
    smiles_insight_plot(df[:1], true_col, predicted_col, selected_metric, 'output_folder', show=True)
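
For reference, the `'RMSE'` metric selected above is the root mean squared error between the true and predicted values. The sketch below computes it with the standard library only. The `rmse` helper is hypothetical, the input values are made up, and the package's own calculation may differ in detail.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only; made up for this sketch.
true_values = [5.0, 4.0, 6.0]
predicted_values = [5.5, 6.0, 5.75]
print(rmse(true_values, predicted_values))
```

Because the errors are squared before averaging, a single badly predicted compound raises RMSE more than several small errors do.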
    

License

Apache 2.0

Bugs and Issues

Bug reports and feature requests are welcome, and feedback on the code is appreciated.

Contacts


Copyright (c) 2023 MinWoo Park, South Korea
