rule-based virtual polymer library generator

SMiPoly

1. What is SMiPoly?

"SMiPoly (Small Molecules into Polymers)" is a rule-based virtual library generator for the discovery of functional polymers. It consists of two submodules, "monc.py" and "polg.py".
"monc.py" classifies the monomers found in a list of small molecules, and "polg.py" generates polymer repeating units from the classified monomer list.

2. Current version and requirements

current version = 0.1.0
requirements

  • python 3.7, 3.8, 3.9, 3.10, 3.11, 3.12
  • rdkit >= 2020.09.1.0 #(2019.09.3 is unavailable)
  • numpy >= 1.20.2
  • pandas >= 1.2.4
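
A minimal sketch for checking an environment against the list above. It uses only the standard library. The helper names `python_supported` and `installed_version` are illustrative and not part of SMiPoly, and the script only reports the installed versions; comparing them with the minimums is left to the reader. Depending on how a package was installed, its distribution name may differ and the lookup may return None.

```python
import sys

# Python minor versions listed in the requirements above.
SUPPORTED_MINORS = range(7, 13)  # 3.7 ... 3.12


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter is one of Python 3.7 to 3.12."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("python ok:", python_supported())
for name in ("rdkit", "numpy", "pandas"):
    print(name, installed_version(name))
```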

3. Installation and usage

3-1. Installation

SMiPoly can be installed with pip or conda.

3-1-1. Install with pip

Create a new virtual environment and activate it. Then install the package as follows.

$pip install smipoly

3-1-2. Install with conda

Add the channel "conda-forge" if it has not been enabled yet.

$conda config --add channels conda-forge

Create a new environment.

$conda create -n "YOUR_NEW_ENVIRONMENT_NAME" python
or
$conda create -n "YOUR_NEW_ENVIRONMENT_NAME" python="required version (ex. 3.10)"

Then activate it.

$conda activate "YOUR_NEW_ENVIRONMENT_NAME"

And install SMiPoly.

$conda install smipoly

Alternatively, after creating and activating a new environment, install directly from the conda-forge channel.

$conda install -c conda-forge smipoly

3-2. Quick start

Download 'sample_data/202207_smip_monset.csv' and 'sample_script/sample_smip_demo.ipynb' from the SMiPoly repository into the same directory on your computer, then run sample_smip_demo.ipynb. Jupyter Notebook is required to run this demo script.
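
For readers who prefer a script to a notebook, the overall flow can be sketched roughly as follows. This is not a copy of sample_smip_demo.ipynb. The import path `smipoly.smip` is inferred from the directory configuration in section 8, the label of the SMILES column depends on the CSV file and has to be checked by the reader, and it is assumed that `moncls` and `biplym` return DataFrames. The function name `run_demo` is illustrative.

```python
def run_demo(csv_path, smi_col):
    """Classify monomers from a CSV file and generate polymer repeating units.

    Imports are placed inside the function so that defining it does not
    require pandas, RDKit or SMiPoly to be installed.
    """
    import pandas as pd
    # Assumption: import path inferred from src/smipoly/smip/ in the repository.
    from smipoly.smip import monc, polg

    # Small molecules as a DataFrame with one SMILES column.
    df = pd.read_csv(csv_path)

    # Extract and classify monomers (documented defaults: minFG=2, maxFG=4).
    classified = monc.moncls(df, smi_col)

    # Generate repeating units for every defined polymer class.
    # Assumption: biplym returns the DataFrame of generated polymers.
    polymers = polg.biplym(classified, targ=['all'], Pmode='a')
    return classified, polymers
```

A call would pass the path to 202207_smip_monset.csv as the first argument and the label of the SMILES column in that file as the second.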

4. Module contents

4-1. monc.py

The functions of monc.py are as follows.

  • extract monomers from a list of small molecules.
  • classify extracted monomers into each monomer class.

The chemical structures of the small molecules should be expressed in the simplified molecular input line entry system (SMILES) and given as a pandas DataFrame.

Functions
smip.monc.moncls(df, smiColn, minFG = 2, maxFG = 4, dsp_rsl=False)
smip.monc.olecls(df, smiColn, minFG = 1, maxFG = 4, dsp_rsl=False)

ARGUMENTS:

  • df: name of the object DataFrame
  • smiColn: the column label of the SMILES column, given as a str.
  • minFG: minimum number of polymerizable functional groups in the monomer for successive polymerization (default 2 for moncls, i.e. 2 or more; default 1 for olecls, i.e. 1 or more)
  • maxFG: maximum number of polymerizable functional groups in the monomer for successive polymerization (default 4, i.e. 4 or less)
  • dsp_rsl: display classified result (default False)
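
The effect of minFG and maxFG can be pictured as an inclusive range test on the number of polymerizable functional groups. The sketch below is plain Python and does not reproduce how SMiPoly counts functional groups internally; the helper `within_fg_range` is illustrative, and the hydroxyl counts of a few familiar alcohols serve only as sample input.

```python
def within_fg_range(n_fg, minFG=2, maxFG=4):
    """Return True if a functional-group count lies in [minFG, maxFG]."""
    return minFG <= n_fg <= maxFG


# Number of hydroxyl groups per molecule, used only as illustrative input.
hydroxyl_counts = {
    "ethanol": 1,
    "ethylene glycol": 2,
    "glycerol": 3,
    "sorbitol": 6,
}

# With the moncls defaults (2 to 4), only the diol and the triol pass.
kept = [name for name, n in hydroxyl_counts.items() if within_fg_range(n)]
print(kept)  # ['ethylene glycol', 'glycerol']
```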

Defined monomer class
By the function "moncls"

  • vinylidene
  • cyclic olefin
  • epoxide and diepoxide
  • lactone
  • lactam
  • hydroxy carboxylic acid
  • amino acid
  • cyclic carboxylic acid anhydride and bis(cyclic carboxylic acid anhydride)
  • hindered phenol
  • dicarboxylic acid and acid halide
  • diol
  • diamine and primary diamine
  • diisocyanate
  • bis(halo aryl)sulfone
  • bis(fluoro aryl)ketone

By the function "olecls"
(The following classes of compounds also belong to the class "vinylidene" and / or "cyclic olefin".)

  • acryl
  • styryl
  • allyl
  • conjugated dienes
  • vinyl ether
  • vinyl ester
  • maleic imide derivatives

4-2. polg.py

"polg.py" gives all synthesizable polymer repeating units starting from the classified monomer list generated by "monc.py".
For chain polymerization (polyolefins and some polyethers), it gives homopolymers and binary copolymers. For successive (or step) polymerization, it gives homopolymers only.

Function
smip.polg.biplym(df, targ = ['all'], Pmode = 'a', dsp_rsl=False)

ARGUMENTS:

  • df: name of the DataFrame of classified monomers generated by monc.moncls.
  • targ: targeted polymer class, given as a list of str. The selectable elements are 'polyolefin', 'polyester', 'polyether', 'polyamide', 'polyimide', 'polyurethane', 'polyoxazolidone' and 'all' (default = ['all'])
  • Pmode: generate all isomers of the polymer repeating unit ('a') or only a representative repeating unit ('r'). (default = 'a')
  • dsp_rsl: display the DataFrame of the generated polymers. (default False)
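
Because targ accepts a fixed set of strings, it can be convenient to check a selection before passing it on. The helper below is a hypothetical sketch and not part of SMiPoly; how biplym itself reacts to an unknown entry is not described here. It validates the entries against the selectable elements listed above and expands 'all' into the individual classes.

```python
POLYMER_CLASSES = ['polyolefin', 'polyester', 'polyether', 'polyamide',
                   'polyimide', 'polyurethane', 'polyoxazolidone']


def expand_targ(targ=('all',)):
    """Validate a targ selection and expand 'all' into the individual classes."""
    if isinstance(targ, str):
        raise TypeError("targ must be a list of str, not a single str")
    unknown = [t for t in targ if t != 'all' and t not in POLYMER_CLASSES]
    if unknown:
        raise ValueError(f"unknown polymer class(es): {unknown}")
    if 'all' in targ:
        return list(POLYMER_CLASSES)
    # Keep the caller's order but drop duplicates.
    return list(dict.fromkeys(targ))


print(expand_targ(['polyester', 'polyamide']))
```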

Defined polymer class

  • polyolefin, polycyclic olefin and their binary copolymers
  • polyester (from lactone, hydroxy carboxylic acid, dicarboxylic acid + diol, diol + CO and cyclic carboxylic acid anhydride + epoxide)
  • polyether (from epoxide, hindered phenol, bis(halo aryl)sulfone + diol and bis(fluoro aryl)ketone + diol)
  • polyamide (from lactam, amino acid and dicarboxylic acid + diamine)
  • polyimide (bis(cyclic carboxylic acid anhydride) + primary diamine)
  • polyurethane (diisocyanate + diol)
  • polyoxazolidone (diepoxide + diisocyanate)
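
The monomer combinations listed above can be written down as a small lookup table, which makes it easy to ask which polymer classes a given monomer class can lead to. This is only a transcription of the list into plain Python, not the data structure used in the './smipoly/rules' files. The polyolefin entry is left out because its monomer classes are not spelled out in the list, and the names `ROUTES` and `polymer_classes_using` are illustrative.

```python
# Monomer combinations per polymer class, transcribed from the list above.
# Each tuple is one route; a 1-tuple is a single-monomer route.
ROUTES = {
    'polyester': [
        ('lactone',),
        ('hydroxy carboxylic acid',),
        ('dicarboxylic acid', 'diol'),
        ('diol', 'CO'),
        ('cyclic carboxylic acid anhydride', 'epoxide'),
    ],
    'polyether': [
        ('epoxide',),
        ('hindered phenol',),
        ('bis(halo aryl)sulfone', 'diol'),
        ('bis(fluoro aryl)ketone', 'diol'),
    ],
    'polyamide': [
        ('lactam',),
        ('amino acid',),
        ('dicarboxylic acid', 'diamine'),
    ],
    'polyimide': [('bis(cyclic carboxylic acid anhydride)', 'primary diamine')],
    'polyurethane': [('diisocyanate', 'diol')],
    'polyoxazolidone': [('diepoxide', 'diisocyanate')],
}


def polymer_classes_using(monomer_class):
    """Return the polymer classes with at least one route using the monomer class."""
    return sorted(
        polymer for polymer, routes in ROUTES.items()
        if any(monomer_class in route for route in routes)
    )


print(polymer_classes_using('diol'))          # ['polyester', 'polyether', 'polyurethane']
print(polymer_classes_using('diisocyanate'))  # ['polyoxazolidone', 'polyurethane']
```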

4-3. Sample data

The sample dataset './sample_data/202207_smip_monset.csv' includes 1,083 common monomers collected from published documents such as scientific articles and catalogues.

4-4. Utilities

Using the files in the './utilities' directory, one can modify or add definitions of monomers, rules of polymerization reactions, and polymer classes.
To apply the new rule(s), replace the old './smipoly/rules' directory with the new one. The files must be run in the order given by the number at the head of each filename.

  • 1_MonomerDefiner.ipynb: definitions of monomers
  • 2_Ps_rxnL.ipynb: rules of polymerization reactions
  • 3_Ps_GenL.ipynb: definitions of polymer classes with combinations of starting monomer(s) and polymerization reaction
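
The required run order follows from the number at the head of each filename. As a small sketch, the order can be derived programmatically by sorting on that integer prefix; the helper name `run_order` is illustrative, and the notebooks themselves still have to be run by the reader.

```python
def run_order(filenames):
    """Sort filenames by the integer prefix before the first underscore."""
    return sorted(filenames, key=lambda name: int(name.split('_', 1)[0]))


notebooks = ['3_Ps_GenL.ipynb', '1_MonomerDefiner.ipynb', '2_Ps_rxnL.ipynb']
for name in run_order(notebooks):
    print(name)
```

Sorting on the integer rather than on the raw string keeps the order correct if more than nine numbered files are ever present.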

5. Copyright and license

Copyright (c) 2022 Mitsuru Ohno
Released under the BSD-3 license, which can be found in the LICENSE file.

6. Publications

SMiPoly: Generation of a Synthesizable Polymer Virtual Library Using Rule-Based Polymerization Reactions
Mitsuru Ohno, Yoshihiro Hayashi, Qi Zhang, Yu Kaneko, and Ryo Yoshida
Journal of Chemical Information and Modeling 2023 63 (17), 5539-5548
DOI: 10.1021/acs.jcim.3c00329
https://doi.org/10.1021/acs.jcim.3c00329
(version 0.0.1 was used)

7. Related projects

RadonPy (Fully automated calculation for a comprehensive set of polymer properties)
https://github.com/RadonPy/RadonPy

8. Directory configuration

SMiPoly
├── src
│   └── smipoly
│       ├── __init__.py
│       ├── _version.py
│       ├── smip
│       │   ├── __init__.py
│       │   ├── funclib.py
│       │   ├── monc.py
│       │   └── polg.py
│       └── rules
│           ├── excl_lst.json
│           ├── mon_dic_inv.json
│           ├── mon_dic.json
│           ├── mon_lst.json
│           ├── mon_vals.json
│           ├── ps_class.json
│           ├── ps_gen.pkl
│           └── ps.rxn.pkl
├── LICENSE
├── pyproject.toml
├── setup.py
├── setup.cfg
├── README.md
├── sample_data
│   └── 202207_smip_monset.csv
├── sample_script
│   └── sample_smip_demo.ipynb
└── utilities
    ├── 1_MonomerDefiner.ipynb
    ├── 2_Ps_rxnL.ipynb
    ├── 3_Ps_GenL.ipynb
    └── rules/

Reference

https://future-chem.com/rdkit-chemical-rxn/
https://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
