Extended Functional Groups
An extended functional group is a generalized version of a traditional functional group that also covers chemical groups formed only by carbon atoms. It is inspired by Peter Ertl’s work:
Ertl, P. An algorithm to identify functional groups in organic molecules. J Cheminform 9, 36 (2017)
Building on that, we also introduced the idea that a molecule should be fully covered by ‘functional groups’.
The philosophy of EFG (extended functional group) is to fragment molecules so that every fragment is chemically valid. To do that, we:
- Identify aromatic structures. Atoms that share the same aromatic ring system are merged.
- Identify special substructures:
  - Mark all heteroatoms in the molecule.
  - Mark ‘special’ carbon atoms (carbon atoms with double or triple bonds, acetal carbons, and carbons in three-membered heterocycles).
  - Merge all connected marked atoms into a single functional group.
- Identify simple carbon chains: sp3 carbons bonded to two or more hydrogens.
- Identify the remaining single atoms.

The number of single-atom groups can be significantly reduced by defining subclasses and merging some of them. Each atom is classified by its aromaticity, degree, and formal charge, and recorded as its element symbol followed by three numbers corresponding to these properties. For example, a hydrogen atom in H2 is H010 and a methyl carbon is C010.
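The single-atom labelling scheme can be illustrated with a short sketch. The helper names `atom_label` and `label_rdkit_atom` are hypothetical and are not part of the EFGs package. The sketch assumes the three numbers are simply concatenated in the order aromaticity, degree, formal charge; how the package formats negative or multi-digit values is not stated here, so that part is an assumption.

```python
def atom_label(symbol, is_aromatic, degree, formal_charge):
    """Build a label such as 'C010' from element symbol, aromaticity, degree and charge.

    Assumption: the three properties are concatenated as plain integers.
    """
    return f"{symbol}{int(bool(is_aromatic))}{degree}{formal_charge}"


def label_rdkit_atom(atom):
    """Label an RDKit atom using its standard accessor methods.

    `atom` is expected to be an rdkit.Chem.Atom; no RDKit import is needed
    here because only the methods on the object are used.
    """
    return atom_label(
        atom.GetSymbol(),
        atom.GetIsAromatic(),
        atom.GetDegree(),
        atom.GetFormalCharge(),
    )


# The two examples from the text: a hydrogen atom in H2 and a methyl carbon.
print(atom_label("H", False, 1, 0))  # H010
print(atom_label("C", False, 1, 0))  # C010
```

With RDKit installed, `label_rdkit_atom` could be applied to each atom returned by `mol.GetAtoms()`; note that the degree reported by RDKit counts only neighbours that are explicit in the molecular graph.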
To alleviate the imbalanced distribution of different EFGs, we propose an iterative way to selectively decompose large functional groups:
- Set a cut-off value α (0 < α < 1).
- Collect sparse functional groups, i.e. those ranked behind the top α in the frequency distribution.
- Further decompose the collected functional groups:
  i. Neighboring small functional groups that were previously merged are no longer merged unless they share atom(s).
  ii. (If i. is not applicable) Cut all single bonds.
- Repeat the previous steps until the number of functional groups no longer changes.
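The iterative procedure above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the package's implementation: `sparse_groups` and `refine` are hypothetical names, "behind the top α" is read as a rank-based cut over the distinct groups (a cumulative-frequency cut is another possible reading), and `decompose` is a caller-supplied stand-in for the real chemical decomposition.

```python
from collections import Counter


def sparse_groups(counts, alpha):
    """Return the groups ranked behind the top `alpha` fraction by frequency."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    ranked = [group for group, _ in counts.most_common()]
    n_keep = int(alpha * len(ranked))  # number of frequent groups left untouched
    return ranked[n_keep:]


def refine(fragment_lists, alpha, decompose, max_rounds=100):
    """Decompose sparse groups until the number of distinct groups stops changing."""
    for _ in range(max_rounds):
        counts = Counter(g for frags in fragment_lists for g in frags)
        sparse = set(sparse_groups(counts, alpha))
        new_lists = [
            [piece for g in frags for piece in (decompose(g) if g in sparse else [g])]
            for frags in fragment_lists
        ]
        new_counts = Counter(g for frags in new_lists for g in frags)
        fragment_lists = new_lists
        if len(new_counts) == len(counts):
            break
    return fragment_lists


# Toy data: each inner list holds the groups found in one molecule, and a
# hyphen marks a place where a large group could be split (purely illustrative).
toy = [["a", "a", "b"], ["a", "b", "c-d"], ["a", "c-d-e"]]
result = refine(toy, alpha=0.5, decompose=lambda g: g.split("-"))
print(result)  # [['a', 'a', 'b'], ['a', 'b', 'c', 'd'], ['a', 'c', 'd', 'e']]
```

In this toy run the rare groups `c-d` and `c-d-e` are split in the first round, and the second round leaves the vocabulary size unchanged, so the loop stops.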
For most molecular datasets, this method can describe more than 99% of the molecules using fewer than 1% of the EFGs.
Requirements
rdkit >= 2019.03
Installation
To install the latest version from source:
$ git clone https://github.com/HelloJocelynLu/EFGs.git
$ cd EFGs/
$ python setup.py install
Install from pip:
$ pip install EFGs
Usage
See Tutorial.ipynb in the Examples/ folder for detailed examples.
mol2frag is the core function that performs the fragmentation.
Licence
MIT Licence.
Reference
Lu, J. N.; Xia, S.; Lu, J. Y.; Zhang, Y. K., Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning. J. Chem. Inf. Model. 2021