Skip to main content

Molecule functional group extraction and comparison

Project description

AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison

Table of Contents

Introduction

AccFG is a tool for precise functional group (FG) extraction and molecular structure comparison.

Installation

To install AccFG, follow these steps:

  1. Clone/download the repository:
    git clone https://github.com/xuanliugit/AccFG.git
    
  2. Navigate to the project directory:
    cd AccFG
    
  3. Install the required dependencies:
    conda create --name accfg python=3.10
    conda activate accfg
    pip install -r requirements.txt
    
  4. Quick start:
    # Get functional groups from SMILES
    python run_accfg.py 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'
    
    # Compare two molecules
    python run_accfg.py 'CNC(=O)Cc1nc(-c2ccccc2)cs1' --compare_smi 'CCNCCc1nc2ccccc2s1'
    

The FG dictionary is stored in ./accfg/fgs_common.csv and ./accfg/fgs_heterocycle.csv

Usage

FG extraction

To extract functional groups:

# example.py
from accfg import AccFG

afg = AccFG(print_load_info=True)
smi = 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

fgs,fg_graph = afg.run(smi, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)
'''
├──Primary amide: ((10, 12, 11),)
...
'''
print(fgs)
'''
{'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
'''

User-defined FGs Example:

# example.py
from accfg import AccFG

my_fgs_dict = {'Cephem': 'O=C(O)C1=CCS[C@@H]2CC(=O)N12', 'Thioguanine': 'Nc1nc(=S)c2[nH]cnc2[nH]1'}
my_afg = AccFG(user_defined_fgs=my_fgs_dict,print_load_info=True)

cephalosporin_C = 'CC(=O)OCC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)CCC[C@H](C(=O)O)N)SC1)C(=O)O'
fgs,fg_graph = my_afg.run(cephalosporin_C, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True) # This will print the FG tree

'''
├──Primary aliphatic amine: ((21,),)
├──...
'''

To print functional groups:

print(fgs) # Show top level FGs
'''
{'Primary aliphatic amine': [(21,)],
 'Carboxylic acid': [(22, 23, 24)],
 'Carboxylic ester': [(1, 2, 3, 4)],
 'Secondary amide': [(15, 16, 14, 13)],
 'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
'''

FG extraction visualization

from accfg import draw_mol_with_fgs, molimg

molimg(draw_mol_with_fgs(cephalosporin_C, afg=my_afg, img_size=(900,900)))

This will show image with FGs highlighted

Molecular structure comparison

from accfg import AccFG, compare_mols, draw_compare_mols

smi_1,smi_2 = ('CNC(=O)Cc1nc(-c2ccccc2)cs1','CCNCCc1nc2ccccc2s1')
diff = compare_mols(smi_1, smi_2)
print(diff) # This print the structure difference
'''
(([('Secondary amide', 1, [(2, 3, 1)]),
   ...
'''

draw_RascalMCES(smi_1, smi_2) # This draw the RascalMCES comparison

Molecular structure comparison visualization

img = img_grid(draw_compare_mols(smi_1, smi_2),num_columns=2)
with open('results/compare_mols.png', 'wb') as f:
    img.save(f, format='PNG')
img

Run

To run the BBBP dataset, Lipophilicity dataset, BACE dataset, and CHEMBL drugs, simply run

python run_data.py

The result is in ./molecule_data. The code to process the data is in exam_data.py

All other examples in the manuscript is in example.ipynb.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

accfg-0.0.1.tar.gz (20.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

accfg-0.0.1-py3-none-any.whl (19.4 kB view details)

Uploaded Python 3

File details

Details for the file accfg-0.0.1.tar.gz.

File metadata

  • Download URL: accfg-0.0.1.tar.gz
  • Upload date:
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for accfg-0.0.1.tar.gz
Algorithm Hash digest
SHA256 6fa4ee7fe7f1217af75f78a75e8309ac1329ed17ad1fef3033dd687a48a6f494
MD5 5802d664a29370e02030b6988880d66d
BLAKE2b-256 298f3130a93afd3dd4f205492240558abc75fd5775968995053800108cbe717c

See more details on using hashes here.

File details

Details for the file accfg-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: accfg-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for accfg-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 725ebb7b0489ad3170e27612cbb0ffd42e1d7a74e9a11640e65c1d964be2293d
MD5 ebe0e3b7e1f96c4bf02b2245380256bd
BLAKE2b-256 eeb95544ee063c6aea6c44333b5aa7c49cef682a84a9c3a394bf8e2f664668f3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page