
Molecule functional group extraction and comparison


AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison

Introduction

AccFG is a tool for precise functional group (FG) extraction and molecular structure comparison.

Installation

We provide two methods to install AccFG:

Installation by pip (recommended)

pip install accfg

Installation from GitHub repository

To install AccFG, follow these steps:

  1. Clone/download the repository:
    git clone https://github.com/xuanliugit/AccFG.git
    
  2. Navigate to the project directory:
    cd AccFG
    
  3. Install the required dependencies:
    conda create --name accfg python=3.10
    conda activate accfg
    pip install -r requirements.txt
    
  4. Quick start:
    # Get functional groups from SMILES
    python run_accfg.py 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'
    
    # Compare two molecules
    python run_accfg.py 'CNC(=O)Cc1nc(-c2ccccc2)cs1' --compare_smi 'CCNCCc1nc2ccccc2s1'
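SMILES strings contain characters such as parentheses and slashes that a shell may interpret, so when scripting many calls it can help to build the argument list in Python rather than as a single string. The following is a minimal sketch of that idea. It relies only on the script name and the `--compare_smi` flag shown above; `build_command` is a hypothetical helper, not part of AccFG.

```python
import shlex
import sys


def build_command(smiles, compare_smi=None, script="run_accfg.py"):
    """Build an argv list for run_accfg.py (hypothetical helper, not part of AccFG)."""
    cmd = [sys.executable, script, smiles]
    if compare_smi is not None:
        cmd += ["--compare_smi", compare_smi]
    return cmd


# The SMILES strings are the ones used in the quick start above.
jobs = [
    ("CN(C)/N=N/C1=C(NC=N1)C(=O)N", None),
    ("CNC(=O)Cc1nc(-c2ccccc2)cs1", "CCNCCc1nc2ccccc2s1"),
]

for smiles, other in jobs:
    cmd = build_command(smiles, other)
    # shlex.join shows the safely quoted shell form of the command.
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting altogether; the printed form is only for inspection.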
    

The FG dictionary is stored in two files: ./accfg/fgs_common.csv and ./accfg/fgs_heterocycle.csv.
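To see what these dictionary files contain, one option is to peek at their header and first few rows. The sketch below uses only the standard library and makes no assumption about the column names, since the column layout is not described here; `peek_csv` is a hypothetical helper, and the paths assume you are in the cloned repository root.

```python
import csv
from pathlib import Path


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Paths as given in this README; files that are not present are skipped.
for name in ("accfg/fgs_common.csv", "accfg/fgs_heterocycle.csv"):
    if Path(name).exists():
        header, rows = peek_csv(name)
        print(name, header)
        for row in rows:
            print("  ", row)
    else:
        print(f"{name} not found (run from the repository root)")
```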

Usage

FG extraction

To extract functional groups:

# example.py
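from accfg import AccFG, print_fg_tree

afg = AccFG(print_load_info=True)
smi = 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

# fgs maps each FG name to its matched atom indices;
# fg_graph is the graph that print_fg_tree renders as a tree
fgs, fg_graph = afg.run(smi, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)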
'''
├──Primary amide: ((10, 12, 11),)
...
'''
print(fgs)
'''
{'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
'''
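The returned `fgs` object prints as a plain dictionary that maps each FG name to a list of atom-index tuples, so it can be post-processed with ordinary Python. As a minimal sketch, assuming that printed shape, the hypothetical helper `summarize_fgs` below counts matches and distinct atoms per FG. The dictionary literal is copied from the output above so the snippet runs without AccFG installed.

```python
from typing import Dict, List, Tuple

FGMatches = Dict[str, List[Tuple[int, ...]]]


def summarize_fgs(fgs: FGMatches) -> Dict[str, Dict[str, int]]:
    """Per FG name: number of matches and number of distinct atoms covered."""
    summary = {}
    for name, matches in fgs.items():
        atoms = {idx for match in matches for idx in match}
        summary[name] = {"matches": len(matches), "atoms": len(atoms)}
    return summary


# Literal copied from the printed output above.
fgs = {
    'Primary amide': [(10, 12, 11)],
    'Triazene': [(1, 3, 4)],
    'imidazole': [(5, 9, 8, 7, 6)],
}

for name, info in summarize_fgs(fgs).items():
    print(f"{name}: {info['matches']} match(es), {info['atoms']} atom(s)")
```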

User-defined FGs example:

# example.py
from accfg import AccFG, print_fg_tree

# Map each custom FG name to the SMILES pattern that defines it
my_fgs_dict = {'Cephem': 'O=C(O)C1=CCS[C@@H]2CC(=O)N12', 'Thioguanine': 'Nc1nc(=S)c2[nH]cnc2[nH]1'}
my_afg = AccFG(user_defined_fgs=my_fgs_dict, print_load_info=True)

cephalosporin_C = 'CC(=O)OCC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)CCC[C@H](C(=O)O)N)SC1)C(=O)O'
fgs, fg_graph = my_afg.run(cephalosporin_C, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)  # Print the FG tree

'''
├──Primary aliphatic amine: ((21,),)
├──...
'''

To print functional groups:

print(fgs)  # Show the top-level FGs
'''
{'Primary aliphatic amine': [(21,)],
 'Carboxylic acid': [(22, 23, 24)],
 'Carboxylic ester': [(1, 2, 3, 4)],
 'Secondary amide': [(15, 16, 14, 13)],
 'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
'''
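Because each FG is reported with its atom indices, it is straightforward to check which atoms belong to more than one group. The following sketch inverts the printed dictionary into an atom-to-FG index; `atoms_to_fgs` and `shared_atoms` are hypothetical helpers, and the dictionary literal is copied from the output above.

```python
from collections import defaultdict


def atoms_to_fgs(fgs):
    """Invert {FG name: [atom tuples]} into {atom index: [FG names]}."""
    index = defaultdict(list)
    for name, matches in fgs.items():
        for match in matches:
            for atom in match:
                if name not in index[atom]:
                    index[atom].append(name)
    return dict(index)


def shared_atoms(fgs):
    """Return only the atoms that appear in more than one FG."""
    return {a: names for a, names in atoms_to_fgs(fgs).items() if len(names) > 1}


# Literal copied from the printed output above.
fgs = {
    'Primary aliphatic amine': [(21,)],
    'Carboxylic acid': [(22, 23, 24)],
    'Carboxylic ester': [(1, 2, 3, 4)],
    'Secondary amide': [(15, 16, 14, 13)],
    'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)],
}

print(shared_atoms(fgs))
```

For this output, only atom 13 is listed under two groups: it is the attachment point between the secondary amide and the Cephem core.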

FG extraction visualization

from accfg import draw_mol_with_fgs, molimg

molimg(draw_mol_with_fgs(cephalosporin_C, afg=my_afg, img_size=(900,900)))

This displays an image of the molecule with the FGs highlighted.

Molecular structure comparison

from accfg import AccFG, compare_mols, draw_compare_mols, draw_RascalMCES

smi_1, smi_2 = ('CNC(=O)Cc1nc(-c2ccccc2)cs1', 'CCNCCc1nc2ccccc2s1')
diff = compare_mols(smi_1, smi_2)
print(diff)  # Print the structural differences between the two molecules
'''
(([('Secondary amide', 1, [(2, 3, 1)]),
   ...
'''

draw_RascalMCES(smi_1, smi_2)  # Draw the RascalMCES comparison

Molecular structure comparison visualization

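import os
from accfg import draw_compare_mols, img_grid

img = img_grid(draw_compare_mols(smi_1, smi_2), num_columns=2)
os.makedirs('results', exist_ok=True)  # Make sure the output directory exists
with open('results/compare_mols.png', 'wb') as f:
    img.save(f, format='PNG')
img  # In a notebook, this displays the image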

Run

To process the BBBP, Lipophilicity, and BACE datasets and the CHEMBL drugs, run:

python run_data.py

The results are written to ./molecule_data. The code that processes the data is in exam_data.py.

All other examples from the manuscript are in example.ipynb.
