AccFG: Molecule functional group extraction and molecular structure comparison

AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison

🚀News

  • Try AccFG in Colab.

  • (9/22/2025) AccFG v0.0.7: Fix bugs during import

  • AccFG v0.0.4: FG names updated to lowercase; 3 new FGs added (e.g., chloroformate)

  • AccFG v0.0.3:

    • Updated AccFG.run_mol() to directly process RDKit Mol objects
    • A lite version of AccFG is available via AccFG(lite=True); it loads a simplified FG list (e.g., only hydroxyl, without the primary/secondary hydroxyl distinction)

📝Introduction

This is the official code repository for the paper AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison. AccFG is a tool for precise functional group (FG) extraction and molecular structure comparison.

📥Installation

We provide two methods to install AccFG:

Method 1: Installation by pip (recommended)

pip install accfg

Method 2: Installation from GitHub repository

To install AccFG, follow these steps:

  1. Clone/download the repository and navigate to the project directory:
    git clone https://github.com/xuanliugit/AccFG.git
    cd AccFG
    
  2. Install the required dependencies:
    conda create --name accfg python=3.10
    conda activate accfg
    pip install -r requirements.txt 
    # Or "pip install -e ."
    

Functional groups in AccFG

The FG dictionary is stored in ./accfg/fgs_common.csv and ./accfg/fgs_heterocycle.csv.

Note: The two fgs*.csv files are custom-formatted to be compatible with the AccFG.csv_to_dict() function in ./accfg/main.py. Compared to standard CSV files, they include additional annotation syntax to support structured parsing. Lines that begin with % are treated as comments to enhance readability and are excluded during data import.
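As a rough illustration of the comment rule only, the sketch below skips lines that begin with % and parses the remaining lines as ordinary CSV. It is not AccFG.csv_to_dict(): the helper name read_commented_csv, the name/smarts column layout, and the sample rows are all made up for this example, and AccFG's additional annotation syntax is not handled.

```python
import csv


def read_commented_csv(text):
    """Parse CSV text, skipping blank lines and lines that start with '%'.

    Hypothetical helper: it mimics only the '%' comment rule and does not
    handle AccFG's extra annotation syntax.
    """
    kept = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("%")
    ]
    # DictReader accepts any iterable of lines; the first kept line is the header
    return [dict(row) for row in csv.DictReader(kept)]


# Made-up sample in an assumed "name,smarts" layout
sample = """% hydroxyl-type groups (comment, ignored on import)
name,smarts
hydroxyl,[OX2H]
% another comment line
carboxylic acid,[CX3](=O)[OX2H1]
"""

rows = read_commented_csv(sample)
print(rows)
```

Note that a plain CSV reader would split any pattern containing a comma (such as an atom list like [N,O]), which is one reason a custom format and parser can be useful.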

⚙️Usage

Quick start:

# Get functional groups from SMILES
python run_accfg.py 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

# Compare two molecules
python run_accfg.py 'CNC(=O)Cc1nc(-c2ccccc2)cs1' --compare_smi 'CCNCCc1nc2ccccc2s1'

FG extraction

To extract functional groups:

# example.py
from accfg import AccFG
from accfg.draw import print_fg_tree

afg = AccFG(print_load_info=True)
smi = 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

fgs,fg_graph = afg.run(smi, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)
'''
├──Primary amide: ((10, 12, 11),)
...
'''
print(fgs)
'''
{'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
'''
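Judging from the printed output, fgs maps each FG name to a list of atom-index tuples, one tuple per match. Assuming that shape, the small hypothetical helper summarize_fgs below (not part of AccFG) counts matches per FG and collects every atom index covered by at least one FG.

```python
def summarize_fgs(fgs):
    """Return (counts, atoms) for a {FG name: [atom-index tuples]} dict.

    Hypothetical helper. counts maps each FG name to its number of matches;
    atoms is the sorted list of atom indices covered by any match.
    """
    counts = {name: len(matches) for name, matches in fgs.items()}
    atoms = sorted({idx for matches in fgs.values() for match in matches for idx in match})
    return counts, atoms


# The dictionary printed above for 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'
fgs = {'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
counts, atoms = summarize_fgs(fgs)
print(counts)  # {'Primary amide': 1, 'Triazene': 1, 'imidazole': 1}
print(atoms)   # [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```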

User-defined FGs example:

# example.py
from accfg import AccFG
from accfg.draw import print_fg_tree

my_fgs_dict = {'Cephem': 'O=C(O)C1=CCS[C@@H]2CC(=O)N12', 'Thioguanine': 'Nc1nc(=S)c2[nH]cnc2[nH]1'}
my_afg = AccFG(user_defined_fgs=my_fgs_dict,print_load_info=True)

cephalosporin_C = 'CC(=O)OCC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)CCC[C@H](C(=O)O)N)SC1)C(=O)O'
fgs,fg_graph = my_afg.run(cephalosporin_C, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True) # This will print the FG tree

'''
├──Primary aliphatic amine: ((21,),)
├──...
'''

To print functional groups:

print(fgs) # Show top-level FGs
'''
{'Primary aliphatic amine': [(21,)],
 'Carboxylic acid': [(22, 23, 24)],
 'Carboxylic ester': [(1, 2, 3, 4)],
 'Secondary amide': [(15, 16, 14, 13)],
 'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
'''
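Matches of different FGs can share atoms. The hypothetical helper fg_overlaps below (not an AccFG function) reports which FG pairs share atom indices, again assuming the name-to-list-of-tuples shape printed above. Applied to the cephalosporin C output, it finds that the user-defined Cephem match and the secondary amide share atom 13.

```python
from itertools import combinations


def fg_overlaps(fgs):
    """Find atoms shared between different FGs in a {name: [tuples]} dict.

    Hypothetical helper. Returns a dict mapping (fg_a, fg_b) to the set of
    atom indices that both FGs cover; pairs with no shared atoms are omitted.
    """
    atom_sets = {
        name: {idx for match in matches for idx in match}
        for name, matches in fgs.items()
    }
    overlaps = {}
    for a, b in combinations(atom_sets, 2):
        shared = atom_sets[a] & atom_sets[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# The dictionary printed above for cephalosporin C
fgs = {'Primary aliphatic amine': [(21,)],
       'Carboxylic acid': [(22, 23, 24)],
       'Carboxylic ester': [(1, 2, 3, 4)],
       'Secondary amide': [(15, 16, 14, 13)],
       'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
print(fg_overlaps(fgs))  # {('Secondary amide', 'Cephem'): {13}}
```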

FG extraction visualization

from accfg.draw import draw_mol_with_fgs, molimg

# cephalosporin_C and my_afg are defined in the user-defined FGs example above
molimg(draw_mol_with_fgs(cephalosporin_C, afg=my_afg, img_size=(900,900)))

This displays the molecule image with its FGs highlighted.

Molecular structure comparison

from accfg import AccFG, compare_mols
from accfg.draw import draw_compare_mols, draw_RascalMCES, img_grid

smi_1,smi_2 = ('CNC(=O)Cc1nc(-c2ccccc2)cs1','CCNCCc1nc2ccccc2s1')
diff = compare_mols(smi_1, smi_2)
print(diff) # Prints the structural difference
'''
(([('Secondary amide', 1, [(2, 3, 1)]),
   ...
'''

draw_RascalMCES(smi_1, smi_2) # Draws the RascalMCES comparison
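For a quick and much coarser check than compare_mols(), two FG dictionaries (the first value returned by AccFG.run()) can be diffed by FG counts alone. The sketch below is only an illustration under that assumption: fg_count_diff is a hypothetical helper, the two input dictionaries are made-up toy data rather than real AccFG output, and the approach ignores atom mapping and connectivity entirely, so it cannot replace compare_mols().

```python
from collections import Counter


def fg_count_diff(fgs_1, fgs_2):
    """Compare two {FG name: [atom-index tuples]} dicts by match counts only.

    Hypothetical helper. Returns (only_1, only_2): Counters of FG matches that
    the first molecule has in excess of the second, and vice versa.
    """
    c1 = Counter({name: len(matches) for name, matches in fgs_1.items()})
    c2 = Counter({name: len(matches) for name, matches in fgs_2.items()})
    # Counter subtraction keeps only positive differences
    return c1 - c2, c2 - c1


# Made-up toy inputs, not real AccFG output
fgs_a = {'hydroxyl': [(1,), (5,)], 'ketone': [(3, 4)]}
fgs_b = {'hydroxyl': [(2,)], 'carboxylic acid': [(6, 7, 8)]}

only_a, only_b = fg_count_diff(fgs_a, fgs_b)
print(dict(only_a))  # {'hydroxyl': 1, 'ketone': 1}
print(dict(only_b))  # {'carboxylic acid': 1}
```

Because Counter subtraction drops zero and negative counts, each result lists only the FG matches that are in excess on one side.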

Molecular structure comparison visualization

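import os

# smi_1, smi_2, draw_compare_mols and img_grid come from the comparison example above
img = img_grid(draw_compare_mols(smi_1, smi_2), num_columns=2)
os.makedirs('results', exist_ok=True)  # create the output folder if it is missing
with open('results/compare_mols.png', 'wb') as f:
    img.save(f, format='PNG')
img  # in a Jupyter notebook, this line displays the image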

⚒️Run

To process the BBBP, Lipophilicity, and BACE datasets and the ChEMBL drugs, run:

python run_data.py

Results are written to ./molecule_data. The data-processing code is in exam_data.py.

All other examples in the manuscript are in example.ipynb.

Cite this work

@article{liu2025accfg,
  title={AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison},
  author={Liu, Xuan and Swaminathan, Sarathkrishna and Zubarev, Dmitry and Ransom, Brandi and Park, Nathaniel and Schmidt, Kristin and Zhao, Huimin},
  journal={Journal of Chemical Information and Modeling},
  volume={65},
  number={16},
  pages={8593--8602},
  year={2025},
  publisher={ACS Publications}
}

Download files

Source Distribution

accfg-0.0.9.tar.gz (23.7 kB)

Built Distribution

accfg-0.0.9-py3-none-any.whl (21.4 kB, Python 3)

File details

Details for the file accfg-0.0.9.tar.gz.

File metadata

  • Download URL: accfg-0.0.9.tar.gz
  • Size: 23.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for accfg-0.0.9.tar.gz
Algorithm     Hash digest
SHA256        f26ef55f993b9e57e2882752ebf938a65d15b5e570955a1b2fa7dda8b4b038d1
MD5           bd7d922f51b465f50e24541789c3751a
BLAKE2b-256   3feafcb0daa4d271d2c42adb403bc343f79bab610dfe3122fcb143949c2c1dfb

File details

Details for the file accfg-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: accfg-0.0.9-py3-none-any.whl
  • Size: 21.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for accfg-0.0.9-py3-none-any.whl
Algorithm     Hash digest
SHA256        f50dc8fae138b62784bc419e3523961c2251cc9e5c6fbcb7bbc4c4014da54b6a
MD5           ff7046862228244de83fdd9d667839eb
BLAKE2b-256   2a3b77bbb341cb54bf6a60e0d933dcbf65d244b34d345e2f49e65dd017378e05
