AccFG: Molecule functional group extraction and comparison

AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison

🚀News

  • (8/24/2025) AccFG v0.0.5: Fix bugs during import

  • AccFG v0.0.4: Change FG names to lowercase; add 3 new FGs (chloroformate, etc.)

  • AccFG v0.0.3:

    • Update AccFG.run_mol() to process RDKit Mol objects directly
    • A lite version of AccFG is available through AccFG(lite=True); it loads a simplified FG list (e.g., a single hydroxyl group instead of separate primary/secondary hydroxyl groups)
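
The two v0.0.3 features can be combined as in the minimal sketch below. The helper name extract_fgs_from_mol is hypothetical, and calling run_mol() with only the Mol object is an assumption; its full signature and return value are not documented on this page.

```python
# Guarded imports so the sketch can be loaded even without accfg/rdkit installed.
try:
    from rdkit import Chem
    from accfg import AccFG
    HAVE_ACCFG = True
except ImportError:
    HAVE_ACCFG = False


def extract_fgs_from_mol(smiles, lite=True):
    """Parse a SMILES string with RDKit and pass the Mol object to AccFG.

    Hypothetical helper: run_mol() is assumed to accept the Mol object as
    its only required argument; its result is returned unchanged.
    """
    if not HAVE_ACCFG:
        raise RuntimeError('accfg and rdkit must be installed to run this sketch')
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'Could not parse SMILES: {smiles!r}')
    afg = AccFG(lite=lite)  # lite=True loads the simplified FG list
    return afg.run_mol(mol)

# Example call (requires accfg and rdkit):
# result = extract_fgs_from_mol('CN(C)/N=N/C1=C(NC=N1)C(=O)N')
```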

📝Introduction

This is the official code repository for the paper AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison. AccFG is a tool for precise functional group (FG) extraction and molecular structure comparison.

📥Installation

We provide two methods to install AccFG:

Method 1: Installation by pip (recommended)

pip install accfg

Method 2: Installation from GitHub repository

To install AccFG, follow these steps:

  1. Clone/download the repository and navigate to the project directory:
    git clone https://github.com/xuanliugit/AccFG.git
    cd AccFG
    
  2. Install the required dependencies:
    conda create --name accfg python=3.10
    conda activate accfg
    pip install -r requirements.txt
    

☎️Call for new functional groups

The FG dictionary is stored in ./accfg/fgs_common.csv and ./accfg/fgs_heterocycle.csv. You are welcome to report new functional groups or errors in the current files by opening an issue on GitHub or emailing the author at xliu254@illinois.edu. Your contributions will be acknowledged on this page.

Note: The two fgs*.csv files are custom-formatted to be compatible with the AccFG.csv_to_dict() function in ./accfg/main.py. Compared to standard CSV files, they include additional annotation syntax to support structured parsing. Lines that begin with % are treated as comments to enhance readability and are excluded during data import.
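
The comment convention can be illustrated with a minimal sketch. This is not the actual AccFG.csv_to_dict() implementation: the helper name read_commented_csv, the two-column layout (name, pattern), and the sample rows are assumptions for illustration, and the additional annotation syntax of the real files is not handled here.

```python
import csv


def read_commented_csv(text):
    """Parse 'name,pattern' rows, skipping blank lines and lines starting with '%'.

    Hypothetical helper; the real fgs*.csv files use extra annotation syntax
    that this sketch does not attempt to parse.
    """
    kept = [line for line in text.splitlines()
            if line.strip() and not line.startswith('%')]
    result = {}
    # csv.reader handles quoted fields, which matters if a pattern contains commas
    for row in csv.reader(kept):
        name, pattern = row[0].strip(), row[1].strip()
        result[name] = pattern
    return result


# Illustrative rows only, not copied from the AccFG files
sample = """% Oxygen-containing groups
hydroxyl,[OX2H]
% Nitrogen-containing groups
nitrile,[NX1]#[CX2]
"""
fg_patterns = read_commented_csv(sample)
print(fg_patterns)
```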

⚙️Usage

Quick start:

# Get functional groups from SMILES
python run_accfg.py 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

# Compare two molecules
python run_accfg.py 'CNC(=O)Cc1nc(-c2ccccc2)cs1' --compare_smi 'CCNCCc1nc2ccccc2s1'
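
SMILES strings contain characters such as parentheses that a shell would otherwise interpret, which is why the commands above wrap them in single quotes. When launching the script from Python instead, passing the arguments as a list avoids shell quoting altogether. The sketch below assumes only the positional SMILES argument and the --compare_smi option shown above; the helper name build_accfg_command is hypothetical.

```python
import shlex
import sys


def build_accfg_command(smiles, compare_smi=None, script='run_accfg.py'):
    """Build the argument list for run_accfg.py (hypothetical helper)."""
    cmd = [sys.executable, script, smiles]
    if compare_smi is not None:
        cmd += ['--compare_smi', compare_smi]
    return cmd


cmd = build_accfg_command('CNC(=O)Cc1nc(-c2ccccc2)cs1',
                          compare_smi='CCNCCc1nc2ccccc2s1')
# shlex.join shows the equivalent, safely quoted shell command
print(shlex.join(cmd))

# To actually run it (requires run_accfg.py in the working directory):
# import subprocess
# subprocess.run(cmd, check=True)
```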

FG extraction

To extract functional groups:

# example.py
from accfg import AccFG, print_fg_tree

afg = AccFG(print_load_info=True)
smi = 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

fgs,fg_graph = afg.run(smi, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)
'''
├──Primary amide: ((10, 12, 11),)
...
'''
print(fgs)
'''
{'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
'''

User-defined FGs Example:

# example.py
from accfg import AccFG, print_fg_tree

my_fgs_dict = {'Cephem': 'O=C(O)C1=CCS[C@@H]2CC(=O)N12', 'Thioguanine': 'Nc1nc(=S)c2[nH]cnc2[nH]1'}
my_afg = AccFG(user_defined_fgs=my_fgs_dict,print_load_info=True)

cephalosporin_C = 'CC(=O)OCC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)CCC[C@H](C(=O)O)N)SC1)C(=O)O'
fgs,fg_graph = my_afg.run(cephalosporin_C, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True) # This will print the FG tree

'''
├──Primary aliphatic amine: ((21,),)
├──...
'''

To print functional groups:

print(fgs) # Show top level FGs
'''
{'Primary aliphatic amine': [(21,)],
 'Carboxylic acid': [(22, 23, 24)],
 'Carboxylic ester': [(1, 2, 3, 4)],
 'Secondary amide': [(15, 16, 14, 13)],
 'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
'''
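
The printed dictionary can be post-processed with plain Python. The sketch below assumes, based on the output above, that fgs maps each FG name to a list of atom-index tuples (one tuple per occurrence). The helper names count_fgs and atoms_to_fgs are hypothetical and not part of AccFG.

```python
# Top-level FGs as printed above for cephalosporin C
fgs = {
    'Primary aliphatic amine': [(21,)],
    'Carboxylic acid': [(22, 23, 24)],
    'Carboxylic ester': [(1, 2, 3, 4)],
    'Secondary amide': [(15, 16, 14, 13)],
    'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)],
}


def count_fgs(fgs):
    """Number of occurrences of each FG (one tuple per occurrence)."""
    return {name: len(matches) for name, matches in fgs.items()}


def atoms_to_fgs(fgs):
    """Invert the mapping: atom index -> names of the FGs containing it."""
    atom_map = {}
    for name, matches in fgs.items():
        for match in matches:
            for idx in match:
                atom_map.setdefault(idx, []).append(name)
    return atom_map


counts = count_fgs(fgs)
atom_map = atoms_to_fgs(fgs)
# Atoms that belong to more than one top-level FG
shared = {idx: names for idx, names in atom_map.items() if len(names) > 1}
print(counts)
print(shared)  # {13: ['Secondary amide', 'Cephem']}
```

Here atom 13 is reported by both the secondary amide and the user-defined cephem pattern, so an inverted map like this is one way to spot atoms shared between FGs.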

FG extraction visualization

from accfg import draw_mol_with_fgs, molimg

molimg(draw_mol_with_fgs(cephalosporin_C, afg=my_afg, img_size=(900,900)))

This displays an image of the molecule with the FGs highlighted.

Molecular structure comparison

from accfg import AccFG, compare_mols, draw_compare_mols, draw_RascalMCES

smi_1,smi_2 = ('CNC(=O)Cc1nc(-c2ccccc2)cs1','CCNCCc1nc2ccccc2s1')
diff = compare_mols(smi_1, smi_2)
print(diff)  # Print the structural differences
'''
(([('Secondary amide', 1, [(2, 3, 1)]),
   ...
'''

draw_RascalMCES(smi_1, smi_2)  # Draw the RascalMCES comparison

Molecular structure comparison visualization

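import os
from accfg import draw_compare_mols, img_grid

img = img_grid(draw_compare_mols(smi_1, smi_2), num_columns=2)
os.makedirs('results', exist_ok=True)  # make sure the output directory exists
with open('results/compare_mols.png', 'wb') as f:
    img.save(f, format='PNG')
img  # in a notebook, this displays the image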

⚒️Run

To process the BBBP, Lipophilicity, and BACE datasets and the CHEMBL drugs, run

python run_data.py

The results are written to ./molecule_data. The data-processing code is in exam_data.py.

All other examples from the manuscript are in example.ipynb.

Cite this work

@article{liu2025accfg,
  title={AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison},
  author={Liu, Xuan and Swaminathan, Sarathkrishna and Zubarev, Dmitry and Ransom, Brandi and Park, Nathaniel and Schmidt, Kristin and Zhao, Huimin},
  journal={Journal of Chemical Information and Modeling},
  year={2025},
  publisher={ACS Publications}
}
