Skip to main content

AccFG: Molecule functional group extraction and comparison

Project description

AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison

PyPI version PyPI Downloads Python Paper Code

🚀News

  • (9/22/2025) AccFG v0.0.7: Fix bugs during import

  • AccFG v0.0.4: Update FG names with lowercase; Update 3 new FGs (chloroformate etc.)

  • AccFG v0.0.3:

    • update AccFG.run_mol() for directing processing rdkit Mol object
    • Lite version of AccFg is available through AccFG(lite=True), this will load a simplified FG list (e.g., no primary/secondary hydroxyl but only hydroxyl)

📝Introduction

This is the official code repository for the paper AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison. AccFG is a tool for precise functional group (FG) extraction and molecular structure comparison.

📥Installation

We provide two methods to install AccFG:

Method 1: Installation by pip (recommended)

pip install accfg

Method 2: Installation from GitHub repository

To install AccFG, follow these steps:

  1. Clone/download the repository and navigate to the project directory:
    git clone https://github.com/xuanliugit/AccFG.git
    cd AccFG
    
  2. Install the required dependencies:
    conda create --name accfg python=3.10
    conda activate accfg
    pip install -r requirements.txt 
    # Or "pip install -e ."
    

☎️Call for new functional groups

The FG dictionary is stored in ./accfg/fgs_common.csv and ./accfg/fgs_heterocycle.csv. You are welcome to report new functional groups or errors in the current files by opening an issue on GitHub or emailing the author at xliu254@illinois.edu. Your contributions will be acknowledged on this page.

Note: The two fgs*.csv files are custom-formatted to be compatible with the AccFG.csv_to_dict() function in ./accfg/main.py. Compared to standard CSV files, they include additional annotation syntax to support structured parsing. Lines that begin with % are treated as comments to enhance readability and are excluded during data import.

⚙️Usage

Quick start:

# Get functional groups from SMILES
python run_accfg.py 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

# Compare two molecules
python run_accfg.py 'CNC(=O)Cc1nc(-c2ccccc2)cs1' --compare_smi 'CCNCCc1nc2ccccc2s1'

FG extraction

To extract functional groups:

# example.py
from accfg import AccFG

afg = AccFG(print_load_info=True)
smi = 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'

fgs,fg_graph = afg.run(smi, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)
'''
├──Primary amide: ((10, 12, 11),)
...
'''
print(fgs)
'''
{'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
'''

User-defined FGs Example:

# example.py
from accfg import AccFG

my_fgs_dict = {'Cephem': 'O=C(O)C1=CCS[C@@H]2CC(=O)N12', 'Thioguanine': 'Nc1nc(=S)c2[nH]cnc2[nH]1'}
my_afg = AccFG(user_defined_fgs=my_fgs_dict,print_load_info=True)

cephalosporin_C = 'CC(=O)OCC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)CCC[C@H](C(=O)O)N)SC1)C(=O)O'
fgs,fg_graph = my_afg.run(cephalosporin_C, show_atoms=True, show_graph=True)

print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True) # This will print the FG tree

'''
├──Primary aliphatic amine: ((21,),)
├──...
'''

To print functional groups:

print(fgs) # Show top level FGs
'''
{'Primary aliphatic amine': [(21,)],
 'Carboxylic acid': [(22, 23, 24)],
 'Carboxylic ester': [(1, 2, 3, 4)],
 'Secondary amide': [(15, 16, 14, 13)],
 'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
'''

FG extraction visualization

from accfg import draw_mol_with_fgs, molimg

molimg(draw_mol_with_fgs(cephalosporin_C, afg=my_afg, img_size=(900,900)))

This will show image with FGs highlighted

Molecular structure comparison

from accfg import AccFG, compare_mols, draw_compare_mols

smi_1,smi_2 = ('CNC(=O)Cc1nc(-c2ccccc2)cs1','CCNCCc1nc2ccccc2s1')
diff = compare_mols(smi_1, smi_2)
print(diff) # This print the structure difference
'''
(([('Secondary amide', 1, [(2, 3, 1)]),
   ...
'''

draw_RascalMCES(smi_1, smi_2) # This draw the RascalMCES comparison

Molecular structure comparison visualization

img = img_grid(draw_compare_mols(smi_1, smi_2),num_columns=2)
with open('results/compare_mols.png', 'wb') as f:
    img.save(f, format='PNG')
img

⚒️Run

To run the BBBP dataset, Lipophilicity dataset, BACE dataset, and CHEMBL drugs, simply run

python run_data.py

The result is in ./molecule_data. The code to process the data is in exam_data.py

All other examples in the manuscript is in example.ipynb.

Cite this work

@article{liu2025accfg,
  title={AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison},
  author={Liu, Xuan and Swaminathan, Sarathkrishna and Zubarev, Dmitry and Ransom, Brandi and Park, Nathaniel and Schmidt, Kristin and Zhao, Huimin},
  journal={Journal of Chemical Information and Modeling},
  year={2025},
  publisher={ACS Publications}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

accfg-0.0.7.tar.gz (22.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

accfg-0.0.7-py3-none-any.whl (20.3 kB view details)

Uploaded Python 3

File details

Details for the file accfg-0.0.7.tar.gz.

File metadata

  • Download URL: accfg-0.0.7.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for accfg-0.0.7.tar.gz
Algorithm Hash digest
SHA256 bc44679160bb06d3489076d906a719bd7d4c62f4c54d5345e7918b4b8fd89ad3
MD5 c88dd2fa1beed92fea91babc243f9e26
BLAKE2b-256 5de169afe03c206b54935004b8f423f975dc1883f18ab6f9518e2f14e182c24b

See more details on using hashes here.

File details

Details for the file accfg-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: accfg-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 20.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.15

File hashes

Hashes for accfg-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 f64b7c2d028959ff7f7a1734bc3382eba2abc404e7a59572f3a2944323e10729
MD5 e8d8341a0b61d95863bda8a3e31f3d2c
BLAKE2b-256 264e0bfa50d1a4cd37135b3d06bbcd230b238d38e45e57e11b2033c8cc302524

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page