AccFG: Molecule functional group extraction and comparison
AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison
🚀News
- (9/22/2025) AccFG v0.0.7: Fix bugs during import
- AccFG v0.0.4: Update FG names to lowercase; add 3 new FGs (chloroformate etc.)
- AccFG v0.0.3:
  - Update: AccFG.run_mol() for directly processing an RDKit Mol object
  - Update: A lite version of AccFG is available through AccFG(lite=True), which loads a simplified FG list (e.g., no primary/secondary hydroxyl, only hydroxyl)
📝Introduction
This is the official code repository for the paper AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison. AccFG is a tool for precise functional group (FG) extraction and molecular structure comparison.
📥Installation
We provide two methods to install AccFG:
Method 1: Installation by pip (recommended)
pip install accfg
Method 2: Installation from GitHub repository
To install AccFG, follow these steps:
- Clone/download the repository and navigate to the project directory:
git clone https://github.com/xuanliugit/AccFG.git
cd AccFG
- Install the required dependencies:
conda create --name accfg python=3.10
conda activate accfg
pip install -r requirements.txt # Or "pip install -e ."
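After either installation method, it can be useful to confirm that the package is visible to the active Python environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of AccFG.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "accfg"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if accfg is installed in this environment, otherwise None.
print(installed_version())
```

If this prints None inside the accfg conda environment, the pip step above most likely ran in a different environment.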
☎️Call for new functional groups
The FG dictionary is stored in ./accfg/fgs_common.csv and ./accfg/fgs_heterocycle.csv. You are welcome to report new functional groups or errors in the current files by opening an issue on GitHub or emailing the author at xliu254@illinois.edu. Your contributions will be acknowledged on this page.
Note: The two fgs*.csv files are custom-formatted to be compatible with the AccFG.csv_to_dict() function in ./accfg/main.py. Compared to standard CSV files, they include additional annotation syntax to support structured parsing. Lines that begin with % are treated as comments to enhance readability and are excluded during data import.
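To illustrate only the comment rule described in the note, here is a minimal sketch that drops lines beginning with % before handing the rest to Python's csv module. It is not AccFG.csv_to_dict() and does not handle the additional annotation syntax of the real files; the function name, the two-column layout, and the sample patterns are assumptions made for illustration.

```python
import csv
import io


def read_rows_skipping_comments(text: str, comment_prefix: str = "%"):
    """Parse CSV text into rows, ignoring blank lines and lines that begin with the comment prefix."""
    kept = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith(comment_prefix)
    ]
    return list(csv.reader(io.StringIO("\n".join(kept))))


# Hypothetical sample content; the real fgs*.csv files may be laid out differently.
sample = """% hypothetical section comment
hydroxyl,[OX2H]
% another comment
ketone,"[#6][CX3](=O)[#6]"
"""

rows = read_rows_skipping_comments(sample)
print(rows)
```

When proposing a new functional group, keeping any explanatory remarks on their own %-prefixed lines should leave them out of the imported data.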
⚙️Usage
Quick start:
# Get functional groups from SMILES
python run_accfg.py 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'
# Compare two molecules
python run_accfg.py 'CNC(=O)Cc1nc(-c2ccccc2)cs1' --compare_smi 'CCNCCc1nc2ccccc2s1'
FG extraction
To extract functional groups:
# example.py
from accfg import AccFG, print_fg_tree
afg = AccFG(print_load_info=True)
smi = 'CN(C)/N=N/C1=C(NC=N1)C(=O)N'
fgs, fg_graph = afg.run(smi, show_atoms=True, show_graph=True)
print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)  # Print the FG tree
'''
├──Primary amide: ((10, 12, 11),)
...
'''
print(fgs)
'''
{'Primary amide': [(10, 12, 11)], 'Triazene': [(1, 3, 4)], 'imidazole': [(5, 9, 8, 7, 6)]}
'''
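The dictionary printed above maps each FG name to a list of atom-index tuples, one tuple per match. As a minimal sketch of post-processing that structure, the snippet below counts the matches per FG and collects the atoms each FG covers. It works on the literal output shown above and does not call AccFG; the helper name summarize_fgs is hypothetical.

```python
# Literal copy of the printed result above, so the sketch runs without AccFG installed.
fgs = {
    'Primary amide': [(10, 12, 11)],
    'Triazene': [(1, 3, 4)],
    'imidazole': [(5, 9, 8, 7, 6)],
}


def summarize_fgs(fgs):
    """Return {fg_name: (number_of_matches, sorted atom indices covered by those matches)}."""
    summary = {}
    for name, matches in fgs.items():
        atoms = sorted({idx for match in matches for idx in match})
        summary[name] = (len(matches), atoms)
    return summary


summary = summarize_fgs(fgs)
for name, (count, atoms) in summary.items():
    print(f"{name}: {count} match(es), atoms {atoms}")
```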
User-defined FGs Example:
# example.py
from accfg import AccFG, print_fg_tree
# Map each custom FG name to the SMILES pattern that defines it
my_fgs_dict = {'Cephem': 'O=C(O)C1=CCS[C@@H]2CC(=O)N12', 'Thioguanine': 'Nc1nc(=S)c2[nH]cnc2[nH]1'}
my_afg = AccFG(user_defined_fgs=my_fgs_dict, print_load_info=True)
cephalosporin_C = 'CC(=O)OCC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)CCC[C@H](C(=O)O)N)SC1)C(=O)O'
fgs, fg_graph = my_afg.run(cephalosporin_C, show_atoms=True, show_graph=True)
print_fg_tree(fg_graph, fgs.keys(), show_atom_idx=True)  # Print the FG tree
'''
├──Primary aliphatic amine: ((21,),)
├──...
'''
To print functional groups:
print(fgs) # Show top level FGs
'''
{'Primary aliphatic amine': [(21,)],
'Carboxylic acid': [(22, 23, 24)],
'Carboxylic ester': [(1, 2, 3, 4)],
'Secondary amide': [(15, 16, 14, 13)],
'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)]}
'''
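A common follow-up question is which FG a given atom belongs to. The sketch below inverts the printed dictionary into an atom-index lookup; it uses the literal output shown above rather than calling AccFG, and the helper name atom_to_fgs is hypothetical.

```python
# Literal copy of the printed result above, so the sketch runs without AccFG installed.
fgs = {
    'Primary aliphatic amine': [(21,)],
    'Carboxylic acid': [(22, 23, 24)],
    'Carboxylic ester': [(1, 2, 3, 4)],
    'Secondary amide': [(15, 16, 14, 13)],
    'Cephem': [(8, 7, 9, 6, 5, 27, 26, 25, 13, 11, 12, 10)],
}


def atom_to_fgs(fgs):
    """Return {atom_index: [names of the FGs whose matches contain that atom]}."""
    lookup = {}
    for name, matches in fgs.items():
        for match in matches:
            for idx in match:
                lookup.setdefault(idx, []).append(name)
    return lookup


lookup = atom_to_fgs(fgs)
print(lookup[21])
# Atoms listed under more than one FG sit where two groups meet.
shared = {idx: names for idx, names in lookup.items() if len(names) > 1}
print(shared)
```

In this output only atom 13 appears under two FGs, the secondary amide and the user-defined Cephem.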
FG extraction visualization
from accfg import draw_mol_with_fgs, molimg
molimg(draw_mol_with_fgs(cephalosporin_C, afg=my_afg, img_size=(900,900)))
This displays an image of the molecule with the FGs highlighted.
Molecular structure comparison
from accfg import AccFG, compare_mols, draw_compare_mols, draw_RascalMCES
smi_1, smi_2 = ('CNC(=O)Cc1nc(-c2ccccc2)cs1', 'CCNCCc1nc2ccccc2s1')
diff = compare_mols(smi_1, smi_2)
print(diff)  # Print the structural difference
'''
(([('Secondary amide', 1, [(2, 3, 1)]),
...
'''
draw_RascalMCES(smi_1, smi_2)  # Draw the RascalMCES comparison
Molecular structure comparison visualization
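import os
from accfg import img_grid
img = img_grid(draw_compare_mols(smi_1, smi_2), num_columns=2)
os.makedirs('results', exist_ok=True)  # Make sure the output directory exists
with open('results/compare_mols.png', 'wb') as f:
    img.save(f, format='PNG')
img  # Display the image when running in a notebook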
⚒️Run
To process the BBBP, Lipophilicity, and BACE datasets and the CHEMBL drugs, run:
python run_data.py
The results are in ./molecule_data. The code that processes the data is in exam_data.py.
All other examples from the manuscript are in example.ipynb.
Cite this work
@article{liu2025accfg,
title={AccFG: Accurate Functional Group Extraction and Molecular Structure Comparison},
author={Liu, Xuan and Swaminathan, Sarathkrishna and Zubarev, Dmitry and Ransom, Brandi and Park, Nathaniel and Schmidt, Kristin and Zhao, Huimin},
journal={Journal of Chemical Information and Modeling},
year={2025},
publisher={ACS Publications}
}