Extended cheminformatics package to work with acidic and basic groups in molecules.

Introduction

ABCount is an extended cheminformatics package to work with acidic and basic groups in molecules. The package includes the following functionalities:

  • ABCounter: SMARTS-based matcher to determine the number of acidic and basic groups in molecules.
  • ABClassBuilder: Converter that accepts a dictionary of pKa numerical values and yields an ABClassData object with their corresponding classes such as STRONG, WEAK, and NONE.
  • IonMatcher: Matcher that accepts an ABClassData object and yields an IonDefinition containing information about the major species at pH 7.4 and its corresponding ionic class and explanation.

How to install the tool

ABCount can be installed from PyPI (https://pypi.org/project/abcount).

pip install abcount

Usage

ABCounter

from rdkit import Chem
from abcount import ABCounter

# Use the tool out of the box with default definitions.
mol = Chem.MolFromSmiles("[nH]1nnnc1-c3c2[nH]ncc2ccc3")
abc = ABCounter()
abc.count_acid_and_bases(mol)
{'acid': 2, 'base': 2}

from rdkit import Chem
from abcount import ABCounter

# Point the tool to your own definitions.
# The format is JSON and the attributes must be consistent with those in
# acid_definitions.json and base_definitions.json in abcount/data.
mol = Chem.MolFromSmiles("[nH]1nnnc1-c3c2[nH]ncc2ccc3")
abc = ABCounter(acid_defs_filepath="/my/path/acid_defs.json", base_defs_filepath="/my/path/base_defs.json")
abc.acid_matcher.definitions_fp
PosixPath('/my/path/acid_defs.json')

ABClassBuilder and ABClassData

from abcount import ABClassBuilder

abcb = ABClassBuilder()
# The builder expects two acidic and two basic groups with these key names.
predictions = {"pka_acid1": 3.5, "pka_acid2": None, "pka_base1": 9.785, "pka_base2": None}
abcb.build(predictions)
ABClassData(acid_1_class=<AcidType.STRONG: 'strong_acid'>, acid_2_class=<AcidType.NONE: 'no_acid'>, base_1_class=<BaseType.STRONG: 'strong_base'>, base_2_class=<BaseType.NONE: 'no_base'>)
# to_dict() can be used to obtain a dictionary containing a mix of objects.
# Alternatively, the output can also be serialised using to_json()
abcb.build(predictions).to_json()
'{"acid_1_class": "strong_acid", "acid_2_class": "no_acid", "base_1_class": "strong_base", "base_2_class": "no_base"}'

from abcount import ABClassBuilder, PKaClassBuilder

abcb = ABClassBuilder()
# Custom names can be passed but these need to be
# configured in a `CustomPKaAttribute` class.
predictions = {"my_pka_acid1": 3.5, "my_pka_acid2": None, "my_pka_base1": 9.785, "my_pka_base2": None}
CustomPKaAttribute = PKaClassBuilder.build(ACID_1="my_pka_acid1", BASE_1="my_pka_base1", ACID_2="my_pka_acid2", BASE_2="my_pka_base2")

# The `CustomPKaAttribute` can then be passed to the builder
# which will map the new data to the rules.
abcb.build(predictions, CustomPKaAttribute)
ABClassData(acid_1_class=<AcidType.STRONG: 'strong_acid'>, acid_2_class=<AcidType.NONE: 'no_acid'>, base_1_class=<BaseType.STRONG: 'strong_base'>, base_2_class=<BaseType.NONE: 'no_base'>)

from abcount import ABClassBuilder

abcb = ABClassBuilder()
# It is possible to work with fewer acidic or basic groups.
# The number of groups is set through arguments to build().
predictions = {"pka_acid1": 3.5, "pka_base1": 9.785}
abcb.build(predictions, num_acids=1, num_bases=1)
# Note that despite passing only one group per type, the result
# still has two fields per type; the unused ones are set to None.
ABClassData(acid_1_class=<AcidType.STRONG: 'strong_acid'>, acid_2_class=None, base_1_class=<BaseType.STRONG: 'strong_base'>, base_2_class=None)
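
The string returned by to_json() is plain JSON, so it can be read back with the standard library alone. The sketch below parses the exact string from the to_json() example above and keeps only the groups that were classified as present. The helper name present_groups and the rule that null values or values starting with "no_" mean "group absent" are assumptions made for illustration (inferred from the 'no_acid' and 'no_base' values in the example output); they are not part of abcount.

```python
import json

# JSON string copied from the to_json() example above.
serialised = (
    '{"acid_1_class": "strong_acid", "acid_2_class": "no_acid", '
    '"base_1_class": "strong_base", "base_2_class": "no_base"}'
)


def present_groups(class_json: str) -> dict[str, str]:
    """Return only the entries that describe an actual acidic or basic group.

    Hypothetical helper: it treats null values and values starting with
    "no_" (as in "no_acid" / "no_base") as "group absent".
    """
    data = json.loads(class_json)
    return {
        key: value
        for key, value in data.items()
        if value is not None and not value.startswith("no_")
    }


groups = present_groups(serialised)
print(groups)
# {'acid_1_class': 'strong_acid', 'base_1_class': 'strong_base'}
```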

IonMatcher

from abcount import ABClassBuilder, IonMatcher

abcb = ABClassBuilder()
predictions = {"pka_acid1": 3.5, "pka_base1": 9.785}
abcd = abcb.build(predictions, num_acids=1, num_bases=1)

ion_matcher = IonMatcher()
ion_matcher.match_class_data(abcd)
IonDefinition(class_data=ABClassData(acid_1_class=<AcidType.STRONG: 'strong_acid'>, acid_2_class=None, base_1_class=<BaseType.STRONG: 'strong_base'>, base_2_class=None), major_species_ph74_class='zwitterion', ion_class='zwitterion', explanation='zwitterion')
# to_json() can also be applied to `IonDefinition`
# to yield a fully serialised representation.
# Alternatively, to_dict() can be used to obtain 
# a dictionary containing a mix of objects.
ion_matcher.match_class_data(abcd).to_dict()
{'class_data': {'acid_1_class': <AcidType.STRONG: 'strong_acid'>, 'acid_2_class': None, 'base_1_class': <BaseType.STRONG: 'strong_base'>, 'base_2_class': None}, 'major_species_ph74_class': 'zwitterion', 'ion_class': 'zwitterion', 'explanation': 'zwitterion'}
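
The zwitterion assignment above agrees with a quick Henderson-Hasselbalch estimate. The sketch below computes the ionised fraction of each group at pH 7.4 from the pKa values used in the example. It is the standard textbook calculation for independent monoprotic groups, not a description of how IonMatcher reaches its decision, and the function names are hypothetical.

```python
PH = 7.4  # pH at which the major species is evaluated


def fraction_acid_deprotonated(pka: float, ph: float = PH) -> float:
    """Fraction of an acid present as its anion (A-) at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_base_protonated(pka: float, ph: float = PH) -> float:
    """Fraction of a base present as its cation (BH+) at the given pH.

    Here pka is the pKa of the conjugate acid BH+.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values taken from the predictions dictionary in the example above.
acid_ionised = fraction_acid_deprotonated(3.5)
base_ionised = fraction_base_protonated(9.785)

print(f"acid ionised: {acid_ionised:.4f}")
print(f"base ionised: {base_ionised:.4f}")
```

Both fractions come out above 0.99, so at pH 7.4 the acid is almost fully deprotonated and the base almost fully protonated, which is consistent with a zwitterion as the major species.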

SMARTS definitions source for ABCounter

The SMARTS patterns used in this project were obtained from the following sources. Note that the definitions are not deduplicated and therefore require curation to avoid redundant matching.

  • Pan, X.; Wang, H.; Li, C.; Zhang, J. Z. H.; Ji, C. MolGpka: A Web Server for Small Molecule pKa Prediction Using a Graph-Convolutional Neural Network. Journal of Chemical Information and Modeling 2021, 61 (7), 3159–3165. DOI: 10.1021/acs.jcim.1c00075
  • Wu, J.; Wan, Y.; Wu, Z.; Zhang, S.; Cao, D.; Hsieh, C.-Y.; Hou, T. MF-SuP-pKa: Multi-fidelity modeling with subgraph pooling mechanism for pKa prediction. Acta Pharmaceutica Sinica B 2023, 13 (6). DOI: 10.26434/chemrxiv-2022-t6q61
  • Some manually curated definitions.
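
Since the definitions are not deduplicated, one possible first curation pass is to drop entries whose SMARTS string repeats exactly. The sketch below assumes each definition is a dictionary with "name" and "smarts" keys. That schema is hypothetical, so check acid_definitions.json and base_definitions.json in abcount/data for the real attribute names. The example entries are illustrative and not taken from the package data.

```python
def drop_exact_duplicates(definitions: list[dict]) -> list[dict]:
    """Keep the first definition for each distinct SMARTS string.

    Assumes a hypothetical schema where every entry has a "smarts" key.
    """
    seen: set[str] = set()
    unique = []
    for definition in definitions:
        smarts = definition["smarts"].strip()
        if smarts in seen:
            continue  # same pattern already kept from an earlier entry
        seen.add(smarts)
        unique.append(definition)
    return unique


# Illustrative entries only, not taken from the package data.
example_definitions = [
    {"name": "carboxylic acid", "smarts": "[CX3](=O)[OX2H1]"},
    {"name": "carboxylic acid (second source)", "smarts": "[CX3](=O)[OX2H1]"},
    {"name": "primary amine", "smarts": "[NX3;H2][CX4]"},
]

curated = drop_exact_duplicates(example_definitions)
print([d["name"] for d in curated])
# ['carboxylic acid', 'primary amine']
```

Exact string comparison will not catch chemically equivalent patterns that are written differently, so manual review is still needed after a pass like this.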

Some useful commands

  • Generate acidic and basic definitions from aggregated data: python abcount/_definitions.py. A follow-up guide on how to curate definitions will be provided.
  • Run tests: pytest -vss tests/test.py
  • Run validation: cd tests && python validation.py. This also generates four CSV files listing the false positives and false negatives for the test data.

For developers

  • The package was created using uv (https://docs.astral.sh/uv/).
  • The package can be installed from the wheel in the dist/ folder. To release a new version, update the version in pyproject.toml and then run uv build, which builds a new wheel.
  • The code can be automatically tested using pytest -vss tests/test.py which requires pytest to be installed.
  • The Makefile can also be used for building (make build) or testing (make test).
  • Before committing new code, please always check that the style and syntax are compliant using pre-commit.

Setting up your development environment

The pyproject.toml already contains the optional dependencies needed for development. Follow these steps to set up the environment.

# Make sure you have got Python >= 3.10
python --version
> Python 3.12.7

# Install `abcount` in editable mode with the dev dependencies
# (the quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"
> ...
> Successfully installed abcount ...

# Set up pre-commit hooks
pre-commit install
> pre-commit installed at .git/hooks/pre-commit
