A package to use several web services to find molecule structures, synonyms and CAS.

MoleculeResolver

MoleculeResolver was born out of the need to quickly annotate large datasets with accurate structural information and to cross-check whether the given metadata (name, SMILES) agree with each other. It also makes it possible to efficiently determine whether structures are present in both of two large datasets.

In short, it is a Python module that allows you to retrieve molecular structures from multiple chemical databases, perform cross-checks to ensure data reliability, and standardize the best representation of molecules. It also provides functions for comparing molecules and sets of molecules based on specific configurations. This makes it a useful tool for researchers, chemists, or anyone working in computational chemistry or cheminformatics who needs to ensure they are working with the best available data for a molecule.

Installation

The package is available on PyPI:

pip install molecule-resolver

The source code is available on GitHub: https://github.com/MoleculeResolver/molecule-resolver

Features

  • 🔍 Retrieve Molecular Structures: Fetch molecular structures from different chemical databases, including PubChem, CompTox, Chemeo, and others.
  • 🆔 Support for Different Identifier Types: Retrieve molecular structures using a variety of identifier types, including CAS numbers, SMILES, InChI, InChIKey, and common names.
  • ✅ Cross-check Capabilities: Use data from multiple sources to verify molecular structures and identify the best representation.
  • 🔄 Molecule Comparison: Compare molecules or sets of molecules based on their structure, properties, and specified configurations.
  • ⚙️ Standardization: Standardize molecular structures, including handling isomers, tautomers, and isotopes.
  • 💾 Caching Mechanism: Use local caching to store molecules and reduce the number of repeated requests to external services, improving performance and reducing latency.
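The cross-check idea in the list above can be pictured as a vote among services. The following is a minimal sketch, not the package's actual algorithm: it assumes every service answer has already been standardized to the same canonical SMILES form, so that equal strings mean equal structures. The function name `pick_consensus`, the `min_agreement` threshold, and the response data are all hypothetical.

```python
from collections import Counter


def pick_consensus(candidates, min_agreement=2):
    """Return (structure, supporting_services) for the most agreed-upon structure.

    `candidates` maps a service name to the canonical SMILES it returned,
    or to None if that service found nothing. Returns None when no structure
    is backed by at least `min_agreement` services.
    """
    votes = Counter(s for s in candidates.values() if s is not None)
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    if count < min_agreement:
        return None
    supporters = sorted(name for name, s in candidates.items() if s == best)
    return best, supporters


# Hand-written, illustrative answers; these are NOT real service responses.
responses = {
    "pubchem": "CC(=O)Oc1ccccc1C(=O)O",
    "cir": "CC(=O)Oc1ccccc1C(=O)O",
    "nist": "CC(=O)Oc1ccccc1C(=O)O",
    "opsin": None,
    "srs": "OC(=O)c1ccccc1O",
}
result = pick_consensus(responses)
print(result)
```

Requiring at least two agreeing sources is one possible design choice: a structure reported by a single service is treated as unconfirmed rather than wrong.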

Services used

At present, the following services are used to find the best structure for a given identifier. This list may be revised in the future to improve performance by adding new services or removing existing ones. If you would like an additional service to be supported, open an issue or a pull request.

MoleculeResolver does not expose every option and configuration that the dedicated client libraries for each service (listed under Repos) offer. It focuses on resolving a structure from the given identifiers as accurately as possible while remaining fast, using parallelization under the hood.
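To illustrate what querying several services in parallel can look like, here is a small sketch using only the standard library. It is an assumption about the general technique, not a description of MoleculeResolver's internals; `query_all` and the `fake_*` lookups are hypothetical stand-ins that make no network requests.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(services, identifier):
    """Call every service function with `identifier` concurrently.

    `services` maps a service name to a callable. A service that raises
    is treated as "not found" (None) so one outage does not abort the search.
    """
    def safe_call(func):
        try:
            return func(identifier)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(safe_call, func) for name, func in services.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for real web services.
def fake_pubchem(identifier):
    return {"ethanol": "CCO"}.get(identifier)


def fake_cir(identifier):
    return {"ethanol": "CCO"}.get(identifier)


def fake_broken(identifier):
    raise TimeoutError("service unavailable")


results = query_all(
    {"pubchem": fake_pubchem, "cir": fake_cir, "broken": fake_broken},
    "ethanol",
)
print(results)
```

Threads suit this kind of workload because the time is spent waiting on network responses rather than on computation.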

Service | Name | CAS | Formula | SMILES | InChI | InChIKey | CID | Batch search | Repos
cas_registry | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
chebi | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
chemeo | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
cir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | CIRpy
comptox | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
cts | (✅) | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
nist | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | NistChemPy
opsin | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | py2opsin, pyopsin
pubchem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PubChemPy
srs | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |

ChemSpider was not used as it is already included in CIR [1] [2] [3]. ChemIDplus and the Drug Information Portal were retired in 2022 [4].
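When planning a search, it can help to know which services accept a given identifier type. The sketch below transcribes the table above into a plain dictionary and looks up the matching services. The mode labels other than "name" and "cas" are assumed spellings, `services_for` is a hypothetical helper rather than part of the package, and the parenthesized name support of cts is counted as supported here.

```python
# Identifier types each service accepts, transcribed from the table above.
SUPPORTED_MODES = {
    "cas_registry": {"name", "cas", "smiles", "inchi"},
    "chebi": {"name", "cas", "formula", "smiles", "inchi", "inchikey"},
    "chemeo": {"name", "cas", "smiles", "inchi", "inchikey"},
    "cir": {"name", "cas", "formula", "smiles", "inchi", "inchikey"},
    "comptox": {"name", "cas", "inchikey"},
    "cts": {"name", "cas", "smiles"},  # name support is listed as partial
    "nist": {"name", "cas", "formula", "smiles"},
    "opsin": {"name"},
    "pubchem": {"name", "cas", "formula", "smiles", "inchi", "inchikey", "cid"},
    "srs": {"name", "cas"},
}

# Services marked as supporting batch search in the table.
BATCH_CAPABLE = {"comptox", "opsin", "pubchem", "srs"}


def services_for(mode, batch_only=False):
    """Return the sorted names of services that accept the identifier type `mode`."""
    names = [name for name, modes in SUPPORTED_MODES.items() if mode in modes]
    if batch_only:
        names = [name for name in names if name in BATCH_CAPABLE]
    return sorted(names)


print(services_for("inchikey"))
print(services_for("cas", batch_only=True))
```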

🚀 Usage

Initialization

To use MoleculeResolver, first import and initialize the MoleculeResolver class. It is meant to be used as a context manager:

from moleculeresolver import MoleculeResolver

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    ...

Retrieve and Compare Molecules by Name and CAS

Retrieve a molecule using both its common name and CAS number, then compare the two to ensure they represent the same structure:

from rdkit import Chem
from moleculeresolver import MoleculeResolver

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    molecule_name = mr.find_single_molecule(["aspirin"], ["name"])
    molecule_cas = mr.find_single_molecule(["50-78-2"], ["cas"])
    
    are_same = mr.are_equal(Chem.MolFromSmiles(molecule_name.SMILES), 
                            Chem.MolFromSmiles(molecule_cas.SMILES))
    print(f"Are the molecules the same? {are_same}")
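Before sending a CAS number such as the one above to any web service, a local check-digit test can catch typing errors early. This sketch relies on the published CAS check-digit rule (the last digit equals the weighted sum of the preceding digits, counted from the right, modulo 10). `is_valid_cas` is a hypothetical helper, not a MoleculeResolver function.

```python
import re

# Two to seven digits, two digits, one check digit, separated by hyphens.
_CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` has the CAS number format and a correct check digit."""
    match = _CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # The rightmost digit gets weight 1, the next one weight 2, and so on.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


print(is_valid_cas("50-78-2"))  # aspirin, as used in the example above
print(is_valid_cas("50-78-3"))  # same number with a wrong check digit
```

A passing check only shows that the number is well formed; it does not prove that the number is assigned to any substance.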

Parallelized Molecule Retrieval and Saving to JSON

Use the parallelized version to retrieve multiple molecules. If a large number of molecules is searched, moleculeresolver will try to use batch download capabilities whenever the database supports this.

import json
from moleculeresolver import MoleculeResolver

molecule_names = ["aspirin", "propanol", "ibuprofen", "non-existent-name"]
not_found_molecules = []
molecules_dicts = {}

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    molecules = mr.find_multiple_molecules_parallelized(molecule_names, [["name"]] * len(molecule_names))
    for name, molecule in zip(molecule_names, molecules):
        if molecule:
            molecules_dicts[name] = molecule.to_dict(found_molecules='remove')
        else:
            not_found_molecules.append(name)

with open("molecules.json", "w") as json_file:
    json.dump(molecules_dicts, json_file, indent=4)

print(f"Molecules not found: {not_found_molecules}")
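For very large inputs it can be worth deduplicating the identifiers before searching, so each unique name is requested only once. The sketch below also shows the general idea of splitting a list into batches. Both helpers are hypothetical, the batch size of 2 is arbitrary, and nothing here describes how moleculeresolver groups its own batch requests.

```python
def unique_in_order(items):
    """Drop repeated identifiers while keeping the first-seen order."""
    return list(dict.fromkeys(items))


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


names = ["aspirin", "propanol", "aspirin", "ibuprofen", "propanol"]
unique_names = unique_in_order(names)
batches = chunked(unique_names, 2)
print(unique_names)
print(batches)
```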

⚙️ Configuration

The MoleculeResolver class allows users to configure various options like:

  • API Keys: Set API keys for accessing different molecular databases. Currently only chemeo needs one.
  • Standardization Options: Choose how to handle molecular standardization (e.g., normalizing functional groups, disconnecting metals, handling isomers, etc.).
  • Differentiation Settings: Options for distinguishing between isomers, tautomers, and isotopes.

⚠️ Warning

InChI is included in the set of valid identifiers for various services. Be aware that converting InChI to SMILES with RDKit is not the most robust approach. You can read more about it here.

🤝 Contributing

Contributions are welcome! If you have suggestions for improving the Molecule Resolver or want to add new features, feel free to submit an issue or a pull request on GitHub.

📚 Citing

If you use MoleculeResolver in your research, please cite as follows:

Müller, S.
How to crack a SMILES: automatic crosschecked chemical structure resolution across multiple services using MoleculeResolver
Journal of Cheminformatics, 17:117 (2025).
DOI: 10.1186/s13321-025-01064-7

@article{Muller2025MoleculeResolver,
  author       = {Müller, Simon},
  title        = {How to crack a SMILES: automatic crosschecked chemical structure resolution across multiple services using MoleculeResolver},
  journal      = {Journal of Cheminformatics},
  year         = {2025},
  volume       = {17},
  pages        = {117},
  doi          = {10.1186/s13321-025-01064-7},
  url          = {https://doi.org/10.1186/s13321-025-01064-7}
}
