A package to use several web services to find molecule structures, synonyms and CAS.

Project description

moleculeresolver

moleculeresolver was born out of the need to quickly annotate large datasets with accurate structural information and to cross-check whether the given metadata (name, SMILES) agree with each other. It also makes it possible to efficiently check whether structures are present in both of two large datasets.

In short, it's a Python module that allows you to retrieve molecular structures from multiple chemical databases, perform cross-checks to ensure data reliability, and standardize the best representation of molecules. It also provides functions for comparing molecules and sets of molecules based on specific configurations. This makes it a useful tool for researchers, chemists, or anyone working in computational chemistry / cheminformatics who needs to ensure they are working with the best available data for a molecule.

Installation

The package is available on pypi:

pip install molecule-resolver

Features

  • 🔍 Retrieve Molecular Structures: Fetch molecular structures from different chemical databases, including PubChem, CompTox, Chemeo, and others.
  • 🆔 Support for Different Identifier Types: Retrieve molecular structures using a variety of identifier types, including CAS numbers, SMILES, InChI, InChIKey and common names.
  • ✅ Cross-check Capabilities: Use data from multiple sources to verify molecular structures and identify the best representation.
  • 🔄 Molecule Comparison: Compare molecules or sets of molecules based on their structure, properties, and specified ⚙️ configurations.
  • ⚙️ Standardization: Standardize molecular structures, including handling isomers, tautomers, and isotopes.
  • 💾 Caching Mechanism: Use local caching to store molecules and reduce the number of repeated requests to external services, improving performance and reducing latency.
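Since CAS numbers are one of the supported identifier types, it can help to screen them locally before sending any requests. The following is a minimal sketch, not part of moleculeresolver: the helper name `is_valid_cas` is hypothetical, and it only checks the standard CAS format and check digit, not whether the number is actually registered.

```python
import re

# CAS Registry Numbers look like "50-78-2":
# 2 to 7 digits, then 2 digits, then a single check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Hypothetical helper: True if `cas` is well-formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... from right to left and sum them;
    # the check digit is that sum modulo 10.
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


print(is_valid_cas("50-78-2"))  # True  (aspirin)
print(is_valid_cas("50-78-3"))  # False (wrong check digit)
```

Filtering out malformed CAS numbers this way avoids spending lookups on identifiers that cannot match anything.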

🚀 Usage

Initialization

To use Molecule Resolver, first import and initialize the MoleculeResolver class. It is intended to be used as a context manager:

from moleculeresolver import MoleculeResolver

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    ...

Retrieve and Compare Molecules by Name and CAS

Retrieve a molecule using both its common name and CAS number, then compare the two to ensure they represent the same structure:

from rdkit import Chem
from moleculeresolver import MoleculeResolver

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    molecule_name = mr.find_single_molecule(["aspirin"], ["name"])
    molecule_cas = mr.find_single_molecule(["50-78-2"], ["cas"])
    
    are_same = mr.are_equal(Chem.MolFromSmiles(molecule_name.SMILES), 
                            Chem.MolFromSmiles(molecule_cas.SMILES))
    print(f"Are the molecules the same? {are_same}")

Parallelized Molecule Retrieval and Saving to JSON

Use the parallelized version to retrieve multiple molecules. If a large number of molecules is searched, moleculeresolver will try to use batch download capabilities whenever the database supports this.

import json
from moleculeresolver import MoleculeResolver

molecule_names = ["aspirin", "propanol", "ibuprofen", "non-existent-name"]
not_found_molecules = []
molecules_dicts = {}

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    molecules = mr.find_multiple_molecules_parallelized(molecule_names, [["name"]] * len(molecule_names))
    for name, molecule in zip(molecule_names, molecules):
        if molecule:
            molecules_dicts[name] = molecule.to_dict(found_molecules='remove')
        else:
            not_found_molecules.append(name)

with open("molecules.json", "w") as json_file:
    json.dump(molecules_dicts, json_file, indent=4)

print(f"Molecules not found: {not_found_molecules}")
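When the input list mixes names and CAS numbers, the per-identifier mode list can be built programmatically instead of repeating `[["name"]]`. The sketch below is an illustration under assumptions: `guess_mode` and `build_modes` are hypothetical helpers, not part of moleculeresolver, and only the `"name"` and `"cas"` mode strings used in the examples above are produced. It does not call the library.

```python
import re

# Format-only check: 2 to 7 digits, 2 digits, 1 digit (e.g. "50-78-2").
CAS_PATTERN = re.compile(r"^\d{2,7}-\d{2}-\d$")


def guess_mode(identifier: str) -> str:
    """Hypothetical helper: 'cas' if the string looks like a CAS number, else 'name'."""
    return "cas" if CAS_PATTERN.match(identifier.strip()) else "name"


def build_modes(identifiers: list[str]) -> list[list[str]]:
    """Build one independent mode list per identifier."""
    return [[guess_mode(identifier)] for identifier in identifiers]


identifiers = ["aspirin", "50-78-2", "propanol"]
modes = build_modes(identifiers)
print(modes)  # [['name'], ['cas'], ['name']]
```

The resulting `identifiers` and `modes` lists have the same length and could be passed in place of `molecule_names` and `[["name"]] * len(molecule_names)`, assuming the same calling convention as in the example above. The list comprehension also gives each entry its own inner list, whereas `[["name"]] * n` repeats a single shared list object.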

⚙️ Configuration

The MoleculeResolver class allows users to configure various options like:

  • API Keys: Set API keys for accessing different molecular databases. Currently only chemeo needs one.
  • Standardization Options: Choose how to handle molecular standardization (e.g., normalizing functional groups, disconnecting metals, handling isomers, etc.).
  • Differentiation Settings: Options for distinguishing between isomers, tautomers, and isotopes.

🤝 Contributing

Contributions are welcome! If you have suggestions for improving the Molecule Resolver or want to add new features, feel free to submit an issue or a pull request on GitHub.

Download files

Source Distribution

molecule_resolver-0.2.4.tar.gz (76.9 kB view details)

Uploaded Source

Built Distribution

molecule_resolver-0.2.4-py3-none-any.whl (78.1 kB view details)

Uploaded Python 3

File details

Details for the file molecule_resolver-0.2.4.tar.gz.

File metadata

  • Download URL: molecule_resolver-0.2.4.tar.gz
  • Size: 76.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for molecule_resolver-0.2.4.tar.gz
Algorithm Hash digest
SHA256 61e7182091131f149f686031e97024c1a9b29b49ef0bb3c249685cbc41bda0d0
MD5 f4bcfbc0c5057f6e873a9823f56099db
BLAKE2b-256 ed737c33773f08574b82e79a63c0b7d7570c983510aecf591a0c154a3172095a

Provenance

The following attestation bundles were made for molecule_resolver-0.2.4.tar.gz:

Publisher: ci.yml on MoleculeResolver/molecule-resolver

File details

Details for the file molecule_resolver-0.2.4-py3-none-any.whl.

File hashes

Hashes for molecule_resolver-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 05b7379554b11452fc227ecc68b722cca81c08d00f7544ba5a7d0af0ddc53c45
MD5 f5526a91b27e7fa58e5767f663ae6762
BLAKE2b-256 f582201f5bdabf68b5eaeb057512855ace92c3f06f2b931b7f3221d87217d4d4

Provenance

The following attestation bundles were made for molecule_resolver-0.2.4-py3-none-any.whl:

Publisher: ci.yml on MoleculeResolver/molecule-resolver
