A Python wrapper for the PubChem PUG REST API. Provides convenient methods to retrieve chemical information using CIDs, SIDs, CAS numbers, SMILES, and InChIKeys, and to convert between chemical identifiers. Built-in rate limiting and retry logic; RDKit integration for SDF/mol retrieval.

pubchem-compounds

License: CC BY-NC 4.0 | Python 3.9+

A Python wrapper for the PubChem PUG REST API, focused on Compounds and Substances. It provides:

  • Identifier conversion: CAS → CID / SID, InChIKey → CID, SMILES → CID, CID → CAS / EINECS / DTXSID
  • Batch property fetching from lists of CIDs
  • SDF / RDKit mol retrieval for CIDs, SIDs, or CAS numbers
  • SMILES and InChI extraction for any PubChem synonym
  • PFAS classification tree node queries
  • Built-in rate limiting (max 5 req/s, 400 req/min) and automatic retry on HTTP 403
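The package's own limiter is not reproduced here. As a rough illustration of how limits such as 5 requests per second and 400 per minute can be enforced, the following is a minimal sliding-window sketch using only the standard library. The class `SlidingWindowLimiter` and the function `throttle` are hypothetical names, not part of pubchem-compounds.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds.

    Hypothetical helper for illustration; not the package's implementation.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()   # times of recent calls, oldest first

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)


# Two limiters combined, mirroring the limits quoted in the feature list.
per_second = SlidingWindowLimiter(max_calls=5, period=1.0)
per_minute = SlidingWindowLimiter(max_calls=400, period=60.0)


def throttle():
    """Call before each HTTP request to respect both limits."""
    per_second.wait()
    per_minute.wait()
```

Injecting `clock` and `sleep` keeps the limiter testable without real delays.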

Installation

From PyPI

pip install pubchem-compounds

Note: RDKit is required only for functions that return molecules (get_mols_from_cids, cas_to_mols, synonyms_to_smiles, etc.). Install it separately:

pip install rdkit          # rdkit ≥ 2023.03
# or via conda:
conda install -c conda-forge rdkit

From source

git clone https://gitlab.com/lucmiaz/pubchem.git
cd pubchem
pip install -e .

Install with optional dependencies:

pip install -e ".[dev]"   # pytest, pylint
pip install -e ".[docs]"  # sphinx, sphinx-rtd-theme

Quick start

import pubchem_compounds as pc

CAS → CID

mapping, failed = pc.cas_to_cid("7732-18-5")  # water
print(mapping)   # {'7732-18-5': [962]}
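Before sending a CAS number to PubChem, it can help to reject obvious typos locally. CAS Registry Numbers end in a check digit: the other digits are weighted 1, 2, 3, ... from right to left, summed, and taken modulo 10. The sketch below applies that rule; `is_valid_cas` is a hypothetical helper, and whether the package performs a similar check is not stated here.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS format and a correct check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)   # all digits except the check digit
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas("7732-18-5"))  # True
print(is_valid_cas("7732-18-4"))  # False (wrong check digit)
```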

Batch CAS lookup

cas_list = ["7732-18-5", "74-82-8", "71-43-2"]
mapping, failed = pc.cas_to_cid(cas_list)
# {'7732-18-5': [962], '74-82-8': [297], '71-43-2': [241]}
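Because each CAS number maps to a list of CIDs, the mapping often needs flattening before it is passed on to a CID-based call. The following sketch works only on the dictionary shape shown above; `unique_cids` and `cas_by_cid` are hypothetical helpers, not package functions.

```python
def unique_cids(mapping):
    """Flatten a CAS -> [CID, ...] mapping into CIDs, keeping first-seen order."""
    seen = {}
    for cids in mapping.values():
        for cid in cids:
            seen.setdefault(cid, None)
    return list(seen)


def cas_by_cid(mapping):
    """Invert the mapping so each CID points back to the CAS numbers that hit it."""
    inverted = {}
    for cas, cids in mapping.items():
        for cid in cids:
            inverted.setdefault(cid, []).append(cas)
    return inverted


# Mapping copied from the batch example above.
mapping = {"7732-18-5": [962], "74-82-8": [297], "71-43-2": [241]}
print(unique_cids(mapping))      # [962, 297, 241]
print(cas_by_cid(mapping)[297])  # ['74-82-8']
```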

CAS → SMILES

processed, failed = pc.cas_to_smiles(["7732-18-5", "74-82-8"])
print(processed["7732-18-5"])  # 'O'
print(processed["74-82-8"])    # 'C'

CAS → RDKit molecules

mols = pc.cas_to_mols(["7732-18-5", "74-82-8"])
for cas, mol_list in mols.items():
    for mol in mol_list:
        print(cas, mol.GetNumAtoms())

Fetch compound properties for a list of CIDs

data = pc.get_from_cids([962, 297, 241], target="property/MolecularFormula,MolecularWeight")
for prop in data["PropertyTable"]["Properties"]:
    print(prop["CID"], prop["MolecularFormula"], prop["MolecularWeight"])
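The `target` string resembles the operation segment of a PUG REST URL. As a sketch of what such a request looks like at the HTTP level, the code below builds the standard PUG REST property URL and indexes a response of the shape shown above by CID. How the wrapper builds its requests internally is an assumption; `property_url` and `properties_by_cid` are hypothetical helpers, and the sample payload is hand-written for illustration.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cids, properties):
    """Build a PUG REST URL requesting `properties` for `cids` as JSON."""
    cid_part = ",".join(str(cid) for cid in cids)
    prop_part = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{prop_part}/JSON"


def properties_by_cid(data):
    """Index the rows of a PropertyTable response by their CID."""
    return {row["CID"]: row for row in data["PropertyTable"]["Properties"]}


# Hand-written sample in the PropertyTable shape (values are illustrative).
sample = {
    "PropertyTable": {
        "Properties": [
            {"CID": 962, "MolecularFormula": "H2O", "MolecularWeight": "18.015"},
            {"CID": 297, "MolecularFormula": "CH4", "MolecularWeight": "16.04"},
        ]
    }
}

print(property_url([962, 297], ["MolecularFormula", "MolecularWeight"]))
print(properties_by_cid(sample)[962]["MolecularFormula"])  # H2O
```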

CID → CAS (reverse lookup)

cas_list = pc.cid_to_cas(962)
print(cas_list)  # ['7732-18-5']

InChIKey → CID

cids = pc.inchikey_to_pubchem("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
print(cids)  # [962]
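An InChIKey has a fixed layout: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final letter. A quick format check before the lookup can catch truncated or lower-cased keys. This is a minimal sketch with hypothetical helper names; it checks the shape only, not whether the key exists in PubChem.

```python
import re

INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def looks_like_inchikey(text: str) -> bool:
    """Return True if `text` has the 14-10-1 InChIKey layout."""
    return bool(INCHIKEY_PATTERN.match(text.strip()))


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-letter block, which encodes the connectivity."""
    return inchikey.strip().split("-")[0]


key = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
print(looks_like_inchikey(key))   # True
print(connectivity_block(key))    # XLYOFNOQVPJJNP
```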

DTXSID → SMILES

Via PubChem

processed, failed = pc.dtxsid_to_smiles(["DTXSID9020584"])
print(processed["DTXSID9020584"])

Via CompTox

from pubchem_compounds import mols_from_comptox
mols = mols_from_comptox(["DTXSID4059916"])

PFAS classification tree

# Fetch all CIDs from the OECD PFAS list node (default hnid = 5517102)
cids = pc.pubchem_pfas_tree()
print(f"{len(cids)} PFAS CIDs found")
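A classification node can return a long list of CIDs. If you want to process such a list in pieces, for example to save intermediate results, a small chunking helper is enough. The sketch below is independent of the package; `chunked` is a hypothetical helper, the chunk size is arbitrary, and the wrapper may already batch requests internally.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example with a short stand-in list instead of real PFAS CIDs.
for batch in chunked([962, 297, 241, 702, 887], size=2):
    print(batch)
# [962, 297]
# [241, 702]
# [887]
```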

For a full API reference and more examples, see the documentation.

Dependencies

Package            Purpose
requests           HTTP requests
numpy              Rate-limiting random intervals
regex              CAS / EINECS / DTXSID pattern matching
tqdm               Progress bars for batch operations
rdkit (optional)   SDF parsing and mol/SMILES generation
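To illustrate the kind of pattern matching listed for CAS, EINECS, and DTXSID identifiers, here is a minimal classifier. It uses the standard-library `re` module rather than `regex`, and the patterns are assumptions based on the usual shape of each identifier, not the package's actual expressions. `classify_identifier` is a hypothetical helper.

```python
import re

# Assumed shapes: CAS is 2-7 digits, 2 digits, 1 digit; EINECS (EC number)
# is 3-3-1 digits; DTXSID is the literal prefix followed by digits.
IDENTIFIER_PATTERNS = {
    "CAS": re.compile(r"^\d{2,7}-\d{2}-\d$"),
    "EINECS": re.compile(r"^\d{3}-\d{3}-\d$"),
    "DTXSID": re.compile(r"^DTXSID\d+$"),
}


def classify_identifier(text: str):
    """Return the name of the first matching identifier type, or None."""
    candidate = text.strip()
    for name, pattern in IDENTIFIER_PATTERNS.items():
        if pattern.match(candidate):
            return name
    return None


print(classify_identifier("7732-18-5"))      # CAS
print(classify_identifier("231-791-2"))      # EINECS
print(classify_identifier("DTXSID9020584"))  # DTXSID
print(classify_identifier("water"))          # None
```

The CAS and EINECS patterns cannot both match the same string, because the middle group has two digits in one and three in the other.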

Licence

Copyright © 2024–2026 Luc T. Miaz.
Licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) licence.

Acknowledgments

Developed at the Department of Environmental Science, Stockholm University, under the ZeroPM project (WP2), which is funded by the European Union's Horizon 2020 research and innovation programme (grant agreement No 101036756).

Powered by RDKit
