A Python wrapper for the PubChem PUG REST API. Provides convenient methods to retrieve chemical information using CIDs, SIDs, CAS numbers, SMILES, and InChIKeys, and to convert between chemical identifiers. Built-in rate limiting and retry logic; RDKit integration for SDF/mol retrieval.

pubchem-compounds

Licence: CC BY-NC 4.0 · Python 3.9+

A Python wrapper for the PubChem PUG REST API, focused on Compounds and Substances. It provides:

  • Identifier conversion: CAS → CID / SID, InChIKey → CID, SMILES → CID, CID → CAS / EINECS / DTXSID
  • Batch property fetching from lists of CIDs
  • SDF / RDKit mol retrieval for CIDs, SIDs, or CAS numbers
  • SMILES and InChI extraction for any PubChem synonym
  • PFAS classification tree node queries
  • Built-in rate limiting (max 5 req/s, 400 req/min) and automatic retry on HTTP 403
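The rate-limiting and retry behaviour listed above can be pictured with a small sketch. It is not the package's implementation, which is not shown here: the names `SlidingWindowLimiter` and `call_with_retry`, the linear back-off, and the three-attempt default are all assumptions made for illustration. Only the limits (5 requests per second, 400 per minute) and the retried status code (403) come from the list above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a request fits under every (max_calls, period_seconds) limit."""

    def __init__(self, limits=((5, 1.0), (400, 60.0)),
                 clock=time.monotonic, sleep=time.sleep):
        self.limits = limits
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent requests, oldest first

    def acquire(self):
        longest = max(period for _, period in self.limits)
        while True:
            now = self.clock()
            # Forget requests that are older than the longest window.
            while self.calls and now - self.calls[0] >= longest:
                self.calls.popleft()
            wait = 0.0
            for max_calls, period in self.limits:
                recent = [t for t in self.calls if now - t < period]
                if len(recent) >= max_calls:
                    # Wait until the oldest request in this window expires.
                    wait = max(wait, recent[0] + period - now)
            if wait <= 0:
                self.calls.append(now)
                return
            self.sleep(wait)


def call_with_retry(send, limiter, retry_statuses=(403,), max_attempts=3, backoff=1.0):
    """Call send() under the limiter; retry while it returns a listed status code.

    send is any zero-argument callable returning an object with .status_code.
    """
    for attempt in range(1, max_attempts + 1):
        limiter.acquire()
        response = send()
        if response.status_code not in retry_statuses or attempt == max_attempts:
            return response
        limiter.sleep(backoff * attempt)  # hypothetical linear back-off
```

With the `requests` library, `send` could be something like `lambda: requests.get(url, timeout=30)`; passing a fake clock and sleep function makes the limiter easy to unit-test.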

Installation

From PyPI

pip install pubchem-compounds

Note: RDKit is required only for functions that parse SDF data into molecules or SMILES (get_mols_from_cids, cas_to_mols, synonyms_to_smiles, etc.). Install it separately:

pip install rdkit          # rdkit ≥ 2023.03
# or via conda:
conda install -c conda-forge rdkit
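Because RDKit is optional, it can be useful to check for it before calling the molecule-returning functions. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of pubchem-compounds.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if not has_module("rdkit"):
    print("RDKit not found: functions that return molecules will be unavailable.")
```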

From source

git clone https://gitlab.com/lucmiaz/pubchem.git
cd pubchem
pip install -e .

Install with optional dependencies:

pip install -e ".[dev]"   # pytest, pylint
pip install -e ".[docs]"  # sphinx, sphinx-rtd-theme

Quick start

import pubchem_compounds as pc

CAS → CID

mapping, failed = pc.cas_to_cid("7732-18-5")  # water
print(mapping)   # {'7732-18-5': [962]}

Batch CAS lookup

cas_list = ["7732-18-5", "74-82-8", "71-43-2"]
mapping, failed = pc.cas_to_cid(cas_list)
# {'7732-18-5': [962], '74-82-8': [297], '71-43-2': [241]}
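For orientation, a CAS → CID lookup corresponds to a PUG REST name search, since PubChem stores CAS numbers as synonyms. The sketch below builds the request URL and reads the `IdentifierList` field that PUG REST returns for CID queries. How `cas_to_cid` builds its requests internally is not shown here, so the helper names `name_to_cids_url` and `cids_from_payload` are hypothetical.

```python
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    """Build the PUG REST URL that resolves a name or CAS number to CIDs."""
    return f"{PUG_BASE}/compound/name/{quote(name, safe='')}/cids/JSON"


def cids_from_payload(payload: dict) -> list:
    """Extract the CID list from a decoded PUG REST response (empty if absent)."""
    return list(payload.get("IdentifierList", {}).get("CID", []))


print(name_to_cids_url("7732-18-5"))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/7732-18-5/cids/JSON
```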

CAS → SMILES

processed, failed = pc.cas_to_smiles(["7732-18-5", "74-82-8"])
print(processed["7732-18-5"])  # 'O'
print(processed["74-82-8"])    # 'C'

CAS → RDKit molecules

mols = pc.cas_to_mols(["7732-18-5", "74-82-8"])
for cas, mol_list in mols.items():
    for mol in mol_list:
        print(cas, mol.GetNumAtoms())

Fetch compound properties for a list of CIDs

data = pc.get_from_cids([962, 297, 241], target="property/MolecularFormula,MolecularWeight")
for prop in data["PropertyTable"]["Properties"]:
    print(prop["CID"], prop["MolecularFormula"], prop["MolecularWeight"])
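When many CIDs are fetched, it is often handier to index the property rows by CID. The sketch below assumes `get_from_cids` returns the decoded JSON in the shape used in the loop above; the helper name `properties_by_cid` is hypothetical, and the sample payload is a hand-written illustration of that shape, not captured output. `MolecularWeight` may arrive as a string, so it is converted to a float.

```python
def properties_by_cid(data: dict) -> dict:
    """Turn a PropertyTable payload into {cid: {property: value}}."""
    rows = data.get("PropertyTable", {}).get("Properties", [])
    result = {}
    for row in rows:
        row = dict(row)          # copy so the input is left untouched
        cid = row.pop("CID")
        if "MolecularWeight" in row:
            row["MolecularWeight"] = float(row["MolecularWeight"])
        result[cid] = row
    return result


# Illustrative payload in the shape of a property response (assumed, not fetched).
sample = {
    "PropertyTable": {
        "Properties": [
            {"CID": 962, "MolecularFormula": "H2O", "MolecularWeight": "18.015"},
        ]
    }
}
print(properties_by_cid(sample)[962]["MolecularFormula"])  # H2O
```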

CID → CAS (reverse lookup)

cas_list = pc.cid_to_cas(962)
print(cas_list)  # ['7732-18-5']
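A reverse lookup of this kind has to pick CAS-shaped strings out of a compound's synonyms. How the package does this is not shown here; the sketch below uses the standard `re` module and the CAS check-digit rule (digits weighted 1, 2, 3, ... from the right, summed modulo 10). The helper names are hypothetical.

```python
import re

# 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(text: str) -> bool:
    """Return True if text has CAS format and a correct check digit."""
    match = CAS_RE.fullmatch(text)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def cas_numbers_in(synonyms):
    """Keep only the synonyms that are valid CAS numbers."""
    return [s for s in synonyms if is_valid_cas(s)]


print(cas_numbers_in(["Water", "7732-18-5", "Dihydrogen oxide"]))  # ['7732-18-5']
```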

InChIKey → CID

cids = pc.inchikey_to_pubchem("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
print(cids)  # [962]
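Malformed keys can be caught before any request is sent. A standard InChIKey consists of 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter. The sketch below checks only that shape; `is_inchikey` is a hypothetical helper, and whether `inchikey_to_pubchem` performs such a check itself is not stated here.

```python
import re

INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(text: str) -> bool:
    """Return True if text has the 27-character InChIKey layout."""
    return INCHIKEY_RE.fullmatch(text) is not None


print(is_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # True
print(is_inchikey("xlyofnoqvpjjnp-uhfffaoysa-n"))  # False
```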

DTXSID → SMILES

processed, failed = pc.dtxsid_to_smiles(["DTXSID9020584"])
print(processed["DTXSID9020584"])

PFAS classification tree

# Fetch all CIDs from the OECD PFAS list node (default hnid = 5517102)
cids = pc.pubchem_pfas_tree()
print(f"{len(cids)} PFAS CIDs found")
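PUG REST exposes classification nodes under a path of the form `classification/hnid/<hnid>/cids/<format>`. The sketch below builds that URL for the default node and parses a plain-text CID listing (one CID per line). Whether `pubchem_pfas_tree` uses exactly this path and format is an assumption, and both helper names are hypothetical.

```python
PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def classification_cids_url(hnid: int = 5517102, fmt: str = "TXT") -> str:
    """Build the PUG REST URL listing the CIDs under a classification node."""
    return f"{PUG_BASE}/classification/hnid/{int(hnid)}/cids/{fmt}"


def parse_cid_text(text: str) -> list:
    """Parse a whitespace-separated CID listing, skipping non-numeric tokens."""
    return [int(token) for token in text.split() if token.isdigit()]


print(classification_cids_url())
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/classification/hnid/5517102/cids/TXT
```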

For a full API reference and more examples, see the documentation.

Dependencies

Package           Purpose
requests          HTTP requests
numpy             Rate-limiting random intervals
regex             CAS / EINECS / DTXSID pattern matching
tqdm              Progress bars for batch operations
rdkit (optional)  SDF parsing and mol/SMILES generation

Licence

Copyright © 2024–2026 Luc T. Miaz.
Licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) licence.

Acknowledgments

Developed under the ZeroPM project (WP2) funded by the European Union's Horizon 2020 research and innovation programme (grant agreement No 101036756). Developed at the Department of Environmental Science, Stockholm University.

Powered by RDKit
