A Python wrapper for the PubChem PUG REST API. Provides convenient methods to retrieve chemical information using CIDs, SIDs, CAS numbers, SMILES, and InChIKeys, and to convert between chemical identifiers. Built-in rate limiting and retry logic; RDKit integration for SDF/mol retrieval.

pubchem-compounds

License: CC BY-NC 4.0 | Python 3.9+

A Python wrapper for the PubChem PUG REST API, focused on Compounds and Substances. It provides:

  • Identifier conversion: CAS → CID / SID, InChIKey → CID, SMILES → CID, CID → CAS / EINECS / DTXSID
  • Batch property fetching from lists of CIDs
  • SDF / RDKit mol retrieval for CIDs, SIDs, or CAS numbers
  • SMILES and InChI extraction for any PubChem synonym
  • PFAS classification tree node queries
  • Built-in rate limiting (max 5 req/s, 400 req/min) and automatic retry on HTTP 403
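The limits above (5 requests per second, 400 per minute) can be enforced with a sliding window over recent call timestamps. The class below is a minimal sketch of that idea, not the package's own limiter (which, per the Dependencies table, uses numpy for random intervals). The name TwoWindowRateLimiter and its injectable clock and sleep parameters are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class TwoWindowRateLimiter:
    """Hypothetical limiter: at most `per_second` calls in any 1 s window
    and at most `per_minute` calls in any 60 s window (sliding windows)."""

    def __init__(
        self,
        per_second: int = 5,
        per_minute: int = 400,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limits = [(1.0, per_second), (60.0, per_minute)]
        self.longest = max(window for window, _ in self.limits)
        self.history: Deque[float] = deque()  # call timestamps, oldest first
        self.clock = clock
        self.sleep = sleep

    def wait(self) -> None:
        """Block until one more call fits in every window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that fall outside even the longest window.
            while self.history and now - self.history[0] >= self.longest:
                self.history.popleft()
            delay = 0.0
            for window, limit in self.limits:
                recent = [t for t in self.history if now - t < window]
                if len(recent) >= limit:
                    # Sleep until the oldest call in this window ages out.
                    delay = max(delay, window - (now - recent[0]))
            if delay <= 0:
                self.history.append(now)
                return
            self.sleep(delay)


limiter = TwoWindowRateLimiter()
for cid in [962, 297, 241]:
    limiter.wait()
    # A request for `cid` would be sent here.
```

Injecting clock and sleep keeps the class testable without real delays; with the default arguments it uses time.monotonic and time.sleep.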

Installation

From PyPI

pip install pubchem-compounds

Note: RDKit is required only for functions that parse SDF records into molecules or derive SMILES from them (get_mols_from_cids, cas_to_mols, synonyms_to_smiles, etc.). Install it separately:

pip install rdkit          # rdkit ≥ 2023.03
# or via conda:
conda install -c conda-forge rdkit
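Because RDKit is optional, code that may run without it can check for it before calling the molecule-returning functions. A minimal sketch using only the standard library; rdkit_available is a hypothetical helper, not part of pubchem-compounds.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the optional RDKit package can be imported."""
    return importlib.util.find_spec("rdkit") is not None


if not rdkit_available():
    print("RDKit not found: install it before using molecule-returning functions.")
```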

From source

git clone https://gitlab.com/lucmiaz/pubchem.git
cd pubchem
pip install -e .

Install with optional dependencies:

pip install -e ".[dev]"   # pytest, pylint
pip install -e ".[docs]"  # sphinx, sphinx-rtd-theme

Quick start

import pubchem_compounds as pc

CAS → CID

mapping, failed = pc.cas_to_cid("7732-18-5")  # water
print(mapping)   # {'7732-18-5': [962]}

Batch CAS lookup

cas_list = ["7732-18-5", "74-82-8", "71-43-2"]
mapping, failed = pc.cas_to_cid(cas_list)
# {'7732-18-5': [962], '74-82-8': [297], '71-43-2': [241]}
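CAS Registry Numbers carry a check digit: multiply the digits before it, from right to left, by 1, 2, 3 and so on; the sum modulo 10 must equal the final digit. Filtering a batch locally before calling cas_to_cid can catch typos without spending requests. This is a standalone sketch using the standard re module; whether cas_to_cid performs such a check itself is not stated here, and is_valid_cas is a hypothetical name.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


cas_list = ["7732-18-5", "74-82-8", "71-43-2", "7732-18-4"]
valid = [c for c in cas_list if is_valid_cas(c)]
print(valid)  # ['7732-18-5', '74-82-8', '71-43-2']
```

For water, the digits 7732 18 reversed give 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, so the check digit is 5.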

CAS → SMILES

processed, failed = pc.cas_to_smiles(["7732-18-5", "74-82-8"])
print(processed["7732-18-5"])  # 'O'
print(processed["74-82-8"])    # 'C'
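The batch functions above return a (processed, failed) pair. The sketch below mimics that convention with a stand-in lookup table, so the pattern can be seen without network access. It assumes failed collects the inputs that could not be resolved, which the README does not spell out, and partition_lookups is a hypothetical helper.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def partition_lookups(
    keys: Iterable[str], lookup: Callable[[str], Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Split a batch into (processed, failed): resolved keys vs. unresolved ones."""
    processed: Dict[str, str] = {}
    failed: List[str] = []
    for key in keys:
        value = lookup(key)
        if value is None:
            failed.append(key)
        else:
            processed[key] = value
    return processed, failed


# Stand-in lookup table in place of real PubChem requests.
known = {"7732-18-5": "O", "74-82-8": "C"}
processed, failed = partition_lookups(["7732-18-5", "74-82-8", "0000-00-0"], known.get)
print(processed)  # {'7732-18-5': 'O', '74-82-8': 'C'}
print(failed)     # ['0000-00-0']
```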

CAS → RDKit molecules

mols = pc.cas_to_mols(["7732-18-5", "74-82-8"])
for cas, mol_list in mols.items():
    for mol in mol_list:
        print(cas, mol.GetNumAtoms())

Fetch compound properties for a list of CIDs

data = pc.get_from_cids([962, 297, 241], target="property/MolecularFormula,MolecularWeight")
for prop in data["PropertyTable"]["Properties"]:
    print(prop["CID"], prop["MolecularFormula"], prop["MolecularWeight"])
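The target argument resembles the PUG REST URL path for compound properties (compound/cid/<cids>/property/<names>/JSON), and the PropertyTable / Properties keys match PubChem's JSON property response. The sketch below builds such URLs for a CID list, splitting it into chunks to keep URLs short. It assumes GET requests and an arbitrary chunk size of 100, and does not reproduce how get_from_cids works internally; property_urls is a hypothetical name.

```python
from typing import Iterable, List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_urls(cids: Iterable[int], properties: List[str], chunk_size: int = 100) -> List[str]:
    """Build PUG REST property URLs, one per chunk of CIDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    cids = list(cids)
    props = ",".join(properties)
    urls = []
    for start in range(0, len(cids), chunk_size):
        chunk = ",".join(str(cid) for cid in cids[start:start + chunk_size])
        urls.append(f"{PUG_REST}/compound/cid/{chunk}/property/{props}/JSON")
    return urls


for url in property_urls([962, 297, 241], ["MolecularFormula", "MolecularWeight"], chunk_size=2):
    print(url)
```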

CID → CAS (reverse lookup)

cas_list = pc.cid_to_cas(962)
print(cas_list)  # ['7732-18-5']
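Reverse lookups such as cid_to_cas presumably work by scanning a compound's PubChem synonyms for identifier-shaped strings (the Dependencies table lists regex for CAS / EINECS / DTXSID pattern matching). The sketch below shows that idea with simplified, assumed patterns and the standard re module. The patterns, the function name classify_synonyms and the synonym list are illustrative assumptions, not the package's.

```python
import re
from typing import Dict, List

# Simplified patterns (assumptions): CAS, EC/EINECS numbers, and DTXSIDs.
PATTERNS = {
    "CAS": re.compile(r"\d{2,7}-\d{2}-\d"),
    "EINECS": re.compile(r"\d{3}-\d{3}-\d"),
    "DTXSID": re.compile(r"DTXSID\d+"),
}


def classify_synonyms(synonyms: List[str]) -> Dict[str, List[str]]:
    """Group synonyms that fully match one of the identifier patterns."""
    found: Dict[str, List[str]] = {name: [] for name in PATTERNS}
    for syn in synonyms:
        syn = syn.strip()
        for name, pattern in PATTERNS.items():
            if pattern.fullmatch(syn):
                found[name].append(syn)
    return found


# Illustrative synonym list, not a real PubChem record.
synonyms = ["water", "7732-18-5", "231-791-2", "DTXSID9020584", "dihydrogen oxide"]
print(classify_synonyms(synonyms))
# {'CAS': ['7732-18-5'], 'EINECS': ['231-791-2'], 'DTXSID': ['DTXSID9020584']}
```

Using fullmatch keeps the CAS and EC shapes apart: an EC number has three digits in its middle block, a CAS number has two.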

InChIKey → CID

cids = pc.inchikey_to_pubchem("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
print(cids)  # [962]
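A standard InChIKey has 27 characters: a 14-letter connectivity hash, a hyphen, a 10-letter block (an 8-letter hash followed by the standard flag and version letters), a hyphen, and one protonation letter. A quick format check can reject malformed keys before calling inchikey_to_pubchem. A minimal sketch; looks_like_inchikey is hypothetical and checks only the layout, not whether the key exists in PubChem.

```python
import re

INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """Return True if `text` has the layout of a standard InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # True
```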

DTXSID → SMILES

Via PubChem

processed, failed = pc.dtxsid_to_smiles(["DTXSID9020584"])
print(processed["DTXSID9020584"])

Via CompTox

from pubchem_compounds import mols_from_comptox
mols = mols_from_comptox(["DTXSID4059916"])

PFAS classification tree

# Fetch all CIDs from the OECD PFAS list node (default hnid = 5517102)
cids = pc.pubchem_pfas_tree()
print(f"{len(cids)} PFAS CIDs found")
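The PFAS tree query retrieves the CIDs under one node of PubChem's classification browser, identified by an hnid. The sketch below assumes the PUG REST classification endpoint form classification/hnid/<hnid>/cids/JSON and an IdentifierList / CID response shape; verify both against the PUG REST documentation. It is not how pubchem_pfas_tree is implemented, and the helper names are hypothetical.

```python
import json
from typing import List
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def classification_cids_url(hnid: int = 5517102) -> str:
    """URL listing the CIDs under one classification node (assumed endpoint form)."""
    return f"{PUG_REST}/classification/hnid/{hnid}/cids/JSON"


def parse_cid_list(payload: dict) -> List[int]:
    """Extract CIDs from an IdentifierList-style JSON response (assumed shape)."""
    return payload.get("IdentifierList", {}).get("CID", [])


def fetch_node_cids(hnid: int = 5517102, timeout: float = 30.0) -> List[int]:
    """Fetch and parse the CID list; this makes one network request."""
    with urlopen(classification_cids_url(hnid), timeout=timeout) as response:
        return parse_cid_list(json.load(response))


print(classification_cids_url())
```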

For a full API reference and more examples, see the documentation.

Dependencies

Package           Purpose
requests          HTTP requests
numpy             Random intervals for rate limiting
regex             CAS / EINECS / DTXSID pattern matching
tqdm              Progress bars for batch operations
rdkit (optional)  SDF parsing and mol/SMILES generation
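The package retries automatically on HTTP 403 and uses numpy for random intervals. A common way to combine these two ideas is exponential backoff with random jitter. The sketch below uses the standard library's random module in place of numpy; retry_on_403 and its parameters are hypothetical and do not describe the package's actual retry logic.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_403(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call()`, retrying with jittered exponential backoff on HTTP 403."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Other status codes, or the last attempt, propagate unchanged.
            if err.code != 403 or attempt == max_attempts:
                raise
            # Delay doubles per attempt; random jitter spreads out parallel clients.
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```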

Licence

Copyright © 2024–2026 Luc T. Miaz.
Licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) licence.

Acknowledgments

Developed at the Department of Environmental Science, Stockholm University, within work package 2 (WP2) of the ZeroPM project, funded by the European Union's Horizon 2020 research and innovation programme (grant agreement No 101036756).

Powered by RDKit

Download files

Source Distribution

pubchem_compounds-1.1.1.tar.gz (21.0 kB)

Uploaded Source

Built Distribution

pubchem_compounds-1.1.1-py3-none-any.whl (16.9 kB)

Uploaded Python 3

File details

Details for the file pubchem_compounds-1.1.1.tar.gz.

File metadata

  • Download URL: pubchem_compounds-1.1.1.tar.gz
  • Upload date:
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pubchem_compounds-1.1.1.tar.gz
Algorithm Hash digest
SHA256 e4535a1a496e7339617976a53615d530cca7b130a079e097ec6e8a2fde8199b1
MD5 dc057facda3fb6bfd5e0a1d9f451f485
BLAKE2b-256 ab2d7166fdaf9ef62703d147e315a7a963ecb561158d9b8a8ba0738a789f46c7

File details

Details for the file pubchem_compounds-1.1.1-py3-none-any.whl.

File hashes

Hashes for pubchem_compounds-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 631754577f4e61ed352e17aedd4551a9925ad525ea3ddb894137ba8bee8b3ced
MD5 8064d86f859b97a713035a3183cf9adf
BLAKE2b-256 7810c0ac02b6b9399a13c16837bad21968db1137aee591c197f13d00fd843239
