
A Python wrapper for the PubChem PUG REST API. Provides convenient methods to retrieve chemical information using CIDs, SIDs, CAS numbers, SMILES, and InChIKeys, and to convert between chemical identifiers. Built-in rate limiting and retry logic; RDKit integration for SDF/mol retrieval.

Project description

pubchem-compounds

License: CC BY-NC 4.0. Requires Python 3.9+.

A Python wrapper for the PubChem PUG REST API, focused on Compounds and Substances. It provides:

  • Identifier conversion: CAS → CID / SID, InChIKey → CID, SMILES → CID, CID → CAS / EINECS / DTXSID
  • Batch property fetching from lists of CIDs
  • SDF / RDKit mol retrieval for CIDs, SIDs, or CAS numbers
  • SMILES and InChI extraction for any PubChem synonym
  • PFAS classification tree node queries
  • Built-in rate limiting (max 5 req/s, 400 req/min) and automatic retry on HTTP 403
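To illustrate what a "5 req/s, 400 req/min" budget means in practice, here is a minimal sliding-window limiter. It is a hypothetical sketch, not the package's actual implementation (which, per the dependency list, uses numpy for random intervals). `SlidingWindowLimiter` and its parameters are assumptions; the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `per_second` calls in any 1 s window
    and at most `per_minute` calls in any 60 s window."""

    def __init__(self, per_second: int = 5, per_minute: int = 400,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limits = ((1.0, per_second), (60.0, per_minute))
        self.horizon = max(window for window, _ in self.limits)
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until one more request fits in every window, then record it."""
        while True:
            now = self.clock()
            # Forget calls older than the largest window.
            while self.calls and now - self.calls[0] >= self.horizon:
                self.calls.popleft()
            delay = 0.0
            for window, limit in self.limits:
                recent = [t for t in self.calls if now - t < window]
                if len(recent) >= limit:
                    # Wait until the oldest call inside this window expires.
                    delay = max(delay, recent[0] + window - now)
            if delay <= 0:
                self.calls.append(now)
                return
            self.sleep(delay)
```

In a client you would call `limiter.wait()` immediately before each HTTP request.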

Installation

From PyPI

pip install pubchem-compounds

Note: RDKit is required only for functions that parse SDF data or generate molecules or SMILES (get_mols_from_cids, cas_to_mols, synonyms_to_smiles, etc.). Install it separately:

pip install rdkit          # rdkit ≥ 2023.03
# or via conda:
conda install -c conda-forge rdkit
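Because RDKit is optional, you may want to check whether it is importable before calling molecule-returning functions. The guard below is a hypothetical helper, not part of pubchem-compounds; it only uses the standard library.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the rdkit package can be found in this environment."""
    return importlib.util.find_spec("rdkit") is not None


if not rdkit_available():
    print("RDKit not found: install it with pip or conda before using "
          "molecule-returning functions.")
```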

From source

git clone https://gitlab.com/lucmiaz/pubchem.git
cd pubchem
pip install -e .

Install with optional dependencies:

pip install -e ".[dev]"   # pytest, pylint
pip install -e ".[docs]"  # sphinx, sphinx-rtd-theme

Quick start

import pubchem_compounds as pc

CAS → CID

mapping, failed = pc.cas_to_cid("7732-18-5")  # water
print(mapping)   # {'7732-18-5': [962]}

Batch CAS lookup

cas_list = ["7732-18-5", "74-82-8", "71-43-2"]
mapping, failed = pc.cas_to_cid(cas_list)
# {'7732-18-5': [962], '74-82-8': [297], '71-43-2': [241]}
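For reference, a CAS number can be resolved through the PUG REST `name` namespace. The sketch below only builds request URLs (no network call), one per CAS number. `name_to_ids_url` is a hypothetical helper; the package's own requests may be shaped differently.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_ids_url(name: str, domain: str = "compound") -> str:
    """Build a PUG REST URL resolving a name (e.g. a CAS number) to IDs.

    domain="compound" asks for CIDs; domain="substance" asks for SIDs.
    """
    id_type = {"compound": "cids", "substance": "sids"}[domain]
    # safe="" also escapes "/" so the name stays a single path segment.
    return f"{PUG_REST}/{domain}/name/{quote(name, safe='')}/{id_type}/JSON"


urls = [name_to_ids_url(cas) for cas in ["7732-18-5", "74-82-8", "71-43-2"]]
```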

CAS → SMILES

processed, failed = pc.cas_to_smiles(["7732-18-5", "74-82-8"])
print(processed["7732-18-5"])  # 'O'
print(processed["74-82-8"])    # 'C'

CAS → RDKit molecules

mols = pc.cas_to_mols(["7732-18-5", "74-82-8"])
for cas, mol_list in mols.items():
    for mol in mol_list:
        print(cas, mol.GetNumAtoms())

Fetch compound properties for a list of CIDs

data = pc.get_from_cids([962, 297, 241], target="property/MolecularFormula,MolecularWeight")
for prop in data["PropertyTable"]["Properties"]:
    print(prop["CID"], prop["MolecularFormula"], prop["MolecularWeight"])
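The `target` string above mirrors the path segment of a PUG REST property request. Assuming `get_from_cids` appends it after a comma-separated CID list, a long CID list can be split into several shorter URLs, as in this hypothetical sketch (the chunk size of 100 is an arbitrary choice to keep URLs short). Note that PubChem may return `MolecularWeight` as a string, so convert it with `float()` if you need a number.

```python
from typing import Iterable, List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_urls(cids: Iterable[int], properties: List[str],
                  chunk_size: int = 100) -> List[str]:
    """Split CIDs into chunks and build one property URL per chunk."""
    cid_list = [int(c) for c in cids]
    props = ",".join(properties)
    urls = []
    for start in range(0, len(cid_list), chunk_size):
        ids = ",".join(str(c) for c in cid_list[start:start + chunk_size])
        urls.append(f"{PUG_REST}/compound/cid/{ids}/property/{props}/JSON")
    return urls
```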

CID → CAS (reverse lookup)

cas_list = pc.cid_to_cas(962)
print(cas_list)  # ['7732-18-5']
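Reverse lookups like CID → CAS / EINECS / DTXSID come down to picking identifiers out of a compound's synonym list (the dependency list mentions regex pattern matching for exactly these three). The hypothetical sketch below shows one way to do that with the standard-library `re` module, adding check-digit validation for CAS and EC (EINECS) numbers. It only recognises bare identifiers; prefixed synonyms such as "EINECS 231-791-2" would need extra handling. The input list is illustrative, not a real PubChem synonym record.

```python
import re
from typing import Dict, List

CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")
EC_RE = re.compile(r"(\d{3})-(\d{3})-(\d)")
DTXSID_RE = re.compile(r"DTXSID\d+")


def cas_checksum_ok(cas: str) -> bool:
    m = CAS_RE.fullmatch(cas)
    if not m:
        return False
    body = m.group(1) + m.group(2)
    # Weight digits 1, 2, 3, ... from the right; sum mod 10 is the check digit.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))


def ec_checksum_ok(ec: str) -> bool:
    m = EC_RE.fullmatch(ec)
    if not m:
        return False
    body = m.group(1) + m.group(2)
    # Weight digits 1..6 from the left; sum mod 11 is the check digit.
    total = sum(i * int(d) for i, d in enumerate(body, start=1))
    return total % 11 == int(m.group(3))


def classify_synonyms(synonyms: List[str]) -> Dict[str, List[str]]:
    """Sort synonyms into CAS, EC (EINECS) and DTXSID buckets."""
    found: Dict[str, List[str]] = {"cas": [], "ec": [], "dtxsid": []}
    for raw in synonyms:
        s = raw.strip()
        if cas_checksum_ok(s):
            found["cas"].append(s)
        elif ec_checksum_ok(s):
            found["ec"].append(s)
        elif DTXSID_RE.fullmatch(s):
            found["dtxsid"].append(s)
    return found


print(classify_synonyms(["water", "7732-18-5", "231-791-2", "DTXSID9020584", "1234-56-7"]))
```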

InChIKey → CID

cids = pc.inchikey_to_pubchem("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
print(cids)  # [962]
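A standard InChIKey has a fixed 14-10-1 letter layout, so malformed keys can be rejected before any request is made. This hypothetical sketch validates the key and builds the corresponding PUG REST `inchikey` URL; it is not the package's `inchikey_to_pubchem` implementation.

```python
import re

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Standard InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def inchikey_url(key: str) -> str:
    """Validate an InChIKey and return the PUG REST URL for its CIDs."""
    key = key.strip().upper()
    if not INCHIKEY_RE.fullmatch(key):
        raise ValueError(f"not a valid InChIKey: {key!r}")
    return f"{PUG_REST}/compound/inchikey/{key}/cids/JSON"
```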

DTXSID → SMILES / molecules

Via PubChem

processed, failed = pc.dtxsid_to_smiles(["DTXSID9020584"])
print(processed["DTXSID9020584"])

Via CompTox

from pubchem_compounds import mols_from_comptox
mols = mols_from_comptox(["DTXSID4059916"])

PFAS classification tree

# Fetch all CIDs from the OECD PFAS list node (default hnid = 5517102)
cids = pc.pubchem_pfas_tree()
print(f"{len(cids)} PFAS CIDs found")
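Classification-tree nodes such as hnid 5517102 can be queried through PUG REST. The sketch below assumes the `classification/hnid/<hnid>/cids/TXT` endpoint and a one-CID-per-line text response; both helpers are hypothetical and may not match what `pubchem_pfas_tree` does internally.

```python
from typing import List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def hnid_cids_url(hnid: int = 5517102) -> str:
    # Assumed endpoint: classification namespace, plain-text output.
    return f"{PUG_REST}/classification/hnid/{int(hnid)}/cids/TXT"


def parse_cid_txt(text: str) -> List[int]:
    """Parse a one-CID-per-line response into sorted, de-duplicated ints."""
    return sorted({int(line.strip()) for line in text.splitlines()
                   if line.strip().isdigit()})
```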

For a full API reference and more examples, see the documentation.

Dependencies

Package           Purpose
requests          HTTP requests
numpy             Random intervals for rate limiting
regex             CAS / EINECS / DTXSID pattern matching
tqdm              Progress bars for batch operations
rdkit (optional)  SDF parsing and mol/SMILES generation

Licence

Copyright © 2024–2026 Luc T. Miaz.
Licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) licence.

Acknowledgments

Developed under the ZeroPM project (WP2), funded by the European Union's Horizon 2020 research and innovation programme (grant agreement No 101036756), at the Department of Environmental Science, Stockholm University.

Powered by RDKit

Download files

Source distribution: pubchem_compounds-1.1.2.tar.gz (21.2 kB)

Built distribution: pubchem_compounds-1.1.2-py3-none-any.whl (17.2 kB, Python 3)

File details

Details for the file pubchem_compounds-1.1.2.tar.gz.

File metadata

  • Download URL: pubchem_compounds-1.1.2.tar.gz
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pubchem_compounds-1.1.2.tar.gz
Algorithm Hash digest
SHA256 fc1100639a8089c91f219a8049745fdf0043ed3713ab4d9a38445e7d0ca14829
MD5 61237444ac9b7d751af19cd948fae2ce
BLAKE2b-256 915e8a790260c0a1f6da151dcf5c8ca01fa099ba712d0f1e7ad1af24e525d835

File details

Details for the file pubchem_compounds-1.1.2-py3-none-any.whl.

File hashes

Hashes for pubchem_compounds-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 85ab5eda04375c396c269fa945f925e4b9059e13f66080dffae31de0ccddeab3
MD5 81c5c3298fdfbd446e988f16504166ef
BLAKE2b-256 a890ba5097a44c45ebbdc4098d43d373bb1306fe082a89a3708ece6e47046413
