A Python wrapper for the PubChem PUG REST API. Provides convenient methods to retrieve chemical information using CIDs, SIDs, CAS numbers, SMILES, and InChIKeys, and to convert between chemical identifiers. Built-in rate limiting and retry logic; RDKit integration for SDF/mol retrieval.
pubchem-compounds
A Python wrapper for the PubChem PUG REST API, focused on Compounds and Substances. It provides:
- Identifier conversion: CAS → CID / SID, InChIKey → CID, SMILES → CID, CID → CAS / EINECS / DTXSID
- Batch property fetching from lists of CIDs
- SDF / RDKit mol retrieval for CIDs, SIDs, or CAS numbers
- SMILES and InChI extraction for any PubChem synonym
- PFAS classification tree node queries
- Built-in rate limiting (max 5 req/s, 400 req/min) and automatic retry on HTTP 403
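The limits in the last bullet (5 requests per second, 400 per minute) can be enforced with a sliding-window throttle. The sketch below is not the package's own implementation: `SlidingWindowLimiter` and `fetch_with_retry` are hypothetical names, and the clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a new request fits inside every (max_calls, period) window."""

    def __init__(self, limits=((5, 1.0), (400, 60.0)),
                 clock=time.monotonic, sleep=time.sleep):
        self.limits = tuple(limits)
        self.clock = clock
        self.sleep = sleep
        self.longest = max(period for _, period in self.limits)
        self.stamps = deque()  # timestamps of recent requests, oldest first

    def _delay(self, now):
        """Seconds to wait before another request is allowed (0.0 if none)."""
        delay = 0.0
        for max_calls, period in self.limits:
            recent = [t for t in self.stamps if now - t < period]
            if len(recent) >= max_calls:
                # Wait until the oldest request counted by this limit expires.
                oldest = recent[-max_calls]
                delay = max(delay, oldest + period - now)
        return delay

    def acquire(self):
        """Wait as long as needed, then record one request."""
        while True:
            now = self.clock()
            # Forget timestamps older than the longest window.
            while self.stamps and now - self.stamps[0] >= self.longest:
                self.stamps.popleft()
            delay = self._delay(now)
            if delay <= 0:
                self.stamps.append(now)
                return
            self.sleep(delay)


def fetch_with_retry(fetch, limiter, max_tries=3):
    """Call fetch() under the limiter, retrying while it reports HTTP 403.

    `fetch` is any zero-argument callable returning (status_code, payload);
    max_tries is assumed to be at least 1.
    """
    for _ in range(max_tries):
        limiter.acquire()
        status, payload = fetch()
        if status != 403:
            break
    return status, payload
```

Keeping the timestamps in a single deque lets both limits share one record of past requests; the per-second limit governs bursts, while the per-minute limit governs sustained batch runs.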
Installation
From PyPI
pip install pubchem-compounds
Note: RDKit is required only for functions that return molecules (get_mols_from_cids, cas_to_mols, synonyms_to_smiles, etc.). Install it separately:
pip install rdkit # rdkit ≥ 2023.03
# or via conda:
conda install -c conda-forge rdkit
From source
git clone https://gitlab.com/lucmiaz/pubchem.git
cd pubchem
pip install -e .
Install with optional dependencies:
pip install -e ".[dev]" # pytest, pylint
pip install -e ".[docs]" # sphinx, sphinx-rtd-theme
Quick start
import pubchem_compounds as pc
CAS → CID
mapping, failed = pc.cas_to_cid("7732-18-5") # water
print(mapping) # {'7732-18-5': [962]}
Batch CAS lookup
cas_list = ["7732-18-5", "74-82-8", "71-43-2"]
mapping, failed = pc.cas_to_cid(cas_list)
# {'7732-18-5': [962], '74-82-8': [297], '71-43-2': [241]}
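Because each CAS number maps to a list of CIDs, a common next step is to collapse the mapping into a flat list of unique CIDs. A minimal sketch, using the dictionary shape shown in the comment above; `unique_cids` is a hypothetical helper, not part of the package.

```python
def unique_cids(mapping):
    """Flatten {cas: [cid, ...]} into unique CIDs, keeping first-seen order."""
    seen = {}
    for cids in mapping.values():
        for cid in cids:
            seen.setdefault(cid, None)  # dicts preserve insertion order
    return list(seen)


mapping = {"7732-18-5": [962], "74-82-8": [297], "71-43-2": [241]}
print(unique_cids(mapping))  # [962, 297, 241]
```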
CAS → SMILES
processed, failed = pc.cas_to_smiles(["7732-18-5", "74-82-8"])
print(processed["7732-18-5"]) # 'O'
print(processed["74-82-8"]) # 'C'
CAS → RDKit molecules
mols = pc.cas_to_mols(["7732-18-5", "74-82-8"])
for cas, mol_list in mols.items():
    for mol in mol_list:
        print(cas, mol.GetNumAtoms())
Fetch compound properties for a list of CIDs
data = pc.get_from_cids([962, 297, 241], target="property/MolecularFormula,MolecularWeight")
for prop in data["PropertyTable"]["Properties"]:
    print(prop["CID"], prop["MolecularFormula"], prop["MolecularWeight"])
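For reference, a property request like the one above corresponds to a PUG REST URL of the form `/compound/cid/<cids>/property/<names>/JSON`. The sketch below builds such URLs in batches without sending any request; `property_urls` and the default batch size of 100 are illustrative assumptions, not the package's internals.

```python
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_urls(cids, properties, batch_size=100):
    """Yield one PUG REST property URL per batch of CIDs."""
    names = ",".join(properties)
    for start in range(0, len(cids), batch_size):
        batch = ",".join(str(cid) for cid in cids[start:start + batch_size])
        yield f"{BASE}/compound/cid/{batch}/property/{names}/JSON"


# Three CIDs with a batch size of 2 give two URLs.
for url in property_urls([962, 297, 241],
                         ["MolecularFormula", "MolecularWeight"],
                         batch_size=2):
    print(url)
```

Splitting long CID lists into batches keeps each URL short and makes it easy to pace the requests under the rate limits.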
CID → CAS (reverse lookup)
cas_list = pc.cid_to_cas(962)
print(cas_list) # ['7732-18-5']
InChIKey → CID
cids = pc.inchikey_to_pubchem("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
print(cids) # [962]
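Before sending a lookup, it can help to reject strings that cannot be InChIKeys. A standard InChIKey has 27 characters: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final letter. A minimal sketch using the standard-library `re` module; `looks_like_inchikey` is a hypothetical helper, not a package function.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the shape of a standard InChIKey."""
    return INCHIKEY_RE.fullmatch(text.strip()) is not None


print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # True
print(looks_like_inchikey("7732-18-5"))                    # False
```

This checks only the format; a well-formed key can still be absent from PubChem.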
DTXSID → SMILES
processed, failed = pc.dtxsid_to_smiles(["DTXSID9020584"])
print(processed["DTXSID9020584"])
PFAS classification tree
# Fetch all CIDs from the OECD PFAS list node (default hnid = 5517102)
cids = pc.pubchem_pfas_tree()
print(f"{len(cids)} PFAS CIDs found")
For a full API reference and more examples, see the documentation.
Dependencies
| Package | Purpose |
|---|---|
| requests | HTTP requests |
| numpy | Rate-limiting random intervals |
| regex | CAS / EINECS / DTXSID pattern matching |
| tqdm | Progress bars for batch operations |
| rdkit (optional) | SDF parsing and mol/SMILES generation |
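Pattern matching alone does not catch mistyped CAS numbers. CAS Registry Numbers carry a check digit: the digits before it are weighted 1, 2, 3, ... from right to left, and the weighted sum modulo 10 must equal the final digit. The sketch below combines a pattern with that check, using the standard-library `re` rather than `regex`; `is_valid_cas` is a hypothetical helper and may differ from how the package validates input.

```python
import re

# Up to 7 digits, then 2 digits, then the check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(text):
    """Return True if text is a well-formed CAS number with a correct check digit."""
    match = CAS_RE.fullmatch(text.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for cas in ["7732-18-5", "74-82-8", "71-43-2", "7732-18-4"]:
    print(cas, is_valid_cas(cas))
```

For water (7732-18-5) the weighted sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5 matches the check digit; the last entry fails because its final digit was changed to 4.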
Licence
Copyright © 2024–2026 Luc T. Miaz.
Licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) licence.
Acknowledgments
Developed at the Department of Environmental Science, Stockholm University, under the ZeroPM project (WP2), funded by the European Union's Horizon 2020 research and innovation programme (grant agreement No 101036756).