A robust and high-throughput Python client for the PubChem API, designed for automated data retrieval and analysis
ChemInformant is a data acquisition engine for the PubChem database, built for automated scientific workflows. It manages network requests (caching, rate limiting, and retries), validates data at runtime, and delivers analysis-ready results, giving computational chemistry projects in Python a dependable foundation.
✨ Key Features
- Analysis-Ready Pandas/SQL Output: The core API (`get_properties`) returns either a clean Pandas DataFrame or a direct SQL output, eliminating data wrangling boilerplate and enabling immediate integration with both the Python data science ecosystem and modern database workflows.
- Automated Network Reliability: Ensures your workflows run flawlessly with built-in persistent caching, smart rate-limiting, and automatic retries. It also transparently handles API pagination (`ListKey`) for large-scale queries, delivering complete result sets without any manual intervention.
- Flexible & Fault-Tolerant Input: Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.
- A Dual API for Simplicity and Power: Offers a clear `get_<property>()` convenience layer for quick lookups, backed by a powerful `get_properties` engine for high-performance batch operations.
- Guaranteed Data Integrity: Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.
- Terminal-Ready CLI Tools: Includes `chemfetch` and `chemdraw` for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.
- Modern and Actively Maintained: Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
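To make the "automatic retries" idea concrete, here is a minimal, generic sketch of retrying with exponential backoff. It is not ChemInformant's implementation: `TransientError`, `call_with_retries`, and `flaky_request` are hypothetical names invented for this illustration, and the delays are arbitrary.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary failure (timeout, HTTP 503, ...)."""


def call_with_retries(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a fake request that fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return {"cid": 2244}


delays = []
# Record the delays instead of actually sleeping, so the demo runs instantly.
result = call_with_retries(flaky_request, sleep=delays.append)
print(result, delays)  # {'cid': 2244} [0.5, 1.0]
```

With a library that retries internally, none of this needs to be written by hand; the sketch only shows the general pattern behind such a feature.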
📦 Installation
Install the library from PyPI:
pip install ChemInformant
To include plotting capabilities for use with the tutorial, install the [plot] extra:
pip install "ChemInformant[plot]"
🚀 Quick Start
Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:
```python
import ChemInformant as ci

# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983]  # 1983 is paracetamol's CID

# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]

# 3. Call the core function
df = ci.get_properties(identifiers, properties)

# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")

# 5. Analyze your results!
print(df)
```
Output:
```
  input_identifier   cid status  molecular_weight  xlogp       cas
0          aspirin  2244     OK            180.16    1.2   50-78-2
1         caffeine  2519     OK            194.19   -0.1   58-08-2
2             1983  1983     OK            151.16    0.5  103-90-2
```
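Once results are stored in SQL, they can be queried with Python's standard `sqlite3` module. The sketch below is self-contained: instead of opening `chem_data.db`, it builds an in-memory table named `results` with the columns from the output above. The column types are assumptions; the schema actually written by `df_to_sql` may differ.

```python
import sqlite3

# Stand-in for the database written in step 4: an in-memory table with the
# same name and columns as the Quick Start output (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (input_identifier TEXT, cid INTEGER, status TEXT, "
    "molecular_weight REAL, xlogp REAL, cas TEXT)"
)
rows = [
    ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
    ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
    ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep only rows that resolved successfully, heaviest compound first.
query = (
    "SELECT input_identifier, molecular_weight FROM results "
    "WHERE status = 'OK' ORDER BY molecular_weight DESC"
)
ranked = conn.execute(query).fetchall()
conn.close()

print(ranked)
# [('caffeine', 194.19), ('aspirin', 180.16), ('1983', 151.16)]
```

To run the same query against the real file, replacing `":memory:"` with `"chem_data.db"` (and dropping the table setup) should be enough, assuming the database was created in the current working directory.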
Convenience API Cheatsheet
| Function | Description |
|---|---|
| `get_weight(id)` | Molecular weight (float) |
| `get_formula(id)` | Molecular formula (str) |
| `get_cas(id)` | CAS Registry Number (str) |
| `get_iupac_name(id)` | IUPAC name (str) |
| `get_canonical_smiles(id)` | Canonical SMILES with Canonical→Connectivity fallback (str) |
| `get_isomeric_smiles(id)` | Isomeric SMILES with Isomeric→SMILES fallback (str) |
| `get_xlogp(id)` | XLogP (calculated hydrophobicity) (float) |
| `get_synonyms(id)` | List of synonyms (List[str]) |
| `get_compound(id)` | Full, validated Compound object (Pydantic v2 model) |
Note: This table shows key convenience functions for demonstration. ChemInformant provides 22 convenience functions in total, covering molecular descriptors, mass properties, stereochemistry, and more.
All functions accept a CID, name, or SMILES and return None/[] on failure.
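Because the convenience functions return `None` (or `[]`) on failure rather than raising, calling code should check the result before using it. The sketch below shows one way to do that. So that it runs offline, it uses a hypothetical stub, `fake_get_weight`, in place of `ci.get_weight`; the helper `format_weight` is also invented for this illustration.

```python
from typing import Callable, Optional


def format_weight(identifier, lookup: Callable[[object], Optional[float]]) -> str:
    """Return a readable line, treating a None result as a failed lookup."""
    weight = lookup(identifier)
    if weight is None:
        return f"{identifier}: not found"
    return f"{identifier}: {weight:.2f} g/mol"


# Hypothetical stub standing in for ci.get_weight so the sketch runs offline.
_FAKE_WEIGHTS = {"aspirin": 180.16, "caffeine": 194.19}


def fake_get_weight(identifier):
    return _FAKE_WEIGHTS.get(identifier)  # None when the identifier is unknown


lines = [format_weight(name, fake_get_weight) for name in ("aspirin", "not-a-compound")]
print(lines)
# ['aspirin: 180.16 g/mol', 'not-a-compound: not found']
```

In real use, passing `ci.get_weight` as the `lookup` argument should behave the same way, given the documented `None`-on-failure contract.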
ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:
- `chemfetch`: Fetches properties for one or more compounds.
  `chemfetch aspirin --props "cas,molecular_weight,iupac_name"`
- `chemdraw`: Renders the 2D structure of a compound.
  `chemdraw aspirin`
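If you want to drive `chemfetch` from a script, a thin `subprocess` wrapper is one option. This is a sketch under assumptions: the helper names are hypothetical, only the `--props` flag from the example command is used, and passing several identifiers as separate positional arguments is an assumption not confirmed here.

```python
import shlex
import shutil
import subprocess


def build_chemfetch_command(identifiers, props):
    """Build the argument list for a chemfetch call.

    Assumption: multiple identifiers are given as separate positional arguments.
    """
    return ["chemfetch", *identifiers, "--props", ",".join(props)]


def run_chemfetch(identifiers, props, timeout=60):
    """Run chemfetch and return its stdout, or None if the tool is not on PATH."""
    if shutil.which("chemfetch") is None:
        return None
    completed = subprocess.run(
        build_chemfetch_command(identifiers, props),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return completed.stdout


# Show the command that would be executed, without running it.
cmd = build_chemfetch_command(["aspirin"], ["cas", "molecular_weight", "iupac_name"])
print(shlex.join(cmd))
# chemfetch aspirin --props cas,molecular_weight,iupac_name
```

Passing the command as a list rather than a shell string avoids quoting problems with compound names that contain spaces.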
📚 Documentation & Examples
For a deep dive, please see our detailed guides:
- ➡️ Online Documentation: The official documentation site contains complete API references, guides, and usage examples. This is the most comprehensive resource.
- ➡️ Interactive User Manual: Our Jupyter Notebook Tutorial provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
- ➡️ Performance Benchmarks: Run integrated benchmarks with `pytest tests/test_benchmarks.py --benchmark-only` to see the performance advantages of batching and caching.
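As a rough illustration of why caching pays off, the sketch below counts how often a simulated backend is hit when the same identifiers are requested repeatedly. It uses an in-memory `functools.lru_cache`, not ChemInformant's persistent cache, and `slow_lookup` is a hypothetical stand-in for a network round trip.

```python
from functools import lru_cache

backend_calls = 0


def slow_lookup(name):
    """Hypothetical stand-in for one network round trip."""
    global backend_calls
    backend_calls += 1
    return name.upper()


@lru_cache(maxsize=None)
def cached_lookup(name):
    # Only reaches slow_lookup the first time a given name is requested.
    return slow_lookup(name)


for name in ["aspirin", "caffeine", "aspirin", "aspirin"]:
    cached_lookup(name)

print(backend_calls)  # 2: one call per distinct name, repeats are served from cache
```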
📖 Additional Resources & Use Cases
- Basic Usage Guide - Quick start examples for common tasks
- Advanced Usage Guide - Complex workflows and batch processing
- Caching Guide - Optimize performance with intelligent caching
- CLI Tools Documentation - Complete reference for `chemfetch` and `chemdraw`
- API Reference - Full function documentation with examples
🤔 Why ChemInformant?
ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. By delivering clean, validated, and analysis-ready Pandas DataFrames, it enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.
A detailed comparison with other existing tools is provided in our JOSS paper.
🤝 Contributing
Contributions are welcome! For guidelines on how to get started, please read our contributing guide. You can open an issue to report bugs or suggest features, or submit a pull request to contribute code.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📑 Citation
@article{He2025,
doi = {10.21105/joss.08341},
url = {https://doi.org/10.21105/joss.08341},
year = {2025},
publisher = {The Open Journal},
volume = {10},
number = {112},
pages = {8341},
author = {He, Zhiang},
title = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
journal = {Journal of Open Source Software}
}