A robust, AI-optimized Python client for the PubChem API, designed for automated data retrieval, machine learning workflows, and chemical informatics analysis
ChemInformant is a robust data acquisition engine for the PubChem database, engineered for the modern scientific workflow. It intelligently manages network requests, performs rigorous runtime data validation, and delivers analysis-ready results, providing a dependable foundation for any computational chemistry project in Python.
Release, Review, and Citation Status
ChemInformant is released under the MIT license, published in the Journal of Open Source Software, and accepted into the pyOpenSci ecosystem through open software peer review.
Published package artifacts are tracked on PyPI and GitHub Releases. If repository documentation or source metadata is ahead of PyPI, treat the PyPI page as the install source of truth for pip install ChemInformant.
✨ Key Features
- Analysis-Ready Pandas Output with SQL Export: The core API (get_properties) returns a clean Pandas DataFrame, and a dedicated df_to_sql() helper (plus the chemfetch --format sql CLI mode) persists results directly into SQLite, PostgreSQL, or any other SQLAlchemy backend, so you can move from query to database in two lines without hand-written wrangling code.
- Automated Network Reliability: Built-in persistent caching, rate limiting, and automatic retries keep workflows running through transient network failures. API pagination (ListKey) is handled transparently for large queries, so complete result sets arrive without manual intervention.
- Flexible & Fault-Tolerant Input: Accepts mixed lists of identifiers (names, CIDs, SMILES) and flags invalid inputs with a clear status in the output, so a single bad entry never fails an entire batch operation.
- A Dual API for Simplicity and Power: Offers a clear get_<property>() convenience layer for quick lookups, backed by the get_properties engine for high-performance batch operations.
- Guaranteed Data Integrity: Uses Pydantic v2 models for runtime data validation in the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.
- Terminal-Ready CLI Tools: Includes chemfetch and chemdraw for rapid data retrieval and 2D structure visualization directly from your terminal, useful for quick lookups without writing a script.
- Modern and Actively Maintained: Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
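The automatic retries mentioned under Automated Network Reliability can be pictured with a small standard-library sketch. This is not ChemInformant's implementation: retry_with_backoff, TransientError, and flaky_fetch are hypothetical names used only to illustrate the general technique of retrying a transient failure with exponentially growing delays.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable failure such as HTTP 503."""


def retry_with_backoff(func, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call func(); on TransientError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated endpoint that fails twice before succeeding (illustrative only).
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("server busy")
    return {"cid": 2244}


delays = []  # record the requested delays instead of actually sleeping
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
print(result, calls["count"], delays)
```

Passing the sleep function as a parameter keeps the sketch fast to run and makes the delay schedule easy to inspect.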
📦 Installation
Install the library from PyPI:
pip install ChemInformant
The PyPI project page shows the latest published release available through pip.
To include plotting capabilities for use with the tutorial, install the [plot] extra:
pip install "ChemInformant[plot]"
🚀 Quick Start
Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:
import ChemInformant as ci
# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983] # 1983 is paracetamol's CID
# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]
# 3. Call the core function
df = ci.get_properties(identifiers, properties)
# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")
# 5. Analyze your results!
print(df)
Output:
input_identifier cid status molecular_weight xlogp cas
0 aspirin 2244 OK 180.16 1.2 50-78-2
1 caffeine 2519 OK 194.19 -0.1 58-08-2
2 1983 1983 OK 151.16 0.5 103-90-2
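Step 4 above writes the DataFrame to a table named results. As a rough illustration of what querying such a table could look like, the sketch below rebuilds the three rows from the output in an in-memory SQLite database using only the standard library. The column types, and the assumption that the table columns mirror the DataFrame columns, are illustrative guesses rather than documented behaviour of df_to_sql.

```python
import sqlite3

# Rows copied from the Quick Start output above.
rows = [
    ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
    ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
    ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
]

conn = sqlite3.connect(":memory:")  # stand-in for chem_data.db
conn.execute(
    """CREATE TABLE results (
           input_identifier TEXT, cid INTEGER, status TEXT,
           molecular_weight REAL, xlogp REAL, cas TEXT)"""  # assumed schema
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Select successfully resolved compounds lighter than 190 g/mol.
light = conn.execute(
    "SELECT input_identifier, molecular_weight FROM results "
    "WHERE status = 'OK' AND molecular_weight < 190 "
    "ORDER BY molecular_weight"
).fetchall()
conn.close()
print(light)
```

Filtering on the status column in SQL keeps rows that failed to resolve out of downstream analysis without dropping them from the stored table.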
Convenience API Cheatsheet
| Function | Description |
|---|---|
| get_weight(id) | Molecular weight (float) |
| get_formula(id) | Molecular formula (str) |
| get_cas(id) | CAS Registry Number (str) |
| get_iupac_name(id) | IUPAC name (str) |
| get_canonical_smiles(id) | Canonical SMILES with Canonical→Connectivity fallback (str) |
| get_isomeric_smiles(id) | Isomeric SMILES with Isomeric→SMILES fallback (str) |
| get_xlogp(id) | XLogP (calculated hydrophobicity) (float) |
| get_synonyms(id) | List of synonyms (List[str]) |
| get_compound(id) | Validated Compound object (Pydantic v2 model) |
Note: This table shows key convenience functions for demonstration. ChemInformant provides 22 convenience functions in total, covering molecular descriptors, mass properties, stereochemistry, and more.
All scalar get_<property>() functions accept a CID, name, or SMILES and return None/[] on failure. get_compound() / get_compounds() instead raise NotFoundError or AmbiguousIdentifierError so you can handle resolution failures explicitly.
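The two failure conventions call for different handling on the caller's side. The sketch below uses local stand-ins so that it runs offline: the NotFoundError class here is a locally defined placeholder for the library's exception, and fake_get_weight and fake_get_compound are hypothetical stubs, not ChemInformant functions. In real code you would pass the library's own functions and catch its own exception classes.

```python
class NotFoundError(Exception):
    """Local placeholder for the library's 'identifier not found' error."""


WEIGHTS = {"aspirin": 180.16}  # hypothetical offline data


def fake_get_weight(identifier):
    """Scalar-style stub: returns None instead of raising."""
    return WEIGHTS.get(identifier)


def fake_get_compound(identifier):
    """Object-style stub: raises when the identifier cannot be resolved."""
    if identifier not in WEIGHTS:
        raise NotFoundError(identifier)
    return {"name": identifier, "molecular_weight": WEIGHTS[identifier]}


def collect_compounds(get_compound, identifiers):
    """Split a batch into resolved compounds and identifiers that failed."""
    found, missing = [], []
    for ident in identifiers:
        try:
            found.append(get_compound(ident))
        except NotFoundError:
            missing.append(ident)
    return found, missing


# Scalar convention: test for None before using the value.
weight = fake_get_weight("unobtainium")
label = "unknown" if weight is None else f"{weight:.2f}"

# Object convention: catch the exception explicitly.
found, missing = collect_compounds(fake_get_compound, ["aspirin", "unobtainium"])
print(label, [c["name"] for c in found], missing)
```

Collecting failures in a separate list keeps one unresolved identifier from aborting the loop while still making every failure visible.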
ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:
- chemfetch: Fetches properties for one or more compounds.
  chemfetch aspirin --props "cas,molecular_weight,iupac_name"
- chemdraw: Renders the 2D structure of a compound.
  chemdraw aspirin
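For scripted use, the chemfetch invocation shown above can also be launched from Python. This is a minimal sketch, assuming chemfetch is on PATH after installation and that several identifiers can be passed as positional arguments (the text above only shows a single one). build_chemfetch_command and run_chemfetch are hypothetical helpers, and only the --props flag from the example above is used.

```python
import shutil
import subprocess


def build_chemfetch_command(identifiers, props):
    """Assemble the argument list for: chemfetch <ids...> --props "a,b,c"."""
    return ["chemfetch", *identifiers, "--props", ",".join(props)]


def run_chemfetch(cmd, timeout=60):
    """Run the command if chemfetch is installed; return its stdout or None."""
    if shutil.which(cmd[0]) is None:
        return None  # tool not installed in this environment
    completed = subprocess.run(
        cmd, capture_output=True, text=True, check=False, timeout=timeout
    )
    return completed.stdout


cmd = build_chemfetch_command(["aspirin"], ["cas", "molecular_weight", "iupac_name"])
print(cmd)
# To execute (requires network access): print(run_chemfetch(cmd))
```

Building the command as a list rather than a shell string avoids quoting problems with compound names that contain spaces or parentheses.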
📚 Documentation & Examples
For a deep dive, please see our detailed guides:
- ➡️ Online Documentation: The official documentation site contains complete API references, guides, and usage examples. This is the most comprehensive resource.
- ➡️ Interactive User Manual: Our Jupyter Notebook Tutorial provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
- ➡️ Performance Benchmarks: Run integrated benchmarks with pytest tests/test_benchmarks.py --benchmark-only to see the performance advantages of batching and caching.
- ➡️ Release Notes: See the release notes for package changes and release-readiness checks.
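The caching advantage that the benchmark suite measures can be illustrated without network access. The sketch below memoizes a deliberately slow stand-in function with functools.lru_cache. slow_lookup is hypothetical, and an in-memory cache is only an analogy for the library's persistent cache, not its implementation.

```python
import functools
import time

call_log = []  # records every call that actually reaches the slow function


@functools.lru_cache(maxsize=None)
def slow_lookup(name):
    """Hypothetical stand-in for a remote request (about 20 ms each)."""
    call_log.append(name)
    time.sleep(0.02)
    return len(name)


start = time.perf_counter()
first = [slow_lookup(n) for n in ("aspirin", "caffeine")]
cold = time.perf_counter() - start  # both lookups miss the cache

start = time.perf_counter()
second = [slow_lookup(n) for n in ("aspirin", "caffeine")]
warm = time.perf_counter() - start  # both lookups are served from the cache

print(f"cold={cold:.3f}s warm={warm:.6f}s calls={len(call_log)}")
```

The second pass returns the same values while the underlying function is never called again, which is the effect a cache has on repeated identifier lookups.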
📖 Additional Resources & Use Cases
- Basic Usage Guide - Quick start examples for common tasks
- Advanced Usage Guide - Complex workflows and batch processing
- Caching Guide - Optimize performance with intelligent caching
- CLI Tools Documentation - Complete reference for chemfetch and chemdraw
- API Reference - Full function documentation with examples
🤔 Why ChemInformant?
ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. As a software package that has undergone rigorous peer review by both the Journal of Open Source Software (JOSS) and pyOpenSci, it delivers clean, validated, and analysis-ready Pandas DataFrames. This enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.
A detailed comparison with other existing tools is provided in our JOSS paper. For the story and the "why" behind the code, we've shared our thoughts in a post on the official pyOpenSci website.
🤝 Contributing
Contributions are welcome! For guidelines on how to get started, please read our contributing guide. You can open an issue to report bugs or suggest features, or submit a pull request to contribute code.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📑 Citation
@article{He2025,
doi = {10.21105/joss.08341},
url = {https://doi.org/10.21105/joss.08341},
year = {2025},
publisher = {The Open Journal},
volume = {10},
number = {112},
pages = {8341},
author = {He, Zhiang},
title = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
journal = {Journal of Open Source Software}
}
Download files
File details
Details for the file cheminformant-2.5.0.tar.gz.
File metadata
- Download URL: cheminformant-2.5.0.tar.gz
- Upload date:
- Size: 53.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47c32ee1a8b4b7aeca27bf1bf7747cb26896aa423e5bff53b13a79e4db21f586 |
| MD5 | 9f5a32d0bfe2519e474ba783d3fb5e9d |
| BLAKE2b-256 | 5702b346373f6249f28f6798c664501a5bce1328a15edc874b9e0a31b01b24f8 |
File details
Details for the file cheminformant-2.5.0-py3-none-any.whl.
File metadata
- Download URL: cheminformant-2.5.0-py3-none-any.whl
- Upload date:
- Size: 32.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbc81229b1ca1223bcef4ca648d976b7c4071f6bbaf841a0e632196554b5d51b |
| MD5 | ff37969a41048d2a0421a8e763ba99c6 |
| BLAKE2b-256 | 11a3e2b1a7e0212ae5a621c8f4d5cdc5cd9419224cd084e559361df364c79687 |