
A robust, AI-optimized Python client for the PubChem API, designed for automated data retrieval, machine learning workflows, and chemical informatics analysis


ChemInformant

A Robust Data Acquisition Engine for the Modern Scientific Workflow



JOSS Journal Publication DOI 10.21105/joss.08341 pyOpenSci Peer-Reviewed



ChemInformant is a robust data acquisition engine for the PubChem database, engineered for the modern scientific workflow. It intelligently manages network requests, performs rigorous runtime data validation, and delivers analysis-ready results, providing a dependable foundation for any computational chemistry project in Python.


Release, Review, and Citation Status

ChemInformant is released under the MIT license, published in the Journal of Open Source Software, and accepted into the pyOpenSci ecosystem through open software peer review.

Published package artifacts are tracked on PyPI and GitHub Releases. If the repository documentation or source metadata is ahead of PyPI, treat the PyPI page as the source of truth for what pip install ChemInformant installs.



✨ Key Features

  • Analysis-Ready Pandas Output with SQL Export: The core API (get_properties) returns a clean Pandas DataFrame. A dedicated df_to_sql() helper (plus the chemfetch --format sql CLI mode) persists results directly into SQLite, PostgreSQL, or any other SQLAlchemy backend, so you can move from query to database in two lines without hand-writing wrangling code.

  • Automated Network Reliability: Keeps workflows running through transient network failures with built-in persistent caching, rate limiting, and automatic retries. It also handles API pagination (ListKey) transparently for large-scale queries, delivering complete result sets without manual intervention.

  • Flexible & Fault-Tolerant Input: Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.

  • A Dual API for Simplicity and Power: Offers a clear get_<property>() convenience layer for quick lookups, backed by a powerful get_properties engine for high-performance batch operations.

  • Guaranteed Data Integrity: Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.

  • Terminal-Ready CLI Tools: Includes chemfetch and chemdraw for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.

  • Modern and Actively Maintained: Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
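As a rough illustration of the status flag described under Flexible & Fault-Tolerant Input, the sketch below separates resolved rows from flagged ones. It works on plain dictionaries such as those produced by pandas' df.to_dict("records"), so it runs without network access. The helper name split_by_status and the "FLAGGED" label are hypothetical; only the "OK" value appears in this README, so check the actual status values in your own output.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_status(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate rows whose status is "OK" from every other row."""
    ok = [r for r in records if r.get("status") == "OK"]
    flagged = [r for r in records if r.get("status") != "OK"]
    return ok, flagged


# Hand-written stand-in for df.to_dict("records").
# "FLAGGED" is a placeholder, not a documented ChemInformant status value.
records = [
    {"input_identifier": "aspirin", "cid": 2244, "status": "OK"},
    {"input_identifier": "not-a-compound", "cid": None, "status": "FLAGGED"},
]

ok, flagged = split_by_status(records)
print("resolved:", [r["input_identifier"] for r in ok])
print("needs review:", [r["input_identifier"] for r in flagged])
```

Keeping the flagged rows in a separate list, rather than dropping them, makes it easy to log or retry the identifiers that did not resolve.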


📦 Installation

Install the library from PyPI:

pip install ChemInformant

The PyPI project page shows the latest published release available through pip.

To include plotting capabilities for use with the tutorial, install the [plot] extra:

pip install "ChemInformant[plot]"

🚀 Quick Start

Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:

import ChemInformant as ci

# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983] # 1983 is paracetamol's CID

# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]

# 3. Call the core function
df = ci.get_properties(identifiers, properties)

# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")

# 5. Analyze your results!
print(df)

Output:

  input_identifier   cid status  molecular_weight  xlogp       cas
0          aspirin  2244     OK            180.16    1.2   50-78-2
1         caffeine  2519     OK            194.19   -0.1   58-08-2
2             1983  1983     OK            151.16    0.5  103-90-2
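Once the results are in a database, they can be queried with ordinary SQL. The sketch below uses Python's built-in sqlite3 module against an in-memory table that mirrors the output above. It assumes that df_to_sql writes one column per DataFrame column into the results table; the column types chosen here are a guess, so inspect the real schema of chem_data.db before relying on them.

```python
import sqlite3

# In-memory stand-in for chem_data.db. The schema is an assumption that
# mirrors the DataFrame columns shown in the Quick Start output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "input_identifier TEXT, cid INTEGER, status TEXT, "
    "molecular_weight REAL, xlogp REAL, cas TEXT)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
        ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
        ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
    ],
)

# Select compounds with a positive XLogP, lightest first.
rows = conn.execute(
    "SELECT input_identifier, molecular_weight FROM results "
    "WHERE xlogp > ? ORDER BY molecular_weight",
    (0.0,),
).fetchall()
conn.close()

for identifier, weight in rows:
    print(identifier, weight)
```

To run the same query against the file written in step 4, replace ":memory:" with "chem_data.db" and skip the CREATE and INSERT statements.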
Convenience API Cheatsheet

| Function | Description |
| --- | --- |
| get_weight(id) | Molecular weight (float) |
| get_formula(id) | Molecular formula (str) |
| get_cas(id) | CAS Registry Number (str) |
| get_iupac_name(id) | IUPAC name (str) |
| get_canonical_smiles(id) | Canonical SMILES with Canonical→Connectivity fallback (str) |
| get_isomeric_smiles(id) | Isomeric SMILES with Isomeric→SMILES fallback (str) |
| get_xlogp(id) | XLogP (calculated hydrophobicity) (float) |
| get_synonyms(id) | List of synonyms (List[str]) |
| get_compound(id) | Validated Compound object (Pydantic v2 model) |

Note: This table shows key convenience functions for demonstration. ChemInformant provides 22 convenience functions in total, covering molecular descriptors, mass properties, stereochemistry, and more.

All scalar get_<property>() functions accept a CID, name, or SMILES and return None/[] on failure. get_compound() / get_compounds() instead raise NotFoundError or AmbiguousIdentifierError so you can handle resolution failures explicitly.
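Because the scalar functions signal failure with None rather than an exception, callers may want to substitute a default explicitly. The sketch below shows one way to do that. The helper lookup_or_default is hypothetical and not part of ChemInformant, and fake_get_weight is a stub standing in for ci.get_weight so the example runs offline.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def lookup_or_default(
    fetch: Callable[[object], Optional[T]], identifier: object, default: T
) -> T:
    """Call a scalar lookup and replace a None result with a default."""
    value = fetch(identifier)
    return default if value is None else value


def fake_get_weight(identifier: object) -> Optional[float]:
    """Offline stub for ci.get_weight: returns None for unknown identifiers."""
    known = {"aspirin": 180.16}
    return known.get(identifier)


found = lookup_or_default(fake_get_weight, "aspirin", -1.0)
missing = lookup_or_default(fake_get_weight, "no-such-compound", -1.0)
print(found, missing)
```

In real use, pass the library function itself, for example lookup_or_default(ci.get_weight, "aspirin", float("nan")).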

ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:

  • chemfetch: Fetches properties for one or more compounds.

    chemfetch aspirin --props "cas,molecular_weight,iupac_name"
    
  • chemdraw: Renders the 2D structure of a compound.

    chemdraw aspirin
    


📚 Documentation & Examples

For a deep dive, please see our detailed guides:

  • ➡️ Online Documentation: The official documentation site contains complete API references, guides, and usage examples. This is the most comprehensive resource.
  • ➡️ Interactive User Manual: Our Jupyter Notebook Tutorial provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
  • ➡️ Performance Benchmarks: Run integrated benchmarks with pytest tests/test_benchmarks.py --benchmark-only to see the performance advantages of batching and caching.
  • ➡️ Release Notes: See the release notes for package changes and release-readiness checks.



🤔 Why ChemInformant?

ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. As a software package that has undergone rigorous peer review by both the Journal of Open Source Software (JOSS) and pyOpenSci, it delivers clean, validated, and analysis-ready Pandas DataFrames. This enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.

A detailed comparison with other existing tools is provided in our JOSS paper. For the story and the "why" behind the code, we've shared our thoughts in a post on the official pyOpenSci website.

🤝 Contributing

Contributions are welcome! For guidelines on how to get started, please read our contributing guide. You can open an issue to report bugs or suggest features, or submit a pull request to contribute code.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📑 Citation

@article{He2025,
  doi       = {10.21105/joss.08341},
  url       = {https://doi.org/10.21105/joss.08341},
  year      = {2025},
  publisher = {The Open Journal},
  volume    = {10},
  number    = {112},
  pages     = {8341},
  author    = {He, Zhiang},
  title     = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
  journal   = {Journal of Open Source Software}
}
