
A robust and high-throughput Python client for the PubChem API, designed for automated data retrieval and analysis


ChemInformant

A Robust Data Acquisition Engine for the Modern Scientific Workflow




ChemInformant is a robust data acquisition engine for the PubChem database, engineered for the modern scientific workflow. It intelligently manages network requests, performs rigorous runtime data validation, and delivers analysis-ready results, providing a dependable foundation for any computational chemistry project in Python.


✨ Key Features

  • Analysis-Ready Pandas/SQL Output: The core API (get_properties) returns either a clean Pandas DataFrame or a direct SQL output, eliminating data wrangling boilerplate and enabling immediate integration with both the Python data science ecosystem and modern database workflows.

  • Automated Network Reliability: Keeps workflows running through transient network problems with built-in persistent caching, rate limiting, and automatic retries. It also handles API pagination (ListKey) transparently for large queries, returning complete result sets without manual intervention.

  • Flexible & Fault-Tolerant Input: Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.

  • A Dual API for Simplicity and Power: Offers a clear get_<property>() convenience layer for quick lookups, backed by a powerful get_properties engine for high-performance batch operations.

  • Guaranteed Data Integrity: Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.

  • Terminal-Ready CLI Tools: Includes chemfetch and chemdraw for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.

  • Modern and Actively Maintained: Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
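
To make the "automatic retries" idea above concrete, here is a minimal, standard-library sketch of retrying a call with exponential backoff. It is not ChemInformant's implementation, which this page does not show; `call_with_retries`, `flaky_fetch`, and the delay values are hypothetical names and numbers chosen only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponential backoff on ConnectionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical stand-in for a network call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "payload"


# Record the requested delays instead of actually sleeping.
delays = []
result = call_with_retries(flaky_fetch, sleep=delays.append)
print(result, calls["count"], delays)
```

Injecting `sleep` keeps the demonstration instant; in real use the default `time.sleep` would apply the pauses.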


📦 Installation

Install the library from PyPI:

pip install ChemInformant

To include plotting capabilities for use with the tutorial, install the [plot] extra:

pip install "ChemInformant[plot]"

🚀 Quick Start

Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:

import ChemInformant as ci

# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983] # 1983 is paracetamol's CID

# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]

# 3. Call the core function
df = ci.get_properties(identifiers, properties)

# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")

# 5. Analyze your results!
print(df)

Output:

  input_identifier   cid status  molecular_weight  xlogp       cas
0          aspirin  2244     OK            180.16    1.2   50-78-2
1         caffeine  2519     OK            194.19   -0.1   58-08-2
2             1983  1983     OK            151.16    0.5  103-90-2
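
Once results are stored in SQLite, they can be queried with Python's built-in `sqlite3` module. The sketch below rebuilds the three rows shown above in an in-memory database so it runs offline. It assumes the stored table has one column per DataFrame column with the same names; the column types are a guess, not something this page documents. To query the real file, connect to `chem_data.db` instead of `:memory:`.

```python
import sqlite3

# Rows copied from the Quick Start output above.
rows = [
    ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
    ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
    ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: mirrors the DataFrame columns; types are illustrative.
conn.execute(
    """
    CREATE TABLE results (
        input_identifier TEXT, cid INTEGER, status TEXT,
        molecular_weight REAL, xlogp REAL, cas TEXT
    )
    """
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep successfully resolved compounds with a positive XLogP.
lipophilic = conn.execute(
    "SELECT input_identifier, xlogp FROM results "
    "WHERE status = 'OK' AND xlogp > 0 ORDER BY xlogp DESC"
).fetchall()
conn.close()

print(lipophilic)  # [('aspirin', 1.2), ('1983', 0.5)]
```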
Convenience API Cheatsheet

| Function | Description |
|---|---|
| get_weight(id) | Molecular weight (float) |
| get_formula(id) | Molecular formula (str) |
| get_cas(id) | CAS Registry Number (str) |
| get_iupac_name(id) | IUPAC name (str) |
| get_canonical_smiles(id) | Canonical SMILES with Canonical→Connectivity fallback (str) |
| get_isomeric_smiles(id) | Isomeric SMILES with Isomeric→SMILES fallback (str) |
| get_xlogp(id) | XLogP (calculated hydrophobicity) (float) |
| get_synonyms(id) | List of synonyms (List[str]) |
| get_compound(id) | Full, validated Compound object (Pydantic v2 model) |

Note: This table shows key convenience functions for demonstration. ChemInformant provides 22 convenience functions in total, covering molecular descriptors, mass properties, stereochemistry, and more.

All functions accept a CID, name, or SMILES. On failure they return None, or an empty list ([]) for list-valued results such as synonyms.
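
Because a failed lookup returns None rather than raising, calling code needs to check for it. The following is a small sketch of one way to do that, not part of the library: `collect_weights` and `fake_get_weight` are hypothetical helpers, and the stub stands in for a convenience function so the example runs without network access.

```python
import math
from typing import Callable, Iterable, Optional


def collect_weights(
    lookup: Callable[[object], Optional[float]],
    identifiers: Iterable[object],
) -> dict:
    """Map each identifier to its weight, using NaN where lookup returned None."""
    weights = {}
    for identifier in identifiers:
        value = lookup(identifier)
        weights[identifier] = math.nan if value is None else value
    return weights


# Hypothetical offline stand-in that mimics the None-on-failure behavior.
_KNOWN = {"aspirin": 180.16, "caffeine": 194.19}


def fake_get_weight(identifier):
    return _KNOWN.get(identifier)


weights = collect_weights(fake_get_weight, ["aspirin", "not-a-compound"])
missing = [name for name, w in weights.items() if math.isnan(w)]
print(missing)  # ['not-a-compound']
```

With the library installed, passing `ci.get_weight` in place of the stub should behave the same way, given the None-on-failure behavior described above.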

ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:

  • chemfetch: Fetches properties for one or more compounds.

    chemfetch aspirin --props "cas,molecular_weight,iupac_name"
    
  • chemdraw: Renders the 2D structure of a compound.

    chemdraw aspirin
    


📚 Documentation & Examples

For a deep dive, please see our detailed guides:

  • ➡️ Online Documentation: The official documentation site contains complete API references, guides, and usage examples. This is the most comprehensive resource.
  • ➡️ Interactive User Manual: Our Jupyter Notebook Tutorial provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
  • ➡️ Performance Benchmarks: Run integrated benchmarks with pytest tests/test_benchmarks.py --benchmark-only to see the performance advantages of batching and caching.



🤔 Why ChemInformant?

ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. By delivering clean, validated, analysis-ready Pandas DataFrames, it lets researchers pipe PubChem data directly into toolkits such as RDKit and scikit-learn, or into custom machine learning models, reducing multi-step data acquisition and wrangling to a single function call.

A detailed comparison with other existing tools is provided in our JOSS paper.

🤝 Contributing

Contributions are welcome! For guidelines on how to get started, please read our contributing guide. You can open an issue to report bugs or suggest features, or submit a pull request to contribute code.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📑 Citation

@article{He2025,
  doi       = {10.21105/joss.08341},
  url       = {https://doi.org/10.21105/joss.08341},
  year      = {2025},
  publisher = {The Open Journal},
  volume    = {10},
  number    = {112},
  pages     = {8341},
  author    = {He, Zhiang},
  title     = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
  journal   = {Journal of Open Source Software}
}
