
Simplifies interaction with the PubChem database via PUG-REST API.


PubChmAPI Library

Overview

Introduction

The PubChmAPI Python package simplifies interaction with the PubChem database through the PUG-REST API. Instead of a fixed set of hand-written wrapper functions, PubChmAPI uses metaprogramming to generate its endpoint functions dynamically, with the aim of covering the full PubChem schema. It also handles URL construction, automatic batching, and request throttling, so data retrieval needs little boilerplate on the caller's side.
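To make the URL-construction step concrete, here is a minimal sketch of how a PUG-REST URL can be assembled from its parts. The helper name `build_pug_rest_url` is hypothetical and is not part of PubChmAPI. The segment order (domain, namespace, identifiers, operation, output format) follows the URLs shown in the example output on this page; it is an assumption that PubChmAPI builds its URLs the same way internally.

```python
from urllib.parse import quote

# Base URL as it appears in the example output on this page.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(domain, namespace, identifiers, operation, output="json"):
    """Hypothetical helper: assemble
    <base>/<domain>/<namespace>/<identifiers>/<operation>/<output>.

    `identifiers` may be a single value or an iterable of values; several
    values are joined with commas into one path segment.
    """
    if isinstance(identifiers, (str, int)):
        identifiers = [identifiers]
    # Percent-encode each identifier (spaces, '/', '=' ...) before joining,
    # so the separating commas stay literal.
    id_part = ",".join(quote(str(value), safe="") for value in identifiers)
    return f"{PUG_REST_BASE}/{domain}/{namespace}/{id_part}/{operation}/{output}"


if __name__ == "__main__":
    print(build_pug_rest_url("protein", "accession", "P68871", "summary"))
```

Run as a script, it prints the same summary URL that appears in the example output for accession P68871.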


Naming Convention

Functions in PubChmAPI follow a strict semantic naming convention to eliminate ambiguity:

domain_identifier_get_operation_option

  • Domain: The primary database being queried (e.g., compound, substance, assay, gene).
  • Identifier: The input type provided (e.g., cid, name, smiles, geneid).
  • Operation: The specific data to retrieve (e.g., properties, aids, synonyms).
  • Option (Optional): Filters or variants (e.g., active, inactive, 2d).
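Because the names are systematic, they can be split back into their parts mechanically. The sketch below uses a hypothetical helper, `parse_function_name`, that is not part of PubChmAPI. It is deliberately naive: it assumes the name contains `_get_`, takes the first token before it as the domain, and treats everything after the first underscore of the operation as the option. Multi-word operations such as `all_properties`, and the structural-search names (where a search method sits between domain and identifier), are therefore not split meaningfully.

```python
from typing import NamedTuple, Optional


class FunctionName(NamedTuple):
    domain: str
    identifier: str
    operation: str
    option: Optional[str]


def parse_function_name(name: str) -> FunctionName:
    """Split 'domain_identifier_get_operation_option' into its parts."""
    prefix, sep, suffix = name.partition("_get_")
    if not sep or not prefix or not suffix:
        raise ValueError(f"{name!r} does not follow the domain_identifier_get_operation pattern")
    # Domain is the first token; the rest of the prefix is the identifier type.
    domain, _, identifier = prefix.partition("_")
    # Operation is the first token after '_get_'; anything left is the option.
    operation, _, option = suffix.partition("_")
    return FunctionName(domain, identifier, operation, option or None)


if __name__ == "__main__":
    for fn in ("compound_cid_get_aids_active",
               "compound_name_get_MolecularWeight",
               "gene_genesymbol_get_summary"):
        print(fn, "->", parse_function_name(fn))
```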

Functions

Compound Property Functions (By Name)

Retrieve calculated properties using a compound name (e.g., "Aspirin"). Format: compound_name_get_[Property](identifier)

  • compound_name_get_Title(identifier)
  • compound_name_get_MolecularFormula(identifier)
  • compound_name_get_MolecularWeight(identifier)
  • compound_name_get_CanonicalSMILES(identifier)
  • compound_name_get_IsomericSMILES(identifier)
  • compound_name_get_InChI(identifier)
  • compound_name_get_InChIKey(identifier)
  • compound_name_get_IUPACName(identifier)
  • compound_name_get_XLogP(identifier)
  • compound_name_get_ExactMass(identifier)
  • compound_name_get_MonoisotopicMass(identifier)
  • compound_name_get_TPSA(identifier)
  • compound_name_get_Complexity(identifier)
  • compound_name_get_Charge(identifier)
  • compound_name_get_HBondDonorCount(identifier)
  • compound_name_get_HBondAcceptorCount(identifier)
  • compound_name_get_RotatableBondCount(identifier)
  • compound_name_get_HeavyAtomCount(identifier)
  • compound_name_get_IsotopeAtomCount(identifier)
  • compound_name_get_AtomStereoCount(identifier)
  • compound_name_get_DefinedAtomStereoCount(identifier)
  • compound_name_get_UndefinedAtomStereoCount(identifier)
  • compound_name_get_BondStereoCount(identifier)
  • compound_name_get_DefinedBondStereoCount(identifier)
  • compound_name_get_UndefinedBondStereoCount(identifier)
  • compound_name_get_CovalentUnitCount(identifier)
  • compound_name_get_Volume3D(identifier)
  • compound_name_get_ConformerModelRMSD3D(identifier)
  • compound_name_get_EffectiveRotorCount3D(identifier)
  • compound_name_get_ConformerCount3D(identifier)
  • compound_name_get_Fingerprint2D(identifier)
  • compound_name_get_FeatureCount3D(identifier)
  • compound_name_get_FeatureAcceptorCount3D(identifier)
  • compound_name_get_FeatureDonorCount3D(identifier)
  • compound_name_get_FeatureAnionCount3D(identifier)
  • compound_name_get_FeatureCationCount3D(identifier)
  • compound_name_get_FeatureRingCount3D(identifier)
  • compound_name_get_FeatureHydrophobeCount3D(identifier)
  • compound_name_get_XStericQuadrupole3D(identifier)
  • compound_name_get_YStericQuadrupole3D(identifier)
  • compound_name_get_ZStericQuadrupole3D(identifier)
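Each function above retrieves a single property. PUG-REST itself also accepts several comma-separated property names in one `property/...` path segment, which saves round trips when you need more than one value. The helper below, `property_url_by_name`, is a hypothetical sketch rather than a PubChmAPI function, and it is an assumption that the single-property functions build URLs of this shape.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url_by_name(name: str, properties: list[str], output: str = "csv") -> str:
    """Hypothetical helper: one PUG-REST request for several properties of a named compound."""
    if not properties:
        raise ValueError("at least one property name is required")
    # Several property names go into a single comma-separated segment.
    prop_part = ",".join(properties)
    # Compound names can contain spaces, so percent-encode the name.
    return f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}/property/{prop_part}/{output}"


if __name__ == "__main__":
    print(property_url_by_name("Aspirin", ["MolecularWeight", "XLogP", "TPSA"]))
```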

Compound CID Functions

Retrieve data using a Compound Identifier (CID). Format: compound_cid_get_[Operation](identifier)

General & Conversion

  • compound_cid_get_description(identifier)
  • compound_cid_get_synonyms(identifier)
  • compound_cid_get_sids(identifier) (Get Substance IDs)
  • compound_cid_get_cids(identifier) (Self-retrieval/Verification)
  • compound_cid_get_conformers(identifier)

Images

  • compound_cid_get_png(identifier) (Default)
  • compound_cid_get_png_2d(identifier)
  • compound_cid_get_png_3d(identifier)

Assays (Biological Activity)

  • compound_cid_get_aids(identifier) (All associated Assays)
  • compound_cid_get_aids_active(identifier) (Only Active Assays)
  • compound_cid_get_aids_inactive(identifier) (Only Inactive Assays)
  • compound_cid_get_assaysummary(identifier)
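In PUG-REST, the active/inactive distinction for the `aids` operation is expressed with the `aids_type` query option (`all`, `active`, or `inactive`). The sketch below shows how the three assay functions could map onto that option. `aids_url` is a hypothetical helper, and whether PubChmAPI builds exactly these URLs is an assumption.

```python
from urllib.parse import urlencode

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def aids_url(cid: int, aids_type: str = "all", output: str = "txt") -> str:
    """Hypothetical helper: URL listing the assay IDs (AIDs) associated with a CID."""
    allowed = {"all", "active", "inactive"}
    if aids_type not in allowed:
        raise ValueError(f"aids_type must be one of {sorted(allowed)}")
    url = f"{PUG_REST_BASE}/compound/cid/{cid}/aids/{output}"
    # 'all' is treated as the default here, so only filters add a query string.
    if aids_type != "all":
        url += "?" + urlencode({"aids_type": aids_type})
    return url


if __name__ == "__main__":
    for kind in ("all", "active", "inactive"):
        print(aids_url(2244, kind))
```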

Structural & Isotopic Variants

  • compound_cid_get_cids_same_isotopes(identifier)
  • compound_cid_get_cids_same_connectivity(identifier)
  • compound_cid_get_cids_same_tautomer(identifier)
  • compound_cid_get_cids_same_stereo(identifier)
  • compound_cid_get_cids_parent(identifier)
  • compound_cid_get_cids_original(identifier)
  • compound_cid_get_cids_component(identifier)
  • compound_cid_get_cids_preferred(identifier)

Properties (Batch)

  • compound_cid_get_all_properties(identifier) (Retrieves all properties as CSV)
  • compound_cid_get_properties(identifier, properties=[...]) (Retrieves a specific list of properties)

Note: All single property functions listed under "Compound Name" are also available for CIDs (e.g., compound_cid_get_MolecularWeight).
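The introduction mentions automatic batching. One common way to batch, sketched here under assumptions, is to split a long CID list into chunks and send each chunk as a comma-separated identifier segment of a single property request. The function names `chunked` and `batched_property_urls` and the default batch size of 100 are illustrative choices, not PubChmAPI's actual API or settings.

```python
from typing import Iterable, Iterator, List, Sequence

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_property_urls(cids: Iterable[int], properties: Sequence[str],
                          batch_size: int = 100, output: str = "csv") -> List[str]:
    """Hypothetical helper: build one property request per batch of CIDs."""
    prop_part = ",".join(properties)
    urls = []
    for batch in chunked(list(cids), batch_size):
        cid_part = ",".join(str(cid) for cid in batch)
        urls.append(f"{PUG_REST_BASE}/compound/cid/{cid_part}/property/{prop_part}/{output}")
    return urls
```

For example, `batched_property_urls(range(1, 251), ["MolecularWeight"])` produces three URLs covering 100, 100, and 50 CIDs.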

Structural Search Functions

Retrieve CIDs based on structural similarity or substructures. Format: compound_[Method]_[InputType]_get_cids(identifier)

  • compound_fastsimilarity_2d_cid_get_cids(cid)
  • compound_fastsubstructure_smiles_get_cids(smiles)
  • compound_fastidentity_smiles_get_cids(smiles)
  • compound_similarity_3d_cid_get_cids(cid)
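The search functions above correspond to PUG-REST structure-search URLs, in which the search method sits between the domain and the input namespace. The sketch below, `structure_search_url`, is a hypothetical helper and not part of PubChmAPI. It percent-encodes the query because SMILES strings routinely contain characters such as `=` and `#`, and it can append a `Threshold` query option (a Tanimoto cutoff from 0 to 100) that PUG-REST documents for similarity searches. For SMILES with `/` or `\` stereo bonds, sending the query with a POST request may be more robust than placing it in the URL path.

```python
from urllib.parse import quote, urlencode

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def structure_search_url(method, input_type, query, threshold=None, output="txt"):
    """Hypothetical helper: <base>/compound/<method>/<input_type>/<query>/cids/<output>."""
    # Encode every reserved character, since SMILES often contain '=', '#', '/'.
    encoded = quote(str(query), safe="")
    url = f"{PUG_REST_BASE}/compound/{method}/{input_type}/{encoded}/cids/{output}"
    if threshold is not None:
        # Similarity cutoff option for similarity searches.
        url += "?" + urlencode({"Threshold": threshold})
    return url


if __name__ == "__main__":
    print(structure_search_url("fastsimilarity_2d", "cid", 2244, threshold=95))
    print(structure_search_url("fastsubstructure", "smiles", "C1=CC=CC=C1"))
```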

Biological Domain Functions

Retrieve data related to proteins, genes, taxonomy, and cell lines.

Protein Functions

Format: protein_[Identifier]_get_[Operation]

  • protein_accession_get_summary(accession)
  • protein_accession_get_aids(accession)
  • protein_gi_get_summary(gi)
  • protein_synonym_get_aids(synonym)

Gene Functions

Format: gene_[Identifier]_get_[Operation]

  • gene_geneid_get_summary(geneid)
  • gene_geneid_get_aids(geneid)
  • gene_genesymbol_get_summary(symbol)
  • gene_genesymbol_get_aids(symbol)

Taxonomy Functions

Format: taxonomy_[Identifier]_get_[Operation]

  • taxonomy_taxid_get_summary(taxid)
  • taxonomy_taxid_get_aids(taxid)
  • taxonomy_synonym_get_aids(synonym)

Cell Line Functions

Format: cell_[Identifier]_get_[Operation]

  • cell_cellacc_get_summary(cellacc)
  • cell_cellacc_get_aids(cellacc)
  • cell_synonym_get_summary(synonym)

Biological Functions Example

This example demonstrates how to retrieve biological data using PubChmAPI.

Code:

# test_pubchmapi_biological.py
from PubChmAPI import (
    # Protein
    protein_accession_get_summary,
    protein_accession_get_aids,
    protein_gi_get_summary,
    protein_synonym_get_aids,
    # Gene
    gene_geneid_get_summary,
    gene_geneid_get_aids,
    gene_genesymbol_get_summary,
    gene_genesymbol_get_aids,
    # Taxonomy
    taxonomy_taxid_get_summary,
    taxonomy_taxid_get_aids,
    taxonomy_synonym_get_aids,
    # Cell line
    cell_cellacc_get_summary,
    cell_cellacc_get_aids,
    cell_synonym_get_summary
)

def test_biological_functions():
    print("Testing PubChmAPI Biological Functions\n")

    # ------------------- Protein -------------------
    accession = "P68871"  # Human Hemoglobin subunit beta
    gi = "4506723"  # Example GI number
    protein_syn = "Hemoglobin"

    print("Protein Functions:")
    print("Summary:", protein_accession_get_summary(accession))
    print("AIDs:", protein_accession_get_aids(accession)[:5], "...")
    print("GI Summary:", protein_gi_get_summary(gi))
    print("Synonym AIDs:", protein_synonym_get_aids(protein_syn)[:5], "...\n")

    # ------------------- Gene -------------------
    geneid = "3043"  # HBB gene
    symbol = "HBB"

    print("Gene Functions:")
    print("GeneID Summary:", gene_geneid_get_summary(geneid))
    print("GeneID AIDs:", gene_geneid_get_aids(geneid)[:5], "...")
    print("Symbol Summary:", gene_genesymbol_get_summary(symbol))
    print("Symbol AIDs:", gene_genesymbol_get_aids(symbol)[:5], "...\n")

    # ------------------- Taxonomy -------------------
    taxid = "9606"  # Homo sapiens
    tax_syn = "Human"

    print("Taxonomy Functions:")
    print("TaxID Summary:", taxonomy_taxid_get_summary(taxid))
    print("TaxID AIDs:", taxonomy_taxid_get_aids(taxid)[:5], "...")
    print("Synonym AIDs:", taxonomy_synonym_get_aids(tax_syn)[:5], "...\n")

    # ------------------- Cell Line -------------------
    cellacc = "CVCL_0030"  # HeLa
    cell_syn = "HeLa"

    print("Cell Line Functions:")
    print("Cell Summary:", cell_cellacc_get_summary(cellacc))
    print("Cell AIDs:", cell_cellacc_get_aids(cellacc)[:5], "...")
    print("Synonym Summary:", cell_synonym_get_summary(cell_syn))

if __name__ == "__main__":
    test_biological_functions()

Output (each printed value is the list of PUG-REST request URLs generated by the call):

Testing PubChmAPI Biological Functions

Protein Functions:
Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/protein/accession/P68871/summary/json']
AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/protein/accession/P68871/aids/txt'] ...
GI Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/protein/gi/4506723/summary/json']
Synonym AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/protein/synonym/Hemoglobin/aids/txt'] ...

Gene Functions:
GeneID Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/gene/geneid/3043/summary/json']
GeneID AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/gene/geneid/3043/aids/txt'] ...
Symbol Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/gene/genesymbol/HBB/summary/json']
Symbol AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/gene/genesymbol/HBB/aids/txt'] ...

Taxonomy Functions:
TaxID Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/taxonomy/taxid/9606/summary/json']
TaxID AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/taxonomy/taxid/9606/aids/txt'] ...
Synonym AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/taxonomy/synonym/Human/aids/txt'] ...

Cell Line Functions:
Cell Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/cell/cellacc/CVCL_0030/summary/json']
Cell AIDs: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/cell/cellacc/CVCL_0030/aids/txt'] ...
Synonym Summary: ['https://pubchem.ncbi.nlm.nih.gov/rest/pug/cell/synonym/HeLa/summary/json']
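The printed results above are request URLs rather than retrieved data. If you need to fetch such URLs yourself, the sketch below spaces requests out so as to stay within PubChem's published usage policy of at most 5 requests per second. `Throttle` and `fetch_all` are hypothetical names, not PubChmAPI APIs, and the library's own throttling may work differently. `fetch_all` decodes responses as UTF-8 text, so it suits JSON, TXT, and CSV outputs but not PNG images.

```python
import time
import urllib.request


class Throttle:
    """Space out calls so that at most `max_per_second` start per second."""

    def __init__(self, max_per_second: float = 5.0):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            # Sleep only for whatever is left of the minimum interval.
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def fetch_all(urls, throttle=None, timeout=30.0):
    """Download each URL in turn as UTF-8 text, waiting on the throttle first."""
    if throttle is None:
        throttle = Throttle()
    results = []
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            results.append(response.read().decode("utf-8"))
    return results
```

A fixed minimum interval is the simplest policy to reason about; a token bucket would allow short bursts but adds state that this sketch does not need.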

Ahmed Alhilal

0.0.42 (12/12/2025)

  • First Release



Download files


Source Distribution

pubchmapi-0.0.46.tar.gz (15.0 kB)

Uploaded Source

Built Distribution


pubchmapi-0.0.46-py3-none-any.whl (12.0 kB)

Uploaded Python 3

File details

Details for the file pubchmapi-0.0.46.tar.gz.

File metadata

  • Download URL: pubchmapi-0.0.46.tar.gz
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for pubchmapi-0.0.46.tar.gz
Algorithm     Hash digest
SHA256        ef6689252e626b3039dc435a96457a04a149d571bf272a6d520372b32d531a96
MD5           7d5b9bb764d1ffa97e0f877918e9cf81
BLAKE2b-256   b5518e49423e11d6f2a78e3a7a8c63dc2e97557560aa0f48428076a26ca4385e


File details

Details for the file pubchmapi-0.0.46-py3-none-any.whl.

File metadata

  • Download URL: pubchmapi-0.0.46-py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for pubchmapi-0.0.46-py3-none-any.whl
Algorithm     Hash digest
SHA256        9f8d4ff93f19eafdf8db0df77c50cbfef4ad1c5d6392d5a6920f32fc59684cc5
MD5           ef95c202f2de92c9ac6975d84fe1c3f5
BLAKE2b-256   76fc13389b22afd03259c5f555f16685c538fc28c515d6aa5108a9786426e4a1

