drugs
Lightweight Python utilities to work with small-molecule identifiers and metadata across PubChem and ChEMBL. The library exposes a single Drug class that lazily resolves identifiers (PubChem CID, ChEMBL ID, InChIKey), fetches PubChem properties/text, pulls ChEMBL mechanisms, and provides hooks for plugging in your own text or protein embedding functions with optional on-disk caching.
Highlights
- Lazy identifier translation between PubChem CID, ChEMBL ID, and InChIKey (via UniChem and PUG-REST)
- PubChem properties and PUG-View text retrieval with curated heading presets
- Structure representations: canonical SMILES + SELFIES
- Fingerprints (Morgan/MACCS/Daylight) with Tanimoto/Dice similarity + batch similarity matrices
- ChEMBL mechanisms, target details, and bioactivity rows (pChEMBL/IC50/EC50 filters)
- Drug-drug interactions via RxNav
- RDKit molecular property panel (QED, TPSA, Lipinski violations, synthetic accessibility)
- Embedding hooks for text and protein/sequence features, with simple caching helpers
- Markdown report generation for a drug snapshot
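To make the similarity measures mentioned in the list above concrete, here is a minimal pure-Python sketch of the Tanimoto and Dice coefficients over fingerprints represented as sets of on-bit indices, plus a pairwise matrix. It is an illustration only: the function names and the set-based representation are assumptions, not a description of how the library computes similarities internally.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity_matrix(fps: List[Set[int]], metric=tanimoto) -> List[List[float]]:
    """Symmetric pairwise similarity matrix for a list of fingerprints."""
    return [[metric(x, y) for y in fps] for x in fps]


# Toy fingerprints: each set holds the indices of the bits that are set.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
fp_c = {2, 3}

matrix = similarity_matrix([fp_a, fp_b, fp_c])
```

Both coefficients range from 0 (no shared bits) to 1 (identical bit sets); Dice weights the shared bits more heavily, so it is never lower than Tanimoto for the same pair.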
Installation
Python 3.9+ is required.
pip install -e .
For development (linting/tests/docs):
pip install -e ".[dev]"
Quick start
from drugs import Drug, PUBCHEM_MINIMAL_STABLE
# Start from any identifier
aspirin = Drug.from_pubchem_cid(2244)
# or: Drug.from_chembl_id("CHEMBL25") / Drug.from_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
print(aspirin.map_ids())
props = aspirin.fetch_pubchem_properties()
text = aspirin.fetch_pubchem_text(PUBCHEM_MINIMAL_STABLE)
mechs = aspirin.fetch_chembl_mechanisms()
targets = aspirin.target_accessions()
# Structural views
print(aspirin.smiles())
print(aspirin.selfies())
# Fingerprints + similarity
fp = aspirin.molecular_fingerprint(method="morgan")
ibuprofen = Drug.from_chembl_id("CHEMBL521")
sim = aspirin.similarity_to(ibuprofen)
# Bioactivities and DDIs
acts = aspirin.fetch_chembl_bioactivities(min_pchembl=6.0, assay_types=["B", "F"])
ddis = aspirin.fetch_drug_interactions()
# Batch helpers
batch = Drug.from_batch([2244, "CHEMBL521", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"])
sim_matrix = Drug.batch_similarity_matrix(batch)
# RDKit property panel
print(aspirin.molecular_properties())
# Plug in your own embedding functions
vec = aspirin.text_embedding(lambda s: s.upper()) # replace with your model
# Write a markdown report
aspirin.write_drug_markdown(output_path="aspirin.md")
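The embedding hook in the quick start takes a callable. As a hedged illustration of what such a callable might look like, the sketch below builds a deterministic hashed bag-of-words vector from a string using only the standard library. The name `hashed_bow_embedding` and its `dim` parameter are hypothetical, and whether `text_embedding` expects a list of floats or some other return type is an assumption; a real model would replace this function.

```python
import hashlib
import math
from typing import List


def hashed_bow_embedding(text: str, dim: int = 16) -> List[float]:
    """Toy embedding: hash each token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # sha256 gives a hash that is stable across processes, unlike built-in hash().
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


example = hashed_bow_embedding("Aspirin is a nonsteroidal anti-inflammatory drug")

# Assumed usage with the library, mirroring the quick-start call:
# vec = aspirin.text_embedding(hashed_bow_embedding)
```

Because the function is deterministic, repeated calls on the same text return the same vector, which is the property an on-disk embedding cache would depend on.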
Caching
API responses (PubChem/ChEMBL/RxNav) are cached to artifacts/cache/api_cache.json by default with a 24h TTL.
Configure via environment variables:
- DRUGS_CACHE_PATH: override cache path
- DRUGS_CACHE_TTL_SECONDS: TTL in seconds
- DRUGS_CACHE_DISABLED=1: disable disk caching
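As a rough sketch of how these variables could be read and how a TTL check on a JSON cache might work, consider the following. Only the three environment variable names, the default path, and the 24h default come from this README; the helper names (`load_cache_settings`, `cache_get`, `cache_put`) and the on-disk layout are hypothetical and may differ from the library's actual implementation.

```python
import json
import os
import time
from pathlib import Path
from typing import Any, Optional


def load_cache_settings() -> dict:
    """Read cache settings from the environment, falling back to the documented defaults."""
    return {
        "path": Path(os.environ.get("DRUGS_CACHE_PATH", "artifacts/cache/api_cache.json")),
        "ttl": float(os.environ.get("DRUGS_CACHE_TTL_SECONDS", 24 * 3600)),
        "disabled": os.environ.get("DRUGS_CACHE_DISABLED") == "1",
    }


def cache_get(settings: dict, key: str, now: Optional[float] = None) -> Any:
    """Return a cached value, or None if caching is off, the key is missing, or it expired."""
    if settings["disabled"] or not settings["path"].exists():
        return None
    entry = json.loads(settings["path"].read_text()).get(key)
    now = time.time() if now is None else now
    if entry is None or now - entry["stored_at"] > settings["ttl"]:
        return None
    return entry["value"]


def cache_put(settings: dict, key: str, value: Any, now: Optional[float] = None) -> None:
    """Store a JSON-serializable value with a timestamp (hypothetical layout)."""
    if settings["disabled"]:
        return
    path = settings["path"]
    path.parent.mkdir(parents=True, exist_ok=True)
    data = json.loads(path.read_text()) if path.exists() else {}
    data[key] = {"stored_at": time.time() if now is None else now, "value": value}
    path.write_text(json.dumps(data))
```

The optional `now` argument exists only to make expiry easy to test; in normal use it would be left unset so the current time is used.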
API surface
- Drug.pubchem_cid, Drug.chembl_id, Drug.inchikey: resolved identifiers
- Drug.fetch_pubchem_properties(): dict of core PubChem properties
- Drug.fetch_pubchem_text(headings): filtered PUG-View text sections
- Structure: Drug.smiles(), Drug.selfies(), Drug.molecular_fingerprint(), Drug.similarity_to()
- Bioactivity/targets: Drug.fetch_chembl_mechanisms(), Drug.fetch_chembl_bioactivities(), Drug.fetch_target_details(), Drug.target_accessions(), Drug.target_gene_symbols()
- Safety: Drug.fetch_drug_interactions()
- RDKit properties: Drug.molecular_properties()
- Batch helpers: Drug.from_batch(), Drug.batch_similarity_matrix()
- Embedding helpers: text_embedding, text_embedding_cached, protein_embedding, protein_embedding_cached
- Reporting: write_drug_markdown
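For context on the `min_pchembl` filter used with `fetch_chembl_bioactivities()` in the quick start: ChEMBL defines pChEMBL as the negative base-10 logarithm of a molar potency value (IC50, EC50, Ki, and similar), so a threshold of 6.0 corresponds to 1 µM or stronger. The conversion helpers below are a hypothetical illustration and are not part of the library's API.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert a potency in nanomolar (e.g. an IC50) to the pChEMBL scale."""
    if value_nm <= 0:
        raise ValueError("potency must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm).
    return 9.0 - math.log10(value_nm)


def passes_threshold(value_nm: float, min_pchembl: float = 6.0) -> bool:
    """True if the potency is at least as strong as the pChEMBL cutoff."""
    return pchembl_from_nm(value_nm) >= min_pchembl


one_micromolar = pchembl_from_nm(1000.0)  # potency of 1 µM
ten_nanomolar = pchembl_from_nm(10.0)     # potency of 10 nM
```

Higher pChEMBL means a more potent measurement, which is why the filter is a minimum rather than a maximum.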
Heading presets
Curated heading sets live in drugs.constants (e.g., PUBCHEM_MINIMAL_STABLE, PUBCHEM_ADME_PK, PUBCHEM_MEANING). Use drugs.core.list_pubchem_text_headings(cid) to list the headings available for a given CID.
Tests and quality
make test # runs pytest
make lint # ruff + mypy
make format # black + autofix lint
Documentation
Build and view the Sphinx docs locally:
pip install -e ".[docs]"
cd docs
make html # or: python -m sphinx -b html . _build/html
Then open _build/html/index.html in your browser.
Publishing to GitHub Pages
A GitHub Actions workflow (.github/workflows/docs.yml) builds the Sphinx HTML
docs on every push to main and publishes them to GitHub Pages.
One-time repo setup:
- In GitHub, go to Settings → Pages and set Source to GitHub Actions.
Manual trigger: use Actions → docs → Run workflow to publish immediately.
Publishing
This project uses Hatchling. To build and publish (requires valid PyPI credentials):
pip install hatch
hatch build
hatch publish
Notes
- Network access is required for live API calls to PubChem, ChEMBL, UniChem, and RxNav.
- Protein embedding cache utilities expect torch if you use protein_embedding_cached; otherwise no heavy dependencies are required.