Drug target intelligence aggregator — fetch, collate, and visualize public data for any protein target in one command.

TargetRecon

Aggregate UniProt · PDB · AlphaFold · ChEMBL · BindingDB into a single interactive report — in seconds.


What is TargetRecon?

TargetRecon is a Python CLI and web app that pulls data from five public databases (plus STRING for protein–protein interactions) and compiles it into a single, richly formatted report for any protein drug target. Data retrieval needs no API keys, no account, and no manual copy-pasting.

Think of it as gget for drug discovery — or TargetDB reimagined for the AlphaFold era.


Five Data Sources, One Report

Source         Data
UniProt        Function, subcellular location, GO terms, diseases, keywords
RCSB PDB       Experimental structures, filtered by resolution
AlphaFold DB   Predicted structure with pLDDT confidence coloring
ChEMBL         Bioactivity data (IC50, Ki, Kd, EC50), sorted by pChEMBL descending
BindingDB      Binding affinity measurements converted to pChEMBL, sorted by potency

Intelligent ID Resolution

Accepts gene names, UniProt accessions, or ChEMBL target IDs:

targetrecon EGFR          # Gene name
targetrecon P00533        # UniProt accession
targetrecon CHEMBL203     # ChEMBL target ID
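
The actual resolution happens in TargetRecon's resolver.py, whose code is not shown here. A minimal sketch of how the three query types could be told apart by pattern alone, assuming ChEMBL target IDs are CHEMBL followed by digits and UniProt accessions follow UniProt's published accession format. classify_query is a hypothetical name, and turning a gene symbol into an accession would still require a UniProt search.

```python
import re

# UniProt accession format as published by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
CHEMBL_ID = re.compile(r"CHEMBL[0-9]+")


def classify_query(query: str) -> str:
    """Return 'chembl', 'uniprot', or 'gene' for a target query (hypothetical helper)."""
    q = query.strip().upper()
    if CHEMBL_ID.fullmatch(q):
        return "chembl"
    if UNIPROT_ACCESSION.fullmatch(q):
        return "uniprot"
    # Anything else is treated as a gene symbol that still needs a UniProt lookup.
    return "gene"


for q in ["EGFR", "P00533", "CHEMBL203"]:
    print(q, classify_query(q))
```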

Bioactivity Data

What is pChEMBL?

pChEMBL is a unified potency scale — the negative log₁₀ of the molar affinity. Higher = more potent.

pChEMBL = -log₁₀(affinity_M)

pChEMBL   Affinity   Interpretation
9         1 nM       Very potent
7         100 nM     Potent
6         1 µM       Moderate
< 5       > 10 µM    Weak

ChEMBL natively reports pChEMBL. BindingDB reports raw affinities in nM, which TargetRecon converts using the same formula:

pChEMBL = -log₁₀(affinity_nM × 10⁻⁹)

This makes ChEMBL and BindingDB values directly comparable on the same scale.
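
Because 1 nM = 10⁻⁹ M, the conversion simplifies to pChEMBL = 9 − log₁₀(affinity_nM). A minimal Python sketch of that conversion; pchembl_from_nm is a hypothetical helper, not part of TargetRecon's API.

```python
import math


def pchembl_from_nm(affinity_nm: float) -> float:
    """Convert an affinity in nM to the pChEMBL scale (hypothetical helper).

    -log10(affinity_nM * 1e-9) equals 9 - log10(affinity_nM); the second form
    avoids an extra floating-point multiplication.
    """
    if affinity_nm <= 0:
        raise ValueError("affinity must be a positive concentration in nM")
    return 9.0 - math.log10(affinity_nm)


for nm in (1.0, 100.0, 1000.0, 10000.0):
    print(f"{nm:>8g} nM -> pChEMBL {pchembl_from_nm(nm):.2f}")
```

These four inputs reproduce the 9, 7, 6 and 5 boundaries from the interpretation table.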

Full pipeline — what happens when you run a query

1. Resolve query → UniProt ID + ChEMBL target ID
         │
         ▼
2. Fetch in parallel (async):
   ├── ChEMBL API  → top-N bioactivity records, sorted by pChEMBL desc (server-side)
   ├── BindingDB API → all records for this UniProt ID (one bulk request)
   ├── RCSB PDB    → experimental structures
   ├── AlphaFold   → predicted structure
   └── STRING DB   → protein interactions
         │
         ▼
3. BindingDB: sort by pChEMBL descending (client-side), apply cap
         │
         ▼
4. Merge ChEMBL + BindingDB records into one list
         │
         ▼
5. Apply min_pchembl filter (if set) across the merged list
         │
         ▼
6. Deduplicate by canonical SMILES → Ligand Summary
   (same molecule from both sources → one entry, best pChEMBL kept)
         │
         ▼
7. Sort Ligand Summary by best pChEMBL descending
         │
         ▼
8. Output: TargetReport (bioactivities + ligand_summary + structures + ...)
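
Step 2 fetches all sources concurrently. A minimal sketch of that pattern with asyncio.gather, assuming each source is wrapped in its own coroutine. fetch_source and the latencies are hypothetical stand-ins that sleep instead of making HTTP requests, so the example runs offline.

```python
import asyncio
import time

# Hypothetical per-source latencies in seconds; real clients would issue HTTP calls.
SOURCES = {"ChEMBL": 0.15, "BindingDB": 0.1, "RCSB PDB": 0.05, "AlphaFold": 0.05, "STRING": 0.1}


async def fetch_source(name: str, latency: float) -> tuple[str, str]:
    await asyncio.sleep(latency)  # stand-in for network I/O
    return name, f"data from {name}"


async def fetch_all() -> dict[str, str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    results = await asyncio.gather(
        *(fetch_source(name, latency) for name, latency in SOURCES.items())
    )
    return dict(results)


start = time.perf_counter()
data = asyncio.run(fetch_all())
elapsed = time.perf_counter() - start
print(sorted(data))
print(f"{elapsed:.2f}s")  # should be near the slowest source, well below the sum
```

With gather's default settings, an exception in one source propagates to the caller; passing return_exceptions=True would let the remaining sources finish and return the exception as a value instead.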

Fetching strategy per source

ChEMBL — server-side sort + pagination:

  • Sends order_by=-pchembl_value to the API
  • Only the top-N most potent records are fetched — no wasted API calls
  • Records with no pChEMBL value are excluded at the API level
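
A sketch of how server-side sorting and pagination can be expressed against the ChEMBL web services activity endpoint. The filter parameters (pchembl_value__isnull, order_by, limit, offset) are assumed to follow ChEMBL's REST filter syntax; the page size of 1000 and the helper name chembl_page_urls are assumptions, not taken from TargetRecon's chembl.py.

```python
from urllib.parse import urlencode

CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def chembl_page_urls(target_id: str, cap: int, page_size: int = 1000) -> list[str]:
    """Build potency-sorted, paginated activity URLs up to `cap` records (hypothetical)."""
    urls = []
    for offset in range(0, cap, page_size):
        params = {
            "target_chembl_id": target_id,
            "pchembl_value__isnull": "false",  # exclude records without a pChEMBL value
            "order_by": "-pchembl_value",      # most potent first, sorted by the server
            "limit": min(page_size, cap - offset),
            "offset": offset,
        }
        urls.append(f"{CHEMBL_ACTIVITY_URL}?{urlencode(params)}")
    return urls


for url in chembl_page_urls("CHEMBL203", cap=2500):
    print(url)
```

With no cap, a client would instead keep following the next link in each response's page_meta until it is null.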

BindingDB — one bulk request + client-side sort:

  • Fetches all records for the UniProt ID in a single HTTP call (no pagination)
  • Sorts by pChEMBL descending locally, then applies the cap
  • Records with no valid affinity value are discarded
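
The BindingDB side can be sketched as a pure function over records that have already been downloaded: discard rows without a usable affinity, convert nM to pChEMBL, sort descending, then cap. The record layout (a dict with an affinity_nm key) and the name top_bindingdb are assumptions for illustration, not BindingDB's response format or TargetRecon's bindingdb.py.

```python
import math
from typing import Optional


def top_bindingdb(records: list[dict], cap: Optional[int] = 1000) -> list[dict]:
    """Keep valid records, add pChEMBL, sort by potency, then apply the cap (hypothetical)."""
    scored = []
    for rec in records:
        try:
            nm = float(rec.get("affinity_nm"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric affinity, e.g. None or ">10000"
        if not math.isfinite(nm) or nm <= 0:
            continue
        scored.append({**rec, "pchembl": 9.0 - math.log10(nm)})
    # Sort before capping so the cap keeps the most potent records.
    scored.sort(key=lambda r: r["pchembl"], reverse=True)
    return scored if cap is None else scored[:cap]


rows = [
    {"smiles": "CCO", "affinity_nm": "250"},
    {"smiles": "c1ccccc1", "affinity_nm": 3.0},
    {"smiles": "CCN", "affinity_nm": ">10000"},
    {"smiles": "CCC", "affinity_nm": None},
]
for r in top_bindingdb(rows, cap=2):
    print(r["smiles"], round(r["pchembl"], 2))
```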

Cap behavior

The default cap is 1000 records per database:

Setting                    ChEMBL                 BindingDB              Total
Default (1000)             top 1000 most potent   top 1000 most potent   up to 2000
--max-bioactivities 500    top 500 most potent    top 500 most potent    up to 1000
--max-bioactivities all    all records            all records            all available

Because each source is sorted before its cap is applied, you always get the most potent records from that source, never an arbitrary subset.

Interface    No-limit syntax
CLI          --max-bioactivities all
Web UI       Drag the "Max bioactivities per DB" slider to All
Python API   max_bioactivities=None
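
All three interfaces express the same setting: a positive integer cap, or no limit. A minimal sketch of normalizing the CLI string into the Python API's convention (an int, or None for no limit). Both helper names are hypothetical, and the per-database total assumes both sources are enabled unless told otherwise.

```python
from typing import Optional


def parse_max_bioactivities(value: str) -> Optional[int]:
    """Map a --max-bioactivities argument to max_bioactivities (hypothetical helper)."""
    if value.strip().lower() == "all":
        return None  # no limit, matching max_bioactivities=None
    cap = int(value)
    if cap <= 0:
        raise ValueError("--max-bioactivities must be a positive integer or 'all'")
    return cap


def max_total_records(cap: Optional[int], sources: int = 2) -> Optional[int]:
    """Upper bound on merged records: the cap applies to each database separately."""
    return None if cap is None else cap * sources


print(parse_max_bioactivities("500"), max_total_records(parse_max_bioactivities("500")))
print(parse_max_bioactivities("all"), max_total_records(parse_max_bioactivities("all")))
```

With --no-bindingdb or --no-chembl only one source contributes, so sources=1 gives the bound.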

Ligand deduplication

After merging ChEMBL and BindingDB, all records are grouped by canonical SMILES (via RDKit). If the same molecule appears in both databases:

  • It becomes one entry in the Ligand Summary
  • The best pChEMBL across all assays is kept
  • sources lists both databases (e.g. ["ChEMBL", "BindingDB"])
  • num_assays counts the total assay measurements across both sources

The final Ligand Summary is sorted by best pChEMBL descending — the most potent unique compound is always first.
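
Steps 4 to 7 of the pipeline (merge, filter, deduplicate, sort) can be sketched in plain Python. To stay dependency-free, the sketch assumes SMILES strings are already canonical; in practice each string would first be canonicalized, for example with RDKit's Chem.MolToSmiles(Chem.MolFromSmiles(smiles)). The Record and Ligand classes and summarize are hypothetical, not TargetRecon's Pydantic models.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Record:
    source: str   # "ChEMBL" or "BindingDB"
    smiles: str   # assumed to be canonical SMILES already
    pchembl: float


@dataclass
class Ligand:
    smiles: str
    best_pchembl: float
    sources: list[str] = field(default_factory=list)
    num_assays: int = 0


def summarize(chembl: Iterable[Record], bindingdb: Iterable[Record],
              min_pchembl: Optional[float] = None) -> list[Ligand]:
    merged = [*chembl, *bindingdb]                        # step 4: merge
    if min_pchembl is not None:                           # step 5: filter
        merged = [r for r in merged if r.pchembl >= min_pchembl]
    by_smiles: dict[str, Ligand] = {}
    for r in merged:                                      # step 6: group by SMILES
        lig = by_smiles.setdefault(r.smiles, Ligand(r.smiles, r.pchembl))
        lig.best_pchembl = max(lig.best_pchembl, r.pchembl)  # keep the best value
        if r.source not in lig.sources:
            lig.sources.append(r.source)
        lig.num_assays += 1                               # count every measurement
    # step 7: most potent unique compound first
    return sorted(by_smiles.values(), key=lambda lig: lig.best_pchembl, reverse=True)


chembl = [Record("ChEMBL", "CCO", 7.0), Record("ChEMBL", "c1ccccc1", 8.5)]
bindingdb = [Record("BindingDB", "CCO", 7.8)]
for lig in summarize(chembl, bindingdb):
    print(lig.smiles, lig.best_pchembl, lig.sources, lig.num_assays)
```

Because the filter runs before grouping, a measurement below min_pchembl neither counts toward num_assays nor adds its source to the entry.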


CLI

pip install targetrecon

targetrecon / targetrecon run — Single target

targetrecon EGFR
targetrecon P00533 -f html -f json -f sdf -o ./reports/
targetrecon BRAF --min-pchembl 7.0 --max-resolution 2.5
targetrecon CDK2 --max-bioactivities 5000         # up to 5000 per source
targetrecon CDK2 --max-bioactivities all          # no limit
targetrecon CDK2 --no-bindingdb                   # ChEMBL only
targetrecon CDK2 --no-chembl                      # BindingDB only

Option                              Default          Description
-f, --format [json|html|sdf]        html json sdf    Output formats (repeat for multiple)
-o, --output PATH                   .                Output directory
--max-resolution FLOAT              4.0              Max PDB resolution in Å
--max-bioactivities INT|all         1000             Max records per DB (applied to ChEMBL and BindingDB separately); all = no limit
--min-pchembl FLOAT                 (none)           Minimum pChEMBL value filter
--top-ligands INT                   20               Number of top ligands for SDF export
--use-chembl / --no-chembl          on               Include ChEMBL bioactivity data
--use-bindingdb / --no-bindingdb    on               Include BindingDB bioactivity data
-q, --quiet                         off              Suppress progress messages

targetrecon batch — Multiple targets

# Pass targets directly
targetrecon batch EGFR BRAF CDK2 ABL1

# From a file (one target per line, # = comment)
targetrecon batch -i targets.txt

# With filters and format selection
targetrecon batch -i targets.txt -f html -f sdf --min-pchembl 6.0 --skip-errors

# ChEMBL only for all targets
targetrecon batch EGFR BRAF --no-bindingdb

# Unlimited bioactivities
targetrecon batch -i targets.txt --max-bioactivities all

Option                              Default          Description
-i, --input PATH                    (none)           Text file, one target per line
-o, --output PATH                   ./batch_reports  Output directory
-f, --format [json|html|sdf]        html json sdf    Output formats (repeat for multiple)
--max-resolution FLOAT              4.0              Max PDB resolution in Å
--max-bioactivities INT|all         1000             Max records per DB (applied to ChEMBL and BindingDB separately); all = no limit
--min-pchembl FLOAT                 (none)           Minimum pChEMBL value filter
--top-ligands INT                   20               Ligands per SDF file
--use-chembl / --no-chembl          on               Include ChEMBL bioactivity data
--use-bindingdb / --no-bindingdb    on               Include BindingDB bioactivity data
--skip-errors                       off              Continue if a single target fails
-q, --quiet                         off              Suppress progress messages

After completion, a summary table is printed showing the number of structures, bioactivities, and ligands per target.
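
The batch input file is described only as one target per line, with # marking comments. A sketch of reading such a file, assuming blank lines, full-line comments, and trailing comments after a target are all ignored; whether TargetRecon itself accepts trailing comments is an assumption, and read_targets is a hypothetical name.

```python
def read_targets(text: str) -> list[str]:
    """Parse a targets file: one target per line, '#' starts a comment (hypothetical)."""
    targets = []
    for line in text.splitlines():
        target = line.split("#", 1)[0].strip()  # drop comment text and whitespace
        if target:
            targets.append(target)
    return targets


example = """# kinases of interest
EGFR
BRAF   # V600E context

P00533
CHEMBL203
"""
print(read_targets(example))
```

For a file on disk, pass its contents, e.g. read_targets(open("targets.txt").read()).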

targetrecon serve — Launch web interface

targetrecon serve                  # http://localhost:5000
targetrecon serve --port 8080
targetrecon serve --host 0.0.0.0   # expose on all interfaces

Option        Default    Description
--port INT    5000       Port to listen on
--host TEXT   0.0.0.0    Host to bind
--debug       off        Enable Flask debug mode

Web UI

targetrecon serve
# Open http://localhost:5000
  • Dark-themed interface with animated molecular backdrop
  • Search by gene name, UniProt accession, or ChEMBL target ID
  • Molecule sketcher (Ketcher) — draw a structure to find matching targets
  • Sidebar controls: max PDB resolution, min pChEMBL, ChEMBL/BindingDB toggles, max bioactivities slider (100–5000, or drag to All for no limit)

Report tabs

Tab             Contents
Overview        UniProt summary, GO terms, diseases, protein stats
3D Viewer       AlphaFold pLDDT coloring + PDB experimental structures (3Dmol.js)
Bioactivity     pChEMBL distribution histogram, method breakdown chart
Ligands         Sortable table ranked by potency: SMILES, ChEMBL links, activity type, source
PDB             All experimental structures with resolution, method, ligand count
Interactions    STRING protein–protein interaction network (Cytoscape.js)

Export from the UI

Every report page has one-click download buttons:

  • JSON — full machine-readable report
  • HTML — self-contained interactive report (works fully offline)
  • SDF — top ligands with 3D conformers, ready for docking

AI Agent

An AI chat panel is available on every report page. Click the AI button (bottom-right corner) to open it.

Providers & models

Provider    Models
Anthropic   claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
OpenAI      gpt-4o, gpt-4o-mini
Groq        llama-3.3-70b, mixtral

  • Bring your own API key: keys are never stored and are discarded when the browser session ends
  • Context-aware: the agent already knows the target you're looking at
  • Tools: search targets, fetch bioactivities, query PDB structures, retrieve protein interactions, compare targets
  • Streaming responses with stop button
  • Resizable and minimizable panel

Example questions

What are the best scaffolds for covalent inhibition?
Which PDB structures are most suitable for docking?
Compare the selectivity profile of this target vs CDK4.
Summarize the druggability of this target.

Python API

import targetrecon

# Single target — works in scripts and Jupyter
report = targetrecon.recon("EGFR")
print(report.uniprot.protein_name)      # "Epidermal growth factor receptor"
print(report.num_pdb_structures)         # 50
print(report.num_bioactivities)          # up to 2000 (1000 ChEMBL + 1000 BindingDB)
print(report.best_ligand.best_pchembl)   # e.g. 10.52

# With options
report = targetrecon.recon(
    "BRAF",
    max_bioactivities=5000,   # up to 5000 per source
    min_pchembl=7.0,
    max_pdb_resolution=2.5,
)

# No limit — fetch all available records
report = targetrecon.recon("BRAF", max_bioactivities=None)

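# Async: use asyncio.run in scripts. Inside Jupyter (or any already-running event loop)
# asyncio.run raises RuntimeError, so write: report = await targetrecon.recon_async("BRAF")
import asyncio
report = asyncio.run(targetrecon.recon_async("BRAF"))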

# Async with all options
report = asyncio.run(targetrecon.recon_async(
    "CDK2",
    use_chembl=True,
    use_bindingdb=False,      # ChEMBL only
    max_bioactivities=2000,
    # max_bioactivities=None  # no limit
))
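
The comment on targetrecon.recon says it works in both scripts and Jupyter, while asyncio.run refuses to start when an event loop is already running. One common way a synchronous wrapper can handle both cases is sketched below; this illustrates the general technique and is an assumption, not a description of how targetrecon.recon is implemented. run_sync and slow_double are hypothetical.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script), so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running (e.g. Jupyter): run the coroutine in a fresh
    # event loop on a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def slow_double(x: int) -> int:
    await asyncio.sleep(0.01)
    return 2 * x


print(run_sync(slow_double(21)))  # 42


async def inside_a_running_loop() -> int:
    # Simulates calling the sync wrapper from code that already has a loop.
    return run_sync(slow_double(5))


print(asyncio.run(inside_a_running_loop()))  # 10
```

The trade-off is that the outer loop is blocked while the worker thread runs, which is acceptable for a notebook cell but not inside a server; there, awaiting recon_async directly is the better choice.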

Accessing the data

# UniProt info
report.uniprot.protein_name           # e.g. "Epidermal growth factor receptor"
report.uniprot.gene_name              # e.g. "EGFR"
report.uniprot.organism               # e.g. "Homo sapiens"
report.uniprot.function_description   # functional annotation text
report.uniprot.subcellular_locations  # list[str]
report.uniprot.disease_associations   # list[str]
report.uniprot.keywords               # list[str]
report.uniprot.go_terms               # list[GoTerm] — each has .go_id, .term, .category
report.uniprot.sequence_length        # int

# PDB structures
for pdb in report.pdb_structures[:5]:
    print(pdb.pdb_id, pdb.resolution, pdb.method)
    for lig in pdb.ligands:           # list[PDBLigand] — each has .ligand_id, .smiles, .name
        print(lig.ligand_id, lig.name)

# AlphaFold
report.alphafold.pdb_url        # URL to AlphaFold PDB structure
report.alphafold.model_url      # URL to AlphaFold CIF model

# Bioactivity records (sorted by pChEMBL descending)
for b in report.bioactivities[:10]:
    print(b.source, b.activity_type, b.value, b.pchembl_value, b.smiles)

# Ligand summary (deduplicated by canonical SMILES, sorted by best pChEMBL)
for lig in report.ligand_summary[:10]:
    print(lig.name, lig.chembl_id, lig.best_pchembl, lig.best_activity_type, lig.num_assays)
    print(lig.sources)                # e.g. ["ChEMBL", "BindingDB"]

report.best_ligand               # most potent unique ligand overall
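
Because each ligand_summary entry carries sources and best_pchembl, common filters are one-liners. The sketch below selects potent ligands reported by both ChEMBL and BindingDB. It relies only on the attributes listed above and uses types.SimpleNamespace stand-ins so it runs without network access; cross_validated and the 8.0 threshold are illustrative choices.

```python
from types import SimpleNamespace


def cross_validated(ligands, min_pchembl: float = 8.0):
    """Ligands seen in both databases with best pChEMBL >= min_pchembl (hypothetical)."""
    return [
        lig for lig in ligands
        if {"ChEMBL", "BindingDB"} <= set(lig.sources) and lig.best_pchembl >= min_pchembl
    ]


# Stand-ins shaped like report.ligand_summary entries; with a real report,
# call cross_validated(report.ligand_summary) instead.
demo = [
    SimpleNamespace(name="A", best_pchembl=9.1, sources=["ChEMBL", "BindingDB"]),
    SimpleNamespace(name="B", best_pchembl=9.4, sources=["ChEMBL"]),
    SimpleNamespace(name="C", best_pchembl=7.2, sources=["ChEMBL", "BindingDB"]),
]
print([lig.name for lig in cross_validated(demo)])  # ['A']
```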

Export

from targetrecon.core import save_html, save_json, save_sdf

save_html(report, "EGFR_report.html")
save_json(report, "EGFR_report.json")

# SDF with filters
save_sdf(report, "EGFR_ligands.sdf",
         top_n=50,              # limit to top 50
         min_pchembl=7.0,       # only pChEMBL ≥ 7
         activity_type="IC50")  # only IC50 records

Batch (async, concurrent)

import asyncio
import targetrecon

async def run_batch(targets):
    reports = await asyncio.gather(*[
        targetrecon.recon_async(t) for t in targets
    ])
    return reports

reports = asyncio.run(run_batch(["EGFR", "BRAF", "CDK2"]))
for r in reports:
    print(r.uniprot.gene_name, r.num_bioactivities, r.num_unique_ligands)
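
asyncio.gather as written starts every target at once and raises if any single target fails. For longer target lists, a variant that caps concurrency with asyncio.Semaphore and collects failures instead of raising (similar in spirit to the CLI's --skip-errors) may be gentler on the public APIs. run_bounded, the limit of 3, and fake_fetch are assumptions; pass targetrecon.recon_async as fetch to use it with real targets.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_bounded(
    targets: list[str],
    fetch: Callable[[str], Awaitable[Any]],
    limit: int = 3,
) -> dict[str, Any]:
    """Run fetch(target) for each target, at most `limit` at a time (hypothetical)."""
    semaphore = asyncio.Semaphore(limit)

    async def one(target: str) -> Any:
        async with semaphore:  # waits while `limit` fetches are in flight
            return await fetch(target)

    results = await asyncio.gather(*(one(t) for t in targets), return_exceptions=True)
    # Each value is either a report or the exception raised for that target.
    return dict(zip(targets, results))


async def fake_fetch(target: str) -> str:
    # Stand-in for targetrecon.recon_async; fails for one target to show error capture.
    await asyncio.sleep(0.01)
    if target == "NOTAGENE":
        raise ValueError(f"could not resolve {target}")
    return f"report for {target}"


results = asyncio.run(run_bounded(["EGFR", "BRAF", "NOTAGENE", "CDK2"], fake_fetch))
for target, outcome in results.items():
    status = "failed" if isinstance(outcome, Exception) else "ok"
    print(target, status)
```

With the real library this would be asyncio.run(run_bounded(["EGFR", "BRAF", "CDK2"], targetrecon.recon_async)).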

Comparison

Feature TargetRecon TargetDB (2020) gget Open Targets
AlphaFold integration ✅ (web)
ChEMBL bioactivity Partial
BindingDB binding constants
Interactive HTML report Web only
3D structure viewer Web only
Molecule sketcher → targets
Docking-ready SDF export
AI agent chat
Batch CLI processing N/A
pip install + single command Partial N/A

Installation

pip install targetrecon

Quick start:

targetrecon EGFR

Produces EGFR_report.html (interactive, self-contained), EGFR_report.json, and EGFR_top_ligands.sdf — ready for docking.

Development:

git clone https://github.com/nagarh/targetrecon.git
cd targetrecon
pip install -e ".[dev]"

Architecture

src/targetrecon/
├── cli.py           # Click CLI — run, batch, serve
├── webapp.py        # Flask web app — UI, report pages, AI agent routes
├── core.py          # Orchestration, aggregation, export (HTML/JSON/SDF)
├── models.py        # Pydantic data models
├── resolver.py      # Gene → UniProt → ChEMBL ID resolution
├── report.py        # Jinja2 HTML report generator (standalone)
├── agent_chat.py    # AI agent — tool definitions, streaming, multi-provider
└── clients/
    ├── uniprot.py   # UniProt REST API
    ├── pdb_client.py # RCSB PDB REST + Search API
    ├── alphafold.py # AlphaFold Database API
    ├── chembl.py    # ChEMBL REST API
    └── bindingdb.py # BindingDB REST API

Acknowledgments

Data from: UniProt · RCSB PDB · AlphaFold DB · ChEMBL · BindingDB

Visualization: 3Dmol.js · Chart.js · Cytoscape.js

Sketcher: Ketcher

Inspired by TargetDB and gget.


License

MIT License — see LICENSE for details.

Download files

Source Distribution

targetrecon-0.1.6.tar.gz (55.4 MB)

Uploaded Source

Built Distribution

targetrecon-0.1.6-py3-none-any.whl (55.9 MB)

Uploaded Python 3

File details

Details for the file targetrecon-0.1.6.tar.gz.

File metadata

  • Download URL: targetrecon-0.1.6.tar.gz
  • Size: 55.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for targetrecon-0.1.6.tar.gz
Algorithm Hash digest
SHA256 c01b6e91973d0ec5739624f441fe999d82124e08ddf802037170f457de3e9a46
MD5 45594fa90539ce0134ec99d66295ce8a
BLAKE2b-256 6c831b121446352eb2525d981cef2f3d3c64aacdc9104704c49e189f77883afe

File details

Details for the file targetrecon-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: targetrecon-0.1.6-py3-none-any.whl
  • Size: 55.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for targetrecon-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 03382bddf9ebf4fa3906ff78265d906bc549b135f75d1750ae9ce83b8159a302
MD5 0b62241fc2be1c503fca845ea29a3d09
BLAKE2b-256 607eb8b328eedf076206db0c51ab3fd281d07a0e54cc4a336a671614af03c7e5
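
The SHA256 digests above can be checked after downloading a file. A minimal sketch using Python's hashlib, reading in chunks so a 55 MB file is never held in memory at once; sha256_of and verify are hypothetical helper names. pip can also enforce hashes itself when installing from a requirements file with --require-hashes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring hex case."""
    return sha256_of(path) == expected_sha256.lower()


# Example with the published digest for the wheel:
# verify(Path("targetrecon-0.1.6-py3-none-any.whl"),
#        "03382bddf9ebf4fa3906ff78265d906bc549b135f75d1750ae9ce83b8159a302")
```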
