Drug target intelligence aggregator — fetch, collate, and visualize public data for any protein target in one command.
TargetRecon
Aggregate UniProt · PDB · AlphaFold · ChEMBL · BindingDB into a single interactive report — in seconds.
What is TargetRecon?
TargetRecon is a Python CLI and web app that pulls data from 5 public databases and compiles it into a single, richly formatted report for any protein drug target. No API keys. No account. No manual copy-pasting.
Think of it as gget for drug discovery — or TargetDB reimagined for the AlphaFold era.
Five Data Sources, One Report
| Source | Data |
|---|---|
| UniProt | Function, subcellular location, GO terms, diseases, keywords |
| RCSB PDB | Experimental structures filtered by resolution |
| AlphaFold DB | Predicted structure with pLDDT confidence coloring |
| ChEMBL | Bioactivity data (IC50, Ki, Kd, EC50) sorted by pChEMBL descending |
| BindingDB | Binding affinity measurements converted to pChEMBL, sorted by potency |
Intelligent ID Resolution
Accepts gene names, UniProt accessions, or ChEMBL target IDs:
targetrecon EGFR # Gene name
targetrecon P00533 # UniProt accession
targetrecon CHEMBL203 # ChEMBL target ID
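The three accepted forms can be told apart by pattern alone. The sketch below is an assumption about how such routing could work, not TargetRecon's actual resolver; `classify_query` is a hypothetical name, and the accession pattern is the one published in the UniProt help pages.

```python
import re

# Accession pattern as published in the UniProt help pages.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
CHEMBL_RE = re.compile(r"CHEMBL[0-9]+")


def classify_query(query: str) -> str:
    """Return 'chembl', 'uniprot', or 'gene' for a raw query string."""
    q = query.strip().upper()
    if CHEMBL_RE.fullmatch(q):
        return "chembl"
    if UNIPROT_RE.fullmatch(q):
        return "uniprot"
    return "gene"  # fallback: treat anything else as a gene name


for example in ("EGFR", "P00533", "CHEMBL203"):
    print(example, "->", classify_query(example))
```

Checking the ChEMBL pattern first keeps the two ID schemes from ever competing; anything that matches neither is passed on as a gene name.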
Bioactivity Data
What is pChEMBL?
pChEMBL is a unified potency scale — the negative log₁₀ of the molar affinity. Higher = more potent.
pChEMBL = -log₁₀(affinity_M)
| pChEMBL | Affinity | Interpretation |
|---|---|---|
| 9 | 1 nM | Very potent |
| 7 | 100 nM | Potent |
| 6 | 1 µM | Moderate |
| < 5 | > 10 µM | Weak |
ChEMBL natively reports pChEMBL. BindingDB reports raw affinities in nM, which TargetRecon converts using the same formula:
pChEMBL = -log₁₀(affinity_nM × 10⁻⁹)
This makes ChEMBL and BindingDB values directly comparable on the same scale.
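As a quick check of the formula, here is a minimal sketch of the nM conversion. The function name `nm_to_pchembl` is hypothetical and not part of the TargetRecon API.

```python
import math


def nm_to_pchembl(affinity_nm: float) -> float:
    """Convert an affinity in nM to the pChEMBL scale: -log10(molar affinity)."""
    if affinity_nm <= 0:
        raise ValueError("affinity must be positive")
    return -math.log10(affinity_nm * 1e-9)


# Print the conversion for the affinities used in the table above.
for nm in (1, 100, 1000, 10000):
    print(f"{nm:>6} nM -> pChEMBL {nm_to_pchembl(nm):.2f}")
```

Rejecting zero and negative affinities up front avoids a math domain error and mirrors the rule that records with no valid affinity are discarded.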
Full pipeline — what happens when you run a query
1. Resolve query → UniProt ID + ChEMBL target ID
│
▼
2. Fetch in parallel (async):
├── ChEMBL API → top-N bioactivity records, sorted by pChEMBL desc (server-side)
├── BindingDB API → all records for this UniProt ID (one bulk request)
├── RCSB PDB → experimental structures
├── AlphaFold → predicted structure
└── STRING DB → protein interactions
│
▼
3. BindingDB: sort by pChEMBL descending (client-side), apply cap
│
▼
4. Merge ChEMBL + BindingDB records into one list
│
▼
5. Apply min_pchembl filter (if set) across the merged list
│
▼
6. Deduplicate by canonical SMILES → Ligand Summary
(same molecule from both sources → one entry, best pChEMBL kept)
│
▼
7. Sort Ligand Summary by best pChEMBL descending
│
▼
8. Output: TargetReport (bioactivities + ligand_summary + structures + ...)
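Step 2 of the pipeline is a standard concurrent fan-out. The sketch below shows the general shape with `asyncio.gather`; `fetch_stub` and `fetch_all` are hypothetical stand-ins that sleep instead of calling the real APIs, so this illustrates the pattern rather than TargetRecon's own client code.

```python
import asyncio


async def fetch_stub(source: str, uniprot_id: str) -> tuple[str, list[str]]:
    """Stand-in for one HTTP client; sleeps instead of calling a real API."""
    await asyncio.sleep(0.01)
    return source, [f"{source}:{uniprot_id}"]


async def fetch_all(uniprot_id: str) -> dict[str, list[str]]:
    """Run every source fetch concurrently and collect results by source name."""
    sources = ["ChEMBL", "BindingDB", "RCSB PDB", "AlphaFold", "STRING DB"]
    results = await asyncio.gather(*(fetch_stub(s, uniprot_id) for s in sources))
    return dict(results)


data = asyncio.run(fetch_all("P00533"))
print(sorted(data))
```

Because the fetches overlap, the total wait is roughly that of the slowest source rather than the sum of all five.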
Fetching strategy per source
ChEMBL — server-side sort + pagination:
- Sends order_by=-pchembl_value to the API
- Only the top-N most potent records are fetched, so no API calls are wasted
- Records with no pChEMBL value are excluded at the API level
BindingDB — one bulk request + client-side sort:
- Fetches all records for the UniProt ID in a single HTTP call (no pagination)
- Sorts by pChEMBL descending locally, then applies the cap
- Records with no valid affinity value are discarded
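To make the ChEMBL strategy concrete, the sketch below builds one request URL for the public ChEMBL activity endpoint without sending it. Only `order_by=-pchembl_value` comes from the description above; the endpoint URL, the `pchembl_value__isnull` filter, and the helper name `chembl_page_url` are assumptions about how such a query could be expressed, not a record of what TargetRecon sends.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public ChEMBL activity resource.
CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def chembl_page_url(target_chembl_id: str, limit: int, offset: int = 0) -> str:
    """Build the URL for one page of activities, most potent first."""
    params = {
        "target_chembl_id": target_chembl_id,
        "pchembl_value__isnull": "false",  # assumed filter: skip records without pChEMBL
        "order_by": "-pchembl_value",      # server-side sort, descending
        "limit": limit,
        "offset": offset,
    }
    return f"{CHEMBL_ACTIVITY_URL}?{urlencode(params)}"


print(chembl_page_url("CHEMBL203", limit=1000))
```

With the sort done by the server, fetching the top N means requesting pages only until N records have arrived.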
Cap behavior
The default is 1000 per DB:
| Setting | ChEMBL | BindingDB | Total |
|---|---|---|---|
| Default (1000) | top 1000 most potent | top 1000 most potent | up to 2000 |
| --max-bioactivities 500 | top 500 most potent | top 500 most potent | up to 1000 |
| --max-bioactivities all | all records | all records | all available |
Because sorting happens before the cap, you always get the most potent compounds — never a random subset.
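The sort-then-cap order is easy to get wrong, so here is a minimal sketch of it on plain dictionaries. The helper `top_n_by_potency` and the record layout are hypothetical; TargetRecon's own records are richer objects.

```python
from typing import Optional


def top_n_by_potency(records: list[dict], cap: Optional[int]) -> list[dict]:
    """Sort by pChEMBL descending, then keep the first `cap` (None = keep all)."""
    valid = [r for r in records if r.get("pchembl") is not None]
    ranked = sorted(valid, key=lambda r: r["pchembl"], reverse=True)
    return ranked if cap is None else ranked[:cap]


records = [
    {"id": "a", "pchembl": 6.1},
    {"id": "b", "pchembl": 9.3},
    {"id": "c", "pchembl": None},  # no valid affinity: dropped before ranking
    {"id": "d", "pchembl": 7.8},
]
print([r["id"] for r in top_n_by_potency(records, cap=2)])
```

Slicing before sorting would instead return whichever records the source happened to list first.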
| Interface | No-limit syntax |
|---|---|
| CLI | --max-bioactivities all |
| Web UI | Drag the Max bioactivities per DB slider to All |
| Python API | max_bioactivities=None |
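The CLI and Python forms of the limit differ only in how "no limit" is spelled. A small sketch of the mapping, assuming a hypothetical helper named `parse_max_bioactivities` (the real CLI may validate differently):

```python
from typing import Optional


def parse_max_bioactivities(value: str) -> Optional[int]:
    """Map the CLI form ('all' or an integer string) to the API form (None or int)."""
    if value.strip().lower() == "all":
        return None
    n = int(value)  # raises ValueError for non-numeric input
    if n <= 0:
        raise ValueError("max bioactivities must be a positive integer or 'all'")
    return n


print(parse_max_bioactivities("500"), parse_max_bioactivities("all"))
```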
Ligand deduplication
After merging ChEMBL and BindingDB, all records are grouped by canonical SMILES (via RDKit). If the same molecule appears in both databases:
- It becomes one entry in the Ligand Summary
- The best pChEMBL across all assays is kept
- sources lists both databases (e.g. ["ChEMBL", "BindingDB"])
- num_assays counts the total assay measurements across both sources
The final Ligand Summary is sorted by best pChEMBL descending — the most potent unique compound is always first.
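The grouping described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `Ligand` and `summarize` are hypothetical names, records are plain dictionaries, and the canonicalizer is passed in so the example runs without RDKit. With RDKit installed, `Chem.MolToSmiles(Chem.MolFromSmiles(s))` would be the natural function to plug in.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ligand:
    smiles: str
    best_pchembl: float
    sources: list[str] = field(default_factory=list)
    num_assays: int = 0


def summarize(records: list[dict], canonical: Callable[[str], str]) -> list[Ligand]:
    """Group records by canonical SMILES and rank by best pChEMBL, descending."""
    ligands: dict[str, Ligand] = {}
    for rec in records:
        key = canonical(rec["smiles"])
        lig = ligands.setdefault(key, Ligand(key, rec["pchembl"]))
        lig.best_pchembl = max(lig.best_pchembl, rec["pchembl"])
        if rec["source"] not in lig.sources:
            lig.sources.append(rec["source"])
        lig.num_assays += 1
    return sorted(ligands.values(), key=lambda lig: lig.best_pchembl, reverse=True)


# Toy lookup table standing in for a real canonicalizer (demo only).
toy_table = {"CCO": "CCO", "OCC": "CCO", "c1ccccc1": "c1ccccc1"}
records = [
    {"smiles": "CCO", "pchembl": 6.2, "source": "ChEMBL"},
    {"smiles": "OCC", "pchembl": 7.1, "source": "BindingDB"},
    {"smiles": "c1ccccc1", "pchembl": 5.0, "source": "ChEMBL"},
]
summary = summarize(records, lambda s: toy_table[s])
for lig in summary:
    print(lig.smiles, lig.best_pchembl, lig.sources, lig.num_assays)
```

The two ethanol spellings collapse into one entry that keeps the higher pChEMBL and lists both sources.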
CLI
pip install targetrecon
targetrecon / targetrecon run — Single target
targetrecon EGFR
targetrecon P00533 -f html -f json -f sdf -o ./reports/
targetrecon BRAF --min-pchembl 7.0 --max-resolution 2.5
targetrecon CDK2 --max-bioactivities 5000 # up to 5000 per source
targetrecon CDK2 --max-bioactivities all # no limit
targetrecon CDK2 --no-bindingdb # ChEMBL only
targetrecon CDK2 --no-chembl # BindingDB only
| Option | Default | Description |
|---|---|---|
| -f, --format [json\|html\|sdf] | html json sdf | Output formats (repeat for multiple) |
| -o, --output PATH | . | Output directory |
| --max-resolution FLOAT | 4.0 | Max PDB resolution in Å |
| --max-bioactivities INT\|all | 1000 | Max records per DB (ChEMBL + BindingDB separately); all = no limit |
| --min-pchembl FLOAT | (none) | Minimum pChEMBL value filter |
| --top-ligands INT | 20 | Number of top ligands for SDF export |
| --use-chembl / --no-chembl | on | Include ChEMBL bioactivity data |
| --use-bindingdb / --no-bindingdb | on | Include BindingDB bioactivity data |
| -q, --quiet | off | Suppress progress messages |
targetrecon batch — Multiple targets
# Pass targets directly
targetrecon batch EGFR BRAF CDK2 ABL1
# From a file (one target per line, # = comment)
targetrecon batch -i targets.txt
# With filters and format selection
targetrecon batch -i targets.txt -f html -f sdf --min-pchembl 6.0 --skip-errors
# ChEMBL only for all targets
targetrecon batch EGFR BRAF --no-bindingdb
# Unlimited bioactivities
targetrecon batch -i targets.txt --max-bioactivities all
| Option | Default | Description |
|---|---|---|
| -i, --input PATH | (none) | Text file, one target per line |
| -o, --output PATH | ./batch_reports | Output directory |
| -f, --format [json\|html\|sdf] | html json sdf | Output formats (repeat for multiple) |
| --max-resolution FLOAT | 4.0 | Max PDB resolution in Å |
| --max-bioactivities INT\|all | 1000 | Max records per DB (ChEMBL + BindingDB separately); all = no limit |
| --min-pchembl FLOAT | (none) | Minimum pChEMBL value filter |
| --top-ligands INT | 20 | Ligands per SDF file |
| --use-chembl / --no-chembl | on | Include ChEMBL bioactivity data |
| --use-bindingdb / --no-bindingdb | on | Include BindingDB bioactivity data |
| --skip-errors | off | Continue if a single target fails |
| -q, --quiet | off | Suppress progress messages |
After completion, a summary table is printed showing structures / bioactivities / ligands per target.
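For reference, a targets file in the format described above (one target per line, # for comments) can be read with a few lines of Python. The helper `parse_targets` is hypothetical, and treating # as also ending inline text is an assumption; the CLI may only honor whole-line comments.

```python
def parse_targets(text: str) -> list[str]:
    """Return target names from file content, skipping blanks and # comments."""
    targets = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # assumption: '#' also ends inline text
        if name:
            targets.append(name)
    return targets


sample = "# kinases\nEGFR\nBRAF  # V600E work\n\nP00533\n"
print(parse_targets(sample))
```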
targetrecon serve — Launch web interface
targetrecon serve # http://localhost:5000
targetrecon serve --port 8080
targetrecon serve --host 0.0.0.0 # expose on all interfaces
| Option | Default | Description |
|---|---|---|
| --port INT | 5000 | Port to listen on |
| --host TEXT | 0.0.0.0 | Host to bind |
| --debug | off | Enable Flask debug mode |
Web UI
targetrecon serve
# Open http://localhost:5000
- Dark-themed interface with animated molecular backdrop
- Search by gene name, UniProt accession, or ChEMBL target ID
- Molecule sketcher (Ketcher) — draw a structure to find matching targets
- Sidebar controls: max PDB resolution, min pChEMBL, ChEMBL/BindingDB toggles, max bioactivities slider (100–5000, or drag to All for no limit)
Report tabs
| Tab | Contents |
|---|---|
| Overview | UniProt summary, GO terms, diseases, protein stats |
| 3D Viewer | AlphaFold pLDDT coloring + PDB experimental structures (3Dmol.js) |
| Bioactivity | pChEMBL distribution histogram, method breakdown chart |
| Ligands | Sortable table ranked by potency — SMILES, ChEMBL links, activity type, source |
| PDB | All experimental structures with resolution, method, ligand count |
| Interactions | STRING protein–protein interaction network (Cytoscape.js) |
Export from the UI
Every report page has one-click download buttons:
- JSON — full machine-readable report
- HTML — self-contained interactive report (works fully offline)
- SDF — top ligands with 3D conformers, ready for docking
AI Agent
An AI chat panel is available on every report page. Click the AI button (bottom-right corner) to open it.
Providers & models
| Provider | Models |
|---|---|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 |
| OpenAI | gpt-4o, gpt-4o-mini |
| Groq | llama-3.3-70b, mixtral |
- Bring your own API key: keys are never stored and are discarded at the end of each browser session
- Context-aware: the agent already knows the target you're looking at
- Tools: search targets, fetch bioactivities, query PDB structures, protein interactions, compare targets
- Streaming responses with stop button
- Resizable and minimizable panel
Example questions
What are the best scaffolds for covalent inhibition?
Which PDB structures are most suitable for docking?
Compare the selectivity profile of this target vs CDK4.
Summarize the druggability of this target.
Python API
import targetrecon
# Single target — works in scripts and Jupyter
report = targetrecon.recon("EGFR")
print(report.uniprot.protein_name) # "Epidermal growth factor receptor"
print(report.num_pdb_structures) # 50
print(report.num_bioactivities) # up to 2000 (1000 ChEMBL + 1000 BindingDB)
print(report.best_ligand.best_pchembl) # e.g. 10.52
# With options
report = targetrecon.recon(
    "BRAF",
    max_bioactivities=5000,  # up to 5000 per source
    min_pchembl=7.0,
    max_pdb_resolution=2.5,
)
# No limit — fetch all available records
report = targetrecon.recon("BRAF", max_bioactivities=None)
# Async (for use with asyncio.run or inside async functions)
import asyncio
report = asyncio.run(targetrecon.recon_async("BRAF"))
# Async with all options
report = asyncio.run(targetrecon.recon_async(
    "CDK2",
    use_chembl=True,
    use_bindingdb=False,  # ChEMBL only
    max_bioactivities=2000,
    # max_bioactivities=None  # no limit
))
Accessing the data
# UniProt info
report.uniprot.protein_name # e.g. "Epidermal growth factor receptor"
report.uniprot.gene_name # e.g. "EGFR"
report.uniprot.organism # e.g. "Homo sapiens"
report.uniprot.function_description # functional annotation text
report.uniprot.subcellular_locations # list[str]
report.uniprot.disease_associations # list[str]
report.uniprot.keywords # list[str]
report.uniprot.go_terms # list[GoTerm] — each has .go_id, .term, .category
report.uniprot.sequence_length # int
# PDB structures
for pdb in report.pdb_structures[:5]:
    print(pdb.pdb_id, pdb.resolution, pdb.method)
    for lig in pdb.ligands:  # list[PDBLigand], each has .ligand_id, .smiles, .name
        print(lig.ligand_id, lig.name)
# AlphaFold
report.alphafold.pdb_url # URL to AlphaFold PDB structure
report.alphafold.model_url # URL to AlphaFold CIF model
# Bioactivity records (sorted by pChEMBL descending)
for b in report.bioactivities[:10]:
    print(b.source, b.activity_type, b.value, b.pchembl_value, b.smiles)
# Ligand summary (deduplicated by canonical SMILES, sorted by best pChEMBL)
for lig in report.ligand_summary[:10]:
    print(lig.name, lig.chembl_id, lig.best_pchembl, lig.best_activity_type, lig.num_assays)
    print(lig.sources)  # e.g. ["ChEMBL", "BindingDB"]
report.best_ligand # most potent unique ligand overall
Export
from targetrecon.core import save_html, save_json, save_sdf
save_html(report, "EGFR_report.html")
save_json(report, "EGFR_report.json")
# SDF with filters
save_sdf(
    report,
    "EGFR_ligands.sdf",
    top_n=50,              # limit to top 50
    min_pchembl=7.0,       # only pChEMBL ≥ 7
    activity_type="IC50",  # only IC50 records
)
Batch (async, concurrent)
import asyncio
import targetrecon
async def run_batch(targets):
    reports = await asyncio.gather(*[
        targetrecon.recon_async(t) for t in targets
    ])
    return reports
reports = asyncio.run(run_batch(["EGFR", "BRAF", "CDK2"]))
for r in reports:
    print(r.uniprot.gene_name, r.num_bioactivities, r.num_unique_ligands)
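With a plain `asyncio.gather`, one failing target aborts the whole batch. The sketch below shows one way to get behavior similar to the CLI's --skip-errors from Python, using `return_exceptions=True`. `run_batch_tolerant` is a hypothetical helper, and `fake_recon` is a stand-in for `targetrecon.recon_async` so the example runs offline.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_batch_tolerant(
    targets: list[str],
    recon: Callable[[str], Awaitable[Any]],
) -> tuple[dict[str, Any], dict[str, BaseException]]:
    """Run all targets concurrently; return (successes, failures) keyed by target."""
    results = await asyncio.gather(
        *(recon(t) for t in targets), return_exceptions=True
    )
    ok: dict[str, Any] = {}
    failed: dict[str, BaseException] = {}
    for target, result in zip(targets, results):
        if isinstance(result, BaseException):
            failed[target] = result
        else:
            ok[target] = result
    return ok, failed


async def fake_recon(target: str) -> str:
    """Stand-in for targetrecon.recon_async so the sketch runs offline."""
    await asyncio.sleep(0)
    if target == "NOT_A_GENE":
        raise LookupError(f"could not resolve {target}")
    return f"report for {target}"


ok, failed = asyncio.run(
    run_batch_tolerant(["EGFR", "NOT_A_GENE", "CDK2"], fake_recon)
)
print(sorted(ok), sorted(failed))
```

In real use, `targetrecon.recon_async` would be passed in place of `fake_recon`, and the `failed` mapping shows which targets to retry.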
Comparison
| Feature | TargetRecon | TargetDB (2020) | gget | Open Targets |
|---|---|---|---|---|
| AlphaFold integration | ✅ | ❌ | ✅ | ✅ (web) |
| ChEMBL bioactivity | ✅ | ✅ | ❌ | Partial |
| BindingDB binding constants | ✅ | ❌ | ❌ | ❌ |
| Interactive HTML report | ✅ | ❌ | ❌ | Web only |
| 3D structure viewer | ✅ | ❌ | ❌ | Web only |
| Molecule sketcher → targets | ✅ | ❌ | ❌ | ❌ |
| Docking-ready SDF export | ✅ | ❌ | ❌ | ❌ |
| AI agent chat | ✅ | ❌ | ❌ | ❌ |
| Batch CLI processing | ✅ | ❌ | ✅ | N/A |
| pip install + single command | ✅ | Partial | ✅ | N/A |
Installation
pip install targetrecon
Quick start:
targetrecon EGFR
Produces EGFR_report.html (interactive, self-contained), EGFR_report.json, and EGFR_top_ligands.sdf — ready for docking.
Development:
git clone https://github.com/nagarh/targetrecon.git
cd targetrecon
pip install -e ".[dev]"
Architecture
src/targetrecon/
├── cli.py # Click CLI — run, batch, serve
├── webapp.py # Flask web app — UI, report pages, AI agent routes
├── core.py # Orchestration, aggregation, export (HTML/JSON/SDF)
├── models.py # Pydantic data models
├── resolver.py # Gene → UniProt → ChEMBL ID resolution
├── report.py # Jinja2 HTML report generator (standalone)
├── agent_chat.py # AI agent — tool definitions, streaming, multi-provider
└── clients/
    ├── uniprot.py     # UniProt REST API
    ├── pdb_client.py  # RCSB PDB REST + Search API
    ├── alphafold.py   # AlphaFold Database API
    ├── chembl.py      # ChEMBL REST API
    └── bindingdb.py   # BindingDB REST API
Acknowledgments
Data from: UniProt · RCSB PDB · AlphaFold DB · ChEMBL · BindingDB
Visualization: 3Dmol.js · Chart.js · Cytoscape.js
Sketcher: Ketcher
Inspired by TargetDB and gget.
License
MIT License — see LICENSE for details.