
Drug target intelligence aggregator: fetch, collate, and visualize public data for any protein target in one command.


TargetRecon


Aggregate UniProt · PDB · AlphaFold · ChEMBL · STRING-DB into a single interactive report, in seconds.


What is TargetRecon?

TargetRecon is a Python CLI and web app that pulls data from five public databases and compiles it into a single, richly formatted report for any protein drug target. No API keys. No account. No manual copy-pasting.

Think of it as gget for drug discovery, or TargetDB reimagined for the AlphaFold era.


Five Data Sources, One Report

| Source | Data |
|---|---|
| UniProt | Function, subcellular location, GO terms, diseases, keywords |
| RCSB PDB | Up to 50 experimental structures, filtered by resolution (default ≤ 4.0 Å), sorted by resolution ascending |
| AlphaFold DB | Predicted structure with pLDDT confidence coloring |
| ChEMBL | Bioactivity data (IC50, Ki, Kd, EC50) sorted by pChEMBL descending |
| STRING-DB | Protein–protein interaction network |
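
The RCSB PDB row describes a filter, sort, and cap sequence. A minimal sketch of that selection logic, assuming each structure is a plain (ID, resolution) pair; `select_structures`, the tuple layout, and the placeholder IDs are hypothetical and not TargetRecon's internal API:

```python
from typing import Optional


def select_structures(
    structures: list[tuple[str, Optional[float]]],
    max_resolution: float = 4.0,
    limit: int = 50,
) -> list[tuple[str, float]]:
    """Keep structures at or below max_resolution, lowest resolution value first."""
    # Entries without a resolution value are dropped; that choice is an
    # assumption of this sketch, not documented TargetRecon behavior.
    kept = [
        (pdb_id, res)
        for pdb_id, res in structures
        if res is not None and res <= max_resolution
    ]
    kept.sort(key=lambda item: item[1])  # ascending: best resolution first
    return kept[:limit]


# Placeholder IDs and resolutions, made up for illustration.
example = [("AAAA", 3.2), ("BBBB", 1.8), ("CCCC", 4.5), ("DDDD", None)]
print(select_structures(example))
```

Sorting before slicing matters: the cap then discards the worst-resolved entries rather than an arbitrary tail.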

Intelligent ID Resolution

Accepts gene names, UniProt accessions, or ChEMBL target IDs:

targetrecon EGFR          # Gene name
targetrecon P00533        # UniProt accession
targetrecon CHEMBL203     # ChEMBL target ID
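
One way to tell the three query types apart is by pattern. The sketch below is an assumption about how such a resolver could work, not a description of `resolver.py`; `classify_query` and its return labels are hypothetical. The accession pattern is the one UniProt publishes for its accession numbers.

```python
import re

# UniProt accession format (6 or 10 characters), as published by UniProt.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# ChEMBL identifiers are the literal prefix CHEMBL followed by digits.
CHEMBL_RE = re.compile(r"CHEMBL[0-9]+")


def classify_query(query: str) -> str:
    """Return a hypothetical label for the kind of identifier in `query`."""
    q = query.strip().upper()
    if CHEMBL_RE.fullmatch(q):
        return "chembl_target_id"
    if UNIPROT_RE.fullmatch(q):
        return "uniprot_accession"
    # Anything else is treated as a gene name and left to a lookup service.
    return "gene_name"


for q in ("EGFR", "P00533", "CHEMBL203"):
    print(q, "->", classify_query(q))
```

A pattern check only decides which lookup to try first; a gene symbol that happens to look like an accession would still need a fallback lookup.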

Bioactivity Data

What is pChEMBL?

pChEMBL is a unified potency scale: the negative log₁₀ of the molar affinity. Higher = more potent.

pChEMBL = -log₁₀(affinity_M)

| pChEMBL | Affinity | Interpretation |
|---|---|---|
| 9 | 1 nM | Very potent |
| 7 | 100 nM | Potent |
| 6 | 1 µM | Moderate |
| < 5 | > 10 µM | Weak |

ChEMBL natively reports pChEMBL values. TargetRecon uses this scale directly.
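
For measurements quoted in nanomolar, the conversion follows directly from the formula. A small worked sketch; `pchembl_from_nm` is a hypothetical helper, not part of TargetRecon:

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert an affinity in nM to the pChEMBL scale."""
    if value_nm <= 0:
        raise ValueError("affinity must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


for nm in (1, 100, 1000, 10000):
    print(f"{nm:>6} nM -> pChEMBL {pchembl_from_nm(nm):.1f}")
```

Each tenfold drop in concentration adds one pChEMBL unit, which is why the scale is convenient for ranking compounds across assay types.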

Full pipeline: what happens when you run a query

1. Resolve query → UniProt ID / ChEMBL target ID
         │
         ▼
2. Fetch in parallel (async):
   ├── ChEMBL API  → top-N bioactivity records, sorted by pChEMBL desc (server-side)
   ├── RCSB PDB    → experimental structures
   ├── AlphaFold   → predicted structure
   └── STRING DB   → protein interactions
         │
         ▼
3. Apply min_pchembl filter (if set) to ChEMBL records
         │
         ▼
4. Deduplicate by canonical SMILES → Ligand Summary
         │
         ▼
5. Sort Ligand Summary by best pChEMBL descending
         │
         ▼
6. Output: TargetReport (bioactivities + ligand_summary + structures + ...)
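
The concurrent fetch and the filter step can be sketched with `asyncio.gather`. Everything below is illustrative: the `fetch_*` coroutines are stand-ins that return made-up data instead of calling any API, and `build_report` is a hypothetical name rather than TargetRecon's orchestration function.

```python
import asyncio
from typing import Optional


async def fetch_chembl(target: str) -> list[dict]:
    await asyncio.sleep(0)  # stand-in for a network round trip
    return [{"smiles": "CCO", "pchembl": 6.1}, {"smiles": "CCN", "pchembl": 8.3}]


async def fetch_pdb(target: str) -> list[str]:
    await asyncio.sleep(0)
    return ["structure-1", "structure-2"]


async def fetch_alphafold(target: str) -> str:
    await asyncio.sleep(0)
    return "predicted-model"


async def fetch_string(target: str) -> list[str]:
    await asyncio.sleep(0)
    return ["partner-1"]


async def build_report(target: str, min_pchembl: Optional[float] = None) -> dict:
    # Step 2: the four fetches run concurrently, not one after another.
    activities, structures, model, partners = await asyncio.gather(
        fetch_chembl(target),
        fetch_pdb(target),
        fetch_alphafold(target),
        fetch_string(target),
    )
    # Step 3: optional potency filter on the ChEMBL records.
    if min_pchembl is not None:
        activities = [a for a in activities if a["pchembl"] >= min_pchembl]
    # Keep the most potent records first.
    activities.sort(key=lambda a: a["pchembl"], reverse=True)
    return {
        "bioactivities": activities,
        "structures": structures,
        "alphafold": model,
        "interactions": partners,
    }


report = asyncio.run(build_report("EGFR", min_pchembl=7.0))
print(report["bioactivities"])
```

With real network calls, the total wait is roughly that of the slowest source rather than the sum of all four.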

Fetching strategy

ChEMBL (server-side sort + pagination):

  • Sends order_by=-pchembl_value to the API
  • Only the top-N most potent records are fetched, so no API calls are wasted
  • Records with no pChEMBL value are excluded at the API level
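
A sketch of what such paginated requests might look like. Only `order_by=-pchembl_value` comes from the description above; the endpoint URL, the `pchembl_value__isnull` filter, the `limit`/`offset` parameters, and the page size of 1000 are assumptions based on the public ChEMBL web services, and `activity_page_urls` is a hypothetical helper. The code only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

# Assumed public ChEMBL activity endpoint.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def activity_page_urls(
    target_chembl_id: str, max_records: int, page_size: int = 1000
) -> list[str]:
    """Build one URL per page, covering the top max_records records."""
    urls = []
    for offset in range(0, max_records, page_size):
        params = {
            "target_chembl_id": target_chembl_id,
            "pchembl_value__isnull": "false",  # exclude records without pChEMBL
            "order_by": "-pchembl_value",      # most potent first
            "limit": min(page_size, max_records - offset),
            "offset": offset,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


for url in activity_page_urls("CHEMBL203", max_records=2500):
    print(url)
```

Because the server sorts before paging, stopping after the last page still yields the most potent records, and the final page requests only the remainder.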

Cap behavior

The default is 1000 records:

| Setting | ChEMBL | Total |
|---|---|---|
| Default (1000) | top 1000 most potent | up to 1000 |
| --max-bioactivities 500 | top 500 most potent | up to 500 |
| --max-bioactivities all | all records | all available |

Because sorting happens before the cap, you always get the most potent compounds, never a random subset.

| Interface | No-limit syntax |
|---|---|
| CLI | --max-bioactivities all |
| Web UI | Drag the Max bioactivities slider to All |
| Python API | max_bioactivities=None |
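
The CLI's `all` and the Python API's `None` express the same thing. A minimal sketch of how the string form could be mapped onto the API form; `parse_max_bioactivities` and `apply_cap` are hypothetical helpers, not TargetRecon functions:

```python
from typing import Optional


def parse_max_bioactivities(raw: str) -> Optional[int]:
    """Map a CLI-style value ('500', 'all') to an int cap or None for no limit."""
    text = raw.strip().lower()
    if text == "all":
        return None
    value = int(text)  # raises ValueError for anything that is not an integer
    if value <= 0:
        raise ValueError("max bioactivities must be a positive integer or 'all'")
    return value


def apply_cap(records: list, cap: Optional[int]) -> list:
    """Truncate an already-sorted list; None means keep everything."""
    return records if cap is None else records[:cap]


sorted_values = [9.1, 8.4, 7.7, 6.0]  # made-up pChEMBL values, most potent first
print(apply_cap(sorted_values, parse_max_bioactivities("2")))
print(apply_cap(sorted_values, parse_max_bioactivities("all")))
```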

Ligand deduplication

ChEMBL records are grouped by canonical SMILES (via RDKit). If the same molecule appears in multiple assays:

  • It becomes one entry in the Ligand Summary
  • The best pChEMBL across all assays is kept
  • num_assays counts the total assay measurements

The final Ligand Summary is sorted by best pChEMBL descending, so the most potent unique compound is always first.
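
The grouping described above can be sketched as follows. This is an illustration under stated assumptions, not TargetRecon's implementation: `LigandEntry` and `summarize` are hypothetical names, the input records are made up, and when RDKit is not installed the sketch falls back to exact string matching, which merges fewer duplicates.

```python
from dataclasses import dataclass

try:
    from rdkit import Chem  # optional dependency

    def canonical_smiles(smiles: str) -> str:
        mol = Chem.MolFromSmiles(smiles)
        # Fall back to the raw string if RDKit cannot parse the input.
        return Chem.MolToSmiles(mol) if mol is not None else smiles.strip()

except ImportError:

    def canonical_smiles(smiles: str) -> str:
        # Without RDKit, only identical strings are treated as the same molecule.
        return smiles.strip()


@dataclass
class LigandEntry:
    smiles: str
    best_pchembl: float
    num_assays: int


def summarize(records: list[tuple[str, float]]) -> list[LigandEntry]:
    """Group (smiles, pchembl) records by canonical SMILES, best potency first."""
    by_smiles: dict[str, LigandEntry] = {}
    for smiles, pchembl in records:
        key = canonical_smiles(smiles)
        entry = by_smiles.get(key)
        if entry is None:
            by_smiles[key] = LigandEntry(key, pchembl, 1)
        else:
            entry.best_pchembl = max(entry.best_pchembl, pchembl)
            entry.num_assays += 1
    return sorted(by_smiles.values(), key=lambda e: e.best_pchembl, reverse=True)


records = [("CCO", 6.2), ("c1ccccc1", 7.5), ("CCO", 8.1)]
for entry in summarize(records):
    print(entry)
```

Canonicalizing before grouping is what lets two differently written SMILES strings for the same molecule collapse into one entry.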


CLI

pip install targetrecon

targetrecon / targetrecon run: Single target

targetrecon EGFR
targetrecon P00533 -f html -f json -f sdf -o ./reports/
targetrecon BRAF --min-pchembl 7.0 --max-resolution 2.5
targetrecon CDK2 --max-bioactivities 5000         # up to 5000 records
targetrecon CDK2 --max-bioactivities all          # no limit
| Option | Default | Description |
|---|---|---|
| -f, --format [json\|html\|sdf] | html json sdf | Output formats (repeat for multiple) |
| -o, --output PATH | . | Output directory |
| --max-resolution FLOAT | 4.0 | Max PDB resolution in Å (up to 50 structures returned, sorted by resolution) |
| --max-bioactivities INT\|all | 1000 | Max ChEMBL bioactivity records; all = no limit |
| --min-pchembl FLOAT | (none) | Minimum pChEMBL value filter |
| --top-ligands INT | 20 | Number of top ligands for SDF export |
| -q, --quiet | off | Suppress progress messages |

targetrecon batch: Multiple targets

# Pass targets directly
targetrecon batch EGFR BRAF CDK2 ABL1

# From a file (one target per line, # = comment)
targetrecon batch -i targets.txt

# With filters and format selection
targetrecon batch -i targets.txt -f html -f sdf --min-pchembl 6.0 --skip-errors

# Unlimited bioactivities
targetrecon batch -i targets.txt --max-bioactivities all
| Option | Default | Description |
|---|---|---|
| -i, --input PATH | (none) | Text file, one target per line |
| -o, --output PATH | ./batch_reports | Output directory |
| -f, --format [json\|html\|sdf] | html json sdf | Output formats (repeat for multiple) |
| --max-resolution FLOAT | 4.0 | Max PDB resolution in Å (up to 50 structures returned, sorted by resolution) |
| --max-bioactivities INT\|all | 1000 | Max ChEMBL bioactivity records; all = no limit |
| --min-pchembl FLOAT | (none) | Minimum pChEMBL value filter |
| --top-ligands INT | 20 | Ligands per SDF file |
| --skip-errors | off | Continue if a single target fails |
| -q, --quiet | off | Suppress progress messages |

After completion, a summary table is printed showing structures / bioactivities / ligands per target.
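
The targets file format (one target per line, `#` for comments) is simple to parse. A sketch of one possible reader; `parse_targets` is a hypothetical helper, and treating `#` as a comment marker anywhere on a line, not only at the start, is an assumption:

```python
def parse_targets(text: str) -> list[str]:
    """Return target identifiers from file contents, skipping blanks and comments."""
    targets = []
    for line in text.splitlines():
        # Assumption: '#' starts a comment wherever it appears on the line.
        entry = line.split("#", 1)[0].strip()
        if entry:
            targets.append(entry)
    return targets


sample = """\
# kinases of interest
EGFR
BRAF   # inline note

P00533
"""
print(parse_targets(sample))
```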

targetrecon serve: Launch web interface

targetrecon serve                  # http://localhost:5000
targetrecon serve --port 8080
targetrecon serve --host 0.0.0.0   # expose on all interfaces
| Option | Default | Description |
|---|---|---|
| --port INT | 5000 | Port to listen on |
| --host TEXT | 0.0.0.0 | Host to bind |
| --debug | off | Enable Flask debug mode |

Web UI

targetrecon serve
# Open http://localhost:5000
  • Dark-themed interface with animated molecular backdrop
  • Search by gene name, UniProt accession, or ChEMBL target ID
  • Molecule sketcher (Ketcher): draw a structure to find matching targets
  • Sidebar controls: max PDB resolution, min pChEMBL, ChEMBL toggle, max bioactivities slider (100–5000, or drag to All for no limit)

Report tabs

| Tab | Contents |
|---|---|
| Overview | UniProt summary, GO terms, diseases, protein stats |
| 3D Viewer | AlphaFold pLDDT coloring + PDB experimental structures (3Dmol.js) |
| Bioactivity | pChEMBL distribution histogram, method breakdown chart |
| Ligands | Sortable table ranked by potency: SMILES, ChEMBL links, activity type, source |
| PDB | All experimental structures with resolution, method, ligand count |
| Interactions | STRING protein–protein interaction network (Cytoscape.js) |

Export from the UI

Every report page has one-click download buttons:

  • JSON: full machine-readable report
  • HTML: self-contained interactive report (works fully offline)
  • SDF: top ligands with 3D conformers, ready for docking

AI Agent

An AI chat panel is available on every report page. Click the AI button (bottom-right corner) to open it.

Providers & models

| Provider | Models |
|---|---|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 |
| OpenAI | gpt-4o, gpt-4o-mini |
| Groq | llama-3.3-70b, mixtral |

  • Bring your own API key: keys are never stored and are forgotten at the end of each browser session
  • Context-aware: the agent already knows the target you're looking at
  • Tools: search targets, fetch bioactivities, query PDB structures, protein interactions, compare targets
  • Streaming responses with stop button
  • Resizable and minimizable panel

Example questions

What are the best scaffolds for covalent inhibition?
Which PDB structures are most suitable for docking?
Compare the selectivity profile of this target vs CDK4.
Summarize the druggability of this target.

Python API

import targetrecon

# Single target: works in scripts and Jupyter
report = targetrecon.recon("EGFR")
print(report.uniprot.protein_name)      # "Epidermal growth factor receptor"
print(report.num_pdb_structures)         # up to 50 (PDB cap)
print(report.num_bioactivities)          # up to 1000 (default ChEMBL cap)
print(report.best_ligand.best_pchembl)   # e.g. 10.52

# With options
report = targetrecon.recon(
    "BRAF",
    max_bioactivities=5000,   # up to 5000 ChEMBL records
    min_pchembl=7.0,
    max_pdb_resolution=2.5,
)

# No limit: fetch all available records
report = targetrecon.recon("BRAF", max_bioactivities=None)

# Async (for use with asyncio.run or inside async functions)
import asyncio
report = asyncio.run(targetrecon.recon_async("BRAF"))

# Async with all options
report = asyncio.run(targetrecon.recon_async(
    "CDK2",
    max_bioactivities=2000,
    # max_bioactivities=None  # no limit
))

Accessing the data

# UniProt info
report.uniprot.protein_name           # e.g. "Epidermal growth factor receptor"
report.uniprot.gene_name              # e.g. "EGFR"
report.uniprot.organism               # e.g. "Homo sapiens"
report.uniprot.function_description   # functional annotation text
report.uniprot.subcellular_locations  # list[str]
report.uniprot.disease_associations   # list[str]
report.uniprot.keywords               # list[str]
report.uniprot.go_terms               # list[GoTerm]; each has .go_id, .term, .category
report.uniprot.sequence_length        # int

# PDB structures
for pdb in report.pdb_structures[:5]:
    print(pdb.pdb_id, pdb.resolution, pdb.method)
    for lig in pdb.ligands:           # list[PDBLigand]; each has .ligand_id, .smiles, .name
        print(lig.ligand_id, lig.name)

# AlphaFold
report.alphafold.pdb_url        # URL to AlphaFold PDB structure
report.alphafold.model_url      # URL to AlphaFold CIF model

# Bioactivity records (sorted by pChEMBL descending)
for b in report.bioactivities[:10]:
    print(b.source, b.activity_type, b.value, b.pchembl_value, b.smiles)

# Ligand summary (deduplicated by canonical SMILES, sorted by best pChEMBL)
for lig in report.ligand_summary[:10]:
    print(lig.name, lig.chembl_id, lig.best_pchembl, lig.best_activity_type, lig.num_assays)
    print(lig.sources)                # e.g. ["ChEMBL"]

report.best_ligand               # most potent unique ligand overall

Export

from targetrecon.core import save_html, save_json, save_sdf

save_html(report, "EGFR_report.html")
save_json(report, "EGFR_report.json")

# SDF with filters
save_sdf(report, "EGFR_ligands.sdf",
         top_n=50,              # limit to top 50
         min_pchembl=7.0,       # only pChEMBL ≥ 7
         activity_type="IC50")  # only IC50 records

Batch (async, concurrent)

import asyncio, targetrecon

async def run_batch(targets):
    reports = await asyncio.gather(*[
        targetrecon.recon_async(t) for t in targets
    ])
    return reports

reports = asyncio.run(run_batch(["EGFR", "BRAF", "CDK2"]))
for r in reports:
    print(r.uniprot.gene_name, r.num_bioactivities, r.num_unique_ligands)
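
For long target lists it can help to bound concurrency and to keep one failure from aborting the whole run, similar in spirit to `--skip-errors`. The sketch below is one way to do that with standard `asyncio` tools; `run_limited` and `fake_recon` are hypothetical, and `fake_recon` replaces `targetrecon.recon_async` so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable


async def run_limited(
    targets: list[str],
    worker: Callable[[str], Awaitable[object]],
    max_concurrent: int = 3,
) -> dict[str, object]:
    """Run worker(target) for every target, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(target: str) -> object:
        async with semaphore:
            return await worker(target)

    # return_exceptions=True stores failures as values instead of raising.
    results = await asyncio.gather(
        *(guarded(t) for t in targets), return_exceptions=True
    )
    return dict(zip(targets, results))


async def fake_recon(target: str) -> str:
    # Stand-in for targetrecon.recon_async; returns made-up results.
    await asyncio.sleep(0)
    if target == "BAD":
        raise ValueError(f"could not resolve {target}")
    return f"report for {target}"


outcome = asyncio.run(run_limited(["EGFR", "BAD", "CDK2"], fake_recon))
for target, result in outcome.items():
    status = "failed" if isinstance(result, Exception) else "ok"
    print(target, status)
```

In real use, `targetrecon.recon_async` would be passed as `worker`; a small `max_concurrent` also keeps the load on the public APIs modest.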

Comparison

| Feature | TargetRecon | TargetDB (2020) | gget | Open Targets |
|---|---|---|---|---|
| AlphaFold integration | ✅ | ❌ | ✅ | ✅ (web) |
| ChEMBL bioactivity | ✅ | ✅ | ❌ | Partial |
| Interactive HTML report | ✅ | ❌ | ❌ | Web only |
| 3D structure viewer | ✅ | ❌ | ❌ | Web only |
| Molecule sketcher → targets | ✅ | ❌ | ❌ | ❌ |
| Docking-ready SDF export | ✅ | ❌ | ❌ | ❌ |
| AI agent chat | ✅ | ❌ | ❌ | ❌ |
| Batch CLI processing | ✅ | ❌ | ✅ | N/A |
| pip install + single command | ✅ | Partial | ✅ | N/A |

Installation

pip install targetrecon

Quick start:

targetrecon EGFR

Produces EGFR_report.html (interactive, self-contained), EGFR_report.json, and EGFR_top_ligands.sdf, ready for docking.

Development:

git clone https://github.com/nagarh/targetrecon.git
cd targetrecon
pip install -e ".[dev]"

Architecture

src/targetrecon/
├── cli.py           # Click CLI: run, batch, serve
├── webapp.py        # Flask web app: UI, report pages, AI agent routes
├── core.py          # Orchestration, aggregation, export (HTML/JSON/SDF)
├── models.py        # Pydantic data models
├── resolver.py      # Gene → UniProt → ChEMBL ID resolution
├── report.py        # Jinja2 HTML report generator (standalone)
├── agent_chat.py    # AI agent: tool definitions, streaming, multi-provider
└── clients/
    ├── uniprot.py    # UniProt REST API
    ├── pdb_client.py # RCSB PDB REST + Search API
    ├── alphafold.py  # AlphaFold Database API
    ├── chembl.py     # ChEMBL REST API
    └── string_db.py  # STRING-DB REST API

Author

Hemantn Nagar 📧 hn533621@ohio.edu 🔗 github.com/nagarh


References

Data from: UniProt · RCSB PDB · AlphaFold DB · ChEMBL · STRING-DB

Visualization: 3Dmol.js · Chart.js · Cytoscape.js

Sketcher: Ketcher

Inspired by TargetDB and gget.


License

MIT License. See LICENSE for details.
