
TargetRecon

Drug target intelligence aggregator: fetch, collate, and visualize public data for any protein target in one command.

Aggregate UniProt · PDB · AlphaFold · ChEMBL · STRING-DB into a single interactive report, in seconds.


What is TargetRecon?

TargetRecon is a Python CLI and web app that pulls data from five public databases and compiles it into a single, richly formatted report for any protein drug target. No API keys. No account. No manual copy-pasting.

Think of it as gget for drug discovery, or TargetDB reimagined for the AlphaFold era.


Five Data Sources, One Report

Source         Data
UniProt        Function, subcellular location, GO terms, diseases, keywords
RCSB PDB       Up to 50 experimental structures, filtered by resolution (default ≤ 4.0 Å), sorted by resolution ascending
AlphaFold DB   Predicted structure with pLDDT confidence coloring
ChEMBL         Bioactivity data (IC50, Ki, Kd, EC50) sorted by pChEMBL descending
STRING-DB      Protein–protein interaction network
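
The PDB rule in the table (filter by resolution, sort ascending, keep at most 50) can be sketched as below. This is an illustration, not TargetRecon's code: the `Structure` class, the `select_structures` helper, the placeholder IDs, and the choice to drop entries with no resolution are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Structure:
    pdb_id: str
    resolution: Optional[float]  # in Å; None when no resolution is reported


def select_structures(structures, max_resolution=4.0, limit=50):
    """Keep structures at or below max_resolution, best resolution first."""
    # Assumption: entries without a resolution value are dropped.
    kept = [
        s for s in structures
        if s.resolution is not None and s.resolution <= max_resolution
    ]
    kept.sort(key=lambda s: s.resolution)  # ascending: lower Å is better
    return kept[:limit]


# Placeholder entries with made-up IDs and resolutions.
demo = [
    Structure("AAAA", 3.2),
    Structure("BBBB", 1.8),
    Structure("CCCC", None),
    Structure("DDDD", 4.5),
]
selected = select_structures(demo)
print([s.pdb_id for s in selected])
```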

Intelligent ID Resolution

Accepts gene names, UniProt accessions, or ChEMBL target IDs:

targetrecon EGFR          # Gene name
targetrecon P00533        # UniProt accession
targetrecon CHEMBL203     # ChEMBL target ID
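
One way to tell the three query types apart is by pattern. The sketch below is not the package's resolver; `classify_query` is a hypothetical helper. It uses the accession pattern published by UniProt and assumes ChEMBL IDs look like `CHEMBL` followed by digits, with everything else treated as a gene name.

```python
import re

# Accession pattern as documented by UniProt (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumption: ChEMBL IDs are the literal prefix plus digits.
CHEMBL_RE = re.compile(r"^CHEMBL\d+$")


def classify_query(query: str) -> str:
    """Return 'chembl', 'uniprot', or 'gene' for a raw query string."""
    q = query.strip().upper()
    if CHEMBL_RE.match(q):
        return "chembl"
    if UNIPROT_RE.match(q):
        return "uniprot"
    return "gene"  # fallback: treat anything else as a gene name


for q in ("EGFR", "P00533", "CHEMBL203"):
    print(q, classify_query(q))
```

A gene name that happens to match the accession pattern would be misclassified by this approach, so a real resolver would likely confirm the guess against the remote service.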

Bioactivity Data

What is pChEMBL?

pChEMBL is a unified potency scale: the negative log₁₀ of the molar affinity. Higher = more potent.

pChEMBL = -log₁₀(affinity_M)

pChEMBL   Affinity   Interpretation
9         1 nM       Very potent
7         100 nM     Potent
6         1 µM       Moderate
< 5       > 10 µM    Weak

ChEMBL natively reports pChEMBL values. TargetRecon uses this scale directly.
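
To make the scale concrete, the conversion from a nanomolar value can be written as a short worked example. The helper name `pchembl_from_nm` is illustrative and is not part of TargetRecon.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert an affinity in nM to the pChEMBL scale: -log10(molar value)."""
    if value_nm <= 0:
        raise ValueError("affinity must be positive")
    return -math.log10(value_nm * 1e-9)  # nM -> M, then negative log10


# Prints 9.00, 7.00, 6.00 and 5.00 for the four values below.
for nm in (1, 100, 1_000, 10_000):
    print(f"{nm:>6} nM -> pChEMBL {pchembl_from_nm(nm):.2f}")
```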

Full pipeline โ€” what happens when you run a query

1. Resolve query → UniProt ID / ChEMBL target ID
         │
         ▼
2. Fetch in parallel (async):
   ├── ChEMBL API  → top-N bioactivity records, sorted by pChEMBL desc (server-side)
   ├── RCSB PDB    → experimental structures
   ├── AlphaFold   → predicted structure
   └── STRING DB   → protein interactions
         │
         ▼
3. Apply min_pchembl filter (if set) to ChEMBL records
         │
         ▼
4. Deduplicate by canonical SMILES → Ligand Summary
         │
         ▼
5. Sort Ligand Summary by best pChEMBL descending
         │
         ▼
6. Output: TargetReport (bioactivities + ligand_summary + structures + ...)
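
Step 2 of the pipeline above is a fan-out over independent sources. A minimal sketch of that pattern with `asyncio.gather` follows; `fetch_stub` and `recon_sketch` are stand-ins invented for this example and make no network calls, so they say nothing about how the real clients are written.

```python
import asyncio


async def fetch_stub(source: str, target: str) -> dict:
    """Stand-in for one API client; a real client would await an HTTP call."""
    await asyncio.sleep(0)  # yield to the event loop in place of network I/O
    return {"source": source, "target": target}


async def recon_sketch(target: str) -> dict:
    sources = ("chembl", "pdb", "alphafold", "string")
    # gather() runs the coroutines concurrently and returns results
    # in the same order as the inputs.
    results = await asyncio.gather(*(fetch_stub(s, target) for s in sources))
    return dict(zip(sources, results))


report = asyncio.run(recon_sketch("P00533"))
print(list(report))
```

Because the four requests do not depend on each other, total wall time is roughly that of the slowest source rather than the sum of all four.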

Fetching strategy

ChEMBL (server-side sort + pagination):

  • Sends order_by=-pchembl_value to the API
  • Only the top-N most potent records are fetched, so no API calls are wasted
  • Records with no pChEMBL value are excluded at the API level
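
A request of this shape can be sketched by building the query string by hand. Only `order_by=-pchembl_value` comes from the text above; the endpoint, the `pchembl_value__isnull` filter, and the `limit`/`offset` paging parameters are assumptions based on the public ChEMBL web services, and `build_activity_url` is a hypothetical helper. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumption: the public ChEMBL activity endpoint.
CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def build_activity_url(target_chembl_id: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a query for the most potent activities of one target."""
    params = {
        "target_chembl_id": target_chembl_id,
        "pchembl_value__isnull": "false",  # drop records without a pChEMBL value
        "order_by": "-pchembl_value",      # leading '-' requests descending order
        "limit": limit,                    # page size
        "offset": offset,                  # start of the page
    }
    return f"{CHEMBL_ACTIVITY_URL}?{urlencode(params)}"


url = build_activity_url("CHEMBL203", limit=100)
print(url)
```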

Cap behavior

The default is 1000 records:

Setting                     ChEMBL                 Total
Default (1000)              top 1000 most potent   up to 1000
--max-bioactivities 500     top 500 most potent    up to 500
--max-bioactivities all     all records            all available

Because sorting happens before the cap, you always get the most potent compounds, never a random subset.

Interface    No-limit syntax
CLI          --max-bioactivities all
Web UI       Drag the Max bioactivities slider to All
Python API   max_bioactivities=None
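
The CLI and Python conventions line up if the string `all` is mapped to `None`. The sketch below shows one way to do that mapping and apply the cap; `parse_max_bioactivities` and `apply_cap` are hypothetical names, not functions exported by the package.

```python
from typing import Optional


def parse_max_bioactivities(raw: str) -> Optional[int]:
    """Map a CLI-style value to the Python convention: 'all' -> None."""
    text = raw.strip().lower()
    if text == "all":
        return None
    value = int(text)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("expected a positive integer or 'all'")
    return value


def apply_cap(records: list, cap: Optional[int]) -> list:
    """Truncate an already-sorted list; None means no limit."""
    return records if cap is None else records[:cap]


ranked = [9.1, 8.4, 7.7, 6.2]  # made-up pChEMBL values, already sorted descending
print(apply_cap(ranked, parse_max_bioactivities("2")))
print(apply_cap(ranked, parse_max_bioactivities("all")))
```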

Ligand deduplication

ChEMBL records are grouped by canonical SMILES (via RDKit). If the same molecule appears in multiple assays:

  • It becomes one entry in the Ligand Summary
  • The best pChEMBL across all assays is kept
  • num_assays counts the total assay measurements

The final Ligand Summary is sorted by best pChEMBL descending, so the most potent unique compound is always first.
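
The grouping rule can be sketched in plain Python. To keep the example free of dependencies, the canonicalizer is a pluggable callable that defaults to `str.strip`; with RDKit installed, something like `Chem.MolToSmiles(Chem.MolFromSmiles(s))` would take its place. The record layout and the `summarize_ligands` name are assumptions made for illustration, and the values are made up.

```python
from typing import Callable, Iterable


def summarize_ligands(
    records: Iterable[dict],
    canonicalize: Callable[[str], str] = str.strip,
) -> list[dict]:
    """Group records by canonical SMILES, keeping the best pChEMBL per molecule."""
    summary: dict[str, dict] = {}
    for rec in records:
        key = canonicalize(rec["smiles"])
        entry = summary.setdefault(
            key, {"smiles": key, "best_pchembl": rec["pchembl"], "num_assays": 0}
        )
        entry["best_pchembl"] = max(entry["best_pchembl"], rec["pchembl"])
        entry["num_assays"] += 1  # every measurement counts, not only the best
    # Most potent unique molecule first.
    return sorted(summary.values(), key=lambda e: e["best_pchembl"], reverse=True)


demo_records = [
    {"smiles": "CCO", "pchembl": 6.1},
    {"smiles": "c1ccccc1", "pchembl": 7.4},
    {"smiles": "CCO", "pchembl": 8.0},
]
ligands = summarize_ligands(demo_records)
for lig in ligands:
    print(lig)
```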


CLI

pip install targetrecon

targetrecon / targetrecon run: Single target

targetrecon EGFR
targetrecon P00533 -f html -f json -f sdf -o ./reports/
targetrecon BRAF --min-pchembl 7.0 --max-resolution 2.5
targetrecon CDK2 --max-bioactivities 5000         # up to 5000 records
targetrecon CDK2 --max-bioactivities all          # no limit

Option                         Default         Description
-f, --format [json|html|sdf]   html json sdf   Output formats (repeat for multiple)
-o, --output PATH              .               Output directory
--max-resolution FLOAT         4.0             Max PDB resolution in Å (up to 50 structures returned, sorted by resolution)
--max-bioactivities INT|all    1000            Max ChEMBL bioactivity records; all = no limit
--min-pchembl FLOAT            none            Minimum pChEMBL value filter
--top-ligands INT              20              Number of top ligands for SDF export
-q, --quiet                    off             Suppress progress messages

targetrecon batch: Multiple targets

# Pass targets directly
targetrecon batch EGFR BRAF CDK2 ABL1

# From a file (one target per line, # = comment)
targetrecon batch -i targets.txt

# With filters and format selection
targetrecon batch -i targets.txt -f html -f sdf --min-pchembl 6.0 --skip-errors

# Unlimited bioactivities
targetrecon batch -i targets.txt --max-bioactivities all

Option                         Default           Description
-i, --input PATH               none              Text file, one target per line
-o, --output PATH              ./batch_reports   Output directory
-f, --format [json|html|sdf]   html json sdf     Output formats (repeat for multiple)
--max-resolution FLOAT         4.0               Max PDB resolution in Å (up to 50 structures returned, sorted by resolution)
--max-bioactivities INT|all    1000              Max ChEMBL bioactivity records; all = no limit
--min-pchembl FLOAT            none              Minimum pChEMBL value filter
--top-ligands INT              20                Ligands per SDF file
--skip-errors                  off               Continue if a single target fails
-q, --quiet                    off               Suppress progress messages

After completion, a summary table is printed showing structures / bioactivities / ligands per target.
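
The input file format (one target per line, `#` for comments) is simple enough to sketch a reader for. `parse_targets` is a hypothetical helper, and stripping trailing comments on the same line as a target is an assumption; the text above only says that `#` marks a comment.

```python
def parse_targets(text: str) -> list[str]:
    """Return the targets listed in a batch file, skipping blanks and comments."""
    targets = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop anything after '#'
        if entry:
            targets.append(entry)
    return targets


sample = """\
# kinase panel
EGFR
BRAF   # V600E is the common mutant

CDK2
"""
print(parse_targets(sample))
```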

targetrecon serve: Launch web interface

targetrecon serve                  # http://localhost:5000
targetrecon serve --port 8080
targetrecon serve --host 0.0.0.0   # expose on all interfaces

Option        Default   Description
--port INT    5000      Port to listen on
--host TEXT   0.0.0.0   Host to bind
--debug       off       Enable Flask debug mode

Web UI

targetrecon serve
# Open http://localhost:5000

  • Dark-themed interface with animated molecular backdrop
  • Search by gene name, UniProt accession, or ChEMBL target ID
  • Molecule sketcher (Ketcher): draw a structure to find matching targets
  • Sidebar controls: max PDB resolution, min pChEMBL, ChEMBL toggle, max bioactivities slider (100–5000, or drag to All for no limit)

Report tabs

Tab            Contents
Overview       UniProt summary, GO terms, diseases, protein stats
3D Viewer      AlphaFold pLDDT coloring + PDB experimental structures (3Dmol.js)
Bioactivity    pChEMBL distribution histogram, method breakdown chart
Ligands        Sortable table ranked by potency: SMILES, ChEMBL links, activity type, source
PDB            All experimental structures with resolution, method, ligand count
Interactions   STRING protein–protein interaction network (Cytoscape.js)

Export from the UI

Every report page has one-click download buttons:

  • JSON: full machine-readable report
  • HTML: self-contained interactive report (works fully offline)
  • SDF: top ligands with 3D conformers, ready for docking

AI Agent

An AI chat panel is available on every report page. Click the AI button (bottom-right corner) to open it.

Providers & models

Provider    Models
Anthropic   claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
OpenAI      gpt-4o, gpt-4o-mini
Groq        llama-3.3-70b, mixtral

  • Bring your own API key: keys are never stored and are discarded at the end of each browser session
  • Context-aware: the agent already knows the target you're looking at
  • Tools: search targets, fetch bioactivities, query PDB structures, protein interactions, compare targets
  • Streaming responses with stop button
  • Resizable and minimizable panel

Example questions

What are the best scaffolds for covalent inhibition?
Which PDB structures are most suitable for docking?
Compare the selectivity profile of this target vs CDK4.
Summarize the druggability of this target.

Python API

import targetrecon

# Single target (works in scripts and Jupyter)
report = targetrecon.recon("EGFR")
print(report.uniprot.protein_name)      # "Epidermal growth factor receptor"
print(report.num_pdb_structures)         # up to 50 (PDB structure cap)
print(report.num_bioactivities)          # up to 1000 (default ChEMBL cap)
print(report.best_ligand.best_pchembl)   # e.g. 10.52

# With options
report = targetrecon.recon(
    "BRAF",
    max_bioactivities=5000,   # up to 5000 ChEMBL records
    min_pchembl=7.0,
    max_pdb_resolution=2.5,
)

# No limit: fetch all available records
report = targetrecon.recon("BRAF", max_bioactivities=None)

# Async (for use with asyncio.run or inside async functions)
import asyncio
report = asyncio.run(targetrecon.recon_async("BRAF"))

# Async with all options
report = asyncio.run(targetrecon.recon_async(
    "CDK2",
    max_bioactivities=2000,
    # max_bioactivities=None  # no limit
))

Accessing the data

# UniProt info
report.uniprot.protein_name           # e.g. "Epidermal growth factor receptor"
report.uniprot.gene_name              # e.g. "EGFR"
report.uniprot.organism               # e.g. "Homo sapiens"
report.uniprot.function_description   # functional annotation text
report.uniprot.subcellular_locations  # list[str]
report.uniprot.disease_associations   # list[str]
report.uniprot.keywords               # list[str]
report.uniprot.go_terms               # list[GoTerm]; each has .go_id, .term, .category
report.uniprot.sequence_length        # int

# PDB structures
for pdb in report.pdb_structures[:5]:
    print(pdb.pdb_id, pdb.resolution, pdb.method)
    for lig in pdb.ligands:           # list[PDBLigand]; each has .ligand_id, .smiles, .name
        print(lig.ligand_id, lig.name)

# AlphaFold
report.alphafold.pdb_url        # URL to AlphaFold PDB structure
report.alphafold.model_url      # URL to AlphaFold CIF model

# Bioactivity records (sorted by pChEMBL descending)
for b in report.bioactivities[:10]:
    print(b.source, b.activity_type, b.value, b.pchembl_value, b.smiles)

# Ligand summary (deduplicated by canonical SMILES, sorted by best pChEMBL)
for lig in report.ligand_summary[:10]:
    print(lig.name, lig.chembl_id, lig.best_pchembl, lig.best_activity_type, lig.num_assays)
    print(lig.sources)                # e.g. ["ChEMBL"]

report.best_ligand               # most potent unique ligand overall

Export

from targetrecon.core import save_html, save_json, save_sdf

save_html(report, "EGFR_report.html")
save_json(report, "EGFR_report.json")

# SDF with filters
save_sdf(report, "EGFR_ligands.sdf",
         top_n=50,              # limit to top 50
         min_pchembl=7.0,       # only pChEMBL ≥ 7
         activity_type="IC50")  # only IC50 records

Batch (async, concurrent)

import asyncio, targetrecon

async def run_batch(targets):
    reports = await asyncio.gather(*[
        targetrecon.recon_async(t) for t in targets
    ])
    return reports

reports = asyncio.run(run_batch(["EGFR", "BRAF", "CDK2"]))
for r in reports:
    print(r.uniprot.gene_name, r.num_bioactivities, r.num_unique_ligands)
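
With a plain `asyncio.gather`, one failing target raises and the other results are lost. A hedged variant that bounds concurrency and collects failures separately, similar in spirit to `--skip-errors`, is sketched below. `fake_recon` is a stand-in so the example runs without network access; in practice `targetrecon.recon_async` would be passed as `recon`. The `run_batch_tolerant` name and `max_concurrent` parameter are illustrative.

```python
import asyncio


async def fake_recon(target: str) -> str:
    """Stand-in for targetrecon.recon_async; fails for one made-up name."""
    await asyncio.sleep(0)
    if target == "NOT_A_TARGET":
        raise ValueError(f"could not resolve {target}")
    return f"report for {target}"


async def run_batch_tolerant(targets, recon=fake_recon, max_concurrent=4):
    sem = asyncio.Semaphore(max_concurrent)  # limit simultaneous requests

    async def guarded(target):
        async with sem:
            return await recon(target)

    # return_exceptions=True puts exceptions in the result list instead of raising.
    results = await asyncio.gather(
        *(guarded(t) for t in targets), return_exceptions=True
    )
    ok = {t: r for t, r in zip(targets, results) if not isinstance(r, Exception)}
    failed = {t: r for t, r in zip(targets, results) if isinstance(r, Exception)}
    return ok, failed


ok, failed = asyncio.run(run_batch_tolerant(["EGFR", "NOT_A_TARGET", "CDK2"]))
print(sorted(ok), sorted(failed))
```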

Comparison

Feature                        TargetRecon   TargetDB (2020)   gget   Open Targets
AlphaFold integration          ✅            ❌                ✅     ✅ (web)
ChEMBL bioactivity             ✅            ✅                ❌     Partial
Interactive HTML report        ✅            ❌                ❌     Web only
3D structure viewer            ✅            ❌                ❌     Web only
Molecule sketcher → targets    ✅            ❌                ❌     ❌
Docking-ready SDF export       ✅            ❌                ❌     ❌
AI agent chat                  ✅            ❌                ❌     ❌
Batch CLI processing           ✅            ❌                ✅     N/A
pip install + single command   ✅            Partial           ✅     N/A

Installation

pip install targetrecon

Quick start:

targetrecon EGFR

Produces EGFR_report.html (interactive, self-contained), EGFR_report.json, and EGFR_top_ligands.sdf, ready for docking.

Development:

git clone https://github.com/nagarh/targetrecon.git
cd targetrecon
pip install -e ".[dev]"

Architecture

src/targetrecon/
├── cli.py           # Click CLI: run, batch, serve
├── webapp.py        # Flask web app: UI, report pages, AI agent routes
├── core.py          # Orchestration, aggregation, export (HTML/JSON/SDF)
├── models.py        # Pydantic data models
├── resolver.py      # Gene → UniProt → ChEMBL ID resolution
├── report.py        # Jinja2 HTML report generator (standalone)
├── agent_chat.py    # AI agent: tool definitions, streaming, multi-provider
└── clients/
    ├── uniprot.py     # UniProt REST API
    ├── pdb_client.py  # RCSB PDB REST + Search API
    ├── alphafold.py   # AlphaFold Database API
    ├── chembl.py      # ChEMBL REST API
    └── string_db.py   # STRING-DB REST API

Author

Hemantn Nagar 📧 hn533621@ohio.edu 🔗 github.com/nagarh


References

Data from: UniProt · RCSB PDB · AlphaFold DB · ChEMBL · STRING-DB

Visualization: 3Dmol.js · Chart.js · Cytoscape.js

Sketcher: Ketcher

Inspired by TargetDB and gget.


License

MIT License. See LICENSE for details.
