Normalize and cross-reference global public health data for medical hypothesis generation
hypokrates
Democratizing pharmacovigilance through open public health data.
hypokrates is an open-source Python library that normalizes and cross-references 15 global pharmacovigilance and drug safety data sources and exposes them via MCP (Model Context Protocol), so that anyone with access to an LLM can generate medical hypotheses.
Hippocrates observed a few patients. Today we can observe millions. What's missing is the tool to ask better questions.
The name comes from the original Greek spelling of Hippocrates (Hippokrates), who broke with the model of his era by making medical knowledge open instead of leaving it guarded by temple priests. The "hypo" prefix also evokes "hypothesis". The idea: public health data, collected with public money, normalized and cross-referenced in an open library so that any doctor in the world can generate hypotheses that save lives.
The problem
Medical knowledge discovery has a bottleneck: hypothesis generation.
Tools like OpenEvidence and PubMed solve literature search: finding what has been studied. They cannot find what has not been studied yet. Signals already exist in pharmacovigilance data (20M+ adverse event reports across 3 countries), molecular mechanism databases, and drug labels, but no one has cross-referenced them because the data lives in silos with different formats, vocabularies, and access patterns.
hypokrates cross-references FAERS + JADER + Canada Vigilance + PubMed + DailyMed + DrugBank + OpenTargets + ChEMBL + OnSIDES + PharmGKB + ClinicalTrials.gov + ANVISA and returns a structured hypothesis with an evidence level, in seconds.
Install
```shell
pip install hypokrates

# Optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "hypokrates[trials]"  # ClinicalTrials.gov (Cloudflare bypass via curl_cffi)
pip install "hypokrates[mcp]"     # MCP server (typer + mcp)
```
Quick start
```python
from hypokrates.config import configure

# Optional: API keys raise rate limits
configure(
    openfda_api_key="your-key",  # 40 -> 240 req/min
    ncbi_api_key="your-key",  # 180 -> 600 req/min
    ncbi_email="you@example.com",
)
```
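The request budgets in the comments above (for example 240 req/min) amount to a cap on calls per rolling minute. The following is a minimal sketch of a client-side sliding-window limiter that enforces such a cap. It is a generic pattern written for illustration; the class name `SlidingWindowLimiter` is hypothetical and this is not hypokrates' internal `http/` implementation.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_calls` within any rolling `period` seconds."""

    def __init__(self, max_calls: int, period: float = 60.0) -> None:
        self.max_calls = max_calls
        self.period = period
        self._calls: deque[float] = deque()  # timestamps of recent calls
        self._lock = threading.Lock()  # safe to share across threads

    def acquire(self) -> float:
        """Block until a call is allowed; return the total seconds spent waiting."""
        waited = 0.0
        while True:
            with self._lock:
                now = time.monotonic()
                # Forget calls that have left the rolling window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return waited
                # Wait until the oldest call in the window expires.
                sleep_for = self.period - (now - self._calls[0])
            time.sleep(sleep_for)
            waited += sleep_for


# e.g. the keyed openFDA budget quoted in the comment above
limiter = SlidingWindowLimiter(max_calls=240, period=60.0)
```

Calling `limiter.acquire()` before each request keeps a single process under the budget; the lock makes the same limiter safe to share across threads.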
Signal detection
Disproportionality analysis (PRR, ROR, IC, EBGM) for any drug-event pair:
```python
from hypokrates.sync import stats

result = stats.signal("sugammadex", "bradycardia")
print(f"PRR: {result.prr.value:.2f}")
print(f"Signal: {result.signal_detected}")  # >= 2/3 measures significant
```
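All of these measures start from a 2x2 table of report counts: a = reports with the drug and the event, b = drug with other events, c = other drugs with the event, d = everything else. The sketch below computes the textbook PRR and ROR with 95% confidence intervals, plus an information component (IC) with +0.5 shrinkage and its closed-form approximate 2.5% bound. The "2 of 3 significant" vote is an assumption about what the comment above means, not hypokrates' confirmed rule. EBGM is omitted because it requires fitting a Gamma-Poisson shrinkage prior. The names `Measure`, `disproportionality`, and `signal_by_vote` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    value: float
    lower: float  # lower bound of the 95% interval
    significant: bool


def disproportionality(a: int, b: int, c: int, d: int) -> dict[str, Measure]:
    """Illustrative PRR, ROR, and shrunk IC from a 2x2 table (all cells must be > 0)."""
    n = a + b + c + d

    # Proportional reporting ratio and log-normal 95% CI
    prr = (a / (a + b)) / (c / (c + d))
    se_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    prr_lo = math.exp(math.log(prr) - 1.96 * se_prr)

    # Reporting odds ratio and log-normal 95% CI
    ror = (a * d) / (b * c)
    se_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ror_lo = math.exp(math.log(ror) - 1.96 * se_ror)

    # Information component: log2 observed/expected with +0.5 shrinkage
    expected = (a + b) * (a + c) / n
    ic = math.log2((a + 0.5) / (expected + 0.5))
    # Closed-form approximation of the IC 2.5% credibility bound
    ic025 = ic - 3.3 * (a + 0.5) ** -0.5 - 2 * (a + 0.5) ** -1.5

    return {
        "PRR": Measure(prr, prr_lo, prr_lo > 1),
        "ROR": Measure(ror, ror_lo, ror_lo > 1),
        "IC": Measure(ic, ic025, ic025 > 0),
    }


def signal_by_vote(measures: dict[str, Measure], min_votes: int = 2) -> bool:
    """Hypothetical majority rule: flag a signal when >= min_votes measures are significant."""
    return sum(m.significant for m in measures.values()) >= min_votes
```

For a = 30, b = 970, c = 300, d = 98,700 the PRR is (30/1,000) / (300/99,000) = 9.9, and all three lower bounds clear their thresholds.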
Hypothesis generation
Cross-reference FAERS signal + PubMed + up to 10 optional sources:
```python
from hypokrates.sync import cross

result = cross.hypothesis(
    "sugammadex", "bradycardia",
    check_label=True,  # DailyMed FDA label
    check_trials=True,  # ClinicalTrials.gov
    check_chembl=True,  # ChEMBL mechanism
    check_opentargets=True,  # OpenTargets LRT score
    check_canada=True,  # Canada Vigilance cross-validation
    check_jader=True,  # JADER (Japan) cross-validation
)
print(result.classification)  # novel_hypothesis | emerging_signal | known_association | no_signal
print(result.summary)
```
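The four classification labels suggest a decision over three inputs: whether a reporting signal exists, how much literature already covers the pair, and whether the label lists the event. The sketch below is a hypothetical rule set with an arbitrary literature threshold (`max_emerging_papers`). It is only meant to make the meaning of each label concrete and is not hypokrates' actual classifier.

```python
from enum import Enum


class Classification(str, Enum):
    NOVEL_HYPOTHESIS = "novel_hypothesis"
    EMERGING_SIGNAL = "emerging_signal"
    KNOWN_ASSOCIATION = "known_association"
    NO_SIGNAL = "no_signal"


def classify(
    signal: bool, n_papers: int, in_label: bool, max_emerging_papers: int = 5
) -> Classification:
    """Hypothetical rules; the paper threshold is an assumption, not a library value."""
    if not signal:
        return Classification.NO_SIGNAL
    if in_label or n_papers > max_emerging_papers:
        return Classification.KNOWN_ASSOCIATION  # already documented
    if n_papers == 0:
        return Classification.NOVEL_HYPOTHESIS  # disproportionate reporting, no literature
    return Classification.EMERGING_SIGNAL  # signal backed by only a few papers
```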
Automated drug scanning
Scan top adverse events with parallel hypothesis generation:
```python
from hypokrates.sync import scan

result = scan.scan_drug(
    "sugammadex",
    top_n=15,
    check_labels=True,
    check_chembl=True,
    primary_suspect_only=True,  # PS-only role filter (bulk data)
    check_direction=True,  # base PRR vs PS-only comparison
)
for item in result.items:
    print(f"#{item.rank} {item.event}: {item.classification.value} (score={item.score:.1f})")
```
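Parallel scanning is essentially a bounded fan-out: run one hypothesis per event concurrently, cap how many hit upstream APIs at once, then rank by score. The sketch below shows that pattern with `asyncio.gather` and a semaphore. `score_event`, `scan_events`, and the `FAKE_SCORES` values are hypothetical placeholders, not hypokrates' scoring.

```python
import asyncio
from dataclasses import dataclass

# Made-up placeholder scores; a real scan would derive these from each hypothesis.
FAKE_SCORES = {"bradycardia": 7.5, "cardiac arrest": 9.0, "rash": 2.0}


@dataclass(frozen=True)
class ScanItem:
    rank: int
    event: str
    score: float


async def score_event(drug: str, event: str) -> float:
    """Hypothetical stand-in for one hypothesis run against remote sources."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_SCORES.get(event, 0.0)


async def scan_events(drug: str, events: list[str], max_concurrency: int = 5) -> list[ScanItem]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(event: str) -> tuple[str, float]:
        async with semaphore:  # cap simultaneous upstream requests
            return event, await score_event(drug, event)

    # gather() runs the tasks concurrently and returns results in input order
    results = await asyncio.gather(*(bounded(e) for e in events))
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return [ScanItem(rank=i, event=e, score=s) for i, (e, s) in enumerate(ranked, start=1)]


items = asyncio.run(scan_events("sugammadex", ["bradycardia", "rash", "cardiac arrest"]))
```

The semaphore matters more than the concurrency level itself: it keeps a scan of many events from exceeding the per-source rate limits.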
Data sources
15 sources, all publicly accessible, including spontaneous adverse event reports from 3 countries (USA, Canada, Japan):
| Source | Module | Coverage | Auth |
|---|---|---|---|
| OpenFDA/FAERS | faers | USA, 20M+ reports | Optional API key |
| FAERS Bulk | faers_bulk | USA, deduplicated | Local quarterly ZIPs |
| Canada Vigilance | canada | Canada, 738K+ reports | Local bulk download |
| JADER (PMDA) | jader | Japan, 1M+ reports | Local CSVs (free) |
| PubMed | pubmed | Global, 36M+ papers | Optional API key |
| DailyMed | dailymed | USA FDA labels | None |
| ClinicalTrials.gov | trials | Global | None (needs curl_cffi) |
| DrugBank | drugbank | Global | Local XML (free academic) |
| OpenTargets | opentargets | Global | None |
| ChEMBL | chembl | Global | None |
| OnSIDES | onsides | US/EU/UK/JP labels | Local CSVs (free) |
| PharmGKB | pharmgkb | Global pharmacogenomics | None |
| ANVISA | anvisa | Brazil drug registry | None (auto-download) |
| RxNorm | vocab | Drug name normalization | None |
| MeSH | vocab | Medical term mapping | None |
Demographic stratification
FAERS Bulk and Canada Vigilance support filtering by sex and age group:
```python
from hypokrates.faers_bulk.models import StrataFilter
from hypokrates.sync import faers_bulk

result = faers_bulk.bulk_signal(
    "rocuronium", "anaphylactic shock",
    strata=StrataFilter(sex="F", age_group="65+"),
)
```
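Conceptually, stratification restricts the report population before the 2x2 table is built, so every measure is computed within the stratum. The sketch below assumes a simplified in-memory `Report` record and hypothetical age bands (0-17, 18-64, 65+); hypokrates' bulk stores and its actual age groups may differ.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Report:
    drugs: frozenset[str]
    events: frozenset[str]
    sex: Optional[str]  # "F", "M", or None when not reported
    age: Optional[float]  # years, or None when not reported


def age_group(age: Optional[float]) -> Optional[str]:
    """Hypothetical age bands; the cut-offs are an assumption."""
    if age is None:
        return None
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


def stratified_counts(
    reports: Iterable[Report],
    drug: str,
    event: str,
    sex: Optional[str] = None,
    group: Optional[str] = None,
) -> tuple[int, int, int, int]:
    """Return the 2x2 cells (a, b, c, d) using only reports inside the stratum."""
    a = b = c = d = 0
    for r in reports:
        # Reports missing a stratifying field are excluded, not imputed.
        if sex is not None and r.sex != sex:
            continue
        if group is not None and age_group(r.age) != group:
            continue
        has_drug, has_event = drug in r.drugs, event in r.events
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d
```

The resulting counts feed the usual disproportionality formulas unchanged. Excluding reports with missing sex or age shrinks the stratum, which is one reason stratified signals are noisier than crude ones.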
Cross-country validation
Check the same drug-event pair across the USA, Canada, and Japan:
```python
from hypokrates.sync import stats, canada, jader

usa = stats.signal("rocuronium", "anaphylactic shock")
can = canada.canada_signal("rocuronium", "anaphylactic shock")
jpn = jader.jader_signal("rocuronium", "anaphylactic shock")
```
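Once each country returns a result, a simple replication summary asks in how many evaluable databases the signal holds. The sketch below assumes each result can be reduced to a case count and a PRR lower bound; `CountryResult`, `replication_summary`, and the `min_cases` cut-off are hypothetical and do not reflect the fields of hypokrates' result objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryResult:
    country: str
    n_cases: int  # reports mentioning both the drug and the event
    prr: float
    prr_lower: float  # lower bound of the 95% CI


def replication_summary(results: list[CountryResult], min_cases: int = 3) -> dict[str, object]:
    """Hypothetical summary: does the signal replicate wherever there is enough data?"""
    # Sparse databases are "not evaluable" rather than counted as disagreement.
    evaluable = [r for r in results if r.n_cases >= min_cases]
    positive = [r.country for r in evaluable if r.prr_lower > 1]
    return {
        "evaluable": len(evaluable),
        "replicated_in": positive,
        "consistent": bool(evaluable) and len(positive) == len(evaluable),
    }
```

Treating low-count databases as not evaluable avoids reading a small country's silence as evidence against the signal; differences in reporting culture still call for caution even when the summary is consistent.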
MCP Server
44 tools are available for LLM integration via the Model Context Protocol. Start the server with:
```shell
python -m hypokrates.mcp
```
Configure in Claude Desktop, Cursor, or any MCP client:
```json
{
  "mcpServers": {
    "hypokrates": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "hypokrates.mcp"],
      "env": {
        "OPENFDA_API_KEY": "your-key",
        "NCBI_API_KEY": "your-key",
        "NCBI_EMAIL": "you@example.com",
        "DRUGBANK_PATH": "/path/to/drugbank.xml",
        "ONSIDES_PATH": "/path/to/onsides/csvs/",
        "CANADA_BULK_PATH": "/path/to/canada/extracted/",
        "JADER_BULK_PATH": "/path/to/jader/csvs/",
        "FAERS_BULK_DIR": "/path/to/faers/quarterly/"
      }
    }
  }
}
```
- Core tools: `signal`, `hypothesis`, `scan_drug`, `compare_signals`, `compare_class`
- Source tools: `adverse_events`, `top_events`, `drugs_by_event`, `search_papers`, `label_events`, `check_label`, `search_trials`, `drug_info`, `drug_interactions`, `drug_mechanism`, `drug_adverse_events`, `drug_safety_score`, `onsides_events`, `pgx_annotations`, `normalize_drug`, `map_to_mesh`
- Bulk tools: `faers_bulk_signal`, `faers_bulk_load`, `canada_signal`, `canada_top_events`, `jader_signal`, `jader_top_events`
Architecture
```text
hypokrates/
├── faers/         # OpenFDA FAERS API (adverse events, co-suspect detection)
├── faers_bulk/    # FAERS quarterly ASCII (dedup, role filter, strata)
├── stats/         # Disproportionality measures (PRR, ROR, IC, EBGM)
├── cross/         # Hypothesis generation (signal + literature + enrichments)
├── scan/          # Automated drug scanning with scoring
├── evidence/      # Evidence blocks with provenance and limitations
├── pubmed/        # PubMed/NCBI E-utilities
├── vocab/         # RxNorm normalization + MedDRA synonym grouping
├── dailymed/      # FDA label parsing (SPL XML)
├── trials/        # ClinicalTrials.gov (curl_cffi for Cloudflare)
├── drugbank/      # DrugBank XML (mechanism, interactions, enzymes)
├── opentargets/   # OpenTargets Platform (GraphQL, LRT scores)
├── chembl/        # ChEMBL (mechanism, targets, metabolism)
├── onsides/       # OnSIDES international labels (NLP-extracted)
├── pharmgkb/      # PharmGKB pharmacogenomics (CPIC/DPWG guidelines)
├── canada/        # Canada Vigilance (cross-country validation)
├── jader/         # JADER/PMDA Japan (cross-country, JP→EN translation)
├── anvisa/        # ANVISA Brazil (drug registry, PT↔EN mapping)
├── cache/         # DuckDB HTTP cache (thread-safe singleton)
├── http/          # BaseClient with retry, rate limiting, auth
└── mcp/           # MCP server (44 tools)
```
Async-first with sync wrappers. DuckDB for cache and bulk stores. Pydantic 2 for all models. mypy strict. 1349 tests.
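"Async-first with sync wrappers" usually means each coroutine gets a blocking twin for scripts and notebooks. The following is a minimal sketch of that pattern, assuming the wrapper simply drives the coroutine with `asyncio.run`. The names `make_sync` and `normalize_drug_name` are hypothetical; the source does not show how the `hypokrates.sync` modules are actually built.

```python
import asyncio
import functools
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def make_sync(fn: Callable[..., Coroutine[Any, Any, T]]) -> Callable[..., T]:
    """Wrap a coroutine function so callers can use it without managing an event loop."""

    @functools.wraps(fn)  # keep the original name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> T:
        # asyncio.run() creates a fresh event loop per call and closes it afterwards.
        # It raises RuntimeError if a loop is already running (e.g. inside Jupyter).
        return asyncio.run(fn(*args, **kwargs))

    return wrapper


async def normalize_drug_name(name: str) -> str:
    """Hypothetical async call; a real one would query a vocabulary service."""
    await asyncio.sleep(0)
    return name.strip().lower()


normalize_drug_name_sync = make_sync(normalize_drug_name)
```

The trade-off of this simple approach is one event loop per call: fine for interactive use, but code that is already async should await the coroutine directly.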
Who is this for
- The anesthesiologist who saw a pattern in patients and wants to know if it's real
- The researcher at a public university without a bioinformatics team
- The resident who disagrees with a protocol and wants data to support their case
- The doctor in rural Brazil who can't access institutional research infrastructure
- Any medical professional with access to an LLM who wants to generate evidence-based hypotheses
Important disclaimers
- Not for clinical use. hypokrates generates hypotheses, not diagnoses.
- PRR is not absolute risk. Disproportionality measures detect reporting patterns, not causation.
- FAERS is voluntary reporting. Underreporting is systematic. Absence of signal does not mean absence of risk.
- Cross-country comparison requires caution. Different reporting cultures, populations, and healthcare systems.
- Every output includes explicit limitations and confidence levels.
Status
Alpha (v0.7.0) — 1349 tests, mypy strict, ruff clean. Under active development.
License
AGPL-3.0-only — Public data, public code, public benefit.
"First, do no harm." — Hippocratic Oath
"First, make the data accessible." — hypokrates
Download files
File details
Details for the file hypokrates-0.7.0.tar.gz.
File metadata
- Download URL: hypokrates-0.7.0.tar.gz
- Upload date:
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 76fdae678c60e5b9b3733d8c6c8e50fb2b97e58f2a2009d24c1fdaa598c38091 |
| MD5 | 7a4fa6e634b83a2eafdb8c5f96b872cd |
| BLAKE2b-256 | 2c107971fdce367044c7419c4567395934426e828b1fdb635dcbb050ef568556 |
File details
Details for the file hypokrates-0.7.0-py3-none-any.whl.
File metadata
- Download URL: hypokrates-0.7.0-py3-none-any.whl
- Upload date:
- Size: 241.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42764704c1771ee97aa8daeee2c64009addb5eb6f0cbd7a46f737711de6d2c23 |
| MD5 | 99326ed3b2a82a7c370ce9562add30cf |
| BLAKE2b-256 | 5440dabca3a70eebf2387c188d855d9df050ca252f60186b57a55ca8fe03e0e9 |