The most precise FHIR agent for Latin America
SaludAI
The first open-source FHIR reasoning agent with multi-LLM benchmarks and LATAM localization — every reasoning step auditable via Langfuse, designed for public health systems.
Ask: "Pacientes con diabetes tipo 2 mayores de 60 en Buenos Aires" ("Type 2 diabetes patients over 60 in Buenos Aires")
Get: A structured, sourced answer — with every reasoning step traced in Langfuse.
SaludAI translates clinical questions in natural language into FHIR R4 API calls, resolves medical terminology (SNOMED CT, CIE-10, LOINC, ATC), navigates multi-resource references, and returns traceable answers. Built and tested against Argentine synthetic data on HAPI FHIR. Designed to extend to other FHIR-compliant systems via locale packs.
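For intuition, here is a hedged sketch of the kind of FHIR R4 search requests an agent might compose for the example question above. The SNOMED CT code 44054006 is the standard code for type 2 diabetes mellitus, and the search parameters follow standard FHIR R4 semantics — but the exact calls SaludAI emits may differ.

```python
# Hypothetical FHIR R4 searches for "Pacientes con diabetes tipo 2
# mayores de 60 en Buenos Aires". Illustrative only.
from urllib.parse import urlencode

BASE = "http://localhost:8890/fhir"

# Step 1: Condition resources coded as type 2 diabetes (SNOMED CT 44054006),
# pulling the referenced Patient resources in the same bundle.
condition_search = f"{BASE}/Condition?" + urlencode({
    "code": "http://snomed.info/sct|44054006",
    "_include": "Condition:subject",
})

# Step 2: filter patients by age and city
# (birthdate cutoff is an illustrative stand-in for "mayores de 60").
patient_search = f"{BASE}/Patient?" + urlencode({
    "birthdate": "le1965-01-01",
    "address-city": "Buenos Aires",
})

print(condition_search)
print(patient_search)
```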
Multi-LLM Benchmark
Evaluated on 100 questions across 200 synthetic Argentine patients (3,182 FHIR resources, 10 resource types). Inspired by FHIR-AgentBench (Verily/KAIST/MIT).
| Model | Accuracy | Simple (16) | Medium (41) | Complex (43) | Errors* | P50 Latency |
|---|---|---|---|---|---|---|
| Claude Sonnet 4.5 | 84.0% | 94% | 93% | 72% | 8 | 12.7s |
| Claude Haiku 4.5 | 77.0% | 100% | 80% | 65% | 7 | 6.6s |
| GPT-4o | 63.0% | 100% | 73% | 40% | 3 | 14.4s |
| Llama 3.3 70B | 48.0% | 94% | 63% | 16% | 9 | 6.5s |
| Qwen 3.5 9B | 25.0% | 50% | 29% | 12% | 1 | 11.8s |
*Errors = agent exceeded iteration budget (8 steps) and could not produce an answer. These count as incorrect in the accuracy score.
All models use the same agent loop, tools, and system prompt. Differences reflect reasoning ability, tool calling reliability, and schema handling. Questions cover terminology resolution, multi-hop reference traversal, server-side counting, aggregation, and temporal filtering across 10 FHIR resource types.
Benchmark scope: 100 internally-written questions evaluated on 200 synthetic patients with curated terminology codes. This benchmark tracks our development progress — it is not comparable to clinical benchmarks like FHIR-AgentBench (2,931 clinician-written questions on real de-identified data). We plan to evaluate against their public dataset. See experiment log for detailed methodology and per-question analysis.
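The overall accuracy column is simply the question-weighted average of the three tier scores (16 simple, 41 medium, 43 complex). A quick check:

```python
# Sanity-check: overall accuracy = question-weighted average of tier scores.
TIERS = {"simple": 16, "medium": 41, "complex": 43}

def overall(simple: int, medium: int, complex_: int) -> int:
    """Weighted accuracy (%) across the three difficulty tiers."""
    total = (simple * TIERS["simple"]
             + medium * TIERS["medium"]
             + complex_ * TIERS["complex"])
    return round(total / sum(TIERS.values()))

print(overall(94, 93, 72))   # Claude Sonnet 4.5 -> 84
print(overall(100, 80, 65))  # Claude Haiku 4.5  -> 77
```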
Architecture
```mermaid
graph TB
  User["User (NL query)"]
  subgraph Agent["saludai-agent"]
    Planner["Query Planner<br/>(FHIR knowledge graph)"]
    Loop["Agent Loop<br/>(plan → execute → evaluate)"]
    Tools["Tools"]
  end
  subgraph ToolSet["Tool Registry"]
    T1["resolve_terminology<br/>SNOMED CT · CIE-10 · LOINC · ATC"]
    T2["search_fhir<br/>with auto-pagination"]
    T3["count_fhir<br/>server-side _summary=count"]
    T4["get_resource<br/>direct reference lookup"]
    T5["execute_code<br/>sandboxed Python"]
  end
  subgraph Core["saludai-core"]
    FHIR["FHIR Client (httpx)"]
    Term["Terminology Resolver<br/>(rapidfuzz)"]
    QB["Query Builder"]
    Locale["Locale Pack (AR)"]
  end
  HAPI["HAPI FHIR R4<br/>200 patients · 3,182 resources"]
  LLM["LLM Provider<br/>(Anthropic · OpenAI · Ollama)"]
  Langfuse["Langfuse<br/>(observability)"]
  User --> Loop
  Loop --> Planner
  Planner --> Loop
  Loop --> Tools
  Tools --> T1 & T2 & T3 & T4 & T5
  T1 --> Term
  T2 & T3 & T4 --> FHIR
  T5 --> Loop
  FHIR --> HAPI
  Loop --> LLM
  Loop --> Langfuse
  Term --> Locale
  QB --> Locale
```
Key design decisions:
- No LangChain. The agent loop is ~300 lines of Python. Every step is auditable and traceable. We chose simplicity over framework magic — see ADR-002.
- Hybrid Query Planner. A plan-and-execute pattern with a FHIR knowledge graph (resource relationships + query pattern catalog). The planner classifies the question and selects a strategy before the agent starts calling tools — see ADR-009.
- Action Space Reduction. Instead of suggesting tools via prompt, we remove irrelevant tools from the LLM's context based on the query plan. The model can't misuse what it can't see.
- Provider-agnostic. Same agent loop works with Claude, GPT-4o, Llama, or Qwen. Swap the model, keep everything else.
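The design decisions above can be sketched in a few lines. Everything here (`Plan`, `TOOLS`, `run_agent`, the stub LLM) is illustrative, not SaludAI's actual internals — it shows the plan → execute → evaluate shape, the 8-step iteration budget, and action-space reduction via a filtered tool dict.

```python
# Minimal sketch of a plan -> execute -> evaluate agent loop with
# action-space reduction. All names are illustrative, not SaludAI's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Plan:
    strategy: str
    tool_names: list[str]          # tools the planner deemed relevant

TOOLS: dict[str, Callable[[str], str]] = {
    "resolve_terminology": lambda arg: f"snomed:44054006 ({arg})",
    "count_fhir": lambda arg: "count=17",   # stand-in for a _summary=count search
}

def run_agent(question: str, plan: Plan, call_llm, max_iters: int = 8) -> str:
    # Action-space reduction: only planner-selected tools are visible to the LLM.
    visible = {name: TOOLS[name] for name in plan.tool_names}
    history = [f"question: {question}", f"strategy: {plan.strategy}"]
    for _ in range(max_iters):
        action, arg = call_llm(history, list(visible))
        if action == "answer":      # the model decided it has enough evidence
            return arg
        history.append(f"{action} -> {visible[action](arg)}")
    raise RuntimeError("iteration budget exceeded")   # scored as an error

# Stub LLM: call count_fhir once, then answer from the observed result.
def stub_llm(history, tool_names):
    if any(h.startswith("count_fhir") for h in history):
        return ("answer", "17 patients")
    return (tool_names[0], "diabetes tipo 2")

print(run_agent("¿Cuántos pacientes tienen diabetes tipo 2?",
                Plan("count", ["count_fhir"]), stub_llm))  # prints "17 patients"
```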
Quick Start
```bash
# Clone and install
git clone https://github.com/saludai-labs/saludai.git
cd saludai
uv sync

# Start HAPI FHIR with 200 synthetic Argentine patients
docker compose up -d

# Wait ~30s for seeding, then verify (quote the URL so the shell
# doesn't interpret ? as a glob)
curl "http://localhost:8890/fhir/Patient?_summary=count"

# Run the agent ("How many patients have type 2 diabetes?")
uv run saludai query "¿Cuántos pacientes tienen diabetes tipo 2?"

# Run the benchmark
uv run python -m benchmarks.run_eval

# Run tests (696 tests, 95% coverage)
uv run pytest
```
Prerequisites: Python 3.12+, uv, Docker
Usage
MCP Server (Claude Desktop / Claude Code / Cursor)
SaludAI exposes its tools via the Model Context Protocol:
```bash
# Start MCP server (stdio transport)
uv run saludai-mcp
```
Add to your MCP client config (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "saludai": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/saludai", "saludai-mcp"],
      "env": {
        "SALUDAI_FHIR_SERVER_URL": "http://localhost:8890/fhir"
      }
    }
  }
}
```
REST API
```bash
uv run saludai serve
# POST http://localhost:8000/query  {"query": "Pacientes con hipertensión en Córdoba"}
```
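A minimal stdlib client for the endpoint above, as a sketch — the request body matches the example, but the response schema is an assumption, not documented here:

```python
# Hypothetical client for the SaludAI REST endpoint. The response shape
# is assumed, not taken from the documented API schema.
import json
import urllib.request

def build_request(query: str, base_url: str = "http://localhost:8000"):
    """Build a POST /query request carrying a natural-language question."""
    body = json.dumps({"query": query}).encode()
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Pacientes con hipertensión en Córdoba")
print(req.full_url)

# To actually send it (requires `uv run saludai serve` to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```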
CLI
```bash
uv run saludai query "Medicaciones más frecuentes en pacientes mayores de 70"
```
Project Structure
```text
saludai/
├── packages/
│   ├── saludai-core/   # FHIR client, terminology resolver, query builder, locale packs
│   ├── saludai-agent/  # Agent loop, planner, tools, LLM abstraction
│   ├── saludai-mcp/    # MCP server (Claude Desktop, Cursor, etc.)
│   └── saludai-api/    # FastAPI REST interface
├── benchmarks/         # 100-question eval framework + results
├── data/seed/          # Deterministic synthetic data generator (200 AR patients)
├── notebooks/          # Interactive Jupyter demos (3 notebooks)
└── docs/               # Architecture, ADRs, experiments, roadmap
```
Built for Latin America
SaludAI is open source, auditable, and self-hostable — built for Argentina's health system, with an architecture designed to scale across Latin America:
- Argentine terminology: SNOMED CT Argentine edition, CIE-10 (Argentine adaptation), LOINC, ATC — with fuzzy matching via rapidfuzz
- Locale packs: Country-specific bundles of terminology, system prompts, and FHIR metadata. Argentina ships built-in; add your country by implementing a locale pack
- openRSD-aware: Locale pack references Argentina's national FHIR profiles
- Synthetic data that looks real: 200 patients with Argentine names, DNI, 18 provinces weighted by population
- Spanish-first prompts: The agent reasons in the language of the data
```python
from saludai_core.locales import load_locale_pack

pack = load_locale_pack("ar")  # SNOMED CT AR + CIE-10 AR + LOINC + ATC
```
Observability
Every agent run is fully traced in Langfuse:
- Query plan generation (planner output)
- Each iteration: LLM call, tool selection, tool execution, result
- Token usage and cost per step
- Final answer with evaluation score
Set up Langfuse Cloud (free tier) or self-hosted:
```bash
export LANGFUSE_PUBLIC_KEY=pk-...
export LANGFUSE_SECRET_KEY=sk-...
export LANGFUSE_HOST=https://cloud.langfuse.com
```
Notebooks
| Notebook | Description |
|---|---|
| 01-getting-started | FHIR client, terminology resolver, query builder |
| 02-agent-queries | Natural language queries with the agent loop |
| 03-benchmark-eval | Run and analyze the 100-question multi-LLM benchmark evaluation |
Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, code style, and PR guidelines.
License
Apache 2.0 — see LICENSE for details.
File details
Details for the file saludai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: saludai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c62cd669ac828593653d50a8c0a6227c75731158f80bc3a0c5b4343361763c7 |
| MD5 | 4a185eb8894b6bf9526bccabe8496521 |
| BLAKE2b-256 | 27b45d17a078d47414b98b7ced1250cf14c96e5d74fe6e53b494eeee48518155 |
Provenance
The following attestation bundles were made for saludai-0.1.0-py3-none-any.whl:
Publisher: publish.yml on saludai-labs/saludai
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: saludai-0.1.0-py3-none-any.whl
- Subject digest: 9c62cd669ac828593653d50a8c0a6227c75731158f80bc3a0c5b4343361763c7
- Sigstore transparency entry: 1201648561
- Sigstore integration time:
- Permalink: saludai-labs/saludai@eda559d1986dd792cfb1a3572aa543ca39a799c9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/saludai-labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@eda559d1986dd792cfb1a3572aa543ca39a799c9
- Trigger Event: release