The most precise FHIR agent for Latin America

SaludAI

The first open-source FHIR reasoning agent with multi-LLM benchmarks and LATAM localization — every reasoning step auditable via Langfuse, designed for public health systems.

CI · License: Apache 2.0 · Benchmark: 84% · Python 3.12+ · Tests: 696 · Coverage: 95%


Ask: "Pacientes con diabetes tipo 2 mayores de 60 en Buenos Aires"

Get: A structured, sourced answer — with every reasoning step traced in Langfuse.

SaludAI translates clinical questions in natural language into FHIR R4 API calls, resolves medical terminology (SNOMED CT, CIE-10, LOINC, ATC), navigates multi-resource references, and returns traceable answers. Built and tested against Argentine synthetic data on HAPI FHIR. Designed to extend to other FHIR-compliant systems via locale packs.
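To make the translation concrete, here is a hypothetical sketch of the kind of FHIR R4 search the agent might emit for the example query above. The chained search parameters and the SNOMED CT code 44054006 (diabetes mellitus type 2) are illustrative assumptions, not SaludAI's actual output:

```python
# Illustrative only: one plausible FHIR R4 search for
# "Pacientes con diabetes tipo 2 mayores de 60 en Buenos Aires".
# Codes and chained parameters are assumptions, not agent output.
from urllib.parse import urlencode

params = {
    "code": "http://snomed.info/sct|44054006",       # diabetes mellitus type 2
    "subject:Patient.birthdate": "le1965-12-31",     # older than 60 (as of 2025)
    "subject:Patient.address-state": "Buenos Aires",
    "_include": "Condition:subject",                 # also return the Patient resources
}
url = "http://localhost:8890/fhir/Condition?" + urlencode(params)
print(url)
```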

Multi-LLM Benchmark

Evaluated on 100 questions across 200 synthetic Argentine patients (3,182 FHIR resources, 10 resource types). Inspired by FHIR-AgentBench (Verily/KAIST/MIT).

Model Accuracy Simple (16) Medium (41) Complex (43) Errors* P50 Latency
Claude Sonnet 4.5 84.0% 94% 93% 72% 8 12.7s
Claude Haiku 4.5 77.0% 100% 80% 65% 7 6.6s
GPT-4o 63.0% 100% 73% 40% 3 14.4s
Llama 3.3 70B 48.0% 94% 63% 16% 9 6.5s
Qwen 3.5 9B 25.0% 50% 29% 12% 1 11.8s

*Errors = agent exceeded iteration budget (8 steps) and could not produce an answer. These count as incorrect in the accuracy score.

All models use the same agent loop, tools, and system prompt. Differences reflect reasoning ability, tool calling reliability, and schema handling. Questions cover terminology resolution, multi-hop reference traversal, server-side counting, aggregation, and temporal filtering across 10 FHIR resource types.

Benchmark scope: 100 internally written questions evaluated on 200 synthetic patients with curated terminology codes. This benchmark tracks our development progress; it is not comparable to clinical benchmarks like FHIR-AgentBench (2,931 clinician-written questions on real de-identified data). We plan to evaluate against their public dataset. See the experiment log for detailed methodology and per-question analysis.
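As a sanity check on the table above, each model's overall accuracy is just the question-weighted mean of its per-tier accuracies (tier sizes taken from the column headers):

```python
# Question-weighted overall accuracy, using Claude Sonnet 4.5's
# per-tier scores from the table above.
tiers = {"simple": (16, 0.94), "medium": (41, 0.93), "complex": (43, 0.72)}

total_questions = sum(n for n, _ in tiers.values())
overall = sum(n * acc for n, acc in tiers.values()) / total_questions
print(f"{overall:.0%}")  # → 84%
```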

Architecture

graph TB
    User["User (NL query)"]

    subgraph Agent["saludai-agent"]
        Planner["Query Planner<br/>(FHIR knowledge graph)"]
        Loop["Agent Loop<br/>(plan → execute → evaluate)"]
        Tools["Tools"]
    end

    subgraph ToolSet["Tool Registry"]
        T1["resolve_terminology<br/>SNOMED CT · CIE-10 · LOINC · ATC"]
        T2["search_fhir<br/>with auto-pagination"]
        T3["count_fhir<br/>server-side _summary=count"]
        T4["get_resource<br/>direct reference lookup"]
        T5["execute_code<br/>sandboxed Python"]
    end

    subgraph Core["saludai-core"]
        FHIR["FHIR Client (httpx)"]
        Term["Terminology Resolver<br/>(rapidfuzz)"]
        QB["Query Builder"]
        Locale["Locale Pack (AR)"]
    end

    HAPI["HAPI FHIR R4<br/>200 patients · 3,182 resources"]
    LLM["LLM Provider<br/>(Anthropic · OpenAI · Ollama)"]
    Langfuse["Langfuse<br/>(observability)"]

    User --> Loop
    Loop --> Planner
    Planner --> Loop
    Loop --> Tools
    Tools --> T1 & T2 & T3 & T4 & T5
    T1 --> Term
    T2 & T3 & T4 --> FHIR
    T5 --> Loop
    FHIR --> HAPI
    Loop --> LLM
    Loop --> Langfuse
    Term --> Locale
    QB --> Locale

Key design decisions:

  • No LangChain. The agent loop is ~300 lines of Python. Every step is auditable and traceable. We chose simplicity over framework magic — see ADR-002.
  • Hybrid Query Planner. A plan-and-execute pattern with a FHIR knowledge graph (resource relationships + query pattern catalog). The planner classifies the question and selects a strategy before the agent starts calling tools — see ADR-009.
  • Action Space Reduction. Instead of suggesting tools via prompt, we remove irrelevant tools from the LLM's context based on the query plan. The model can't misuse what it can't see.
  • Provider-agnostic. Same agent loop works with Claude, GPT-4o, Llama, or Qwen. Swap the model, keep everything else.
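Action space reduction can be sketched in a few lines. The strategy names and tool groupings below are hypothetical placeholders, not SaludAI's actual planner catalog (see ADR-009 for the real design); the point is that the tool list handed to the LLM is filtered, not merely hinted at in the prompt:

```python
# Hypothetical sketch of action-space reduction: strategy names and
# tool groupings are illustrative, not SaludAI's actual identifiers.
ALL_TOOLS = {
    "resolve_terminology", "search_fhir", "count_fhir",
    "get_resource", "execute_code",
}

# Which tools each plan strategy is allowed to expose to the LLM.
STRATEGY_TOOLS = {
    "count":  {"resolve_terminology", "count_fhir"},
    "lookup": {"resolve_terminology", "search_fhir", "get_resource"},
}

def tools_for_plan(strategy: str) -> set[str]:
    """Return only the tools the LLM will see for this plan; fall back to all."""
    return STRATEGY_TOOLS.get(strategy, ALL_TOOLS)

print(sorted(tools_for_plan("count")))  # → ['count_fhir', 'resolve_terminology']
```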

Quick Start

# Clone and install
git clone https://github.com/saludai-labs/saludai.git
cd saludai
uv sync

# Start HAPI FHIR with 200 synthetic Argentine patients
docker compose up -d

# Wait ~30s for seeding, then verify
curl "http://localhost:8890/fhir/Patient?_summary=count"

# Run the agent
uv run saludai query "¿Cuántos pacientes tienen diabetes tipo 2?"

# Run the benchmark
uv run python -m benchmarks.run_eval

# Run tests (696 tests, 95% coverage)
uv run pytest

Prerequisites: Python 3.12+, uv, Docker

Usage

MCP Server (Claude Desktop / Claude Code / Cursor)

SaludAI exposes its tools via the Model Context Protocol:

# Start MCP server (stdio transport)
uv run saludai-mcp

Add to your MCP client config (claude_desktop_config.json):

{
  "mcpServers": {
    "saludai": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/saludai", "saludai-mcp"],
      "env": {
        "SALUDAI_FHIR_SERVER_URL": "http://localhost:8890/fhir"
      }
    }
  }
}

REST API

uv run saludai serve
# POST http://localhost:8000/query {"query": "Pacientes con hipertensión en Córdoba"}
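A minimal standard-library client sketch for the endpoint above. The request is built but not sent here; the response schema is not documented on this page, so none is assumed:

```python
# Build a POST request for the /query endpoint shown above.
# Uncomment the last line to send it against a running server.
import json
from urllib import request

payload = json.dumps({"query": "Pacientes con hipertensión en Córdoba"}).encode("utf-8")
req = request.Request(
    "http://localhost:8000/query",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
# with request.urlopen(req) as resp: print(json.load(resp))
```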

CLI

uv run saludai query "Medicaciones más frecuentes en pacientes mayores de 70"

Project Structure

saludai/
├── packages/
│   ├── saludai-core/       # FHIR client, terminology resolver, query builder, locale packs
│   ├── saludai-agent/      # Agent loop, planner, tools, LLM abstraction
│   ├── saludai-mcp/        # MCP server (Claude Desktop, Cursor, etc.)
│   └── saludai-api/        # FastAPI REST interface
├── benchmarks/             # 100-question eval framework + results
├── data/seed/              # Deterministic synthetic data generator (200 AR patients)
├── notebooks/              # Interactive Jupyter demos (3 notebooks)
└── docs/                   # Architecture, ADRs, experiments, roadmap

Built for Latin America

SaludAI is open source, auditable, and self-hostable — built for Argentina's health system, with an architecture designed to scale across Latin America:

  • Argentine terminology: SNOMED CT Argentine edition, CIE-10 (Argentine adaptation), LOINC, ATC — with fuzzy matching via rapidfuzz
  • Locale packs: Country-specific bundles of terminology, system prompts, and FHIR metadata. Argentina ships built-in; add your country by implementing a locale pack
  • openRSD-aware: Locale pack references Argentina's national FHIR profiles
  • Synthetic data that looks real: 200 patients with Argentine names, DNI, 18 provinces weighted by population
  • Spanish-first prompts: The agent reasons in the language of the data

Loading a locale pack:

from saludai_core.locales import load_locale_pack

pack = load_locale_pack("ar")  # SNOMED CT AR + CIE-10 AR + LOINC + ATC

Observability

Every agent run is fully traced in Langfuse:

  • Query plan generation (planner output)
  • Each iteration: LLM call, tool selection, tool execution, result
  • Token usage and cost per step
  • Final answer with evaluation score

Set up Langfuse Cloud (free tier) or self-hosted:

export LANGFUSE_PUBLIC_KEY=pk-...
export LANGFUSE_SECRET_KEY=sk-...
export LANGFUSE_HOST=https://cloud.langfuse.com

Notebooks

Notebook Description
01-getting-started FHIR client, terminology resolver, query builder
02-agent-queries Natural language queries with the agent loop
03-benchmark-eval Run and analyze the 100-question benchmark evaluation

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, code style, and PR guidelines.

License

Apache 2.0 — see LICENSE for details.
