Portiere
AI-Powered Clinical and Health Data Mapping Tool
Documentation · Quick Start · Examples · Issues
What is Portiere?
Mapping clinical data to standard models like OMOP CDM, FHIR R4, HL7 v2, and OpenEHR is one of the most time-consuming and error-prone tasks in health informatics. It typically requires domain experts to manually map hundreds of source fields and thousands of clinical codes — a process that can take weeks or months.
Portiere automates this with an AI-powered five-stage pipeline covering ingestion and profiling, schema mapping, concept mapping, ETL generation, and data quality validation — all running locally on your machine with no cloud dependency required.
flowchart LR
A["Source Data"] --> B["Ingest & Profile (Stage 1)"]
B --> C["Schema Mapping (Stage 2)"]
C --> D["Concept Mapping (Stage 3)"]
D --> E["ETL Generation (Stage 4)"]
E --> F["Validation (Stage 5)"]
Portiere combines clinical-domain embeddings (SapBERT by default), lexical search (BM25s), cross-encoder reranking, and optional LLM verification to achieve high-accuracy mappings with confidence routing — automatically accepting high-confidence results while flagging uncertain ones for human review.
Key Features
- Multi-Standard Support — OMOP CDM v5.4, FHIR R4, HL7 v2.5.1, OpenEHR 1.0.4 (extensible via YAML)
- AI-Powered Mapping — SapBERT embeddings + cross-encoder reranking + optional LLM verification
- 9 Knowledge Backends — BM25s, FAISS, Elasticsearch, ChromaDB, PGVector, MongoDB, Qdrant, Milvus, Hybrid (RRF fusion)
- BYO-LLM — Bring your own LLM: OpenAI, Anthropic Claude, AWS Bedrock, Ollama (local)
- Pluggable Engines — Polars (default), PySpark / Databricks, Pandas, DuckDB
- Standalone ETL Artifacts — Generated ETL scripts run without the SDK
- Data Quality Validation — Great Expectations integration for post-ETL checks
- Confidence Routing — Auto-accept, needs-review, and manual tiers with human-in-the-loop
- Cross-Standard Mapping — Transform between standards (OMOP ↔ FHIR, HL7v2 → FHIR, OMOP → OpenEHR)
- Local-First — All processing runs on your machine; no cloud dependency
Quick Start
Install
pip install portiere-health
# With a compute engine (pick one)
pip install "portiere-health[polars]" # Lightweight (recommended)
pip install "portiere-health[spark]" # Large-scale / Databricks
pip install "portiere-health[pandas]" # Prototyping
Map Clinical Data to OMOP CDM
import portiere
from portiere.engines import PolarsEngine
# Initialize a project
project = portiere.init(
name="Hospital OMOP Migration",
engine=PolarsEngine(),
target_model="omop_cdm_v5.4",
vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM"],
)
# Add and profile a data source
source = project.add_source("patients.csv")
profile = project.profile(source)
# AI-powered schema mapping (source columns → OMOP tables)
schema_map = project.map_schema(source)
# AI-powered concept mapping (clinical codes → standard concepts)
concept_map = project.map_concepts(codes=["E11.9", "I10", "R73.03"])
# Review mappings
schema_map.summary()
concept_map.summary()
# Generate and run ETL
result = project.run_etl(source, schema_map, concept_map)
Cross-Standard Mapping (OMOP → FHIR)
project = portiere.init(
name="FHIR Export",
engine=PolarsEngine(),
task="cross_map",
source_standard="omop_cdm_v5.4",
target_model="fhir_r4",
)
Installation
Core Package
pip install portiere-health
Optional Extras
Install only what you need:
| Category | Extra | Command |
|---|---|---|
| Engines | Polars | `pip install "portiere-health[polars]"` |
| Engines | PySpark | `pip install "portiere-health[spark]"` |
| Engines | Pandas | `pip install "portiere-health[pandas]"` |
| Engines | DuckDB | `pip install "portiere-health[duckdb]"` |
| LLM Providers | OpenAI | `pip install "portiere-health[openai]"` |
| LLM Providers | Anthropic | `pip install "portiere-health[anthropic]"` |
| LLM Providers | AWS Bedrock | `pip install "portiere-health[bedrock]"` |
| LLM Providers | Ollama | `pip install "portiere-health[ollama]"` |
| Knowledge Backends | FAISS | `pip install "portiere-health[faiss]"` |
| Knowledge Backends | Elasticsearch | `pip install "portiere-health[elasticsearch]"` |
| Knowledge Backends | ChromaDB | `pip install "portiere-health[chromadb]"` |
| Knowledge Backends | PGVector | `pip install "portiere-health[pgvector]"` |
| Knowledge Backends | MongoDB | `pip install "portiere-health[mongodb]"` |
| Knowledge Backends | Qdrant | `pip install "portiere-health[qdrant]"` |
| Knowledge Backends | Milvus | `pip install "portiere-health[milvus]"` |
| Quality | Great Expectations | `pip install "portiere-health[quality]"` |
| Everything | All extras | `pip install "portiere-health[all]"` |
Requirements: Python 3.10+
How It Works
Portiere implements a 5-stage AI pipeline for clinical data transformation:
Stage 1: Ingest & Profile
Connects to your data source (CSV, Parquet, databases) and extracts schema metadata — column names, types, cardinality, detected code columns, and PHI indicators.
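The kind of metadata this stage produces can be sketched in a few lines of pure Python. The field names (`null_rate`, `cardinality`, `looks_like_code`) and the code-column heuristic below are illustrative, not Portiere's actual profile schema:

```python
def profile_columns(rows):
    """Profile a list of row dicts: per-column null rate, cardinality, code heuristic."""
    profile = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        non_null = [v for v in values if v not in (None, "")]
        profile[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "cardinality": len(set(non_null)),
            # crude code-column heuristic: short tokens that contain digits
            "looks_like_code": bool(non_null) and all(
                len(str(v)) <= 10 and any(ch.isdigit() for ch in str(v))
                for v in non_null
            ),
        }
    return profile

rows = [
    {"patient_id": "1", "dx_code": "E11.9"},
    {"patient_id": "2", "dx_code": "I10"},
    {"patient_id": "3", "dx_code": ""},
]
stats = profile_columns(rows)
```

A real profiler would also detect types and PHI indicators, but the per-column statistics follow the same pattern.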
Stage 2: Schema Mapping
Maps source columns to target standard entities using a fusion of:
- Pattern matching — Regex patterns defined in YAML standard files
- Embedding similarity — SapBERT clinical embeddings for semantic matching
- Cross-encoder reranking — Precision reranking of top candidates
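The embedding-similarity step can be sketched with toy vectors; the real system encodes column names and field descriptions with SapBERT, and the three-dimensional vectors below are invented purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: one source column vs. two candidate target fields
source_vec = [0.9, 0.1, 0.2]                    # e.g. encode("dob")
candidates = {
    "person.birth_datetime": [0.8, 0.2, 0.1],   # e.g. encode(embedding_description)
    "person.race_concept_id": [0.1, 0.9, 0.3],
}
best = max(candidates, key=lambda f: cosine(source_vec, candidates[f]))
```

The cross-encoder stage then rescores only the top candidates from this cheap retrieval step, trading extra compute for precision.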
Stage 3: Concept Mapping
Maps clinical codes (ICD-10, CPT, local codes) to standard vocabularies (SNOMED CT, LOINC, RxNorm) through:
- Direct code lookup — Exact match in knowledge base
- Knowledge layer search — BM25s lexical / FAISS vector / Hybrid search
- Cross-encoder reranking — Rerank top-k candidates for precision
- LLM verification — Optional AI verification for medium-confidence mappings
- Confidence routing — Auto-accept (≥ 0.95), needs-review (0.70 to < 0.95), manual (< 0.70)
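The routing step in the list above amounts to a two-threshold comparison. A sketch using the documented default thresholds:

```python
def route(confidence, auto_accept=0.95, needs_review=0.70):
    """Assign a mapping to a review tier based on its confidence score."""
    if confidence >= auto_accept:
        return "auto_accept"
    if confidence >= needs_review:
        return "needs_review"
    return "manual"
```

Both thresholds are configurable (see the Confidence Tiers section), so the same logic serves conservative and permissive projects alike.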
Stage 4: ETL Generation
Generates standalone ETL scripts (Spark, Polars, or Pandas) and lookup tables (CSV) that run without the Portiere SDK — no vendor lock-in.
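A generated artifact might look roughly like the following sketch: plain Python plus a CSV lookup table, with no Portiere import anywhere. The file layout and column names (`source_code`, `concept_id`) are hypothetical, not the actual generated schema:

```python
import csv

def load_concept_lookup(path):
    """Load a generated source_code -> concept_id lookup table (CSV artifact)."""
    with open(path, newline="") as f:
        return {row["source_code"]: int(row["concept_id"]) for row in csv.DictReader(f)}

def transform(rows, lookup):
    """Map each source diagnosis code to a standard concept_id; 0 means unmapped."""
    return [
        {
            "person_id": r["patient_id"],
            "condition_concept_id": lookup.get(r["dx_code"], 0),
        }
        for r in rows
    ]
```

Because the mapping decisions are frozen into the lookup table at generation time, the script's behavior is reproducible without the AI stack that produced it.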
Stage 5: Validation
Post-ETL data quality checks using Great Expectations, with standards-aware conformance for all supported models (OMOP, FHIR, HL7, OpenEHR, custom YAML):
- Completeness — Non-null percentages for required fields
- Conformance — Type and constraint compliance derived from YAML field metadata
- Plausibility — Domain-specific clinical rules
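To give a flavor of the completeness check, here is a minimal pure-Python sketch; in Portiere the actual checks are expressed as Great Expectations suites, and the report shape below is invented:

```python
def check_completeness(rows, required_fields, threshold=1.0):
    """Report the non-null fraction of each required field and whether it passes."""
    report = {}
    for field in required_fields:
        filled = sum(1 for r in rows if r.get(field) not in (None, ""))
        fraction = filled / len(rows)
        report[field] = {"non_null_fraction": fraction, "passed": fraction >= threshold}
    return report

rows = [
    {"person_id": 1, "birth_datetime": "1980-01-01"},
    {"person_id": 2, "birth_datetime": ""},
]
report = check_completeness(rows, ["person_id", "birth_datetime"])
```

Conformance and plausibility checks follow the same pattern, but compare values against YAML field metadata and clinical rules rather than null counts.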
Supported Standards
| Standard | Version | Use Case |
|---|---|---|
| OMOP CDM | v5.4 | Observational research, population health |
| FHIR R4 | R4 | Interoperability, health information exchange |
| HL7 v2 | 2.5.1 | Legacy hospital system integration |
| OpenEHR | 1.0.4 | European clinical data, archetype-based EHRs |
Standards are defined as YAML files and are fully extensible — you can define custom hospital CDMs or registry schemas.
Cross-Standard Mapping
Built-in crossmaps for transforming between standards:
| Source | Target | File |
|---|---|---|
| FHIR R4 | OMOP CDM | fhir_r4_to_omop.yaml |
| OMOP CDM | FHIR R4 | omop_to_fhir_r4.yaml |
| HL7 v2 | FHIR R4 | hl7v2_to_fhir_r4.yaml |
| OMOP CDM | OpenEHR | omop_to_openehr.yaml |
| FHIR R4 | OpenEHR | fhir_r4_to_openehr.yaml |
Custom Standards
Portiere is not limited to built-in standards. You can define any clinical data model — a hospital CDM, a disease registry schema, a research database, a legacy warehouse — as a YAML file and use it identically to built-in standards.
Define a Custom Standard (YAML)
Create a .yaml file with the following structure:
name: "hospital_cdm_v1"
version: "1.0"
standard_type: "relational"
organization: "General Hospital Research"
description: "Internal clinical data model for General Hospital"
entities:
patients:
description: "Core patient demographics"
fields:
patient_id:
type: integer
required: true
description: "Unique patient identifier"
ddl: "INTEGER PRIMARY KEY"
date_of_birth:
type: date
description: "Patient date of birth"
ddl: "DATE NOT NULL"
sex:
type: string
description: "Biological sex (M/F/U)"
ddl: "VARCHAR(1)"
# Fast pattern matching: source column name → target field
source_patterns:
patient_id: "patient_id"
subject_id: "patient_id"
dob: "date_of_birth"
birth_date: "date_of_birth"
gender: "sex"
sex: "sex"
# Embedding descriptions: optimized text for AI semantic matching
# Write what a clinician would search for, not just the field name
embedding_descriptions:
patient_id: "unique patient identifier number"
date_of_birth: "patient birth date birthday date of birth"
sex: "biological sex gender male female M F"
encounters:
description: "Hospital visits and admissions"
fields:
encounter_id:
type: integer
required: true
description: "Unique encounter identifier"
ddl: "INTEGER PRIMARY KEY"
admit_date:
type: datetime
description: "Admission date and time"
ddl: "TIMESTAMP NOT NULL"
encounter_type:
type: string
description: "Type of encounter (inpatient, outpatient, ED)"
ddl: "VARCHAR(20)"
source_patterns:
encounter_id: "encounter_id"
visit_id: "encounter_id"
hadm_id: "encounter_id"
admit_date: "admit_date"
admittime: "admit_date"
visit_type: "encounter_type"
embedding_descriptions:
encounter_id: "hospital encounter visit admission identifier"
admit_date: "admission date time when patient was admitted"
encounter_type: "visit type inpatient outpatient emergency department"
Use Your Custom Standard
import portiere
from portiere.engines import PolarsEngine
# Reference via "custom:" prefix — works anywhere target_model is accepted
project = portiere.init(
name="Hospital Migration",
engine=PolarsEngine(),
target_model="custom:/path/to/hospital_cdm_v1.yaml",
)
source = project.add_source("patients.csv")
schema_map = project.map_schema(source)
concept_map = project.map_concepts(codes=["E11.9", "I10"])
result = project.run_etl(source, schema_map, concept_map)
Or load directly for inspection:
from portiere.standards import YAMLTargetModel
model = YAMLTargetModel("/path/to/hospital_cdm_v1.yaml")
print(model.get_schema()) # entity → [fields]
print(model.get_source_patterns()) # source column hints
You can also ship your custom standard as a built-in by placing the YAML in src/portiere/standards/ — it will then be loadable by name:
model = YAMLTargetModel.from_name("hospital_cdm_v1")
Column Naming Guide
Portiere's schema mapper uses two strategies in sequence: exact pattern matching (fast, zero-cost) then embedding similarity (AI-powered). Understanding both helps you get higher auto-accept rates.
Strategy 1 — Source Patterns (rule-based, highest priority)
Each entity in a standard YAML defines source_patterns — a dictionary mapping source column names to target fields. Matches here are always accepted, regardless of confidence score.
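Conceptually, this first strategy reduces to a normalized dictionary lookup. A sketch (the actual matcher may also apply regex patterns defined in the standard YAML, as noted under Stage 2):

```python
def pattern_match(column_name, source_patterns):
    """Strategy 1: case-insensitive exact lookup of a source column in source_patterns."""
    return source_patterns.get(column_name.strip().lower())

# Example patterns, as they would appear parsed from a standard's YAML
patterns = {"subject_id": "person_id", "dob": "date_of_birth"}
```

Because this is an exact lookup, it costs nothing at mapping time, which is why it runs before the embedding stage.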
Built-in OMOP patterns include common aliases:
| Your column name | Maps to |
|---|---|
| `patient_id`, `subject_id`, `mrn` | `person.person_id` |
| `dob`, `birth_date`, `date_of_birth` | `person.birth_datetime` |
| `gender`, `sex` | `person.gender_concept_id` |
| `icd_code`, `diagnosis_code`, `dx_code` | `condition_occurrence.condition_source_value` |
| `admit_date`, `admittime` | `visit_occurrence.visit_start_date` |
| `drug_code`, `ndc`, `medication_code` | `drug_exposure.drug_source_value` |
To maximize pattern hits in your own standard, add all known aliases to source_patterns in your YAML:
source_patterns:
patient_id: "person_id" # exact name
pid: "person_id" # short alias
subject_id: "person_id" # research alias
pt_id: "person_id" # abbreviated
medical_record_number: "person_id" # verbose
Strategy 2 — Embedding Similarity (semantic, AI-powered)
When no pattern matches, the mapper encodes both the source column name and the embedding_descriptions into vectors using SapBERT, then finds the closest target field by cosine similarity.
What to write in embedding_descriptions:
Write natural-language phrases a clinician would use to describe what that column contains — not just a rephrasing of the field name.
# ❌ Too literal — just re-states the name
embedding_descriptions:
admit_date: "admission date"
dx_code: "diagnosis code"
# ✅ Rich synonyms and clinical context — maximizes semantic recall
embedding_descriptions:
admit_date: "hospital admission date time when patient was admitted inpatient start"
dx_code: "ICD diagnosis code ICD-10-CM ICD-9 disease condition clinical code"
Naming your source columns well also helps. The source column name itself is encoded alongside the description. Prefer descriptive names over cryptic abbreviations:
| Less matchable | More matchable |
|---|---|
| `col_32` | `diagnosis_code` |
| `dt1` | `admission_date` |
| `flg_act` | `is_active` |
| `cd_race` | `race_code` |
| `proc_nm` | `procedure_name` |
Confidence Tiers
After matching, every column receives a confidence score:
| Score | Tier | Action |
|---|---|---|
| ≥ 0.95 | Auto-accepted | Written to output immediately |
| ≥ 0.70 and < 0.95 | Needs review | Flagged for human inspection |
| < 0.70 | Manual | Requires explicit override |
Tune these thresholds to match your project's risk tolerance:
from portiere import PortiereConfig, ThresholdsConfig
from portiere.config import SchemaMappingThresholds
config = PortiereConfig(
thresholds=ThresholdsConfig(
schema_mapping=SchemaMappingThresholds(
auto_accept=0.90, # lower → more auto-accepts
needs_review=0.60, # lower → fewer manual items
)
)
)
Full Workflow with Review
schema_map = project.map_schema(source)
# Inspect what needs review
for item in schema_map.needs_review():
print(f"{item.source_column} → {item.target_table}.{item.target_column} "
f"(confidence={item.confidence:.2f})")
for c in item.candidates[:3]:
print(f" candidate: {c['target_table']}.{c['target_column']} ({c['confidence']:.2f})")
# Approve, override, or reject
schema_map.approve("patient_name")
schema_map.override("pt_zip", target_table="location", target_column="zip")
schema_map.reject("internal_audit_flag")
# Approve all remaining items
schema_map.approve_all()
schema_map.finalize()
Knowledge Layer Backends
| Backend | Type | Dependencies | Best For |
|---|---|---|---|
| BM25s | Lexical | None (built-in) | Quick start, no infra needed |
| FAISS | Vector | `faiss-cpu`, `sentence-transformers` | High-accuracy local search |
| Elasticsearch | Hybrid | `elasticsearch` | Production deployments |
| ChromaDB | Vector | `chromadb` | Lightweight vector store |
| PGVector | Vector | `psycopg`, `pgvector` | PostgreSQL environments |
| MongoDB | Vector | `pymongo` | Atlas Vector Search users |
| Qdrant | Vector | `qdrant-client` | Dedicated vector DB |
| Milvus | Vector | `pymilvus` | Large-scale vector search |
| Hybrid | Fusion | Varies | Combine backends with RRF |
Hybrid Search Example
from portiere import PortiereConfig, KnowledgeLayerConfig
config = PortiereConfig(
knowledge_layer=KnowledgeLayerConfig(
backend="hybrid",
hybrid_backends=["bm25s", "faiss"],
hybrid_fusion="rrf", # Reciprocal Rank Fusion
)
)
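Reciprocal Rank Fusion itself is simple: each backend contributes 1/(k + rank) for every candidate it returns, and the summed scores decide the fused order. A sketch with k = 60, a common default (the SNOMED-style candidate IDs below are illustrative):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked candidate lists: score(c) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["44054006", "73211009", "46635009"]    # candidates from lexical search
faiss_hits = ["44054006", "46635009", "190331003"]  # candidates from vector search
fused = rrf_fuse([bm25_hits, faiss_hits])
```

RRF needs no score calibration between backends, which is why it is a popular fusion choice: only ranks matter, so lexical and vector scores never have to be put on a common scale.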
LLM Providers
Portiere supports Bring-Your-Own-LLM for concept verification:
| Provider | Extra | Model Examples |
|---|---|---|
| OpenAI | `portiere-health[openai]` | GPT-4o, GPT-4o-mini |
| Anthropic | `portiere-health[anthropic]` | Claude Sonnet, Claude Haiku |
| AWS Bedrock | `portiere-health[bedrock]` | Claude, Titan, Llama |
| Ollama | `portiere-health[ollama]` | Llama 3, Mistral, Gemma (local) |
from portiere import PortiereConfig, LLMConfig
config = PortiereConfig(
llm=LLMConfig(
provider="openai",
model="gpt-4o-mini",
api_key="sk-...",
)
)
Configuration
Portiere auto-discovers configuration from multiple sources (in priority order):
1. Python Objects
from portiere import PortiereConfig, EmbeddingConfig, KnowledgeLayerConfig
config = PortiereConfig(
target_model="omop_cdm_v5.4",
embedding=EmbeddingConfig(
provider="huggingface",
model="cambridgeltl/SapBERT-from-PubMedBERT-fulltext",
),
knowledge_layer=KnowledgeLayerConfig(backend="bm25s"),
)
2. YAML File (portiere.yaml)
target_model: omop_cdm_v5.4
storage: local
embedding:
provider: huggingface
model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
knowledge_layer:
backend: bm25s
llm:
provider: openai
model: gpt-4o-mini
thresholds:
auto_accept: 0.95
needs_review: 0.70
3. Environment Variables
export PORTIERE_TARGET_MODEL=omop_cdm_v5.4
export PORTIERE_LLM__PROVIDER=openai
export PORTIERE_LLM__API_KEY=sk-...
export PORTIERE_KNOWLEDGE_LAYER__BACKEND=faiss
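The double underscore denotes nesting, a convention popularized by pydantic-settings. Conceptually, the variables above collapse into a nested config dict, as this sketch shows (the parsing details of Portiere's own loader may differ):

```python
def env_to_config(environ, prefix="PORTIERE_"):
    """Collapse PREFIX_SECTION__KEY environment variables into a nested dict."""
    config = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        path = name[len(prefix):].lower().split("__")
        node = config
        for part in path[:-1]:           # walk/create intermediate sections
            node = node.setdefault(part, {})
        node[path[-1]] = value           # set the leaf key
    return config

env = {
    "PORTIERE_TARGET_MODEL": "omop_cdm_v5.4",
    "PORTIERE_LLM__PROVIDER": "openai",
}
config = env_to_config(env)
```

This is convenient in containers and CI, where secrets like `PORTIERE_LLM__API_KEY` can be injected without touching any config file.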
Building the Knowledge Layer
Before concept mapping, build a searchable index from standard vocabularies (e.g., OHDSI Athena):
from portiere import build_knowledge_layer, PortiereConfig
config = PortiereConfig()
stats = build_knowledge_layer(
vocabulary_dir="./data/athena/",
config=config,
vocabularies=["SNOMED", "LOINC", "RxNorm", "ICD10CM"],
)
print(f"Indexed {stats['total_concepts']:,} concepts")
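The default BM25s backend is a lexical ranker over exactly this kind of concept index; its core scoring is classic BM25, sketched below (a textbook implementation for illustration, not Portiere's code, and the SNOMED-style IDs are examples):

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank docs (a dict of id -> text) against a query using classic BM25."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    avgdl = sum(len(t) for t in tokenized.values()) / len(tokenized)
    n = len(tokenized)
    scores = {}
    for d, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)

concepts = {
    "44054006": "type 2 diabetes mellitus",
    "38341003": "hypertensive disorder systemic arterial",
    "73211009": "diabetes mellitus",
}
ranked = bm25_rank("type 2 diabetes", concepts)
```

Real indexes hold millions of concept names, which is why the build step above is done once and persisted rather than recomputed per query.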
Documentation
| Resource | Description |
|---|---|
| Quick Start Guide | Get started in 5 minutes |
| API Reference | Full SDK API documentation |
| Configuration Guide | YAML, Python, and env var config |
| Knowledge Layer Guide | All 9 backends explained |
| LLM Integration | BYO-LLM setup |
| Pipeline Architecture | 5-stage pipeline deep dive |
| Multi-Standard Support | Standards and custom schemas |
| Cross-Standard Mapping | OMOP ↔ FHIR, HL7v2 → FHIR |
| Example Notebooks | 19 Jupyter notebooks with walkthroughs |
Project Structure
portiere/
├── src/portiere/
│ ├── __init__.py # Public API: init(), PortiereProject, configs
│ ├── config.py # Configuration with auto-discovery
│ ├── project.py # Unified project interface
│ ├── exceptions.py # Error hierarchy
│ ├── stages/ # 5-stage pipeline implementation
│ ├── engines/ # Compute engines (Polars, Spark, Pandas, DuckDB)
│ ├── knowledge/ # Knowledge layer backends (9 backends)
│ ├── embedding/ # Embedding providers & gateway
│ ├── llm/ # LLM providers & gateway
│ ├── local/ # Local AI components (schema mapper, concept mapper)
│ ├── artifacts/ # ETL code generation (Jinja2 templates)
│ ├── runner/ # ETL execution engine
│ ├── quality/ # Data quality validation (Great Expectations)
│ ├── standards/ # Clinical standard YAML definitions & crossmaps
│ ├── storage/ # Storage backends (local filesystem)
│ └── models/ # Pydantic data models
├── tests/ # 36 test modules, 689 tests
├── docs/
│ ├── documentations/ # 22 guides and references
│ └── notebooks_examples/ # 19 Jupyter notebook examples
├── pyproject.toml # Package configuration (hatchling)
└── LICENSE # Apache 2.0
Contributing
We welcome contributions! Here's how to get started:
# Clone the repository
git clone https://github.com/Cuspal/portiere.git
cd portiere
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install in development mode
pip install -e ".[dev,docs,polars,quality]"
# Run tests
pytest
# Run linter
ruff check src/ tests/
# Run type checker
mypy src/portiere/
Please read our contributing guidelines before submitting a pull request.
License
Portiere is licensed under the Apache License 2.0.
Copyright 2026 Cuspal Co. Ltd.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Citation
If you use Portiere in your research, please cite:
@software{portiere2024,
title = {Portiere: AI-Powered Clinical Data Mapping SDK},
author = {{Cuspal Co., Ltd.}},
year = {2024},
url = {https://github.com/Cuspal/portiere},
license = {Apache-2.0},
}