coreason-codex
The Terminology Server for Bio-Pharma AI
coreason-codex acts as the "Universal Translator" for the platform. It bridges the "Semantic Precision Gap" in Bio-Pharma AI by enforcing the use of Standardized Vocabularies (OMOP CDM).
Executive Summary
While Large Language Models are fluent, they can be imprecise. coreason-codex ensures that when an Agent reads the free text "Heart Attack", it records it as structured data (ConceptID 312327), enabling precise retrieval, graph grounding, and regulatory reporting.
It provides tools for Agents to look up, validate, and translate medical concepts, using a "Frozen Lake" pattern for GxP compliance.
For detailed requirements, see the Product Requirements Document.
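The "Frozen Lake" pattern itself is described in the Architecture docs, not here. One common way to make a vocabulary snapshot tamper-evident for GxP audits is to ship a manifest of SHA-256 hashes alongside the data and refuse to load any file that does not match. The sketch below illustrates that idea with the standard library only. The manifest.json layout, its "sha256" key, and the functions sha256_of and verify_pack are hypothetical; this is not a description of how coreason-codex actually implements the pattern.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large vocabulary tables need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(pack_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return pack-relative paths that are missing or whose SHA-256 differs from the manifest.

    Hypothetical manifest layout (an assumption, not the real Codex Pack format):
    {"sha256": {"relative/path": "<hex digest>", ...}}
    """
    manifest = json.loads((pack_dir / manifest_name).read_text())
    mismatches = []
    for rel_path, expected in manifest["sha256"].items():
        target = pack_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pack = Path(tmp)
        table = pack / "concept.csv"
        table.write_text("concept_id\n312327\n")
        manifest = {"sha256": {"concept.csv": sha256_of(table)}}
        (pack / "manifest.json").write_text(json.dumps(manifest))
        print(verify_pack(pack))  # [] : the pack matches its manifest
        table.write_text("concept_id\n312328\n")  # simulate an unapproved edit
        print(verify_pack(pack))  # ['concept.csv']
```

A loader following this approach could call verify_pack() before opening any database connections and abort on a non-empty result, so that every query in an audit trail is traceable to one exact, unmodified snapshot.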
Getting Started
Prerequisites
- Python 3.12+
- Poetry
Installation
- Clone the repository:
git clone https://github.com/CoReason-AI/coreason_codex.git
cd coreason_codex
- Install dependencies:
poetry install
Usage
Here is a quick example of how to use coreason-codex to normalize text to a standard concept.
from pathlib import Path
from coreason_codex.loader import CodexLoader
from coreason_codex.normalizer import CodexNormalizer
from coreason_codex.embedders import SapBertEmbedder
# 1. Initialize Loader with path to your Codex Pack
# Ensure you have a valid Codex Pack at this location
pack_path = Path("./codex_pack_v1")
loader = CodexLoader(pack_path)
duckdb_conn, lancedb_conn = loader.load_codex()
# 2. Initialize Embedder and Normalizer
embedder = SapBertEmbedder() # Uses cambridgeltl/SapBERT-from-PubMedBERT-fulltext
normalizer = CodexNormalizer(embedder, duckdb_conn, lancedb_conn)
# 3. Normalize Text
matches = normalizer.normalize("Heart Attack")
for match in matches:
    print(f"Concept: {match.match_concept.concept_name} (ID: {match.match_concept.concept_id})")
    print(f"Score: {match.similarity_score}")
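Each match returned by normalize() carries a similarity_score. To illustrate the general idea behind embedding-based concept ranking, the sketch below scores every concept vector against a query vector with cosine similarity and keeps the best few. It is a brute-force, plain-Python stand-in: the real normalizer receives a LanceDB connection and presumably performs the vector search there, and its scoring details are not documented here. The vectors are toy values and the concept ID 1001 is a hypothetical placeholder, not a real OMOP concept.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_concepts(
    query: list[float], index: dict[int, list[float]], top_k: int = 3
) -> list[tuple[int, float]]:
    """Brute-force nearest-neighbour search: score every concept, return the top_k best."""
    scored = [(concept_id, cosine(query, vec)) for concept_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-d vectors; real embeddings come from the embedder and are far larger.
    toy_index = {312327: [0.9, 0.1, 0.0], 1001: [0.1, 0.9, 0.2]}  # 1001 is hypothetical
    query_vec = [1.0, 0.0, 0.0]  # stands in for the embedding of "Heart Attack"
    for concept_id, score in rank_concepts(query_vec, toy_index):
        print(f"{concept_id}: {score:.3f}")  # 312327 ranks first
```

A vector store replaces the linear scan with an index, but the contract is the same: higher similarity means a closer concept.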
Documentation
Detailed documentation is available in the docs/ directory:
- Architecture: Overview of the system design, including the Frozen Lake pattern and Zero-Copy architecture.
- Usage Guide: Detailed instructions on using the Loader, Normalizer, Hierarchy, and CrossWalker components.
- Vignettes: Walkthroughs of key user stories (Semantic Tagging, Lateral Logic, Audit Replay).
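The Hierarchy component's API is documented in the Usage Guide and is not shown here. As a hedged sketch of what hierarchy expansion usually means over OMOP data, the snippet below queries an OMOP CDM-style concept_ancestor table for all descendants of a concept. It uses Python's built-in sqlite3 as a stand-in for the DuckDB connection returned by load_codex(), and assumes the Codex Pack exposes a table named concept_ancestor with the standard OMOP columns. The function name descendants and the IDs 1001 and 1002 are hypothetical.

```python
import sqlite3


def descendants(conn: sqlite3.Connection, ancestor_id: int, include_self: bool = False) -> list[int]:
    """Return descendant concept IDs of ancestor_id from an OMOP-style concept_ancestor table.

    A level-0 row (a concept listed as its own ancestor) is skipped unless include_self is True.
    """
    min_level = 0 if include_self else 1
    rows = conn.execute(
        """
        SELECT descendant_concept_id
        FROM concept_ancestor
        WHERE ancestor_concept_id = ? AND min_levels_of_separation >= ?
        ORDER BY descendant_concept_id
        """,
        (ancestor_id, min_level),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER, "
        "min_levels_of_separation INTEGER, max_levels_of_separation INTEGER)"
    )
    # Toy rows: 1001 and 1002 are hypothetical placeholder IDs, not real OMOP concepts.
    conn.executemany(
        "INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)",
        [(312327, 312327, 0, 0), (312327, 1001, 1, 1), (312327, 1002, 2, 2)],
    )
    print(descendants(conn, 312327))  # [1001, 1002]
```

Because the query uses plain SQL with ? placeholders, the same statement should also run through a DuckDB connection's execute(), though that has not been verified against a real Codex Pack.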
Development
- Run the linter:
poetry run pre-commit run --all-files
- Run the tests:
poetry run pytest