Semantic data model for LLM-consumable data catalog

Project description

Nomox LLM Semantic Model

Internal Python package that describes the semantic model used across Nomox.

Installation

To use the package from anywhere, run:

pip install git+https://github.com/MiraZzle/nomox-semantics-package.git

For development:

pip install -e .

Architecture

The semantic model is organized into two levels plus a set of shared components:

Level 1: Source-Scoped Semantics

Produced by the Level 1 Indexer Agent. Contains:

DataSource: Top-level container for a data source (Trino catalog.schema)
Table: Tables and views with semantic roles and temporal information
Column: Columns with semantic types, profiling, and sample values
InternalRelationship: Foreign key relationships within a source
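
As a rough illustration of how these Level 1 objects nest, the sketch below uses plain dataclasses instead of the package's own classes. The `Sketch*` names and their fields are assumptions made for this example only; the real models carry more fields (see Quick Start).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchColumn:
    name: str
    data_type: str


@dataclass(frozen=True)
class SketchRelationship:
    # Hypothetical shape of a foreign-key link inside a single source.
    from_table: str
    from_column: str
    to_table: str
    to_column: str


@dataclass(frozen=True)
class SketchTable:
    name: str
    columns: tuple[SketchColumn, ...] = ()


@dataclass(frozen=True)
class SketchSource:
    catalog: str
    schema: str
    tables: tuple[SketchTable, ...] = ()
    relationships: tuple[SketchRelationship, ...] = ()

    def qualified(self, table_name: str) -> str:
        # Build a catalog.schema.table name for a table in this source.
        return f"{self.catalog}.{self.schema}.{table_name}"


source = SketchSource(
    catalog="analytics",
    schema="sales",
    tables=(
        SketchTable(
            "orders",
            (SketchColumn("order_id", "VARCHAR"), SketchColumn("customer_id", "VARCHAR")),
        ),
        SketchTable("customers", (SketchColumn("customer_id", "VARCHAR"),)),
    ),
    relationships=(
        SketchRelationship("orders", "customer_id", "customers", "customer_id"),
    ),
)

# List each foreign-key link using fully qualified table names.
for rel in source.relationships:
    left = f"{source.qualified(rel.from_table)}.{rel.from_column}"
    right = f"{source.qualified(rel.to_table)}.{rel.to_column}"
    print(f"{left} -> {right}")
```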

Level 2: Cross-Source Semantics

Produced by the Level 2 Aggregator Agent. Contains:

SemanticEntity: Canonical business concepts (Customer, Order, Product)
EntityManifestation: Where entities appear across sources
UnifiedAttribute: Logical attributes sourced from multiple places
EntityRelationship: Relationships between entities with join paths
IdentityResolution: How to match entities across sources
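
This README does not say how IdentityResolution matches records. One common approach is deterministic matching on a normalized key, sketched below with plain dictionaries. The function names, record shapes, and the choice of email as the key are all assumptions for illustration.

```python
def normalize_key(value: str) -> str:
    # Trim whitespace and lowercase so trivially different values compare equal.
    return value.strip().lower()


def match_records(left, right, key):
    """Pair record ids from two sources whose normalized key values are equal."""
    index = {}
    for rec in right:
        index.setdefault(normalize_key(rec[key]), []).append(rec)

    matches = []
    for rec in left:
        for other in index.get(normalize_key(rec[key]), []):
            matches.append((rec["id"], other["id"]))
    return matches


# Hypothetical rows from two sources that describe the same customers.
crm_accounts = [
    {"id": "acc-1", "email": "Ada@Example.com "},
    {"id": "acc-2", "email": "bob@example.com"},
]
analytics_customers = [
    {"id": "cust-9", "email": "ada@example.com"},
]

matches = match_records(crm_accounts, analytics_customers, key="email")
print(matches)
```

Exact matching on a normalized key is only one option; fuzzy or probabilistic matching would need a different design.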

Shared Components

GlossaryTerm: Business terminology definitions
ConfidenceScore: Confidence scoring for all elements
ExpertOverride: Human corrections and enhancements
IndexingState: Tracking of indexing jobs and status

Quick Start

from semantic_model import (
    SemanticModel,
    DataSource,
    Table,
    Column,
    SemanticType,
    SemanticCategory,
    SourceType,
    create_empty_model,
    save_model,
    load_model,
)

# Create an empty model
model = create_empty_model(
    model_id="my-org-model",
    organization_id="my-org",
)

# Create a data source
source = DataSource(
    id="sales-db",
    name="Sales Database",
    trino_catalog="analytics",
    trino_schema="sales",
    fully_qualified_prefix="analytics.sales",
    source_type=SourceType.ANALYTICAL,
    description="Sales transaction data warehouse",
    domain="Sales",
)

# Create a table
orders_table = Table(
    id="orders",
    name="orders",
    fully_qualified_name="analytics.sales.orders",
    description="Fact table containing one row per order",
    columns=[
        Column(
            id="order_id",
            name="order_id",
            ordinal_position=0,
            data_type="VARCHAR",
            is_primary_key=True,
            semantic_type=SemanticType.identifier(subtype="uuid"),
            description="Unique order identifier",
        ),
        Column(
            id="customer_id",
            name="customer_id",
            ordinal_position=1,
            data_type="VARCHAR",
            is_foreign_key=True,
            semantic_type=SemanticType.identifier(subtype="uuid"),
            description="ID of the customer who placed the order",
        ),
        Column(
            id="total_amount",
            name="total_amount",
            ordinal_position=2,
            data_type="DECIMAL(12,2)",
            semantic_type=SemanticType(
                category=SemanticCategory.CURRENCY,
                confidence=0.95,
            ),
            unit="USD",
            description="Total order value including tax",
        ),
    ],
)

# Add table to source
source = source.add_table(orders_table)

# Add source to model
model = model.add_source(source)

# Save the model
save_model(model, "semantic_model.json")

# Load the model
loaded_model = load_model("semantic_model.json")

# Generate prompt context for LLM
prompt_context = model.to_prompt_format(
    include_sources=True,
    include_entities=True,
    include_glossary=True,
)
print(prompt_context)
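
The reassignments `source = source.add_table(...)` and `model = model.add_source(...)` above suggest that these methods return an updated copy instead of mutating in place. That is an inference from the example, not something this README confirms. A minimal sketch of the pattern, using a hypothetical `FrozenSource` dataclass:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrozenSource:
    # Hypothetical stand-in for a source object; not the package's DataSource.
    id: str
    tables: tuple[str, ...] = ()

    def add_table(self, table: str) -> "FrozenSource":
        # Return a new instance with the table appended; self is left unchanged.
        return replace(self, tables=self.tables + (table,))


empty = FrozenSource(id="sales-db")
updated = empty.add_table("orders")

print(empty.tables)
print(updated.tables)
```

If the methods behave this way, forgetting the reassignment would silently discard the added table.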

Working with Confidence Scores

from semantic_model import ConfidenceScore, LowConfidenceItem, ConfidenceObjectType

# Create a confidence score
confidence = ConfidenceScore(
    overall=0.75,
    threshold=0.8,
    schema_understanding=0.9,
    semantic_typing=0.7,
    description_quality=0.65,
    low_confidence_items=[
        LowConfidenceItem(
            object_type=ConfidenceObjectType.COLUMN,
            object_id="status_code",
            object_name="status_code",
            score=0.4,
            reason="Unknown categorical values",
            suggested_clarification="What do status codes 'P', 'A', 'R' mean?",
        ),
    ],
)

# Check if meets threshold
if not confidence.meets_threshold:
    print("Source needs expert review")
    for item in confidence.low_confidence_items:
        print(f"  - {item.object_name}: {item.reason}")
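
How `meets_threshold` is computed is not shown here. One plausible reading is a simple comparison of `overall` against `threshold`, as in the sketch below. The `SketchConfidence` class is hypothetical, and the use of `>=` instead of `>` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfidence:
    overall: float
    threshold: float

    @property
    def meets_threshold(self) -> bool:
        # Assumed rule: a score equal to the threshold counts as passing.
        return self.overall >= self.threshold


below = SketchConfidence(overall=0.75, threshold=0.8)
at_limit = SketchConfidence(overall=0.8, threshold=0.8)

print(below.meets_threshold)
print(at_limit.meets_threshold)
```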

Expert Overrides

from semantic_model import ExpertOverride, ReindexScope

# Create an override
override = ExpertOverride(
    id="override-001",
    created_by="domain-expert@company.com",
    field_path="description",
    original_value="Unknown table",
    override_value="Customer master data from CRM system",
    reason="Clarified based on CRM documentation",
    reindex_scope=ReindexScope.THIS_SOURCE,
)

# Apply to an existing Table instance (here called `table`)
table.expert_overrides.append(override)
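
The example records an override but does not show how `field_path` is resolved when the override takes effect. The sketch below shows one way this could work on a plain dictionary. The `apply_override` helper and its support for dotted paths are assumptions; the README only shows the single-segment path "description".

```python
from copy import deepcopy


def apply_override(record: dict, field_path: str, override_value):
    """Return a copy of record with the dotted field_path set to override_value."""
    updated = deepcopy(record)
    target = updated
    *parents, leaf = field_path.split(".")
    # Walk down to the dictionary that holds the final field.
    for key in parents:
        target = target[key]
    target[leaf] = override_value
    return updated


table_record = {"name": "accounts", "description": "Unknown table"}
patched = apply_override(
    table_record,
    field_path="description",
    override_value="Customer master data from CRM system",
)

print(patched["description"])
```

Working on a copy keeps the original value available, which mirrors the `original_value` field stored on the override.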

Semantic Entities (Level 2)

from semantic_model import (
    SemanticEntity,
    EntityManifestation,
    ManifestationRole,
    UnifiedAttribute,
    EntityRelationship,
    JoinPath,
    JoinStep,
)

# Create a semantic entity
customer_entity = SemanticEntity(
    id="customer",
    name="Customer",
    description="A customer is any individual or organization with an account",
    canonical_id_name="customer_id",
    canonical_id_format="UUID",
    domain="Sales",
    manifestations=[
        EntityManifestation(
            source_id="crm-db",
            table_id="accounts",
            fully_qualified_name="crm.public.accounts",
            role=ManifestationRole.PRIMARY,
            key_column_id="account_id",
            usage_guidance="Use for real-time customer master data",
        ),
        EntityManifestation(
            source_id="analytics-db",
            table_id="customer_360",
            fully_qualified_name="analytics.customers.customer_360",
            role=ManifestationRole.DERIVED,
            key_column_id="customer_id",
            usage_guidance="Use for analytics with pre-computed metrics",
        ),
    ],
)

# Add to model
model = model.add_entity(customer_entity)
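
`JoinPath` and `JoinStep` are imported above but not demonstrated. To illustrate the idea of a relationship that carries its join path, the sketch below renders a list of steps into SQL join clauses. `SketchJoinStep`, its fields, and `render_join` are hypothetical and may not match the package's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchJoinStep:
    left_table: str
    left_column: str
    right_table: str
    right_column: str


def render_join(steps):
    """Render join steps as a FROM clause followed by one JOIN per step."""
    if not steps:
        raise ValueError("a join path needs at least one step")
    lines = [f"FROM {steps[0].left_table}"]
    for step in steps:
        lines.append(
            f"JOIN {step.right_table} "
            f"ON {step.left_table}.{step.left_column} = "
            f"{step.right_table}.{step.right_column}"
        )
    return "\n".join(lines)


steps = [
    SketchJoinStep(
        left_table="analytics.sales.orders",
        left_column="customer_id",
        right_table="analytics.customers.customer_360",
        right_column="customer_id",
    )
]

print(render_join(steps))
```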

Serialization

from semantic_model import save_model, load_model
from semantic_model.serialization import ModelExporter, save_model_yaml

# Save as JSON
save_model(model, "model.json")

# Save as YAML (requires PyYAML)
save_model_yaml(model, "model.yaml")

# Export utilities
exporter = ModelExporter(model)

# Get prompt-ready context
context = exporter.to_prompt_context(max_tokens=4000)

# Get source summary
summary = exporter.to_source_summary()
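
The `max_tokens` argument implies that the exporter trims its output to a budget, though the strategy is not documented here. A minimal sketch of one possible approach follows: sections are added in order until the next one would exceed the budget. The four-characters-per-token estimate and both helper names are assumptions, not the package's behavior.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly four characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(sections, max_tokens):
    """Keep leading sections whose combined estimated size fits max_tokens."""
    kept = []
    used = 0
    for section in sections:
        cost = estimate_tokens(section)
        if used + cost > max_tokens:
            break
        kept.append(section)
        used += cost
    return "\n\n".join(kept)


# Three sections of about 10 estimated tokens each.
sections = ["a" * 40, "b" * 40, "c" * 40]
context = fit_to_budget(sections, max_tokens=25)

print(context)
```

A real exporter would more likely rank sections by relevance and count tokens with the target model's tokenizer.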

Release history

0.2.1: Feb 26, 2026
0.2.0: Feb 23, 2026
0.1.0 (this version): Feb 22, 2026

Download files

Source Distribution

nomox_semantic_model-0.1.0.tar.gz (30.4 kB), uploaded Feb 22, 2026, Source

Built Distribution

nomox_semantic_model-0.1.0-py3-none-any.whl (34.8 kB), uploaded Feb 22, 2026, Python 3

File details

Details for the file nomox_semantic_model-0.1.0.tar.gz.

File metadata

Download URL: nomox_semantic_model-0.1.0.tar.gz
Upload date: Feb 22, 2026
Size: 30.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nomox_semantic_model-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`07d96139780c6cb8c2cdb1d78899d1c4373d1443cafbd9bf6c8ac700b14856b9`
MD5	`008ed2710d6ec486b803e56602275006`
BLAKE2b-256	`f893d751abe4bd996789fa7f76d7e68cdabf237c53fbc8a7b6251c2dc7fb355a`

Provenance

The following attestation bundles were made for nomox_semantic_model-0.1.0.tar.gz:

Publisher: publish.yml on MiraZzle/nomox-semantics-package

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nomox_semantic_model-0.1.0.tar.gz
- Subject digest: 07d96139780c6cb8c2cdb1d78899d1c4373d1443cafbd9bf6c8ac700b14856b9
- Sigstore transparency entry: 976497298
- Sigstore integration time: Feb 22, 2026
Source repository:
- Permalink: MiraZzle/nomox-semantics-package@74b0d9e8855c195a5968c500f6488d977a7f75f1
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/MiraZzle
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@74b0d9e8855c195a5968c500f6488d977a7f75f1
- Trigger Event: release

File details

Details for the file nomox_semantic_model-0.1.0-py3-none-any.whl.

File metadata

Download URL: nomox_semantic_model-0.1.0-py3-none-any.whl
Upload date: Feb 22, 2026
Size: 34.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nomox_semantic_model-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf66afb7b0a24fb4cbb7af3aef551591d1ab88fa4126a008e83922024ba9881b`
MD5	`7e8ebf28c4b89655834df5fa54dc1e20`
BLAKE2b-256	`301a607b588d4718ec15aa5e2c95ea1f4ef0915543b87e040d79ae90a4781e5b`

Provenance

The following attestation bundles were made for nomox_semantic_model-0.1.0-py3-none-any.whl:

Publisher: publish.yml on MiraZzle/nomox-semantics-package

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nomox_semantic_model-0.1.0-py3-none-any.whl
- Subject digest: bf66afb7b0a24fb4cbb7af3aef551591d1ab88fa4126a008e83922024ba9881b
- Sigstore transparency entry: 976497299
- Sigstore integration time: Feb 22, 2026
Source repository:
- Permalink: MiraZzle/nomox-semantics-package@74b0d9e8855c195a5968c500f6488d977a7f75f1
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/MiraZzle
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@74b0d9e8855c195a5968c500f6488d977a7f75f1
- Trigger Event: release

nomox-semantic-model 0.1.0
