Semantic data model for LLM-consumable data catalog
Nomox LLM Semantic Model
Internal Python package for describing the semantic model used across Nomox.
Installation
To use the package from anywhere, run:
pip install git+https://github.com/MiraZzle/nomox-semantics-package.git
For development:
pip install -e .
Architecture
The semantic model is organized into two levels plus a set of shared components:
Level 1: Source-Scoped Semantics
Produced by the Level 1 Indexer Agent. Contains:
- DataSource: Top-level container for a data source (Trino catalog.schema)
- Table: Tables and views with semantic roles and temporal information
- Column: Columns with semantic types, profiling, and sample values
- InternalRelationship: Foreign key relationships within a source
Level 2: Cross-Source Semantics
Produced by the Level 2 Aggregator Agent. Contains:
- SemanticEntity: Canonical business concepts (Customer, Order, Product)
- EntityManifestation: Where entities appear across sources
- UnifiedAttribute: Logical attributes sourced from multiple places
- EntityRelationship: Relationships between entities with join paths
- IdentityResolution: How to match entities across sources
Shared Components
- GlossaryTerm: Business terminology definitions
- ConfidenceScore: Confidence scoring for all elements
- ExpertOverride: Human corrections and enhancements
- IndexingState: Tracking of indexing jobs and status
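To make the layering concrete, here is a minimal sketch of how a Level 2 entity could point back at Level 1 tables. The `*Sketch` and `ManifestationRef` classes and the `resolve_tables` helper are hypothetical stand-ins written with plain dataclasses; they are not the package's classes, which carry many more fields and may be structured differently.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration only, not the semantic_model classes.
@dataclass
class ColumnSketch:
    name: str
    data_type: str


@dataclass
class TableSketch:
    name: str
    columns: list[ColumnSketch] = field(default_factory=list)


@dataclass
class SourceSketch:
    # Level 1: one source corresponds to one Trino catalog.schema
    trino_catalog: str
    trino_schema: str
    tables: list[TableSketch] = field(default_factory=list)


@dataclass
class ManifestationRef:
    # Level 2: refers to a Level 1 table by its fully qualified name
    fully_qualified_name: str


@dataclass
class EntitySketch:
    name: str
    manifestations: list[ManifestationRef] = field(default_factory=list)


def resolve_tables(entity, sources):
    """Return the Level 1 tables that an entity's manifestations refer to."""
    index = {
        f"{s.trino_catalog}.{s.trino_schema}.{t.name}": t
        for s in sources
        for t in s.tables
    }
    return [
        index[m.fully_qualified_name]
        for m in entity.manifestations
        if m.fully_qualified_name in index
    ]
```

Manifestations that point at a source missing from the list are skipped here; a real implementation might prefer to report them.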
Quick Start
from semantic_model import (
    SemanticModel,
    DataSource,
    Table,
    Column,
    SemanticType,
    SemanticCategory,
    SourceType,
    create_empty_model,
    save_model,
    load_model,
)
# Create an empty model
model = create_empty_model(
    model_id="my-org-model",
    organization_id="my-org",
)
# Create a data source
source = DataSource(
    id="sales-db",
    name="Sales Database",
    trino_catalog="analytics",
    trino_schema="sales",
    fully_qualified_prefix="analytics.sales",
    source_type=SourceType.ANALYTICAL,
    description="Sales transaction data warehouse",
    domain="Sales",
)
# Create a table
orders_table = Table(
    id="orders",
    name="orders",
    fully_qualified_name="analytics.sales.orders",
    description="Fact table containing one row per order",
    columns=[
        Column(
            id="order_id",
            name="order_id",
            ordinal_position=0,
            data_type="VARCHAR",
            is_primary_key=True,
            semantic_type=SemanticType.identifier(subtype="uuid"),
            description="Unique order identifier",
        ),
        Column(
            id="customer_id",
            name="customer_id",
            ordinal_position=1,
            data_type="VARCHAR",
            is_foreign_key=True,
            semantic_type=SemanticType.identifier(subtype="uuid"),
            description="ID of the customer who placed the order",
        ),
        Column(
            id="total_amount",
            name="total_amount",
            ordinal_position=2,
            data_type="DECIMAL(12,2)",
            semantic_type=SemanticType(
                category=SemanticCategory.CURRENCY,
                confidence=0.95,
            ),
            unit="USD",
            description="Total order value including tax",
        ),
    ],
)
# Add table to source
source = source.add_table(orders_table)
# Add source to model
model = model.add_source(source)
# Save the model
save_model(model, "semantic_model.json")
# Load the model
loaded_model = load_model("semantic_model.json")
# Generate prompt context for LLM
prompt_context = model.to_prompt_format(
    include_sources=True,
    include_entities=True,
    include_glossary=True,
)
print(prompt_context)
Working with Confidence Scores
from semantic_model import ConfidenceScore, LowConfidenceItem, ConfidenceObjectType
# Create a confidence score
confidence = ConfidenceScore(
    overall=0.75,
    threshold=0.8,
    schema_understanding=0.9,
    semantic_typing=0.7,
    description_quality=0.65,
    low_confidence_items=[
        LowConfidenceItem(
            object_type=ConfidenceObjectType.COLUMN,
            object_id="status_code",
            object_name="status_code",
            score=0.4,
            reason="Unknown categorical values",
            suggested_clarification="What do status codes 'P', 'A', 'R' mean?",
        ),
    ],
)
# Check whether the score meets the threshold
if not confidence.meets_threshold:
    print("Source needs expert review")
    for item in confidence.low_confidence_items:
        print(f" - {item.object_name}: {item.reason}")
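The threshold check above can be pictured with a small stand-in class. This is only a sketch: `ConfidenceSketch`, its `component_scores` dictionary, and `weakest_components` are hypothetical, and the assumption that `meets_threshold` compares nothing but `overall` against `threshold` is not confirmed by the package.

```python
from dataclasses import dataclass, field


@dataclass
class ConfidenceSketch:
    """Hypothetical stand-in, not the package's ConfidenceScore."""

    overall: float
    threshold: float = 0.8
    component_scores: dict[str, float] = field(default_factory=dict)

    @property
    def meets_threshold(self) -> bool:
        # Assumption: only the overall score is compared with the threshold.
        return self.overall >= self.threshold

    def weakest_components(self, limit: int = 2) -> list[str]:
        # Component names ordered from lowest score to highest.
        return sorted(self.component_scores, key=self.component_scores.get)[:limit]


score = ConfidenceSketch(
    overall=0.75,
    threshold=0.8,
    component_scores={
        "schema_understanding": 0.9,
        "semantic_typing": 0.7,
        "description_quality": 0.65,
    },
)

if not score.meets_threshold:
    print("Needs review; weakest areas:", score.weakest_components())
```

Sorting the component scores gives a reviewer a starting point before reading the individual low-confidence items.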
Expert Overrides
from semantic_model import ExpertOverride, ReindexScope
# Create an override
override = ExpertOverride(
    id="override-001",
    created_by="domain-expert@company.com",
    field_path="description",
    original_value="Unknown table",
    override_value="Customer master data from CRM system",
    reason="Clarified based on CRM documentation",
    reindex_scope=ReindexScope.THIS_SOURCE,
)
# Apply to a table (`table` stands for an existing Table instance)
table.expert_overrides.append(override)
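How an override's `field_path` is resolved is not documented here. As a rough illustration, the sketch below applies an override to a plain dictionary, assuming that `field_path` is a dot-separated path into the record. The `apply_override` helper and the dictionary layout are hypothetical and not part of the package.

```python
import copy


def apply_override(record: dict, field_path: str, override_value):
    """Return a copy of record with the value at a dotted field_path replaced.

    Hypothetical helper: assumes field_path is dot-separated and that every
    parent key along the path already exists.
    """
    updated = copy.deepcopy(record)
    target = updated
    *parents, leaf = field_path.split(".")
    for key in parents:
        target = target[key]
    target[leaf] = override_value
    return updated


table_record = {"name": "accounts", "description": "Unknown table"}
patched = apply_override(
    table_record,
    "description",
    "Customer master data from CRM system",
)
print(patched["description"])
```

Working on a copy keeps the machine-generated value available, which mirrors the `original_value` field kept on the override itself.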
Semantic Entities (Level 2)
from semantic_model import (
    SemanticEntity,
    EntityManifestation,
    ManifestationRole,
    UnifiedAttribute,
    EntityRelationship,
    JoinPath,
    JoinStep,
)
# Create a semantic entity
customer_entity = SemanticEntity(
    id="customer",
    name="Customer",
    description="A customer is any individual or organization with an account",
    canonical_id_name="customer_id",
    canonical_id_format="UUID",
    domain="Sales",
    manifestations=[
        EntityManifestation(
            source_id="crm-db",
            table_id="accounts",
            fully_qualified_name="crm.public.accounts",
            role=ManifestationRole.PRIMARY,
            key_column_id="account_id",
            usage_guidance="Use for real-time customer master data",
        ),
        EntityManifestation(
            source_id="analytics-db",
            table_id="customer_360",
            fully_qualified_name="analytics.customers.customer_360",
            role=ManifestationRole.DERIVED,
            key_column_id="customer_id",
            usage_guidance="Use for analytics with pre-computed metrics",
        ),
    ],
)
# Add to model
model = model.add_entity(customer_entity)
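A consumer of the entity above typically needs to decide which manifestation to query. The following sketch shows one possible selection rule using stand-in types; `Role`, `ManifestationSketch`, and `pick_manifestation` are hypothetical names, and the fallback to the first manifestation is an assumption rather than documented package behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Mirrors the two roles used in the example above; the real enum may have more.
    PRIMARY = "primary"
    DERIVED = "derived"


@dataclass
class ManifestationSketch:
    """Hypothetical stand-in, not the package's EntityManifestation."""

    fully_qualified_name: str
    role: Role
    key_column_id: str


def pick_manifestation(manifestations, preferred=Role.PRIMARY):
    """Return the first manifestation with the preferred role.

    Falls back to the first manifestation, or None if the list is empty.
    """
    for manifestation in manifestations:
        if manifestation.role is preferred:
            return manifestation
    return manifestations[0] if manifestations else None


customer_manifestations = [
    ManifestationSketch("analytics.customers.customer_360", Role.DERIVED, "customer_id"),
    ManifestationSketch("crm.public.accounts", Role.PRIMARY, "account_id"),
]

chosen = pick_manifestation(customer_manifestations)
print(f"SELECT {chosen.key_column_id} FROM {chosen.fully_qualified_name}")
```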
Serialization
from semantic_model import save_model, load_model
from semantic_model.serialization import ModelExporter, save_model_yaml
# Save as JSON
save_model(model, "model.json")
# Save as YAML (requires PyYAML)
save_model_yaml(model, "model.yaml")
# Export utilities
exporter = ModelExporter(model)
# Get prompt-ready context
context = exporter.to_prompt_context(max_tokens=4000)
# Get source summary
summary = exporter.to_source_summary()
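The `max_tokens` argument suggests that the exported context is trimmed to a budget. How the package does this is not shown, so the sketch below only illustrates the general idea: it keeps sections in order until the budget is exhausted. `estimate_tokens` and `fit_to_budget` are hypothetical helpers, and the four-characters-per-token estimate is a rough heuristic, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic of about four characters per token (an assumption,
    # not the package's method and not an actual tokenizer).
    return max(1, len(text) // 4)


def fit_to_budget(sections: list[str], max_tokens: int) -> str:
    """Join sections in order, stopping before the budget would be exceeded."""
    kept = []
    used = 0
    for section in sections:
        cost = estimate_tokens(section)
        if used + cost > max_tokens:
            break
        kept.append(section)
        used += cost
    return "\n\n".join(kept)


sections = [
    "Source: analytics.sales (Sales transaction data warehouse)",
    "Table: orders (one row per order)",
    "Column: total_amount DECIMAL(12,2), currency, unit USD",
]
print(fit_to_budget(sections, max_tokens=25))
```

Ordering the sections from most to least important before trimming keeps the highest-value context when the budget is tight.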