CogniFlow Ontology

cf_ontology is the semantics core for CogniFlow. It ingests JSON-LD into DuckDB-backed RDF quads, validates ontology/pipeline documents, and exposes a single Python API (OntologyManager) consumed by other packages (for example cf_web and pipeline tooling).

Semantics are authored as JSON-LD:

  • ontology fragments
  • step packages
  • pipelines

OntologyManager loads packaged ontology resources, can ingest installed step packages, validates JSON-LD with SHACL, and exposes normalized DTOs + JSON-LD exports.

Interfaces

There are three ways to interact with cf_ontology.

1) Python functions/classes (direct API)

Use this when building apps/services:

  • OntologyManager
  • ingest_jsonld_files
  • rebuild_semantics_db

This is the primary programmatic interface and the one used by cf_web.
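
All three names are importable from the package root, as the Usage examples below show:

from cf_ontology import OntologyManager, ingest_jsonld_files, rebuild_semantics_db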

2) Package CLI (cf-ontology or python -m cf_ontology)

Use this for semantics operations:

  • init/rebuild
  • ingest
  • export
  • pipeline revision/state operations
  • signature generation

This is implemented in src/cf_ontology/cli.py.

3) Unified CLI (cf ontology ...)

Use this for ontology inspection in the unified CogniFlow CLI:

  • list classes
  • inspect class
  • list steps
  • inspect step

This is implemented in src/cf_ontology/cf_cli.py and registered into cf_cli.

Interface differences

  • Python API: best for integration in code; returns Python objects/DTOs.
  • Package CLI (cf-ontology): operational/automation tasks over semantics DB files.
  • Unified CLI (cf ontology): user-facing inspection commands across packages.

Storage (Semantics DB)

The default storage is an RDF Quad Store backed by DuckDB files under workspace/semantics (repo-relative). The location is configurable via environment variables.

Default DB files:

  • cf-ontology.duckdb (static ontology + shapes)
  • cf-steps.duckdb (step packages)
  • cf-pipelines.duckdb (pipeline revisions + audit)
  • cf-pipeline-states.duckdb (pipeline runtime state/events; not part of default exports yet)

Backend

cf_ontology is quads-only. JSON-LD is ingested into RDF quads stored in DuckDB (rdf_quads) with a graph catalog (graphs).

Paths

  • CF_SEMANTICS_DIR=/path/to/semantics (absolute or repo-relative; default: workspace/semantics)
  • CF_WORKSPACE_DIR=/path/to/workspace (used only when CF_SEMANTICS_DIR is unset)

If neither is set, workspace/semantics is used when a repo root can be detected; otherwise the fallback is ~/.cogniflow/workspace/semantics.
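
For example, to point the semantics DB at a custom location from Python, set the environment variable before the package first opens the DB (a minimal sketch; the path is illustrative):

import os

# Illustrative path; CF_SEMANTICS_DIR may be absolute or repo-relative.
os.environ["CF_SEMANTICS_DIR"] = "/data/cogniflow/semantics"

from cf_ontology import OntologyManager

om = OntologyManager()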

Step packages

  • CF_ENABLE_STEP_PACKAGES=0|1 (default: 0): set to 1 to discover installed step packages via the cogniflow.steps entry point group.

Graph IDs (binding conventions)

Step packages

  • Split steps (split_steps=True): g = "pkg:{package_name}@{package_version}#{step_id}"
  • Unsplit: g = "pkg:{package_name}@{package_version}"

package_name is the StepPackage id (from cf:packageId in steps.jsonld).

Package upgrades generate new graph ids (the version is part of g). Existing graphs are not deleted; missing/uninstalled packages are marked inactive via graphs.is_active=false.

Pipelines (versioned)

  • Stable pipeline identifier: pipeline_id (UUID or user-provided stable id)
  • Revision graph: g = "pipe:{pipeline_id}@{rev}" with monotonically increasing rev starting at 1
  • "Current" is a pointer in the pipelines table (no separate quad graph)

What is included

  • Packaged JSON-LD fragments: ontology/core/*.jsonld, ontology/vocab/*.jsonld, and ontology/shapes/*.jsonld.
  • OntologyManager for loading/merging ontologies and querying step/pipeline metadata.
  • Dynamic step-package discovery via the cogniflow.steps entry point group.

Installation

From the sandcastle/cf_ontology directory:

pip install .

Published distribution name:

pip install cf-ontology

Optional (for unified cf CLI integration):

pip install -e ../cf_cli

Usage

Policy: all reads/writes to the semantics DuckDBs must go through the cf_ontology package (Python API or CLI). Do not access cf-*.duckdb directly from other packages.

Python API

Basic loading and inspection:

from cf_ontology import OntologyManager

ontology = OntologyManager()
print(ontology.get_processing_steps())
print(ontology.get_graph_jsonld())

Load rich step DTOs (quads-backed bulk path):

from cf_ontology import OntologyManager

om = OntologyManager(load_resources=False)
steps = om.get_processing_steps_info_quads_bulk()
print(len(steps))
print(steps[0]["@id"])

Importing JSON-LD files into the Quad Store

Python utility:

from pathlib import Path
from cf_ontology import ingest_jsonld_files

ingest_jsonld_files(
    [Path("sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld")],
    graph_type="pipeline",
    package="examples",
)

Package CLI:

python -m cf_ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld --graph-type pipeline --package examples

Or via console script:

cf-ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld --graph-type pipeline --package examples

Unified CLI (inspection-focused):

cf ontology classes
cf ontology class cf:ProcessingStep --instances
cf ontology steps
cf ontology step cfbs:AverageStep

Ingesting installed step packages

This discovers cogniflow.steps entry points and ingests their JSON-LD with the graph-id conventions above:

python -m cf_ontology ingest --installed-steps

Re-ingest even if the stored content hash matches:

python -m cf_ontology ingest --installed-steps --force

Initialize semantics DB (CLI)

python -m cf_ontology init

This initializes all split DB files under the semantics directory.

To force a rebuild (destructive):

python -m cf_ontology init --rebuild

Fresh install smoke test

scripts/fresh_install.ps1 -Clean runs a Semantics QuadStore smoke test at the end via sandcastle/cf_ontology/scripts/semantics_smoketest.py and prints CI-friendly lines like:

[SEMANTICS] backend=quads
[SEMANTICS] db_path=...
[SEMANTICS] tables=graphs,rdf_quads,pipelines,pipeline_revisions OK
[SEMANTICS] committed pipeline_id=... revs=...,... graph_id=...
[SEMANTICS] export_flatten=current_only OK
[SEMANTICS] export_dataset=named_graphs OK

db_path above points to cf-pipelines.duckdb.

Pipeline versioning (append-only revisions)

Commit pipeline revisions (audit metadata is stored in pipeline_revisions):

from cf_ontology import OntologyManager

om = OntologyManager()
rev = om.commit_pipeline(
    "my-pipeline-id",
    {"@context": {"cf": "https://cogniflow.org/ns#"}, "@graph": []},
    user="alice",
    message="update",
)
print(rev)
print(om.list_pipeline_revisions("my-pipeline-id"))
print(om.get_pipeline_revision("my-pipeline-id"))  # current
print(om.get_pipeline_revision("my-pipeline-id", rev=1))

Derive a pipeline id from JSON-LD (uses the ProcessingPipeline @id local name):

python -m cf_ontology pipeline id --jsonld path/to/pipeline.jsonld
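
The CLI is the authoritative implementation; as a rough sketch of the idea (the exact local-name extraction rules here are an assumption):

import json
from pathlib import Path

def derive_pipeline_id(jsonld_path: Path) -> str:
    # Take the local name of the cf:ProcessingPipeline node's @id,
    # i.e. the part after the last '#', '/' or ':'.
    doc = json.loads(jsonld_path.read_text(encoding="utf-8"))
    for node in doc.get("@graph", [doc]):
        types = node.get("@type", [])
        types = [types] if isinstance(types, str) else types
        if "cf:ProcessingPipeline" in types:
            local = node["@id"]
            for sep in ("#", "/", ":"):
                local = local.rsplit(sep, 1)[-1]
            return local
    raise ValueError("no cf:ProcessingPipeline node found")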

OntologyManager.get_graph_jsonld() stays backwards compatible and includes only current pipeline revisions (not all historical revisions).

Dataset export (named graphs preserved)

from cf_ontology import OntologyManager

om = OntologyManager()
dataset_jsonld = om.get_graph_dataset_jsonld()

Rebuilding the semantics DB

from cf_ontology import rebuild_semantics_db

rebuild_semantics_db()

Package CLI command groups

python -m cf_ontology -h (or cf-ontology -h) provides:

  • init
  • ingest
  • export
  • pipeline (commit/get/id)
  • state (activate/set/run-loop)
  • siggen

Unified CLI command groups

cf -h shows groups contributed by installed packages.
cf_ontology contributes:

  • ontology classes
  • ontology class
  • ontology steps
  • ontology step

Step package entry point convention

Entry point group: cogniflow.steps

Supported entry point values:

  • Resource path: some_pkg_module:steps.jsonld (path relative to that package/module)
  • Loader function: some_pkg_module:load_steps returning a JSON-LD dict or JSON string

Package metadata is read from the cf:StepPackage node in steps.jsonld (cf:packageId / cf:packageVersion), with distribution metadata as a fallback.
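
A minimal sketch of the loader-function variant (some_pkg_module and the entry point name are placeholders):

# In the step package's pyproject.toml:
#
#   [project.entry-points."cogniflow.steps"]
#   my_steps = "some_pkg_module:load_steps"

import json
from importlib import resources

def load_steps() -> dict:
    # Return the packaged steps.jsonld as a JSON-LD dict; per the
    # convention above, returning a JSON string is equally valid.
    text = resources.files("some_pkg_module").joinpath("steps.jsonld").read_text(encoding="utf-8")
    return json.loads(text)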

Quad Store schema (DuckDB)

The quad store uses these tables (per DB file):

  • graphs (catalog)
    • graph_id TEXT PRIMARY KEY
    • graph_type TEXT
    • package TEXT
    • path TEXT
    • context JSON
    • graph_kind TEXT (package_step|package|pipeline_rev|system|custom)
    • is_active BOOLEAN
    • content_hash TEXT
    • package_name TEXT NULL, package_version TEXT NULL
    • created_at TIMESTAMP
    • updated_at TIMESTAMP
  • rdf_quads (RDF terms, normalized)
    • g TEXT (named graph id)
    • s TEXT, s_kind TEXT (iri|bnode)
    • p TEXT, p_kind TEXT (iri)
    • o TEXT, o_kind TEXT (iri|bnode|literal)
    • o_datatype TEXT NULL, o_lang TEXT NULL
    • graph_type TEXT, package TEXT, path TEXT
    • updated_at TIMESTAMP
  • pipelines (current pointer)
    • pipeline_id TEXT PRIMARY KEY
    • current_rev INTEGER
    • created_at TIMESTAMP, updated_at TIMESTAMP
    • created_by TEXT NULL, updated_by TEXT NULL
  • pipeline_revisions (append-only audit log)
    • PRIMARY KEY (pipeline_id, rev)
    • graph_id TEXT (equals pipe:{pipeline_id}@{rev})
    • created_at TIMESTAMP, created_by TEXT NULL
    • message TEXT NULL
    • content_hash TEXT NULL
    • parent_rev INTEGER NULL

In the split layout, pipelines and pipeline_revisions live only in cf-pipelines.duckdb.

Pipeline states schema (DuckDB)

cf-pipeline-states.duckdb stores runtime state as relational tables. Each column has a corresponding property or class definition in the ontology JSON-LD (core/classes + core/properties).

Tables:

  • pipeline_state (snapshot/control)
  • pipeline_runs (run history)
  • run_events (append-only event log)
  • run_queue (execution queue + leases)

Runtime CLI

Canonical pipeline start flow (the only supported run-start contract):

  1. Activate pipeline state (creates the snapshot row):

python -m cf_ontology state activate --pipeline-id opcua_fifo_avg --desired-state enabled

  2. Persist an inbound trigger event (idempotent by dedupe_key):

python -m cf_ontology state emit-event \
  --pipeline-id opcua_fifo_avg \
  --event-type opcua_signal \
  --source manual \
  --dedupe-key demo-opcua-001

  3. Run the worker loop to consume pending events and execute the run:

python -m cf_ontology state run-loop --pipeline-id opcua_fifo_avg --poll-interval 1.0 --once

Control desired state (pause/sleep/disable):

python -m cf_ontology state set --pipeline-id opcua_fifo_avg --desired-state sleep

state run-loop consumes pending run_events only and never auto-generates runs from idle state ticks. If no pending events are available, the worker waits (or exits with --once). Each consumed event creates linked runtime records in cf-pipeline-states.duckdb (run_events, pipeline_runs, run_queue) before engine execution starts.

Notes

  • DuckDB as default storage: split DB files keep the high-churn data (pipelines, states) isolated from the mostly static ontology and steps. Parquet export/materialization can be added later for bulk scans and interchange.
  • sandcastle/src is legacy and not part of ongoing development; new work happens under sandcastle/* packages.

Pipeline steps header

Processing pipelines must declare the step catalogs they rely on via a steps header:

{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasStepsHeader": { "@id": "ex:stepsHeader" }
},
{
  "@id": "ex:stepsHeader",
  "@type": "cf:StepsHeader",
  "cf:stepsPath": [
    "path/to/steps.jsonld",
    "path/to/other_steps.jsonld"
  ]
}

The runner uses these paths when --steps is not provided on the CLI.

Pipeline plugins header

Pipelines should also declare plugin directories so the runner can load implementations without --plugins:

{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasPluginsHeader": { "@id": "ex:pluginsHeader" }
},
{
  "@id": "ex:pluginsHeader",
  "@type": "cf:PluginsHeader",
  "cf:pluginPath": [
    "path/to/plugin/bin",
    "path/to/other/plugin/bin"
  ]
}

Publishing

cf_ontology is published with the dedicated Windows workflow:

  • Workflow: .github/workflows/cf_ontology_windows_publish.yml
  • Package directory: sandcastle/cf_ontology
  • PyPI tag: cf-ontology-v<version>
  • TestPyPI tag: cf-ontology-v<version>-test

Local preflight:

powershell -ExecutionPolicy Bypass -File scripts/mimic_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PythonExe py `
  -PythonVersion 3.13

Queue a dry-run dispatch:

powershell -ExecutionPolicy Bypass -File scripts/queue_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PublishTarget testpypi `
  -Ref main `
  -RequireLocalPass `
  -DryRun
