CogniFlow Ontology

cf_ontology is the semantics core for CogniFlow. It ingests JSON-LD into DuckDB-backed RDF quads, validates ontology/pipeline documents, and exposes a single Python API (OntologyManager) consumed by other packages (for example cf_web and pipeline tooling).

Semantics are authored as JSON-LD:

  • ontology fragments
  • step packages
  • pipelines

OntologyManager loads packaged ontology resources, can ingest installed step packages, validates JSON-LD with SHACL, and exposes normalized DTOs + JSON-LD exports.

Interfaces

There are three ways to interact with cf_ontology.

1) Python functions/classes (direct API)

Use this when building apps/services:

  • OntologyManager
  • ingest_jsonld_files
  • rebuild_semantics_db

This is the primary programmatic interface and the one used by cf_web.
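
All three names are importable from the package root, as the Usage examples below show:

from cf_ontology import OntologyManager, ingest_jsonld_files, rebuild_semantics_db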

2) Package CLI (cf-ontology or python -m cf_ontology)

Use this for semantics operations:

  • init/rebuild
  • ingest
  • export
  • pipeline revision/state operations
  • signature generation

This is implemented in src/cf_ontology/cli.py.

3) Unified CLI (cf ontology ...)

Use this for ontology inspection in the unified CogniFlow CLI:

  • list classes
  • inspect class
  • list steps
  • inspect step

This is implemented in src/cf_ontology/cf_cli.py and registered into cf_cli.

Interface differences

  • Python API: best for integration in code; returns Python objects/DTOs.
  • Package CLI (cf-ontology): operational/automation tasks over semantics DB files.
  • Unified CLI (cf ontology): user-facing inspection commands across packages.

Storage (Semantics DB)

The default storage is an RDF Quad Store backed by DuckDB files under workspace/semantics (repo-relative). The location is configurable via environment variables.

Default DB files:

  • cf-ontology.duckdb (static ontology + shapes)
  • cf-steps.duckdb (step packages)
  • cf-pipelines.duckdb (pipeline revisions + audit)
  • cf-pipeline-states.duckdb (pipeline runtime state/events; not part of default exports yet)

Backend

cf_ontology is quads-only. JSON-LD is ingested into RDF quads stored in DuckDB (rdf_quads) with a graph catalog (graphs).

Paths

  • CF_SEMANTICS_DIR=/path/to/semantics (absolute or repo-relative; default: workspace/semantics)
  • CF_WORKSPACE_DIR=/path/to/workspace (used only when CF_SEMANTICS_DIR is unset)

If neither is set, workspace/semantics is used when a repo root can be detected; otherwise the fallback is ~/.cogniflow/workspace/semantics.
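
For example, to point the semantics DB at a custom location from Python, set the environment variable before the package first opens the DB (a minimal sketch; the path is illustrative):

import os

# Illustrative path; CF_SEMANTICS_DIR may be absolute or repo-relative.
os.environ["CF_SEMANTICS_DIR"] = "/data/cogniflow/semantics"

from cf_ontology import OntologyManager

om = OntologyManager()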

Step packages

  • CF_ENABLE_STEP_PACKAGES=0|1 (default: 0): set to 1 to discover installed step packages via the cogniflow.steps entry point group.

Graph IDs (binding conventions)

Step packages

  • Split steps (split_steps=True): g = "pkg:{package_name}@{package_version}#{step_id}"
  • Unsplit: g = "pkg:{package_name}@{package_version}"

package_name is the StepPackage id (from cf:packageId in steps.jsonld).

Package upgrades generate new graph ids (the version is part of g). Existing graphs are not deleted; missing/uninstalled packages are marked inactive via graphs.is_active=false.

Pipelines (versioned)

  • Stable pipeline identifier: pipeline_id (UUID or user-provided stable id)
  • Revision graph: g = "pipe:{pipeline_id}@{rev}" with monotonically increasing rev starting at 1
  • "Current" is a pointer in the pipelines table (no separate quad graph)

What is included

  • Packaged JSON-LD fragments: ontology/core/*.jsonld, ontology/vocab/*.jsonld, and ontology/shapes/*.jsonld.
  • OntologyManager for loading/merging ontologies and querying step/pipeline metadata.
  • Dynamic step-package discovery via the cogniflow.steps entry point group.

Installation

From the sandcastle/cf_ontology directory:

pip install .

Published distribution name:

pip install cf-ontology

Optional (for unified cf CLI integration):

pip install -e ../cf_cli

Usage

Policy: all reads/writes to the semantics DuckDBs must go through the cf_ontology package (Python API or CLI). Do not access cf-*.duckdb directly from other packages.

Python API

Basic loading and inspection:

from cf_ontology import OntologyManager

ontology = OntologyManager()
print(ontology.get_processing_steps())
print(ontology.get_graph_jsonld())

Load rich step DTOs (quads-backed bulk path):

from cf_ontology import OntologyManager

om = OntologyManager(load_resources=False)
steps = om.get_processing_steps_info_quads_bulk()
print(len(steps))
print(steps[0]["@id"])

Importing JSON-LD files into the Quad Store

Python utility:

from pathlib import Path
from cf_ontology import ingest_jsonld_files

ingest_jsonld_files(
    [Path("sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld")],
    graph_type="pipeline",
    package="examples",
)

Package CLI:

python -m cf_ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld --graph-type pipeline --package examples

Or via console script:

cf-ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld --graph-type pipeline --package examples

Unified CLI (inspection-focused):

cf ontology classes
cf ontology class cf:ProcessingStep --instances
cf ontology steps
cf ontology step cfbs:AverageStep

Ingesting installed step packages

This discovers cogniflow.steps entry points and ingests their JSON-LD with the graph-id conventions above:

python -m cf_ontology ingest --installed-steps

Re-ingest even if the stored content hash matches:

python -m cf_ontology ingest --installed-steps --force

Initialize semantics DB (CLI)

python -m cf_ontology init

This initializes all split DB files under the semantics directory.

To force a rebuild (destructive):

python -m cf_ontology init --rebuild

Fresh install smoke test

scripts/fresh_install.ps1 -Clean runs a Semantics QuadStore smoke test at the end via sandcastle/cf_ontology/scripts/semantics_smoketest.py and prints CI-friendly lines like:

[SEMANTICS] backend=quads
[SEMANTICS] db_path=...
[SEMANTICS] tables=graphs,rdf_quads,pipelines,pipeline_revisions OK
[SEMANTICS] committed pipeline_id=... revs=...,... graph_id=...
[SEMANTICS] export_flatten=current_only OK
[SEMANTICS] export_dataset=named_graphs OK

db_path above points to cf-pipelines.duckdb.

Pipeline versioning (append-only revisions)

Commit pipeline revisions (audit metadata is stored in pipeline_revisions):

from cf_ontology import OntologyManager

om = OntologyManager()
rev = om.commit_pipeline(
    "my-pipeline-id",
    {"@context": {"cf": "https://cogniflow.org/ns#"}, "@graph": []},
    user="alice",
    message="update",
)
print(rev)
print(om.list_pipeline_revisions("my-pipeline-id"))
print(om.get_pipeline_revision("my-pipeline-id"))  # current
print(om.get_pipeline_revision("my-pipeline-id", rev=1))

Derive a pipeline id from JSON-LD (uses the ProcessingPipeline @id local name):

python -m cf_ontology pipeline id --jsonld path/to/pipeline.jsonld
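
The CLI is the authoritative implementation; as a rough sketch of the idea (the exact local-name extraction rules here are an assumption):

import json
from pathlib import Path

def derive_pipeline_id(jsonld_path: Path) -> str:
    # Take the local name of the cf:ProcessingPipeline node's @id,
    # i.e. the part after the last '#', '/' or ':'.
    doc = json.loads(jsonld_path.read_text(encoding="utf-8"))
    for node in doc.get("@graph", [doc]):
        types = node.get("@type", [])
        types = [types] if isinstance(types, str) else types
        if "cf:ProcessingPipeline" in types:
            local = node["@id"]
            for sep in ("#", "/", ":"):
                local = local.rsplit(sep, 1)[-1]
            return local
    raise ValueError("no cf:ProcessingPipeline node found")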

OntologyManager.get_graph_jsonld() stays backwards compatible and includes only current pipeline revisions (not all historical revisions).

Dataset export (named graphs preserved)

from cf_ontology import OntologyManager

om = OntologyManager()
dataset_jsonld = om.get_graph_dataset_jsonld()

Rebuilding the semantics DB

from cf_ontology import rebuild_semantics_db

rebuild_semantics_db()

Package CLI command groups

python -m cf_ontology -h (or cf-ontology -h) provides:

  • init
  • ingest
  • export
  • pipeline (commit/get/id)
  • state (activate/set/run-loop)
  • siggen

Unified CLI command groups

cf -h shows groups contributed by installed packages.
cf_ontology contributes:

  • ontology classes
  • ontology class
  • ontology steps
  • ontology step

Step package entry point convention

Entry point group: cogniflow.steps

Supported entry point values:

  • Resource path: some_pkg_module:steps.jsonld (path relative to that package/module)
  • Loader function: some_pkg_module:load_steps returning a JSON-LD dict or JSON string

Package metadata is read from the cf:StepPackage node in steps.jsonld (cf:packageId / cf:packageVersion), with distribution metadata as a fallback.
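
A minimal sketch of the loader-function variant (some_pkg_module and the entry point name are placeholders):

# In the step package's pyproject.toml:
#
#   [project.entry-points."cogniflow.steps"]
#   my_steps = "some_pkg_module:load_steps"

import json
from importlib import resources

def load_steps() -> dict:
    # Return the packaged steps.jsonld as a JSON-LD dict; per the
    # convention above, returning a JSON string is equally valid.
    text = resources.files("some_pkg_module").joinpath("steps.jsonld").read_text(encoding="utf-8")
    return json.loads(text)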

Quad Store schema (DuckDB)

The quad store uses these tables (per DB file):

  • graphs (catalog)
    • graph_id TEXT PRIMARY KEY
    • graph_type TEXT
    • package TEXT
    • path TEXT
    • context JSON
    • graph_kind TEXT (package_step|package|pipeline_rev|system|custom)
    • is_active BOOLEAN
    • content_hash TEXT
    • package_name TEXT NULL, package_version TEXT NULL
    • created_at TIMESTAMP
    • updated_at TIMESTAMP
  • rdf_quads (RDF terms, normalized)
    • g TEXT (named graph id)
    • s TEXT, s_kind TEXT (iri|bnode)
    • p TEXT, p_kind TEXT (iri)
    • o TEXT, o_kind TEXT (iri|bnode|literal)
    • o_datatype TEXT NULL, o_lang TEXT NULL
    • graph_type TEXT, package TEXT, path TEXT
    • updated_at TIMESTAMP
  • pipelines (current pointer)
    • pipeline_id TEXT PRIMARY KEY
    • current_rev INTEGER
    • created_at TIMESTAMP, updated_at TIMESTAMP
    • created_by TEXT NULL, updated_by TEXT NULL
  • pipeline_revisions (append-only audit log)
    • PRIMARY KEY (pipeline_id, rev)
    • graph_id TEXT (equals pipe:{pipeline_id}@{rev})
    • created_at TIMESTAMP, created_by TEXT NULL
    • message TEXT NULL
    • content_hash TEXT NULL
    • parent_rev INTEGER NULL

In the split layout, pipelines and pipeline_revisions live only in cf-pipelines.duckdb.

Pipeline states schema (DuckDB)

cf-pipeline-states.duckdb stores runtime state as relational tables. Each column has a corresponding property or class definition in the ontology JSON-LD (core/classes + core/properties).

Tables:

  • pipeline_state (snapshot/control)
  • pipeline_runs (run history)
  • run_events (append-only event log)
  • run_queue (execution queue + leases)

Runtime CLI

Canonical pipeline start flow (the only supported run-start contract):

  1. Activate pipeline state (creates the snapshot row):

python -m cf_ontology state activate --pipeline-id opcua_fifo_avg --desired-state enabled

  2. Persist an inbound trigger event (idempotent by dedupe_key):

python -m cf_ontology state emit-event \
  --pipeline-id opcua_fifo_avg \
  --event-type opcua_signal \
  --source manual \
  --dedupe-key demo-opcua-001

  3. Run the worker loop to consume pending events and execute the run:

python -m cf_ontology state run-loop --pipeline-id opcua_fifo_avg --poll-interval 1.0 --once

Control desired state (pause/sleep/disable):

python -m cf_ontology state set --pipeline-id opcua_fifo_avg --desired-state sleep

state run-loop consumes pending run_events only and never auto-generates runs from idle state ticks. If no pending events are available, the worker waits (or exits with --once). Each consumed event creates linked runtime records in cf-pipeline-states.duckdb (run_events, pipeline_runs, run_queue) before engine execution starts.

Notes

  • DuckDB as default storage: split DB files keep the high-churn data (pipelines, states) isolated from the mostly static ontology and steps. Parquet export/materialization can be added later for bulk scans and interchange.
  • sandcastle/src is legacy and not part of ongoing development; new work happens under sandcastle/* packages.

Pipeline steps header

Processing pipelines must declare the step catalogs they rely on via a steps header:

{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasStepsHeader": { "@id": "ex:stepsHeader" }
},
{
  "@id": "ex:stepsHeader",
  "@type": "cf:StepsHeader",
  "cf:stepsPath": [
    "path/to/steps.jsonld",
    "path/to/other_steps.jsonld"
  ]
}

The runner uses these paths when --steps is not provided on the CLI.

Pipeline plugins header

Pipelines should also declare plugin directories so the runner can load implementations without --plugins:

{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasPluginsHeader": { "@id": "ex:pluginsHeader" }
},
{
  "@id": "ex:pluginsHeader",
  "@type": "cf:PluginsHeader",
  "cf:pluginPath": [
    "path/to/plugin/bin",
    "path/to/other/plugin/bin"
  ]
}

Publishing

cf_ontology is published with the dedicated Windows workflow:

  • Workflow: .github/workflows/cf_ontology_windows_publish.yml
  • Package directory: sandcastle/cf_ontology
  • PyPI tag: cf-ontology-v<version>
  • TestPyPI tag: cf-ontology-v<version>-test

Local preflight:

powershell -ExecutionPolicy Bypass -File scripts/mimic_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PythonExe py `
  -PythonVersion 3.13

Queue a dry-run dispatch:

powershell -ExecutionPolicy Bypass -File scripts/queue_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PublishTarget testpypi `
  -Ref main `
  -RequireLocalPass `
  -DryRun
