CogniFlow Ontology

CogniFlow ontology definitions and utilities.
cf_ontology is the semantics core for CogniFlow. It ingests JSON-LD into DuckDB-backed RDF quads, validates ontology/pipeline documents, and exposes a single Python API (OntologyManager) consumed by other packages (for example cf_web and pipeline tooling).
Semantics are authored as JSON-LD:
- ontology fragments
- step packages
- pipelines
OntologyManager loads packaged ontology resources, can ingest installed step packages, validates JSON-LD with SHACL, and exposes normalized DTOs + JSON-LD exports.
Interfaces
There are three ways to interact with cf_ontology.
1) Python functions/classes (direct API)
Use this when building apps/services:
- `OntologyManager`
- `ingest_jsonld_files`
- `rebuild_semantics_db`
This is the primary programmatic interface and the one used by cf_web.
2) Package CLI (`cf-ontology` or `python -m cf_ontology`)
Use this for semantics operations:
- init/rebuild
- ingest
- export
- pipeline revision/state operations
- signature generation
This is implemented in src/cf_ontology/cli.py.
3) Unified CLI (cf ontology ...)
Use this for ontology inspection in the unified Cogniflow CLI:
- list classes
- inspect class
- list steps
- inspect step
This is implemented in src/cf_ontology/cf_cli.py and registered into cf_cli.
Interface differences
- Python API: best for integration in code; returns Python objects/DTOs.
- Package CLI (`cf-ontology`): operational/automation tasks over semantics DB files.
- Unified CLI (`cf ontology`): user-facing inspection commands across packages.
Storage (Semantics DB)
The default storage is an RDF Quad Store backed by DuckDB files under workspace/semantics (repo-relative). The location is configurable via environment variables.
Default DB files:
- `cf-ontology.duckdb` (static ontology + shapes)
- `cf-steps.duckdb` (step packages)
- `cf-pipelines.duckdb` (pipeline revisions + audit)
- `cf-pipeline-states.duckdb` (pipeline runtime state/events; not part of default exports yet)
Backend
cf_ontology is quads-only. JSON-LD is ingested into RDF quads stored in DuckDB (`rdf_quads`) with a graph catalog (`graphs`).
Paths
- `CF_SEMANTICS_DIR=/path/to/semantics` (absolute or repo-relative; default: `workspace/semantics`)
- `CF_WORKSPACE_DIR=/path/to/workspace` (used only when `CF_SEMANTICS_DIR` is unset)

If neither is set, `workspace/semantics` is used when a repo root can be detected; otherwise the fallback is `~/.cogniflow/workspace/semantics`.
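For orientation, that resolution order can be written out as a short sketch. `resolve_semantics_dir` is an illustrative helper, not part of the public API, and it assumes `CF_WORKSPACE_DIR` points at a workspace containing a `semantics/` subdirectory:

```python
import os
from pathlib import Path


def resolve_semantics_dir(repo_root: Path | None) -> Path:
    """Illustrative sketch of the documented resolution order."""
    env = os.environ
    if "CF_SEMANTICS_DIR" in env:
        p = Path(env["CF_SEMANTICS_DIR"])
        # Absolute paths win; relative paths resolve against the repo root.
        return p if p.is_absolute() or repo_root is None else repo_root / p
    if "CF_WORKSPACE_DIR" in env:
        return Path(env["CF_WORKSPACE_DIR"]) / "semantics"
    if repo_root is not None:
        return repo_root / "workspace" / "semantics"
    return Path.home() / ".cogniflow" / "workspace" / "semantics"
```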
Step packages
`CF_ENABLE_STEP_PACKAGES=0|1` (default: `0`) enables discovery of installed step packages via the `cogniflow.steps` entry point group.
Graph IDs (binding conventions)
Step packages
- Split steps (`split_steps=True`): `g = "pkg:{package_name}@{package_version}#{step_id}"`
- Unsplit: `g = "pkg:{package_name}@{package_version}"`

`package_name` is the StepPackage id (from `cf:packageId` in `steps.jsonld`).

Package upgrades generate new graph ids (the version is part of `g`). Existing graphs are not deleted; missing/uninstalled packages are marked inactive via `graphs.is_active=false`.
Pipelines (versioned)
- Stable pipeline identifier: `pipeline_id` (UUID or user-provided stable id)
- Revision graph: `g = "pipe:{pipeline_id}@{rev}"` with monotonically increasing `rev` starting at `1`
- "Current" is a pointer in the `pipelines` table (no separate quad graph)
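Both conventions are plain string templates, so they are easy to reproduce when debugging. The helpers below simply restate the formats above (names and values are illustrative):

```python
def step_graph_id(package_name: str, package_version: str, step_id: str | None = None) -> str:
    # Split steps get one graph per step; unsplit packages share one graph.
    base = f"pkg:{package_name}@{package_version}"
    return f"{base}#{step_id}" if step_id else base


def pipeline_graph_id(pipeline_id: str, rev: int) -> str:
    # Revisions are append-only; rev starts at 1 and only increases.
    return f"pipe:{pipeline_id}@{rev}"


assert step_graph_id("example_steps", "1.0.0", "AverageStep") == "pkg:example_steps@1.0.0#AverageStep"
assert pipeline_graph_id("my-pipeline-id", 3) == "pipe:my-pipeline-id@3"
```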
What is included
- Packaged JSON-LD fragments: `ontology/core/*.jsonld`, `ontology/vocab/*.jsonld`, and `ontology/shapes/*.jsonld`
- `OntologyManager` for loading/merging ontologies and querying step/pipeline metadata
- Dynamic step-package discovery via the `cogniflow.steps` entry point group
Installation
From the `sandcastle/cf_ontology` directory:

```bash
pip install .
```

Published distribution name:

```bash
pip install cf-ontology
```

Optional (for unified `cf` CLI integration):

```bash
pip install -e ../cf_cli
```
Usage
Policy: all reads/writes to the semantics DuckDBs must go through the cf_ontology package
(Python API or CLI). Do not access cf-*.duckdb directly from other packages.
Python API
Basic loading and inspection:
```python
from cf_ontology import OntologyManager

ontology = OntologyManager()
print(ontology.get_processing_steps())
print(ontology.get_graph_jsonld())
```
Load rich step DTOs (quads-backed bulk path):
```python
from cf_ontology import OntologyManager

om = OntologyManager(load_resources=False)
steps = om.get_processing_steps_info_quads_bulk()
print(len(steps))
print(steps[0]["@id"])
```
Importing JSON-LD files into the Quad Store
Python utility:
```python
from pathlib import Path

from cf_ontology import ingest_jsonld_files

ingest_jsonld_files(
    [Path("sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld")],
    graph_type="pipeline",
    package="examples",
)
```
Package CLI:

```bash
python -m cf_ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld --graph-type pipeline --package examples
```

Or via console script:

```bash
cf-ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.jsonld --graph-type pipeline --package examples
```
Unified CLI (inspection-focused):

```bash
cf ontology classes
cf ontology class cf:ProcessingStep --instances
cf ontology steps
cf ontology step cfbs:AverageStep
```
Ingesting installed step packages
This discovers `cogniflow.steps` entry points and ingests their JSON-LD with the graph-id conventions above:

```bash
python -m cf_ontology ingest --installed-steps
```

Re-ingest even if the stored content hash matches:

```bash
python -m cf_ontology ingest --installed-steps --force
```
Initialize semantics DB (CLI)
```bash
python -m cf_ontology init
```

This initializes all split DB files under the semantics directory.

To force a rebuild (destructive):

```bash
python -m cf_ontology init --rebuild
```
Fresh install smoke test
`scripts/fresh_install.ps1 -Clean` runs a Semantics QuadStore smoke test at the end via `sandcastle/cf_ontology/scripts/semantics_smoketest.py` and prints CI-friendly lines like:

```text
[SEMANTICS] backend=quads
[SEMANTICS] db_path=...
[SEMANTICS] tables=graphs,rdf_quads,pipelines,pipeline_revisions OK
[SEMANTICS] committed pipeline_id=... revs=...,... graph_id=...
[SEMANTICS] export_flatten=current_only OK
[SEMANTICS] export_dataset=named_graphs OK
```

`db_path` above points to `cf-pipelines.duckdb`.
Pipeline versioning (append-only revisions)
Commit pipeline revisions (audit metadata is stored in `pipeline_revisions`):

```python
from cf_ontology import OntologyManager

om = OntologyManager()
rev = om.commit_pipeline(
    "my-pipeline-id",
    {"@context": {"cf": "https://cogniflow.org/ns#"}, "@graph": []},
    user="alice",
    message="update",
)
print(rev)
print(om.list_pipeline_revisions("my-pipeline-id"))
print(om.get_pipeline_revision("my-pipeline-id"))  # current
print(om.get_pipeline_revision("my-pipeline-id", rev=1))
```
Derive a pipeline id from JSON-LD (uses the ProcessingPipeline @id local name):
```bash
python -m cf_ontology pipeline id --jsonld path/to/pipeline.jsonld
```
`OntologyManager.get_graph_jsonld()` stays backwards compatible and includes only current pipeline revisions (not all historical revisions).
Dataset export (named graphs preserved)
```python
from cf_ontology import OntologyManager

om = OntologyManager()
dataset_jsonld = om.get_graph_dataset_jsonld()
```
Rebuilding the semantics DB
```python
from cf_ontology import rebuild_semantics_db

rebuild_semantics_db()
```
Package CLI command groups
`python -m cf_ontology -h` (or `cf-ontology -h`) provides:

- `init`
- `ingest`
- `export`
- `pipeline` (commit/get/id)
- `state` (activate/set/run-loop)
- `siggen`
Unified CLI command groups
`cf -h` shows groups contributed by installed packages. cf_ontology contributes:

- `ontology classes`
- `ontology class`
- `ontology steps`
- `ontology step`
Step package entry point convention
Entry point group: `cogniflow.steps`

Supported entry point values:

- Resource path: `some_pkg_module:steps.jsonld` (path relative to that package/module)
- Loader function: `some_pkg_module:load_steps` returning a JSON-LD `dict` or JSON string
Package metadata is read from the `cf:StepPackage` node in `steps.jsonld` (`cf:packageId` / `cf:packageVersion`), with distribution metadata as a fallback.
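As a sketch of what discovery under this convention can look like (the helper is illustrative; the real loader lives inside cf_ontology and also applies the metadata fallback described above):

```python
import json
from importlib import import_module, resources
from importlib.metadata import entry_points


def load_step_documents() -> list[dict]:
    """Illustrative: resolve both supported entry point forms."""
    docs = []
    for ep in entry_points(group="cogniflow.steps"):
        module_name, _, attr = ep.value.partition(":")
        if attr.endswith(".jsonld"):
            # Resource-path form: read the JSON-LD shipped inside the package.
            text = resources.files(import_module(module_name)).joinpath(attr).read_text(encoding="utf-8")
            docs.append(json.loads(text))
        else:
            # Loader-function form: call it; accept a dict or a JSON string.
            result = ep.load()()
            docs.append(json.loads(result) if isinstance(result, str) else result)
    return docs
```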
Quad Store schema (DuckDB)
The quad store uses these tables (per DB file):
- `graphs` (catalog)
  - `graph_id TEXT PRIMARY KEY`
  - `graph_type TEXT`
  - `package TEXT`
  - `path TEXT`
  - `context JSON`
  - `graph_kind TEXT` (`package_step` | `package` | `pipeline_rev` | `system` | `custom`)
  - `is_active BOOLEAN`
  - `content_hash TEXT`
  - `package_name TEXT NULL`, `package_version TEXT NULL`
  - `created_at TIMESTAMP`
  - `updated_at TIMESTAMP`
- `rdf_quads` (RDF terms, normalized)
  - `g TEXT` (named graph id)
  - `s TEXT`, `s_kind TEXT` (`iri` | `bnode`)
  - `p TEXT`, `p_kind TEXT` (`iri`)
  - `o TEXT`, `o_kind TEXT` (`iri` | `bnode` | `literal`)
  - `o_datatype TEXT NULL`, `o_lang TEXT NULL`
  - `graph_type TEXT`, `package TEXT`, `path TEXT`
  - `updated_at TIMESTAMP`
- `pipelines` (current pointer)
  - `pipeline_id TEXT PRIMARY KEY`
  - `current_rev INTEGER`
  - `created_at TIMESTAMP`, `updated_at TIMESTAMP`
  - `created_by TEXT NULL`, `updated_by TEXT NULL`
- `pipeline_revisions` (append-only audit log)
  - `PRIMARY KEY (pipeline_id, rev)`
  - `graph_id TEXT` (equals `pipe:{pipeline_id}@{rev}`)
  - `created_at TIMESTAMP`, `created_by TEXT NULL`
  - `message TEXT NULL`
  - `content_hash TEXT NULL`
  - `parent_rev INTEGER NULL`
In the split layout, `pipelines` and `pipeline_revisions` live only in `cf-pipelines.duckdb`.
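For one-off local debugging of these tables (application code should still go through cf_ontology, per the policy above), a read-only DuckDB connection is enough. The path and graph id below are illustrative:

```python
import duckdb

# Read-only connection to the default pipelines DB (illustrative path).
con = duckdb.connect("workspace/semantics/cf-pipelines.duckdb", read_only=True)

# Current revision pointer per pipeline.
print(con.execute("SELECT pipeline_id, current_rev FROM pipelines").fetchall())

# A few quads from one revision graph (graph-id convention from above).
rows = con.execute(
    "SELECT s, p, o FROM rdf_quads WHERE g = ? LIMIT 5",
    ["pipe:my-pipeline-id@1"],
).fetchall()
print(rows)
```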
Pipeline states schema (DuckDB)
`cf-pipeline-states.duckdb` stores runtime state as relational tables. Each column has a corresponding property or class definition in the ontology JSON-LD (`core/classes` + `core/properties`).
Tables:
- `pipeline_state` (snapshot/control)
- `pipeline_runs` (run history)
- `run_events` (append-only event log)
- `run_queue` (execution queue + leases)
Runtime CLI
Canonical pipeline start flow (the only supported run-start contract):
1. Activate pipeline state (creates the snapshot row):

   ```bash
   python -m cf_ontology state activate --pipeline-id opcua_fifo_avg --desired-state enabled
   ```

2. Persist an inbound trigger event (idempotent by `dedupe_key`):

   ```bash
   python -m cf_ontology state emit-event \
     --pipeline-id opcua_fifo_avg \
     --event-type opcua_signal \
     --source manual \
     --dedupe-key demo-opcua-001
   ```

3. Run the worker loop to consume pending events and execute the run:

   ```bash
   python -m cf_ontology state run-loop --pipeline-id opcua_fifo_avg --poll-interval 1.0 --once
   ```
Control desired state (pause/sleep/disable):
```bash
python -m cf_ontology state set --pipeline-id opcua_fifo_avg --desired-state sleep
```
`state run-loop` consumes pending `run_events` only and never auto-generates runs from idle state ticks. If no pending events are available, the worker waits (or exits with `--once`). Each consumed event creates linked runtime records in `cf-pipeline-states.duckdb` (`run_events`, `pipeline_runs`, `run_queue`) before engine execution starts.
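Conceptually, the worker behaves like the sketch below. This is pseudocode-level Python: `claim_pending_event` and `execute_run` are hypothetical stand-ins for cf_ontology internals, shown only to illustrate the contract stated above:

```python
import time
from typing import Optional


def claim_pending_event(pipeline_id: str) -> Optional[dict]:
    """Hypothetical stand-in: lease the next pending event via run_queue."""
    raise NotImplementedError  # internal to cf_ontology


def execute_run(pipeline_id: str, event: dict) -> None:
    """Hypothetical stand-in: engine hand-off for one run."""
    raise NotImplementedError  # internal to cf_ontology


def run_loop(pipeline_id: str, poll_interval: float, once: bool) -> None:
    while True:
        event = claim_pending_event(pipeline_id)
        if event is None:
            if once:
                return  # --once exits instead of waiting
            time.sleep(poll_interval)  # never auto-generates runs; just waits
            continue
        # Linked records in run_events / pipeline_runs / run_queue already
        # exist before the engine starts executing.
        execute_run(pipeline_id, event)
        if once:
            return
```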
Notes
- DuckDB as default storage: split DB files keep the high-churn data (pipelines, states) isolated from the mostly static ontology and steps. Parquet export/materialization can be added later for bulk scans and interchange.
- `sandcastle/src` is legacy and not part of ongoing development; new work happens under `sandcastle/*` packages.
Pipeline steps header
Processing pipelines must declare the step catalogs they rely on via a steps header:
```json
{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasStepsHeader": { "@id": "ex:stepsHeader" }
},
{
  "@id": "ex:stepsHeader",
  "@type": "cf:StepsHeader",
  "cf:stepsPath": [
    "path/to/steps.jsonld",
    "path/to/other_steps.jsonld"
  ]
}
```
The runner uses these paths when `--steps` is not provided on the CLI.
Pipeline plugins header
Pipelines should also declare plugin directories so the runner can load implementations without `--plugins`:
```json
{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasPluginsHeader": { "@id": "ex:pluginsHeader" }
},
{
  "@id": "ex:pluginsHeader",
  "@type": "cf:PluginsHeader",
  "cf:pluginPath": [
    "path/to/plugin/bin",
    "path/to/other/plugin/bin"
  ]
}
```
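A minimal sketch of reading both headers out of a pipeline document, assuming the flat `@graph` layout shown above (`header_paths` is an illustrative helper, not part of cf_ontology):

```python
def header_paths(graph: list[dict], header_key: str, path_key: str) -> list[str]:
    """Follow a header reference from the pipeline node and return its paths."""
    nodes = {node["@id"]: node for node in graph if "@id" in node}
    pipeline = next(n for n in graph if n.get("@type") == "cf:ProcessingPipeline")
    ref = pipeline.get(header_key)
    if not ref:
        return []
    paths = nodes[ref["@id"]].get(path_key, [])
    return [paths] if isinstance(paths, str) else list(paths)


# Usage against a loaded JSON-LD document `doc`:
# steps = header_paths(doc["@graph"], "cf:hasStepsHeader", "cf:stepsPath")
# plugins = header_paths(doc["@graph"], "cf:hasPluginsHeader", "cf:pluginPath")
```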
Publishing
cf_ontology is published with the dedicated Windows workflow:
- Workflow: `.github/workflows/cf_ontology_windows_publish.yml`
- Package directory: `sandcastle/cf_ontology`
- PyPI tag: `cf-ontology-v<version>`
- TestPyPI tag: `cf-ontology-v<version>-test`
Local preflight:
```powershell
powershell -ExecutionPolicy Bypass -File scripts/mimic_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PythonExe py `
  -PythonVersion 3.13
```
Queue a dry-run dispatch:
```powershell
powershell -ExecutionPolicy Bypass -File scripts/queue_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PublishTarget testpypi `
  -Ref main `
  -RequireLocalPass `
  -DryRun
```