A package for assessing the quality and structure of ontologies.

OntoCheck

Query-Driven Ontology Assessment for Scientific Domain Applications


Overview

As scientific fields increasingly adopt FAIR data principles, ontologies have become essential for encoding the semantics of scientific investigations. Yet evaluating ontology quality remains a manual, technically demanding bottleneck. Current frameworks emphasize structural correctness but fail to assess practical utility against the real-world queries posed by domain scientists.

OntoCheck is an open-source Python tool that unifies domain-agnostic structural metrics with a novel, query-driven assessment methodology. By analyzing SPARQL queries derived from natural-language competency questions, OntoCheck compares the required query terms against an ontology's full vocabulary to yield complementary metrics for vocabulary coverage and utilization density. This empowers domain scientists and data engineers to make evidence-based decisions about ontology selection without requiring deep expertise in formal knowledge representation.

OntoCheck is actively developed and maintained by the SDLE Research Center at Case Western Reserve University.


Installation

pip install OntoCheck

Requirements: Python 3.8 or later.


Quick Start

Command-Line Interface

# Display available metrics and usage information
ontocheck -h

# Run specific metrics on an ontology file
ontocheck path/to/ontology.ttl --metrics altLabelCheck definitionCheck

# Run all available task-agnostic metrics
ontocheck path/to/ontology.ttl --metrics all

# Specify custom output file paths
ontocheck path/to/ontology.ttl --metrics all --log-file results.log --csv-file results.csv

Python API

from ontocheck import run_ontology_assessment

# Run selected metrics
run_ontology_assessment(
    ttl_file="path/to/ontology.ttl",
    metrics=["altLabelCheck", "definitionCheck", "isolatedElements"],
)

# Run all task-agnostic metrics
run_ontology_assessment(
    ttl_file="path/to/ontology.ttl",
    metrics="all",
)

Task-Based Assessment

from ontocheck import task_based_metric_v_0_0_1

result = task_based_metric_v_0_0_1(
    ttl_file="path/to/ontology.ttl",
    questions="competency_questions.json",
    domain_prefixes=["mds"],
    domain_ns_fragments=["cwrusdle.bitbucket.io/mds"],
)

print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy:  {result['accuracy']:.2%}")

Available Metrics

OntoCheck provides 17 task-agnostic metrics organized into four categories, along with a task-based assessment methodology.

Labeling

Metric           Function                    Description
---------------  --------------------------  ---------------------------------------------------------------
checkLabel       mainLabelCheck_v_0_0_1      Proportion of named classes carrying human-readable identifiers
altLabelCheck    mainAltLabelCheck_v_0_0_1   Proportion of named classes carrying synonyms
definitionCheck  mainDefCheck_v_0_0_1        Proportion of named classes carrying formal definitions
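
The three labeling metrics are all proportions over named classes, so the idea can be illustrated with a short, dependency-free sketch. This is not OntoCheck's implementation: the triples are plain tuples, and the predicates used here (`rdfs:label`, `skos:altLabel`, `skos:definition`) are assumptions about which annotations such a check might inspect.

```python
# Minimal sketch: proportion of named classes that carry a given annotation.
# Triples are plain (subject, predicate, object) tuples; this is an
# illustrative stand-in, not OntoCheck's own code.

RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"


def annotation_coverage(triples, annotation):
    """Return the fraction of named classes with at least one `annotation` triple."""
    # Named classes: subjects typed owl:Class that are not blank nodes ("_:" prefix).
    classes = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == OWL_CLASS and not s.startswith("_:")
    }
    if not classes:
        return 0.0
    annotated = {s for s, p, _ in triples if p == annotation and s in classes}
    return len(annotated) / len(classes)


# Hypothetical toy ontology with three named classes and one anonymous class.
triples = [
    ("ex:Sample", RDF_TYPE, OWL_CLASS),
    ("ex:Sample", "rdfs:label", "Sample"),
    ("ex:Sample", "skos:altLabel", "Specimen"),
    ("ex:Instrument", RDF_TYPE, OWL_CLASS),
    ("ex:Instrument", "rdfs:label", "Instrument"),
    ("ex:Furnace", RDF_TYPE, OWL_CLASS),
    ("_:b0", RDF_TYPE, OWL_CLASS),  # anonymous class, excluded from the count
]

label_cov = annotation_coverage(triples, "rdfs:label")     # 2 of 3 named classes
alt_cov = annotation_coverage(triples, "skos:altLabel")    # 1 of 3 named classes
def_cov = annotation_coverage(triples, "skos:definition")  # 0 of 3 named classes
```

Reporting a proportion rather than a raw count keeps the scores comparable across ontologies of different sizes.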

Structural

Metric              Function                                 Description
------------------  ---------------------------------------  -------------------------------------------------------------
isolatedElements    check_for_isolated_elements              Identifies orphaned classes within the ontology
classConnections    count_class_connected_components         Identifies disconnected subgraphs
missingDomainRange  get_properties_missing_domain_and_range  Identifies undeclared domain and range restrictions
leafNodeCheck       mainLeafNodeCheck_v_0_0_1                Identifies all leaf nodes in the ontology hierarchy
semanticConnection  mainSemanticConnection_v_0_0_1           Verifies grounding in upper-level ontologies (e.g., CCO, BFO)
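
Several of the structural checks reduce to simple graph questions over the class hierarchy. The sketch below shows one way to find connected components, isolated classes, and leaf nodes. It assumes the hierarchy is given as hypothetical `(child, parent)` pairs standing in for `rdfs:subClassOf` assertions; OntoCheck itself may consider other relations and define these notions differently.

```python
from collections import defaultdict


def connected_components(classes, edges):
    """Group classes into connected components, treating edges as undirected."""
    neighbours = defaultdict(set)
    for child, parent in edges:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

    seen, components = set(), []
    for start in sorted(classes):
        if start in seen:
            continue
        # Iterative depth-first search from a class not yet visited.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def leaf_classes(classes, edges):
    """Classes that never appear as the parent in a subclass edge."""
    parents = {parent for _, parent in edges}
    return set(classes) - parents


# Hypothetical toy hierarchy: one small tree plus two unconnected classes.
classes = {"Material", "Metal", "Polymer", "Instrument", "Orphan"}
edges = [("Metal", "Material"), ("Polymer", "Material")]

components = connected_components(classes, edges)
isolated = {next(iter(c)) for c in components if len(c) == 1}
leaves = leaf_classes(classes, edges)
```

Under this simplified definition an isolated class is also a leaf, since it has no subclasses; a real check may want to report the two separately.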

Accessibility

Metric          Function                                Description
--------------  --------------------------------------  ------------------------------------------------------
sparqlEndpoint  check_sparql_accessibility_ttl          Verifies reachability of the SPARQL endpoint
rdfDump         check_rdf_dump_accessibility_ttl        Verifies availability of the RDF data dump
humanLicense    check_human_readable_license_ttl        Verifies presence and fitness of licensing information
externalLinks   check_external_data_provider_links_ttl  Checks validity of external links within the ontology

Naming Convention

Metric             Function                           Description
-----------------  ---------------------------------  ---------------------------------------------------
classCapitalCheck  mainClassNameCapitalCheck_v_0_0_1  Flags departures from standard capitalization
classSpaceCheck    mainClassNameSpaceCheck_v_0_0_1    Flags use of spaces in class identifiers
spellCheck         spell_check_v_0_0_1                Spell checking on labels and definitions
duplicateLabels    find_duplicate_labels_from_graph   Identifies duplicate labels across entities
searchClass        mainClassSearch_v_0_0_1            Identifies classes matching a user-specified string
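
To make the naming checks concrete, here is a rough sketch of capitalization, space, and duplicate-label detection. The convention assumed (class local names start with an uppercase letter and contain no spaces) and the helper names are illustrative choices, not OntoCheck's actual rules or API.

```python
import re
from collections import defaultdict


def local_name(iri):
    """Return the part of an IRI after the last '#' or '/'."""
    return re.split(r"[#/]", iri)[-1]


def naming_issues(class_iris):
    """Map each offending IRI to the list of naming problems found."""
    issues = {}
    for iri in class_iris:
        name = local_name(iri)
        problems = []
        if " " in name or "%20" in name:
            problems.append("space")
        if not name[:1].isupper():
            problems.append("capitalization")
        if problems:
            issues[iri] = problems
    return issues


def duplicate_labels(labels):
    """Return labels (compared case-insensitively) shared by more than one entity."""
    by_label = defaultdict(set)
    for iri, label in labels:
        by_label[label.strip().lower()].add(iri)
    return {label: iris for label, iris in by_label.items() if len(iris) > 1}


# Hypothetical class IRIs and (IRI, label) pairs.
class_iris = [
    "http://example.org/onto#SolarCell",
    "http://example.org/onto#solarModule",
    "http://example.org/onto#Test Sample",
]
labels = [("ex:A", "Solar Cell"), ("ex:B", "solar cell "), ("ex:C", "Module")]

issues = naming_issues(class_iris)
duplicates = duplicate_labels(labels)
```

Normalizing labels with `strip().lower()` before comparison is a design choice; a stricter check could compare labels verbatim.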

Task-Based Assessment

The task-based methodology measures how well an ontology supports analytical queries by computing two complementary metrics from SPARQL competency questions:

  • Relevance = |T_a ∩ T_o| / |T_a|: the fraction of task-required terms that the ontology defines
  • Accuracy = |T_a ∩ T_o| / |T_o|: the fraction of ontology terms utilized by the task queries

where T_a is the set of domain terms extracted from the SPARQL queries and T_o is the set of domain terms defined in the ontology.
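
The two ratios can be worked through on a small example. The sketch below pulls prefixed names out of a SPARQL query with a regular expression and then applies the set formulas above. The regex-based extraction, the `example.org` namespace, and the toy term sets are all assumptions made for illustration; OntoCheck's own term extraction may work differently.

```python
import re


def extract_prefixed_terms(sparql, prefixes):
    """Collect prefixed names (e.g. mds:Sample) whose prefix is in `prefixes`."""
    alternatives = "|".join(map(re.escape, prefixes))
    pattern = re.compile(r"\b(" + alternatives + r"):([A-Za-z_][\w\-]*)")
    return {f"{prefix}:{local}" for prefix, local in pattern.findall(sparql)}


def relevance_and_accuracy(task_terms, ontology_terms):
    """Relevance = |T_a ∩ T_o| / |T_a|; Accuracy = |T_a ∩ T_o| / |T_o|."""
    shared = task_terms & ontology_terms
    relevance = len(shared) / len(task_terms) if task_terms else 0.0
    accuracy = len(shared) / len(ontology_terms) if ontology_terms else 0.0
    return relevance, accuracy


# Hypothetical competency-question query; the namespace IRI is a placeholder.
query = """
PREFIX mds: <https://example.org/mds#>
SELECT ?sample ?value WHERE {
  ?sample a mds:Sample ;
          mds:hasMeasurement ?m .
  ?m mds:hasValue ?value ;
     mds:measuredBy ?instrument .
}
"""

# T_a: domain terms required by the task (4 terms).
task_terms = extract_prefixed_terms(query, ["mds"])

# T_o: hypothetical domain terms defined by the ontology (8 terms).
ontology_terms = {
    "mds:Sample", "mds:hasMeasurement", "mds:hasValue", "mds:Instrument",
    "mds:Furnace", "mds:hasUnit", "mds:Material", "mds:hasComposition",
}

relevance, accuracy = relevance_and_accuracy(task_terms, ontology_terms)
print(f"Relevance: {relevance:.2%}")  # 3 of 4 task terms are defined
print(f"Accuracy:  {accuracy:.2%}")   # 3 of 8 ontology terms are used
```

Here `mds:measuredBy` is requested by the query but missing from the ontology, which lowers relevance, while the five unused ontology terms lower accuracy. The two metrics therefore pull in different directions as an ontology grows.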


Documentation

Full documentation is available at ontocheck.readthedocs.io.


Authors

  • Rishabh Kundu
  • Redad Mehdi
  • Van D. Tran
  • Ethan Frakes
  • Abhishek Daundkar
  • Maliesha Sumudumalie
  • Vibha S. Mandayam
  • Jacob A. Lample
  • Mengjie Li
  • Laura S. Bruckman
  • Erika I. Barcelos
  • Alp Sehirlioglu
  • Roger H. French
  • Yinghui Wu

Affiliation

Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3 COE), Case Western Reserve University, Cleveland, OH 44106, USA


Acknowledgments

  • U.S. Department of Energy's National Nuclear Security Administration -- Award Number DE-NA0004104 and Contract Number B647887
  • U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under the Solar Energy Technologies Office (SETO) -- Agreement Numbers DE-EE0009353 and DE-EE0009347
  • U.S. National Science Foundation -- Award Number 2133576

How to Cite

If you use OntoCheck in your work, please cite:

Rishabh Kundu, Redad Mehdi, Van D. Tran, Ethan Frakes, Abhishek Daundkar, Maliesha Sumudumalie, Vibha S. Mandayam, Jacob A. Lample, Mengjie Li, Laura S. Bruckman, Erika I. Barcelos, Alp Sehirlioglu, Roger H. French, Yinghui Wu (2025). OntoCheck: Query-Driven Ontology Assessments for Scientific Domain Applications. [Python]. https://pypi.org/project/OntoCheck/


License

OntoCheck is released under the BSD-2-Clause License.
