OntoSample is a package that offers different sampling techniques for OWL ontologies.

Project description

OntoSample

OntoSample is a Python package that offers classic sampling techniques for OWL ontologies/knowledge bases. Furthermore, we have tailored the classic sampling techniques to the setting of concept learning by making use of the learning problem.

Paper: Accelerating Concept Learning via Sampling

Installation

pip install ontosample

or

# 1. clone the repository
git clone https://github.com/dice-group/Ontolearn.git
cd Ontolearn
# 2. set up a virtual environment
python -m venv venv
# 3. activate the virtual environment
source venv/bin/activate # for Unix and macOS
.\venv\Scripts\activate  # for Windows
# 4. install dependencies
pip install -r requirements.txt
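
To verify the installation, a minimal check (assuming nothing beyond the import path shown in the usage example below):

# Minimal sanity check: this import should succeed after installation.
from ontosample.classic_samplers import RandomNodeSampler

print(RandomNodeSampler.__name__)  # prints: RandomNodeSampler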

Usage

from ontolearn_light.knowledge_base import KnowledgeBase
from ontosample.classic_samplers import RandomNodeSampler

# 1. Initialize KnowledgeBase object using the path of the ontology
kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")

# 2. Initialize the sampler and generate the sample
sampler = RandomNodeSampler(kb)
sampled_kb = sampler.sample(30)  # will generate a sample with 30 nodes

# 3. Save the sampled ontology
sampler.save_sample(kb=sampled_kb, filename='sampled_kb')

Check the examples folder for more.
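
The example above uses a classic sampler. For the samplers tailored to concept learning mentioned earlier, the sketch below illustrates the intended workflow. Treat it as a sketch only: the module path ontosample.lpc_samplers, the class name RandomWalkerJumpsSamplerLPCentralized, the owlapy import paths, the lp_nodes parameter, and the JSON file name are assumptions based on the repository's examples, not a confirmed API.

# Hedged sketch: sampling centered on a learning problem (LPC sampler).
# All names flagged above are assumptions; adjust to your installed versions.
import json

from ontolearn_light.knowledge_base import KnowledgeBase
from ontosample.lpc_samplers import RandomWalkerJumpsSamplerLPCentralized  # assumed module/class
from owlapy.iri import IRI                       # import paths vary across owlapy versions
from owlapy.owl_individual import OWLNamedIndividual

# 1. Load a learning problem: JSON with positive/negative example IRIs (hypothetical file name)
with open("examples/uncle_lp.json") as f:
    lp = json.load(f)

# 2. Initialize KnowledgeBase object using the path of the ontology
kb = KnowledgeBase(path="KGs/Family/family-benchmark_rich_background.owl")

# 3. Turn the example IRIs into OWL individuals so the sampler can retain them in the sample
pos = {OWLNamedIndividual(IRI.create(i)) for i in lp["positive_examples"]}
neg = {OWLNamedIndividual(IRI.create(i)) for i in lp["negative_examples"]}

# 4. Sample around the learning problem nodes and save the result
sampler = RandomWalkerJumpsSamplerLPCentralized(graph=kb, lp_nodes=pos | neg)
sampled_kb = sampler.sample(30)
sampler.save_sample(kb=sampled_kb, filename='sampled_kb')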

About the paper

Abstract

Node classification is an important task in many fields, e.g., predicting entity types in knowledge graphs, classifying papers in citation graphs, or classifying nodes in social networks. In many cases, it is crucial to explain why certain predictions are made. Towards this end, concept learning has been proposed as a means of interpretable node classification: given positive and negative examples in a knowledge base, concepts in description logics are learned that serve as classification models. However, state-of-the-art concept learners, including EvoLearner and CELOE, exhibit long runtimes. In this paper, we propose to accelerate concept learning with graph sampling techniques. We experiment with seven techniques and tailor them to the setting of concept learning. In our experiments, we achieve a reduction in training size by over 90% while maintaining a high predictive performance.

Reproducing paper results

You will find in the examples folder the script used to generate the results in the paper. evaluation_table_generator.py generates the results for each dataset-sampler-sampling_size combination and stores them in a CSV file.

To generate results of Table 2

Install the full ontolearn package to use its learning algorithms, such as EvoLearner and CELOE; they are not included here to keep the number of dependencies low.

pip install ontolearn

The evaluation results for a given sampling percentage can be reproduced simply by running examples/evaluation_table_generator.py.

The script accepts the following arguments:

  • learner → type of learner: 'evolearner' or 'celoe'.
  • datasets_and_lp → list containing the names of the JSON files, each of which contains the path to the knowledge graph and the learning problem.
  • samplers → list of the samplers' abbreviations as strings.
  • csv_path → path of the CSV file where the results will be saved.
  • sampling_size → the sampling percentage.
  • iterations → number of iterations for each sampler.

Table 2 results can be generated using the following instructions:

  1. Execute the script evaluation_table_generator.py using the default parameters.
  2. After the script has finished executing, set the argument --learner to celoe.
  3. Set a different CSV path using the --csv_path argument.
  4. Execute the script again.

In the end, you will have two CSV files, one for each learner.
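
For reference, the four steps above amount to two invocations, sketched here as one small script (a sketch only: the CSV file names are hypothetical, 'evolearner' is assumed to be the default learner, and the script is assumed to be run from the examples folder):

# Sketch of the two Table 2 runs, one per learner.
import subprocess

for learner, csv_path in [("evolearner", "evolearner_results.csv"),
                          ("celoe", "celoe_results.csv")]:
    subprocess.run(
        ["python", "evaluation_table_generator.py",
         "--learner", learner,
         "--csv_path", csv_path],
        check=True,  # abort if a run fails
    )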

Note 1: Not all datasets are included in the project because some of them are too large. You can download all the SML-bench datasets here. Each dataset needs to go into its respective folder, named after it, inside the KGs directory.

Note 2: Keep in mind that this script needs a considerable amount of time to execute (more than 40 hours per concept learner, depending on the machine specifications) when using the default values, which were also used to produce the results in the paper.

For quicker execution, you can set the iterations argument to a lower number.


To generate results of Figure 1

To generate the results used in Figure 1, run the script examples/evaluation_table_generator.py as follows:

cd examples
python evaluation_table_generator.py --datasets_and_lp {"hepatitis_lp.json", "carcinogenesis_lp.json"} --samplers {"RNLPC", "RWJLPC", "RWJPLPC", "RELPC", "FFLPC"} --sampling_size 0.25

Repeat the command for sampling sizes of 0.20, 0.15, 0.10, and 0.05 (a scripted sweep is sketched after the note below).

Note: Make sure to set a different CSV path using the --csv_path argument each time you execute the script, to avoid overwriting the previous results.
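
The sweep can also be scripted. The sketch below runs one execution per sampling size and gives each run its own CSV path so nothing is overwritten; the naming scheme is hypothetical, and the --datasets_and_lp and --samplers values from the command above should be added for a faithful reproduction:

# Sketch: one run per sampling size, each with a distinct csv path.
import subprocess

for size in ("0.25", "0.20", "0.15", "0.10", "0.05"):
    subprocess.run(
        ["python", "evaluation_table_generator.py",
         "--sampling_size", size,
         "--csv_path", f"results_{size}.csv"],  # hypothetical naming
        check=True,
    )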

Citing

@inproceedings{10.1145/3583780.3615158,
author = {Baci, Alkid and Heindorf, Stefan},
title = {Accelerating Concept Learning via Sampling},
year = {2023},
isbn = {9798400701245},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3583780.3615158},
doi = {10.1145/3583780.3615158},
abstract = {Node classification is an important task in many fields, e.g., predicting entity types in knowledge graphs, classifying papers in citation graphs, or classifying nodes in social networks. In many cases, it is crucial to explain why certain predictions are made. Towards this end, concept learning has been proposed as a means of interpretable node classification: given positive and negative examples in a knowledge base, concepts in description logics are learned that serve as classification models. However, state-of-the-art concept learners, including EvoLearner and CELOE exhibit long runtimes. In this paper, we propose to accelerate concept learning with graph sampling techniques. We experiment with seven techniques and tailor them to the setting of concept learning. In our experiments, we achieve a reduction in training size by over 90\% while maintaining a high predictive performance.},
booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
pages = {3733–3737},
numpages = {5},
keywords = {knowledge bases, concept learning, graph sampling},
location = {Birmingham, United Kingdom},
series = {CIKM '23}
}

In case of any questions, please feel free to open an issue.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ontosample-0.2.5.tar.gz (57.1 kB)


Built Distribution

ontosample-0.2.5-py3-none-any.whl (61.8 kB)


File details

Details for the file ontosample-0.2.5.tar.gz.

File metadata

  • Download URL: ontosample-0.2.5.tar.gz
  • Size: 57.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.5

File hashes

Hashes for ontosample-0.2.5.tar.gz

  • SHA256: 0e83ad78162fb2f1f251adf579e1d6c0d25a62a596e2195d8d95003ad6d01126
  • MD5: 63beaffe88e0985a2a503d21d6e0dc30
  • BLAKE2b-256: e245275f51e3862ffc849c0caa4b0e3603bc62cc0e070bfad8270670323e261f

See more details on using hashes here.

File details

Details for the file ontosample-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: ontosample-0.2.5-py3-none-any.whl
  • Size: 61.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.5

File hashes

Hashes for ontosample-0.2.5-py3-none-any.whl

  • SHA256: d553d0bf39140185581d1483a9b07c567e49793c63d5115bb565dbbba30aab04
  • MD5: 3b04b9b08b13e10697b72c7302e98cd1
  • BLAKE2b-256: 57227313b5e198655688c325a3a16a9cecaad2acc0685671075c315cbb8112bb

See more details on using hashes here.
