No project description provided
Project description
FOLD-R-PP
This is a python implementation of FOLD-R++ algorithm which is built for binary classification tasks. FOLD-R++ algorithm learns default rules that is represented as an answer set program which is a logic program that include negation of predicates and follow the stable model semantics for interpretation. Default logic (with exceptions) closely model human thinking.
Installation
Only function library:
python3 -m pip install foldrpp
With the dataset examples:
git clone https://github.com/hwd404/FOLD-R-PP.git
Prerequisites
This FOLD-R++ implementation is developed with only python3. No library is needed.
Usage
Data preparation
-
FOLD-R++ takes tabular data as input, the first line of the input data file should be the feature names of each column.
id age bp ... ane label 1 48 80 ... no ckd 2 7 50 ... no ckd ... ... ... ... ... ... -
FOLD-R++ does not have to encode data like one-hot or integer encoding. It can deal with numerical, categorical, and even mixed-type features (one feature contains categorical and numerical values at same time) directly. The types of features should be identified before loading data:
- numerical features will be dealt as mixed-type features in this implementation. Group them together as numerical features, they will have in/equality and numerical comparison literals as candidates (=, !=, <=, >).
- categorical features will only have in/equality literals as candidates (only literals with = and != would be generated).
Many UCI dataset are included as examples in data directory. Their data preparation are listed in datasets.py. For example the UCI kidney dataset can be loaded with the following configuration:
str_attrs = ['al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'htn', 'dm', 'cad', 'appet', 'pe', 'ane']
num_attrs = ['age', 'bp', 'sg', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo', 'pcv', 'wbcc', 'rbcc']
label, pos_val = 'label', 'ckd'
model = Foldrpp(str_attrs, num_attrs, label, pos_val)
data = model.load_data('data/kidney/kidney.csv')
str_attrs lists categorical features, num_attrs lists numerical and mixed-type features, label is the name of classification label, pos_val indicates the positive value of the label, model is an initialized classifier object with the configuration of kidney dataset. Note: For binary classification tasks, the label value with more examples should be selected as the label's positive value.
Training
FOLD-R++ generates an explainable model that is represented by an answer set program for classification tasks. Here's a training example for kidney dataset:
data_train, data_test = split_data(data, ratio=0.8, rand=True)
model.fit(data_train, ratio=0.5)
Note that the hyperparameter ratio in fit function ranges between 0 and 1. The default value is 0.5. It represents the threshold ratio of false positive (exception) examples to true positive examples that a rule can imply.
The generated rules are stored in the model object, to print out the rules:
for r in model.asp():
print(r)
output:
label(X,'ckd') :- sc(X,N1), N1>1.2.
label(X,'ckd') :- sg(X,N2), N2=<1.015.
label(X,'ckd') :- hemo(X,N3), N3=<12.7.
label(X,'ckd') :- pcv(X,'?'), not al(X,'0').
label(X,'ckd') :- dm(X,'yes').
Testing
Given data_test, a list of test data samples, the predict function will predict the classification outcome for each of these data samples.
ys_test_hat = model.predict(data_test)
The original label of test samples:
ys_test = [x['label'] for x in data_test]
Accuracy, precision, recall and F1 score:
acc, p, r, f1 = get_scores(ys_test_hat, ys_test)
The code of the above examples can be found in main.py.
Save model and Load model
from foldrpp import save_model_to_file, load_model_from_file
save_model_to_file(model, 'model.txt')
saved_model = load_model_from_file('model.txt')
A trained model can be saved to a json file with save_model_to_file function. load_model_from_file function helps load model from the json file.
{
"str_attrs": [
"al",
"su",
"rbc",
"pc",
"pcc",
"ba",
"htn",
"dm",
"cad",
"appet",
"pe",
"ane"
],
"num_attrs": [
"age",
"bp",
"sg",
"bgr",
"bu",
"sc",
"sod",
"pot",
"hemo",
"pcv",
"wbcc",
"rbcc"
],
"flat_rules": [
{
"head": ["label", "==", "ckd"],
"main_items": [["sc", ">", 1.2]],
"ab_items": []
},
{
"head": ["label", "==", "ckd"],
"main_items": [["sg", "=<", 1.015]],
"ab_items": []
},
{
"head": ["label", "==", "ckd"],
"main_items": [["hemo", "=<", 12.7]],
"ab_items": []
},
{
"head": ["label", "==", "ckd"],
"main_items": [["pcv", "==", "?"], ["al", "!=", "0"]],
"ab_items": []
},
{
"head": ["label", "==", "ckd"],
"main_items": [["dm", "==", "yes"]],
"ab_items": []
}
],
"rule_head": [
"label",
"==",
"ckd"
],
"label": "label",
"pos_val": "ckd"
}
The generate rules in json file can also be fine-tuned as needed.
Explanation and Proof Tree
FOLD-R++ provides simple format proof for predictions with proof_rules function, the parameter get_all means whether or not to list all the answer sets (all rules that predict the test example as positive).
for r in saved_model.proof_rules(x, get_all=True):
print(r)
Here is an example for an instance from kidney dataset. The generated answer set program is :
label(X,'ckd') :- sc(X,N1), N1>1.2.
label(X,'ckd') :- sg(X,N2), N2=<1.015.
label(X,'ckd') :- hemo(X,N3), N3=<12.7.
label(X,'ckd') :- not al(X,'0'), not al(X,'?').
And the generated justification for an instance predicted as positive:
{'sc': 0.9, 'sg': 1.01, 'hemo': 12.4, 'al': '0'}
[T]label(X,'ckd') :- sg(X,N1), [T]N1=<1.015.
[T]label(X,'ckd') :- hemo(X,N1), [T]N1=<12.7.
There are 2 answers have been generated for the current instance, because get_all has been set as True when calling proof_rules function. Only 1 answer will be generated if get_all is False. In the generated answers, each literal has been tagged with a label. [T] means True, [F] means False, and [U] means unnecessary to evaluate.
FOLD-R++ also provide proof tree for predictions with proof_trees function.
for r in saved_model.proof_trees(x, get_all=False):
print(r)
And generate proof tree for the instance above:
{'sc': 1.0, 'sg': 1.025, 'hemo': 16.1, 'al': '0'}
label(X,'ckd') does not hold because
the value of sc is 1.0 which should be greater than 1.2 does not hold
label(X,'ckd') does not hold because
the value of sg is 1.025 which should be less equal to 1.015 does not hold
label(X,'ckd') does not hold because
the value of hemo is 16.1 which should be less equal to 12.7 does not hold
label(X,'ckd') does not hold because
the value of al is '0' which should not equal '0' does not hold and
the value of al is '0' which should not equal '?' does hold
For an instance predicted as negative, there's no answer set. Instead, the explanation has to list the predictions of all rules, and the parameter get_all will be ignored:
{'sc': 0.9, 'sg': '?', 'hemo': 15.0, 'al': '?'}
[F]label(X,'ckd') :- sc(X,N1), [F]N1>1.2.
[F]label(X,'ckd') :- sg(X,N1), [F]N1=<1.015.
[F]label(X,'ckd') :- hemo(X,N1), [F]N1=<12.7.
[F]label(X,'ckd') :- not [F]al(X,'0'), not [T]al(X,'?').
{'sc': 0.9, 'sg': '?', 'hemo': 15.0, 'al': '?'}
label(X,'ckd') does not hold because
the value of sc is 0.9 which should be greater than 1.2 does not hold
label(X,'ckd') does not hold because
the value of sg is '?' which should be less equal to 1.015 does not hold
label(X,'ckd') does not hold because
the value of hemo is 15.0 which should be less equal to 12.7 does not hold
label(X,'ckd') does not hold because
the value of al is '?' which should not equal '0' does hold and
the value of al is '?' which should not equal '?' does not hold
Citation
@inproceedings{foldrpp,
author = {Wang, Huaduo and Gupta, Gopal},
title = {{FOLD-R++}: A Scalable Toolset for Automated Inductive Learning of Default Theories from Mixed Data},
year = {2022},
isbn = {978-3-030-99460-0},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
url = {https://doi.org/10.1007/978-3-030-99461-7_13},
doi = {10.1007/978-3-030-99461-7_13},
booktitle = {Functional and Logic Programming: 16th International Symposium, FLOPS 2022},
pages = {224–242},
numpages = {19},
location = {Kyoto, Japan}
}
@article{foldrm,
title = {{FOLD-RM}: A Scalable, Efficient, and Explainable Inductive Learning Algorithm for Multi-Category Classification of Mixed Data},
DOI={10.1017/S1471068422000205},
journal = {Theory and Practice of Logic Programming},
publisher={Cambridge University Press},
author={Wang, Huaduo and Shakerin, Farhad and Gupta, Gopal},
year={2022},
pages={1--20}
}
@article{DBLP:journals/corr/abs-1804-11162,
author={Joaqu{\'{\i}}n Arias and Manuel Carro and Elmer Salazar and Kyle Marple and Gopal Gupta},
title={Constraint Answer Set Programming without Grounding},
journal={CoRR},
volume={abs/1804.11162},
year={2018},
url={http://arxiv.org/abs/1804.11162}
}
Acknowledgement
Authors gratefully acknowledge support from NSF grants IIS 1718945, IIS 1910131, IIP 1916206, and from Amazon Corp, Atos Corp and US DoD.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.