wittgenstein

Implementation of ruleset covering algorithms for explainable machine learning

# wittgenstein

_And is there not also the case where we play and - make up the rules as we go along?_
_(Ludwig Wittgenstein)_

This module implements two iterative coverage-based ruleset algorithms: IREP and RIPPERk.
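
To give a feel for what "iterative coverage-based" means, here is a minimal sketch of the general sequential-covering idea: grow one rule, remove the positive examples it covers, and repeat. It is not the package's code. It assumes records are plain dicts of categorical features, it ranks candidate conditions by simple precision, and it leaves out the pruning and stopping criteria that IREP and RIPPER add on top. All function names are hypothetical.

```python
# Illustrative sketch of sequential covering; NOT the wittgenstein implementation.
# Records are dicts of categorical features; every name below is hypothetical.

def covers(rule, record):
    """A rule is a dict of feature -> required value (a conjunction)."""
    return all(record.get(feat) == val for feat, val in rule.items())

def precision(rule, pos, neg):
    """Fraction of the examples covered by the rule that are positive."""
    p = sum(covers(rule, r) for r in pos)
    n = sum(covers(rule, r) for r in neg)
    return p / (p + n) if p + n else 0.0

def grow_rule(pos, neg):
    """Greedily add conditions until the rule covers no negatives (or runs out of conditions)."""
    rule = {}
    while any(covers(rule, r) for r in neg):
        # Candidate conditions come from positives the rule still covers.
        candidates = {(f, v) for r in pos if covers(rule, r)
                      for f, v in r.items() if f not in rule}
        if not candidates:
            break
        best = max(sorted(candidates),
                   key=lambda c: precision({**rule, c[0]: c[1]}, pos, neg))
        rule[best[0]] = best[1]
    return rule

def sequential_cover(pos, neg):
    """Learn rules one at a time, dropping the positives each new rule covers."""
    ruleset, remaining = [], list(pos)
    while remaining:
        rule = grow_rule(remaining, neg)
        if not rule:
            break
        ruleset.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return ruleset
```

Each pass through the loop in `sequential_cover` adds one conjunction to the final ruleset, which is why the learned model reads as a list of alternative conditions.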

Performance is similar to sklearn's DecisionTree CART implementation (see [Performance Tests](https://github.com/imoscovitz/ruleset/blob/master/Performance%20Tests.ipynb)).

For algorithm details, see my Medium post or the papers listed under _Useful references_ below.

## Installation

To install, use
```bash
$ pip install wittgenstein
```

To uninstall, use
```bash
$ pip uninstall wittgenstein
```

## Usage

Usage syntax is similar to sklearn's. The current version, however, requires that data be passed in as a pandas DataFrame.

Once you have loaded and split your data...
```python
>>> import pandas as pd
>>> df = pd.read_csv(dataset_filename)
>>> from sklearn.model_selection import train_test_split # or any other mechanism you want to use for data partitioning
>>> train, test = train_test_split(df, test_size=.33)
```
We can fit a ruleset classifier using RIPPER or IREP:
```python
>>> import wittgenstein as lw
>>> ripper_clf = lw.RIPPER() # Or irep_clf = lw.IREP() to build a model using IREP
>>> ripper_clf.fit(train, class_feat='Party') # Or you can call .fit with params train_X, train_y. See docstrings for hyperparameter options.
>>> ripper_clf
<RIPPER object with fit ruleset (k=2, prune_size=0.33, dl_allowance=64)> # Hyperparameter details available in the docstrings and Medium post
```

Access the underlying trained model with the `ruleset_` attribute. A ruleset is a disjunction of conjunctions: 'V' represents 'or'; '^' represents 'and'.
```python
>>> ripper_clf.ruleset_
<Ruleset object: [physician-fee-freeze=n] V [synfuels-corporation-cutback=y^adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n]>
```
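
To make the "disjunction of conjunctions" reading concrete, the sketch below evaluates the ruleset shown above against a single record. It uses a plain list of dicts as a stand-in and does not touch the package's own `Ruleset` or `Rule` classes; the helper names and the example record are assumptions made for illustration.

```python
# Hypothetical plain-Python stand-in for a ruleset; not the library's Ruleset class.
# Each inner dict is one rule (a conjunction); the outer list is the disjunction.
ruleset = [
    {"physician-fee-freeze": "n"},
    {
        "synfuels-corporation-cutback": "y",
        "adoption-of-the-budget-resolution": "y",
        "anti-satellite-test-ban": "n",
    },
]

def rule_covers(rule, record):
    """'^': every condition in the rule must hold for the record."""
    return all(record.get(feat) == val for feat, val in rule.items())

def ruleset_predicts(ruleset, record):
    """'V': a positive prediction needs at least one rule to hold."""
    return any(rule_covers(rule, record) for rule in ruleset)

# Made-up record: fails the first rule but satisfies all three
# conditions of the second, so the overall prediction is positive.
voter = {
    "physician-fee-freeze": "y",
    "synfuels-corporation-cutback": "y",
    "adoption-of-the-budget-resolution": "y",
    "anti-satellite-test-ban": "n",
}
print(ruleset_predicts(ruleset, voter))  # True
```
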
To score our fit model:
```python
>>> test_X = test.drop('Party', axis=1)
>>> test_y = test['Party']
>>> ripper_clf.score(test_X, test_y)
0.9985686906328078
```
The default scoring metric is accuracy. You can pass in alternate scoring functions, including those available through sklearn:
```python
>>> from sklearn.metrics import precision_score, recall_score
>>> precision = ripper_clf.score(test_X, test_y, precision_score)
>>> recall = ripper_clf.score(test_X, test_y, recall_score)
>>> print(f'precision: {precision}, recall: {recall}')
precision: 0.9914..., recall: 0.9953...
```
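
As a rough picture of how a pluggable scoring function fits in, the sketch below shows a stand-alone `score` helper that predicts and then hands the true and predicted labels to whichever metric it is given. This is an assumption about the general pattern, not the package's actual method; the two metric helpers are hypothetical and follow the same `(y_true, y_pred)` argument order that functions in `sklearn.metrics` use.

```python
# Hypothetical helpers illustrating a pluggable metric; not wittgenstein's code.
# Metrics take (y_true, y_pred), the calling convention of sklearn.metrics.

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    """Share of positive predictions that are truly positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p]
    if not truths_of_predicted_pos:
        return 0.0
    return sum(truths_of_predicted_pos) / len(truths_of_predicted_pos)

def score(predict, X, y, score_function=accuracy):
    """Predict on X, then pass true and predicted labels to the metric."""
    return score_function(list(y), list(predict(X)))
```

Because the metric is just a callable, swapping accuracy for precision or recall changes nothing about how predictions are made.
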
To perform predictions:
```python
>>> ripper_clf.predict(new_data)[:5]
[True, True, False, True, False]
```
We can also ask our model to tell us why it made each positive prediction that it did:
```python
>>> ripper_clf.predict(new_data[:5], give_reasons=True)
([True, True, False, True, False],
 [[<Rule object: [physician-fee-freeze=n]>],
  [<Rule object: [physician-fee-freeze=n]>,
   <Rule object: [synfuels-corporation-cutback=y^adoption-of-the-budget-resolution=y^anti-satellite-test-ban=n]>], # This example met multiple sufficient conditions for a positive prediction
  [],
  [<Rule object: [physician-fee-freeze=n]>],
  []])
```
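
The reasons attached to each prediction are simply the rules that the example satisfied. The sketch below reproduces that behaviour on a plain-Python stand-in, assuming rules are dicts of feature/value tests; the function name, the shortened second rule, and the sample records are all hypothetical and are not taken from the package.

```python
# Hypothetical stand-in: each dict is one rule (a conjunction of feature=value tests).
ruleset = [
    {"physician-fee-freeze": "n"},
    {"synfuels-corporation-cutback": "y", "anti-satellite-test-ban": "n"},
]

def predict_with_reasons(ruleset, records):
    """Return (predictions, reasons); reasons[i] lists the rules that fired for record i."""
    reasons = [
        [rule for rule in ruleset
         if all(rec.get(feat) == val for feat, val in rule.items())]
        for rec in records
    ]
    # A record is predicted positive exactly when at least one rule fired.
    predictions = [bool(fired) for fired in reasons]
    return predictions, reasons

records = [
    {"physician-fee-freeze": "n", "synfuels-corporation-cutback": "y", "anti-satellite-test-ban": "n"},
    {"physician-fee-freeze": "y", "synfuels-corporation-cutback": "n", "anti-satellite-test-ban": "n"},
]
predictions, reasons = predict_with_reasons(ruleset, records)
```

Here the first record satisfies both rules and the second satisfies none, so a negative prediction always comes with an empty list of reasons.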

## Useful references
- My Medium post about the package (coming soon)
- [Fürnkranz-Widmer IREP paper](https://pdfs.semanticscholar.org/f67e/bb7b392f51076899f58c53bf57d5e71e36e9.pdf)
- [Cohen's RIPPER paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.2612&rep=rep1&type=pdf)
- [Partial decision trees](https://researchcommons.waikato.ac.nz/bitstream/handle/10289/1047/uow-cs-wp-1998-02.pdf?sequence=1&isAllowed=y)
- [C4.5 paper including all the gory details on MDL](https://pdfs.semanticscholar.org/cb94/e3d981a5e1901793c6bfedd93ce9cc07885d.pdf)

Release history

- 0.3.4 (Apr 3, 2023)
- 0.3.3 (Apr 3, 2023)
- 0.3.2 (Aug 9, 2021)
- 0.3.0 (Aug 9, 2021)
- 0.2.3 (May 21, 2020)
- 0.2.2 (May 21, 2020, yanked)
- 0.2.1 (May 19, 2020, yanked)
- 0.2.0 (May 5, 2020)
- 0.1.6 (Apr 18, 2019)
- 0.1.5 (Mar 7, 2019)
- 0.1.4 (Mar 5, 2019)
- 0.1.3 (Mar 5, 2019)
- 0.1.1 (Feb 22, 2019, this version)
- 0.1.0 (Feb 22, 2019)

Download files

Source Distribution

wittgenstein-0.1.1.tar.gz (19.8 kB)

Uploaded Feb 22, 2019 Source

Built Distributions

wittgenstein-0.1.1-py3.6.egg (89.6 kB)

Uploaded Feb 22, 2019 Source

wittgenstein-0.1.1-py3-none-any.whl (41.7 kB)

Uploaded Feb 22, 2019 Python 3

Hashes for wittgenstein-0.1.1.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `08571897fea9958e6f695d330177a28db3b056025b104fa450c2c4a02752d3ff` |
| MD5 | `00cfd6de665a6806885e28611c7b9f6d` |
| BLAKE2b-256 | `67b8151a6e3da17cd81e050e25748a1823ae369ecc6af300cc269f6f950c1abf` |

Hashes for wittgenstein-0.1.1-py3.6.egg

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `a26106974c3dc27a5876d5af43e0c8d1d0b5fcf22405371008863bc35aedd0b5` |
| MD5 | `af4b72a876ace503e9440a7c102cbd17` |
| BLAKE2b-256 | `703d54998f3a5e2c70d1dc56e55bd1bca08a3874ef00e74b3aec2028620f2318` |

Hashes for wittgenstein-0.1.1-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `7d38d20829ecdc24cd7de429c4bf6d5a71c4a784532e3d4e3bf32b8231493781` |
| MD5 | `c244bb169d2cc4707780daa7f7c199db` |
| BLAKE2b-256 | `994d605574f34dc898df686708a54d6671737e9058822bcb7204bffea7a148c2` |