
Python tool to create or inspect a transparent and ethical AI.


TransparentAI

A transparent AI from A to Z!


This library is a toolbox that lets you create or inspect an AI at every step of the pipeline.

This is a new tool, so if you find any bugs or other problems, please do not hesitate to report them on the library's GitHub issues page: https://github.com/Nathanlauga/transparentai/issues.

(Figure: the TransparentAI pipeline)

Documentation is available here: API Documentation.

Installation

You can install it from PyPI:

pip install transparentai

Or by cloning the GitHub repository:

git clone https://github.com/Nathanlauga/transparentai.git
cd transparentai
python setup.py install
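
To check that the installation succeeded, you can try importing the package:

python -c "import transparentai"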

Library tools

Supported objects:

submodule   object                description
datasets    StructuredDataset     Handles structured (tabular) datasets
models      ClassificationModel   Handles classifier models with predict and predict_proba functions
models      RegressionModel       Handles regression models with a predict function
fairness    DatasetBiasMetric     Handles a dataset with a target column
fairness    ModelBiasMetric       Handles a dataset and predictions (classification and regression)
explainer   ModelExplainer        Handles tree and linear models

How to use it

Take a look at the Getting Started page of the documentation, or search for specific use cases in the notebooks/ directory.

Here are some examples for StructuredDataset, DatasetBiasMetric, ClassificationModel and ModelExplainer, but do take a look at the links above: there is a lot more to see!

StructuredDataset

Using the Adult dataset, which is included in the library, let's observe the data with some graphics.

from transparentai.datasets import StructuredDataset, load_adult
adult = load_adult()

Create the StructuredDataset object:

# target is not mandatory; it just splits the data in the graphics for each target value
dataset = StructuredDataset(df=adult, target='income')

Then you can use different plotting functions to get a better understanding of the dataset.

To start, I recommend the following:

dataset.plot_dataset_overview() # Shows an overview of the data
dataset.plot_missing_values() # Plots missing values
dataset.plot_variables() # Plots each variable, one by one
dataset.plot_numeric_var_relation() # Plots each numeric var pair
dataset.plot_cat_and_num_variables() # Plots each numeric and categorical var pair
dataset.plot_correlations() # Plots correlations

But if you want to look at a particular variable or variable combination, you can use the following lines of code:

dataset.plot_one_categorical_variable(var='income')

dataset.plot_two_numeric_variables(var1='education-num', var2='hours-per-week', nrows=10000)

dataset.plot_one_cat_and_num_variables(var1='relationship', var2='age')

dataset.plot_one_cat_and_num_variables(var1='income', var2='age')

DatasetBiasMetric

Import the DatasetBiasMetric class:

from transparentai.fairness import DatasetBiasMetric

Define the privileged groups:

privileged_groups = {
    'marital-status': ['Married-civ-spouse','Married-AF-spouse'],
    'race': ['White'],
    'gender': ['Male']
}

Create the instance:

dataset_bias = DatasetBiasMetric(dataset, privileged_groups, favorable_label='>50K')

Retrieve the bias metrics as a pandas DataFrame:

dataset_bias.get_bias_metrics()
                        Disparate impact    Statistical parity difference
attr            index
age category    >50K    0.257312            -0.222479
marital-status  >50K    0.143299            -0.382106
race            >50K    0.600592            -0.101445
gender          >50K    0.359655            -0.194516
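
For reference, disparate impact is the ratio of favorable-outcome rates between the unprivileged and privileged groups, and statistical parity difference is their difference. Here is a minimal sketch of how these can be computed by hand on the raw data (my own illustration, not the library's internal code):

def disparate_impact(df, attr, privileged_values, target, favorable_label):
    """P(favorable | unprivileged) / P(favorable | privileged)."""
    privileged = df[attr].isin(privileged_values)
    favorable = df[target] == favorable_label
    return favorable[~privileged].mean() / favorable[privileged].mean()

def statistical_parity_difference(df, attr, privileged_values, target, favorable_label):
    """P(favorable | unprivileged) - P(favorable | privileged)."""
    privileged = df[attr].isin(privileged_values)
    favorable = df[target] == favorable_label
    return favorable[~privileged].mean() - favorable[privileged].mean()

# Should roughly reproduce the 'gender' row of the table above
disparate_impact(adult, 'gender', ['Male'], 'income', '>50K')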

Plot the bias for one attribute:

dataset_bias.plot_bias(attr='gender')

ClassificationModel

from transparentai.models import ClassificationModel

You need a trained classifier to use the ClassificationModel class. Then, with the compute_scores() function, you will be able to access the scores.
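
For instance, a classifier could be trained like this (a minimal sketch assuming scikit-learn and a simple label encoding of the Adult data; the exact preprocessing is up to you):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Illustrative preprocessing: label-encode the categorical columns
encoded = adult.copy()
for col in encoded.select_dtypes(include='object').columns:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

X = encoded.drop(columns='income')
y = encoded['income']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)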

model = ClassificationModel(model=clf)
model.compute_scores(X=X_test, y=y_test, threshold=0.5)

Show the classification scores:

model.plot_scores()
Overall model performance
        accuracy    f1          precision   recall      roc_auc
score   0.864313    0.860986    0.859721    0.864313    {0: 0.9104387547348203}

ModelExplainer

This class uses the SHAP library to get feature importance.

from transparentai.explainer import ModelExplainer
explainer = ModelExplainer(model=clf, X=X_test, model_type='tree')

Get the global feature importance:

# Just take 100 rows for the example
explainer.explain_global(X_test.sample(100))
{'age': 0.04400247162436626,
 'workclass': 0.012615442187332302,
 'fnlwgt': 0.011500706212146071,
 'education': 0.014303318875909592,
 'education-num': 0.06320364016403923,
 'marital-status': 0.04457869696787154,
 'occupation': 0.025353718692010623,
 'relationship': 0.06538595560703962,
 'race': 0.0030357403950878343,
 'gender': 0.008150837046393543,
 'capital-gain': 0.05191285416804516,
 'capital-loss': 0.004889414454684037,
 'hours-per-week': 0.03416860048567794,
 'native-country': 0.003552990714228435,
 'age category': 0.013148817808960036}
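
This kind of global score is typically the mean absolute SHAP value per feature. Here is a rough sketch of that computation using the shap library directly (my assumption about what explain_global does, not the library's actual code):

import numpy as np
import shap

sample = X_test.sample(100)
shap_values = shap.TreeExplainer(clf).shap_values(sample)

# Binary classifiers may return one array per class; keep the positive class
if isinstance(shap_values, list):
    shap_values = shap_values[1]

# Global importance: mean absolute SHAP value per feature
global_importance = dict(zip(sample.columns, np.abs(shap_values).mean(axis=0)))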

Plot the global feature importance:

explainer.plot_global_explain(top=10)

The feature_names variable is a mapping dictionary, so categorical variables that are encoded as numbers (e.g. 'gender': Male is 1 and Female is 0) can be shown with their original values.
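
For example, feature_names could be built like this (a hypothetical construction; the exact format expected by explain_local is an assumption here, so adapt it to your own encoding):

from sklearn.preprocessing import LabelEncoder

# Hypothetical format: map each encoded value back to its original label,
# e.g. feature_names['gender'] == {0: 'Female', 1: 'Male'}
feature_names = {
    col: dict(enumerate(LabelEncoder().fit(adult[col]).classes_))
    for col in adult.select_dtypes(include='object').columns
}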

one_row = X.iloc[42]
explainer.explain_local(one_row, feature_classes=feature_names)
{'age=36': 0.001512160581860371,
 'workclass=Private': -0.001553052083354487,
 'fnlwgt=465326': 0.014316324086275927,
 'education=HS-grad': -0.008492161121589561,
 'education-num=9': -0.06452835138642059,
 'marital-status=Married-civ-spouse': 0.028260101147975548,
 'occupation=Farming-fishing': -0.09721002961961403,
 'relationship=Husband': 0.04156683952625826,
 'race=White': -2.3502936087425042e-05,
 'gender=Male': 0.002139375823244336,
 'capital-gain=0': -0.044484324557015495,
 'capital-loss=0': -0.007543452374593471,
 'hours-per-week=40': -0.014963517277665232,
 'native-country=United-States': -0.0014164286240020375,
 'age category=Adult': 0.004620017927818481}

Plot the local explanation:

explainer.plot_local_explain(one_row, top=10, feature_classes=feature_names)

Contributing

See the contributing file.

PRs accepted.

Credits and resources

See the resources file, where I explain why I created this tool and, above all, quote my different inspirations and resources.

Author

This work is led by Nathan Lauga, a French data scientist.

License

This project uses the MIT License.

Why?

I believe that code should be reusable in community projects as well as inside private ones. AI transparency needs to be available to everyone, even if it's a private AI!
