A Python package for maximum information coefficient differential privacy
MIC_DP: Maximum Information Coefficient Differential Privacy
mic_dp is a Python package for differentially private data transformation guided by the Maximum Information Coefficient (MIC), with applications to both supervised and unsupervised learning tasks. Traditional differential privacy (DP) mechanisms often degrade utility uniformly across features. In contrast, mic_dp uses MIC to scale the noise injected into each feature, preserving more utility in informative features.
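To make the idea concrete, here is a minimal sketch of relevance-scaled noise. It is not the package's implementation: the helper name `relevance_scaled_noise`, the `1 - relevance` mapping, and the example relevance scores are assumptions chosen for illustration, and the sketch makes no claim about the resulting formal privacy guarantee.

```python
import numpy as np

def relevance_scaled_noise(X, relevance, base_sigma, rng):
    """Add Gaussian noise whose per-feature std shrinks as relevance grows.

    Hypothetical helper: the 1 - relevance mapping is an assumption,
    not the formula used by mic_dp.
    """
    relevance = np.clip(np.asarray(relevance, dtype=float), 0.0, 1.0)
    factors = 1.0 - relevance       # informative features get smaller factors
    sigma = base_sigma * factors    # one noise std per column
    # Standard normal draws, scaled column-wise by broadcasting
    return X + rng.normal(0.0, 1.0, size=X.shape) * sigma

rng = np.random.default_rng(0)
X = rng.random((1000, 3))           # synthetic features in [0, 1)
relevance = [0.9, 0.5, 0.1]         # made-up relevance scores
X_noisy = relevance_scaled_noise(X, relevance, base_sigma=0.2, rng=rng)

# Per-feature mean absolute error introduced by the noise
per_feature_mae = np.abs(X_noisy - X).mean(axis=0)
print(per_feature_mae)
```

In this sketch the most relevant column (first) is distorted least and the least relevant column (last) is distorted most, which is the qualitative behaviour described above.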
Summary
This package includes functions for:
- Calculating MIC, Pearson, and Mahalanobis-based feature relevance
- Feature selection based on scaled importance
- Applying Gaussian or Laplace DP mechanisms using custom noise scaling
- Evaluating MAE, clustering scores, and plotting results
Our experiments show that MIC-guided DP mechanisms consistently outperform Pearson, Mahalanobis, and baseline DP in terms of feature and prediction accuracy under privacy constraints. In unsupervised settings, MIC-DP preserves cluster structures better, as shown by silhouette score, ARI, and V-measure.
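The three clustering metrics named above are available in scikit-learn. The following sketch shows one way to compare cluster assignments on original and perturbed data. The synthetic blobs, the plain Gaussian perturbation standing in for a DP mechanism, and the helper `cluster_labels` are assumptions for illustration; this does not reproduce the package's `cluster_and_evaluate` function.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score, v_measure_score

rng = np.random.default_rng(0)

# Synthetic data: three 2-D blobs (illustrative only)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(100, 2)) for c in centers])

# Stand-in for a DP-perturbed copy of the data (plain Gaussian noise)
X_private = X + rng.normal(0.0, 1.0, size=X.shape)

def cluster_labels(data, n_clusters=3):
    """Hypothetical helper: k-means labels with a fixed seed."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(data)

labels_orig = cluster_labels(X)
labels_priv = cluster_labels(X_private)

# Agreement between the two labelings (invariant to label permutation)
ari = adjusted_rand_score(labels_orig, labels_priv)
v_measure = v_measure_score(labels_orig, labels_priv)
# Cohesion and separation of the clusters found on the perturbed data
silhouette = silhouette_score(X_private, labels_priv)

print(f"ARI={ari:.3f}  V-measure={v_measure:.3f}  silhouette={silhouette:.3f}")
```

ARI and V-measure compare the perturbed clustering against the clustering of the original data, while the silhouette score is computed on the perturbed data alone.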
Installation
You can install the package directly from PyPI:
pip install mic_dp
Or install from source:
git clone https://github.com/merlery/mic_dp.git
cd mic_dp
pip install -e .
Quick Start
Here's a simple example of how to use mic_dp for supervised learning:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from core import (
    noise_scaling_MIC,
    calculate_sensitivity,
    correlated_dp_gaussian,
    mean_absolute_error,
)
# Load and preprocess your data
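df = pd.read_csv('your_dataset.csv')
df = df.dropna().reset_index(drop=True)  # Reset the index so features and target stay aligned
y = df['target_column']  # Your target variable
# Keep numeric features only and exclude the target from the feature matrix
X = df.select_dtypes(include=['number']).drop(columns=['target_column'], errors='ignore')
X_norm = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)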
# Calculate MIC-based noise scaling factors
noise_factors = noise_scaling_MIC(y, X_norm, amplification_factor=5)
# Calculate sensitivity for each feature
sensitivity = calculate_sensitivity(X_norm)
# Apply differential privacy with MIC-guided noise scaling
private_X = correlated_dp_gaussian(
    X_norm.copy(),
    noise_factors,
    sensitivity,
    epsilon=0.5,  # Privacy budget
    delta=1e-5,   # Privacy relaxation parameter
)
# Evaluate the utility loss
mae = mean_absolute_error(X_norm, private_X)
print(f"Mean Absolute Error: {mae:.4f}")
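For orientation on how epsilon and delta translate into noise, the classical Gaussian mechanism uses a standard deviation of sensitivity * sqrt(2 ln(1.25/delta)) / epsilon, a bound that holds for 0 < epsilon < 1. The sketch below computes that value. Whether correlated_dp_gaussian uses exactly this calibration is not stated in this document, so treat the helper `gaussian_sigma` as a hypothetical illustration rather than the package's code.

```python
import math

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise std of the classical (epsilon, delta) Gaussian mechanism.

    Hypothetical helper for illustration; not part of mic_dp.
    """
    if not 0 < epsilon < 1:
        raise ValueError("the classical bound assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

# Same privacy parameters as in the Quick Start, with unit sensitivity
sigma = gaussian_sigma(sensitivity=1.0, epsilon=0.5, delta=1e-5)
print(f"sigma = {sigma:.3f}")
```

Halving epsilon doubles the noise standard deviation, while delta enters only through a logarithm, so tightening delta is comparatively cheap.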
Detailed Example
For a more comprehensive example, see the supervised_experiment.py script, which demonstrates:
- Loading and preprocessing the Adult Census Income dataset
- Calculating feature relevance using MIC, Pearson, and Mahalanobis methods
- Applying differential privacy with different noise scaling strategies
- Evaluating and comparing the utility of each approach
- Visualizing the results
To run the example:
python examples/supervised_experiment.py
Experimental Results
In our experiments, MIC-guided noise scaling preserves prediction accuracy and clustering structure better than Pearson-based scaling, Mahalanobis-based scaling, and baseline DP under differential privacy constraints.
API Reference
Core Functions
- noise_scaling_MIC(target, features, factor): Calculate noise scaling factors based on Maximum Information Coefficient
- noise_scaling_pearson(target, features, factor): Calculate noise scaling factors based on Pearson correlation
- noise_scaling_mahalanobis_distances(target, features, factor): Calculate noise scaling factors based on Mahalanobis distances
- calculate_sensitivity(features): Calculate sensitivity for each feature based on its range
- correlated_dp_gaussian(X, noise_factors, sensitivity, epsilon, delta): Apply Gaussian differential privacy with custom noise scaling
- correlated_dp_laplace(X, noise_factors, sensitivity, epsilon, delta): Apply Laplace differential privacy with custom noise scaling
- feature_selection(percentage, X, noise_scaling_factor): Select features based on their noise scaling factors
- mean_absolute_error(y_true, y_pred): Calculate mean absolute error between true and predicted values
- cluster_and_evaluate(df, name, n_clusters): Perform clustering and evaluate the results
- calculate_ari(labels1, labels2): Calculate Adjusted Rand Index between two cluster labelings
- calculate_v_measure(labels1, labels2): Calculate V-measure between two cluster labelings
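As a rough illustration of two of the ideas listed above, range-based sensitivity and Laplace noise with per-feature scaling, here is a small NumPy sketch. The helper names `range_sensitivity` and `laplace_perturb`, the choice of scale = factor * sensitivity / epsilon, and the toy data are assumptions; the package's own functions may differ in signature and behaviour.

```python
import numpy as np

def range_sensitivity(X):
    """Per-column range (max - min), used here as a sensitivity proxy."""
    X = np.asarray(X, dtype=float)
    return X.max(axis=0) - X.min(axis=0)

def laplace_perturb(X, sensitivity, epsilon, noise_factors, rng):
    """Add Laplace noise with per-column scale factor * sensitivity / epsilon.

    Hypothetical helper: the way the factors enter the scale is an assumption.
    """
    X = np.asarray(X, dtype=float)
    scale = np.asarray(noise_factors, dtype=float) * sensitivity / epsilon
    # The per-column scale vector broadcasts across the rows of X
    return X + rng.laplace(loc=0.0, scale=scale, size=X.shape)

rng = np.random.default_rng(0)
X = np.array([[0.0, 10.0],
              [1.0, 20.0],
              [0.5, 40.0]])          # toy data, two features

sens = range_sensitivity(X)          # range of each column
X_private = laplace_perturb(X, sens, epsilon=0.5,
                            noise_factors=[0.2, 1.0], rng=rng)
print(sens)
print(X_private)
```

Because the scale grows with the column range, features are usually normalized first, as the Quick Start does with MinMaxScaler, so that no single wide-ranged feature dominates the added noise.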
Citation
If you use this package in your research, please cite:
@article{yang2025micdp,
title={mic\_dp: A Python package for maximum information coefficient differential privacy},
author={Yang, Wenjun and Al-masri, Eyhab and Kotevska, Olivera},
journal={Journal of Open Source Software},
year={2025}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements
We thank the creators of the Adult Census Income (ACI) and HED datasets for making their data publicly available.
File details
Details for the file mic_dp-0.3.tar.gz.
File metadata
- Download URL: mic_dp-0.3.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 931792cee3b2044038e965bc62fdca2d9d9d5d01cd079fbc6134c34f9a3adb5d |
| MD5 | 84217d402b6d702433aba43b9ab52307 |
| BLAKE2b-256 | 0daf508e68d5a99d042b743364224b662836ab87e6c2bbf4a5e3a45e0ae53347 |
File details
Details for the file mic_dp-0.3-py3-none-any.whl.
File metadata
- Download URL: mic_dp-0.3-py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6221590fb869d7ea4f97aada8218f9b0c5cfa3dbc3a01e733d011fe517ca51e3 |
| MD5 | e13e860a9788eab39c8dc51ab3078b9e |
| BLAKE2b-256 | 4f02c82cde376b951058b08104c7c0f21e46a5268bdbbfa01c857113177e37b8 |