A Python package for maximum information coefficient differential privacy


MIC_DP: Maximum Information Coefficient Differential Privacy

mic_dp is a Python package that enables differentially private data transformation guided by the Maximum Information Coefficient (MIC), with application to both supervised and unsupervised learning tasks. Traditional differential privacy (DP) mechanisms often degrade utility uniformly across features. In contrast, mic_dp uses MIC to scale the noise injected into each feature, preserving more utility in informative features.
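To make the idea concrete, here is a minimal sketch of per-feature noise scaling. It assumes relevance scores in [0, 1] (MIC values fall in this range) and uses a hypothetical mapping, `relevance_to_noise_factors`, in which less relevant features receive proportionally more noise and the factors are normalized to average 1. The exact formula used by mic_dp may differ, and both helper names are illustrative only.

```python
import numpy as np


def relevance_to_noise_factors(relevance, amplification=5.0):
    """Hypothetical mapping (not mic_dp's documented formula) from relevance in [0, 1]
    to per-feature noise factors: weaker features get larger factors (more noise),
    and the factors are rescaled so their mean is 1."""
    relevance = np.clip(np.asarray(relevance, dtype=float), 0.0, 1.0)
    raw = 1.0 + amplification * (1.0 - relevance)  # >= 1, larger for weakly relevant features
    return raw / raw.mean()


def add_scaled_gaussian_noise(X, factors, sigma, rng=None):
    """Add zero-mean Gaussian noise whose standard deviation for column j is sigma * factors[j]."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    return X + rng.normal(0.0, 1.0, size=X.shape) * (sigma * np.asarray(factors, dtype=float))


relevance = np.array([0.9, 0.5, 0.1])  # e.g., MIC of each feature with the target
factors = relevance_to_noise_factors(relevance)
X = np.random.default_rng(0).random((100, 3))
X_private = add_scaled_gaussian_noise(X, factors, sigma=0.2, rng=np.random.default_rng(1))
```

This sketch illustrates the scaling idea only; it does not by itself establish a differential privacy guarantee.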

Summary

This package includes functions for:

Calculating MIC, Pearson, and Mahalanobis-based feature relevance
Feature selection based on scaled importance
Applying Gaussian or Laplace DP mechanisms using custom noise scaling
Evaluating MAE, clustering scores, and plotting results
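Of the three relevance measures, Pearson correlation is the easiest to reproduce without extra dependencies (MIC itself is usually computed with a dedicated library such as minepy). A minimal NumPy sketch, assuming relevance is the absolute correlation between each feature and the target; `pearson_relevance` is a hypothetical helper, and the package's own normalization may differ.

```python
import numpy as np


def pearson_relevance(X, y):
    """Absolute Pearson correlation between each column of X and the target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    yc = y - y.mean()        # center the target
    cov = Xc.T @ yc          # unnormalized covariance of each feature with y
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; report zero relevance instead of NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.where(denom > 0, cov / denom, 0.0)
    return np.abs(r)


rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([
    2 * y + 0.1 * rng.normal(size=200),  # strongly related to y
    rng.normal(size=200),                # unrelated noise
    np.ones(200),                        # constant column
])
relevance_scores = pearson_relevance(X, y)
print(relevance_scores)
```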

Our experiments show that MIC-guided DP mechanisms consistently outperform Pearson-guided, Mahalanobis-guided, and baseline DP in feature and prediction accuracy (measured by mean absolute error) under privacy constraints. In unsupervised settings, MIC-DP better preserves cluster structure, as measured by silhouette score, adjusted Rand index (ARI), and V-measure.
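ARI compares two labelings of the same samples (for example, clusters found on the original data versus on the privatized data) and corrects for chance agreement. mic_dp exposes this as `calculate_ari`; the sketch below is an independent NumPy implementation of the standard pair-counting formula, with `adjusted_rand_index` as a hypothetical name.

```python
import numpy as np


def comb2(x):
    """Number of unordered pairs (x choose 2), elementwise."""
    x = np.asarray(x, dtype=float)
    return x * (x - 1) / 2


def adjusted_rand_index(labels1, labels2):
    """Adjusted Rand Index between two labelings of the same samples."""
    labels1 = np.asarray(labels1)
    labels2 = np.asarray(labels2)
    # Contingency table: n_ij = number of samples in cluster i of labels1 and cluster j of labels2
    _, a_idx = np.unique(labels1, return_inverse=True)
    _, b_idx = np.unique(labels2, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)

    sum_pairs = comb2(table).sum()            # pairs placed together by both labelings
    sum_a = comb2(table.sum(axis=1)).sum()    # pairs together in labels1
    sum_b = comb2(table.sum(axis=0)).sum()    # pairs together in labels2
    expected = sum_a * sum_b / comb2(len(labels1))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` returns 4/7 (about 0.571), and two labelings that differ only by renaming clusters score 1.0.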

Installation

You can install the package directly from PyPI:

```
pip install mic-dp
```

Or install from source:

```
git clone https://github.com/merlery/mic_dp.git
cd mic_dp
pip install -e .
```

Quick Start

Here's a simple example of how to use mic_dp for supervised learning:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from core import (  # mic_dp core functions, imported as in the project's example
    noise_scaling_MIC,
    calculate_sensitivity,
    correlated_dp_gaussian,
    mean_absolute_error,
)

# Load and preprocess your data
df = pd.read_csv('your_dataset.csv')
df = df.dropna().reset_index(drop=True)  # keep row indices aligned after dropping NaNs

y = df['target_column']  # Your target variable
# Numeric features only; drop the target so it is not scored against itself
X = df.select_dtypes(include=['number']).drop(columns=['target_column'], errors='ignore')
X_norm = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)

# Calculate MIC-based noise scaling factors (third argument: amplification factor)
noise_factors = noise_scaling_MIC(y, X_norm, 5)

# Calculate sensitivity for each feature
sensitivity = calculate_sensitivity(X_norm)

# Apply differential privacy with MIC-guided noise scaling
private_X = correlated_dp_gaussian(
    X_norm.copy(),
    noise_factors,
    sensitivity,
    epsilon=0.5,  # Privacy budget
    delta=1e-5,   # Privacy relaxation parameter
)

# Evaluate the utility loss
mae = mean_absolute_error(X_norm, private_X)
print(f"Mean Absolute Error: {mae:.4f}")
```
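The `epsilon` and `delta` arguments determine how much noise a Gaussian mechanism adds. The sketch below assumes the classical Gaussian-mechanism calibration, σ = Δ·sqrt(2·ln(1.25/δ))/ε (stated for 0 < ε < 1), together with range-based sensitivity (max - min per column, which is at most 1 after min-max normalization). Whether `correlated_dp_gaussian` uses exactly this calibration is an assumption; `range_sensitivity` and `gaussian_sigma` are hypothetical helpers.

```python
import math

import numpy as np


def range_sensitivity(X):
    """Per-column sensitivity taken as the column's range (max - min)."""
    X = np.asarray(X, dtype=float)
    return X.max(axis=0) - X.min(axis=0)


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale; this bound assumes 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return np.asarray(sensitivity, dtype=float) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


X_norm = np.array([[0.0, 0.2], [0.5, 1.0], [1.0, 0.6]])
sens = range_sensitivity(X_norm)
sigma = gaussian_sigma(sens, epsilon=0.5, delta=1e-5)
```

With ε = 0.5 and δ = 1e-5, a feature with unit range gets σ ≈ 9.69, which shows why per-feature scaling of this noise matters for utility.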

Detailed Example

For a more comprehensive example, see the supervised_experiment.py script, which demonstrates:

Loading and preprocessing the Adult Census Income dataset
Calculating feature relevance using MIC, Pearson, and Mahalanobis methods
Applying differential privacy with different noise scaling strategies
Evaluating and comparing the utility of each approach
Visualizing the results
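The comparison step can be sketched with NumPy alone. In this sketch, the noise factors for each strategy are hypothetical placeholders standing in for MIC-, Pearson-, or Mahalanobis-based outputs, `per_feature_mae` and `compare_strategies` are illustrative helpers, and no results from the script are reproduced. Reusing one random draw means differences between strategies come only from the factors.

```python
import numpy as np


def per_feature_mae(a, b):
    """Mean absolute difference for each column."""
    return np.mean(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)), axis=0)


def compare_strategies(X, strategies, sigma, seed=0):
    """Apply column-scaled Gaussian noise for each strategy and report per-feature MAE.

    `strategies` maps a strategy name to a per-feature noise-factor array.
    """
    X = np.asarray(X, dtype=float)
    base_noise = np.random.default_rng(seed).normal(0.0, sigma, size=X.shape)
    return {
        name: per_feature_mae(X, X + base_noise * np.asarray(factors, dtype=float))
        for name, factors in strategies.items()
    }


X = np.random.default_rng(42).random((500, 3))
strategies = {
    "baseline": np.ones(3),               # uniform noise across features
    "scaled": np.array([0.5, 1.0, 1.5]),  # hypothetical relevance-guided factors
}
results = compare_strategies(X, strategies, sigma=0.1)
for name, mae in results.items():
    print(name, np.round(mae, 4))
```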

To run the example:

```
python examples/supervised_experiment.py
```

Experimental Results

MIC-guided noise scaling consistently outperforms conventional approaches in preserving prediction accuracy and clustering structure under differential privacy constraints.

Figure: Feature MAE comparison for MIC-DP vs. state-of-the-art approaches

Figure: Prediction MAE comparison for MIC-DP vs. state-of-the-art approaches

API Reference

Core Functions

noise_scaling_MIC(target, features, factor): Calculate noise scaling factors based on Maximum Information Coefficient
noise_scaling_pearson(target, features, factor): Calculate noise scaling factors based on Pearson correlation
noise_scaling_mahalanobis_distances(target, features, factor): Calculate noise scaling factors based on Mahalanobis distances
calculate_sensitivity(features): Calculate sensitivity for each feature based on its range
correlated_dp_gaussian(X, noise_factors, sensitivity, epsilon, delta): Apply Gaussian differential privacy with custom noise scaling
correlated_dp_laplace(X, noise_factors, sensitivity, epsilon, delta): Apply Laplace differential privacy with custom noise scaling
feature_selection(percentage, X, noise_scaling_factor): Select features based on their noise scaling factors
mean_absolute_error(y_true, y_pred): Calculate mean absolute error between true and predicted values
cluster_and_evaluate(df, name, n_clusters): Perform clustering and evaluate the results
calculate_ari(labels1, labels2): Calculate Adjusted Rand Index between two cluster labelings
calculate_v_measure(labels1, labels2): Calculate V-measure between two cluster labelings
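Two of these steps, feature selection and the Laplace mechanism, are sketched below. `select_top_features` is a hypothetical stand-in for `feature_selection`, assuming `percentage` is on a 0-100 scale and that higher scores mean more important features (the package's convention is not stated here). `laplace_mechanism` uses the standard per-column Laplace scale b = Δ/ε, optionally multiplied by noise factors; the classical Laplace mechanism needs only ε, so the `delta` argument listed for `correlated_dp_laplace` has no counterpart in this sketch.

```python
import numpy as np


def select_top_features(percentage, X, scores, higher_is_better=True):
    """Return column indices and data for the top `percentage` percent of features by score."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(len(scores) * percentage / 100)))
    order = np.argsort(-scores if higher_is_better else scores, kind="stable")
    keep = np.sort(order[:k])  # preserve the original column order
    return keep, np.asarray(X)[:, keep]


def laplace_mechanism(X, sensitivity, epsilon, factors=None, rng=None):
    """Add Laplace noise with scale b = sensitivity / epsilon, optionally scaled per column."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    scale = np.asarray(sensitivity, dtype=float) / epsilon
    if factors is not None:
        scale = scale * np.asarray(factors, dtype=float)
    return X + rng.laplace(0.0, 1.0, size=X.shape) * scale


keep, X_selected = select_top_features(50, np.arange(12).reshape(3, 4), [0.1, 0.9, 0.4, 0.7])
noisy = laplace_mechanism(np.zeros((1000, 2)), [1.0, 1.0], epsilon=0.5,
                          factors=[1.0, 0.0], rng=np.random.default_rng(0))
```

If your scores encode noise rather than importance (higher means more noise), pass `higher_is_better=False`.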

Citation

If you use this package in your research, please cite:

@article{yang2025micdp,
  title={mic\_dp: A Python package for maximum information coefficient differential privacy},
  author={Yang, Wenjun and Al-masri, Eyhab and Kotevska, Olivera},
  journal={Journal of Open Source Software},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

We acknowledge the creators of the Adult Census Income (ACI) and HED datasets for making their data publicly available.


Project details


This version: 0.3 (Apr 24, 2025)

Download files


Source Distribution

mic_dp-0.3.tar.gz (3.9 kB)

Uploaded Apr 24, 2025 Source

Built Distribution


mic_dp-0.3-py3-none-any.whl (3.7 kB)

Uploaded Apr 24, 2025 Python 3

File details

Details for the file mic_dp-0.3.tar.gz.

File metadata

Download URL: mic_dp-0.3.tar.gz
Upload date: Apr 24, 2025
Size: 3.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for mic_dp-0.3.tar.gz
Algorithm	Hash digest
SHA256	`931792cee3b2044038e965bc62fdca2d9d9d5d01cd079fbc6134c34f9a3adb5d`
MD5	`84217d402b6d702433aba43b9ab52307`
BLAKE2b-256	`0daf508e68d5a99d042b743364224b662836ab87e6c2bbf4a5e3a45e0ae53347`


File details

Details for the file mic_dp-0.3-py3-none-any.whl.

File metadata

Download URL: mic_dp-0.3-py3-none-any.whl
Upload date: Apr 24, 2025
Size: 3.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for mic_dp-0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6221590fb869d7ea4f97aada8218f9b0c5cfa3dbc3a01e733d011fe517ca51e3`
MD5	`e13e860a9788eab39c8dc51ab3078b9e`
BLAKE2b-256	`4f02c82cde376b951058b08104c7c0f21e46a5268bdbbfa01c857113177e37b8`

