A privacy-preserving data publishing tool with parameter optimization

Privacy Tuner

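Privacy Tuner is a general-purpose tool for privacy-preserving data publishing. Given a user-specified privacy budget and utility target, it runs an automated parameter search to find the data-perturbation parameters that maximize the utility of the data while keeping privacy risk within the constraint. It outputs the sanitized dataset, the best parameter combination found, and a visual report.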

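Features

  • Modular design: pluggable privacy mechanisms, risk metrics, utility evaluators, and optimizers.
  • Multiple privacy models: differential privacy (Laplace, Gaussian), empirical risk, k-anonymity, and others (more are being added).
  • Multiple data types: native support for tabular data, with interfaces reserved for text and images.
  • Automatic parameter search: built-in grid search, Bayesian optimization, and genetic algorithms find the best parameters automatically.
  • Interpretability: generates risk-utility curves and natural-language reports that help explain the parameter choice.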
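To make the differential-privacy entry above concrete, here is a minimal sketch of the textbook Laplace mechanism using only the standard library. It is not the package's LaplaceMechanism; the function names `laplace_noise` and `laplace_mechanism` and their parameters are assumptions made for illustration, and `sensitivity` is taken to be the L1 sensitivity of the whole released vector.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to each value.

    Hypothetical helper for illustration; not the privacy_tuner API.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Example: perturb three counts; a smaller epsilon means more noise.
counts = [120.0, 45.0, 310.0]
noisy = laplace_mechanism(counts, sensitivity=1.0, epsilon=0.5, rng=random.Random(0))
print(noisy)
```

The noise scale grows as epsilon shrinks, which is the privacy-utility trade-off the parameter search is meant to navigate. Naive floating-point sampling like this is fine for illustration but should not be treated as production-grade.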

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/privacy_tuner.git
    cd privacy_tuner
    
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    venv\Scripts\activate     # Windows
    
  3. Install the dependencies:

    pip install -r requirements.txt
    
  4. (Optional) Install in editable (development) mode:

    pip install -e .
    

Quick start

The following is a complete example using the Adult dataset:

import numpy as np
from sklearn.datasets import fetch_openml
from privacy_tuner.input import CSVLoader
from privacy_tuner.core.mechanisms import CopulaMechanism
from privacy_tuner.core.risk import EmpiricalRiskMeter
from privacy_tuner.core.utility import TSTRClassifier
from privacy_tuner.core.optimizers import GridSearchOptimizer
from privacy_tuner.output import CSVExporter, SimpleReporter

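# Load the Adult dataset and keep only the numeric feature columns
adult = fetch_openml(data_id=1590, as_frame=True)
df = adult.data.select_dtypes(include=[np.number]).dropna()
# Select labels by df.index so they stay aligned if dropna() removed any rows
y = (adult.target.loc[df.index] == '>50K').astype(int).values
data = np.column_stack([df.values, y])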

# Set up the components
mechanism = CopulaMechanism()
risk_meter = EmpiricalRiskMeter()
utility = TSTRClassifier(target_column=-1)
optimizer = GridSearchOptimizer()

# Define the parameter space and the risk constraint
param_space = {
    'sampling_ratio': [0.5, 1.0, 2.0],
    'noise_scale': [0.0, 0.1, 0.5]
}
risk_constraint = (0.0, 1.0)

# Run the search
best_params, best_utility, best_risk = optimizer.search(
    original_data=data,
    mechanism=mechanism,
    risk_meter=risk_meter,
    utility_evaluator=utility,
    param_space=param_space,
    risk_constraint=risk_constraint
)

print(f"Best parameters: {best_params}")
print(f"Utility: {best_utility:.4f}, risk: {best_risk:.4f}")

See the examples/ directory for more examples.
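The quick-start call searches a parameter grid under a risk constraint. The sketch below shows one way such a search could work, using only the standard library. It is not the package's GridSearchOptimizer: `constrained_grid_search` and `toy_evaluate` are hypothetical names, the reading of `risk_constraint` as a (lower, upper) bound on risk is an assumption, and the toy utility and risk formulas are made up to stand in for a real mechanism, risk meter, and utility evaluator.

```python
from itertools import product


def constrained_grid_search(param_space, evaluate, risk_constraint):
    """Return (best_params, best_utility, best_risk) over the full grid.

    `evaluate` maps a parameter dict to a (utility, risk) pair. Only
    candidates whose risk lies inside `risk_constraint` (lower, upper)
    are considered; among those, the highest utility wins.
    """
    low, high = risk_constraint
    names = list(param_space)
    best = None
    for combo in product(*(param_space[name] for name in names)):
        params = dict(zip(names, combo))
        utility, risk = evaluate(params)
        if not (low <= risk <= high):
            continue  # candidate violates the risk constraint
        if best is None or utility > best[1]:
            best = (params, utility, risk)
    if best is None:
        raise ValueError("no parameter combination satisfies the risk constraint")
    return best


def toy_evaluate(params):
    """Made-up stand-in: more noise lowers both utility and risk."""
    utility = 1.0 - params["noise_scale"] + 0.1 * params["sampling_ratio"]
    risk = 0.8 - params["noise_scale"]
    return utility, risk


param_space = {
    "sampling_ratio": [0.5, 1.0, 2.0],
    "noise_scale": [0.0, 0.1, 0.5],
}
best_params, best_utility, best_risk = constrained_grid_search(
    param_space, toy_evaluate, risk_constraint=(0.0, 0.5)
)
print(best_params, best_utility, best_risk)
```

Exhaustive search costs one evaluation per grid point (nine here), which is why larger spaces usually call for Bayesian or evolutionary strategies instead.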

Core modules

  • Input layer: DataLoader, FeatureAnalyzer
  • Privacy mechanisms: PrivacyMechanism (implementations: CopulaMechanism, LaplaceMechanism, GaussianMechanism)
  • Risk measurement: RiskMeter (implementations: EmpiricalRiskMeter, DPRisk)
  • Utility evaluation: UtilityEvaluator (implementations: TSTRClassifier, QueryError, ClusteringSilhouette)
  • Optimizers: Optimizer (implementations: GridSearchOptimizer, BayesianOptimizer, GeneticOptimizer)
  • Output layer: CSVExporter, SimpleReporter
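The utility evaluator named TSTRClassifier suggests the "train on synthetic, test on real" (TSTR) protocol: fit a classifier on the sanitized data and score it on the original data. The sketch below illustrates that idea with a nearest-centroid classifier built from the standard library. It is an assumption about the general technique, not the package's implementation; `fit_centroids`, `predict`, and `tstr_accuracy` are hypothetical names, and the rows are made-up toy data.

```python
from statistics import fmean


def fit_centroids(rows, target_column=-1):
    """Average the feature vectors of each class (nearest-centroid model)."""
    groups = {}
    for row in rows:
        features = list(row)
        label = features.pop(target_column)
        groups.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*members)]
        for label, members in groups.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def tstr_accuracy(synthetic_rows, real_rows, target_column=-1):
    """Train on the synthetic rows, then report accuracy on the real rows."""
    centroids = fit_centroids(synthetic_rows, target_column)
    correct = 0
    for row in real_rows:
        features = list(row)
        label = features.pop(target_column)
        correct += predict(centroids, features) == label
    return correct / len(real_rows)


# Toy rows: two features followed by a binary label in the last column.
synthetic = [(0.0, 0.1, 0), (0.2, 0.0, 0), (1.0, 0.9, 1), (0.8, 1.0, 1)]
real = [(0.1, 0.1, 0), (0.9, 0.8, 1), (0.2, 0.3, 0), (0.7, 0.9, 1)]
print(tstr_accuracy(synthetic, real))
```

A score close to what the same model achieves when trained on the real data indicates that the sanitized data kept the structure the task depends on.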

Documentation

See docs/developer_guide.md for the detailed developer documentation.

Contributing

Contributions of code, bug reports, and suggestions are welcome. Please read CONTRIBUTING.md (if present) for details.

License

MIT
