A Python library for privacy-preserving machine learning
Project description
SecureML is an open-source Python library that integrates with popular machine learning frameworks like TensorFlow and PyTorch. It provides developers with easy-to-use utilities to ensure that AI agents handle sensitive data in compliance with data protection regulations.
Key Features
- Data Anonymization Utilities:
  - K-anonymity implementation with adaptive generalization
  - Pseudonymization with format-preserving encryption
  - Configurable data masking with statistical property preservation
  - Hierarchical data generalization with taxonomy support
  - Automatic sensitive data detection
- Privacy-Preserving Training Methods:
  - Differential privacy integration with PyTorch (via Opacus) and TensorFlow (via TF Privacy)
  - Federated learning with Flower, allowing training on distributed data without centralization
  - Support for secure aggregation and privacy-preserving federated learning
- Compliance Checkers: Tools to analyze datasets and model configurations for potential privacy risks
- Synthetic Data Generation: Utilities to create synthetic datasets that mimic real data
- Regulation-Specific Presets:
  - Pre-configured YAML settings aligned with major regulations (GDPR, CCPA, HIPAA)
  - Detailed compliance requirements for each regulation
  - Customizable identifiers for personal data and sensitive information
  - Integration with compliance checking functionality
- Audit Trails and Reporting: Automatic logging of privacy measures and model decisions
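To make the pseudonymization idea from the feature list concrete, here is a minimal standard-library sketch. It uses keyed hashing (HMAC-SHA256), which is a simpler stand-in and not the format-preserving encryption that SecureML advertises. The function name `pseudonymize` and the `length` parameter are hypothetical and are not part of the SecureML API.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes, length: int = 16) -> str:
    """Map a value to a stable pseudonym using a keyed hash (HMAC-SHA256).

    Illustrative only: this is NOT format-preserving encryption and it is
    not reversible. The same (value, key) pair always yields the same token.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key for demonstration; a real key belongs in a secret store.
key = b"demo-key-do-not-use-in-production"

token_a = pseudonymize("john.doe@example.com", key)
token_b = pseudonymize("john.doe@example.com", key)
token_c = pseudonymize("jane.smith@example.com", key)

print(token_a == token_b)  # same input and key give the same pseudonym
print(token_a == token_c)  # different inputs give different pseudonyms
```

Because the token is stable for a given key, records belonging to the same person can still be joined after pseudonymization, while the original value cannot be read back without the key and a guess at the input.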
Installation
Disclaimer: Because of compatibility issues in TensorFlow Privacy, SecureML currently supports Python versions up to 3.11 only. We will add support for Python 3.12+ as soon as TensorFlow Privacy releases a compatible version.
With pip (Python 3.11):
pip install secureml
Optional Dependencies
# For generating PDF reports for compliance and audit trails
pip install "secureml[pdf]"
# For secure key management with HashiCorp Vault
pip install "secureml[vault]"
# For all optional components (quotes keep shells such as zsh from expanding the brackets)
pip install "secureml[pdf,vault]"
Quick Start
Data Anonymization
Anonymizing a dataset to comply with privacy regulations:
import pandas as pd
from secureml import anonymize
# Load your dataset
data = pd.DataFrame({
    "name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "age": [32, 45, 28],
    "email": ["john.doe@example.com", "jane.smith@example.com", "bob.j@example.com"],
    "ssn": ["123-45-6789", "987-65-4321", "456-78-9012"],
    "zip_code": ["10001", "94107", "60601"],
    "income": [75000, 82000, 65000]
})
# Anonymize using k-anonymity
anonymized_data = anonymize(
    data,
    method="k-anonymity",
    k=2,
    sensitive_columns=["name", "email", "ssn"]
)
print(anonymized_data)
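For readers new to k-anonymity, the sketch below shows what the `k` parameter means, using only the standard library. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The helper names (`generalize_age`, `generalize_zip`, `k_of`) and the sample rows are hypothetical, and the fixed generalization rules are a simplification, not SecureML's adaptive generalization.

```python
from collections import Counter


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def k_of(rows, quasi_identifiers) -> int:
    """Return the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical records, chosen so that generalization produces groups of two.
rows = [
    {"age": 32, "zip_code": "10001"},
    {"age": 38, "zip_code": "10002"},
    {"age": 45, "zip_code": "94107"},
    {"age": 41, "zip_code": "94110"},
]

generalized = [
    {"age": generalize_age(r["age"]), "zip_code": generalize_zip(r["zip_code"])}
    for r in rows
]

raw_k = k_of(rows, ["age", "zip_code"])
generalized_k = k_of(generalized, ["age", "zip_code"])
print(raw_k, generalized_k)
```

In the raw rows every record is unique on (age, zip_code), so the table is only 1-anonymous. After banding ages and masking ZIP codes, each combination is shared by two rows, which satisfies `k=2`.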
Compliance Checking with Regulation Presets
SecureML includes built-in presets for major regulations (GDPR, CCPA, HIPAA) that define the compliance requirements specific to each regulation:
import pandas as pd
from secureml import check_compliance
# Load your dataset
data = pd.read_csv("your_dataset.csv")
# Model configuration
model_config = {
    "model_type": "neural_network",
    "input_features": ["age", "income", "zip_code"],
    "output": "purchase_likelihood",
    "training_method": "standard_backprop"
}
# Check compliance with GDPR
report = check_compliance(
    data=data,
    model_config=model_config,
    regulation="GDPR"
)
# View compliance issues
if report.has_issues():
    print("Compliance issues found:")
    for issue in report.issues:
        print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
        print(f"  Recommendation: {issue['recommendation']}")
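To give a feel for what a preset-driven check might do, here is a rough standard-library sketch that flags column names matching identifier keywords. Everything in it is an assumption made for illustration: the `PRESET` dictionary, its keys, the keyword lists, and the `find_issues` function are hypothetical and do not reflect SecureML's actual YAML presets or checking logic. Only the issue keys (`component`, `issue`, `severity`, `recommendation`) mirror the example above.

```python
# Hypothetical preset: the structure and keywords are illustrative only.
PRESET = {
    "regulation": "GDPR",
    "personal_identifiers": ["name", "email", "ssn", "phone"],
    "special_categories": ["health", "religion", "ethnicity"],
}


def find_issues(column_names, preset):
    """Flag columns whose names contain a keyword from the preset."""
    issues = []
    for column in column_names:
        lowered = column.lower()
        if any(keyword in lowered for keyword in preset["special_categories"]):
            issues.append({
                "component": column,
                "issue": "Column name suggests special-category data",
                "severity": "high",
                "recommendation": "Remove the column or document a lawful basis",
            })
        elif any(keyword in lowered for keyword in preset["personal_identifiers"]):
            issues.append({
                "component": column,
                "issue": "Column name suggests a personal identifier",
                "severity": "medium",
                "recommendation": "Anonymize or pseudonymize before training",
            })
    return issues


issues = find_issues(["Name", "age", "email_address", "health_status"], PRESET)
for issue in issues:
    print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
```

Matching on column names alone is a coarse heuristic. It misses identifiers stored under neutral names and cannot judge the values themselves, so a real checker would need to inspect the data as well.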
Privacy-Preserving Machine Learning
Train a model with differential privacy guarantees:
import torch.nn as nn
import pandas as pd
from secureml import differentially_private_train
# Create a simple PyTorch model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1)
)
# Load your dataset
data = pd.read_csv("your_dataset.csv")
# Train with differential privacy
private_model = differentially_private_train(
    model=model,
    data=data,
    epsilon=1.0,  # Privacy budget
    delta=1e-5,  # Privacy delta parameter
    epochs=10,
    batch_size=64
)
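Differentially private training of the kind provided by Opacus and TF Privacy is based on DP-SGD: each example's gradient is clipped to a maximum norm, and Gaussian noise is added before averaging. The sketch below illustrates that single step in plain Python. The names `clip_gradient`, `private_mean_gradient`, `max_norm`, and `noise_multiplier` are illustrative and are not SecureML parameters. The sketch also omits the privacy accounting that translates a noise level into an `epsilon` and `delta` guarantee.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum them, add Gaussian noise, and average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Two hypothetical per-example gradients; the first exceeds the clipping bound.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)

noiseless = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noiseless)
print(noisy)
```

Clipping bounds how much any single record can influence an update, and the noise hides whatever influence remains. A larger noise multiplier yields stronger privacy (a smaller epsilon) at the cost of noisier updates.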
Synthetic Data Generation
Generate synthetic data that maintains the statistical properties of the original data:
import pandas as pd
from secureml import generate_synthetic_data
# Load your dataset
data = pd.read_csv("your_dataset.csv")
# Generate synthetic data
synthetic_data = generate_synthetic_data(
    template=data,
    num_samples=1000,
    method="statistical",  # Options: simple, statistical, sdv-copula, gan
    sensitive_columns=["name", "email", "ssn"]
)
print(synthetic_data.head())
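As a rough intuition for statistics-based generation, the sketch below fits a mean and standard deviation to each numeric column and samples new rows from independent normal distributions. This is a deliberately simple assumption for illustration and may differ from what SecureML's `statistical` method actually does. The function names `fit_columns` and `sample_rows` are hypothetical.

```python
import random
import statistics


def fit_columns(rows, columns):
    """Estimate (mean, standard deviation) for each numeric column."""
    params = {}
    for col in columns:
        values = [row[col] for row in rows]
        params[col] = (statistics.mean(values), statistics.stdev(values))
    return params


def sample_rows(params, num_samples, rng):
    """Draw rows by sampling each column from its own normal distribution."""
    return [
        {col: rng.gauss(mean, std) for col, (mean, std) in params.items()}
        for _ in range(num_samples)
    ]


# Numeric columns taken from the anonymization example earlier in this README.
real_rows = [
    {"age": 32, "income": 75000},
    {"age": 45, "income": 82000},
    {"age": 28, "income": 65000},
]

params = fit_columns(real_rows, ["age", "income"])
synthetic = sample_rows(params, num_samples=1000, rng=random.Random(42))

print(len(synthetic))
print(statistics.mean(row["income"] for row in synthetic))
```

Sampling each column independently preserves per-column means and spreads but discards correlations between columns, such as the relationship between age and income. Copula-based or GAN-based methods exist to capture that joint structure.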
Documentation
For detailed documentation, examples, and API reference, visit our documentation.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue. Our current focus is expanding the supported regulations beyond GDPR, CCPA, and HIPAA, and help with that is especially appreciated.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file secureml-0.2.1.tar.gz.
File metadata
- Download URL: secureml-0.2.1.tar.gz
- Upload date:
- Size: 81.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0df1e75bef4d4b92c19f7c11d10361c8c47ed76b6dd2ab2fd4d8b18a52c9907 |
| MD5 | 79e7c5d5d7ec205bf8b95deb029af3d2 |
| BLAKE2b-256 | 87070b0c087be3c52264cdab996d3ee84c9c01e6aed94d9f3abee2a666726375 |
File details
Details for the file secureml-0.2.1-py3-none-any.whl.
File metadata
- Download URL: secureml-0.2.1-py3-none-any.whl
- Upload date:
- Size: 88.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc9d75b1bd4d6bcce20f3a432b57393f1a703c9d2d3e98d5648bea017f0d7ce2 |
| MD5 | 83f1353175a2e52cedcdd184cd82190a |
| BLAKE2b-256 | f823ccf60167347901f392c11f1a1257f8444471ca338b949f5d82ffdab52767 |