A Python library for privacy-preserving machine learning

SecureML is an open-source Python library that integrates with popular machine learning frameworks like TensorFlow and PyTorch. It provides developers with easy-to-use utilities to ensure that AI agents handle sensitive data in compliance with data protection regulations.

Key Features

  • Data Anonymization Utilities:
    • K-anonymity implementation with adaptive generalization
    • Pseudonymization with format-preserving encryption
    • Configurable data masking with statistical property preservation
    • Hierarchical data generalization with taxonomy support
    • Automatic sensitive data detection
  • Privacy-Preserving Training Methods:
    • Differential privacy integration with PyTorch (via Opacus) and TensorFlow (via TF Privacy)
    • Federated learning with Flower, allowing training on distributed data without centralization
    • Support for secure aggregation and privacy-preserving federated learning
  • Compliance Checkers: Tools to analyze datasets and model configurations for potential privacy risks
  • Synthetic Data Generation: Utilities to create synthetic datasets that mimic real data
  • Regulation-Specific Presets:
    • Pre-configured YAML settings aligned with major regulations (GDPR, CCPA, HIPAA)
    • Detailed compliance requirements for each regulation
    • Customizable identifiers for personal data and sensitive information
    • Integration with compliance checking functionality
  • Audit Trails and Reporting: Automatic logging of privacy measures and model decisions
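
The federated learning feature listed above trains on distributed data without centralizing it. As a rough illustration of the aggregation idea behind that approach, here is a minimal sketch of federated averaging in plain Python. It is not SecureML's or Flower's implementation; the function name `federated_average` and the toy weight vectors are hypothetical.

```python
from typing import Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> list[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one dataset size per client weight vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        # Each client contributes in proportion to how much data it holds.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical clients; the second holds three times as much data,
# so the result sits closer to its weights.
global_weights = federated_average([[0.0, 4.0], [4.0, 0.0]], [1, 3])
```

Only the parameter vectors and dataset sizes leave each client; the raw records never do. Secure aggregation goes one step further by hiding individual client updates from the server, which this sketch does not attempt.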

Installation

Disclaimer: Because of TensorFlow Privacy compatibility issues, SecureML currently supports Python versions up to 3.11 only. Support for Python 3.12+ will follow once TensorFlow Privacy releases a compatible version.

With pip (Python 3.11):

pip install secureml

Optional Dependencies

# For generating PDF reports for compliance and audit trails
pip install "secureml[pdf]"

# For secure key management with HashiCorp Vault
pip install "secureml[vault]"

# For all optional components (quotes keep shells such as zsh from expanding the brackets)
pip install "secureml[pdf,vault]"

Quick Start

Data Anonymization

Anonymize a dataset to comply with privacy regulations:

import pandas as pd
from secureml import anonymize

# Load your dataset
data = pd.DataFrame({
    "name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "age": [32, 45, 28],
    "email": ["john.doe@example.com", "jane.smith@example.com", "bob.j@example.com"],
    "ssn": ["123-45-6789", "987-65-4321", "456-78-9012"],
    "zip_code": ["10001", "94107", "60601"],
    "income": [75000, 82000, 65000]
})
    
# Anonymize using k-anonymity
anonymized_data = anonymize(
    data,
    method="k-anonymity",
    k=2,
    sensitive_columns=["name", "email", "ssn"]
)

print(anonymized_data)
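
A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. To sanity-check an anonymized result independently of SecureML, a small helper along these lines may be useful. This is a sketch under assumptions: the helper names are hypothetical, and the choice of `age` and `zip_code` as quasi-identifiers, as well as the generalized values shown, are illustrative rather than actual library output.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def smallest_group_size(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values.

    Returns 0 for an empty dataset.
    """
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical generalized rows: the third record is alone in its group,
# so this table is 1-anonymous but not 2-anonymous.
rows = [
    {"age": "30-39", "zip_code": "100**"},
    {"age": "30-39", "zip_code": "100**"},
    {"age": "40-49", "zip_code": "941**"},
]
passes_k2 = is_k_anonymous(rows, ["age", "zip_code"], k=2)
```

With a pandas DataFrame, the same check can be run by passing `df.to_dict("records")` as `records`.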

Compliance Checking with Regulation Presets

SecureML includes built-in presets for major regulations (GDPR, CCPA, HIPAA) that define the compliance requirements specific to each regulation:

import pandas as pd
from secureml import check_compliance
    
# Load your dataset
data = pd.read_csv("your_dataset.csv")
    
# Model configuration
model_config = {
    "model_type": "neural_network",
    "input_features": ["age", "income", "zip_code"],
    "output": "purchase_likelihood",
    "training_method": "standard_backprop"
}
    
# Check compliance with GDPR
report = check_compliance(   
    data=data,
    model_config=model_config,
    regulation="GDPR"
)
    
# View compliance issues
if report.has_issues():
    print("Compliance issues found:")
    for issue in report.issues:
        print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
        print(f"  Recommendation: {issue['recommendation']}")
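
When a report contains many issues, a per-severity tally can be easier to act on than the full listing, for example to fail a CI job on serious findings. The sketch below assumes only the issue dictionary keys used in the loop above; the helper name `count_by_severity` and the severity labels `"high"` and `"medium"` are hypothetical, so check the values your reports actually contain.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_severity(issues: Iterable[Mapping[str, str]]) -> Counter:
    """Tally issues by their 'severity' field."""
    return Counter(issue["severity"] for issue in issues)


# Hypothetical issues, shaped like the entries of report.issues above.
sample_issues = [
    {"component": "data", "issue": "unmasked email column",
     "severity": "high", "recommendation": "pseudonymize the column"},
    {"component": "data", "issue": "SSN present",
     "severity": "high", "recommendation": "drop or encrypt the column"},
    {"component": "model", "issue": "no privacy mechanism configured",
     "severity": "medium", "recommendation": "consider differential privacy"},
]

counts = count_by_severity(sample_issues)
for severity, number in counts.most_common():
    print(f"{severity}: {number}")
```

In real use, `count_by_severity(report.issues)` would replace the sample list.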

Privacy-Preserving Machine Learning

Train a model with differential privacy guarantees:

import torch.nn as nn
import pandas as pd
from secureml import differentially_private_train
    
# Create a simple PyTorch model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1)
)
    
# Load your dataset
data = pd.read_csv("your_dataset.csv")
    
# Train with differential privacy
private_model = differentially_private_train(
    model=model,
    data=data,
    epsilon=1.0,  # Privacy budget
    delta=1e-5,   # Privacy delta parameter
    epochs=10,
    batch_size=64
)
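
The `epsilon` and `delta` arguments describe the privacy guarantee, not the mechanism that achieves it. Opacus, the PyTorch backend named in the feature list, implements DP-SGD, whose core step clips each example's gradient to a maximum norm and adds Gaussian noise before averaging. The plain-Python sketch below illustrates that single step. It is not SecureML's or Opacus's code; the function names and the `max_norm` and `noise_multiplier` parameters are illustrative, and translating a noise multiplier into an (epsilon, delta) guarantee requires a privacy accountant that is not shown here.

```python
import math
import random
from typing import Optional, Sequence


def clip_to_norm(gradient: Sequence[float], max_norm: float) -> list[float]:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def private_mean_gradient(
    per_example_gradients: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    if not per_example_gradients:
        raise ValueError("need at least one per-example gradient")
    rng = rng or random.Random()
    clipped = [clip_to_norm(g, max_norm) for g in per_example_gradients]
    dim = len(clipped[0])
    # Noise is calibrated to the clipping bound: it is the most any
    # single example can contribute to the sum.
    sigma = noise_multiplier * max_norm
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, sigma) for i in range(dim)
    ]
    return [value / len(clipped) for value in noisy_sum]


# Toy per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, -0.1], [-2.0, 0.5]]
update = private_mean_gradient(
    grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)
)
```

A larger noise multiplier gives a smaller epsilon (stronger privacy) for the same number of training steps, at the cost of noisier updates.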

Synthetic Data Generation

Generate synthetic data that maintains the statistical properties of the original data:

import pandas as pd
from secureml import generate_synthetic_data
    
# Load your dataset
data = pd.read_csv("your_dataset.csv")
    
# Generate synthetic data
synthetic_data = generate_synthetic_data(
    template=data,
    num_samples=1000,
    method="statistical",  # Options: simple, statistical, sdv-copula, gan
    sensitive_columns=["name", "email", "ssn"]
)
    
print(synthetic_data.head())
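
To give a feel for what preserving statistical properties can mean at its simplest, the sketch below fits a normal distribution to each numeric column and samples from it independently. This is an illustrative baseline only, not the algorithm behind any of SecureML's `method` options; the function name `sample_independent_normals` is hypothetical. It preserves per-column means and spreads but ignores correlations between columns, which copula- and GAN-based approaches are designed to capture.

```python
import random
import statistics
from typing import Mapping, Optional, Sequence


def sample_independent_normals(
    columns: Mapping[str, Sequence[float]],
    num_samples: int,
    seed: Optional[int] = None,
) -> dict[str, list[float]]:
    """Sample each numeric column from a normal fitted to its mean and std dev."""
    rng = random.Random(seed)
    synthetic = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        # stdev needs at least two points; fall back to a constant column.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        synthetic[name] = [rng.gauss(mean, spread) for _ in range(num_samples)]
    return synthetic


# Numeric columns taken from the anonymization example earlier in this README.
real = {"age": [32, 45, 28], "income": [75000, 82000, 65000]}
fake = sample_independent_normals(real, num_samples=1000, seed=0)
print({name: round(statistics.fmean(values), 1) for name, values in fake.items()})
```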

Documentation

For detailed documentation, examples, and API reference, visit our documentation.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue. Our current focus is expanding the supported regulations beyond GDPR, CCPA, and HIPAA, and we would especially appreciate help with that.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Source Distribution

secureml-0.2.1.tar.gz (81.3 kB)

Built Distribution

secureml-0.2.1-py3-none-any.whl (88.0 kB)
