Convenience functions for symmetrically encrypting/decrypting Hugging Face Datasets
Installation
pip install encrypted-datasets
Usage
Raw string key
from datasets import load_dataset
from encrypted_datasets import encrypt_dataset, decrypt_dataset

huggingface_api_token = 'API_TOKEN'
# Download the encrypted dataset from the Hugging Face Hub
downloaded_dataset = load_dataset('organization/dataset_repo', token=huggingface_api_token)

key = 'Your symmetric encryption key'
decrypted_dataset = decrypt_dataset(downloaded_dataset, key)
# Make modifications to decrypted_dataset...

# Re-encrypt with the same key and upload the result
re_encrypted_dataset = encrypt_dataset(decrypted_dataset, key)
re_encrypted_dataset.push_to_hub('organization/dataset_repo', token=huggingface_api_token)
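The key in the example above is written as a string literal. One way to keep it out of source code is to read it from an environment variable. The following is a minimal sketch: the helper name `load_key` and the variable name `DATASET_ENCRYPTION_KEY` are hypothetical and are not part of encrypted-datasets.

```python
import os


def load_key(env_var: str = "DATASET_ENCRYPTION_KEY") -> str:
    """Return the encryption key stored in the given environment variable.

    Both this helper and the default variable name are illustrative
    assumptions; encrypted-datasets does not define them.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail early rather than encrypting with a missing or empty key
        raise RuntimeError(f"Environment variable {env_var} is not set or is empty")
    return key
```

The returned string could then be passed as `key` to `decrypt_dataset` and `encrypt_dataset` in place of the literal.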
AWS Key Management Service (KMS) key
With this method, an AWS KMS key encrypts the data keys, and the encrypted data keys are stored on the Hugging Face Hub alongside the data.
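This pattern is commonly called envelope encryption. The sketch below shows the two boto3 KMS client calls such a scheme typically relies on, `generate_data_key` and `decrypt`. Whether `KMSCypher` uses exactly these calls internally is an assumption, and the helper names `new_data_key` and `recover_data_key` are hypothetical.

```python
from typing import Any, Tuple


def new_data_key(kms_client: Any, key_id: str) -> Tuple[bytes, bytes]:
    """Ask KMS for a fresh 256-bit data key.

    Returns (plaintext_key, encrypted_key). The plaintext key is used to
    encrypt the data locally and is then discarded; only the encrypted
    copy is stored next to the data.
    """
    response = kms_client.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    return response["Plaintext"], response["CiphertextBlob"]


def recover_data_key(kms_client: Any, encrypted_key: bytes) -> bytes:
    """Ask KMS to decrypt a stored data key so the data can be decrypted."""
    response = kms_client.decrypt(CiphertextBlob=encrypted_key)
    return response["Plaintext"]
```

Here `kms_client` would be the object returned by `boto3.client('kms')`. Because only callers permitted to use the KMS key can recover the data key, the encrypted data key can be stored with the data.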
Create a new encrypted dataset and upload it to the Hugging Face Hub
import boto3
from datasets import Dataset
from encrypted_datasets import EncryptedDataset, KMSCypher

kms_client = boto3.client('kms')
kms_key_id = '<KMS_KEY_ID>'
# Cypher that uses the KMS key to protect the dataset's data keys
cypher = KMSCypher(
    key_id=kms_key_id,
    client=kms_client
)

dataset = Dataset.from_pandas(...)
encrypted_dataset = EncryptedDataset.encrypt(dataset, cypher)
encrypted_dataset.push_to_hub('organization/repo_id', token='<ACCESS_TOKEN>')
Load an encrypted dataset, modify it, and re-upload it
import boto3
from encrypted_datasets import EncryptedDataset, KMSCypher

kms_client = boto3.client('kms')
kms_key_id = '<KMS_KEY_ID>'
hf_token = '<HF_TOKEN>'
cypher = KMSCypher(
    key_id=kms_key_id,
    client=kms_client
)

# Download the encrypted dataset and decrypt it with the KMS-backed cypher
encrypted_dataset = EncryptedDataset.load('organization/repo_id', token=hf_token)
dataset = encrypted_dataset.decrypt(cypher)
# Make modifications to dataset...

# Re-encrypt the modified dataset and push it back to the same repository
new_encrypted_dataset = EncryptedDataset.encrypt(dataset, cypher)
new_encrypted_dataset.push_to_hub('organization/repo_id', token=hf_token)
Download files
Source Distribution
Built Distribution
File details
Details for the file encrypted_datasets-1.0.10.tar.gz.
File metadata
- Download URL: encrypted_datasets-1.0.10.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.6 Darwin/23.5.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 15d7f9cafd087706b3c032e4a3f204d31a6e150b8cc9c008594051ccd264e5b9
MD5 | 7393d3b1beadaa6dc5c51dccc5ae6092
BLAKE2b-256 | 4f0d2a424f33bde188feb683f427eab6339614ff23e52282f9abac51b9df2bb4
File details
Details for the file encrypted_datasets-1.0.10-py3-none-any.whl.
File metadata
- Download URL: encrypted_datasets-1.0.10-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.6 Darwin/23.5.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 062151cc161dcccf798cd2b76180bde713681a50c7ab05ce4c785f5380a4ac46
MD5 | 8e6184b64a0b31b9bc5b748cf777d210
BLAKE2b-256 | e61075dcbc472499bb508c9ac63ce9df62b3dadafa6976eb6da9857ec4d402ca
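The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch using only the Python standard library follows; the helper names `sha256_of` and `matches_digest` are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Assuming the source distribution was saved as `encrypted_datasets-1.0.10.tar.gz` in the current directory, `matches_digest('encrypted_datasets-1.0.10.tar.gz', '15d7f9cafd087706b3c032e4a3f204d31a6e150b8cc9c008594051ccd264e5b9')` should return `True` for an unmodified copy.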