Convenience functions for symmetrically encrypting and decrypting Hugging Face Datasets

Installation

pip install encrypted-datasets

Usage

Raw string key

from datasets import load_dataset
from encrypted_datasets import encrypt_dataset, decrypt_dataset

huggingface_api_token = 'API_TOKEN'
downloaded_dataset = load_dataset('organization/dataset_repo', token=huggingface_api_token)
key = 'Your symmetric encryption key'

decrypted_dataset = decrypt_dataset(downloaded_dataset, key)

# Make modifications to decrypted_dataset...

re_encrypted_dataset = encrypt_dataset(decrypted_dataset, key)

re_encrypted_dataset.push_to_hub('organization/dataset_repo', token=huggingface_api_token)
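
Hard-coding the key in a script makes it easy to leak into version control. One option is to read it from the environment instead. The following is a minimal sketch: the helper `load_key` and the variable name `DATASET_ENCRYPTION_KEY` are hypothetical and not part of encrypted_datasets.

```python
import os


def load_key(env_var: str = "DATASET_ENCRYPTION_KEY") -> str:
    """Return the symmetric key stored in an environment variable.

    Both this helper and the default variable name are illustrative
    assumptions, not part of the encrypted_datasets API.
    """
    key = os.environ.get(env_var, "")
    if not key:
        # Fail loudly rather than continuing with a missing or empty key.
        raise RuntimeError(f"Environment variable {env_var} is not set or empty")
    return key
```

The returned string can then be passed wherever `key` appears above, for example `decrypt_dataset(downloaded_dataset, load_key())`.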

AWS Key Management Service (KMS) key

With this method, the dataset is encrypted with data keys, and an AWS KMS key encrypts those data keys, which are stored on the Hugging Face Hub alongside the data.
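
This pattern is commonly called envelope encryption. How `KMSCypher` implements it internally is not shown here, so the sketch below only illustrates the general key-wrapping flow, using the `generate_data_key` and `decrypt` operations of the boto3 KMS client. `InMemoryKMS`, `new_data_key`, and `unwrap_data_key` are hypothetical names. The in-memory class is a stand-in so the example runs without AWS credentials, and it performs no real cryptography.

```python
import os


class InMemoryKMS:
    """Local stand-in for a boto3 KMS client, for experimentation only.

    It mimics the response shape of generate_data_key and decrypt but does
    no real encryption: wrapped keys are just opaque lookup tokens.
    """

    def __init__(self):
        self._wrapped = {}

    def generate_data_key(self, KeyId, KeySpec):
        if KeySpec != "AES_256":
            raise ValueError("this stand-in only supports AES_256")
        plaintext = os.urandom(32)  # 256-bit data key
        blob = os.urandom(16)       # opaque token standing in for the wrapped key
        self._wrapped[blob] = plaintext
        return {"KeyId": KeyId, "Plaintext": plaintext, "CiphertextBlob": blob}

    def decrypt(self, CiphertextBlob):
        return {"Plaintext": self._wrapped[CiphertextBlob]}


def new_data_key(client, key_id):
    """Ask KMS for a fresh data key; return (plaintext_key, wrapped_key)."""
    response = client.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    return response["Plaintext"], response["CiphertextBlob"]


def unwrap_data_key(client, wrapped_key):
    """Recover the plaintext data key from its KMS-encrypted form."""
    return client.decrypt(CiphertextBlob=wrapped_key)["Plaintext"]


client = InMemoryKMS()  # swap in boto3.client('kms') to talk to AWS
plaintext_key, wrapped_key = new_data_key(client, "<KMS_KEY_ID>")
# Only wrapped_key would be stored next to the encrypted data.
recovered_key = unwrap_data_key(client, wrapped_key)
```

In this flow the plaintext data key exists only in memory, so anyone who can read the stored data also needs permission to call KMS `decrypt` on the wrapped key before they can recover it.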

Create a new encrypted dataset and upload it to the Hugging Face Hub

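import boto3
from datasets import Dataset
from encrypted_datasets import EncryptedDataset, KMSCypher

kms_client = boto3.client('kms')
kms_key_id = '<KMS_KEY_ID>'

# The cypher uses the KMS key to encrypt the data keys stored with the dataset
cypher = KMSCypher(
    key_id=kms_key_id,
    client=kms_client
)

# Build the dataset to encrypt, for example from a pandas DataFrame
dataset = Dataset.from_pandas(...)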

encrypted_dataset = EncryptedDataset.encrypt(dataset, cypher)

encrypted_dataset.push_to_hub('organization/repo_id', token='<ACCESS_TOKEN>')

Load an encrypted dataset, modify it, and re-upload it

from encrypted_datasets import EncryptedDataset, KMSCypher
import boto3

kms_client = boto3.client('kms')
kms_key_id = '<KMS_KEY_ID>'
hf_token = '<HF_TOKEN>'

cypher = KMSCypher(
    key_id=kms_key_id,
    client=kms_client
)

encrypted_dataset = EncryptedDataset.load('organization/repo_id', token=hf_token)

dataset = encrypted_dataset.decrypt(cypher)

# Make modifications to dataset...


new_encrypted_dataset = EncryptedDataset.encrypt(dataset, cypher)

new_encrypted_dataset.push_to_hub('organization/repo_id', token=hf_token)

