Python bindings for self-encryption library

self_encryption

Self encrypting files (convergent encryption plus obfuscation)

Overview

A version of convergent encryption with an additional obfuscation step. This pattern produces encrypted data that can still be de-duplicated, because identical input always yields identical encrypted chunks. The library presents an API that takes a set of bytes and returns a secret key derived from those bytes (the data map in the examples below), together with a set of encrypted chunks.
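The de-duplication property can be illustrated with a small toy sketch. It is not the library's algorithm: the names `keystream`, `convergent_encrypt` and `convergent_decrypt` are hypothetical, and a SHAKE-256 keystream XOR stands in for a real cipher so that the example runs with the standard library only. The point is that the key comes from the content itself, so equal inputs give equal ciphertext.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream derived from the key (stand-in for a real cipher, not secure practice)."""
    return hashlib.shake_256(key).digest(length)


def convergent_encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext); the key is the hash of the plaintext."""
    key = hashlib.sha3_256(data).digest()
    stream = keystream(key, len(data))
    ciphertext = bytes(a ^ b for a, b in zip(data, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Reverse convergent_encrypt; XOR with the same keystream restores the data."""
    stream = keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


if __name__ == "__main__":
    key_a, ct_a = convergent_encrypt(b"same content")
    key_b, ct_b = convergent_encrypt(b"same content")
    # Two independent encryptions of the same bytes can be stored once.
    print(ct_a == ct_b)
```

Anyone holding the key can decrypt, which is why the returned key needs the careful handling described in the security note.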

Important Security Note: While this library provides very secure encryption of the data, the returned secret key requires the same secure handling as would be necessary for any secret key.

[Image: self encryption]

Features

  • Content-based chunking
  • Convergent encryption
  • Self-validating chunks
  • Hierarchical data maps for handling large files
  • Streaming encryption/decryption
  • Python bindings
  • Flexible storage backend support
  • Custom storage backends via functors

Usage

Rust Usage

Installation

Add this to your Cargo.toml:

[dependencies]
self_encryption = "0.30"
bytes = "1.0"

Basic Operations

use self_encryption::{encrypt, decrypt_full_set, Result};
use bytes::Bytes;

// Basic encryption/decryption
fn basic_example() -> Result<()> {
    let data = Bytes::from("Hello, World!".repeat(1000));  // Must be at least 3072 bytes
    
    // Encrypt data
    let (data_map, encrypted_chunks) = encrypt(data.clone())?;
    
    // Decrypt data
    let decrypted = decrypt_full_set(&data_map, &encrypted_chunks)?;
    assert_eq!(data, decrypted);
    
    Ok(())
}

Storage Backends

use bytes::Bytes;
use self_encryption::{shrink_data_map, get_root_data_map, decrypt_from_storage, Error, Result};
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

// Memory Storage Example
fn memory_storage_example() -> Result<()> {
    let storage = Arc::new(Mutex::new(HashMap::new()));
    
    // Store function
    let store = |hash, data| {
        storage.lock().unwrap().insert(hash, data);
        Ok(())
    };
    
    // Retrieve function
    let retrieve = |hash| {
        storage.lock().unwrap()
            .get(&hash)
            .cloned()
            .ok_or_else(|| Error::Generic("Chunk not found".into()))
    };
    
    // Use with data map operations (`data_map` is a DataMap returned by `encrypt`)
    let shrunk_map = shrink_data_map(data_map, store)?;
    let root_map = get_root_data_map(shrunk_map, retrieve)?;
    
    Ok(())
}

// Disk Storage Example (file names are built with the `hex` crate)
fn disk_storage_example() -> Result<()> {
    let chunk_dir = PathBuf::from("chunks");
    
    // Store function
    let store = |hash, data| {
        let path = chunk_dir.join(hex::encode(hash));
        std::fs::write(path, data)?;
        Ok(())
    };
    
    // Retrieve function
    let retrieve = |hash| {
        let path = chunk_dir.join(hex::encode(hash));
        Ok(Bytes::from(std::fs::read(path)?))
    };
    
    // Use with data map operations (`data_map` is a DataMap returned by `encrypt`)
    let shrunk_map = shrink_data_map(data_map, store)?;
    let root_map = get_root_data_map(shrunk_map, retrieve)?;
    
    Ok(())
}

Python Usage

Installation

pip install self-encryption

Basic Operations

from self_encryption import encrypt, decrypt

# Basic in-memory encryption/decryption
def basic_example():
    # Create test data (must be at least 3072 bytes)
    data = b"Hello, World!" * 1000
    
    # Encrypt data - returns data map and encrypted chunks
    data_map, chunks = encrypt(data)
    print(f"Data encrypted into {len(chunks)} chunks")
    print(f"Data map has child level: {data_map.child()}")
    
    # Decrypt data
    decrypted = decrypt(data_map, chunks)
    assert data == decrypted

File Operations

from pathlib import Path
from self_encryption import encrypt_from_file, decrypt_from_storage, streaming_encrypt_from_file

def file_example():
    # Setup paths
    input_path = Path("large_file.dat")
    chunk_dir = Path("chunks")
    output_path = Path("decrypted_file.dat")
    
    # Ensure chunk directory exists
    chunk_dir.mkdir(exist_ok=True)
    
    # Regular file encryption - stores all chunks at once
    data_map, chunk_names = encrypt_from_file(str(input_path), str(chunk_dir))
    print(f"File encrypted into {len(chunk_names)} chunks")
    
    # Streaming encryption - memory efficient for large files
    def store_chunk(name_hex: str, content: bytes) -> None:
        chunk_path = chunk_dir / name_hex
        chunk_path.write_bytes(content)
    
    data_map = streaming_encrypt_from_file(str(input_path), store_chunk)
    print("File encrypted with streaming method")
    
    # Create chunk retrieval function
    def get_chunk(hash_hex: str) -> bytes:
        chunk_path = chunk_dir / hash_hex
        return chunk_path.read_bytes()
    
    # Decrypt file
    decrypt_from_storage(data_map, str(output_path), get_chunk)

Advanced Features

from self_encryption import shrink_data_map, get_root_data_map

def advanced_example():
    # Create custom storage backend
    chunk_store = {}
    
    def store_chunk(name_hex: str, content: bytes) -> None:
        chunk_store[name_hex] = content
    
    def get_chunk(name_hex: str) -> bytes:
        return chunk_store[name_hex]
    
    # Use streaming encryption with custom storage
    data_map = streaming_encrypt_from_file("large_file.dat", store_chunk)
    
    # Get root data map for hierarchical storage
    root_map = get_root_data_map(data_map, get_chunk)
    print(f"Root data map level: {root_map.child()}")

Implementation Details

Core Process

  • Files are split into chunks of up to 1MB
  • Each chunk is processed in three steps:
    1. Compression (using Brotli)
    2. Encryption (using AES-256-CBC)
    3. XOR obfuscation
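As a rough illustration of the splitting step, the sketch below cuts a byte string into consecutive pieces of at most 1 MiB. This is an assumption made for the example: the library's actual size selection may differ (for instance for small inputs), and `split_into_chunks` and `MAX_CHUNK_SIZE` are hypothetical names.

```python
MAX_CHUNK_SIZE = 1024 * 1024  # assumed upper bound, reading "1MB" as 1 MiB


def split_into_chunks(data: bytes, max_size: int = MAX_CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive chunks of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [data[i:i + max_size] for i in range(0, len(data), max_size)]


if __name__ == "__main__":
    chunks = split_into_chunks(b"x" * 2500, max_size=1000)
    # Only the last chunk may be shorter than max_size.
    print([len(c) for c in chunks])
```

Joining the chunks in order gives back the original bytes, which is what the later reassembly step relies on.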

Key Generation and Security

  • Each chunk's encryption uses keys derived from the content hashes of three chunks:

    For chunk N:
    - Uses hashes from chunks [N, N+1, N+2]
    - Combined hash = hash(N) || hash(N+1) || hash(N+2)
    - Split into:
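      - Pad (first X bytes; X = 64 when the hashes are 32-byte SHA3-256 digests, 96 bytes in total)
      - Key (next 16 bytes; note that 16 bytes is the AES-128 key length, while an AES-256 key is 32 bytes)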
      - IV  (final 16 bytes)
    
  • This creates a chain of dependencies where each chunk's encryption depends on its neighbors

  • Provides both convergent encryption and additional security through the interdependencies
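The split described above can be sketched as follows. This is a reading of the description, not the library's source: it assumes 32-byte SHA3-256 hashes (so the pad is 64 bytes), and it assumes the neighbor indices wrap around at the end of the file. `derive_pad_key_iv` is a hypothetical helper name.

```python
import hashlib

# Sizes assumed from the description: 64 + 16 + 16 = 96 = 3 * 32 bytes.
PAD_SIZE, KEY_SIZE, IV_SIZE = 64, 16, 16


def derive_pad_key_iv(src_hashes: list[bytes], n: int) -> tuple[bytes, bytes, bytes]:
    """Derive (pad, key, iv) for chunk n from the hashes of chunks n, n+1, n+2."""
    count = len(src_hashes)
    if count < 3:
        raise ValueError("need at least three chunk hashes")
    # Assumption: indices past the last chunk wrap back to the first ones.
    combined = b"".join(src_hashes[(n + k) % count] for k in range(3))
    pad = combined[:PAD_SIZE]
    key = combined[PAD_SIZE:PAD_SIZE + KEY_SIZE]
    iv = combined[PAD_SIZE + KEY_SIZE:]
    return pad, key, iv


if __name__ == "__main__":
    hashes = [hashlib.sha3_256(bytes([i]) * 10).digest() for i in range(4)]
    pad, key, iv = derive_pad_key_iv(hashes, 0)
    print(len(pad), len(key), len(iv))
```

Because every chunk's key material mixes in its neighbors' hashes, a single chunk cannot be decrypted without the hashes recorded for the surrounding chunks.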

Encryption Flow

  1. Content Chunking:

    • File is split into chunks (up to 1MB each)
    • Each chunk's raw content is hashed (SHA3-256)
    • These hashes become part of the DataMap
  2. Per-Chunk Processing:

    // For each chunk:
    1. Compress data using Brotli
    2. Generate key materials:
       - Combine three consecutive chunk hashes
       - Extract pad, key, and IV
    3. Encrypt compressed data using AES-256-CBC
    4. XOR encrypted data with pad for obfuscation
    
  3. DataMap Creation:

    • Stores both pre-encryption (src) and post-encryption (dst) hashes
    • Maintains chunk ordering and size information
    • Required for both encryption and decryption processes
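The encryption flow above can be sketched end to end with the standard library. Several substitutions are made and should be read as assumptions: `zlib` stands in for Brotli, a SHAKE-256 keystream XOR (`toy_cipher`) stands in for AES-CBC, the pad is repeated to cover the whole chunk, and the data map is a plain list of dicts. None of the names below are the library's API.

```python
import hashlib
import zlib
from itertools import cycle


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad, repeating the pad as often as needed."""
    return bytes(b ^ p for b, p in zip(data, cycle(pad)))


def toy_cipher(data: bytes, key: bytes, iv: bytes) -> bytes:
    """Stand-in for AES-CBC: XOR with a SHAKE-256 keystream. Not secure."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_chunks(chunks: list[bytes]) -> tuple[list[dict], dict[bytes, bytes]]:
    """Return (data_map, store): one entry per chunk, and dst_hash -> stored bytes."""
    if len(chunks) < 3:
        raise ValueError("need at least three chunks")
    # Step 1: hash the raw content of every chunk.
    src_hashes = [hashlib.sha3_256(c).digest() for c in chunks]
    data_map, store = [], {}
    for i, chunk in enumerate(chunks):
        # Step 2: key material from three consecutive hashes (wrap-around assumed).
        combined = b"".join(src_hashes[(i + k) % len(chunks)] for k in range(3))
        pad, key, iv = combined[:64], combined[64:80], combined[80:]
        compressed = zlib.compress(chunk)             # compress
        encrypted = toy_cipher(compressed, key, iv)   # encrypt
        obfuscated = xor_with_pad(encrypted, pad)     # XOR obfuscation
        # Step 3: record pre-encryption and post-encryption hashes.
        dst_hash = hashlib.sha3_256(obfuscated).digest()
        store[dst_hash] = obfuscated
        data_map.append({"index": i, "src_hash": src_hashes[i],
                         "dst_hash": dst_hash, "src_size": len(chunk)})
    return data_map, store


if __name__ == "__main__":
    data_map, store = encrypt_chunks([b"a" * 100, b"b" * 100, b"c" * 50])
    print(len(data_map), len(store))
```

Storing each chunk under its `dst_hash` is what lets a retrieval function look chunks up by name later.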

Decryption Flow

  1. Chunk Retrieval:

    • Use DataMap to identify required chunks
    • Retrieve chunks using dst_hash as identifier
  2. Per-Chunk Processing:

    // For each chunk:
    1. Regenerate key materials using src_hashes from DataMap
    2. Remove XOR obfuscation using pad
    3. Decrypt using AES-256-CBC with key and IV
    4. Decompress using Brotli
    
  3. Chunk Reassembly:

    • Chunks are processed in order specified by DataMap
    • Reassembled into original file
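The decryption steps can be sketched the same way. As before, this is a toy model under stated assumptions rather than the library's code: `zlib` replaces Brotli, a symmetric SHAKE-256 keystream XOR (`toy_cipher`) replaces AES-CBC, key material comes from three consecutive `src_hash` values with wrap-around, and the data map is a list of dicts with hypothetical field names.

```python
import hashlib
import zlib
from itertools import cycle


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR with a repeating pad; applying it twice restores the input."""
    return bytes(b ^ p for b, p in zip(data, cycle(pad)))


def toy_cipher(data: bytes, key: bytes, iv: bytes) -> bytes:
    """Symmetric stand-in for AES-CBC (keystream XOR). Not secure."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def key_material(src_hashes: list[bytes], i: int) -> tuple[bytes, bytes, bytes]:
    """Pad, key and IV for chunk i from three consecutive source hashes."""
    combined = b"".join(src_hashes[(i + k) % len(src_hashes)] for k in range(3))
    return combined[:64], combined[64:80], combined[80:]


def decrypt_chunks(data_map: list[dict], get_chunk) -> bytes:
    """Rebuild the original bytes from data map entries and a chunk lookup."""
    ordered = sorted(data_map, key=lambda e: e["index"])
    src_hashes = [e["src_hash"] for e in ordered]
    parts = []
    for i, entry in enumerate(ordered):
        stored = get_chunk(entry["dst_hash"])          # retrieve by dst_hash
        pad, key, iv = key_material(src_hashes, i)     # regenerate key material
        encrypted = xor_with_pad(stored, pad)          # remove XOR obfuscation
        compressed = toy_cipher(encrypted, key, iv)    # decrypt
        chunk = zlib.decompress(compressed)            # decompress
        if hashlib.sha3_256(chunk).digest() != entry["src_hash"]:
            raise ValueError(f"chunk {i} does not match its src_hash")
        parts.append(chunk)
    return b"".join(parts)
```

`get_chunk` plays the same role as the retrieval callbacks in the usage examples: it receives a chunk identifier and returns the stored bytes.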

Storage Features

  • Flexible backend support through trait-based design

  • Supports both memory and disk-based storage

  • Streaming operations for memory efficiency

  • Hierarchical data maps for large files:

    // DataMap shrinking for large files
    1. Serialize large DataMap
    2. Encrypt serialized map using same process
    3. Create new DataMap with fewer chunks
    4. Repeat until manageable size reached
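The shrinking loop can be modelled in a few lines. This sketch only shows the recursion: it serializes a list of chunk names with JSON and stores the serialized bytes as plain pieces, skipping the encryption that the real process applies at each level. `store_pieces`, `shrink`, `expand`, `max_entries` and `piece_size` are hypothetical names and parameters.

```python
import hashlib
import json


def store_pieces(data: bytes, store: dict[str, bytes], piece_size: int) -> list[str]:
    """Cut data into pieces, store each under its hash, return the names in order."""
    names = []
    for i in range(0, len(data), piece_size):
        piece = data[i:i + piece_size]
        name = hashlib.sha3_256(piece).hexdigest()
        store[name] = piece
        names.append(name)
    return names


def shrink(names: list[str], store: dict[str, bytes],
           max_entries: int = 3, piece_size: int = 1024) -> tuple[list[str], int]:
    """Replace a long name list by a shorter one until it fits; return (names, level)."""
    level = 0
    while len(names) > max_entries:
        serialized = json.dumps(names).encode()
        names = store_pieces(serialized, store, piece_size)
        level += 1
    return names, level


def expand(names: list[str], level: int, store: dict[str, bytes]) -> list[str]:
    """Undo shrink: walk back down the levels to the original name list."""
    while level > 0:
        serialized = b"".join(store[n] for n in names)
        names = json.loads(serialized.decode())
        level -= 1
    return names
```

The loop terminates here because each piece holds many more names than it costs to reference; the level counter corresponds to what the examples read through `data_map.child()`.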
    

Security Properties

  • Content-based convergent encryption
  • Additional security through chunk interdependencies
  • Self-validating chunks through hash verification
  • No single point of failure in chunk storage
  • Tamper-evident through hash chains
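Self-validation follows from naming each stored chunk by the hash of its content. A minimal sketch, assuming SHA3-256 names and a hypothetical in-memory `VerifyingStore` class, shows how a reader can detect a modified chunk without any extra metadata:

```python
import hashlib


class VerifyingStore:
    """In-memory chunk store that checks content against its name on every read."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content under the hex SHA3-256 of its bytes and return that name."""
        name = hashlib.sha3_256(content).hexdigest()
        self._chunks[name] = content
        return name

    def get(self, name: str) -> bytes:
        """Return the chunk, or raise ValueError if it no longer matches its name."""
        content = self._chunks[name]
        if hashlib.sha3_256(content).hexdigest() != name:
            raise ValueError(f"chunk {name} failed hash verification")
        return content
```

The same check works for any backend, since the expected hash travels with the data map rather than with the storage.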

Performance Optimizations

  • Parallel chunk processing where possible
  • Streaming support for large files
  • Efficient memory usage through chunking
  • Optimized compression settings
  • Configurable chunk sizes
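Chunks can be processed in parallel because the first pass (hashing the raw content) treats each chunk independently. The sketch below is an illustration in Python, not a description of how the Rust library schedules its work; `hash_chunks_parallel` and `workers` are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_chunks_parallel(chunks: list[bytes], workers: int = 4) -> list[str]:
    """Hash every chunk on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(lambda c: hashlib.sha3_256(c).hexdigest(), chunks))


if __name__ == "__main__":
    chunks = [bytes([i]) * 4096 for i in range(8)]
    print(len(hash_chunks_parallel(chunks)))
```

Keeping the output in input order matters, since the data map records chunks by position.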

This implementation provides a balance of:

  • Security (through multiple encryption layers)
  • Deduplication (through convergent encryption)
  • Performance (through parallelization and streaming)
  • Flexibility (through modular storage backends)

License

Licensed under the General Public License (GPL), version 3 (LICENSE http://www.gnu.org/licenses/gpl-3.0.en.html).

Linking Exception

self_encryption is licensed under GPLv3 with linking exception. This means you can link to and use the library from any program, proprietary or open source; paid or gratis. However, if you modify self_encryption, you must distribute the source to your modified version under the terms of the GPLv3.

See the LICENSE file for more details.

Contributing

Want to contribute? Great!

There are many ways to give back to the project, whether it be writing new code, fixing bugs, or just reporting errors. All forms of contributions are encouraged!

For instructions on how to contribute, see our Guide to contributing.
