Syft-RPC Package Documentation

Overview

The syft-rpc package provides the foundational RPC (Remote Procedure Call) protocol and serialization mechanisms for the SyftBox ecosystem. It handles the low-level details of serializing Python objects, managing RPC communication, and ensuring data integrity across distributed systems.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          syft-rpc                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐ │
│  │   Protocol   │    │     RPC      │    │    RPC_DB        │ │
│  │              │    │              │    │                  │ │
│  │ - Serialize  │    │ - Server     │    │ - Store RPCs     │ │
│  │ - Deserialize│    │ - Client     │    │ - Track status   │ │
│  │ - Type map   │    │ - Handlers   │    │ - Query history  │ │
│  │ - Rebuild    │    │              │    │                  │ │
│  └──────────────┘    └──────────────┘    └──────────────────┘ │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    Type System                            │  │
│  │                                                           │  │
│  │  Basic Types: int, str, float, bool, bytes, None        │  │
│  │  Collections: list, tuple, dict, set                     │  │
│  │  Complex: Pydantic models, dataclasses, custom objects   │  │
│  │                                                           │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Serialization Protocol

The protocol handles serialization of Python objects into a format that can be transmitted and reconstructed:

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Python Object  │      │  Serialized     │      │  Python Object  │
│                 │      │  Representation │      │                 │
│  - Native types │ ───► │  - Type info    │ ───► │  - Restored     │
│  - Pydantic     │      │  - Data bytes   │      │  - Same type    │
│  - Dataclasses  │      │  - Metadata     │      │  - Same value   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
     serialize()              transmit              deserialize()

Serialization Format

Each serialized object contains:

{
    "__type__": "module.ClassName",      # Type identifier
    "__version__": 1,                     # Protocol version
    "data": {...},                        # Actual data
    "__metadata__": {...}                 # Optional metadata
}
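
As a rough illustration of the envelope above, the following sketch wraps a value with the standard-library json module. The helper names wrap and unwrap, the PROTOCOL_VERSION constant, and the choice of JSON as the wire encoding are assumptions made for this example; they are not taken from syft-rpc itself.

```python
import json

PROTOCOL_VERSION = 1  # mirrors the "__version__" field shown above


def wrap(obj, metadata=None):
    """Build an envelope with the fields listed above (hypothetical helper)."""
    cls = type(obj)
    envelope = {
        "__type__": f"{cls.__module__}.{cls.__qualname__}",
        "__version__": PROTOCOL_VERSION,
        "data": obj,
    }
    if metadata is not None:
        envelope["__metadata__"] = metadata
    return json.dumps(envelope).encode("utf-8")


def unwrap(payload):
    """Decode an envelope and return (type string, data)."""
    envelope = json.loads(payload.decode("utf-8"))
    if envelope["__version__"] != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version: {envelope['__version__']}")
    return envelope["__type__"], envelope["data"]


payload = wrap({"count": 2}, metadata={"source": "example"})
type_string, data = unwrap(payload)
print(type_string, data)
```

Checking the version before touching the data lets a receiver reject payloads it does not understand instead of misreading them.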

Core Components

1. Protocol Module

The protocol module handles the core serialization logic:

from syft_rpc.protocol import serialize, deserialize, rebuild

# Serialize any Python object
data = {"users": ["alice", "bob"], "count": 2}
serialized = serialize(data)  # Returns bytes

# Deserialize back to Python object
restored = deserialize(serialized)
assert restored == data

# Rebuild with type information
from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str

user = User(name="Alice", email="alice@example.com")
serialized = serialize(user)

# On the receiving end
rebuilt_user = rebuild(deserialize(serialized))
assert isinstance(rebuilt_user, User)
assert rebuilt_user.name == "Alice"

2. Type Registration

The protocol maintains a registry of serializable types:

from syft_rpc.protocol import register_type, get_type_string

# Register custom type
@register_type
class CustomData:
    def __init__(self, value):
        self.value = value
    
    def to_dict(self):
        return {"value": self.value}
    
    @classmethod
    def from_dict(cls, data):
        return cls(data["value"])

# Type string for serialization
type_string = get_type_string(CustomData)  # "module.CustomData"
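
To make the registry idea concrete, here is a minimal sketch of how a decorator-based type registry could work. It is not the syft-rpc implementation: the names _REGISTRY, register, to_envelope, and from_envelope are hypothetical, and it assumes registered classes expose to_dict() and from_dict() as in the example above.

```python
_REGISTRY = {}  # maps "module.ClassName" -> class


def type_string_of(cls):
    """Return the dotted type string used as the registry key."""
    return f"{cls.__module__}.{cls.__qualname__}"


def register(cls):
    """Class decorator that records cls under its type string."""
    _REGISTRY[type_string_of(cls)] = cls
    return cls


def to_envelope(obj):
    """Pair the object's dict form with its type string."""
    return {"__type__": type_string_of(type(obj)), "data": obj.to_dict()}


def from_envelope(envelope):
    """Look up the class by type string and rebuild the object."""
    cls = _REGISTRY.get(envelope["__type__"])
    if cls is None:
        raise KeyError(f"unregistered type: {envelope['__type__']}")
    return cls.from_dict(envelope["data"])


@register
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def to_dict(self):
        return {"x": self.x, "y": self.y}

    @classmethod
    def from_dict(cls, data):
        return cls(data["x"], data["y"])


restored = from_envelope(to_envelope(Point(1, 2)))
```

Because reconstruction only ever looks classes up in the registry, an unknown type string fails with an error rather than triggering an import of arbitrary code.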

3. RPC Communication

Basic RPC server and client implementation:

from syft_rpc.rpc import RPCServer, RPCClient

# Server side
server = RPCServer()

@server.register
def add(a: int, b: int) -> int:
    return a + b

@server.register
def get_user(user_id: str) -> dict:
    return {"id": user_id, "name": "Alice"}

# Start server
server.serve(host="localhost", port=8000)

# Client side (run in a separate process if serve() blocks)
client = RPCClient("localhost", 8000)

# Call remote functions
result = client.call("add", a=5, b=3)
print(result)  # 8

user = client.call("get_user", user_id="123")
print(user)  # {"id": "123", "name": "Alice"}
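
The register-then-call pattern above can be sketched without any networking. The Dispatcher class below is a hypothetical, in-process stand-in that maps method names to handlers and exchanges JSON strings; it only illustrates the dispatch step and makes no claim about how RPCServer is implemented.

```python
import json


class Dispatcher:
    """Maps method names to callables and executes JSON-encoded calls."""

    def __init__(self):
        self._handlers = {}

    def register(self, func):
        # Register under the function's own name, like @server.register above
        self._handlers[func.__name__] = func
        return func

    def handle(self, raw_request):
        request = json.loads(raw_request)
        handler = self._handlers.get(request["method"])
        if handler is None:
            return json.dumps({"error": f"unknown method: {request['method']}"})
        try:
            result = handler(**request.get("params", {}))
        except Exception as exc:  # report handler failures to the caller
            return json.dumps({"error": str(exc)})
        return json.dumps({"result": result})


dispatcher = Dispatcher()


@dispatcher.register
def add(a: int, b: int) -> int:
    return a + b


request = json.dumps({"method": "add", "params": {"a": 5, "b": 3}})
reply = json.loads(dispatcher.handle(request))
missing = json.loads(dispatcher.handle(json.dumps({"method": "nope"})))
```

Returning errors as data keeps a failing handler from taking down the process that serves other calls.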

Serialization Examples

Basic Types

from syft_rpc.protocol import serialize, deserialize

# Numbers
assert deserialize(serialize(42)) == 42
assert deserialize(serialize(3.14)) == 3.14

# Strings and bytes
assert deserialize(serialize("Hello")) == "Hello"
assert deserialize(serialize(b"Binary")) == b"Binary"

# Collections
data = {
    "list": [1, 2, 3],
    "tuple": (4, 5, 6),
    "set": {7, 8, 9},
    "dict": {"nested": True}
}
assert deserialize(serialize(data)) == data

Pydantic Models

from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime
from syft_rpc.protocol import serialize, deserialize, rebuild

class Task(BaseModel):
    id: str
    title: str
    completed: bool = False
    tags: List[str] = []
    due_date: Optional[datetime] = None

# Create and serialize
task = Task(
    id="task-001",
    title="Write documentation",
    tags=["docs", "important"],
    due_date=datetime.now()
)

serialized = serialize(task)
restored_task = rebuild(deserialize(serialized))

assert isinstance(restored_task, Task)
assert restored_task.id == "task-001"
assert restored_task.tags == ["docs", "important"]

Dataclasses

from dataclasses import dataclass
from typing import List
from syft_rpc.protocol import serialize, deserialize, rebuild

@dataclass
class Product:
    name: str
    price: float
    categories: List[str]
    in_stock: bool = True

product = Product(
    name="Laptop",
    price=999.99,
    categories=["Electronics", "Computers"]
)

# Serialize and restore
serialized = serialize(product)
restored = rebuild(deserialize(serialized))

assert isinstance(restored, Product)
assert restored.price == 999.99

Complex Nested Structures

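from dataclasses import dataclass
from typing import List
from pydantic import BaseModel
from syft_rpc.protocol import serialize, deserialize, rebuild

@dataclass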
class Address:
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address
    friends: List['Person'] = []

# Create complex structure
alice_addr = Address("123 Main St", "Boston", "USA")
alice = Person(name="Alice", age=30, address=alice_addr)

bob_addr = Address("456 Oak Ave", "Seattle", "USA")
bob = Person(name="Bob", age=28, address=bob_addr)

alice.friends.append(bob)
bob.friends.append(alice)

# Serialize with circular references
serialized = serialize(alice)
restored_alice = rebuild(deserialize(serialized))

assert restored_alice.name == "Alice"
assert restored_alice.friends[0].name == "Bob"
assert isinstance(restored_alice.address, Address)
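
One common way to serialize mutually referencing objects is to flatten them into a table and replace each reference with an index. The sketch below shows that technique on plain dicts; it is an assumption about how such cycles can be handled in general, not a description of what syft-rpc does internally. The "$ref" marker and the fixed "friends" key are simplifications for the example.

```python
import json


def encode_graph(root):
    """Flatten dict nodes into a list; references become {"$ref": index}."""
    index_of = {}  # id(node) -> position in table
    table = []

    def visit(node):
        if id(node) in index_of:
            return {"$ref": index_of[id(node)]}
        slot = len(table)
        index_of[id(node)] = slot
        table.append(None)  # reserve the slot before descending; this breaks cycles
        table[slot] = {
            key: [visit(v) for v in value] if key == "friends" else value
            for key, value in node.items()
        }
        return {"$ref": slot}

    visit(root)
    return table


def decode_graph(table):
    """Rebuild the node graph; the first table entry is the root."""
    nodes = [dict() for _ in table]
    for node, record in zip(nodes, table):
        for key, value in record.items():
            if key == "friends":
                node[key] = [nodes[ref["$ref"]] for ref in value]
            else:
                node[key] = value
    return nodes[0]


alice = {"name": "Alice", "friends": []}
bob = {"name": "Bob", "friends": []}
alice["friends"].append(bob)
bob["friends"].append(alice)

wire = json.dumps(encode_graph(alice))  # plain JSON, no cycles
restored = decode_graph(json.loads(wire))
```

After decoding, the restored graph has the same shape as the original, including the cycle from Bob back to Alice.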

UTF-8 Support

The protocol fully supports UTF-8 encoded data:

from syft_rpc.protocol import serialize, deserialize

# International characters
data = {
    "english": "Hello",
    "spanish": "Hola",
    "chinese": "你好",
    "arabic": "مرحبا",
    "emoji": "👋🌍"
}

serialized = serialize(data)
restored = deserialize(serialized)

for key, value in data.items():
    assert restored[key] == value
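
For comparison, the same round trip can be reproduced with only the standard library. This sketch uses json rather than syft-rpc and shows one detail worth remembering when setting size limits: the character count and the UTF-8 byte count of a string differ for non-ASCII text.

```python
import json

greetings = {"chinese": "你好", "arabic": "مرحبا", "emoji": "👋🌍"}

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes
encoded = json.dumps(greetings, ensure_ascii=False).encode("utf-8")
decoded = json.loads(encoded.decode("utf-8"))

# Two characters, but three bytes each in UTF-8
char_count = len(greetings["chinese"])
byte_count = len(greetings["chinese"].encode("utf-8"))
print(char_count, byte_count)
```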

RPC Database

Track and manage RPC calls with the database module:

from syft_rpc.rpc_db import RPCDatabase

# Initialize database
db = RPCDatabase("rpc_history.db")

# Log RPC call
call_id = db.log_call(
    method="get_user",
    params={"user_id": "123"},
    caller="alice@example.com"
)

# Update with result
db.update_result(call_id, result={"name": "Alice", "id": "123"})

# Query history
recent_calls = db.get_recent_calls(limit=10)
user_calls = db.get_calls_by_method("get_user")
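
A call log like this can be backed by SQLite. The following is a minimal sketch under stated assumptions: the CallLog class, the table layout, and the "pending"/"completed" status values are all hypothetical and chosen only to mirror the method names used above. It uses an in-memory database so that it leaves no files behind.

```python
import json
import sqlite3
import uuid


class CallLog:
    """Stores RPC calls and their results in a single SQLite table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS calls ("
            "id TEXT PRIMARY KEY, method TEXT, params TEXT, caller TEXT, "
            "status TEXT, result TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def log_call(self, method, params, caller):
        call_id = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO calls (id, method, params, caller, status) "
            "VALUES (?, ?, ?, ?, 'pending')",
            (call_id, method, json.dumps(params), caller),
        )
        self.conn.commit()
        return call_id

    def update_result(self, call_id, result):
        self.conn.execute(
            "UPDATE calls SET status = 'completed', result = ? WHERE id = ?",
            (json.dumps(result), call_id),
        )
        self.conn.commit()

    def get_calls_by_method(self, method):
        rows = self.conn.execute(
            "SELECT id, status, result FROM calls WHERE method = ?", (method,)
        ).fetchall()
        return [
            {"id": r[0], "status": r[1], "result": json.loads(r[2]) if r[2] else None}
            for r in rows
        ]


log = CallLog()
call_id = log.log_call("get_user", {"user_id": "123"}, "alice@example.com")
pending = log.get_calls_by_method("get_user")
log.update_result(call_id, {"name": "Alice", "id": "123"})
completed = log.get_calls_by_method("get_user")
```

Parameterized queries (the ? placeholders) keep caller-supplied method names and parameters from being interpreted as SQL.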

Error Handling

The protocol raises SerializationError when an object cannot be serialized and DeserializationError when the input cannot be decoded:

from syft_rpc.protocol import serialize, deserialize
from syft_rpc.protocol import SerializationError, DeserializationError

try:
    # Attempt to serialize non-serializable object
    serialize(lambda x: x)  # Functions can't be serialized
except SerializationError as e:
    print(f"Serialization failed: {e}")

try:
    # Attempt to deserialize corrupted data
    deserialize(b"corrupted data")
except DeserializationError as e:
    print(f"Deserialization failed: {e}")

# Safe serialization with fallback
def safe_serialize(obj, fallback=None):
    try:
        return serialize(obj)
    except SerializationError:
        if fallback is not None:
            return serialize(fallback)
        return serialize({"error": "Could not serialize object"})

Performance Optimization

1. Batch Operations

from syft_rpc.protocol import serialize_batch, deserialize_batch

# Serialize multiple objects efficiently
objects = [
    {"id": 1, "data": "first"},
    {"id": 2, "data": "second"},
    {"id": 3, "data": "third"},
]

# Batch serialization
serialized_batch = serialize_batch(objects)

# Batch deserialization
restored_objects = deserialize_batch(serialized_batch)
assert len(restored_objects) == 3
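
One plausible way to pack several payloads into a single byte string is length-prefixed framing. The sketch below illustrates that technique with json and struct; pack_batch and unpack_batch are hypothetical names, and nothing here is taken from the actual serialize_batch implementation.

```python
import json
import struct


def pack_batch(objects):
    """Concatenate JSON payloads, each prefixed with a 4-byte big-endian length."""
    chunks = []
    for obj in objects:
        payload = json.dumps(obj).encode("utf-8")
        chunks.append(struct.pack(">I", len(payload)) + payload)
    return b"".join(chunks)


def unpack_batch(blob):
    """Split a framed byte string back into the original objects."""
    objects, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        if offset + length > len(blob):
            raise ValueError("truncated batch")
        objects.append(json.loads(blob[offset:offset + length].decode("utf-8")))
        offset += length
    return objects


items = [{"id": 1, "data": "first"}, {"id": 2, "data": "second"}]
blob = pack_batch(items)
restored_items = unpack_batch(blob)
```

The length prefix lets the receiver find each payload boundary without scanning for delimiters, and a truncated final frame is detected explicitly.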

2. Compression

import zlib
from syft_rpc.protocol import serialize, deserialize

def compress_serialize(obj):
    serialized = serialize(obj)
    compressed = zlib.compress(serialized)
    return compressed

def decompress_deserialize(compressed):
    decompressed = zlib.decompress(compressed)
    return deserialize(decompressed)

# Large data structure
large_data = {"items": [{"id": i, "data": "x" * 1000} for i in range(100)]}

# Compare sizes
normal = serialize(large_data)
compressed = compress_serialize(large_data)
print(f"Normal: {len(normal)} bytes")
print(f"Compressed: {len(compressed)} bytes")
print(f"Compression ratio: {len(compressed) / len(normal):.2%}")

3. Caching

import hashlib
from syft_rpc.protocol import serialize

# Cache of serialized payloads, keyed by a hash of the object's repr.
# The cache is unbounded; evict entries in long-running processes.
_serialize_cache = {}

def smart_serialize(obj):
    # Hash repr(obj); this assumes equal reprs imply equal objects
    obj_hash = hashlib.sha256(repr(obj).encode()).hexdigest()

    # Serialize on a cache miss, then reuse the stored bytes
    if obj_hash not in _serialize_cache:
        _serialize_cache[obj_hash] = serialize(obj)
    return _serialize_cache[obj_hash]

Security Considerations

1. Type Validation

from syft_rpc.protocol import serialize, set_allowed_types, SerializationError

# Restrict allowed types for security
set_allowed_types([
    int, str, float, bool, list, dict,
    "myapp.models.User",  # Specific allowed class
    "myapp.models.Task",
])

# This will now fail
try:
    serialize(eval)  # Dangerous function
except SerializationError as e:
    print("Blocked dangerous type")
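
The same allowlist idea applies on the receiving side, where the declared type should be checked before any object is reconstructed. This is a minimal sketch assuming the envelope layout from the Serialization Format section; ALLOWED_TYPES and require_allowed are hypothetical names, not part of syft-rpc.

```python
ALLOWED_TYPES = frozenset(
    {"builtins.int", "builtins.str", "builtins.dict", "myapp.models.User"}
)


def require_allowed(envelope):
    """Return the envelope's data only if its declared type is allowlisted."""
    type_string = envelope.get("__type__")
    if type_string not in ALLOWED_TYPES:
        raise TypeError(f"type not allowed: {type_string!r}")
    return envelope["data"]


ok = require_allowed({"__type__": "builtins.dict", "data": {"active": True}})

try:
    require_allowed({"__type__": "os.system", "data": "echo hi"})
    blocked = False
except TypeError:
    blocked = True
```

Checking against an explicit set of permitted names is safer than trying to enumerate dangerous ones, because anything not listed is rejected by default.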

2. Size Limits

from syft_rpc.protocol import serialize, set_size_limit, SerializationError

# Set maximum serialized size (10MB)
set_size_limit(10 * 1024 * 1024)

# Large objects will be rejected
huge_list = list(range(10_000_000))
try:
    serialize(huge_list)
except SerializationError as e:
    print("Object too large")
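
Size limits also matter when payloads are compressed, as in the Compression section, because a small compressed input can expand to a very large output. The sketch below caps the decompressed size with zlib.decompressobj and its max_length argument; bounded_decompress is a hypothetical helper, not a syft-rpc function.

```python
import zlib


def bounded_decompress(compressed, limit):
    """Decompress data, refusing to produce more than limit bytes."""
    decompressor = zlib.decompressobj()
    # Ask for at most limit + 1 bytes so an oversized payload is detectable
    output = decompressor.decompress(compressed, limit + 1)
    if len(output) > limit or decompressor.unconsumed_tail:
        raise ValueError(f"decompressed data exceeds {limit} bytes")
    return output


original = b"x" * 10_000
compressed = zlib.compress(original)

within_limit = bounded_decompress(compressed, 10_000)

try:
    bounded_decompress(compressed, 9_999)
    rejected = False
except ValueError:
    rejected = True
```

Because decompression stops once the cap is reached, memory use stays bounded even for hostile input.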

3. Sanitization

from syft_rpc.protocol import deserialize

def sanitize_before_deserialize(data: bytes) -> bytes:
    # Heuristic substring check; it does not replace type allowlisting
    if b"__import__" in data or b"eval" in data:
        raise ValueError("Potentially malicious data")
    return data

# Safe deserialization
def safe_deserialize(data: bytes):
    sanitized = sanitize_before_deserialize(data)
    return deserialize(sanitized)

Integration with Other Packages

With syft-event

from syft_event import Request, Response
# SyftBoxURL must also be imported; its module is not shown here
from syft_rpc.protocol import serialize, deserialize

# Serialize request for transport
request = Request(
    id="123",
    sender="alice@example.com",
    url=SyftBoxURL("syft://bob@example.com/app_data/api/rpc/data"),
    method="POST",
    body=serialize({"action": "get_data", "filters": {"active": True}})
)

# On receiving end
data = deserialize(request.body)
# Process data...

With syft-proxy

from syft_proxy.models import RPCSendRequest
from syft_rpc.protocol import serialize

# Prepare RPC request with serialized data
rpc_request = RPCSendRequest(
    app_name="data_processor",
    destination="carol@example.com",
    params=serialize({
        "operation": "aggregate",
        "data": [1, 2, 3, 4, 5]
    })
)

Best Practices

  1. Always validate deserialized data - Don't trust external data
  2. Use type hints - Helps with serialization and documentation
  3. Handle errors gracefully - Network issues can corrupt data
  4. Version your protocols - For backward compatibility
  5. Monitor performance - Serialization can be CPU intensive
  6. Implement timeouts - For RPC calls
  7. Log important operations - For debugging and auditing
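
As an example of the first practice, the sketch below validates a deserialized payload before the rest of the application uses it. The validate_user function, the expected fields, and the 100-character limit are assumptions chosen for illustration.

```python
def validate_user(payload):
    """Return a cleaned user dict or raise ValueError (hypothetical schema)."""
    if not isinstance(payload, dict):
        raise ValueError("expected a dict")
    user_id = payload.get("id")
    name = payload.get("name")
    if not isinstance(user_id, str) or not user_id:
        raise ValueError("'id' must be a non-empty string")
    if not isinstance(name, str) or len(name) > 100:
        raise ValueError("'name' must be a string of at most 100 characters")
    # Return only the expected keys so unexpected fields are dropped
    return {"id": user_id, "name": name}


clean = validate_user({"id": "123", "name": "Alice", "is_admin": True})
```

Copying only the expected keys into a new dict means extra fields sent by a remote peer never reach application code.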

Testing

import pytest
from syft_rpc.protocol import serialize, deserialize, rebuild

def test_round_trip():
    """Test that data survives serialization round trip."""
    test_data = {
        "string": "test",
        "number": 42,
        "float": 3.14,
        "bool": True,
        "none": None,
        "list": [1, 2, 3],
        "dict": {"nested": "value"}
    }
    
    serialized = serialize(test_data)
    restored = deserialize(serialized)
    
    assert restored == test_data

def test_pydantic_model():
    """Test Pydantic model serialization."""
    from pydantic import BaseModel
    
    class TestModel(BaseModel):
        name: str
        value: int
    
    model = TestModel(name="test", value=123)
    serialized = serialize(model)
    restored = rebuild(deserialize(serialized))
    
    assert isinstance(restored, TestModel)
    assert restored.name == "test"
    assert restored.value == 123

def test_error_handling():
    """Test error handling for invalid data."""
    with pytest.raises(Exception):
        deserialize(b"invalid data")
    
    with pytest.raises(Exception):
        deserialize(b"")

Troubleshooting

Common issues and solutions:

  1. Import errors after deserialization

    • Ensure all custom classes are imported before deserializing
    • Use rebuild() instead of raw deserialize()

  2. Circular reference errors

    • The protocol handles most circular references
    • For complex cases, implement custom serialization

  3. Performance issues

    • Use batch operations for multiple objects
    • Consider compression for large data
    • Profile serialization bottlenecks

  4. Type mismatch errors

    • Ensure sender and receiver have the same class definitions
    • Use version checking for protocol compatibility
