Doing remote data science with SyftBox

RDS

Requirements

Quick Install

Available on PyPI. Install with:

uv pip install syft-rds

Alternatively, clone the repo and set up the development Python environment with all dependencies:

just setup

Getting Started

Run the Demo

The notebook notebooks/quickstart/full_flow.ipynb contains a complete example of the RDS workflow from both the Data Owner (DO) and Data Scientist (DS) perspectives.

This demo uses a mock in-memory stack that simulates SyftBox functionality locally, so no external services are required.

To run the demo:

just jupyter

Then open notebooks/quickstart/full_flow.ipynb and run through the cells.

The demo covers a basic remote data science workflow:

  1. Data Owner creates a dataset with private and mock (public) data
  2. Data Scientist explores available datasets (can only see mock data)
  3. Data Scientist submits code to run on private data
  4. Data Owner reviews and runs the code on private data
  5. Data Owner shares the results
  6. Data Scientist views the output
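
The six steps above can be pictured with a small toy model. This is only an illustrative sketch in plain Python: the classes `Dataset`, `Job`, `DataOwner`, and `DataScientist` and all of their methods are hypothetical names invented here and are not the syft-rds API. For the real calls, follow the quickstart notebook.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Dataset:
    name: str
    mock: List[int]                          # public sample, visible to everyone
    private: List[int] = field(repr=False)   # real data, stays with the owner


@dataclass
class Job:
    dataset_name: str
    code: Callable[[List[int]], float]
    status: str = "pending"
    result: Optional[float] = None


class DataOwner:
    """Hypothetical stand-in for the Data Owner side of the workflow."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}
        self.jobs: List[Job] = []

    def create_dataset(self, name: str, mock: List[int], private: List[int]) -> None:
        # Step 1: register a dataset with a mock and a private part.
        self._datasets[name] = Dataset(name, mock, private)

    def mock_view(self, name: str) -> List[int]:
        # Only the mock data is ever handed out.
        return list(self._datasets[name].mock)

    def review_and_run(self, job: Job, approve: bool) -> None:
        # Steps 4 and 5: review, run on private data, share the result.
        if not approve:
            job.status = "rejected"
            return
        job.result = job.code(self._datasets[job.dataset_name].private)
        job.status = "shared"


class DataScientist:
    """Hypothetical stand-in for the Data Scientist side of the workflow."""

    def explore(self, owner: DataOwner, name: str) -> List[int]:
        # Step 2: the scientist can only see mock data.
        return owner.mock_view(name)

    def submit(self, owner: DataOwner, name: str, code: Callable[[List[int]], float]) -> Job:
        # Step 3: submit code to be run on the private data.
        job = Job(name, code)
        owner.jobs.append(job)
        return job


def mean(values: List[int]) -> float:
    return sum(values) / len(values)


owner = DataOwner()
scientist = DataScientist()
owner.create_dataset("ages", mock=[30, 40, 50], private=[31, 47, 52, 38])
preview = scientist.explore(owner, "ages")
job = scientist.submit(owner, "ages", mean)
owner.review_and_run(job, approve=True)
print(job.status, job.result)  # Step 6: the scientist reads the shared output
```

The point of the sketch is the separation of roles: the scientist's code only touches private data inside `review_and_run`, which the owner controls.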

Private Dataset Storage

Storage Locations

Private datasets are stored in ~/.syftbox/private_datasets/<email>/<dataset-name>/ and are NEVER synced to the SyftBox relay server. This ensures true client-side privacy - your private data never leaves your machine.

Mock (public) datasets are stored in ~/SyftBox/datasites/<email>/public/datasets/ and ARE synced to the relay server, allowing other users to explore your dataset structure and submit job requests.

~/.syftbox/
  private_datasets/
    your-email@example.com/
      my-dataset/           # Your private data (NEVER synced anywhere)
        data.csv
        ...

~/SyftBox/
  datasites/
    your-email@example.com/
      public/
        datasets/
          my-dataset/         # Your mock data (synced to the SyftBox server and other datasites)
            mock_data.csv
            README.md
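
As a rough illustration of the layout above, the two locations can be derived from an email address and a dataset name. The helper names `private_dataset_dir`, `mock_dataset_dir`, and `is_synced` are hypothetical and exist only in this sketch; syft-rds resolves these paths itself, and this code merely mirrors the directory tree shown above.

```python
from pathlib import Path


def _home(home=None) -> Path:
    # Allow the home directory to be overridden, e.g. for testing.
    return Path.home() if home is None else Path(home)


def private_dataset_dir(email: str, dataset_name: str, home=None) -> Path:
    """Location of private data: ~/.syftbox/private_datasets/<email>/<dataset-name>/."""
    return _home(home) / ".syftbox" / "private_datasets" / email / dataset_name


def mock_dataset_dir(email: str, dataset_name: str, home=None) -> Path:
    """Location of mock data: ~/SyftBox/datasites/<email>/public/datasets/<dataset-name>/."""
    return _home(home) / "SyftBox" / "datasites" / email / "public" / "datasets" / dataset_name


def is_synced(path, home=None) -> bool:
    """True if the path sits under ~/SyftBox/datasites, the tree described as synced."""
    synced_root = _home(home) / "SyftBox" / "datasites"
    return synced_root in Path(path).parents


email = "your-email@example.com"
print(private_dataset_dir(email, "my-dataset"))
print(mock_dataset_dir(email, "my-dataset"))
```

A check like `is_synced` can serve as a quick sanity test that a file you consider private has not been placed under the synced tree by mistake.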

Migration from v0.4.x

⚠️ BREAKING CHANGE in syft-rds v0.5.0

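syft-rds v0.5.0 stores private data in ~/.syftbox/private_datasets/<email>/<dataset-name>/ instead of datasites/<email>/private/. If you have existing datasets created with syft-rds v0.4.x, you'll need to recreate them: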

  1. Note your existing dataset names
  2. Upgrade to syft-rds v0.5.0+
  3. Recreate the datasets using the same private data source
  4. The new version will automatically use the new location

Old private data in datasites/<email>/private/ will not interfere and will be automatically cleaned up when you delete the datasets.
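
For step 1, a short script can help collect the existing dataset names before upgrading. This is a sketch under two assumptions that the text above does not confirm: that each v0.4.x dataset is a direct subdirectory of the old private folder, and that you pass in the full path to that folder yourself. The helper `list_dataset_names` is hypothetical, and the demo runs against a temporary directory rather than a real datasite.

```python
import tempfile
from pathlib import Path


def list_dataset_names(old_private_dir) -> list:
    """Return the sorted names of subdirectories in the old private folder.

    Assumes one subdirectory per dataset; returns an empty list if the
    folder does not exist.
    """
    root = Path(old_private_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Demo against a throwaway directory that imitates datasites/<email>/private/.
with tempfile.TemporaryDirectory() as tmp:
    old_private = Path(tmp) / "datasites" / "your-email@example.com" / "private"
    (old_private / "my-dataset").mkdir(parents=True)
    (old_private / "another-dataset").mkdir()
    (old_private / "notes.txt").write_text("not a dataset")
    names = list_dataset_names(old_private)

print(names)
```

Point the function at your own old private folder to get the list of names to recreate after upgrading.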

Development

Running Tests

# Run all tests
just test

# Run specific test suites
just test-unit
just test-integration
just test-notebooks

Building

# Build the wheel package
just build

# Bump version (patch/minor/major)
just bump patch

Cleaning Up

Remove generated files and directories:

just clean

Available Commands

See all available commands:

just --list

Download files

Source Distribution

syft_rds-0.5.0.tar.gz (185.7 kB view details)

Uploaded Source

Built Distribution

syft_rds-0.5.0-py3-none-any.whl (220.4 kB view details)

Uploaded Python 3

File details

Details for the file syft_rds-0.5.0.tar.gz.

File metadata

  • Download URL: syft_rds-0.5.0.tar.gz
  • Upload date:
  • Size: 185.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for syft_rds-0.5.0.tar.gz
Algorithm Hash digest
SHA256 cfff8978919f705690b86a59e81ca7765c9c8582e20b4e14837fecf6a87b5cf9
MD5 d6a07f46413f304c7b24bbf2f57ca5af
BLAKE2b-256 de66afb2abc9616194ff818fab54033468239fb19606b12b81c82b0c690e43dd
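
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The following is a minimal sketch: the helper names `sha256_of` and `matches_published_hash` are made up for this example, and it assumes the archive was saved to the current directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "cfff8978919f705690b86a59e81ca7765c9c8582e20b4e14837fecf6a87b5cf9"

archive = Path("syft_rds-0.5.0.tar.gz")  # assumed download location
if archive.exists():
    print("hash OK" if matches_published_hash(archive, EXPECTED_SHA256) else "hash MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The same two helpers work for the wheel if you substitute its file name and the SHA256 digest listed for it.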

File details

Details for the file syft_rds-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: syft_rds-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 220.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for syft_rds-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4b93236b74a2ee3873013c98310237698199378f925ff3c2788e02fbfbf7f96f
MD5 aa214140f45a72ddfda03f8e88b41ba7
BLAKE2b-256 4f3e8b258be7b2232c6196630e80d6e31a20db45ef49fd982a63c6ae7e9e09fb
