Doing remote data science with SyftBox
RDS
Requirements
Quick Install
Available on PyPI. Install with:
uv pip install syft-rds
Or clone the repo and set up the development Python environment with all dependencies:
just setup
Getting Started
Run the Demo
The notebook notebooks/quickstart/full_flow.ipynb contains a complete example of the RDS workflow from both the Data Owner (DO) and Data Scientist (DS) perspectives.
This demo uses a mock in-memory stack that simulates SyftBox functionality locally - no external services required.
To run the demo:
just jupyter
Then open notebooks/quickstart/full_flow.ipynb and run through the cells.
The demo covers a basic remote data science workflow:
- Data Owner creates a dataset with private and mock (public) data
- Data Scientist explores available datasets (can only see mock data)
- Data Scientist submits code to run on private data
- Data Owner reviews and runs the code on private data
- Data Owner shares the results
- Data Scientist views the output
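The steps above can be pictured with a small toy model. This is only an illustrative sketch in plain Python: it does not use syft-rds, and every name in it (`ToyDataset`, `ToyJob`, `ToyDataOwner` and their methods) is hypothetical. For the actual API calls, see the quickstart notebook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset with a private and a mock part."""
    name: str
    private: List[float]  # stays with the Data Owner
    mock: List[float]     # visible to Data Scientists


@dataclass
class ToyJob:
    """Hypothetical job: code a Data Scientist wants run on private data."""
    dataset_name: str
    code: Callable[[List[float]], float]
    status: str = "pending"
    result: Optional[float] = None


class ToyDataOwner:
    """Hypothetical Data Owner holding datasets and a queue of submitted jobs."""

    def __init__(self) -> None:
        self._datasets: Dict[str, ToyDataset] = {}
        self.jobs: List[ToyJob] = []

    def create_dataset(self, name: str, private: List[float], mock: List[float]) -> None:
        self._datasets[name] = ToyDataset(name, private, mock)

    def get_mock(self, name: str) -> List[float]:
        # Data Scientists only ever receive a copy of the mock data.
        return list(self._datasets[name].mock)

    def submit(self, job: ToyJob) -> ToyJob:
        self.jobs.append(job)
        return job

    def approve_and_run(self, job: ToyJob) -> None:
        # The Data Owner reviews the code, runs it on the private data,
        # and shares only the result.
        job.result = job.code(self._datasets[job.dataset_name].private)
        job.status = "shared"


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


owner = ToyDataOwner()
owner.create_dataset("ages", private=[30.0, 45.0, 54.0], mock=[30.0, 40.0, 50.0])

mock_preview = owner.get_mock("ages")     # Data Scientist explores mock data
job = owner.submit(ToyJob("ages", mean))  # Data Scientist submits code
owner.approve_and_run(job)                # Data Owner reviews, runs, shares
print(job.status, job.result)             # Data Scientist views the output
```

The point of the sketch is the separation of roles: the Data Scientist's code is developed against mock data, and only the Data Owner ever touches the private values.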
Private Dataset Storage
Storage Locations
Private datasets are stored in ~/.syftbox/private_datasets/<email>/<dataset-name>/ and are NEVER synced to the SyftBox relay server. This ensures true client-side privacy - your private data never leaves your machine.
Mock (public) datasets are stored in ~/SyftBox/datasites/<email>/public/datasets/ and ARE synced to the relay server, allowing other users to explore your dataset structure and submit job requests.
~/.syftbox/
    private_datasets/
        your-email@example.com/
            my-dataset/          # Your private data (NEVER synced anywhere)
                data.csv
                ...

~/SyftBox/
    datasites/
        your-email@example.com/
            public/
                datasets/
                    my-dataset/  # Your mock data (synced to the SyftBox server and other datasites)
                        mock_data.csv
                        README.md
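To make the two locations concrete, here is a minimal sketch that builds both paths from an email address and a dataset name. The helper names `private_dataset_dir` and `mock_dataset_dir` are hypothetical and not part of syft-rds; only the directory layout is taken from the description above.

```python
from pathlib import Path
from typing import Optional


def private_dataset_dir(email: str, dataset_name: str, home: Optional[Path] = None) -> Path:
    """Local-only location of a private dataset (hypothetical helper)."""
    home = home or Path.home()
    return home / ".syftbox" / "private_datasets" / email / dataset_name


def mock_dataset_dir(email: str, dataset_name: str, home: Optional[Path] = None) -> Path:
    """Synced location of the mock (public) dataset (hypothetical helper)."""
    home = home or Path.home()
    return home / "SyftBox" / "datasites" / email / "public" / "datasets" / dataset_name


if __name__ == "__main__":
    # Example values only; substitute your own email and dataset name.
    print(private_dataset_dir("your-email@example.com", "my-dataset"))
    print(mock_dataset_dir("your-email@example.com", "my-dataset"))
```

Printing both paths for one of your own datasets is a quick way to confirm which directory holds the private data and which one is synced.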
Migration from v0.4.x
⚠️ BREAKING CHANGE in syft-rds v0.5.0
If you have existing datasets created with syft-rds v0.4.x, you'll need to recreate them:
- Note your existing dataset names
- Upgrade to syft-rds v0.5.0+
- Re-create datasets using the same private data source
- The new version automatically stores private data in the new location (~/.syftbox/private_datasets/<email>/<dataset-name>/)
Old private data in datasites/<email>/private/ will not interfere and will be automatically cleaned up when you delete the datasets.
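For the first migration step, a short script can help collect the existing dataset names before upgrading. This is a sketch under two assumptions that the text above does not confirm: that the old datasites/<email>/private/ directory lives under ~/SyftBox/, and that each immediate subdirectory there corresponds to one dataset. The helper `list_subdirectories` is hypothetical; check the output against what you actually see on disk.

```python
from pathlib import Path
from typing import List


def list_subdirectories(directory: Path) -> List[str]:
    """Return the sorted names of the immediate subdirectories of `directory`.

    Returns an empty list if the directory does not exist.
    """
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumption: the v0.4.x private data sits under ~/SyftBox/datasites/<email>/private/.
    email = "your-email@example.com"  # example value, replace with your own
    legacy_dir = Path.home() / "SyftBox" / "datasites" / email / "private"
    for name in list_subdirectories(legacy_dir):
        print(name)
```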
Development
Running Tests
# Run all tests
just test
# Run specific test suites
just test-unit
just test-integration
just test-notebooks
Building
# Build the wheel package
just build
# Bump version (patch/minor/major)
just bump patch
Cleaning Up
Remove generated files and directories:
just clean
Available Commands
See all available commands:
just --list