Python SDK for discovering and using AI models across the SyftBox network

Syft Hub SDK

Build Federated RAG and tap into distributed data sources without centralizing knowledge.

LLMs often fail on domain-specific questions, not from lack of capability, but from missing access to expert data. RAG extends their reach with external context, but only if you already own the data.

🚀 Quick Start: Federated RAG

from syft_hub import Client

client = Client()

# Choose data sources from the network
hacker_news_source = client.load_service("demo@openmined.org/hacker-news")
arxiv_source = client.load_service("demo@openmined.org/arxiv-agents")
github_source = client.load_service("demo@openmined.org/trending-github")

# Choose an LLM to synthesize insights
claude_llm = client.load_service("aggregator@openmined.org/claude-3.5-sonnet")

# Create a Federated RAG pipeline
fedrag_pipeline = client.pipeline(
    data_sources=[hacker_news_source, arxiv_source, github_source],
    synthesizer=claude_llm
)

# Run your query across the network
query = "What methods can help improve context in LLM agents?"
result = fedrag_pipeline.run(messages=[{"role": "user", "content": query}])

print(result)

What just happened?

  • Each data source was queried on its own infrastructure (no data centralization)
  • Only relevant snippets were retrieved and shared
  • The LLM synthesized insights from multiple sources into one answer
  • Data owners maintained full control and privacy

📦 Installation

# Basic installation
pip install syft-hub

For Jupyter/Colab: Make sure Syft runtime is available:

!pip install syft-hub syft-installer
import syft_installer as si

# Make sure Syft runtime is running
si.install_and_run_if_needed()

Outside Jupyter/Colab: Download and run SyftBox, the distributed protocol.


💡 Why Federated RAG?

Traditional approaches to domain-specific AI have a fundamental flaw: data owners must hand over their raw data and lose control. This introduces legal, privacy, and intellectual property risks.

The result? Most organizations say no, and AI stays limited to public training data.

Federated RAG solves this by letting AI "walk the halls" and consult distributed data sources without centralizing knowledge:

  • 🔒 Privacy-preserving: Data stays where it belongs
  • 🌐 Distributed: Query multiple sources in one pipeline
  • Selective sharing: Only relevant snippets are returned
  • 🎯 Domain expertise: Access specialized knowledge networks

Think of it like a student gathering input from multiple teachers (biology, law, ethics professors) rather than studying alone—the result is far richer.


🎯 Core Concepts

1. Data Sources

Data sources are federated peers that own their data. They don't ship it to you — you query them at runtime.

# Load data sources from the network
hacker_news = client.load_service("demo@openmined.org/hacker-news")
arxiv_papers = client.load_service("demo@openmined.org/arxiv-agents")

2. Synthesizers (LLMs)

Synthesizers take insights from multiple data sources and combine them into coherent answers.

# Load an LLM for generation
claude = client.load_service("aggregator@openmined.org/claude-3.5-sonnet")

3. Pipelines

Pipelines orchestrate federated queries across data sources and route results to synthesizers.

# Create a pipeline
pipeline = client.pipeline(
    data_sources=[source1, source2, source3],
    synthesizer=llm
)

# Run queries
result = pipeline.run(messages=[{"role": "user", "content": "Your question"}])

🌐 How It Works

  1. Distributed Indexing: Each data source maintains its own private index (embeddings of their documents)
  2. Federated Query: When you run a pipeline, your query is broadcast to selected data sources
  3. Local Retrieval: Each source searches its own index and returns only the top-k most relevant snippets
  4. Aggregation: The pipeline collects all snippets and ranks them globally
  5. Synthesis: The LLM receives the best snippets and generates a grounded answer

Key insight: Raw data never leaves the source. Only relevant snippets are shared based on semantic similarity to your query.
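The five steps above can be sketched in plain Python. This toy uses word-overlap instead of real embeddings and runs everything in one process; the function and variable names are illustrative only and are not part of the SDK:

```python
# Toy sketch of federated retrieval: each source ranks its own
# documents locally and shares only its top-k snippets.

def score(query: str, doc: str) -> float:
    """Stand-in for semantic similarity: fraction of query words in doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def local_retrieve(docs: list[str], query: str, k: int = 2) -> list[tuple[float, str]]:
    """Runs on the data owner's side; raw docs never leave here."""
    ranked = sorted(((score(query, d), d) for d in docs), reverse=True)
    return ranked[:k]

sources = {
    "hacker-news": ["agents need better context windows", "new llm benchmark released"],
    "arxiv": ["retrieval augmented generation improves context", "diffusion models survey"],
}

query = "improve context in llm agents"

# Federated query: only top-k snippets per source are shared.
snippets = []
for name, docs in sources.items():
    snippets.extend(local_retrieve(docs, query, k=1))

# Aggregation: rank snippets globally before handing them to the LLM.
snippets.sort(reverse=True)
print([text for _, text in snippets])
```

The key property to notice: `local_retrieve` only ever returns snippets, so the full document store of each source stays private.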


📚 Examples

Example 1: Multi-Source Domain Expertise

Query specialized knowledge across different domains:

from syft_hub import Client

client = Client()

# Load domain-specific sources
medical_db = client.load_service("<medical_institution>/medical-research")
pharma_trials = client.load_service("<pharma_company>/clinical-trials")
patient_notes = client.load_service("<hospital_X>/doctor-notes")

# Load synthesizer
gpt4 = client.load_service("aggregator@openmined.org/gpt-4")

# Create pipeline
medical_rag = client.pipeline(
    data_sources=[medical_db, pharma_trials, patient_notes],
    synthesizer=gpt4
)

# Ask domain-specific questions
query = "Is drug X safe for diabetes patients with kidney disease?"
answer = medical_rag.run(messages=[{"role": "user", "content": query}])

print(answer)

Example 2: Single Source RAG

Not every use case needs multiple sources:

# Query a single specialized source
company_docs = client.load_service("yourcompany@example.com/internal-docs")
assistant = client.load_service("aggregator@openmined.org/claude-3.5-sonnet")

rag_pipeline = client.pipeline(
    data_sources=[company_docs],
    synthesizer=assistant
)

result = rag_pipeline.run(
    messages=[{"role": "user", "content": "What's our remote work policy?"}]
)

🔍 Service Discovery

Discover available data sources and LLMs on the network:

# List all services
client.list_services()

# List services by type
chat_services = client.list_services(
    service_type="chat",
    tags=["opensource"],
    max_cost=0.10
)

search_services = client.list_services(
    service_type="search",
    datasite="yourcompany@example.com"
)
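If each discovered service exposes fields like name, type, and cost (as the filters above suggest), you can narrow results further in ordinary Python. The list-of-dicts shape below is a hypothetical stand-in for whatever list_services actually returns:

```python
# Hypothetical service records; the real return type of
# client.list_services() may differ.
services = [
    {"name": "claude-3.5-sonnet", "type": "chat", "cost": 0.05},
    {"name": "gpt-4", "type": "chat", "cost": 0.12},
    {"name": "hacker-news", "type": "search", "cost": 0.0},
]

# Pick the cheapest chat service within a $0.10 budget.
affordable = [s for s in services if s["type"] == "chat" and s["cost"] <= 0.10]
cheapest = min(affordable, key=lambda s: s["cost"])
print(cheapest["name"])  # claude-3.5-sonnet
```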

💰 Accounting for Paid Models

Each new user receives $20 upon registration (though most services are free anyway!).

Preview and manage costs for federated queries:

# Setup accounting for paid services
await client.register_accounting(
    email="your@email.com",
    password="your_password"
)

# Check account balance
info = await client.get_account_info()
print(f"Balance: ${info['balance']}")

# Get cost estimate for multi-source RAG
estimate = pipeline.estimate_cost()

🏥 Health Monitoring

Monitor service availability and performance:

# Check single service health
status = await client.check_service_health("service-name", timeout=5.0)

# Start continuous monitoring
monitor = client.start_health_monitoring(
    services=["service1", "service2"],
    check_interval=30.0
)
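A continuous monitor like the one above can be approximated with a plain asyncio loop. Here check_service_health is replaced by a stand-in coroutine, so this is a sketch of the polling pattern, not the SDK's implementation:

```python
import asyncio

async def check_stub(service: str) -> bool:
    """Stand-in for client.check_service_health(); always reports healthy."""
    await asyncio.sleep(0)
    return True

async def monitor(services, checks=3, interval=0.01):
    """Poll each service `checks` times, `interval` seconds apart."""
    statuses = {}
    for _ in range(checks):
        for name in services:
            statuses[name] = await check_stub(name)
        await asyncio.sleep(interval)
    return statuses

statuses = asyncio.run(monitor(["service1", "service2"]))
print(statuses)
```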

📖 API Reference

Client Methods

| Method | Description |
| --- | --- |
| load_service(identifier) | Load a data source or LLM from the network |
| pipeline(data_sources, synthesizer) | Create a Federated RAG pipeline |
| list_services(service_type, ...) | Discover available services |
| chat("datasite/service_name", messages, ...) | Direct chat with an LLM |
| search("datasite/service_name", query, ...) | Search a data source |
| register_accounting(email, password, ...) | Register to use paid services |
| connect_accounting(email, password, ...) | Set up an existing account for paid services |

Pipeline Methods

| Method | Description |
| --- | --- |
| run(messages) | Execute a federated query and synthesize results |

Response Objects

ChatResponse:

response.message.content    # The AI's response
response.cost              # Cost of the request
response.usage.total_tokens # Tokens used
response.service           # Service name

SearchResponse:

response.results           # List of search results
response.cost              # Cost of the request
response.query             # Original query

# Individual result
result.content             # Document content
result.score              # Similarity score (0-1)
result.metadata           # Additional metadata
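Since each result carries a similarity score in [0, 1], a common pattern is to keep only high-confidence snippets before passing them to a synthesizer. The Result class below is a stand-in for the SDK's result objects, used only to illustrate the filtering:

```python
from dataclasses import dataclass

@dataclass
class Result:
    content: str
    score: float  # similarity in [0, 1]

results = [
    Result("relevant snippet", 0.91),
    Result("borderline snippet", 0.62),
    Result("noise", 0.18),
]

# Keep high-confidence snippets only, best first.
kept = sorted((r for r in results if r.score >= 0.5),
              key=lambda r: r.score, reverse=True)
print([r.content for r in kept])  # ['relevant snippet', 'borderline snippet']
```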

🤝 Contributing

Contributions are welcome! This SDK is part of the broader SyftBox ecosystem for privacy-preserving AI.


📄 License

See LICENSE file for details.
