Distributed MoE inference network — the load is split, friends help
Project description
Sawyer — Distributed MoE Inference Network
Status: Active prototype — Provider onboarding and APIs are evolving. Sawyer is under active development toward an alpha milestone.
"The load is split. Friends help."
Named for Tom Sawyer, who turned an impossible chore into a community effort by making participation irresistible. Sawyer turns GPU inference — a credit-draining trap — into a distributed network where each node carries a piece of the load, and everyone benefits.
Sawyer does not require providers to host full models. Providers host isolated MoE expert workloads that the router activates only when needed. That is why Sawyer is not just another distributed inference project — it distributes only the sparse, independently activated sub-networks that MoE architectures make possible.
Built on Bedrock for node identity, consent-gated routing, and auditability. Sawyer runs on Bedrock. Sawyer does not own Bedrock.
Use Cases
For the Developer with a Laptop
You're building an app that calls LLM APIs. GPT-4 costs $0.03/1K tokens. Claude Haiku is cheaper but still adds up. Sawyer gives you 500K tokens for $5/mo -- roughly $0.01/1K tokens -- with chat and code models available from day one.
bash / Git Bash:
pip install sawyer-core
sawyer run # One command, everything starts
sawyer run glm-5.1:cloud # Pick a model
PowerShell:
pip install sawyer-core
sawyer run # One command, everything starts
sawyer run glm-5.1:cloud # Pick a model
No GPU required. Your laptop connects to the network and inference happens on nodes that have the hardware. You just type.
For the Gamer with a 4090
Your gaming rig sits idle most of the day. Sawyer puts those GPUs to work. You host expert weights and serve inference to the network. The more you contribute, the more you earn -- your 4090 on Tier 4 earns 4x what a laptop on Tier 1 earns per token.
bash / Git Bash:
pip install sawyer-core
sawyer serve # Host experts, start earning
sawyer serve --model mixtral # Pick a model to serve
sawyer status # Check your earnings
PowerShell:
pip install sawyer-core
sawyer serve # Host experts, start earning
sawyer serve --model mixtral # Pick a model to serve
sawyer status # Check your earnings
Tier 4 (24GB+ VRAM) can host any model's experts. Tier 1 (4GB) can still participate with Qwen1.5-MoE experts. Everyone earns.
For the Small Team
Your startup needs inference but can't justify GPU costs. Subscribe at the Builder tier ($20/mo, 2M tokens) and route all your calls through Sawyer. Your inference cost drops by 3-6x compared to major API providers. No rate limits, no surprise bills.
For the Hobbyist
You want to experiment with local models but only have one GPU. Sawyer lets you download experts, serve them locally or to the network, and use the chat client to interact. Free tier with 50K tokens to start, or subscribe for more.
How You Earn
Sawyer is not volunteer computing. You get paid for the compute you contribute.
The Pool
Every quarter, 70% of all subscription revenue goes into the provider pool:
Provider Pool = Total Subscribers x $5/month x 3 months x 70%
With 100 subscribers:
- Revenue: $1,500/quarter
- Provider pool: $1,050
- Platform: $450
With 1,000 subscribers:
- Revenue: $15,000/quarter
- Provider pool: $10,500
- Platform: $4,500
The pool grows as the network grows. More subscribers means more money for everyone.
Distribution
The pool is split two ways:
| Share | What | Why |
|---|---|---|
| 90% | Throughput | Tokens served x tier multiplier — the work you do |
| 10% | Uptime | Just being available matters, even if traffic is low |
Hardware Tiers
Your earnings depend on what you bring to the network. A 4090 earns more per token than a 1060 because it can host larger, more valuable experts.
| Tier | VRAM | Multiplier | Can Host | Monthly Estimate* |
|---|---|---|---|---|
| Tier 1 | 4 GB | 1x | Qwen1.5-MoE experts (0.5 GB) | $5-15 |
| Tier 2 | 8 GB | 2x | + DeepSeek-V2 experts (0.8 GB) | $15-40 |
| Tier 3 | 12 GB | 3x | + Mixtral experts (1.5 GB) | $30-80 |
| Tier 4 | 24 GB+ | 4x | All experts, local models | $60-200+ |
*Estimates based on 100 subscribers, varies with network size and utilization.
Same 100K tokens served:
- Tier 1 laptop: 100K x 1x = 100K weighted
- Tier 4 monster: 100K x 4x = 400K weighted
The monster PC earns 4x for the same token count because it invested in hardware that can do more for the network.
Example: The Kid with the Monster PC
100 subscribers, one quarter:
| Provider | Tokens | Uptime | Tier | Weighted | Token Earnings | Uptime Earnings | Total |
|---|---|---|---|---|---|---|---|
| Monster PC (4090) | 500K | 720h | Tier 4 (4x) | 2,000,000 | $893 | $35 | $928 |
| Gaming rig (3060) | 200K | 500h | Tier 3 (3x) | 600,000 | $268 | $24 | $292 |
| Midrange (1060) | 50K | 300h | Tier 1 (1x) | 50,000 | $22 | $15 | $37 |
The kid with the 4090 doing most of the work gets the biggest slice. That's the point.
Payouts
- Quarterly: January-March, April-June, July-September, October-December
- Minimum payout: $25. Below that, your earnings roll over to next quarter
- Methods: Stripe Connect (primary), PayPal
- Failed payouts roll over — nobody loses money
- Providers must complete Stripe Connect onboarding to receive payouts. Earnings accumulate but cannot be disbursed until Stripe onboarding is complete. Run
sawyer provider onboarding <id>to get started.
Pricing
| Tier | Price | Tokens | Per 1K Tokens | Best For |
|---|---|---|---|---|
| Explorer | $5/mo | 500K | $0.01 | Prototyping, experimentation |
| Builder | $20/mo | 2M | $0.01 | Development, testing |
| Operator | $50/mo | 5M | $0.01 | Production workloads |
~3-6x cheaper than GPT-4, roughly in line with Claude Haiku pricing. No rate limits. No surprise bills. Token budget resets monthly, unused tokens roll over (max 1 month).
Supported Models
Use sawyer models to list available models, or filter by use case:
bash / Git Bash:
sawyer models # All models
sawyer models --use chat # Chat-focused models
sawyer models --use code # Code-focused models
PowerShell:
sawyer models # All models
sawyer models --use chat # Chat-focused models
sawyer models --use code # Code-focused models
| Model | Params | Experts | Active/Token | Q4 Size | Expert Size | Best For |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 46.7B | 8 | 2 | ~24 GB | ~1.5 GB | Chat, Code |
| DeepSeek-V2 Lite | 15.7B | 64 | 6 | ~9 GB | ~0.8 GB | Chat |
| Qwen1.5-MoE | 14.3B | 60 | 4 | ~7 GB | ~0.5 GB | Chat (lightweight) |
| DBRX Instruct | 132B | 16 | 4 | ~65 GB | ~2.5 GB | Code |
Architecture
[User/Client]
|
v
[Sawyer Router] <-- Bedrock identity, consent-gated routing
|
+--> [Node: Expert 0] (RTX 4090, Dallas) Tier 4 - 4x earnings
+--> [Node: Expert 2] (A100, Frankfurt) Tier 4 - 4x earnings
+--> [Node: Expert 5] (RTX 3060, Tokyo) Tier 3 - 3x earnings
+--> [Node: Expert 7] (GTX 1060, Sao Paulo) Tier 1 - 1x earnings
|
v
[Aggregated Output] --> User
Every node earns proportional to its hardware contribution. The 4090 in Dallas earns 4x per token because it can host the experts the network needs most.
Core Modules
1. sawyer/router/ — Expert Router
- Receives token embeddings from the user's local dense layers
- Routes to the correct expert(s) based on the model's gating network
- Aggregates expert outputs, returns to user
- Tracks latency per node, falls back to redundant experts on timeout
2. sawyer/node/ — Node Agent
- Registers with the network via Bedrock node identity
- Advertises capabilities: GPU model, VRAM, bandwidth, latency
- Hosts one or more expert weight files
- Serves inference requests via encrypted gRPC/QUIC
- Reports health and throughput to the router
3. sawyer/token/ — Token Economics
- $5/mo subscription grants 500K tokens
- Tokens debit per inference request (input + output tokens)
- Token budget resets monthly, rolls over unused tokens (max 1 month)
- Provider pool = subscribers x $5 x months x 70%
- 90% distributed by weighted throughput, 10% by uptime
- Quarterly payouts, $25 minimum, rollover below threshold
4. sawyer/provider/ — Provider Economics & Dashboard
- Node Tiers: 4 hardware tiers (4GB/8GB/12GB/24GB+) with 1x-4x earnings multipliers
- Revenue Pool: 70% of subscription revenue distributed to providers quarterly
- Quarterly Payout: Pool split by weighted contribution, $25 minimum, Stripe/PayPal
- Rollover: Below-threshold earnings carry forward — nobody loses money
- Dashboard: Real-time web UI at
http://localhost:8000/when serving — see tokens served, earnings, uptime, model breakdown, daily/weekly stats
5. sawyer/identity/ — Bedrock Integration
- Every node holds a Bedrock cryptographic identity
- Router verifies node certificates before routing
- Consent tokens gate which models a node will serve
- Audit chain logs every inference request for compliance
6. sawyer/model/ — Model Registry
- Catalog of supported MoE models tagged by use case (chat, code)
- Expert weight files versioned and checksummed
- Nodes download experts on registration or on-demand
- Supports Mixtral 8x7B, DeepSeek-V2, Qwen MoE, DBRX
Protocol
1. Node registers with Sawyer network
--> Bedrock identity issued (certificate, scope, audit chain)
--> Node advertises: GPU, VRAM, bandwidth, experts available
--> Tier classification: Tier 1-4 based on VRAM
2. User sends inference request
--> Sawyer router authenticates user (token balance check)
--> Router runs gating network locally to select experts
--> Router sends expert activation request to node(s)
--> Node validates consent token, runs expert forward pass
--> Node returns expert output, logs to audit chain
--> Router aggregates, returns to user
--> Token balance debited
3. Quarterly settlement
--> Provider pool = subscribers x $5 x 3 months x 70%
--> 90% distributed by weighted throughput (tokens x tier multiplier)
--> 10% distributed by uptime
--> Payouts >= $25 via Stripe Connect
--> Below $25 rolls over to next quarter
Installation
Requires Python 3.11 or later.
bash / Git Bash:
pip install sawyer-core
PowerShell:
pip install sawyer-core
For GPU inference (hosting expert nodes):
bash / Git Bash:
pip install sawyer-core[inference]
PowerShell:
pip install "sawyer-core[inference]"
Note: vllm and llama-cpp-python require CUDA and a C++ compiler. If installation fails, install them separately following their docs, then install sawyer-core without extras.
Or install from source for development:
bash / Git Bash:
git clone https://github.com/drc10101/sawyer.git
cd sawyer
pip install -e ".[dev]"
PowerShell:
git clone https://github.com/drc10101/sawyer.git
cd sawyer
pip install -e ".[dev]"
Running
After install, Sawyer can be run either way:
bash / Git Bash:
sawyer serve # if Python Scripts is on PATH
python -m sawyer serve # works everywhere, no PATH needed
PowerShell:
sawyer serve # if Python Scripts is on PATH
python -m sawyer serve # works everywhere, no PATH needed
One Command to Start Everything
sawyer run starts Ollama (if needed), the Sawyer router, and your agent -- one command, entire workflow:
bash / Git Bash:
sawyer run # Auto-detect best model, start everything
sawyer run glm-5.1:cloud # Use a specific model
sawyer run --no-agent # Start Sawyer only, don't launch agent
sawyer run --no-browser # Don't open browser
sawyer run --agent cursor # Launch Cursor instead of Hermes
PowerShell:
sawyer run # Auto-detect best model, start everything
sawyer run glm-5.1:cloud # Use a specific model
sawyer run --no-agent # Start Sawyer only, don't launch agent
sawyer run --no-browser # Don't open browser
sawyer run --agent cursor # Launch Cursor instead of Hermes
What sawyer run does:
- Detects Ollama -- starts it if not running
- Discovers models -- lists what's available with sizes
- Starts Sawyer router -- OpenAI-compatible API on port 8000
- Prints config -- exact copy-paste lines for your agent
- Opens browser -- chat UI at
http://localhost:8000 - Launches agent -- Hermes by default, configurable via
--agent
If you just want the chat UI and API:
bash / Git Bash:
sawyer chat # Web UI + OpenAI-compatible API
sawyer chat --ollama-bridge # Also serve local Ollama to the network
PowerShell:
sawyer chat # Web UI + OpenAI-compatible API
sawyer chat --ollama-bridge # Also serve local Ollama to the network
Provider Dashboard
When serving, Sawyer hosts a real-time dashboard at http://localhost:8000/. You'll see:
- Daily token chart — bar graph of tokens served over the last 7 days
- Weekly summary — total tokens, earnings, uptime, average daily tokens
- Daily breakdown table — date, tokens, requests, latency, uptime, earnings, errors
- Model breakdown — which models you served tokens for
- Expert breakdown — which expert shards you ran
- Tier badge — your hardware tier and earnings multiplier (1x-4x)
- Payout info — available balance, total earned, total paid, next payout date
API endpoints for programmatic access:
GET /api/stats — JSON stats for the current provider
GET /api/stats/{id} — JSON stats for a specific provider
The dashboard auto-refreshes every 30 seconds.
One-Click Install (Windows)
Double-click sawyer.bat in the repo, or run the PowerShell installer for a desktop shortcut and Start Menu entry:
irm https://infill.systems/install/sawyer.ps1 | iex
To uninstall: .\install_sawyer.ps1 -Uninstall
Why It Works
- MoE is more distributable than dense inference. Experts are independent sub-networks. Unlike tensor parallelism (which splits a single matrix across GPUs), each expert runs its own forward pass. MoE is more distributable than dense tensor-parallel inference because experts are independently activated, but Sawyer's core engineering challenge is keeping routing, expert execution, and aggregation fast enough to feel local.
- Sparsity means efficiency. Only ~25% of parameters activate per token on Mixtral. The network doesn't pay for dormant compute.
- Quantized models fit on consumer hardware. Q4_K_M Mixtral expert ~1.5GB. A 3090 can host 2-3 experts comfortably alongside other workloads.
- $5/mo is the sweet spot. Below the psychological barrier of "another subscription." Enough tokens to prototype, test, and run real workloads. Revenue sustains the network without extracting from users.
- Hardware investment is rewarded. Tier 4 (24GB+) earns 4x per token compared to Tier 1 (4GB). Your 4090 pays for itself.
Agent Integration
Sawyer exposes an OpenAI-compatible /v1/chat/completions endpoint. Any agent framework that supports custom OpenAI base URLs can use Sawyer as its LLM backend -- no SDK changes needed.
Quick Start
bash / Git Bash:
sawyer run # Starts Ollama + Sawyer + Hermes
PowerShell:
sawyer run # Starts Ollama + Sawyer + Hermes
Manual Configuration
If you prefer to configure agents manually:
Hermes (bash):
hermes config set model.base_url http://localhost:8000/v1
hermes config set model.provider openai_compatible
hermes config set model.default glm-5.1:cloud
Hermes (PowerShell):
hermes config set model.base_url "http://localhost:8000/v1"
hermes config set model.provider openai_compatible
hermes config set model.default "glm-5.1:cloud"
Claude Code (bash):
OPENAI_API_KEY=sawyer OPENAI_BASE_URL=http://localhost:8000/v1 claude
Claude Code (PowerShell):
$env:OPENAI_API_KEY="sawyer"; $env:OPENAI_BASE_URL="http://localhost:8000/v1"; claude
Cursor / Continue / Aider (bash):
export OPENAI_API_KEY=sawyer
export OPENAI_BASE_URL=http://localhost:8000/v1
Cursor / Continue / Aider (PowerShell):
$env:OPENAI_API_KEY="sawyer"
$env:OPENAI_BASE_URL="http://localhost:8000/v1"
Python (any shell):
from openai import OpenAI
client = OpenAI(
api_key="sawyer",
base_url="http://localhost:8000/v1",
)
response = client.chat.completions.create(
model="glm-5.1:cloud",
messages=[{"role": "user", "content": "Hello"}],
)
curl (bash):
curl http://localhost:8000/v1/chat/completions \
-d '{"model":"glm-5.1:cloud","messages":[{"role":"user","content":"hello"}]}' \
-H "Content-Type: application/json"
curl (PowerShell):
Invoke-RestMethod -Uri "http://localhost:8000/v1/chat/completions" `
-Method Post -ContentType "application/json" `
-Body '{"model":"glm-5.1:cloud","messages":[{"role":"user","content":"hello"}]}'
Supported Frameworks
Hermes, OpenClaw, Claude Code, Cursor, Continue, Aider, Cline, LangChain, LlamaIndex, CrewAI, AutoGPT, and any other OpenAI-compatible client.
Full integration guides with config examples for every framework: docs/agent-integration.md
Dependencies
- Bedrock (infill-bedrock): Node identity, consent tokens, audit chain
- vLLM / llama.cpp: Expert inference backend
- gRPC / QUIC: Low-latency inter-node communication
- Stripe: Subscription and host payout management
- HuggingFace Hub: Model weight distribution
License
BSL-1.1 — free for non-production use. Production use requires a paid license. Converts to Apache 2.0 after the change date.
Alpha milestone: Single-router, two-node demo with one toy MoE model — real node registration, real health checks, real routing logs, real tier-weighted earnings. Prove the network behavior first, then graduate to larger quantized MoE weights.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sawyer_core-0.4.0.tar.gz.
File metadata
- Download URL: sawyer_core-0.4.0.tar.gz
- Upload date:
- Size: 196.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6e1fbc667d30403c045768e368b8c86b4e6968d067c05083461a68b821367ff9
|
|
| MD5 |
5f5dc5d30bd45132cb92c9b0dcc29f45
|
|
| BLAKE2b-256 |
16547aeabb748dea5ccc7eb6db8eec279e82e503f164a354334d3223684a95e6
|
File details
Details for the file sawyer_core-0.4.0-py3-none-any.whl.
File metadata
- Download URL: sawyer_core-0.4.0-py3-none-any.whl
- Upload date:
- Size: 165.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
dff1e527f2e5d7ba9f1f928665eefae99f8f12c36571e8adacd35598cab28da2
|
|
| MD5 |
e1265b5052bbc8b5d7cc3662ca13b7d8
|
|
| BLAKE2b-256 |
b91612eba46e8a7c3e8368bcda5bf8c12d35f168d6e6a7924ee41db70cade8c1
|