Language Pipes
Peer-to-peer distributed inference for open-source language models
Language Pipes is an open-source distributed inference system, built on the Hugging Face transformers library, that splits large language model computation across multiple machines. By separating the model's text-handling components (embedding and output head) from its intermediate transformer layers, Language Pipes enables peer-to-peer inference.
Features
- OpenAI-compatible API
- Interactive setup wizard
- Automatic model download by Hugging Face model ID
- Privacy-oriented architecture with layered privacy mitigations
- Decentralized peer-to-peer network with optional AES encryption
How It Works
Language models process input through a sequence of transformer layers. Each layer performs matrix multiplications between learned weights and a hidden state tensor, passing the result to the next layer. Language Pipes distributes these layers across machines, splitting the memory cost across the network while keeping the text-handling components on the origin node.
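The layer split can be pictured as contiguous ranges of layer indices handed to nodes in pipeline order. The sketch below is a minimal illustration, assuming a uniform per-layer memory cost and a greedy fill; `NodeBudget` and `assign_layers` are hypothetical names, and Language Pipes' actual assignment logic may differ.

```python
from dataclasses import dataclass


@dataclass
class NodeBudget:
    """Hypothetical per-node memory budget, in GB."""
    node_id: str
    max_memory_gb: float


def assign_layers(
    num_layers: int, layer_size_gb: float, nodes: list[NodeBudget]
) -> list[tuple[str, int, int]]:
    """Greedily give each node a contiguous run of layers that fits its budget.

    Returns (node_id, first_layer, end_layer_exclusive) tuples in pipeline order.
    Assumes every layer costs the same amount of memory.
    """
    assignments = []
    next_layer = 0
    for node in nodes:
        if next_layer >= num_layers:
            break
        # Number of whole layers that fit in this node's budget.
        capacity = int(node.max_memory_gb // layer_size_gb)
        if capacity == 0:
            continue
        end = min(num_layers, next_layer + capacity)
        assignments.append((node.node_id, next_layer, end))
        next_layer = end
    if next_layer < num_layers:
        raise ValueError(
            f"only {next_layer} of {num_layers} layers fit in the combined budget"
        )
    return assignments
```

With 8 hypothetical layers of 0.25 GB each, a 1 GB node takes layers 0 to 3 and a 2 GB node takes the remaining four. Contiguous ranges keep network hops to one per node boundary, since the hidden state only has to leave a machine when the next layer lives elsewhere.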
Keeping the end models (embedding and output head) on the origin node provides architectural separation: layer models operate on continuous-valued hidden-state tensors rather than discrete text, while the end models keep text data on trusted systems. The privacy documentation provides a probabilistic threat model that quantifies the difficulty of known inversion attacks under various mitigation configurations.
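The data flow behind this separation can be sketched with a toy model: the origin node turns token ids into vectors and turns the final vector back into a token, while remote layers only ever receive lists of floats. Everything here (the 3-token vocabulary, the 2-dimensional hidden state, tied embedding and head weights, and the function names) is a hypothetical illustration, not Language Pipes' code. Note that continuous values are not private by themselves; that is what the inversion-attack threat model addresses.

```python
from typing import Callable

Hidden = list[list[float]]

# Hypothetical toy weights held only by the origin node (the end model).
EMBEDDINGS: dict[int, list[float]] = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}


def embed(token_ids: list[int]) -> Hidden:
    """Origin node: map discrete token ids to continuous vectors."""
    return [list(EMBEDDINGS[t]) for t in token_ids]


def output_head(hidden: Hidden) -> int:
    """Origin node: score the last position against each embedding (tied weights)."""
    last = hidden[-1]
    scores = {tok: sum(a * b for a, b in zip(vec, last)) for tok, vec in EMBEDDINGS.items()}
    return max(scores, key=scores.get)


def make_layer(w: list[list[float]]) -> Callable[[Hidden], Hidden]:
    """Remote node: a toy linear layer with a residual connection."""
    def layer(hidden: Hidden) -> Hidden:
        return [
            [h[i] + sum(w[i][j] * h[j] for j in range(len(h))) for i in range(len(h))]
            for h in hidden
        ]
    return layer


def run(token_ids: list[int], remote_layers: list[Callable[[Hidden], Hidden]], sent: list[Hidden]) -> int:
    """Run the pipeline, recording every payload that would cross the network."""
    hidden = embed(token_ids)
    for layer in remote_layers:
        sent.append(hidden)  # only float tensors leave the origin node
        hidden = layer(hidden)
    return output_head(hidden)
```

Calling `run([0, 1], [make_layer([[0.0, 1.0], [0.0, 0.0]])], sent)` returns token 2, and `sent` holds only the float vectors that were passed to the remote layer.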
Further reading:
- Architecture Overview: runtime components and inference flow
- Job Processor State Machine: how jobs traverse the distributed pipeline
Installation
Requires Python 3.10+. For GPU support, install the appropriate PyTorch version for your CUDA configuration:
https://pytorch.org/get-started/locally/
Install from pip:
pip install language-pipes
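Before installing, a quick check can confirm the Python 3.10+ requirement and whether the installed PyTorch build can see a CUDA GPU. This is a small standalone sketch, not part of Language Pipes; it uses only the standard library plus `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util
import sys


def check_environment() -> dict[str, object]:
    """Report Python version compliance and PyTorch/CUDA availability."""
    report: dict[str, object] = {
        "python_ok": sys.version_info >= (3, 10),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a machine with an NVIDIA GPU, the installed PyTorch is likely a CPU-only build; reinstall it using the selector at the PyTorch link above.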
Quick Start
The easiest way to get started is with the interactive setup wizard:
language-pipes
This launches a menu where you can create, view, and load configurations. Select Create Config to run the setup wizard and build your first configuration. After creating a config, select Load Config to start the server.
Configuration can also be specified via TOML files. See the CLI reference for details on loading configurations from the command line.
Two Node Example
This example shows how to distribute a model across two computers using the interactive wizard.
Node 1 (First Computer)
Start Language Pipes:
language-pipes
| Prompt | Value | Description |
|---|---|---|
| Node ID | node-1 | Unique identifier for this node on the network |
| Add layer models | Y | Starts the layer model editor |
| Model ID | Qwen/Qwen3-1.7B | Hugging Face model to load for layers |
| Device | cpu | Hardware to run inference on |
| Max memory | 1 | GB of RAM to use (loads part of the model) |
| Add end model | Y | Starts the end model editor |
| Model ID | Qwen/Qwen3-1.7B | Hugging Face model to load for the end model |
| Number of local layers | 1 | Ensures the first layer is loaded on this machine |
| Enable OpenAI API | Y | Exposes the OpenAI-compatible endpoint |
| API port | 8000 | Port for the API server |
| First node in network | Y | This node starts the network |
| Peer port | 5000 | Port for peer-to-peer communication |
| Network IP | [Your local IP] | IP address at which other nodes can reach this node |
| Encrypt network traffic | N | Disables encryption for simplicity |
| Advanced options | N | Logging, security, etc. |
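Before configuring the second computer, it can help to confirm that node-1's peer port is reachable over the network. This sketch assumes the peer port accepts TCP connections (the transport is not specified here) and uses only the standard library; `port_reachable` is a hypothetical helper, not a Language Pipes command.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, run `port_reachable("192.168.1.10", 5000)` on the second computer, substituting node-1's actual network IP. A False result usually points to a firewall rule or a wrong address rather than a Language Pipes problem.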
Node 2 (Second Computer)
Start Language Pipes:
language-pipes
| Prompt | Value | Description |
|---|---|---|
| Node ID | node-2 | Unique identifier for this node on the network |
| Add layer models | Y | Starts the layer model editor |
| Model ID | Qwen/Qwen3-1.7B | Hugging Face model to load for layers |
| Device | cpu | Hardware to run inference on |
| Max memory | 2 | GB of RAM to use (loads part of the model) |
| Add end model | N | Starts the end model editor |
| Enable OpenAI API | Y | Exposes the OpenAI-compatible endpoint |
| API port | 8000 | Port for the API server |
| First node in network | N | This node joins an existing network |
| Bootstrap Address | [node-1 IP] | IP address of node-1 |
| Peer port | 5000 | Port for peer-to-peer communication |
| Encrypt network traffic | N | Disables encryption for simplicity |
| Advanced options | N | Logging, security, etc. |
Node-2 connects to node-1 and loads the remaining model layers. The model is now distributed across both machines and ready for inference.
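One way to picture "the remaining model layers" is as a gap search over layer indices: given the ranges already hosted on the network, a joining node looks for indices nobody holds. This is a hypothetical sketch of the idea using half-open `[start, end)` ranges, not the project's actual protocol.

```python
def missing_ranges(num_layers: int, hosted: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return half-open [start, end) layer ranges not covered by any hosted range."""
    gaps = []
    cursor = 0  # first layer index not yet known to be covered
    for start, end in sorted(hosted):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < num_layers:
        gaps.append((cursor, num_layers))
    return gaps
```

For an 8-layer model where layers 0 to 1 and 5 to 7 are already hosted, `missing_ranges(8, [(0, 2), (5, 8)])` returns `[(2, 5)]`, so a new node would load layers 2, 3, and 4.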
Test the API
The model is accessible via the OpenAI-compatible API.
Example using the OpenAI Python library:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",  # replace 127.0.0.1 with node-1's IP when calling from another machine
    api_key="not-needed",  # Language Pipes does not require an API key
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-1.7B",
    max_completion_tokens=100,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about distributed systems."},
    ],
)
print(response.choices[0].message.content)
```
Install the OpenAI library with: pip install openai
See the OpenAI-compatible API documentation for the full endpoint reference and sampling parameter descriptions.
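If installing the OpenAI library is not an option, the same request can be built with the standard library alone. This sketch assumes the endpoint accepts the standard OpenAI chat-completions JSON body shown in the example above; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    model: str,
    messages: list[dict[str, str]],
    max_completion_tokens: int = 100,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` and parse the JSON body; in the standard OpenAI response shape the reply text is at `choices[0].message.content`.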
Supported Models
Language Pipes currently supports several model families, including Llama 3, Phi-4, Qwen3, and GLM-4.1V. See the list of tested models for the full set.
Planned Improvements
- Additional model architectures
- INT8 and INT4 quantization (currently all inference uses fp16)
- GGUF format support (currently requires safetensors)
- /v1/responses endpoint (currently only /v1/chat/completions)
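The quantization item matters because weight memory scales with bytes per parameter, which determines how many layers fit in each node's Max memory budget. A back-of-the-envelope sketch, assuming 2 bytes per weight in fp16, 1 in INT8, and 0.5 in INT4, and ignoring activations, KV cache, and quantization overhead such as scale factors:

```python
# Approximate storage per weight at each precision (INT8/INT4 are planned, not yet supported).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes) for a given parameter count."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9
```

For a hypothetical 2-billion-parameter model, `weight_memory_gb(2e9)` gives 4.0 GB in fp16, while `weight_memory_gb(2e9, "int4")` gives 1.0 GB, so the same memory budget could hold roughly four times as many layers.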
Download files
File details
Details for the file language_pipes-1.2.0.tar.gz.
File metadata
- Download URL: language_pipes-1.2.0.tar.gz
- Upload date:
- Size: 117.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a335f5a59679eca327aaa83f7818a2aec28a854fa7a4307cba51a6fca2c2a48 |
| MD5 | a476d31537259b953466c9f16f1aae58 |
| BLAKE2b-256 | e43e82ed3407a38109e873a06b0905b8f7b40854e89d9d2ed6b9537f04baf724 |
Provenance
The following attestation bundles were made for language_pipes-1.2.0.tar.gz:
- Publisher: publish.yml on erinclemmer/language-pipes
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: language_pipes-1.2.0.tar.gz
- Subject digest: 4a335f5a59679eca327aaa83f7818a2aec28a854fa7a4307cba51a6fca2c2a48
- Sigstore transparency entry: 976526206
- Sigstore integration time:
- Permalink: erinclemmer/language-pipes@ebe14dbd2a55d2db0f0092c9c5a3a61b17d32b3f
- Branch / Tag: refs/tags/1.2.0
- Owner: https://github.com/erinclemmer
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ebe14dbd2a55d2db0f0092c9c5a3a61b17d32b3f
- Trigger Event: release
File details
Details for the file language_pipes-1.2.0-py3-none-any.whl.
File metadata
- Download URL: language_pipes-1.2.0-py3-none-any.whl
- Upload date:
- Size: 87.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2edccb749cd4001d8b0ea77bd30c720a49af82e9b91a93d388d7dbc6d29adc1 |
| MD5 | 7e4d951a36a8512691d4ba09c0be145c |
| BLAKE2b-256 | 45499b2c05fccb4acd73fa0a0a700a9d9ffd1b7ad77e08232d1f0f008f9e0a76 |
Provenance
The following attestation bundles were made for language_pipes-1.2.0-py3-none-any.whl:
- Publisher: publish.yml on erinclemmer/language-pipes
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: language_pipes-1.2.0-py3-none-any.whl
- Subject digest: e2edccb749cd4001d8b0ea77bd30c720a49af82e9b91a93d388d7dbc6d29adc1
- Sigstore transparency entry: 976526207
- Sigstore integration time:
- Permalink: erinclemmer/language-pipes@ebe14dbd2a55d2db0f0092c9c5a3a61b17d32b3f
- Branch / Tag: refs/tags/1.2.0
- Owner: https://github.com/erinclemmer
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ebe14dbd2a55d2db0f0092c9c5a3a61b17d32b3f
- Trigger Event: release