
Peer-to-peer distributed inference for open-source language models


Language Pipes


Language Pipes is an open-source distributed inference system, built on the Hugging Face transformers library, that splits large language model computation across multiple machines. By separating the model's text-handling components (the embedding layer and output head) from its intermediate transformer layers, Language Pipes enables peer-to-peer inference.

Features

  • OpenAI-compatible API
  • Interactive setup wizard
  • Automatic model download by Hugging Face model ID
  • Privacy-oriented architecture with layered privacy mitigations
  • Decentralized peer-to-peer network with optional AES encryption

How It Works

Language models process input through a sequence of transformer layers. Each layer performs matrix multiplications between learned weights and a hidden state tensor, passing the result to the next layer. Language Pipes distributes these layers across machines, splitting the memory cost across the network while keeping the text-handling components on the origin node.

This design gives architectural separation: layer models operate on continuous-valued hidden-state tensors rather than discrete text, while the end models keep text data on trusted systems. The privacy documentation provides a probabilistic threat model that quantifies how difficult known inversion attacks are under various mitigation configurations.
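To make the split concrete, here is a toy sketch in plain Python. It is not Language Pipes' implementation: the 2-dimensional hidden state, the residual matrix-multiply "layers", the 3-token vocabulary, and the helper names (`assign_layers`, `run_pipeline`, `embed`, `head`) are all hypothetical. It shows the property the design relies on: running contiguous blocks of layers on different nodes, in order, yields the same hidden state as running every layer on one machine, while the embedding and output head stay on the origin node.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Callable[[Vector], Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    # One matrix multiplication: learned weights times the hidden state.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def make_layer(w: Matrix) -> Layer:
    # Toy stand-in for a transformer layer: matvec plus a residual add.
    # Real layers (attention + MLP) are far larger but keep the same
    # contract: hidden state in, hidden state of the same size out.
    def layer(h: Vector) -> Vector:
        return [hi + yi for hi, yi in zip(h, matvec(w, h))]
    return layer


def assign_layers(num_layers: int, capacities: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    # Hypothetical placement: give each node a contiguous [start, end) block
    # of layers, in network order, up to its capacity (counted in layers).
    ranges: Dict[str, Tuple[int, int]] = {}
    start = 0
    for node, cap in capacities.items():
        end = min(num_layers, start + cap)
        if end > start:
            ranges[node] = (start, end)
        start = end
    if start < num_layers:
        raise ValueError(f"only {start} of {num_layers} layers fit on the network")
    return ranges


def run_pipeline(h: Vector, layers: List[Layer], ranges: Dict[str, Tuple[int, int]]) -> Vector:
    # The hidden state hops node to node; each node runs only its own block.
    # In a real deployment each hop would be a network send, not a loop step.
    for start, end in ranges.values():
        for layer in layers[start:end]:
            h = layer(h)
    return h


# Origin node: text-handling pieces (toy 3-token vocabulary, 2-dim hidden state).
EMBED: Matrix = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]


def embed(token_id: int) -> Vector:
    # Token id -> hidden state; only the origin node ever sees token ids.
    return list(EMBED[token_id])


def head(h: Vector) -> int:
    # Score each vocabulary entry against the final hidden state; pick the best.
    scores = matvec(EMBED, h)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    layers = [make_layer([[0.1 * i, 0.0], [0.0, -0.1 * i]]) for i in range(6)]
    ranges = assign_layers(len(layers), {"node-1": 2, "node-2": 4})
    print(ranges)  # {'node-1': (0, 2), 'node-2': (2, 6)}
    print(head(run_pipeline(embed(2), layers, ranges)))  # 0
```

Contiguous blocks are the natural choice here because each layer depends only on the previous layer's output, so the hidden state can cross the network once per node rather than once per layer.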


Installation

Requires Python 3.10+. For GPU support, install the appropriate PyTorch version for your CUDA configuration:
https://pytorch.org/get-started/locally/

Install from pip:

pip install language-pipes
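After installing, a quick check can confirm that the interpreter meets the Python 3.10 requirement and whether your PyTorch build can see a GPU. This is a convenience sketch, not part of Language Pipes; `check_environment` is a hypothetical helper name, and it only uses standard `sys` and `importlib` calls plus `torch.cuda.is_available()`.

```python
import importlib.util
import sys
from typing import Optional, Tuple


def check_environment() -> Tuple[bool, Optional[str], bool]:
    # Language Pipes requires Python 3.10 or newer.
    python_ok = sys.version_info >= (3, 10)
    # Report a missing PyTorch install instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return python_ok, None, False
    import torch
    # True only if this PyTorch build has GPU support and a device is visible.
    return python_ok, torch.__version__, torch.cuda.is_available()


if __name__ == "__main__":
    python_ok, torch_version, gpu = check_environment()
    print(f"Python >= 3.10: {python_ok}")
    print(f"PyTorch version: {torch_version or 'not installed'}")
    print(f"GPU available to PyTorch: {gpu}")
```

If the last line prints `False` on a machine with a GPU, the installed PyTorch wheel is most likely a CPU-only build; reinstall it using the selector linked above.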

Quick Start

The easiest way to get started is with the interactive setup wizard:

language-pipes

This launches a menu where you can create, view, and load configurations. Select Create Config to walk through the setup wizard, which guides you through your first configuration. After creating a config, select Load Config to start the server.

Configuration can also be specified via TOML files. See the CLI reference for details on loading configurations from the command line.


Two Node Example

This example shows how to distribute a model across two computers using the interactive wizard.

Node 1 (First Computer)

Start Language Pipes:

language-pipes
| Prompt | Value | Description |
|---|---|---|
| Node ID | node-1 | Unique identifier for this node on the network |
| Add layer models | Y | Starts the layer model editor |
| Model ID | Qwen/Qwen3-1.7B | Hugging Face model to load for layers |
| Device | cpu | Hardware to run inference on |
| Max memory | 1 | GB of RAM to use (loads part of the model) |
| Add end model | Y | Starts the end model editor |
| Model ID | Qwen/Qwen3-1.7B | Hugging Face model to load for the end model |
| Number of local layers | 1 | Ensures the first layer is loaded on this machine |
| Enable OpenAI API | Y | Exposes the OpenAI-compatible endpoint |
| API port | 8000 | Port for the API server |
| First node in network | Y | This node starts the network |
| Peer port | 5000 | Port for peer-to-peer communication |
| Network IP | [Your local IP] | IP address other nodes use to reach this node |
| Encrypt network traffic | N | Disables encryption for simplicity |
| Advanced options | N | Logging, security, etc. |

Node 2 (Second Computer)

Start Language Pipes with this command:

language-pipes
| Prompt | Value | Description |
|---|---|---|
| Node ID | node-2 | Unique identifier for this node on the network |
| Add layer models | Y | Starts the layer model editor |
| Model ID | Qwen/Qwen3-1.7B | Hugging Face model to load for layers |
| Device | cpu | Hardware to run inference on |
| Max memory | 2 | GB of RAM to use (loads part of the model) |
| Add end model | N | Skips the end model editor; node-1 already hosts the end model |
| Enable OpenAI API | Y | Exposes the OpenAI-compatible endpoint |
| API port | 8000 | Port for the API server |
| First node in network | N | This node joins an existing network |
| Bootstrap address | [node-1 IP] | IP address of node-1 |
| Peer port | 5000 | Port for peer-to-peer communication |
| Encrypt network traffic | N | Disables encryption for simplicity |
| Advanced options | N | Logging, security, etc. |

Node-2 connects to node-1 and loads the remaining model layers. The model is now distributed across both machines and ready for inference.
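A back-of-the-envelope calculation shows how two small memory budgets can hold the whole model. This sketch rests on two assumptions: that "Max memory" roughly bounds the fp16 weights of the layers a node loads (the wizard only says it is the GB of RAM to use), and that Qwen3-1.7B has the decoder dimensions below, which we believe match its published `config.json` but you should verify. `LayerShape`, `params_per_layer`, and `layers_that_fit` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerShape:
    # Assumed Qwen3-1.7B decoder dimensions; check them against config.json.
    hidden: int = 2048
    intermediate: int = 6144
    num_heads: int = 16
    num_kv_heads: int = 8
    head_dim: int = 128
    num_layers: int = 28


def params_per_layer(s: LayerShape) -> int:
    q_dim = s.num_heads * s.head_dim
    kv_dim = s.num_kv_heads * s.head_dim
    # Attention projections: Q, K and V in, O out (grouped-query: K/V are narrower).
    attn = s.hidden * q_dim + 2 * s.hidden * kv_dim + q_dim * s.hidden
    # Gated MLP: gate and up projections, then the down projection.
    mlp = 3 * s.hidden * s.intermediate
    # Norm weights are omitted; they are tiny by comparison.
    return attn + mlp


def layers_that_fit(budget_gib: float, s: LayerShape, bytes_per_param: int = 2) -> int:
    # fp16 stores 2 bytes per parameter; "GB" is taken as 2**30 bytes here.
    # Counts weights only: KV cache, activations and runtime overhead
    # need extra headroom in practice.
    layer_bytes = params_per_layer(s) * bytes_per_param
    return int(budget_gib * 1024**3 // layer_bytes)


if __name__ == "__main__":
    shape = LayerShape()
    print(params_per_layer(shape))    # 50331648
    print(layers_that_fit(1, shape))  # 10 (node-1, Max memory = 1)
    print(layers_that_fit(2, shape))  # 21 (node-2, Max memory = 2)
```

Under these assumptions each layer holds about 50.3M parameters (96 MiB in fp16), so node-1's budget covers about 10 layers and node-2's about 21, more than the 28 layers assumed above. The embedding and output head loaded by the end model are not counted in this estimate.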

Test the API

The model is accessible via the OpenAI-compatible API.

Example using the OpenAI Python library:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",  # use node-1's IP address when calling from another machine
    api_key="not-needed"  # API key not required for Language Pipes
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-1.7B",
    max_completion_tokens=100,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about distributed systems."}
    ]
)

print(response.choices[0].message.content)

Install the OpenAI library with: pip install openai

See the OpenAI-compatible API documentation for the full endpoint reference and sampling parameter descriptions.
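If you would rather not install the `openai` package, the same request can be sent with only the standard library. This is a hedged sketch: it assumes the server accepts the same JSON body the OpenAI client sends to `/v1/chat/completions` (the endpoint the project lists as supported), and `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_chat_request(base_url: str, model: str, messages: List[Dict[str, str]],
                       max_completion_tokens: int = 100) -> urllib.request.Request:
    # Same JSON body as the OpenAI client example: POST {base_url}/chat/completions.
    body = {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, messages: List[Dict[str, str]]) -> str:
    req = build_chat_request(base_url, model, messages)
    # Distributed CPU inference can be slow, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        payload: Dict[str, Any] = json.load(resp)
    # OpenAI-style response: the reply text is in choices[0].message.content.
    return payload["choices"][0]["message"]["content"]


# Example (requires the two-node setup above to be running):
# print(chat("http://127.0.0.1:8000/v1", "Qwen/Qwen3-1.7B",
#            [{"role": "user", "content": "Write a haiku about distributed systems."}]))
```

Separating request construction from sending keeps the payload easy to inspect or log without a running server.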

Supported Models

Language Pipes currently supports several model families, including Llama 3, Phi-4, Qwen3, and GLM-4.1V. See the project documentation for the full list of tested models.

Planned Improvements

  • Additional model architectures
  • INT8 and INT4 quantization (currently all inference uses fp16)
  • GGUF format support (currently requires safetensors)
  • /v1/responses endpoint (currently only /v1/chat/completions)


Download files


Source Distribution

language_pipes-1.2.0.tar.gz (117.8 kB)

Uploaded Source

Built Distribution


language_pipes-1.2.0-py3-none-any.whl (87.3 kB)

Uploaded Python 3

File details

Details for the file language_pipes-1.2.0.tar.gz.

File metadata

  • Download URL: language_pipes-1.2.0.tar.gz
  • Upload date:
  • Size: 117.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for language_pipes-1.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a335f5a59679eca327aaa83f7818a2aec28a854fa7a4307cba51a6fca2c2a48 |
| MD5 | a476d31537259b953466c9f16f1aae58 |
| BLAKE2b-256 | e43e82ed3407a38109e873a06b0905b8f7b40854e89d9d2ed6b9537f04baf724 |
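To check a downloaded archive against the SHA256 digest above before installing, a few lines of `hashlib` suffice. This is a generic sketch, not a Language Pipes tool; `sha256_of` and `verify` are hypothetical names, and the download path is an assumption. pip can also enforce hashes itself through its hash-checking mode.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for language_pipes-1.2.0.tar.gz.
EXPECTED_SDIST_SHA256 = "4a335f5a59679eca327aaa83f7818a2aec28a854fa7a4307cba51a6fca2c2a48"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in 1 MiB chunks so large files never need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    sdist = Path("language_pipes-1.2.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found")
```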


Provenance

The following attestation bundles were made for language_pipes-1.2.0.tar.gz:

Publisher: publish.yml on erinclemmer/language-pipes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file language_pipes-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: language_pipes-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 87.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for language_pipes-1.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2edccb749cd4001d8b0ea77bd30c720a49af82e9b91a93d388d7dbc6d29adc1 |
| MD5 | 7e4d951a36a8512691d4ba09c0be145c |
| BLAKE2b-256 | 45499b2c05fccb4acd73fa0a0a700a9d9ffd1b7ad77e08232d1f0f008f9e0a76 |


Provenance

The following attestation bundles were made for language_pipes-1.2.0-py3-none-any.whl:

Publisher: publish.yml on erinclemmer/language-pipes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
