
A decentralized gossip learning framework for P2P edge intelligence


QuinkGL: Decentralized Gossip Learning Framework


QuinkGL is a fully decentralized, peer-to-peer (P2P) federated learning framework that enables collaborative model training across distributed devices without relying on a central parameter server. Built on gossip-based protocols, QuinkGL addresses the core challenges of decentralized learning: communication efficiency, non-IID data heterogeneity, and Byzantine fault tolerance.


Motivation

Centralized federated learning (FL) architectures such as FedAvg [McMahan et al., 2017] depend on a parameter server for global aggregation, introducing a single point of failure and a communication bottleneck. As edge computing scales — driven by IoT proliferation and privacy-sensitive domains like healthcare — decentralized alternatives become essential.

QuinkGL draws from the gossip learning paradigm [Ormándi et al., 2013], where nodes exchange model updates directly with randomly selected peers. This eliminates server dependency and enables organic convergence through repeated local interactions. The framework extends this foundation with:

  • Data-aware peer selection via privacy-preserving fingerprints
  • Entropy-weighted aggregation inspired by RNEP [Kang & Lee, 2024]
  • Byzantine-resilient strategies including Krum [Blanchard et al., 2017] and TrimmedMean
  • Pluggable architecture for topology, aggregation, and model strategies

Key Features

| Feature | Description |
|---|---|
| Fully Decentralized | No central server — pure P2P gossip protocol |
| Non-IID Resilient | AffinityTopology + EntropyWeightedAvg + FedProx + SCAFFOLD for heterogeneous data |
| Privacy-Preserving Fingerprints | Quantized, noised data summaries for peer matching (ε-DP ready) |
| Byzantine Fault Tolerance | Krum, MultiKrum, TrimmedMean aggregation strategies |
| NAT Traversal | IPv8 with UDP hole punching + automatic tunnel fallback |
| Framework Agnostic | PyTorch, TensorFlow, or custom model wrappers |
| Swarm Manifest | Cryptographic commitment to the training protocol (model, aggregation, topology) |
| Personalized FL | APFL adaptive mixing, FedRep-style backbone/head split |
| Staleness-Aware | StalenessWeightedFedAvg for asynchronous environments |
| Variance Reduction | SCAFFOLD with gossip-adapted control variates (Karimireddy et al., 2020) |
| Error Feedback | Residual buffer for biased compressors — convergence-guaranteed Top-k/quantization |
| Spectral Analysis | Runtime algebraic connectivity (λ₂) and spectral gap measurement for topology evaluation |
| Observability | Event-driven telemetry with terminal rendering |

Installation

pip install quinkgl

For development:

git clone https://github.com/aliseyhann/QuinkGL-Gossip-Learning-Framework.git
cd QuinkGL-Gossip-Learning-Framework
pip install -e .

Quick Start

import asyncio
import torch.nn as nn
from quinkgl import GossipNode, PyTorchModel, AffinityTopology, EntropyWeightedAvg

# 1. Define your model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(self.relu(self.fc1(x)))

# 2. Wrap the model
model = PyTorchModel(SimpleNet(), device="cpu")

# 3. Create and run the node
async def main():
    node = GossipNode(
        node_id="alice",
        domain="mnist",
        model=model,
        port=7000,
        topology=AffinityTopology(min_affinity=0.3),
        aggregation=EntropyWeightedAvg(),
    )

    await node.start()
    await node.run_continuous(training_data)  # training_data: your node's local dataset
    await node.shutdown()

asyncio.run(main())

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                          GossipNode                              │
│    (Production-ready node with P2P networking + fallback)        │
├──────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌────────────────┐  ┌──────────────────────┐ │
│  │ PyTorchModel │  │ RandomTopology │  │      FedAvg          │ │
│  │ TensorFlow   │  │ CyclonTopology │  │ FedProx  │ FedAvgM  │ │
│  │ CustomModel  │  │ AffinityTopol. │  │ Krum │ TrimmedMean  │ │
│  │              │  │                │  │ EntropyWeightedAvg   │ │
│  │              │  │                │  │ StalenessWeighted    │ │
│  └──────────────┘  └────────────────┘  └──────────────────────┘ │
├──────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────────────┐  │
│  │    DataFingerprint ─► AffinityScore ─► Peer Selection     │  │
│  │    (Privacy-preserving data distribution summaries)       │  │
│  └────────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────────────┐  │
│  │           ModelAggregator (Train → Gossip → Aggregate)    │  │
│  └────────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────────────┐  │
│  │         IPv8 Network Layer + Tunnel Fallback              │  │
│  │      (P2P, NAT Traversal, UDP Hole Punching, Relay)      │  │
│  └────────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────────────┐  │
│  │    Observability: EventEmitter → TelemetryClient          │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

Project Structure

QuinkGL/
├── src/quinkgl/
│   ├── core/                  # LearningNode (network-agnostic abstraction)
│   ├── models/                # PyTorch, TensorFlow, personalized model wrappers
│   ├── topology/              # RandomTopology, CyclonTopology, AffinityTopology, SpectralAnalyzer
│   ├── aggregation/           # FedAvg, FedProx, FedAvgM, Krum, TrimmedMean,
│   │                          # EntropyWeightedAvg, StalenessWeightedFedAvg, Scaffold
│   ├── fingerprint/           # DataFingerprint, AffinityWeights, FingerprintComputer
│   ├── manifest/              # SwarmManifest, DataPolicy, CollaborationPolicy
│   ├── gossip/                # Protocol primitives, ModelAggregator orchestration
│   ├── network/               # GossipNode, IPv8 manager, gossip community
│   ├── training/              # Convergence monitoring, prototype-based alignment
│   ├── serialization/         # Model weight serialization, compression pipeline, Error Feedback
│   ├── storage/               # Model checkpointing
│   ├── observability/         # EventEmitter, RuntimeEvent, TerminalObserver
│   ├── telemetry/             # TelemetryClient
│   └── utils/                 # Shared utilities
├── tests/                     # 364+ unit tests
└── docs/                      # Deployment guides, research notes

Package Responsibilities

| Package | Responsibility |
|---|---|
| core | Public node abstraction without transport concerns |
| gossip | Round orchestration and protocol primitives |
| network | IPv8 transport, NAT traversal, and wire delivery |
| aggregation | Model merge strategies (pluggable) |
| topology | Peer selection, partial-view management, spectral analysis |
| fingerprint | Privacy-preserving data distribution summaries |
| manifest | Cryptographic swarm identity and policy declaration |
| training | Convergence monitoring, prototype alignment (FedProto/FedPAC) |
| serialization | Model weight serialization, compression pipeline, error feedback |
| observability | Event-driven runtime telemetry |

Topology Strategies

QuinkGL provides pluggable peer selection strategies that determine which peers to exchange models with each round.

| Strategy | Approach | Literature |
|---|---|---|
| RandomTopology | Uniform random peer selection | Ormándi et al., 2013 |
| CyclonTopology | Periodic shuffling for network exploration | Voulgaris et al., 2005 |
| AffinityTopology | Data-aware peer selection via fingerprint similarity with exploration–exploitation balancing | Domain-aware collaboration (this work) |
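
A minimal sketch of swapping strategies, assuming RandomTopology and CyclonTopology are importable from quinkgl.topology like SpectralAnalyzer below (only min_affinity is documented in this description, so other constructor arguments are not shown):

from quinkgl import GossipNode, AffinityTopology
from quinkgl.topology import RandomTopology, CyclonTopology  # assumed import path

# Any strategy can be passed as the node's `topology` argument:
node = GossipNode(
    node_id="alice",
    domain="mnist",
    model=model,                # a wrapped model as in the Quick Start
    port=7000,
    topology=CyclonTopology(),  # or RandomTopology(), or AffinityTopology(min_affinity=0.3)
)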

Spectral Analysis

The SpectralAnalyzer provides runtime measurement of topology quality through algebraic connectivity and spectral gap — quantities that directly determine gossip convergence speed [Koloskova et al., 2020].

from quinkgl.topology import SpectralAnalyzer, build_ring_adjacency

analyzer = SpectralAnalyzer()
report = analyzer.analyze(build_ring_adjacency(10))
print(report.summary())
# n=10 e=10 λ₂=0.3820 gap=0.1315 connected=True mix_time≤17.5

| Metric | Meaning |
|---|---|
| algebraic_connectivity (λ₂) | Fiedler value — positive ↔ connected graph |
| spectral_gap (1−\|λ₂(W)\|) | Larger gap → faster gossip convergence |
| mixing_time_upper | Upper bound: log(n) / spectral_gap |
| is_connected | Whether the graph is fully connected |
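
For the ring example above, the mixing-time bound is plain arithmetic and can be checked independently of the library:

import math

n, spectral_gap = 10, 0.1315
bound = math.log(n) / spectral_gap    # ln(10) / 0.1315
print(f"mix_time ≤ {bound:.1f}")      # 17.5, matching the report summary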

AffinityTopology — Like-Attracts-Like

AffinityTopology selects peers based on data distribution similarity using privacy-preserving fingerprints. It incorporates:

  • Multi-signal affinity — label buckets (40%), feature moments (30%), gradient similarity (15%), collaboration history (15%); combined as sketched below
  • Cold-start resilience — three phases (blind → learning → exploiting) with decaying exploration ratio
  • Adaptive collaboration graph — EMA-blended edge weights with automatic decay and eviction of stale edges
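
A sketch of the weighted combination (illustrative only: the signal names and scoring function are assumptions, not AffinityTopology's internals):

# Hypothetical per-signal similarities, each normalized to [0, 1]
AFFINITY_WEIGHTS = {
    "label_buckets": 0.40,
    "feature_moments": 0.30,
    "gradient_similarity": 0.15,
    "collaboration_history": 0.15,
}

def affinity_score(signals):
    """Weighted sum of per-signal similarity scores."""
    return sum(AFFINITY_WEIGHTS[name] * signals.get(name, 0.0)
               for name in AFFINITY_WEIGHTS)

print(affinity_score({"label_buckets": 0.9, "feature_moments": 0.7,
                      "gradient_similarity": 0.5, "collaboration_history": 0.8}))
# 0.36 + 0.21 + 0.075 + 0.12 = 0.765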

Communication Efficiency — Error Feedback

QuinkGL's compression pipeline (Delta → Sparsify → Quantize → Serialize → Zlib) uses biased compressors (Top-k, QSGD). Without correction, these break convergence guarantees. The ErrorFeedbackState module implements the error feedback mechanism [Alistarh et al., 2018], which accumulates the compression residual and re-injects it in the next round:

from quinkgl.serialization import CompressionConfig, SparsificationConfig

config = CompressionConfig(
    sparsification=SparsificationConfig(top_k_ratio=0.01),
    error_feedback=True,   # enable EF: residuals re-injected next round, preserving convergence
)

Key property: Over K rounds, Σ compressed_outputs + final_residual = Σ raw_deltas (information conservation, verified by unit tests). Supports EF21-style momentum blending and optional residual norm capping.
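
A self-contained NumPy sketch of the loop (illustrative, not the ErrorFeedbackState implementation), showing both the residual buffer and the conservation identity:

import numpy as np

def top_k(x, k):
    """Biased Top-k compressor: keep only the k largest-magnitude entries."""
    idx = np.argsort(np.abs(x))[-k:]
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
residual = np.zeros(8)
deltas = [rng.standard_normal(8) for _ in range(5)]
sent = []
for delta in deltas:
    corrected = delta + residual         # re-inject last round's residual
    compressed = top_k(corrected, k=2)   # only this is transmitted
    residual = corrected - compressed    # buffer what the compressor dropped
    sent.append(compressed)

# Information conservation: Σ compressed_outputs + final_residual == Σ raw_deltas
assert np.allclose(sum(sent) + residual, sum(deltas))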

Aggregation Strategies

All strategies implement the AggregationStrategy interface and are hot-swappable.

| Strategy | Type | Description | Reference |
|---|---|---|---|
| FedAvg | Standard | Weighted averaging by sample count | McMahan et al., 2017 |
| FedProx | Non-IID | Proximal term to limit client drift | Li et al., 2020 |
| FedAvgM | Stability | Server momentum for smoother convergence | Hsu et al., 2019 |
| EntropyWeightedAvg | Non-IID | Shannon entropy–based weighting (RNEP-inspired) | Kang & Lee, 2024 |
| StalenessWeightedFedAvg | Async | Exponential penalty for stale updates | |
| Scaffold | Non-IID | Control-variate drift correction (gossip variant) | Karimireddy et al., 2020 |
| TrimmedMean | Byzantine | Trim extreme values before averaging | Yin et al., 2018 |
| Krum / MultiKrum | Byzantine | Select most central update(s) | Blanchard et al., 2017 |
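
Because every strategy implements the same interface, swapping is a one-line change in the node constructor (the import path for the Byzantine strategies is assumed to mirror the package layout above):

from quinkgl import GossipNode
from quinkgl.aggregation import Krum, TrimmedMean  # assumed import path

# Same constructor as the Quick Start; only the strategy object changes.
node = GossipNode(node_id="alice", domain="mnist", model=model, port=7000,
                  aggregation=TrimmedMean())  # or Krum(), EntropyWeightedAvg(), ...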

EntropyWeightedAvg — RNEP-Inspired Aggregation

Weights each peer's contribution by the Shannon entropy of its local label distribution. Peers with diverse (high-entropy) data exert more influence on the aggregated model, while skewed (low-entropy) peers contribute less — preventing overfitting to biased local distributions.

from quinkgl import EntropyWeightedAvg

aggregation = EntropyWeightedAvg(
    entropy_floor=0.01,    # minimum weight for single-class peers
    fallback_weight=1.0,   # weight when no distribution metadata available
)
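
A worked example of the rule (plain Python arithmetic, not the strategy's internals):

import math

def shannon_entropy(dist):
    """H(p) = -Σ p_i ln p_i over a label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

label_dists = {
    "alice": [0.25, 0.25, 0.25, 0.25],  # diverse: H = ln 4 ≈ 1.386
    "bob":   [0.97, 0.01, 0.01, 0.01],  # skewed:  H ≈ 0.168
}
weights = {p: max(shannon_entropy(d), 0.01)  # entropy_floor guards single-class peers
           for p, d in label_dists.items()}
total = sum(weights.values())
print({p: round(w / total, 2) for p, w in weights.items()})
# {'alice': 0.89, 'bob': 0.11}: the diverse peer dominates the average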

Scaffold — Variance Reduction via Control Variates

Implements the SCAFFOLD algorithm [Karimireddy et al., 2020] adapted for gossip topology. Each node maintains a control variate that estimates its local gradient drift. The gossip variant replaces the central server's global control variate with a running EMA of peer control variates.

from quinkgl import Scaffold

aggregation = Scaffold(
    learning_rate=0.01,       # local SGD learning rate
    global_learning_rate=1.0, # aggregation-side scaling
    control_momentum=0.0,     # EMA momentum for the peer control-variate estimate (0.0 = plain average)
)

Key property: SCAFFOLD provably reduces the gradient variance caused by non-IID data, unlike FedProx, which only adds a proximal penalty.
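
In outline, the correction subtracts the estimated local drift from each gradient step. A small sketch of the two ingredients described above (names and signatures are illustrative, not the library's API):

import numpy as np

def scaffold_local_step(w, grad, c_local, c_global, lr=0.01):
    """Drift-corrected SGD step: w - lr * (grad - c_i + c)."""
    return w - lr * (grad - c_local + c_global)

def gossip_global_control(c_global, c_peer, momentum=0.9):
    """Gossip variant: an EMA over received peer control variates
    stands in for the central server's global control variate."""
    return momentum * c_global + (1 - momentum) * c_peer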


Privacy-Preserving Data Fingerprints

Each node computes a lightweight, privacy-preserving summary of its local data distribution. Raw statistics are never shared — all fields are transformed before transmission.

| Raw Field | Privacy Transform | Output |
|---|---|---|
| Label distribution | Quantize into buckets (low/medium/high) | label_buckets |
| Feature moments (mean, var) | Add calibrated Gaussian noise | noised_moments |
| Sample count | Bucket into ranges (e.g., "1k–10k") | sample_bucket |
| Gradient moments | Noise + disabled by default (gradient inversion risk) | gradient_moments |

Fingerprints are exchanged during peer discovery and used by AffinityTopology to compute affinity scores.
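
As a sketch, the transforms from the table might look like this (illustrative only; the actual FingerprintComputer API may differ):

import numpy as np

def bucket_labels(label_dist):
    """Quantize per-class frequencies into coarse buckets."""
    return ["low" if p < 0.1 else "medium" if p < 0.3 else "high"
            for p in label_dist]

def noise_moments(mean, var, sigma=0.05):
    """Add calibrated Gaussian noise to feature moments before sharing."""
    rng = np.random.default_rng()
    return mean + rng.normal(0.0, sigma), var + rng.normal(0.0, sigma)

def bucket_sample_count(n):
    """Report only a coarse range, never the exact sample count."""
    return "<1k" if n < 1_000 else "1k-10k" if n < 10_000 else ">10k"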


Swarm Manifest

The Swarm Manifest provides cryptographic commitment to the training protocol. It binds model architecture, aggregation strategy, topology rules, and data policy into a single SHA-256 hash — analogous to a BitTorrent info hash. Two peers with the same manifest ID are, by definition, running the same training protocol.
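
A sketch of the commitment (field names here are illustrative, not the real SwarmManifest schema):

import hashlib, json

manifest = {
    "model_architecture": "SimpleNet/784-128-10",
    "aggregation": "EntropyWeightedAvg",
    "topology": "AffinityTopology(min_affinity=0.3)",
    "data_policy": {"min_affinity": 0.3, "privacy_level": "high"},
}
# Canonical serialization so the hash is order-independent
canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
manifest_id = hashlib.sha256(canonical.encode()).hexdigest()
# Two peers with equal manifest_id are committed to the same training protocol.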


Personalized Federated Learning

QuinkGL supports personalization techniques to handle statistical heterogeneity:

| Technique | Description |
|---|---|
| APFL (Adaptive Personalized FL) | Adaptive mixing coefficient between local and global models |
| FedRep-style split | Shared backbone + personalized head via ModelSplit |
| FedProto / FedPAC | Prototype-based alignment and classifier collaboration |
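
The heart of APFL is a per-parameter convex combination of local and global weights. A minimal sketch (not the PersonalizedModelWrapper API):

def apfl_mix(local_state, global_state, alpha):
    """Personalized parameters: alpha * local + (1 - alpha) * global, per tensor."""
    return {name: alpha * local_state[name] + (1 - alpha) * global_state[name]
            for name in local_state}

# In APFL the mixing coefficient alpha is itself learned adaptively;
# here it is just an input to keep the sketch self-contained.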

Public API Overview

Core

| Class | Description |
|---|---|
| LearningNode | Framework node without networking (bring your own transport) |
| GossipNode | Production node with IPv8 P2P + automatic tunnel fallback |

Models

| Class | Description |
|---|---|
| PyTorchModel | Wrapper for PyTorch nn.Module with NaN validation and gradient clipping |
| TensorFlowModel | Wrapper for TensorFlow/Keras models |
| ModelWrapper | Base class for custom framework wrappers |
| PersonalizedModelWrapper | Base for APFL-style personalized models |
| TrainingConfig | Training configuration (epochs, batch_size, lr, grad_clip, optimizer) |
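
Assuming the fields listed above map directly to constructor arguments, wrapping a model might look like this (a sketch, not verified against the package; the training_config keyword is an assumption):

from quinkgl import PyTorchModel, TrainingConfig  # TrainingConfig export assumed

config = TrainingConfig(epochs=5, batch_size=32, lr=0.01, grad_clip=1.0,
                        optimizer="sgd")
model = PyTorchModel(SimpleNet(), device="cpu",
                     training_config=config)  # kwarg name assumed; SimpleNet from Quick Start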

Fingerprint

| Class | Description |
|---|---|
| DataFingerprint | Privacy-preserving data distribution summary |
| FingerprintComputer | Computes fingerprints from raw data with configurable privacy |
| AffinityWeights | Weights for multi-signal affinity computation |
| FingerprintPrivacyConfig | ε-DP budget, noise levels, bucket granularity |

Manifest & Policy

| Class | Description |
|---|---|
| DataPolicy | Minimum affinity, privacy level, cold-start rounds |
| CollaborationPolicy | Aggregation and topology parameters |
| PersonalizationPolicy | APFL and FedRep configuration |
| PrototypePolicy | FedProto/FedPAC alignment settings |

Observability

| Class | Description |
|---|---|
| EventEmitter | Publish/subscribe runtime events |
| RuntimeEvent | Structured event payload |
| TerminalObserver | Human-readable terminal rendering |
| TelemetryClient | Telemetry data collection |

Requirements

  • Python 3.9+
  • PyTorch 1.9+ (optional, for PyTorchModel)
  • TensorFlow 2.x (optional, for TensorFlowModel)
  • IPv8 2.0+ (for P2P networking)
  • NumPy

Documentation

| Document | Description |
|---|---|
| QUINKGL_FRAMEWORK.md | Complete user guide with all features and examples |
| DOMAIN_AWARE_COLLABORATION_DESIGN.md | Domain-aware collaboration design (fingerprint + affinity + personalization) |
| SWARM_MANIFEST_PROPOSAL.md | Swarm Manifest design (BitTorrent-inspired protocol identity) |

References

  • McMahan et al. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS. (FedAvg)
  • Ormándi et al. (2013). Gossip Learning with Linear Models on Fully Distributed Data. Concurrency and Computation: Practice and Experience. (Gossip learning)
  • Blanchard et al. (2017). Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. NeurIPS. (Krum)
  • Yin et al. (2018). Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. ICML. (TrimmedMean)
  • Li et al. (2020). Federated Optimization in Heterogeneous Networks. MLSys. (FedProx)
  • Hsu et al. (2019). Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification. arXiv preprint. (FedAvgM)
  • Kang & Lee (2024). RNEP: Random Node Entropy Pairing for Efficient Decentralized Training with Non-IID Local Data. Electronics, 13(21), 4193. (EntropyWeightedAvg)
  • Karimireddy et al. (2020). SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. ICML. (Scaffold)
  • Alistarh et al. (2018). The Convergence of Sparsified Gradient Methods. NeurIPS. (Error Feedback)
  • Richtárik et al. (2021). EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback. NeurIPS. (EF21 momentum)
  • Koloskova et al. (2020). A Unified Theory of Decentralized SGD with Changing Topology and Local Updates. ICML. (Spectral gap)
  • Boyd et al. (2006). Randomized Gossip Algorithms. IEEE Trans. Inf. Theory. (Metropolis–Hastings mixing)
  • Voulgaris et al. (2005). CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays. JNSM. (CyclonTopology)
  • Deng et al. (2020). Adaptive Personalized Federated Learning. arXiv preprint. (APFL)

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright 2026 Ali Seyhan, Baki Turhan


Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to the main repository.
