Lightweight, async-first Kademlia DHT, with dynamic distance-based cache TTL and switchable JSON/Bencode wire serialization

kademlia-dynamic

Lightweight, async-first Kademlia DHT implementation in pure Python with zero external dependencies. It is distinct from the kademlia package on PyPI and imports as kademlia_dynamic.

Installation

pip install kademlia-dynamic

Or from source:

cd kademlia_dynamic
python -m pip install -e .

Then import what you need:

from kademlia_dynamic import KademliaServer, Peer, generate_node_id

Quick Start

import asyncio
from kademlia_dynamic import KademliaServer

async def main():
    server = KademliaServer()  # or KademliaServer(serialization="bencode")
    await server.listen(port=8000)  # defaults to 127.0.0.1; pass ip="0.0.0.0" to accept LAN/WAN peers

    await server.bootstrap([("192.168.1.100", 8000)])

    await server.set("my_key", "my_value")       # str
    await server.set("my_blob", b"\x00binary")   # or bytes
    value = await server.get("my_key")
    print(value)

    server.stop()

asyncio.run(main())

API Reference

KademliaServer

Main DHT node class

KademliaServer(serialization: str = "json", verify_response_source: bool = True)
  • serialization: "json" (default) or "bencode" wire format.
  • verify_response_source: when True (default), responses are only accepted from the IP:port the query was sent to; forged responses from other sources are dropped. Set to False if peers legitimately reply from a different address (e.g. some NAT setups).

Additional keyword-only tuning arguments (k, alpha, query_timeout, ...) are listed under Configuration.
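
To make the effect of verify_response_source concrete, here is a minimal sketch of the check it describes. It is not the library's code: the PendingQueries class, its method names, and the idea of keying in-flight queries by a transaction ID (txid) are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]


class PendingQueries:
    """Hypothetical table of in-flight queries, keyed by transaction ID."""

    def __init__(self, verify_response_source: bool = True) -> None:
        self.verify_response_source = verify_response_source
        self._expected: Dict[str, Address] = {}

    def sent(self, txid: str, addr: Address) -> None:
        # Remember which IP:port the query went to.
        self._expected[txid] = addr

    def accept(self, txid: str, source: Address) -> bool:
        # Unknown transaction IDs are never accepted.
        expected = self._expected.get(txid)
        if expected is None:
            return False
        # With verification on, a reply from any other address is dropped
        # and the query stays pending for the genuine peer.
        if self.verify_response_source and source != expected:
            return False
        del self._expected[txid]
        return True


pending = PendingQueries()
pending.sent("tx1", ("192.168.1.100", 8000))
forged = pending.accept("tx1", ("10.0.0.66", 8000))
genuine = pending.accept("tx1", ("192.168.1.100", 8000))
print(forged, genuine)
```

With verify_response_source=False the same table would accept the first reply carrying a known transaction ID regardless of where it came from, which is why the option is best left on unless peers are known to reply from a different address.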

Methods

  • async listen(port: int, ip: str = "127.0.0.1") — Start listening for UDP packets. Pass ip="0.0.0.0" to accept connections from other machines (LAN/WAN). Only do this behind a firewall/NAT you control.
  • async bootstrap(nodes: List[Tuple[str, int]]) — Join network from bootstrap nodes
  • async set(key: str, value: str | bytes) — Store value in DHT
  • async get(key: str) -> Optional[str | bytes] — Retrieve value from DHT, same type as stored
  • async find_node(target_id: str) -> List[Peer] — Find peers close to target ID
  • async find_value(key: str) -> Tuple[Optional[str | bytes], List[Peer]] — Search for value, return closest peers if not found
  • stop() — Shut down node and close transport

Peer

Represents a network node

peer = Peer(node_id="abc123...", ip="192.168.1.100", port=8000)
peer.to_dict()  # Serializable dict
Peer.from_dict(data)  # Deserialize

Utility Functions

  • generate_node_id() -> str — Generate random 160-bit node ID (SHA1)
  • hash_key_to_node_id(key: str) -> str — Hash key to node ID space
  • xor_distance(hex_id_a: str, hex_id_b: str) -> int — Compute XOR distance
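
As a rough illustration of what these helpers compute, the sketch below reimplements them with the standard library only. The `_sketch` names are hypothetical, and the exact inputs the package feeds to SHA1 (random bytes for node IDs, UTF-8 key bytes for keys) are assumptions; only the hex-string IDs and the integer XOR result follow from the signatures above.

```python
import hashlib
import os


def generate_node_id_sketch() -> str:
    # SHA-1 of 20 random bytes gives a 160-bit ID as 40 hex characters.
    return hashlib.sha1(os.urandom(20)).hexdigest()


def hash_key_to_node_id_sketch(key: str) -> str:
    # Keys and node IDs share one 160-bit space, so keys are hashed too.
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def xor_distance_sketch(hex_id_a: str, hex_id_b: str) -> int:
    # Kademlia distance is the bitwise XOR of the two IDs, read as an integer.
    return int(hex_id_a, 16) ^ int(hex_id_b, 16)


node_id = generate_node_id_sketch()
key_id = hash_key_to_node_id_sketch("my_key")
print(len(node_id), len(key_id))
print(xor_distance_sketch(node_id, node_id))   # an ID is at distance 0 from itself
print(xor_distance_sketch("0f", "f0"))
```

The XOR metric is symmetric, so the distance from a to b always equals the distance from b to a; this is what lets every node agree on which peers are "closest" to a key.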

Comparison to Canonical Kademlia

Canonical references: the Kademlia paper (Maymounkov & Mazières, 2002) and BEP 5 (the BitTorrent DHT protocol).

Similarities

  • 160-bit node IDs (SHA1)
  • XOR distance metric
  • K-buckets (K=20) with replacement cache
  • Alpha concurrency (α=3)
  • Ping, find_node, store/retrieve operations
  • Periodic bucket refresh, value republishing, expiry cleanup
  • Asynchronous UDP protocol
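
The k-bucket with replacement cache mentioned above can be sketched roughly as follows. This is a simplified illustration, not the package's data structure: KBucketSketch and its methods are hypothetical, the replacement cache is unbounded here, and the canonical step of pinging the least-recently-seen peer before evicting it is left out.

```python
from collections import OrderedDict


class KBucketSketch:
    """Hypothetical k-bucket: up to k live peers plus a replacement cache."""

    def __init__(self, k: int = 20) -> None:
        self.k = k
        self.peers = OrderedDict()          # least recently seen first
        self.replacements = OrderedDict()   # candidates waiting for a slot

    def seen(self, node_id: str, addr: tuple) -> None:
        if node_id in self.peers:
            # Known peer: move it to the most-recently-seen end.
            self.peers.move_to_end(node_id)
        elif len(self.peers) < self.k:
            self.peers[node_id] = addr
        else:
            # Bucket full: park the newcomer in the replacement cache.
            self.replacements[node_id] = addr
            self.replacements.move_to_end(node_id)

    def evict(self, node_id: str) -> None:
        # Drop an unresponsive peer and promote the freshest replacement.
        if self.peers.pop(node_id, None) is not None and self.replacements:
            new_id, addr = self.replacements.popitem(last=True)
            self.peers[new_id] = addr


bucket = KBucketSketch(k=2)
for i in range(3):
    bucket.seen(f"node{i}", ("127.0.0.1", 8000 + i))
bucket.evict("node0")
print(list(bucket.peers))
```

Keeping long-lived peers and only admitting newcomers when a slot opens up is the property that makes the routing table resistant to churn.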

Differences vs canonical Kademlia / BEP 5

Feature                     | This Implementation                | Canonical Kademlia / BEP 5
----------------------------|------------------------------------|------------------------------------
Language                    | Python                             | (Language-agnostic spec)
Async Model                 | asyncio                            | Blocking (implementation-dependent)
Serialization               | JSON (default) or Bencode          | Bencode (bencoding)
Value TTL                   | Dynamic (distance-based)           | Fixed intervals
Original Publisher Tracking | Yes                                | Not specified
Cached Value TTL            | Inversely proportional to distance | Fixed short TTL
RPC Protocol                | JSON or Bencode over UDP           | Bencoded dict over UDP
Socket Binding              | User-specified IP                  | Auto-detect

Behavioral Notes

  • Node Discovery: Includes implicit peer discovery via sender fields in all responses (not explicit in BEP 5)
  • Cache TTL: Cached values expire faster (shorter TTL) for distant nodes, reducing stale caches
  • Bucket Refresh: Proactive refresh every 3600s (1h) of buckets that haven't seen activity
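  • Value Republishing and Expiry: Original publishers republish every 24h; non-publishers re-store their cached copies every 1h. Values that are not refreshed expire after 24h + 10s (key_expiry_seconds).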
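
The distance-based cache TTL can be pictured with a small sketch. The formula the package actually uses is not given here, so the rule below (a linear falloff in the bit length of the XOR distance, clamped to a floor) is a stand-in chosen for simplicity; cache_ttl_sketch is a hypothetical name, and the defaults simply reuse the key_expiry_seconds and min_cache_ttl_seconds values listed under Configuration.

```python
ID_BITS = 160  # node IDs are 160-bit


def cache_ttl_sketch(node_id: str, key_id: str,
                     base_ttl: int = 86410, min_ttl: int = 600) -> int:
    """Hypothetical rule: shrink base_ttl as XOR distance to the key grows."""
    distance = int(node_id, 16) ^ int(key_id, 16)
    # bit_length() is a coarse log2 of the distance: 0 (same ID) up to 160.
    log_distance = distance.bit_length()
    scaled = base_ttl * (ID_BITS - log_distance) // ID_BITS
    # Never go below the configured floor.
    return max(min_ttl, scaled)


key_id = "00" * 20
near = cache_ttl_sketch("00" * 19 + "01", key_id)   # differs in the last bit only
far = cache_ttl_sketch("ff" * 20, key_id)           # differs in every bit
print(near, far)
```

Whatever the exact curve, the intent is the same: nodes far from a key hold cached copies only briefly, so stale data ages out fastest where it is least likely to be refreshed.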

Configuration

All tunables are per-server constructor arguments (keyword-only). The module-level constants in kademlia_dynamic/kademlia.py are only their defaults.

server = KademliaServer(
    serialization="json",                          # or "bencode"
    verify_response_source=True,                   # drop responses from unexpected addresses
    k=20,                                          # peers per bucket (K_BUCKET_SIZE)
    alpha=3,                                       # parallel queries (ALPHA_CONCURRENCY)
    query_timeout=2.0,                             # RPC timeout, seconds
    bucket_refresh_interval=3600,                  # refresh stale buckets + expiry sweep (s)
    key_expiry_seconds=86410,                      # value TTL (24h + 10s)
    non_publisher_restore_interval=3600,           # cache restore interval (1h)
    original_publisher_republish_interval=86400,   # publisher republish (24h)
    min_cache_ttl_seconds=600,                     # minimum cached value TTL (10m)
    max_dispatch_tasks=64,                         # backpressure: max concurrent inbound handlers
)

The bind address is chosen per node via listen(port, ip=...). Node ID width (160-bit) is fixed — it is tied to SHA1.

All nodes in a network should use the same k and alpha for predictable lookup behavior, and must use the same serialization.

Design Notes

JSON vs Bencode

Both are built in, pure Python, zero dependencies. Pick per-server via the serialization constructor arg:

KademliaServer(serialization="json")      # default: human-readable, easy to debug
KademliaServer(serialization="bencode")   # BitTorrent-style bencode wire format

All peers in a network must use the same serialization to interoperate. None values are omitted from encoded messages in both formats (bencode has no null type); readers treat a missing key as None.

One bencode edge case: a str value that begins with the internal byte marker \x00bytes\x00 will round-trip back as bytes. Avoid leading NUL bytes in string values (or use the JSON codec, which does not have this ambiguity).
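
For readers unfamiliar with the format, the following is a minimal bencode encoder sketch that also shows the None-omission rule described above. It is an illustration only: bencode_sketch is a hypothetical name, the message fields are made up, and the package's own codec (including its internal bytes marker) is not reproduced.

```python
def bencode_sketch(value) -> bytes:
    """Minimal bencode encoder for int, str, bytes, list, and dict."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are length-prefixed: b"spam" becomes b"4:spam".
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode_sketch(v) for v in value) + b"e"
    if isinstance(value, dict):
        out = b"d"
        # Keys are emitted in sorted order; None values are skipped because
        # bencode has no null type.
        for key in sorted(value):
            if value[key] is not None:
                out += bencode_sketch(key) + bencode_sketch(value[key])
        return out + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


message = {"type": "ping", "port": 8000, "value": None}
print(bencode_sketch(message))
```

Note that str and bytes both end up as the same length-prefixed byte string, which is why a codec that wants to preserve the distinction needs some in-band marker, and why that marker can collide with real string content.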

Values: str or bytes

set()/get() accept and return either str or bytes. On the wire, bencode stores bytes natively; JSON (which has no binary type) base64-encodes bytes transparently and decodes them back on receipt. Type is preserved round-trip: a bytes value never comes back as str, and vice versa, apart from the bencode marker edge case noted under JSON vs Bencode.
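
One way to carry bytes through JSON while preserving type is to wrap the base64 text in a tagged object. The sketch below shows that idea under stated assumptions: the "__bytes__" tag and both function names are hypothetical, and the package's real JSON wire representation may look different.

```python
import base64
import json

BYTES_TAG = "__bytes__"  # hypothetical marker key, not the library's wire format


def to_json_sketch(value) -> str:
    if isinstance(value, bytes):
        # Wrap binary data so the decoder can tell it apart from a plain str.
        value = {BYTES_TAG: base64.b64encode(value).decode("ascii")}
    return json.dumps(value)


def from_json_sketch(text: str):
    value = json.loads(text)
    if isinstance(value, dict) and set(value) == {BYTES_TAG}:
        return base64.b64decode(value[BYTES_TAG])
    return value


for original in ("my_value", b"\x00binary"):
    restored = from_json_sketch(to_json_sketch(original))
    print(type(restored).__name__, restored == original)
```

Because the tag lives in the JSON structure rather than inside the string itself, a str value can never be mistaken for bytes in this scheme.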

Network Exposure

listen() binds 127.0.0.1 by default. To join a real network, pass ip="0.0.0.0" (binds all interfaces) and ensure the UDP port is open/forwarded on your firewall/router.

Thread Safety

Not thread-safe. Designed for single-threaded asyncio use. For multi-threaded access, wrap in locks or run each node in its own event loop.
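
If other threads need to talk to a node, one common standard-library pattern is to give the node a dedicated event loop thread and submit coroutines to it with asyncio.run_coroutine_threadsafe. The sketch below uses a placeholder coroutine, fake_get, in place of a real server method so that it runs on its own; substituting server.get(...) is an assumption about how you would wire it up.

```python
import asyncio
import threading


async def fake_get(key: str) -> str:
    # Stand-in for an awaitable node method such as server.get(key).
    await asyncio.sleep(0)
    return f"value-for-{key}"


# Dedicated loop running in its own thread; the node would live here.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

# From any other thread, hand the coroutine to that loop and wait for it.
future = asyncio.run_coroutine_threadsafe(fake_get("my_key"), loop)
result = future.result(timeout=5)
print(result)

# Shut the loop down cleanly once the node is no longer needed.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

All access to the node then happens on one loop, so no extra locking around the node's internal state is needed.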

Backpressure

The datagram dispatcher caps the number of concurrent inbound handler tasks at max_dispatch_tasks (default 64). Excess packets are dropped with a warning log.
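
The drop-when-full behaviour can be sketched with plain asyncio. This is an illustration under assumptions, not the package's dispatcher: BoundedDispatcher and its attributes are hypothetical, and a counter stands in for the warning log.

```python
import asyncio


class BoundedDispatcher:
    """Hypothetical dispatcher that caps concurrent inbound handler tasks."""

    def __init__(self, max_dispatch_tasks: int = 64) -> None:
        self.max_dispatch_tasks = max_dispatch_tasks
        self.tasks = set()
        self.dropped = 0

    def dispatch(self, handler, datagram: bytes) -> bool:
        if len(self.tasks) >= self.max_dispatch_tasks:
            # At capacity: drop the packet instead of queueing without bound.
            # (A real node would emit a warning log here.)
            self.dropped += 1
            return False
        task = asyncio.create_task(handler(datagram))
        self.tasks.add(task)
        # Free the slot as soon as the handler finishes.
        task.add_done_callback(self.tasks.discard)
        return True


async def demo() -> tuple:
    handled = []

    async def handler(datagram: bytes) -> None:
        await asyncio.sleep(0.01)
        handled.append(datagram)

    dispatcher = BoundedDispatcher(max_dispatch_tasks=2)
    for i in range(5):
        dispatcher.dispatch(handler, bytes([i]))
    await asyncio.gather(*dispatcher.tasks)
    return len(handled), dispatcher.dropped


outcome = asyncio.run(demo())
print(outcome)
```

Dropping is acceptable here because Kademlia RPCs run over UDP and already tolerate loss: the sender's query simply times out and the lookup moves on to other peers.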
