Shardcast

A Python package for distributing large files via an HTTP-based tree-topology network

Overview

Shardcast distributes large binary files to many clients through a multi-tier network. Each tier plays one of three roles:

  1. Origin Server: The root node that shards a large file and serves the shards via HTTP
  2. Middle Nodes: Intermediate servers that download shards from upstream servers and re-serve them
  3. Client Nodes: End nodes that download and reassemble shards into the original file

Features

  • Automatically shards large files into configurable chunks (default: 50MB)
  • Versioned distribution with auto-cleanup of old versions
  • Checksum-based integrity verification for reassembled files
  • Dynamic server performance tracking for optimal downloads
  • Concurrent downloads with automatic retries
  • Support for multiple distribution layers
  • Simple API for broadcasting files
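
The performance-tracking feature can be pictured as keeping a running speed estimate per server and trying the fastest servers first. The sketch below is one way to do that with an exponential moving average. The class name `ServerStats`, its methods, and the smoothing factor are illustrative assumptions; this README does not describe the heuristic the package actually uses.

```python
class ServerStats:
    """Track a smoothed throughput estimate (bytes per second) per server.

    Hypothetical helper for illustration; not part of the shardcast API.
    """

    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha  # weight given to the newest sample
        self.speed = {s: None for s in servers}  # None = not measured yet

    def record(self, server, num_bytes, seconds):
        """Fold one completed download into the server's running estimate."""
        sample = num_bytes / max(seconds, 1e-9)
        old = self.speed[server]
        if old is None:
            self.speed[server] = sample
        else:
            self.speed[server] = self.alpha * sample + (1 - self.alpha) * old

    def record_failure(self, server):
        """Push a failing server to the back of the ranking."""
        self.speed[server] = 0.0

    def ranked(self):
        """Servers fastest first; unmeasured servers are tried before slow ones."""
        return sorted(
            self.speed,
            key=lambda s: float("inf") if self.speed[s] is None else self.speed[s],
            reverse=True,
        )
```

Ranking unmeasured servers first is a design choice that lets every server be sampled at least once before the estimates start steering traffic.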

Installation

# Install from source
git clone https://github.com/PrimeIntellect-ai/shardcast.git
cd shardcast
pip install -e .

Usage

Origin Server

Run as a standalone server:

# Start an origin server on port 8000
shardcast-origin --data-dir ./data --port 8000

Use as a library:

import shardcast

# Initialize the package
shardcast.initialize(data_dir="./data", port=8000)

# Broadcast a file
version = shardcast.broadcast("/path/to/large_file.bin")
print(f"File broadcast as version {version}")

# Shut down when done
shardcast.shutdown()
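
If an exception is raised between `initialize` and `shutdown`, the server would be left running. A minimal sketch of a context manager that guarantees the shutdown call, using only the three functions shown above. The name `origin_session` and its `sc` parameter (the imported `shardcast` module, passed in so the helper can be exercised with a stub) are hypothetical and not part of the package.

```python
from contextlib import contextmanager


@contextmanager
def origin_session(sc, data_dir="./data", port=8000):
    """Initialize shardcast on entry and always shut it down on exit.

    `sc` is the imported shardcast module, or any object exposing the same
    initialize/shutdown calls. Hypothetical helper, not part of the package.
    """
    sc.initialize(data_dir=data_dir, port=port)
    try:
        yield sc
    finally:
        # Runs whether the body finished normally or raised.
        sc.shutdown()


# Intended use (requires the package to be installed):
#
#   import shardcast
#   with origin_session(shardcast) as sc:
#       version = sc.broadcast("/path/to/large_file.bin")
#       ...  # keep the block open while downstream nodes fetch the shards
```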

Middle Node

# Start a middle node that connects to an origin server
shardcast-middle --upstream 192.168.1.100 --data-dir ./middle_data --port 8001

# Connect to multiple upstream servers (comma-separated)
shardcast-middle --upstream 192.168.1.100,192.168.1.101 --data-dir ./middle_data --port 8001

# Using the IP_ADDR_LIST environment variable instead of --upstream
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-middle --data-dir ./middle_data --port 8001
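
The two IP_ADDR_LIST spellings above carry the same addresses. As an illustration of how such a value could be normalized into a list, here is a small sketch; `parse_ip_addr_list` is a hypothetical helper, and the package's own parsing may differ. Note that bash does not pass array variables to child processes, so the array spelling only reaches a program when it arrives as a literal string.

```python
import os
import re


def parse_ip_addr_list(raw):
    """Split an IP_ADDR_LIST-style string into a list of addresses.

    Accepts "a b", "a,b", and the bash-array spelling ("a" "b").
    Hypothetical helper; not taken from the shardcast source.
    """
    # Drop the surrounding parentheses, then split on whitespace or commas.
    cleaned = raw.strip().strip("()")
    tokens = re.split(r"[\s,]+", cleaned)
    # Remove quotes around each token and discard empty leftovers.
    return [t.strip("\"'") for t in tokens if t.strip("\"'")]


# Read the variable from the environment, defaulting to an empty list.
servers = parse_ip_addr_list(os.environ.get("IP_ADDR_LIST", ""))
```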

Client Node

# List available versions
shardcast-client --servers 192.168.1.100,192.168.1.101 --list

# Download a specific version
shardcast-client --servers 192.168.1.100,192.168.1.101 --version v1 --output-file ./downloaded_file.bin

# Using the IP_ADDR_LIST environment variable instead of --servers
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-client --list

Configuration

Key constants are defined in shardcast/constants.py:

  • SHARD_SIZE: Size of each shard in bytes (default: 50MB)
  • MAX_DISTRIBUTION_FOLDERS: Maximum number of version folders to keep (default: 15)
  • HTTP_PORT: Default HTTP port for servers (default: 8000)
  • RETRY_ATTEMPTS: Number of retry attempts for failed downloads (default: 5)
  • MAX_CONCURRENT_DOWNLOADS: Number of concurrent download threads (default: 10)
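
To show how RETRY_ATTEMPTS and MAX_CONCURRENT_DOWNLOADS interact, the following sketch downloads a list of shards with a bounded thread pool and retries each one independently. It is not the package's implementation: `fetch` is a caller-supplied function (hypothetical), RETRY_ATTEMPTS is treated as the total number of attempts, and the linear back-off is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

RETRY_ATTEMPTS = 5             # mirrors the documented default
MAX_CONCURRENT_DOWNLOADS = 10  # mirrors the documented default


def fetch_with_retries(fetch, name, attempts=RETRY_ATTEMPTS, delay=0.0):
    """Call fetch(name) until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(name)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            time.sleep(delay * (attempt + 1))  # linear back-off (assumption)
    raise RuntimeError(f"{name}: failed after {attempts} attempts") from last_error


def fetch_all(fetch, names, workers=MAX_CONCURRENT_DOWNLOADS):
    """Fetch every shard concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: fetch_with_retries(fetch, n), names))
```

Because each shard retries on its own, one slow or failing shard does not restart the others; the pool size caps how many requests are in flight at once.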

Architecture

  • File Sharding: The origin server splits files into shards named shard_001.bin, shard_002.bin, etc.
  • Distribution: Shards are served via HTTP from the origin server and middle nodes.
  • Folder Versioning: Each broadcast creates a new folder (e.g., v1, v2), with a maximum of 15 folders.
  • Discovery: A distribution.txt file lists active shard folders and their BLAKE3 checksums.
  • Download Optimization: Clients download concurrently and prefer faster middle nodes based on runtime performance.
  • Integrity: Clients verify the reassembled file using the BLAKE3 checksum from distribution.txt.
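
The sharding and integrity steps in the list above can be sketched as a round trip: split a file into numbered shards, then concatenate them and compare a digest. This is an illustration under stated assumptions, not the package's code. The function names are hypothetical, "50MB" is read as 50 MiB, and `hashlib.sha256` stands in for the checksum because BLAKE3 is not in the Python standard library.

```python
import hashlib
from pathlib import Path

SHARD_SIZE = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def shard_file(src, out_dir, shard_size=SHARD_SIZE):
    """Split src into shard_001.bin, shard_002.bin, ... inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, "rb") as f:
        index = 1
        # Read one shard at a time so the whole file never sits in memory.
        while chunk := f.read(shard_size):
            path = out / f"shard_{index:03d}.bin"
            path.write_bytes(chunk)
            paths.append(path)
            index += 1
    return paths


def reassemble(shard_paths, dest, expected_hex):
    """Concatenate shards in order and verify the digest of the result."""
    digest = hashlib.sha256()  # stand-in for the checksum in distribution.txt
    with open(dest, "wb") as out:
        for path in shard_paths:
            data = Path(path).read_bytes()
            digest.update(data)
            out.write(data)
    if digest.hexdigest() != expected_hex:
        raise ValueError("checksum mismatch")
    return dest
```

The digest covers the whole reassembled file rather than individual shards, matching the list above, where a single checksum is published per version folder.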

For example, a distribution.txt with a single active version:

cat distribution.txt
> v1: 4d1d960b53356285f45ea2e27c89a1a11d10a9601d3ba2a90851f9f227dd9295
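
A client needs to turn that file into a mapping from version folder to checksum. A minimal parsing sketch, assuming one `name: checksum` pair per line as in the sample (the leading `>` is taken to be an output marker, not file content). Both function names are hypothetical, and `latest_version` further assumes folder names of the form v followed by a number.

```python
def parse_distribution(text):
    """Map version folder names to hex checksums from "name: checksum" lines."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not a pair
        name, checksum = line.split(":", 1)
        versions[name.strip()] = checksum.strip()
    return versions


def latest_version(versions):
    """Pick the highest-numbered folder, assuming names look like v<N>."""
    return max(versions, key=lambda v: int(v.lstrip("v")))
```

Comparing the numeric part rather than the raw string matters once more than nine versions exist, since "v10" sorts before "v2" as text.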
