Skip to main content

A package for distributing large files via HTTP P2P network

Project description

Shardcast

A Python package for distributing large files via an HTTP-based tree-topology network

Overview

Shardcast is designed to distribute large binary files through a multi-tier network, making it efficient to transfer large files to many clients:

  1. Origin Server: The root node that shards a large file and serves the shards via HTTP
  2. Middle Nodes: Intermediate servers that download shards from upstream servers and re-serve them
  3. Client Nodes: End nodes that download and reassemble shards into the original file

Features

  • Automatically shards large files into configurable chunks (default: 50MB)
  • Versioned distribution with auto-cleanup of old versions
  • SHA-256 integrity verification for reassembled files
  • Dynamic server performance tracking for optimal downloads
  • Concurrent downloads with automatic retries
  • Support for multiple distribution layers
  • Simple API for broadcasting files

Installation

# Install from source
git clone https://github.com/PrimeIntellect-ai/shardcast.git
cd shardcast
pip install -e .

Usage

Origin Server

Run as a standalone server:

# Start an origin server on port 8000
shardcast-origin --data-dir ./data --port 8000

Use as a library:

import shardcast

# Initialize the package
shardcast.initialize(data_dir="./data", port=8000)

# Broadcast a file
version = shardcast.broadcast("/path/to/large_file.bin")
print(f"File broadcast as version {version}")

# Shut down when done
shardcast.shutdown()

Middle Node

# Start a middle node that connects to an origin server
shardcast-middle --upstream 192.168.1.100 --data-dir ./middle_data --port 8001

# Connect to multiple upstream servers (comma-separated)
shardcast-middle --upstream 192.168.1.100,192.168.1.101 --data-dir ./middle_data --port 8001

# Using the IP_ADDR_LIST environment variable instead of --upstream
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-middle --data-dir ./middle_data --port 8001

Client Node

# List available versions
shardcast-client --servers 192.168.1.100,192.168.1.101 --list

# Download a specific version
shardcast-client --servers 192.168.1.100,192.168.1.101 --version v1 --output-file ./downloaded_file.bin

# Using the IP_ADDR_LIST environment variable instead of --servers
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-client --list

Configuration

Key constants are defined in shardcast/constants.py:

  • SHARD_SIZE: Size of each shard in bytes (default: 50MB)
  • MAX_DISTRIBUTION_FOLDERS: Maximum number of version folders to keep (default: 15)
  • HTTP_PORT: Default HTTP port for servers (default: 8000)
  • RETRY_ATTEMPTS: Number of retry attempts for failed downloads (default: 5)
  • MAX_CONCURRENT_DOWNLOADS: Number of concurrent download threads (default: 10)

Architecture

  • File Sharding: The origin server splits files into shards named shard_001.bin, shard_002.bin, etc.
  • Distribution: Shards are served via HTTP from the origin server and middle nodes.
  • Folder Versioning: Each broadcast creates a new folder (e.g., v1, v2), with a maximum of 15 folders.
  • Discovery: A distribution.txt file lists active shard folders and their SHA-256 checksums.
  • Download Optimization: Clients download concurrently and prefer faster middle nodes based on runtime performance.
  • Integrity: Clients verify the reassembled file using the SHA-256 checksum from distribution.txt.
cat distribution.txt
> v1: 4d1d960b53356285f45ea2e27c89a1a11d10a9601d3ba2a90851f9f227dd9295

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shardcast-0.2.0.tar.gz (18.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shardcast-0.2.0-py3-none-any.whl (22.6 kB view details)

Uploaded Python 3

File details

Details for the file shardcast-0.2.0.tar.gz.

File metadata

  • Download URL: shardcast-0.2.0.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for shardcast-0.2.0.tar.gz
Algorithm Hash digest
SHA256 c1b7de2a7b963c0a80b0ef03f5ecff839503e547a84ede5f9377c2a40744cdf9
MD5 7d88ced6f1351a059f4783a8af8c0f86
BLAKE2b-256 afa571b0bc2a81886bd9e5913b7b5d823670a84ac62ffd3000a4bd948a87f0db

See more details on using hashes here.

File details

Details for the file shardcast-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: shardcast-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for shardcast-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 44c7e7c1fa37aaa6adc45133fa525373eac8e9e60a5b41d7cd9ab0312fd1ea55
MD5 56025f71b066333089522d278819f26c
BLAKE2b-256 b4a879b74aa10da528966383f7646db560be6f565b9ba510c2fe037c9b602bb6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page