Shardcast

A Python package for distributing large files via an HTTP-based tree-topology network

Overview

Shardcast distributes large binary files to many clients through a multi-tier network. Each file is split into shards that flow through three kinds of nodes:

  1. Origin Server: The root node that shards a large file and serves the shards via HTTP
  2. Middle Nodes: Intermediate servers that download shards from upstream servers and re-serve them
  3. Client Nodes: End nodes that download and reassemble shards into the original file

Features

  • Automatically shards large files into configurable chunks (default: 50MB)
  • Versioned distribution with auto-cleanup of old versions
  • SHA-256 integrity verification for reassembled files
  • Runtime tracking of server performance, so clients prefer the faster servers
  • Concurrent downloads with automatic retries
  • Support for multiple distribution layers
  • Simple API for broadcasting files
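To make the "concurrent downloads with automatic retries" feature concrete, here is a minimal standard-library sketch. It is not Shardcast's implementation: fetch_with_retries, download_all, and the injected fetch callable are hypothetical names, and the two defaults only mirror the documented RETRY_ATTEMPTS and MAX_CONCURRENT_DOWNLOADS values.

```python
from concurrent.futures import ThreadPoolExecutor
import time

RETRY_ATTEMPTS = 5             # mirrors the documented default
MAX_CONCURRENT_DOWNLOADS = 10  # mirrors the documented default


def fetch_with_retries(fetch, name, attempts=RETRY_ATTEMPTS, delay=0.0):
    """Call fetch(name) until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(name)
        except OSError as exc:  # urllib's URLError and socket errors subclass OSError
            last_error = exc
            time.sleep(delay)
    raise RuntimeError(f"giving up on {name} after {attempts} attempts") from last_error


def download_all(fetch, names, workers=MAX_CONCURRENT_DOWNLOADS):
    """Fetch every shard concurrently; results keep the order of `names`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: fetch_with_retries(fetch, n), names))
```

Passing the fetch function in as an argument keeps the retry logic independent of the HTTP client, which also makes it easy to exercise with a fake that fails a few times before succeeding.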

Installation

# Install from source
git clone https://github.com/PrimeIntellect-ai/shardcast.git
cd shardcast
pip install -e .

Usage

Origin Server

Run as a standalone server:

# Start an origin server on port 8000
shardcast-origin --data-dir ./data --port 8000

Use as a library:

import shardcast

# Initialize the package
shardcast.initialize(data_dir="./data", port=8000)

# Broadcast a file
version = shardcast.broadcast("/path/to/large_file.bin")
print(f"File broadcast as version {version}")

# Shut down when done
shardcast.shutdown()
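If a broadcast raises, the shutdown call above is skipped. One way to guarantee cleanup is a small context manager, sketched here under the assumption that only the initialize and shutdown calls shown above exist. The helper name running is hypothetical and not part of Shardcast.

```python
from contextlib import contextmanager


@contextmanager
def running(node, **init_kwargs):
    """Initialize `node`, yield it, and always call shutdown() afterwards."""
    node.initialize(**init_kwargs)
    try:
        yield node
    finally:
        # Runs whether the body finished normally or raised.
        node.shutdown()
```

With the real package this could be used as `with running(shardcast, data_dir="./data", port=8000) as sc:` followed by `sc.broadcast("/path/to/large_file.bin")` in the body, assuming the module-level functions behave as in the example above.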

Middle Node

# Start a middle node that connects to an origin server
shardcast-middle --upstream 192.168.1.100 --data-dir ./middle_data --port 8001

# Connect to multiple upstream servers (comma-separated)
shardcast-middle --upstream 192.168.1.100,192.168.1.101 --data-dir ./middle_data --port 8001

# Using the IP_ADDR_LIST environment variable instead of --upstream
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-middle --data-dir ./middle_data --port 8001
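The two IP_ADDR_LIST formats above can be normalized into one list of addresses. The sketch below shows one way to do that. parse_ip_addr_list is a hypothetical helper, not Shardcast's own parser, and accepting commas is an added assumption. Since bash does not export array variables to child processes, the array form is assumed to reach the process as a literal string such as ("192.168.1.100" "192.168.1.101").

```python
import os
import re


def parse_ip_addr_list(raw):
    """Return the addresses in a space-separated or bash-array-style string."""
    # Replace the parentheses and quotes of the array form with spaces.
    cleaned = re.sub("[()\"']", " ", raw)
    # Split on whitespace (and commas, as an assumption) and drop empty pieces.
    return [part for part in re.split(r"[\s,]+", cleaned) if part]


# Read the variable if it is set; an unset variable yields an empty list.
servers = parse_ip_addr_list(os.environ.get("IP_ADDR_LIST", ""))
```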

Client Node

# List available versions
shardcast-client --servers 192.168.1.100,192.168.1.101 --list

# Download a specific version
shardcast-client --servers 192.168.1.100,192.168.1.101 --version v1 --output-file ./downloaded_file.bin

# Using the IP_ADDR_LIST environment variable instead of --servers
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-client --list
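When several servers are given, the client is described as preferring the faster ones at runtime. A minimal sketch of that idea follows: it records bytes and seconds per server and ranks servers by average throughput. The ServerStats class and its methods are hypothetical. How Shardcast actually measures performance is not documented here.

```python
from collections import defaultdict


class ServerStats:
    """Track bytes transferred and time spent per server."""

    def __init__(self):
        self._bytes = defaultdict(int)
        self._seconds = defaultdict(float)

    def record(self, server, num_bytes, seconds):
        """Add one completed download to the server's totals."""
        self._bytes[server] += num_bytes
        self._seconds[server] += seconds

    def throughput(self, server):
        """Average bytes per second, or 0.0 for a server with no samples."""
        seconds = self._seconds.get(server, 0.0)
        return self._bytes.get(server, 0) / seconds if seconds > 0 else 0.0

    def ranked(self, servers):
        """Return servers sorted from fastest to slowest observed throughput."""
        return sorted(servers, key=self.throughput, reverse=True)
```

A client built this way would call record after each shard download and pick the next server from the front of ranked.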

Configuration

Key constants are defined in shardcast/constants.py:

  • SHARD_SIZE: Size of each shard in bytes (default: 50MB)
  • MAX_DISTRIBUTION_FOLDERS: Maximum number of version folders to keep (default: 15)
  • HTTP_PORT: Default HTTP port for servers (default: 8000)
  • RETRY_ATTEMPTS: Number of retry attempts for failed downloads (default: 5)
  • MAX_CONCURRENT_DOWNLOADS: Number of concurrent download threads (default: 10)
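As a worked example of what these constants imply, the sketch below computes how many shards a file needs and which version folders fall outside the retention limit. The values mirror the documented defaults, but reading "50MB" as 50 MiB is an assumption, and shard_count and folders_to_delete are hypothetical helpers rather than functions from shardcast/constants.py.

```python
SHARD_SIZE = 50 * 1024 * 1024   # assumption: "50MB" read as 50 MiB
MAX_DISTRIBUTION_FOLDERS = 15   # documented default


def shard_count(file_size, shard_size=SHARD_SIZE):
    """Number of shards needed for `file_size` bytes (ceiling division)."""
    return (file_size + shard_size - 1) // shard_size


def folders_to_delete(versions, keep=MAX_DISTRIBUTION_FOLDERS):
    """Given folders ordered oldest to newest, return those beyond the limit.

    Assumes keep >= 1.
    """
    return versions[:-keep] if len(versions) > keep else []
```

Under these assumptions a 10 GiB file needs 205 shards, and a server holding v1 through v17 would drop v1 and v2.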

Architecture

  • File Sharding: The origin server splits files into shards named shard_001.bin, shard_002.bin, etc.
  • Distribution: Shards are served via HTTP from the origin server and middle nodes.
  • Folder Versioning: Each broadcast creates a new folder (e.g., v1, v2). At most 15 folders are kept (MAX_DISTRIBUTION_FOLDERS), and older versions are cleaned up automatically.
  • Discovery: A distribution.txt file lists active shard folders and their SHA-256 checksums.
  • Download Optimization: Clients download concurrently and prefer faster middle nodes based on runtime performance.
  • Integrity: Clients verify the reassembled file using the SHA-256 checksum from distribution.txt.
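The sharding, discovery, and integrity steps above can be sketched end to end with the standard library. This is an illustration, not Shardcast's code: shard_file, parse_distribution, and reassemble are hypothetical names, and distribution.txt is assumed to hold one "version: checksum" entry per line, with the checksum covering the whole original file.

```python
import hashlib
from pathlib import Path


def shard_file(src, out_dir, shard_size):
    """Split src into shard_001.bin, shard_002.bin, ... and return its SHA-256."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with open(src, "rb") as f:
        index = 1
        while chunk := f.read(shard_size):
            (out_dir / f"shard_{index:03d}.bin").write_bytes(chunk)
            digest.update(chunk)
            index += 1
    return digest.hexdigest()


def parse_distribution(text):
    """Parse lines such as 'v1: <sha256>' into a {version: checksum} dict."""
    entries = {}
    for line in text.splitlines():
        version, sep, checksum = line.partition(":")
        if sep:
            entries[version.strip()] = checksum.strip()
    return entries


def reassemble(shard_dir, dest, expected_sha256):
    """Concatenate shards in numeric order and verify the result's SHA-256."""
    shards = sorted(
        Path(shard_dir).glob("shard_*.bin"),
        key=lambda p: int(p.stem.split("_")[1]),  # numeric sort, safe past 999
    )
    digest = hashlib.sha256()
    with open(dest, "wb") as out:
        for shard in shards:
            data = shard.read_bytes()
            out.write(data)
            digest.update(data)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: reassembled file is corrupt")
```

Hashing while reading and writing avoids a second pass over a large file. A real client would fetch each shard over HTTP into shard_dir before calling reassemble.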

# Example distribution.txt with a single active version
cat distribution.txt
> v1: 4d1d960b53356285f45ea2e27c89a1a11d10a9601d3ba2a90851f9f227dd9295
