Skip to main content

A package for distributing large files via HTTP P2P network

Project description

Shardcast

A Python package for distributing large files via an HTTP-based tree-topology network

Overview

Shardcast is designed to distribute large binary files through a multi-tier network, making it efficient to transfer large files to many clients:

  1. Origin Server: The root node that shards a large file and serves the shards via HTTP
  2. Middle Nodes: Intermediate servers that download shards from upstream servers and re-serve them
  3. Client Nodes: End nodes that download and reassemble shards into the original file

Features

  • Automatically shards large files into configurable chunks (default: 50MB)
  • Versioned distribution with auto-cleanup of old versions
  • SHA-256 integrity verification for reassembled files
  • Dynamic server performance tracking for optimal downloads
  • Concurrent downloads with automatic retries
  • Support for multiple distribution layers
  • Simple API for broadcasting files

Installation

# Install from source
git clone https://github.com/PrimeIntellect-ai/shardcast.git
cd shardcast
pip install -e .

Usage

Origin Server

Run as a standalone server:

# Start an origin server on port 8000
shardcast-origin --data-dir ./data --port 8000

Use as a library:

import shardcast

# Initialize the package
shardcast.initialize(data_dir="./data", port=8000)

# Broadcast a file
version = shardcast.broadcast("/path/to/large_file.bin")
print(f"File broadcast as version {version}")

# Shut down when done
shardcast.shutdown()

Middle Node

# Start a middle node that connects to an origin server
shardcast-middle --upstream 192.168.1.100 --data-dir ./middle_data --port 8001

# Connect to multiple upstream servers (comma-separated)
shardcast-middle --upstream 192.168.1.100,192.168.1.101 --data-dir ./middle_data --port 8001

# Using the IP_ADDR_LIST environment variable instead of --upstream
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-middle --data-dir ./middle_data --port 8001

Client Node

# List available versions
shardcast-client --servers 192.168.1.100,192.168.1.101 --list

# Download a specific version
shardcast-client --servers 192.168.1.100,192.168.1.101 --version v1 --output-file ./downloaded_file.bin

# Using the IP_ADDR_LIST environment variable instead of --servers
export IP_ADDR_LIST="192.168.1.100 192.168.1.101"
# or in bash array format
export IP_ADDR_LIST=("192.168.1.100" "192.168.1.101")
shardcast-client --list

Configuration

Key constants are defined in shardcast/constants.py:

  • SHARD_SIZE: Size of each shard in bytes (default: 50MB)
  • MAX_DISTRIBUTION_FOLDERS: Maximum number of version folders to keep (default: 15)
  • HTTP_PORT: Default HTTP port for servers (default: 8000)
  • RETRY_ATTEMPTS: Number of retry attempts for failed downloads (default: 5)
  • MAX_CONCURRENT_DOWNLOADS: Number of concurrent download threads (default: 10)

Architecture

  • File Sharding: The origin server splits files into shards named shard_001.bin, shard_002.bin, etc.
  • Distribution: Shards are served via HTTP from the origin server and middle nodes.
  • Folder Versioning: Each broadcast creates a new folder (e.g., v1, v2), with a maximum of 15 folders.
  • Discovery: A distribution.txt file lists active shard folders and their SHA-256 checksums.
  • Download Optimization: Clients download concurrently and prefer faster middle nodes based on runtime performance.
  • Integrity: Clients verify the reassembled file using the SHA-256 checksum from distribution.txt.
cat distribution.txt
> v1: 4d1d960b53356285f45ea2e27c89a1a11d10a9601d3ba2a90851f9f227dd9295

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shardcast-0.1.2.tar.gz (18.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shardcast-0.1.2-py3-none-any.whl (23.1 kB view details)

Uploaded Python 3

File details

Details for the file shardcast-0.1.2.tar.gz.

File metadata

  • Download URL: shardcast-0.1.2.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for shardcast-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b49dfaf0b1f215c75b2296d7446730c91051807d69acaac99b2c2df92da36d8f
MD5 dfdc1f1b1729d90666c8496fcd648bda
BLAKE2b-256 a1b8c2d10835a5f49baf13ceb1b9a4435f92ac90c5e03ea427422cb5f8ecd814

See more details on using hashes here.

File details

Details for the file shardcast-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: shardcast-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for shardcast-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e77041c4dd427520d4c0dba709d781452111d0c9aaa00af9de7653d8c4b6f292
MD5 2bf90dd0b449754c28e4ef8538a550b0
BLAKE2b-256 e1359f448194c44f5b0e1e3afea448185b16e870b61d049c2c1604a991192eae

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page