
Monkey patch for huggingface_hub to download Git-LFS blobs from Storj


This patch demonstrates the transfer speed that the huggingface_hub Python library can achieve when its downloads are served from Storj Decentralized Cloud Storage.

HuggingFace Hub stores all large files in Git-LFS.


When the huggingface_hub Python library downloads such a file, the request is redirected to the Git-LFS CDN hosted at cdn-lfs.huggingface.co.

This monkey patch modifies the huggingface_hub library to redirect Git-LFS downloads to the Storj Linksharing service hosted at link.storjshare.io.
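The redirect can be pictured as a URL rewrite. The following is a minimal sketch, not the patch's actual code: it assumes, purely for illustration, that each blob is stored in the bucket under the same path it has on the CDN. The function name `to_storj_url` and the sample blob path are hypothetical.

```python
from urllib.parse import urlsplit

# Default prefix taken from the Configuration section of this README.
STORJ_URL_PREFIX = (
    "https://link.storjshare.io/raw/juzlwaj7ovnst5gtkv2km3rkriha/lfs-huggingface"
)


def to_storj_url(cdn_url: str, prefix: str = STORJ_URL_PREFIX) -> str:
    """Map a Git-LFS CDN URL to a Storj Linksharing URL.

    Assumption (hypothetical): the bucket mirrors the CDN path layout.
    Any query string on the CDN URL is dropped.
    """
    parts = urlsplit(cdn_url)
    if parts.hostname != "cdn-lfs.huggingface.co":
        return cdn_url  # not a Git-LFS CDN URL, so leave it untouched
    return prefix.rstrip("/") + parts.path


# Hypothetical blob path, for illustration only.
example = "https://cdn-lfs.huggingface.co/repos/ab/cd/0123abcd?Expires=1"
print(to_storj_url(example))
```

URLs on any other host pass through unchanged, which matches the idea that only Git-LFS downloads are redirected.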

Prerequisites

The Git-LFS blobs of the respective AI model must be replicated to a Storj bucket, and that bucket must be shared through the Storj Linksharing Service.

We have already replicated the Git-LFS blobs of the StarCoder model to a Storj bucket and shared it: https://link.storjshare.io/raw/juzlwaj7ovnst5gtkv2km3rkriha/lfs-huggingface

If you want to use another AI model, you need to use your own Storj bucket and then configure the patch to use it. See Configuration for more details.

Installation

First, install the patch module:

pip install huggingface-hub-storj-patch

Then add the following import statement at the top of your Python script, before any other import:

import huggingface_hub_storj_patch

Now you can run your script. If the patch is applied successfully, it prints the URLs from which the huggingface_hub library is downloading.


Configuration

The following environment variables configure the behavior of the patch.

HF_HUB_NO_STORJ

If set to true, downloads are not redirected to the Storj Linksharing Service, as if the patch were not applied.

HF_HUB_STORJ_PARALLELISM

Configures how many parallel download connections are opened to the Storj Linksharing Service. The default value is 16.
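This README does not describe how the patch divides work across connections. One common technique for parallel downloads is to split the file into contiguous byte ranges and fetch each one with an HTTP Range request. The sketch below only computes such ranges; the helper name `split_ranges` is hypothetical, and this is not necessarily how the patch implements parallelism.

```python
import os


def split_ranges(size: int, parts: int) -> list[tuple[int, int]]:
    """Split `size` bytes into at most `parts` contiguous, inclusive byte ranges."""
    if size <= 0 or parts <= 0:
        return []
    parts = min(parts, size)  # never create empty ranges
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# Read the documented variable, falling back to the documented default of 16.
parallelism = int(os.environ.get("HF_HUB_STORJ_PARALLELISM", "16"))
print(len(split_ranges(10**9, parallelism)), "ranges for a 1 GB file")

# Each tuple corresponds to one request header of the form "Range: bytes=first-last".
for first, last in split_ranges(100, 4):
    print(f"Range: bytes={first}-{last}")
```

Under this scheme, a higher parallelism value means more and smaller ranges, each fetched over its own connection.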

HF_HUB_STORJ_URL_PREFIX

Configures the URL to the shared Storj bucket that replicates the Git-LFS blobs of the AI model. The default value is the bucket that replicates the StarCoder model: https://link.storjshare.io/raw/juzlwaj7ovnst5gtkv2km3rkriha/lfs-huggingface
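The variables above can also be set from Python. The sketch below sets them through `os.environ` before importing the patch. This README does not state whether the patch reads them at import time or at download time, so setting them first is the conservative ordering. The helper name `configure_storj_patch` and the example URL prefix are hypothetical placeholders.

```python
import os
from typing import Optional


def configure_storj_patch(
    url_prefix: Optional[str] = None,
    parallelism: Optional[int] = None,
    disable: bool = False,
) -> None:
    """Set the documented environment variables; omitted arguments are left alone."""
    if url_prefix is not None:
        os.environ["HF_HUB_STORJ_URL_PREFIX"] = url_prefix
    if parallelism is not None:
        os.environ["HF_HUB_STORJ_PARALLELISM"] = str(parallelism)
    if disable:
        os.environ["HF_HUB_NO_STORJ"] = "true"


# Placeholder URL: replace it with the Linksharing URL of your own bucket.
configure_storj_patch(
    url_prefix="https://link.storjshare.io/raw/<your-access-key>/<your-bucket>",
    parallelism=32,
)

try:
    # Import the patch only after the variables are set, and before huggingface_hub.
    import huggingface_hub_storj_patch  # noqa: F401
except ImportError:
    print("huggingface-hub-storj-patch is not installed")
```

Exporting the same variables in the shell before starting the script should have the same effect, since the patch is documented as reading them from the environment.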
