hfsync

Sync Hugging Face Transformers models to cloud storage (GCS and S3, with more backends planned).

Quickstart

# Install the latest development version from GitHub
!pip install --upgrade git+https://github.com/trisongz/hfsync.git
# Or install the latest release from PyPI
!pip install --upgrade hfsync

Usage

from hfsync import GCSAuth, S3Auth # AZAuth (not yet supported)
from hfsync import Sync

# Note: Use only one auth client; keep the line that matches your cloud provider.

# To have auth picked up implicitly from environment variables
auth_client = GCSAuth()
auth_client = S3Auth()

# To set auth explicitly
auth_client = GCSAuth(service_account='service_account', token='gcs_token')
auth_client = S3Auth(access_key='access_key', secret_key='secret_key', session_token='token')

# Local and Cloud Paths
local_path = '/content/model'
cloud_path = 'gs://bucket/model/experiment' # or 's3://bucket/model/experiment'
sync_client = Sync(local_path=local_path, cloud_path=cloud_path, auth_client=auth_client)

# Or omit auth_client and let Sync work out the auth from your cloud path
sync_client = Sync(local_path=local_path, cloud_path=cloud_path)

# After the training loop, save your pretrained model locally and sync it to the cloud.
# You don't need to call model.save_pretrained(path) yourself; this method does that automatically.
results = sync_client.save_pretrained(model, tokenizer)
# results = {
# '/content/model/pytorch_model.bin': 'gs://bucket/model/experiment/pytorch_model.bin'
# ...
# }

# Pull files down from your bucket to the local path
results = sync_client.sync_to_local(overwrite=False)
# results = {
# 'gs://bucket/model/experiment/pytorch_model.bin': '/content/model/pytorch_model.bin'
# ...
# }

# Or pass explicit paths if they change, for example a different directory for each checkpoint
new_local = '/content/model2'
results = sync_client.sync_to_local(local_path=new_local, cloud_path=cloud_path, overwrite=False)
# results = {
# 'gs://bucket/model/experiment/pytorch_model.bin': '/content/model2/pytorch_model.bin'
# ...
# }

# Or update the paths used by subsequent calls
new_cloud = 'gs://bucket/model/experiment2'
sync_client.set_paths(local_path=new_local, cloud_path=new_cloud)
results = sync_client.save_pretrained(model, tokenizer)
# results = {
# '/content/model2/pytorch_model.bin': 'gs://bucket/model/experiment2/pytorch_model.bin'
# ...
# }


# You can also use the underlying filesystem to copy a file directly

# With an implicit or explicit destination
filename = '/content/model2/pytorch_model.bin'
sync_client.copy(filename) # Copies to the set cloud_path variable -> 'gs://bucket/model/experiment2/pytorch_model.bin'
sync_client.copy(filename, dest='gs://bucket/model/experiment3') # Copies to dest variable -> 'gs://bucket/model/experiment3/pytorch_model.bin'
sync_client.copy(filename, dest='gs://bucket/model/experiment3/model.bin') # Copies to dest variable -> 'gs://bucket/model/experiment3/model.bin'

filename = 'gs://bucket/model/experiment2/pytorch_model.bin'
sync_client.copy(filename) # Copies to the set local_path variable -> '/content/model2/pytorch_model.bin'
sync_client.copy(filename, dest='/content/model3') # Copies to dest variable -> '/content/model3/pytorch_model.bin'
sync_client.copy(filename, dest='/content/model3/model.bin') # Copies to dest variable -> '/content/model3/model.bin'

# Copy Explicitly
src_file = '/content/mydataset.pb'
dest_file = 's3://bucket/data/dataset.pb'
sync_client.copy(src_file, dest_file)
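
The comments in the copy examples above suggest a simple destination rule: with no dest, a local file goes to the configured cloud_path and a cloud file goes to the configured local_path; a directory-like dest keeps the source filename; a dest that names a file is used as is. The sketch below illustrates that rule only. `is_cloud_path` and `resolve_dest` are hypothetical helpers written for this example, not part of hfsync, and treating "last path component has an extension" as "names a file" is an assumption.

```python
import posixpath

CLOUD_SCHEMES = ("gs://", "s3://")  # the two schemes mentioned in this README


def is_cloud_path(path):
    """Return True if the path looks like a GCS or S3 URL."""
    return path.startswith(CLOUD_SCHEMES)


def resolve_dest(src, dest=None, local_path=None, cloud_path=None):
    """Hypothetical helper: mimic the destination rule described in the comments above."""
    if dest is None:
        # Cloud sources default to the local path, local sources to the cloud path.
        dest = local_path if is_cloud_path(src) else cloud_path
    # Assumption: a last path component with an extension names a file.
    if posixpath.splitext(posixpath.basename(dest))[1]:
        return dest
    # Otherwise treat dest as a directory and keep the source filename.
    return posixpath.join(dest, posixpath.basename(src))


print(resolve_dest('/content/model2/pytorch_model.bin', dest='gs://bucket/model/experiment3'))
# gs://bucket/model/experiment3/pytorch_model.bin
```

How hfsync itself tells a directory from a file is not stated in this README, so check the actual behavior before relying on a destination without a trailing filename.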

Environment Variables Used

Google Cloud Storage

  • GOOGLE_API_TOKEN
  • GOOGLE_APPLICATION_CREDENTIALS

AWS S3

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN

Limitations

The library tries to check for existing files before overwriting wherever overwrite=False is available, but this check is not yet supported uniformly across all cloud providers. Support for other cloud filesystems is a work in progress.
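
Because the overwrite check is not guaranteed on every backend, you may want to guard the local side yourself. The sketch below shows one way to do that with the standard library: it builds a cloud-to-local mapping in the same shape as the results dictionaries shown in Usage and skips files that already exist locally. `plan_downloads` is a hypothetical helper, not an hfsync function, and it covers only local existence checks; checking the cloud side would depend on the provider's client.

```python
import os
import posixpath


def plan_downloads(remote_files, cloud_path, local_path, overwrite=False):
    """Hypothetical helper: map cloud files to local targets.

    Files that already exist locally are skipped unless overwrite is True.
    """
    plan = {}
    for name in remote_files:
        src = posixpath.join(cloud_path, name)
        dst = os.path.join(local_path, name)
        if not overwrite and os.path.exists(dst):
            continue  # keep the existing local copy
        plan[src] = dst
    return plan
```

The list of remote file names is passed in by the caller here; how you obtain it depends on your storage client.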

Credits
