hfsync
Sync Hugging Face Transformers models to cloud storage (GCS and S3, with more planned).
Quickstart
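# Install the latest version from GitHub (the leading "!" is for notebook cells)
!pip install --upgrade git+https://github.com/trisongz/hfsync.git
# Or install the released version from PyPI
!pip install --upgrade hfsync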
Usage
from hfsync import GCSAuth, S3Auth # AZAuth (not yet supported)
from hfsync import Sync
# Note: Only use one auth client; the alternatives below are commented out.
# To have auth picked up from env vars / implicitly
auth_client = GCSAuth()
# auth_client = S3Auth()
# To set auth directly / explicitly
# auth_client = GCSAuth(service_account='service_account', token='gcs_token')
# auth_client = S3Auth(access_key='access_key', secret_key='secret_key', session_token='token')
# Local and Cloud Paths
local_path = '/content/model'
cloud_path = 'gs://bucket/model/experiment' # or 's3://bucket/model/experiment'
sync_client = Sync(local_path=local_path, cloud_path=cloud_path, auth_client=auth_client)
# Or omit auth_client to have the auth inferred from your cloud path
sync_client = Sync(local_path=local_path, cloud_path=cloud_path)
# After the training loop, save your pretrained model locally and sync it to the cloud.
# You don't need to call model.save_pretrained(path) yourself; this function does that automatically.
results = sync_client.save_pretrained(model, tokenizer)
# results = {
# '/content/model/pytorch_model.bin': 'gs://bucket/model/experiment/pytorch_model.bin'
# ...
# }
# Pull Down from your bucket to local
results = sync_client.sync_to_local(overwrite=False)
# results = {
# 'gs://bucket/model/experiment/pytorch_model.bin': '/content/model/pytorch_model.bin'
# ...
# }
# Or set explicit paths if you are changing paths, for example, different dirs for each checkpoint
new_local = '/content/model2'
results = sync_client.sync_to_local(local_path=new_local, cloud_path=cloud_path, overwrite=False)
# results = {
# 'gs://bucket/model/experiment/pytorch_model.bin': '/content/model2/pytorch_model.bin'
# ...
# }
# Or update the paths stored on the client
new_cloud = 'gs://bucket/model/experiment2'
sync_client.set_paths(local_path=new_local, cloud_path=new_cloud)
results = sync_client.save_pretrained(model, tokenizer)
# results = {
# '/content/model2/pytorch_model.bin': 'gs://bucket/model/experiment2/pytorch_model.bin'
# ...
# }
# You can also use the underlying filesystem to copy a file directly
# Implicitly & Explicitly
filename = '/content/model2/pytorch_model.bin'
sync_client.copy(filename) # Copies to the set cloud_path variable -> 'gs://bucket/model/experiment2/pytorch_model.bin'
sync_client.copy(filename, dest='gs://bucket/model/experiment3') # Copies to dest variable -> 'gs://bucket/model/experiment3/pytorch_model.bin'
sync_client.copy(filename, dest='gs://bucket/model/experiment3/model.bin') # Copies to dest variable -> 'gs://bucket/model/experiment3/model.bin'
filename = 'gs://bucket/model/experiment2/pytorch_model.bin'
sync_client.copy(filename) # Copies to the set local_path variable -> '/content/model2/pytorch_model.bin'
sync_client.copy(filename, dest='/content/model3') # Copies to dest variable -> '/content/model3/pytorch_model.bin'
sync_client.copy(filename, dest='/content/model3/model.bin') # Copies to dest variable -> '/content/model3/model.bin'
# Copy Explicitly
src_file = '/content/mydataset.pb'
dest_file = 's3://bucket/data/dataset.pb'
sync_client.copy(src_file, dest_file)
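The comments on the copy calls above suggest that a `dest` without a file extension is treated as a directory (the source filename is appended), while a `dest` that names a file is used as given. The sketch below illustrates that rule in isolation. `resolve_dest` is a hypothetical helper, not part of hfsync, and the extension-based check is an assumption about how files are told apart from directories.

```python
import posixpath


def resolve_dest(src: str, dest: str) -> str:
    """Hypothetical helper (not part of hfsync): guess the final copy target.

    Assumption: a dest whose last path component has no extension is a
    directory, so the source filename is appended to it.
    """
    filename = posixpath.basename(src)
    last_part = posixpath.basename(dest.rstrip('/'))
    _, ext = posixpath.splitext(last_part)
    if ext:
        # dest already names a file, e.g. '.../model.bin'
        return dest
    # dest looks like a directory: keep the original filename
    return dest.rstrip('/') + '/' + filename


print(resolve_dest('/content/model2/pytorch_model.bin', 'gs://bucket/model/experiment3'))
print(resolve_dest('/content/model2/pytorch_model.bin', 'gs://bucket/model/experiment3/model.bin'))
```

Under this assumption, a directory whose name contains a dot (for example `experiment.v2`) would be mistaken for a file, so passing a full destination filename is the safer choice in that case.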
Environment Variables Used
Google Cloud Storage
- GOOGLE_API_TOKEN
- GOOGLE_APPLICATION_CREDENTIALS
AWS S3
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
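Before relying on implicit auth, it can help to check which of these variables are actually set. The following is a minimal sketch that only reports presence. `credentials_present` is a hypothetical helper, not part of hfsync, and it makes no claim about how the library prioritises the variables internally.

```python
import os

# Variable names are taken from the lists above.
GCS_VARS = ('GOOGLE_API_TOKEN', 'GOOGLE_APPLICATION_CREDENTIALS')
S3_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN')


def credentials_present(env=None):
    """Hypothetical helper: report which listed variables are set and non-empty."""
    env = os.environ if env is None else env
    return {
        'gcs': [name for name in GCS_VARS if env.get(name)],
        's3': [name for name in S3_VARS if env.get(name)],
    }


print(credentials_present())
```

Running this before constructing `GCSAuth()` or `S3Auth()` without arguments makes a missing variable visible early, rather than at the first upload.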
Limitations
The library tries to check for existing files before overwriting wherever overwrite=False is available, but this check is not yet supported consistently across all cloud providers.
Support for other cloud filesystems is a work in progress.
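Because the overwrite check is not guaranteed on every backend, one option is to guard the local side yourself before pulling files down. The sketch below filters a mapping shaped like the `sync_to_local` results shown in Usage (`{cloud_path: local_path}`). `files_safe_to_write` is a hypothetical helper, not part of hfsync, and it only inspects the local filesystem.

```python
import os


def files_safe_to_write(mapping, overwrite=False):
    """Hypothetical helper: keep only entries whose local target may be written.

    `mapping` is assumed to look like {cloud_path: local_path}.
    """
    if overwrite:
        # Caller accepts overwriting, so nothing is filtered out.
        return dict(mapping)
    return {
        cloud: local
        for cloud, local in mapping.items()
        if not os.path.exists(local)
    }
```

The filtered mapping could then be passed, one entry at a time, to `sync_client.copy(cloud, dest=local)`, so that files already present locally are left untouched.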